# Searching for New Particle Resonances

*Students will “discover” a new particle by characterising a resonant bump in the invariant mass distribution from a set of simulated particle collisions.*

**Topics:** data processing (raw data → histograms), curve fitting, regression

**Relevant packages:** NumPy, SciPy, Matplotlib

Students will be provided with a simulated dataset of $pp \rightarrow X(\rightarrow ab)\,cd$ collision events. The dataset will consist of a list of the daughter particles' 4-momenta and charges. Students will plot the invariant mass of the mother particle and observe by eye a resonant “bump” corresponding to it.

Students will then write their own regression-based curve-fitting code to fit the background and resonant distributions, thus determining the mother particle's mass and decay width. For a more particle-physics-specific tangent, students might look into common fit functions (e.g. Gaussian, Landau, Crystal Ball) and understand when a given functional form is most useful. Once students are happy with the performance of their curve-fitting tool, they will redo the analysis using ```scipy.optimize```, a useful module to know for future research analyses.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

path_to_data = "test.txt"

# 1

First, write a Python function that reads in the particle collision data from the provided text file.

The first few lines of the data file will look something like:

```
E px py pz charge

<event>
1.27893787226 -0.967943427485 -0.196782063814 0.4972631904 1
0.65484957674 0.0799580860542 0.085630174222 0.621646982287 -1
1.07972424322 -0.634807604926 -0.394274396233 -0.750563397282 -1
1.73527280522 1.42962557946 0.48622159803 -0.272846669269 1
</event>

```

The very first line of the file is a header telling you what information is provided for each particle.

Collision events begin with a line ```<event>``` and end with a line ```</event>```. Each line in a given event represents a particle.
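A minimal sketch of one way to parse this format, assuming every event is wrapped in ```<event>```/```</event>``` lines and every particle line holds five whitespace-separated numbers. The helper names ```parse_events``` and ```read_events_from_file``` are hypothetical; the output follows the ```{event_id: [p4_a, p4_b, p4_c, p4_d]}``` layout described in the docstring below.

```python
def parse_events(lines):
    """Hypothetical helper: build {event_id: [[E, px, py, pz, charge], ...]} from text lines."""
    events = {}
    current = None   # particles of the event being read; None when outside an event
    event_id = 0
    for line in lines:
        line = line.strip()
        if line == "<event>":
            current = []
        elif line == "</event>":
            events[event_id] = current
            event_id += 1
            current = None
        elif current is not None and line:
            # One particle per line: E px py pz charge
            current.append([float(value) for value in line.split()])
    # The header line and blank lines lie outside any event, so they are skipped
    return events


def read_events_from_file(path_to_file):
    """Hypothetical wrapper that parses the events stored in a text file."""
    with open(path_to_file) as f:
        return parse_events(f)
```

Separating the parsing from the file handling lets you test the parser on a short list of strings before running it on the full dataset.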

In [None]:
# Write your function here

def read_in_particle_data(path_to_file):

    """
    INPUTS:
    
    path_to_file: a string containing the absolute filepath to the collision dataset
    
    **********
    **********
    
    OUTPUTS: 
    
    dict_of_events: a dictionary of the collision events of the form {event_id: [p4_a, p4_b, p4_c, p4_d]}
                        Each p4_i is a list of floats containing the 4-momenta and charge [E, px, py, pz, charge]
    
    """
          

In [None]:
# Execute your function here

# 2

Now, we need to get a sense of what our dataset looks like! How energetic are the particles? Are they distributed relatively evenly in momentum space? What symmetries do these collision events seem to obey?

Plot histograms of the energy and momenta of the daughter particles. Comment briefly on their distributions (perhaps address the three questions in the previous line).

Note: we're going to be making a lot of histograms in this notebook. It might be useful to write a function that can quickly make nice-looking histograms so you don't have to explicitly type out the same formatting-related lines multiple times.
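As one possible starting point, the helper below wraps ```ax.hist``` so every histogram gets the same styling. The name ```make_histogram``` and the styling choices are hypothetical; unlike the ```plot_histogram``` stub, this sketch returns the figure and axes so you can keep adding to the plot.

```python
import numpy as np
import matplotlib.pyplot as plt

def make_histogram(observables, bins, xlabel, ylabel="Counts"):
    """Hypothetical helper: draw a consistently styled histogram and return (fig, ax)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(observables, bins=bins, histtype="step", linewidth=1.5)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    # bins is assumed to be an array of bin edges, as in the stub below
    ax.set_xlim(bins[0], bins[-1])
    return fig, ax
```

For example, ```make_histogram(energies, np.linspace(0, 5, 51), "E")``` draws a 50-bin energy histogram, assuming ```energies``` is a list of particle energies you have already collected.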


In [None]:
# Write your plotting function here
    
def plot_histogram(observables, bins, xlabel):

    """
    INPUTS:
    
    observables: a list of floats (or ints) containing the data to be histogrammed
    
    bins: a np.array containing the histogram bin edges
    
    xlabel: a string of the histogram x-axis label
    
    **********
    **********
    
    OUTPUTS: 
    
    None
    
    """



Most particle physics analyses do not work directly with $E$, $p_x$, $p_y$, and $p_z$. Instead, they use $p_T$ (the magnitude of the momentum component transverse to the beam axis), $y$ (the rapidity), and $\phi$ (the azimuthal angle around the beam axis). These variables suit the cylindrical geometry of collider detectors, and $p_T$ and rapidity differences are unchanged by boosts along the beam axis.

Create histograms of these three variables (you may have to look up their definitions in terms of the Cartesian 4-momentum components). Comment briefly on their distributions.
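For reference, with the beam along the $z$ axis (an assumption about this dataset's coordinate convention), the standard definitions are $p_T = \sqrt{p_x^2 + p_y^2}$, $y = \tfrac{1}{2}\ln\frac{E + p_z}{E - p_z}$, and $\phi = \arctan2(p_y, p_x)$. The vectorized sketch below applies them to NumPy arrays of particle components; the name ```transverse_coords``` is hypothetical.

```python
import numpy as np

def transverse_coords(E, px, py, pz):
    """Hypothetical helper: return (pT, y, phi) arrays, assuming the beam runs along z."""
    E, px, py, pz = (np.asarray(v, dtype=float) for v in (E, px, py, pz))
    pT = np.hypot(px, py)                    # sqrt(px^2 + py^2)
    y = 0.5 * np.log((E + pz) / (E - pz))    # rapidity; requires E > |pz|
    phi = np.arctan2(py, px)                 # azimuthal angle in [-pi, pi]
    return pT, y, phi
```

Using ```np.arctan2``` rather than ```np.arctan(py / px)``` keeps track of the quadrant, so $\phi$ covers the full circle.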

In [None]:
# Write your function here

def calculate_coords(event):
    
    """
    INPUTS:
    
    event: a list of lists [p4_a, p4_b, p4_c, p4_d] corresponding to a given collision event.
            Each p4_i is a list of floats containing the 4-momenta and charge [E, px, py, pz, charge]
    
    **********
    **********
    
    OUTPUTS: 
    
    collision_pT: a list of the transverse momenta for each of the daughter particles
                    collision_y, collision_phi are defined similarly
    
    
    """


In [None]:
# Write your plotting code here


# 3 

Now we want to calculate the mass of the resonant particle $X$ produced in each collision event. However, we don't know which two of the four particles in each event came from the decay of $X$. We do know that $X$ is electrically neutral, so by charge conservation its two daughters must have opposite charges.

For every valid (i.e. oppositely charged) pair of particles that could have come from a decay of $X$, calculate the hypothetical mass of $X$. Then plot a histogram of all of these invariant masses and make a guess at the true mass of $X$.
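For two particles with 4-momenta $(E_1, \vec{p}_1)$ and $(E_2, \vec{p}_2)$, the invariant mass of their hypothetical parent is $m^2 = (E_1 + E_2)^2 - |\vec{p}_1 + \vec{p}_2|^2$ (natural units, $c = 1$). A minimal sketch that applies this to every oppositely charged pair in an event, assuming each particle is stored as ```[E, px, py, pz, charge]```; the names ```pair_mass``` and ```neutral_pair_masses``` are hypothetical.

```python
import itertools
import math

def pair_mass(p1, p2):
    """Invariant mass of the system p1 + p2, each given as [E, px, py, pz, ...]."""
    E = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m2 = E**2 - (px**2 + py**2 + pz**2)
    # Clip tiny negative values caused by floating-point rounding
    return math.sqrt(max(m2, 0.0))

def neutral_pair_masses(event):
    """Masses of all oppositely charged (net-neutral) pairs in one event."""
    masses = []
    for p1, p2 in itertools.combinations(event, 2):
        if p1[4] + p2[4] == 0:
            masses.append(pair_mass(p1, p2))
    return masses
```

In the sample event shown earlier, two particles are positive and two negative, so it yields four candidate masses, at most one of which comes from a real $X$ decay. The wrong pairings form part of the combinatorial background under the peak.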

In [None]:
# Write your functions here

def calculate_m2(particle1, particle2):
    
    """
    INPUTS:
    
    particle1: a list [E, px, py, pz] of 4-momentum components 
    particle2: a list [E, px, py, pz] of 4-momentum components 
    
    **********
    **********
    
    OUTPUTS: 
    
    m: A float of the invariant mass of a hypothetical mother particle that could have decayed into particle1 + particle2
    
    """
    
        

def get_event_invariant_masses(event):
    
    """
    INPUTS:
    
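    event: a list of lists [p4_a, p4_b, p4_c, p4_d] corresponding to a given collision event.
            Each p4_i is a list of floats containing the 4-momenta and charge [E, px, py, pz, charge]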
    
    **********
    **********
    
    OUTPUTS: 
    
    masses: a list of the hypothetical invariant masses of the particle X, one for each oppositely charged pair in the event
    
    """
         



In [None]:
# Execute your code here
    


# 4

Of course, estimating $m_X$ by eye is not rigorous. Instead, we should fit a curve to the invariant mass distribution and extract the mean of the signal peak from the fit.

We're going to use ```scipy.optimize.curve_fit``` to do this.

We'll start by defining the form of the curve that we want to fit to the invariant mass distribution. In many such cases in particle physics, the "bump" corresponding to the resonant particle can be approximated by a Gaussian, and the background by a falling exponential.

Define a function for this distribution below. 

As a starting hint: your "background" fit exponential should have the form $f(m) = N_b \times e^{-\alpha m}$. There are two free parameters: $N_b$ is a normalization constant, and $\alpha$ is the decay constant.
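Following the same pattern, one possible signal model is a Gaussian $N_s \exp\left[-(m - \mu)^2 / (2\sigma^2)\right]$ with normalization $N_s$, mean $\mu$ (the estimate of $m_X$), and width $\sigma$. The sketch below (function names hypothetical) adds it to the exponential background. Note that ```curve_fit``` expects the independent variable as the first argument, followed by the free parameters.

```python
import numpy as np

def gauss_peak(m, N_s, mu, sigma):
    """Gaussian signal: N_s * exp(-(m - mu)^2 / (2 sigma^2))."""
    return N_s * np.exp(-0.5 * ((m - mu) / sigma) ** 2)

def exp_background(m, N_b, alpha):
    """Falling exponential background: N_b * exp(-alpha * m)."""
    return N_b * np.exp(-alpha * m)

def peak_plus_background(m, N_s, mu, sigma, N_b, alpha):
    """Signal + background model with five free parameters."""
    return gauss_peak(m, N_s, mu, sigma) + exp_background(m, N_b, alpha)
```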

In [None]:
# Write your functions here

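def gaussian():
    pass  # replace with your Gaussian signal model (add the arguments it needs)


def exponential():
    pass  # replace with your exponential background model


def model_fit():
    pass  # signal + background; curve_fit expects f(m, param1, param2, ...)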

Now we need to extract the $x$ (invariant mass $m_X$) and $y$ (counts) coordinates of the histogram points that we want ```scipy.optimize.curve_fit``` to fit. Write a function to do so below.

Note: you will probably find that the leftmost bin edge should *not* be at $m = 0$ in order for the Gaussian + exponential fit function to be applicable.
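One way to get these points is ```np.histogram```, which returns the counts and the bin edges; the bin centers are the midpoints of adjacent edges. A minimal sketch, with the hypothetical name ```histogram_points```:

```python
import numpy as np

def histogram_points(observables, bin_edges):
    """Return (bin_centers, counts) for the given data and bin edges."""
    counts, edges = np.histogram(observables, bins=bin_edges)
    bin_centers = 0.5 * (edges[:-1] + edges[1:])
    return bin_centers, counts
```

Values outside ```[bin_edges[0], bin_edges[-1]]``` are ignored by ```np.histogram```, so placing the first edge above $m = 0$ drops the low-mass region from the fit automatically.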

In [None]:
# Write your function here

def extract_curve_from_histogram(observables, bin_edges):
    
    """
    INPUTS:
    
    observables: a list of floats (or ints) containing the data to be histogrammed
    
    bin_edges: a np.array containing the histogram bin edges
    
    **********
    **********
    
    OUTPUTS: 
    
    bin_centers: a np.array of the bin centers
    
    counts_y_data: a np.array of the histogram counts for each bin
    
    """
  


In [None]:
# Execute your code here

Finally, use ```scipy.optimize.curve_fit``` to fit the histogram data with the model function. Make a plot containing both the histogram of $m_X$ and the best-fit model function found by ```scipy.optimize.curve_fit```. Also print out the best-fit value of $m_X$ along with its uncertainty (for example, the square root of the corresponding diagonal element of the covariance matrix that ```curve_fit``` returns).
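A self-contained sketch of the full fit on *synthetic* toy data generated with NumPy (not the course dataset; the peak position, width, sample sizes, binning, and starting values are all arbitrary choices for illustration). It assumes Poisson uncertainties $\sqrt{N}$ per bin, passed through ```curve_fit```'s ```sigma``` argument with ```absolute_sigma=True```, and takes the parameter uncertainties from the square roots of the diagonal of the returned covariance matrix.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def peak_plus_background(m, N_s, mu, sigma, N_b, alpha):
    """Gaussian peak plus falling exponential background."""
    return N_s * np.exp(-0.5 * ((m - mu) / sigma) ** 2) + N_b * np.exp(-alpha * m)

# Synthetic toy data for illustration only (NOT the course dataset):
# a peak at m = 3.1 on top of an exponential background
rng = np.random.default_rng(0)
toy_masses = np.concatenate([
    rng.normal(loc=3.1, scale=0.1, size=2000),
    rng.exponential(scale=1.5, size=50000),
])

# Histogram the masses; the first edge sits away from m = 0
bin_edges = np.linspace(2.0, 5.0, 61)
counts, edges = np.histogram(toy_masses, bins=bin_edges)
bin_centers = 0.5 * (edges[:-1] + edges[1:])

# Poisson uncertainties, floored at 1 so empty bins keep a nonzero error
errors = np.sqrt(np.maximum(counts, 1))

# Nonlinear fits need sensible starting values, e.g. read off the plot by eye
p0 = [300.0, 3.0, 0.2, 1500.0, 0.5]
popt, pcov = curve_fit(peak_plus_background, bin_centers, counts,
                       p0=p0, sigma=errors, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # 1-sigma parameter uncertainties

print(f"m_X = {popt[1]:.4f} +/- {perr[1]:.4f}")

# Overlay the binned data and the best-fit curve
fine_m = np.linspace(bin_edges[0], bin_edges[-1], 500)
fig, ax = plt.subplots()
ax.errorbar(bin_centers, counts, yerr=errors, fmt="k.", label="toy data")
ax.plot(fine_m, peak_plus_background(fine_m, *popt), "r-", label="best fit")
ax.set_xlabel(r"$m_X$")
ax.set_ylabel("Counts")
ax.legend()
```

Replacing the toy arrays with your own bin centers and counts (and adjusting ```p0``` to match your histogram) gives the fit requested above. If the fit fails to converge, poor starting values are the usual culprit.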

In [None]:
from scipy.optimize import curve_fit

# Write your code here


# 5

As mentioned above, many particle resonances can be modeled by a Gaussian. However, other functional forms for "bumps" better describe lossy processes, such as energy lost to radiation, in which one tail of the distribution (typically the low-mass side) falls off more slowly than a Gaussian does.

Repeat the above curve fitting analysis, but swap out the Gaussian signal function with a Crystal Ball distribution. Comment on the "goodness" of the fit with a Crystal Ball distribution compared with that of a Gaussian.
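One common parameterization of the Crystal Ball function, written here with its tail on the low-mass side, joins a Gaussian core for $t = (m - \mu)/\sigma > -\alpha$ continuously to a power-law tail $A\,(B - t)^{-n}$ for $t \le -\alpha$, with $A = (n/|\alpha|)^n e^{-|\alpha|^2/2}$ and $B = n/|\alpha| - |\alpha|$. This $\alpha$ is unrelated to the background decay constant, so the sketch below calls it ```alpha_cb```. It is left unnormalized so that a free amplitude can be fitted, as with the Gaussian, and is paired with a reduced $\chi^2$ helper for comparing fits; both function names are hypothetical. SciPy also provides a normalized version as ```scipy.stats.crystalball```.

```python
import numpy as np

def crystal_ball(m, N, mu, sigma, alpha_cb, n_cb):
    """Unnormalized Crystal Ball: Gaussian core with a power-law tail on the low-m side."""
    # abs(sigma) keeps the tail on the low side even if a fit flips sigma's sign
    t = (np.atleast_1d(np.asarray(m, dtype=float)) - mu) / abs(sigma)
    a = abs(alpha_cb)
    A = (n_cb / a) ** n_cb * np.exp(-0.5 * a**2)
    B = n_cb / a - a
    result = np.empty_like(t)
    core = t > -a
    result[core] = np.exp(-0.5 * t[core] ** 2)        # Gaussian core
    result[~core] = A * (B - t[~core]) ** (-n_cb)     # power-law tail
    return N * result

def reduced_chi2(counts, model_counts, n_params):
    """chi^2 per degree of freedom, using Poisson variances floored at 1."""
    counts = np.asarray(counts, dtype=float)
    variance = np.maximum(counts, 1.0)
    chi2 = np.sum((counts - np.asarray(model_counts)) ** 2 / variance)
    return chi2 / (len(counts) - n_params)
```

To fit, swap ```crystal_ball``` in for the Gaussian term of the signal + background model (seven free parameters in total) and compare ```reduced_chi2``` for the two fits; values near 1 indicate residuals consistent with Poisson fluctuations.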



In [None]:
# Write your functions here

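def CB():
    pass  # replace with your Crystal Ball signal model (add the arguments it needs)


def model_fit_CB():
    pass  # replace with your Crystal Ball + exponential background model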


In [None]:
# Write your code here

