## Introduction to the exercise

### Recording the data

As mentioned before, about a billion particle collisions take place in the CMS detector every second, and it is impossible to record the data from all of them. Therefore, right after a collision a trigger system decides whether the collision was potentially interesting or not. Uninteresting collisions are not recorded. This multi-stage triggering process reduces the number of recorded collisions from about a billion to about a thousand per second.

Data collected from the collisions are saved to AOD (Analysis Object Data) files that can be opened with the ROOT program (https://root.cern.ch/). The structure of these files is very complicated, so they cannot be handled, for example, as simple data tables.

In this exercise the CSV (comma-separated values) file format is used instead of the AOD format. A CSV file is a regular text file that contains values separated by commas. These files can be easily read and handled with the Python programming language.
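As a minimal sketch of what reading CSV data looks like, the snippet below parses a tiny made-up table from a string instead of a file. The column names and values are illustrative assumptions only and are not taken from the CMS data.

```python
import io
import pandas as pd

# Hypothetical two-event CSV snippet; column names and values are made up.
csv_text = """Run,Event,pt1,pt2
1,1,10.5,12.3
1,2,25.1,30.8
"""

# pd.read_csv accepts a file path, a URL or any file-like object.
events = pd.read_csv(io.StringIO(csv_text))

print(events.shape)          # (number of rows, number of columns)
print(list(events.columns))  # column names taken from the first line
```

Each line of the text becomes one row of the table, and the first line supplies the column names.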

### Indirect detection of particles

Not every particle can be detected directly with the CMS or other particle detectors. Often, the particles of interest are short-lived and can only be detected indirectly.

For example, the Z boson, a particle that mediates the weak interaction, can't be detected directly with the CMS because its lifetime is very short.
This means that the Z boson decays before it even reaches the silicon detector of the CMS.

#### How do we detect the Z boson?

An indirect way of detecting the Z boson is to detect the particles that originate from its decay.
The Z boson has many decay channels, but in today's exercise we will consider only the decay of the Z boson to a muon-antimuon pair.

An important note is that not all events with a detected muon-antimuon pair originate from a Z boson. Therefore, different selections need to be applied in order to reconstruct the Z boson.
What selection would you use?

## The invariant mass

The mass of the Z boson can be determined with the help of a concept called _invariant mass_.

Let's consider a situation where a particle with mass $M$ and energy $E$ decays to two particles with masses $m_1$ and $m_2$ and energies $E_1$ and $E_2$. Energy $E$ and momentum $\vec{p}$ are conserved in the decay process, so $E = E_1 + E_2$ and $\vec{p} = \vec{p}_1 + \vec{p}_2$.

Particles will obey the relativistic dispersion relation:

$$
Mc^2 = \sqrt{E^2 - c^2\vec{p}^2}.
$$

With the conservation of energy and momentum this can be written as

$$
Mc^2 = \sqrt{(E_1+E_2)^2 - c^2(\vec{p_1} + \vec{p_2})^2}
$$
$$
=\sqrt{E_1^2+2E_1E_2+E_2^2 -c^2\vec{p_1}^2-2c^2\vec{p_1}\cdot\vec{p_2}-c^2\vec{p_2}^2}
$$
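$$
=\sqrt{2E_1E_2 - 2c^2 |\vec{p_1}||\vec{p_2}|\cos(\theta)+m_1^2c^4+m_2^2c^4}. \qquad (1)
$$

The last step uses $E_i^2 - c^2\vec{p_i}^2 = m_i^2c^4$ for each decay product and $\vec{p_1}\cdot\vec{p_2} = |\vec{p_1}||\vec{p_2}|\cos(\theta)$.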

The relativistic dispersion relation can be rewritten in the following form

$$
M^2c^4 = E^2 - c^2\vec{p}^2
$$
$$
E = \sqrt{c^2\vec{p}^2 + M^2c^4},
$$

from which, by setting $c = 1$ (very common in particle physics) and by assuming that the masses of the particles are very small compared to their momenta, we get the following:

$$
E = \sqrt{\vec{p}^2 + M^2} = |\vec{p}|\sqrt{1+\frac{M^2}{\vec{p}^2}}
\stackrel{M \ll |\vec{p}|}{\longrightarrow}|\vec{p}|.
$$

By applying the result $E = |\vec{p}|$ derived above, setting $c=1$ and neglecting the small mass terms $m_1^2$ and $m_2^2$, equation (1) reduces to the form

$$
M=\sqrt{2E_1E_2(1-\cos(\theta))},
$$

where $\theta$ is the angle between the momentum vectors of the particles. With this equation it is possible to calculate the invariant mass of the particle pair if the energies of the particles and the angle $\theta$ are known.
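As a quick sanity check of this formula, the following sketch evaluates it for two simple configurations. The function name and the energy value of 45.6 GeV are assumptions chosen only for illustration.

```python
import math

def mass_massless(e1, e2, theta):
    """Invariant mass (GeV) of two particles with negligible masses, with c = 1."""
    return math.sqrt(2.0 * e1 * e2 * (1.0 - math.cos(theta)))

# Two particles of 45.6 GeV each flying in opposite directions (theta = pi).
m_back_to_back = mass_massless(45.6, 45.6, math.pi)

# The same two particles flying in exactly the same direction (theta = 0).
m_collinear = mass_massless(45.6, 45.6, 0.0)

print(m_back_to_back, m_collinear)
```

For back-to-back particles the invariant mass equals the sum of the two energies, while for collinear massless particles it vanishes.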

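In experimental particle physics the equation for the invariant mass is often written in the form

$$
M = \sqrt{2p_{T1}p_{T2}( \cosh(\eta_1-\eta_2)-\cos(\phi_1-\phi_2) )}, \qquad (2)
$$

where $p_T$ is the transverse momentum (the momentum component perpendicular to the beam), $\eta$ is the pseudorapidity and $\phi$ is the azimuthal angle of each particle.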
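Equation (2) can be cross-checked numerically against the definition $M^2 = E^2 - \vec{p}^2$. The sketch below does this on randomly generated toy kinematics, not on real data. The function names and the value ranges are assumptions made for illustration, and the muons are treated as massless.

```python
import numpy as np

def mass_from_pt_eta_phi(pt1, eta1, phi1, pt2, eta2, phi2):
    """Equation (2): invariant mass of two particles with negligible mass (c = 1)."""
    return np.sqrt(2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2)))

def mass_from_components(pt1, eta1, phi1, pt2, eta2, phi2):
    """Same mass from M^2 = E^2 - |p|^2, with px = pT cos(phi), py = pT sin(phi),
    pz = pT sinh(eta) and, for a massless particle, E = |p| = pT cosh(eta)."""
    energy = pt1 * np.cosh(eta1) + pt2 * np.cosh(eta2)
    px = pt1 * np.cos(phi1) + pt2 * np.cos(phi2)
    py = pt1 * np.sin(phi1) + pt2 * np.sin(phi2)
    pz = pt1 * np.sinh(eta1) + pt2 * np.sinh(eta2)
    return np.sqrt(np.maximum(energy**2 - px**2 - py**2 - pz**2, 0.0))

# Toy kinematics drawn at random; the ranges are arbitrary choices.
rng = np.random.default_rng(0)
n = 1000
pt1, pt2 = rng.uniform(5.0, 100.0, n), rng.uniform(5.0, 100.0, n)
eta1, eta2 = rng.uniform(-2.4, 2.4, n), rng.uniform(-2.4, 2.4, n)
phi1, phi2 = rng.uniform(-np.pi, np.pi, n), rng.uniform(-np.pi, np.pi, n)

formulas_agree = np.allclose(
    mass_from_pt_eta_phi(pt1, eta1, phi1, pt2, eta2, phi2),
    mass_from_components(pt1, eta1, phi1, pt2, eta2, phi2),
    rtol=1e-6,
)
print(formulas_agree)
```

Because the functions use numpy operations, they work both on single numbers and on whole arrays or table columns at once.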





## Hands on!

This exercise uses data from collisions in which two muons have been detected (along with other particles).
By computing the invariant mass of the dimuon pair, we will try to find the Z boson!

To identify the Z boson, the invariant mass of the two muons is calculated for a large number of collision events. Then a histogram is made from the calculated values. The histogram shows how many invariant mass values fall in each bin.

If a peak forms in the histogram, it indicates that the collision events contained a particle whose mass corresponds to the peak.


### Getting data

In the code below the Python programming language is used to get and analyse the data.
Python is widely used in the scientific community for computing, modifying and analysing data, and many well-optimized libraries exist for these purposes. Python code is organized in modules, which are files containing definitions (functions) and statements.

You can run a code cell by clicking on it to make it active and then pressing CTRL + ENTER.


In [None]:
# Import the needed modules. Pandas is for data analysis, numpy for scientific calculation
# and matplotlib for making plots.
# Name these "pd", "np" and "plt".

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


The open data from the CMS experiment are stored in .csv files. This kind of data is easy to read with the pandas module. Reading a file and saving the result in a variable gives a variable of type DataFrame.

In [None]:
# Create a new DataFrame structure from the file "Dimuon_DoubleMu.csv". Name it "dataset".

dataset = pd.read_csv('http://opendata.cern.ch/record/545/files/Dimuon_DoubleMu.csv')

We can check the content that we saved to the variable _dataset_ by printing the first 5 rows. This can be done with the function variablename.head():

In [None]:
# Print first 5 rows


#### Can you identify the columns of the table? What do they correspond to?

### Calculating invariant mass

The invariant mass of the dimuon pair is computed using formula (2).

In the calculation below we will use the _numpy_ module, which was imported as _np_ in the first code cell. With _numpy_ it is possible to use mathematical functions like _sqrt_ and _cosh_ by writing first the name of the module (_np_) and then the function, separated by a dot. So for example the square root can be called by writing _np.sqrt( )_.

The names _pt1_, _pt2_, _eta1_, _eta2_, _phi1_ and _phi2_ refer to the columns of the data. In the code you have to specify where the values are taken from. For example, to get the column _pt1_, write _dataset.pt1_.
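To illustrate how column access and numpy functions work together, here is a small sketch on a toy table. The values are made up, and the quantity computed (the geometric mean of the two transverse momenta) is only an example of the syntax, not part of the analysis.

```python
import numpy as np
import pandas as pd

# Toy table with made-up values; only the column names mirror the text.
toy = pd.DataFrame({"pt1": [4.0, 9.0, 16.0], "pt2": [1.0, 4.0, 9.0]})

# Attribute access and bracket access return the same column.
same_column = toy.pt1.equals(toy["pt1"])

# numpy functions act element by element: one result per row (event).
geometric_mean_pt = np.sqrt(toy.pt1 * toy.pt2)

print(same_column)
print(geometric_mean_pt.tolist())
```

No explicit loop over the events is needed, because the arithmetic is applied to the whole column at once.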

Now we are ready to calculate the values of the invariant masses for the different events!

In [None]:
# compute invariant mass 

invariant_mass = 


After the calculation we can check which values were saved in the variable _invariant\_mass_ by printing the content of the variable:

In [None]:
print(invariant_mass)

### Plotting histogram

Histograms can be created in Python with the matplotlib.pyplot module, which was imported earlier and named plt. The function plt.hist() creates a histogram from the parameters given inside the brackets.
Here only the first three parameters are needed: the variable whose values fill the histogram (x), the number of bins (bins=) and the lower and upper limits of the bins (range=()).
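To see what the bins and range parameters do, the sketch below counts a handful of made-up mass values with np.histogram, which takes the same bins and range arguments as plt.hist() but returns the counts as numbers instead of drawing them. The values and the chosen binning are assumptions for illustration only.

```python
import numpy as np

# Made-up mass values in GeV, used only to show how bins and range work.
masses = np.array([3.1, 9.5, 88.5, 90.5, 91.0, 91.4, 92.3, 95.0, 120.0])

# Five equal-width bins between 80 and 100 GeV.
# Values outside the range are not counted.
counts, edges = np.histogram(masses, bins=5, range=(80, 100))

print(edges)   # bin boundaries
print(counts)  # number of values in each bin
```

With more bins the structure of a peak becomes finer, but each bin contains fewer events, so the choice is a compromise.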

In [None]:
# create histogram
plt.hist()

# We can name the axes and the title and show the histogram.
plt.xlabel('Invariant mass [GeV]')
plt.ylabel('Number of events')
plt.show()

#### Describe the histogram. What information can you get from it? How many peaks do you see?


### The effect of pseudorapidity on the resolution of the measurement

In this section we briefly study how the pseudorapidities of the muons detected in the CMS detector affect the mass distribution.
As explained before, pseudorapidity $\eta$ describes the angle between the detected particle and the particle beam (z-axis).

To do that, two different histograms will be made: one with only muon pairs with small pseudorapidities and one with large pseudorapidities.
The histograms will be made with the familiar method from the earlier part of this exercise.

_Note: to select the rows of a DataFrame called example in which the columns a and b both exceed a value c, write example[(example.a > c) & (example.b > c)]._
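The selection syntax from the note can be sketched on a toy table as follows. The pseudorapidity values and the threshold of 1.5 are hypothetical and chosen only for illustration. The absolute value is taken because pseudorapidity can be negative as well as positive.

```python
import pandas as pd

# Toy events with made-up pseudorapidities.
toy = pd.DataFrame({
    "eta1": [0.1, -2.0, 1.9, -0.3],
    "eta2": [-0.2, 2.1, 0.2, 0.4],
})

threshold = 1.5  # hypothetical cut value

# Each comparison gives a column of True/False values; "&" combines them row by row.
# The parentheses are required because "&" binds more tightly than ">" and "<".
both_forward = toy[(toy.eta1.abs() > threshold) & (toy.eta2.abs() > threshold)]
both_central = toy[(toy.eta1.abs() < threshold) & (toy.eta2.abs() < threshold)]

print(len(both_forward), len(both_central))
```

Events in which one muon is above the threshold and the other below it end up in neither selection.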

In [None]:
# Import modules needed for performing this study

# Open dataset


# Set the threshold values for high and small etas.
selection_high = 
selection_small = 

# Create two DataFrames. Select to "high_etas" the events where the pseudorapidities
# of both muons are larger than "selection_high". Select to "small_etas" the events where
# the pseudorapidities of both muons are smaller than "selection_small".
# Note: remember what the range of pseudorapidity is.

high_etas = 
small_etas = 

print('The number of all events = %d' % len(dataset))
print('The number of events where the pseudorapidity of both muons has been large = %d' % len(high_etas))
print('The number of events where the pseudorapidity of both muons has been small = %d' % len(small_etas))

### Plot histograms


#### Histogram for high eta


In [None]:
# Save the invariant masses to variable "inv_mass1".
inv_mass1 = high_etas['M']

# Create the histogram from data in variable "inv_mass1". Set bins and range.


# Set y-axis range from 0 to 60.
axes = plt.gca()
axes.set_ylim([0,60])

# Name the axes, give a title and show the histogram.



#### Histogram for low eta


In [None]:
# Plot the histogram for the small eta region


#### Compare the created histograms. Is the mass distribution affected by the pseudorapidity of the muons?
#### If yes, how? What could possibly explain your observation?

### The transverse momentum distribution of muons



In the next part of the exercise, plot the distribution of the muon transverse momentum.


In [None]:
# Write the code for plotting muon transverse momentum
# Follow the steps from the previous part

#### What can you see from the plots?
#### Study the invariant mass distribution for different muon transverse momentum ranges.
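One possible way to split events into transverse momentum ranges is sketched below with pd.cut on a toy table. The values, the range boundaries and the column name pt_range are assumptions for illustration. Plain boolean selections for each range would work equally well.

```python
import numpy as np
import pandas as pd

# Toy events with made-up transverse momenta and invariant masses (GeV).
toy = pd.DataFrame({
    "pt1": [5.0, 25.0, 30.0, 50.0, 60.0],
    "M": [3.1, 90.0, 92.0, 91.0, 89.0],
})

# Hypothetical range boundaries: 0-20, 20-40 and above 40 GeV.
pt_edges = [0, 20, 40, np.inf]
toy["pt_range"] = pd.cut(toy.pt1, bins=pt_edges)

# Number of events and mean invariant mass in each range.
summary = toy.groupby("pt_range", observed=True)["M"].agg(["count", "mean"])
print(summary)
```

For the actual study, a histogram of the invariant mass can be drawn separately for the events of each range.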