<br>
 <img src="https://github.com/cms-opendata-education/cms-jupyter-materials-finnish/blob/master/Kuvat/CMSlogo_color_label_1024_May2014.png?raw=true"  align="right" width="100px" title="Logo of the CMS project"> 
 <br>

# Open data workshop for CERN summer students

A regular outreach notebook would start with an introduction to CERN, the CMS detector, and basic particle physics concepts such as the Standard Model. To save time, we skip that part, since many of you are already familiar with these subjects.

If you would like to see the introduction, take a look at [this example exercise](https://mybinder.org/v2/gh/cms-opendata-education/zboson-exercise/master).

<img src="https://highenergy.physics.uiowa.edu/application/files/9815/4732/1791/CMS_Detector.png"  align="center" width="900px" title="CMS detector at CERN"> 

## Task

A summer student at CERN has been given the task of studying data from proton-proton collisions recorded by the CMS detector in 2011. The data contains events in which two muons were observed. However, the summer student has been really careless and has somehow managed to split the data into six weirdly named data files. Each data file contains data from the decay of one particular particle. The desperate summer student begs for your help in determining which file corresponds to which particle. Are you able to help him?

## How can I do that?

It is known from previous research that several particles can decay into two muons, and that the parent particle can be identified by calculating the invariant mass of the two muons. With the CMS detector we can measure the energy and momentum of each muon. If the energies $E_1$, $E_2$ and momenta $\textbf{p}_1$, $\textbf{p}_2$ of two muons are known, their invariant mass (in units where $c = 1$) is

$M = \sqrt{(E_1 + E_2)^2 - \|\textbf{p}_1 + \textbf{p}_2 \| ^2}$,

where $\|\textbf{p}_1 + \textbf{p}_2 \|^2$ is the squared norm of the summed momentum vector. It can be calculated component by component as

$\|\textbf{p}_1 + \textbf{p}_2 \|^2=(p_{x1}+p_{x2})^2+(p_{y1}+p_{y2})^2+(p_{z1}+p_{z2})^2$.
 
If both muons came from the decay of the same particle, their invariant mass corresponds to the mass of that parent particle. If they did not come from the same particle, the value has no such meaning.
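
As a quick sanity check of the formula, here is a minimal sketch with two made-up muons. The numbers are hypothetical and chosen so that the arithmetic is easy to follow (energies and momenta in GeV, with $c = 1$), and the helper name `invariant_mass` is used only for illustration.

```python
import math

def invariant_mass(E1, p1, E2, p2):
    """Invariant mass of two particles from energies and momentum 3-vectors (GeV, c = 1)."""
    E_sum = E1 + E2
    # Squared norm of the summed momentum vector, component by component
    p_sum_sq = sum((a + b) ** 2 for a, b in zip(p1, p2))
    return math.sqrt(E_sum ** 2 - p_sum_sq)

# Two hypothetical muons, treated as massless so that E = |p|
m = invariant_mass(5.0, (3.0, 4.0, 0.0), 5.0, (-3.0, 4.0, 0.0))
print(m)  # 6.0
```

Here the total energy is 10 GeV and the summed momentum is (0, 8, 0) GeV, so $M = \sqrt{100 - 64} = 6$ GeV.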

Now we just need to calculate the invariant mass for each pair of muons and plot a histogram to see if there is something interesting going on!

## Analysing the data using Python

The steps needed to plot the histogram are:
1. Import the needed Python modules. You can use pandas to read the CSV file, numpy to do the calculations, and matplotlib.pyplot to make the plots.
1. Read the datafile.
1. Calculate the invariant masses for each event.
1. Plot the histogram.
1. Compare the histogram to the complete muon spectrum at the bottom of this page and determine which particle decayed into two muons.

If you're new to programming or Python, you can follow the guide below to make the histogram. If you are more experienced, you can also try it on your own!

### Step 1: Import the modules

In programming, **modules** are packages of functions that can be used to perform certain tasks. To read a CSV data file you need a module called _pandas_. To perform numerical calculations (such as taking a square root) you can use a module called _numpy_. Finally, for plotting the histogram you need a module called _matplotlib.pyplot_. In Python you can import a module by writing

> $\color{green}{\text{import}}\text{ package_name }\color{green}{\text{as}}\text{ abbreviation}$

After that you can use the abbreviation to access the module. For example, let's calculate the square root of 4 using the _numpy_ module:

In [None]:
# This is a code-cell. You can run this cell by clicking it active and pressing CTRL+ENTER.

import numpy as np
np.sqrt(4)

Now we have imported the _numpy_ module. Next, import the _pandas_ and _matplotlib.pyplot_ modules. The standard abbreviations for them are pd and plt, respectively.
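
For reference, one way to write these two imports is sketched below. It assumes the plotting interface is imported as `matplotlib.pyplot`, for which `plt` is the usual abbreviation.

```python
import pandas as pd               # reading CSV files into tables
import matplotlib.pyplot as plt   # plotting histograms
```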

In [None]:
# Import modules here

### Step 2: Read the datafile

Next, we need to read the data file. The data files are located in a GitHub repository, and to access them we need their URLs:

peakdata1: https://raw.githubusercontent.com/cms-opendata-education/cms-jupyter-materials-english/master/Data/peakdata1.csv
peakdata2: https://raw.githubusercontent.com/cms-opendata-education/cms-jupyter-materials-english/master/Data/peakdata2.csv
peakdata3: https://raw.githubusercontent.com/cms-opendata-education/cms-jupyter-materials-english/master/Data/peakdata3.csv
peakdata4: https://raw.githubusercontent.com/cms-opendata-education/cms-jupyter-materials-english/master/Data/peakdata4.csv
peakdata5: https://raw.githubusercontent.com/cms-opendata-education/cms-jupyter-materials-english/master/Data/peakdata5.csv
peakdata6: https://raw.githubusercontent.com/cms-opendata-education/cms-jupyter-materials-english/master/Data/peakdata6.csv

You can use the *read_csv()* function in the _pandas_ module to read your file. Choose the file corresponding to your group number, read it, and save its contents to a variable. The function is called as

> $\text{variable = pd.}\color{blue}{\text{read_csv}}\text{(url)}$

To see what kind of data the file contains, you can display its first few rows by writing

> $\text{variable.head()}$

**Optional**: If you're curious how large the data file is, you can print its number of rows with

> $\color{green}{\text{print}}\text{(}\color{green}{\text{len}}\text{(variable))}$
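
The three calls above can be tried out as in the sketch below. To keep it runnable without a network connection, it reads a tiny in-memory table instead of a URL. The column names (`E1`, `px1`, ...) and the numbers are made up for illustration and need not match the real files, so check the output of `head()` on your own file.

```python
import io
import pandas as pd

# Hypothetical stand-in for a data file: one row per event, two muons per row
sample_csv = io.StringIO(
    "E1,px1,py1,pz1,E2,px2,py2,pz2\n"
    "5.0,3.0,4.0,0.0,5.0,-3.0,4.0,0.0\n"
    "13.0,5.0,12.0,0.0,13.0,5.0,-12.0,0.0\n"
    "1.5,0.0,0.0,1.5,1.5,0.0,0.0,-1.5\n"
)

# With a real file you would pass its URL as a string instead, for example:
# url = "https://raw.githubusercontent.com/cms-opendata-education/cms-jupyter-materials-english/master/Data/peakdata1.csv"
# data = pd.read_csv(url)
data = pd.read_csv(sample_csv)

print(data.head())   # first rows of the table (at most five)
print(len(data))     # number of rows, here 3
```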

In [None]:
# Read datafile here

### Step 3: Calculate the invariant masses

Now that we know what kind of data we have, we can calculate the invariant mass of the two muons. First, extract the needed columns from the data. You can do this by writing

> $\text{column }\color{purple}{\text{=}}\text{ variable.}\color{blue}{\text{column_title}}$

Now you can calculate the invariant mass by using the equation above. You can calculate the square of a value by writing

> $\text{value}\color{purple}{\text{**}}\color{green}{\text{2}}$

Remember to save the invariant mass values to a variable. Note that if you do the calculation with entire columns, the invariant mass of every event is calculated at once.
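
A minimal sketch of the whole-column calculation is shown below. The small table and its column names (`E1`, `px1`, ..., `pz2`) are assumptions for illustration; use the column titles you saw with `head()` in your own file.

```python
import numpy as np
import pandas as pd

# Hypothetical events: energies and momentum components of both muons (GeV)
data = pd.DataFrame({
    "E1":  [5.0, 13.0, 1.5], "px1": [3.0, 5.0, 0.0],
    "py1": [4.0, 12.0, 0.0], "pz1": [0.0, 0.0, 1.5],
    "E2":  [5.0, 13.0, 1.5], "px2": [-3.0, 5.0, 0.0],
    "py2": [4.0, -12.0, 0.0], "pz2": [0.0, 0.0, -1.5],
})

# Each operation acts on whole columns, so every event is handled at once
E_sum_sq = (data.E1 + data.E2) ** 2
p_sum_sq = ((data.px1 + data.px2) ** 2
            + (data.py1 + data.py2) ** 2
            + (data.pz1 + data.pz2) ** 2)
inv_mass = np.sqrt(E_sum_sq - p_sum_sq)

print(inv_mass.tolist())  # [6.0, 24.0, 3.0]
```

The result `inv_mass` is a column with one invariant mass per event, ready to be passed to the plotting step.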

In [None]:
# Calculate invariant mass here

### Step 4: Plot the histogram

To plot a histogram, you need to know a few functions from the _matplotlib.pyplot_ module. The basic functions you need are

> $\text{plt.}\color{blue}{\text{hist}}\text{(variable_to_plot, bins=number_of_bins)}$

> $\text{plt.}\color{blue}{\text{title}}\text{('main_title')}$

> $\text{plt.}\color{blue}{\text{xlabel}}\text{('x_axis_title')}$

> $\text{plt.}\color{blue}{\text{ylabel}}\text{('y_axis_title')}$

> $\text{plt.}\color{blue}{\text{show}}\text{()}$

Now you should be able to plot the histogram.
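
Put together, the calls might look like the sketch below. The masses here are randomly generated stand-ins (a bump around an arbitrary value of 5 GeV), not CMS data, and the bin count and labels are placeholder choices. It uses `matplotlib.pyplot` under the `plt` abbreviation.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the invariant masses calculated in step 3
rng = np.random.default_rng(seed=0)
inv_mass = rng.normal(loc=5.0, scale=0.2, size=1000)

counts, edges, _ = plt.hist(inv_mass, bins=50)
plt.title("Invariant mass of two muons")
plt.xlabel("Invariant mass [GeV]")
plt.ylabel("Number of events")
plt.show()
```

`plt.hist` also returns the bin counts and bin edges, which is handy if you want to locate the peak numerically.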

**Optional**: You can zoom into a specific range by passing the _range_ argument to the _hist_ function:

> $\text{plt.}\color{blue}{\text{hist}}\text{(variable_to_plot, bins}\color{purple}{\text{=}}\color{green}{\text{number_of_bins}}\text{, }\color{green}{\text{range}}\color{purple}{\text{=}}\color{green}{\text{(min,max))}}$
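
For instance, the sketch below zooms into the window from 2.9 to 3.3 on a handful of made-up mass values. Entries outside the range are simply left out of the histogram.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up invariant masses in GeV, only three of which fall inside the window
inv_mass = np.array([0.55, 3.04, 3.09, 3.12, 9.46, 91.2])

counts, edges, _ = plt.hist(inv_mass, bins=4, range=(2.9, 3.3))
plt.xlabel("Invariant mass [GeV]")
plt.ylabel("Number of events")
plt.show()

print(int(counts.sum()))  # 3
```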

In [None]:
# Plot the histogram here

### Step 5: Compare the histogram

Compare your histogram to the figure below and determine which particle is present in your slice of the data. The desperate summer student will be grateful!

<img src="https://github.com/cms-opendata-education/cms-jupyter-materials-finnish/blob/master/Kuvat/inv_massa.PNG?raw=true"  align="right" width="600px" title="Invariant mass spectrum for two muon events observed by the CMS-detector.">

| Particle | Mass [GeV] |
|----------|:----------:|
| η (eta) | 0.548 |
| ρ (rho) | 0.775 |
| ω (omega) | 0.782 |
| φ (phi) | 1.019 |
| J/ψ (J/psi) | 3.097 |
| ψ' (psi prime) | 3.686 |
| Υ (upsilon) | 9.460 |
| Z boson | 91.188 |
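
If you have read off the position of the peak from your histogram, the comparison with the table can also be done in code. The sketch below uses the masses from the table; the helper name `closest_particle` and the example peak positions are hypothetical. Since ρ and ω lie very close together, treat a match in that region as a hint rather than an identification.

```python
# Masses in GeV, copied from the table above
PARTICLE_MASSES = {
    "eta": 0.548, "rho": 0.775, "omega": 0.782, "phi": 1.019,
    "J/psi": 3.097, "psi'": 3.686, "Upsilon": 9.460, "Z boson": 91.188,
}

def closest_particle(peak_position):
    """Return the name of the particle whose mass is nearest to the peak."""
    return min(PARTICLE_MASSES,
               key=lambda name: abs(PARTICLE_MASSES[name] - peak_position))

print(closest_particle(3.1))   # J/psi
print(closest_particle(90.8))  # Z boson
```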

**Note**: It is also possible to do this analysis directly from ROOT data, but for educational use CSV files are much more convenient.