# Add a column to a DataFrame

This example demonstrates how to import a CSV file, perform calculations with its values, and add the calculated values to the original table. In this example we calculate the invariant masses for different events.

### Getting the data

We need to import the packages *pandas* and *numpy* in order to read files and make calculations.

In [1]:
import pandas as pd
import numpy as np

Read the data from the file *Zmumu_Run2011A.csv*, which is located in the folder *Data*. Save the information into the variable *dataset*.

In [2]:
dataset = pd.read_csv('../Data/Zmumu_Run2011A.csv')

We can check the content that we saved to the variable *dataset* by printing the first five rows with the following code:

In [3]:
dataset.head()

Unnamed: 0,Run,Event,pt1,eta1,phi1,Q1,dxy1,iso1,pt2,eta2,phi2,Q2,dxy2,iso2
0,165617,74969122,54.7055,-0.432396,2.57421,1,-0.074544,0.499921,34.2464,-0.98848,-0.498704,-1,0.071222,3.42214
1,165617,75138253,24.5872,-2.0522,2.86657,-1,-0.055437,0.0,28.5389,0.385163,-1.99117,1,0.051477,0.0
2,165617,75887636,31.7386,-2.25945,-1.33229,-1,0.087917,0.0,30.2344,-0.468419,1.88331,1,-0.087639,0.0
3,165617,75779415,39.7394,-0.712338,-0.312266,1,0.058481,0.0,48.279,-0.195625,2.97032,-1,-0.049201,0.0
4,165617,75098104,41.2998,-0.157055,-3.04077,1,-0.030463,1.22804,43.4508,0.590958,-0.042756,-1,0.044175,0.0
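
The *Unnamed: 0* column in the output above appears because the first column of the file has no header. A minimal sketch, assuming that this column simply holds a saved row index, of how the `index_col=0` argument of `read_csv` uses it as the index instead. The short in-memory CSV is an illustrative stand-in for the real file, built from the first two rows shown above.

```python
import io
import pandas as pd

# Illustrative stand-in for the real file: the header starts with an empty
# field, as happens when a table is saved together with its row index.
csv_text = """,Run,Event,pt1
0,165617,74969122,54.7055
1,165617,75138253,24.5872
"""

# Default behaviour: the unnamed first column becomes a regular column.
default_read = pd.read_csv(io.StringIO(csv_text))
print(list(default_read.columns))

# With index_col=0 the first column is used as the row index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_read.columns))
```

The rest of this example keeps the default behaviour, so the *Unnamed: 0* column stays in the table.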


We can also check the type of the variable. We need to know the type since we want to combine our calculations with this variable.

In [4]:
type(dataset)

pandas.core.frame.DataFrame

### Performing the calculation

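Let's use the following expression for the invariant mass $M$ in the calculation

$$M = \sqrt{2p_{T1}p_{T2}(\cosh(\eta_1-\eta_2)-\cos(\phi_1-\phi_2))},$$
where $p_T$ is the transverse momentum, $\eta$ the pseudorapidity and $\phi$ the azimuthal angle of the two particles in each event (the columns *pt*, *eta* and *phi* in the data). We use *numpy* (imported as *np*) to perform the calculation.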

In [5]:
invariant_mass = np.sqrt(2*dataset.pt1*dataset.pt2*(np.cosh(dataset.eta1-dataset.eta2) - np.cos(dataset.phi1-dataset.phi2)))

After the calculation we can check which values were saved in the variable *invariant_mass* by printing the content of the variable:

In [6]:
print(invariant_mass)

0         89.885744
1         88.810987
2         88.472502
3         90.332620
4         90.514507
5         78.860094
6         92.362439
7         63.757254
8         93.118647
9         92.941701
10        88.896587
11        85.363437
12        87.934228
13        90.414225
14        65.034823
15        89.348601
16        91.297563
17        92.258489
18        92.493002
19        68.358712
20        92.112023
21        91.171556
22        85.909734
23        91.645777
24        88.392430
25        91.760237
26        91.179805
27        91.785881
28        91.349970
29        99.112667
            ...    
10553     91.027596
10554     93.261764
10555     76.810535
10556     80.206206
10557     89.126026
10558     96.116627
10559     89.928690
10560     87.901985
10561     90.919888
10562     90.767179
10563     63.837491
10564     76.605770
10565     67.312657
10566     89.984530
10567     94.494826
10568     89.234208
10569     91.644511
10570     67.081134
10571     91.920312
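
As a sanity check on the formula, the first event can be recomputed on its own. This is a small sketch that copies the values of row 0 from the table printed by `dataset.head()` into a one-row DataFrame (the names `event`, `delta_eta`, `delta_phi` and `mass` are introduced here for illustration only) and applies the same expression in named steps.

```python
import numpy as np
import pandas as pd

# First event, copied from the table printed by dataset.head().
event = pd.DataFrame({
    'pt1': [54.7055], 'eta1': [-0.432396], 'phi1': [2.57421],
    'pt2': [34.2464], 'eta2': [-0.98848], 'phi2': [-0.498704],
})

# Same expression as in the calculation cell, split into named steps.
delta_eta = event.eta1 - event.eta2
delta_phi = event.phi1 - event.phi2
mass = np.sqrt(2 * event.pt1 * event.pt2 * (np.cosh(delta_eta) - np.cos(delta_phi)))

# Should be close to 89.885744, the first value in the printed Series.
print(mass.iloc[0])
```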


Let's add the column of invariant masses to the original *dataset*, which is of type DataFrame. First we need to know the type of *invariant_mass*:

In [7]:
type(invariant_mass)

pandas.core.series.Series

Since *invariant_mass* is a Series, we convert it into a DataFrame so that it can be merged with *dataset*. Let's name the result *inv_masses* and give the column the heading *M*.

In [8]:
inv_masses = invariant_mass.to_frame('M')
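
To see what the conversion does, here is a small sketch with a hand-made Series holding the first three invariant masses printed earlier (the names `masses` and `frame` are illustrative). `to_frame` keeps the index and turns the values into a single column with the given name.

```python
import pandas as pd

# First three invariant masses from the printed output.
masses = pd.Series([89.885744, 88.810987, 88.472502])

# Convert the Series into a one-column DataFrame with heading 'M'.
frame = masses.to_frame('M')

print(type(frame).__name__)              # DataFrame
print(list(frame.columns))               # ['M']
print(frame.index.equals(masses.index))  # True
```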

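Now we can combine *dataset* with *inv_masses* using the method *merge*. Setting *left_index=True* and *right_index=True* matches the rows of the two tables by their index. Let's save the result into a variable *all_data*.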

In [9]:
all_data = dataset.merge(inv_masses, left_index=True, right_index=True)
all_data.head()

Unnamed: 0,Run,Event,pt1,eta1,phi1,Q1,dxy1,iso1,pt2,eta2,phi2,Q2,dxy2,iso2,M
0,165617,74969122,54.7055,-0.432396,2.57421,1,-0.074544,0.499921,34.2464,-0.98848,-0.498704,-1,0.071222,3.42214,89.885744
1,165617,75138253,24.5872,-2.0522,2.86657,-1,-0.055437,0.0,28.5389,0.385163,-1.99117,1,0.051477,0.0,88.810987
2,165617,75887636,31.7386,-2.25945,-1.33229,-1,0.087917,0.0,30.2344,-0.468419,1.88331,1,-0.087639,0.0,88.472502
3,165617,75779415,39.7394,-0.712338,-0.312266,1,0.058481,0.0,48.279,-0.195625,2.97032,-1,-0.049201,0.0,90.33262
4,165617,75098104,41.2998,-0.157055,-3.04077,1,-0.030463,1.22804,43.4508,0.590958,-0.042756,-1,0.044175,0.0,90.514507


As you can see, the calculated invariant masses are now in the last column with heading *M*.
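
For comparison, pandas also allows a Series to be added as a new column by direct assignment, without the intermediate DataFrame and *merge*. The sketch below is an alternative to the steps above, not the method used in this example. It uses a small hand-made table (the first two events from the output above, with only the columns the formula needs; the names `small`, `mass`, `merged` and `assigned` are illustrative) and checks that both routes give the same column.

```python
import numpy as np
import pandas as pd

# First two events from the table above, restricted to the needed columns.
small = pd.DataFrame({
    'pt1': [54.7055, 24.5872], 'eta1': [-0.432396, -2.0522], 'phi1': [2.57421, 2.86657],
    'pt2': [34.2464, 28.5389], 'eta2': [-0.98848, 0.385163], 'phi2': [-0.498704, -1.99117],
})

mass = np.sqrt(2 * small.pt1 * small.pt2
               * (np.cosh(small.eta1 - small.eta2) - np.cos(small.phi1 - small.phi2)))

# Route used in this example: Series -> DataFrame -> merge on the index.
merged = small.merge(mass.to_frame('M'), left_index=True, right_index=True)

# Alternative: assign the Series as a new column (on a copy, so `small` is unchanged).
assigned = small.copy()
assigned['M'] = mass

print(list(merged.columns) == list(assigned.columns))
print(np.allclose(merged['M'], assigned['M']))
```

Direct assignment aligns the Series with the table by index, just as the index-based *merge* does, so for a Series computed from the same table the two approaches agree.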