# Test of the CSV file produced by the ROOT converter

In this notebook I test the Python macro that converts a ROOT tree (`TTree`) into a CSV file for ML analysis.

The CSV files are written into the `output` directory by the Python macro **examplemacro.py**:

In [1]:
!ls ../output/


In [2]:
!head -5 ../output/output_bxcut2.csv


To analyze it we only need pandas (and NumPy); the ROOT Python module is not required to read the CSV:

In [3]:
import pandas as pd
import numpy as np

Now we need to read the CSV:

In [4]:
df2 = pd.read_csv('../output/output_bxcut2.csv')
df2

| | Event | dtPrimitive.id_r | dtPrimitive.id_eta | dtPrimitive.id_phi | dtPrimitive.phiGlb() | dtPrimitive.phiB | genParticle.pt |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 2 | 0.547525 | -91.0 | 9.452828 |
| 1 | 1 | 2 | 1 | 2 | 0.519693 | -71.0 | 9.452828 |
| 2 | 1 | 3 | 2 | 2 | 0.499429 | -25.0 | 9.452828 |
| 3 | 1 | 4 | 2 | 2 | 0.489663 | -22.0 | 9.452828 |
| 4 | 2 | 4 | 1 | 1 | -0.322266 | -3.0 | 164.478928 |
| 5 | 2 | 3 | 1 | 12 | 5.961246 | -2.0 | 164.478928 |
| 6 | 3 | 1 | 0 | 4 | 1.816158 | -10.0 | 91.713860 |
| 7 | 3 | 3 | 0 | 4 | 1.812007 | -6.0 | 91.713860 |
| 8 | 3 | 4 | 0 | 4 | 1.811519 | 13.0 | 91.713860 |
| 9 | 4 | 2 | -2 | 10 | 4.605700 | -2.0 | 139.442352 |


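We sort the primitives by event and, within each event, by ascending radial index (`dtPrimitive.id_r`). We then overwrite `dtPrimitive.phiB` with `phiGlb + phiB/512`, i.e. the bending angle rescaled by 1/512 and added to the global phi of the primitive.

In [5]:
df2 = df2.sort_values(["Event", "dtPrimitive.id_r"])  # by event, then by ascending radial index
df2 = df2.reset_index(drop=True)
df2["dtPrimitive.phiB"] = df2["dtPrimitive.phiGlb()"] + df2["dtPrimitive.phiB"]/512.  # phiGlb + phiB/512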
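A minimal sketch to check the `phiB` transformation: it applies the same sort and the same `phiGlb + phiB/512` step to the six primitives of events 1 and 2, copied from the table above (so the inputs are rounded to six decimals). The frame name `toy` is only for illustration.

```python
import pandas as pd

# Events 1 and 2 of output_bxcut2.csv, copied from the displayed table
toy = pd.DataFrame({
    "Event": [1, 1, 1, 1, 2, 2],
    "dtPrimitive.id_r": [1, 2, 3, 4, 4, 3],
    "dtPrimitive.phiGlb()": [0.547525, 0.519693, 0.499429, 0.489663, -0.322266, 5.961246],
    "dtPrimitive.phiB": [-91.0, -71.0, -25.0, -22.0, -3.0, -2.0],
})

# Same two steps as the cell above: sort, then phiB -> phiGlb + phiB/512
toy = toy.sort_values(["Event", "dtPrimitive.id_r"]).reset_index(drop=True)
toy["dtPrimitive.phiB"] = toy["dtPrimitive.phiGlb()"] + toy["dtPrimitive.phiB"] / 512.

print(toy[["Event", "dtPrimitive.id_r", "dtPrimitive.phiB"]])
```

The transformed values agree to within 1e-6 with the `phiB` entries of events 1 and 2 in the reshaped table further down, and the two primitives of event 2 come out ordered by `id_r` (3 before 4).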

### Change CSV structure for a suitable ML format

The following code reshapes the input table: instead of one row per primitive, each row now represents a single event, with its primitives laid out across columns.

The main difference is therefore the larger number of columns: every per-primitive quantity gets one column per primitive slot, up to the maximum number of primitives found in a single event, and events with fewer primitives are padded with zeros.

In [6]:
n_max = df2["Event"].value_counts().max()  # largest number of primitives in a single event
columns = df2.columns.tolist()

new_column = [columns[0], "n_Primitive"]  # "Event", then the number of primitives of the event
for column in columns[1:]:
    name = column.replace("()", "")  # e.g. "dtPrimitive.phiGlb()" -> "dtPrimitive.phiGlb"
    new_column += [str(count) + name for count in range(1, n_max + 1)]  # "1<name>" ... "<n_max><name>"

def flatten_event(row_list):
    """Merge the primitive rows of one event into a single zero-padded row."""
    final_row = [row_list[0][0], len(row_list)]  # event number, number of primitives
    for i in range(1, len(columns)):
        values = [r[i] for r in row_list]
        final_row += values + [0] * (n_max - len(values))  # pad the empty slots with 0
    return final_row

df = pd.DataFrame(columns=new_column)
a = df2["Event"].iloc[0]  # event currently being collected (no longer assumed to be 1)
row_list = []
for index, row in df2.iterrows():
    if row["Event"] != a:  # a new event starts: store the completed one
        df.loc[a] = flatten_event(row_list)
        a = row["Event"]
        row_list = []
    row_list.append(row.tolist())
df.loc[a] = flatten_event(row_list)  # store the last event
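The loop above walks the rows one by one. As an alternative, here is a minimal vectorised sketch of the same reshaping built on pandas `groupby().cumcount()` and `DataFrame.pivot`; the function name `events_to_rows` and the toy input are hypothetical. Like the loop, it assumes the primitives are already sorted within each event, so that slot k receives the k-th primitive.

```python
import pandas as pd

def events_to_rows(prims: pd.DataFrame, event_col: str = "Event") -> pd.DataFrame:
    """Reshape one-row-per-primitive into one-row-per-event, zero-padding empty slots."""
    prims = prims.copy()
    # 1-based position of each primitive inside its event (keeps the current row order)
    prims["slot"] = prims.groupby(event_col).cumcount() + 1
    # Columns become a (quantity, slot) MultiIndex; missing slots are NaN, then 0
    wide = prims.pivot(index=event_col, columns="slot").fillna(0)
    # Flatten to names such as "1dtPrimitive.phiB", dropping "()" as the loop does
    wide.columns = [f"{slot}{name.replace('()', '')}" for name, slot in wide.columns]
    # Number of primitives per event, placed right after the event number
    wide.insert(0, "n_Primitive", prims.groupby(event_col).size())
    return wide.reset_index()


toy = pd.DataFrame({
    "Event": [1, 1, 2],
    "dtPrimitive.id_r": [1, 2, 3],
    "dtPrimitive.phiGlb()": [0.5, 0.4, 1.0],
    "genParticle.pt": [9.0, 9.0, 50.0],
})
print(events_to_rows(toy))
```

Applied to the sorted `df2`, it should contain the same values as the loop output; only the row index and the numeric dtypes may differ.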

In [7]:
df = df.rename(columns={"1genParticle.pt": "genParticle.pt"})  # "()" was already stripped, so "1dtPrimitive.phiGlb" needs no renaming

This is the reshaped table in the new format:

In [8]:
df

| | Event | n_Primitive | 1dtPrimitive.id_r | 2dtPrimitive.id_r | 3dtPrimitive.id_r | 4dtPrimitive.id_r | 5dtPrimitive.id_r | 1dtPrimitive.id_eta | 2dtPrimitive.id_eta | 3dtPrimitive.id_eta | ... | 1dtPrimitive.phiB | 2dtPrimitive.phiB | 3dtPrimitive.phiB | 4dtPrimitive.phiB | 5dtPrimitive.phiB | genParticle.pt | 2genParticle.pt | 3genParticle.pt | 4genParticle.pt | 5genParticle.pt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 1.0 | 4.0 | 1.0 | 2.0 | 3.0 | 4.0 | 0.0 | 1.0 | 1.0 | 2.0 | ... | 0.369790 | 0.381021 | 0.450601 | 0.446694 | 0.000000 | 9.452828 | 9.452828 | 9.452828 | 9.452828 | 0.000000 |
| 2.0 | 2.0 | 2.0 | 3.0 | 4.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | ... | 5.957340 | -0.328125 | 0.000000 | 0.000000 | 0.000000 | 164.478928 | 164.478928 | 0.000000 | 0.000000 | 0.000000 |
| 3.0 | 3.0 | 3.0 | 1.0 | 3.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.796626 | 1.800289 | 1.836910 | 0.000000 | 0.000000 | 91.713860 | 91.713860 | 91.713860 | 0.000000 | 0.000000 |
| 4.0 | 4.0 | 4.0 | 1.0 | 2.0 | 3.0 | 4.0 | 0.0 | -1.0 | -2.0 | -2.0 | ... | 4.611315 | 4.601793 | 4.606920 | 4.663317 | 0.000000 | 139.442352 | 139.442352 | 139.442352 | 139.442352 | 0.000000 |
| 5.0 | 5.0 | 3.0 | 1.0 | 3.0 | 4.0 | 0.0 | 0.0 | 1.0 | 2.0 | 2.0 | ... | 5.297755 | 5.040431 | 5.527980 | 0.000000 | 0.000000 | 198.223480 | 198.223480 | 198.223480 | 0.000000 | 0.000000 |
| 6.0 | 6.0 | 1.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.0 | 0.0 | 0.0 | ... | 4.424386 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 47.726688 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 7.0 | 7.0 | 3.0 | 2.0 | 3.0 | 4.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ... | 3.996652 | 3.997384 | 3.957428 | 0.000000 | 0.000000 | 146.894150 | 146.894150 | 146.894150 | 0.000000 | 0.000000 |
| 8.0 | 8.0 | 4.0 | 1.0 | 2.0 | 3.0 | 4.0 | 0.0 | 1.0 | 2.0 | 2.0 | ... | 4.819323 | 4.818102 | 4.822985 | 4.822741 | 0.000000 | 149.884445 | 149.884445 | 149.884445 | 149.884445 | 0.000000 |
| 9.0 | 9.0 | 2.0 | 2.0 | 3.0 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 | ... | 1.030352 | 1.035967 | 0.000000 | 0.000000 | 0.000000 | 74.841820 | 74.841820 | 0.000000 | 0.000000 | 0.000000 |
| 10.0 | 10.0 | 4.0 | 1.0 | 2.0 | 3.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.978673 | 1.993321 | 1.988682 | 1.988194 | 0.000000 | 159.808716 | 159.808716 | 159.808716 | 159.808716 | 0.000000 |


In [9]:
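for i in range(2, df2["Event"].value_counts().max()+1):  # gen pT is repeated in every slot: keep only the first copy
    title = str(i) + "genParticle.pt"
    df = df.drop(columns=title)  # the axis must be passed by keyword in recent pandas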

In [10]:
for column in df.columns.values.tolist():
    if column.startswith('5'):  # names are prefixed with the slot number: drop the fifth slot
        df = df.drop(columns=column)

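The next cell discards the events with five primitives and then replaces, in each of the four remaining slots, the radial index `dtPrimitive.id_r` with a binary value: 1 if the slot holds a primitive, 0 if it is an empty (zero-padded) slot. Because the primitives were sorted by `id_r`, slot k holds the k-th primitive of the event, not necessarily the one with `id_r` = k. This binary encoding should make the input easier to handle for the ML algorithm.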

In [11]:
df = df[df.n_Primitive != 5.0].copy()  # drop events with five primitives; copy avoids chained-assignment warnings
for k in range(1, 5):  # slots 1-4: non-zero radial index -> 1.0, empty slots stay 0
    col = str(k) + "dtPrimitive.id_r"
    df.loc[df[col] != 0, col] = 1.0
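The binarization of the `id_r` slots can also be written as a single vectorised step that works for any number of slots. A minimal sketch; the helper name `binarize_slots` is hypothetical, and it assumes the slot columns all end in `dtPrimitive.id_r`, as they do here. The toy rows mimic events 7 and 6 of the reshaped table: event 7 has `id_r` = 2 in slot 1, which still becomes 1.

```python
import pandas as pd

def binarize_slots(df: pd.DataFrame, suffix: str = "dtPrimitive.id_r") -> pd.DataFrame:
    """Return a copy where every '<k><suffix>' column is 1.0 if non-zero and 0.0 otherwise."""
    out = df.copy()
    cols = [c for c in out.columns if c.endswith(suffix)]
    # ne(0) gives a boolean mask per cell; astype(float) maps True/False to 1.0/0.0
    out[cols] = out[cols].ne(0).astype(float)
    return out


toy = pd.DataFrame({
    "n_Primitive": [3, 1],
    "1dtPrimitive.id_r": [2.0, 3.0],
    "2dtPrimitive.id_r": [3.0, 0.0],
    "3dtPrimitive.id_r": [4.0, 0.0],
})
print(binarize_slots(toy))
```

Returning a copy instead of modifying `df` in place keeps the original radial indices available if they are needed later.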

Finally, we write the reshaped table to a CSV file:

In [13]:
df.to_csv("../output/bxcut_org2.csv",na_rep=0,index=False)
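Before using the file for training, a cheap sanity check is that, in every row, the number of occupied slots equals `n_Primitive`. This only holds if a real primitive never has `id_r` = 0, which is the case in the data shown above (radial indices 1 to 4). A minimal sketch; `check_slot_flags` is a hypothetical helper, not part of the macro.

```python
import pandas as pd

def check_slot_flags(table: pd.DataFrame) -> bool:
    """True if, in every row, the number of non-zero id_r slots equals n_Primitive."""
    flag_cols = [c for c in table.columns if c.endswith("dtPrimitive.id_r")]
    occupied = table[flag_cols].ne(0).sum(axis=1)  # occupied slots per event
    return bool((occupied == table["n_Primitive"]).all())


toy = pd.DataFrame({
    "Event": [6.0, 7.0],
    "n_Primitive": [1.0, 3.0],
    "1dtPrimitive.id_r": [1.0, 1.0],
    "2dtPrimitive.id_r": [0.0, 1.0],
    "3dtPrimitive.id_r": [0.0, 1.0],
    "4dtPrimitive.id_r": [0.0, 0.0],
})
print(check_slot_flags(toy))
```

On the real output one would call `check_slot_flags(pd.read_csv("../output/bxcut_org2.csv"))`, which should return `True` if the reshaping and padding worked as intended.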