# Exploring ROOT Tuples
This notebook builds on the examples Dan gave in his notebook and uses them to play around with and look at some other variables. This could be considered 'having a play'. The first step is to make sure we are using the lb-dog kernel and then import all the necessary libraries.

In [22]:
import uproot as up
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
import os
import sys
import copy
import loadCutPlot as lcp

## Reading in the Tuple
Firstly we need to read in the tuple. We define the fName variable to hold the absolute path to this file. A hard-coded absolute path is not the most portable choice, but it sidesteps some complexities for now.

In [4]:
fName="/disk/moose/lhcb/djdt/Lb2L1520mueTuples/MC/2016MD/100FilesCheck/job185-CombDVntuple-15314000-MC2016MD_100F-pKmue-MC.root"
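Hard-coding the path works, but one way to make the notebook easier to run elsewhere is to let an environment variable override the directory while keeping the original path as the default. This is a minimal sketch: the variable name LB2L1520_TUPLE_DIR and the helper tuple_path are hypothetical and not part of Dan's setup.

```python
import os

# Hypothetical environment variable name (not defined by the original notebook)
DATA_ENV_VAR = "LB2L1520_TUPLE_DIR"
# Directory and file name split out of the absolute path used above
DEFAULT_DIR = "/disk/moose/lhcb/djdt/Lb2L1520mueTuples/MC/2016MD/100FilesCheck"
FILE_NAME = "job185-CombDVntuple-15314000-MC2016MD_100F-pKmue-MC.root"


def tuple_path(file_name=FILE_NAME, env_var=DATA_ENV_VAR, default_dir=DEFAULT_DIR):
    """Return the tuple path; the environment variable, if set, overrides the directory."""
    base_dir = os.environ.get(env_var, default_dir)
    return os.path.join(base_dir, file_name)


fName = tuple_path()
if not os.path.isfile(fName):
    # Warn early instead of failing later inside up.open
    print(f"Warning: {fName} not found")
```

With the variable unset, fName is exactly the path used in the cell above, so the rest of the notebook behaves the same.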

Now we can open the file in the standard way. Note that using the with keyword is recommended in the uproot documentation, since it closes the file when the block ends.

In [5]:
with up.open(fName) as f:
    print(f.keys())

['DTT1520me;1', 'DTT1520me/DecayTree;1']


Now we look at the branches within the DTT1520me/DecayTree tree, that is, all the keys associated with that tree.

In [7]:
with up.open(fName + ":DTT1520me/DecayTree") as f:
    # print(f.keys())
    f.show()
    # It is better to use f.show() as this provides some *much needed* formatting.
    # f.show() prints the table itself and returns None, so it is not wrapped in print().

name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
Lb_MINIP             | double                   | AsDtype('>f8')
Lb_MINIPCHI2         | double                   | AsDtype('>f8')
Lb_MINIPNEXTBEST     | double                   | AsDtype('>f8')
Lb_MINIPCHI2NEXTBEST | double                   | AsDtype('>f8')
Lb_ENDVERTEX_X       | double                   | AsDtype('>f8')
Lb_ENDVERTEX_Y       | double                   | AsDtype('>f8')
Lb_ENDVERTEX_Z       | double                   | AsDtype('>f8')
Lb_ENDVERTEX_XERR    | double                   | AsDtype('>f8')
Lb_ENDVERTEX_YERR    | double                   | AsDtype('>f8')
Lb_ENDVERTEX_ZERR    | double                   | AsDtype('>f8')
Lb_ENDVERTEX_CHI2    | double                   | AsDtype('>f8')
Lb_ENDVERTEX_NDOF    | int32_t                  | AsDtype('>i4')
Lb_ENDVERTEX_COV_    | float[3][3]              | AsDtype(

L0Data_Muon1_Pt      | int32_t                  | AsDtype('>i4')
L0Data_Muon1_Sgn     | int32_t                  | AsDtype('>i4')
L0Data_Muon2_Pt      | int32_t                  | AsDtype('>i4')
L0Data_Muon2_Sgn     | int32_t                  | AsDtype('>i4')
L0Data_Muon3_Pt      | int32_t                  | AsDtype('>i4')
L0Data_Muon3_Sgn     | int32_t                  | AsDtype('>i4')
L0Data_PUHits_Mult   | int32_t                  | AsDtype('>i4')
L0Data_PUPeak1_Cont  | int32_t                  | AsDtype('>i4')
L0Data_PUPeak1_Pos   | int32_t                  | AsDtype('>i4')
L0Data_PUPeak2_Cont  | int32_t                  | AsDtype('>i4')
L0Data_PUPeak2_Pos   | int32_t                  | AsDtype('>i4')
L0Data_Photon_Et     | int32_t                  | AsDtype('>i4')
L0Data_Spd_Mult      | int32_t                  | AsDtype('>i4')
L0Data_Sum_Et        | int32_t                  | AsDtype('>i4')
L0Data_Sum_Et,Next1  | int32_t                  | AsDtype('>i4')
L0Data_Sum_Et,Next2  | in

Now we can read a particular variable's data as either a NumPy array or a pandas object (a Series for a single branch, a DataFrame for several). We elect to use both: some of the functions Dan wrote are NumPy-only, but they can probably be adapted for use with pandas.
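Until the NumPy-only functions are adapted, one way to reuse them on pandas data is to pass them `Series.to_numpy()` and wrap the result back into a Series with the original index. In this sketch numpy_only_window is a hypothetical stand-in, not one of Dan's functions, and the values are made up.

```python
import numpy as np
import pandas as pd


def numpy_only_window(values, low, high):
    """Hypothetical NumPy-only helper: boolean mask for low <= values < high."""
    values = np.asarray(values)
    return (values >= low) & (values < high)


# Made-up values standing in for a branch read with library="pd"
series = pd.Series([5400.0, 5620.0, 5650.0, 5900.0], name="example")

# Hand the helper a plain NumPy array, then restore the pandas index on the result
mask = pd.Series(numpy_only_window(series.to_numpy(), 5500.0, 5700.0), index=series.index)
selected = series[mask]
print(selected)
```

Keeping the index when wrapping the mask means the selection still lines up with the original entries.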

In [17]:
with up.open(fName + ":DTT1520me/DecayTree") as f:
    eventNum = f["eventNumber"].array(library="pd") # Using 'np' here will give you a numpy version
    
print(eventNum.head(), type(eventNum))

0    4826
1    4826
2    4826
3    4826
4    4826
dtype: uint64 <class 'pandas.core.series.Series'>


As can be seen, we have a lovely pandas Series with the index on the left and the event number values on the right. The Series has no name (there is no Name: line in the output), so it is worth naming it just to keep things pretty and tidy.
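A short sketch of that tidying, using a stand-in Series built from the head() output above in place of the real branch. `Series.rename` with a string sets the name, and `to_frame()` turns it into a one-column DataFrame, which is often easier to combine with other branches later.

```python
import pandas as pd

# Stand-in for the unnamed Series returned by f["eventNumber"].array(library="pd");
# the values mirror the head() output above
eventNum = pd.Series([4826, 4826, 4826, 4826, 4826], dtype="uint64")

# Name the values and the index so both are labelled when printed
eventNum = eventNum.rename("eventNumber")
eventNum.index.name = "entry"

# The Series name becomes the column header of the one-column DataFrame
eventDf = eventNum.to_frame()
print(eventDf.head())
```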

### Reading Multiple Arrays at Once
We can also read in multiple variables (arrays) at once, which is significantly faster and easier than reading them one at a time. An example is given below where eventNumber, runNumber and, chosen arbitrarily, PVNTRACKS are read in.
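uproot's arrays() also accepts a filter_name argument with glob-style wildcards such as "*_PT". Before reading a whole set of branches, it can help to preview which names a pattern selects. This sketch checks patterns against a few branch names copied from the f.show() output, assuming uproot's glob matching is case-sensitive in the same way as Python's `fnmatch.fnmatchcase`; preview is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# A few branch names copied from the f.show() output above
branch_names = [
    "L0Data_Muon1_Pt", "L0Data_Muon1_Sgn",
    "L0Data_Muon2_Pt", "L0Data_Muon2_Sgn",
    "L0Data_Muon3_Pt", "L0Data_Muon3_Sgn",
    "L0Data_Photon_Et",
]


def preview(pattern, names=branch_names):
    """Return the names a glob pattern would select (case-sensitive match)."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(preview("L0Data_Muon*_Pt"))  # the three muon Pt branches
print(preview("*_PT"))  # empty: "_Pt" does not match "_PT" case-sensitively
```

If the assumption holds, a pattern like "*_PT" silently skips branches spelled with a different case, which is worth checking before relying on it.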

In [18]:
with up.open(fName + ":DTT1520me/DecayTree") as f:
    df = f.arrays(["eventNumber", "runNumber", "PVNTRACKS"], library="pd")
    # You can also use wildcards to read in particular sets of variables; an example is commented out below
    # allPT=f.arrays(filter_name="*_PT",library="pd")
    
print(df.head())

                eventNumber  runNumber  PVNTRACKS
entry subentry                                   
0     0                4826   14469097       91.0
      1                4826   14469097       10.0
1     0                4826   14469097       91.0
      1                4826   14469097       10.0
2     0                4826   14469097       91.0
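The entry/subentry MultiIndex appears because at least one requested branch holds a variable number of values per entry. PVNTRACKS is most likely stored once per primary vertex (one element per PV), so in the uproot version used here the DataFrame gets one row per PV, and the scalar branches such as eventNumber are repeated on each row. The sketch below rebuilds only the five rows shown by df.head() and shows two ways back to one row per entry; it assumes subentry 0 is the first element of the per-PV array.

```python
import pandas as pd

# Rebuild the five rows printed by df.head() above
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)], names=["entry", "subentry"]
)
df = pd.DataFrame(
    {
        "eventNumber": [4826] * 5,
        "runNumber": [14469097] * 5,
        "PVNTRACKS": [91.0, 10.0, 91.0, 10.0, 91.0],
    },
    index=index,
)

# Option 1: keep only subentry 0 of each entry; the subentry level is dropped
firstPV = df.xs(0, level="subentry")

# Option 2: collapse the per-PV rows with named aggregation
summary = df.groupby(level="entry").agg(
    eventNumber=("eventNumber", "first"),
    runNumber=("runNumber", "first"),
    totalPVTracks=("PVNTRACKS", "sum"),
    nPV=("PVNTRACKS", "count"),
)
print(firstPV)
print(summary)
```

Which option makes sense depends on the analysis: the first PV is not necessarily the one associated with the candidate.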


Now consider doing calculations with pandas. This may be a little more strenuous, but I love the organisation of DataFrames.

In [None]:
with up.open(fName + ":DTT1520me/DecayTree") as f:
    df = f.arrays(["Lb_PX", "Lb_PY", "Lb_PZ", "Lb_P"], library="pd")
    
print(df.head())
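With the momentum components in one DataFrame, calculations are plain column arithmetic. As an illustration, the sketch recomputes the momentum magnitude from Lb_PX, Lb_PY and Lb_PZ, adds the transverse momentum, and checks the magnitude against Lb_P. The numbers are made up so the example runs without the tuple, and the _calc column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up momentum components for three candidates (not real tuple values)
df = pd.DataFrame(
    {
        "Lb_PX": [3000.0, -1200.0, 0.0],
        "Lb_PY": [4000.0, 500.0, 0.0],
        "Lb_PZ": [0.0, 0.0, 25000.0],
    }
)
# Stand-in for the Lb_P branch read alongside the components
df["Lb_P"] = [5000.0, 1300.0, 25000.0]

# Vectorised column arithmetic: |p| = sqrt(px^2 + py^2 + pz^2)
df["Lb_P_calc"] = np.sqrt(df["Lb_PX"] ** 2 + df["Lb_PY"] ** 2 + df["Lb_PZ"] ** 2)
# Transverse momentum from the x and y components only
df["Lb_PT_calc"] = np.hypot(df["Lb_PX"], df["Lb_PY"])

# Compare the recomputed magnitude with the stored branch (float tolerance)
agrees = bool(np.isclose(df["Lb_P_calc"], df["Lb_P"]).all())
print(df)
print("Lb_P consistent:", agrees)
```

On the real tuple the same lines apply directly to the df read in the cell above, since the column names match the branch names.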