# Explore a ROOT ntuple using uproot

Uproot is a Python library that makes it easy to use ROOT-based ntuples (TTrees) together with common Python analysis libraries such as `matplotlib` and `numpy`. Further details can be found at [uproot.readthedocs.io](https://uproot.readthedocs.io/en/latest/) and [https://github.com/scikit-hep/uproot3](https://github.com/scikit-hep/uproot3). In this notebook we provide a few examples of how to import a ROOT ntuple into this framework and how to take a quick look at the ntuple variables.

Firstly, let's import uproot:

In [2]:
import uproot as ur

If you get no error, the import worked. Congrats! :)
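
Because the API differs between major versions of uproot (for instance, the `num_entries` attribute used later in this notebook belongs to uproot 4 and later, while uproot3 called it `numentries`), it can be worth checking which version is installed. Here is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


uproot_version = installed_version("uproot")
if uproot_version is None:
    print("uproot is not installed in this environment")
else:
    # The first component of the version string is the major version.
    major = uproot_version.split(".")[0]
    print(f"uproot {uproot_version} (major version {major})")
```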

Now it's time for more serious stuff. First of all, we need to load the ntuple from a ROOT file.

In [3]:
filename='/eos/home-v/valentem/work/ATLAS/HH/HL-LHC/grid_outputs/user.valentem.HH4B.600463.hh_signals..AB21.2.163.valentem_hllhc_2021-04-29_2.full_nonres_MiniNTuple.root/user.valentem.25450677._000001.MiniNTuple.root'
file = ur.open(filename)
file.keys()

['XhhMiniNtuple;1',
 'cutflow_XhhMiniNtuple;1',
 'cutflow_weighted_XhhMiniNtuple;1',
 'MetaData_EventCount_XhhMiniNtuple;1']
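
The `;1` at the end of each key is ROOT's cycle number, which distinguishes successive versions of an object saved under the same name. uproot accepts the name without it, so it can be handy to strip the suffix. Here is a minimal sketch in plain Python, applied to the keys printed above; `split_cycle` is a hypothetical helper, not part of uproot:

```python
# Keys copied from the file.keys() output above.
keys = [
    "XhhMiniNtuple;1",
    "cutflow_XhhMiniNtuple;1",
    "cutflow_weighted_XhhMiniNtuple;1",
    "MetaData_EventCount_XhhMiniNtuple;1",
]


def split_cycle(key):
    """Split 'name;cycle' into (name, cycle); cycle is None if there is no suffix."""
    name, _, cycle = key.partition(";")
    return name, (int(cycle) if cycle else None)


names = [split_cycle(key)[0] for key in keys]
print(names)
```

If you also want to know which class each key corresponds to, uproot directories offer a `classnames()` method that should report this.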

The `file.keys()` output should tell you that the file contains a `TTree` called `XhhMiniNtuple` and three histograms (these show some information about the data, but they are not really useful for now). To load the TTree in Python you can do the following:

In [4]:
tree_name='XhhMiniNtuple'
tree = file[tree_name]
print(tree)

<TTree 'XhhMiniNtuple' (79 branches) at 0x7f410fbe2ac8>


If all went well, you should see a printout of the tree object! :) Now you can check which branches this TTree contains by using the `keys()` method again, much as you would with a standard Python dictionary:

In [5]:
tree.keys()

['runNumber',
 'eventNumber',
 'lumiBlock',
 'coreFlags',
 'bcid',
 'mcEventNumber',
 'mcChannelNumber',
 'mcEventWeight',
 'NPV',
 'actualInteractionsPerCrossing',
 'averageInteractionsPerCrossing',
 'weight_pileup',
 'correctedAverageMu',
 'correctedAndScaledAverageMu',
 'correctedActualMu',
 'correctedAndScaledActualMu',
 'rand_run_nr',
 'rand_lumiblock_nr',
 'nresolvedJets',
 'resolvedJets_E',
 'resolvedJets_pt',
 'resolvedJets_px',
 'resolvedJets_py',
 'resolvedJets_pz',
 'resolvedJets_phi',
 'resolvedJets_eta',
 'resolvedJets_ConeTruthLabelID',
 'resolvedJets_TruthCount',
 'resolvedJets_TruthLabelDeltaR_B',
 'resolvedJets_TruthLabelDeltaR_C',
 'resolvedJets_TruthLabelDeltaR_T',
 'resolvedJets_PartonTruthLabelID',
 'resolvedJets_GhostTruthAssociationFraction',
 'resolvedJets_truth_E',
 'resolvedJets_truth_pt',
 'resolvedJets_truth_phi',
 'resolvedJets_truth_eta',
 'ntruth',
 'truth_E',
 'truth_pt',
 'truth_px',
 'truth_py',
 'truth_pz',
 'truth_phi',
 'truth_eta',
 'truth_pdgId',
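
With this many branches, it can help to filter the names by a wildcard pattern. Here is a minimal sketch using the standard library's `fnmatch` on a handful of names copied from the list above; `select_branches` is a hypothetical helper, not an uproot function:

```python
from fnmatch import fnmatchcase

# A few branch names copied from the tree.keys() output above.
branches = [
    "runNumber",
    "eventNumber",
    "nresolvedJets",
    "resolvedJets_E",
    "resolvedJets_pt",
    "resolvedJets_phi",
    "resolvedJets_eta",
    "resolvedJets_truth_pt",
    "ntruth",
    "truth_pt",
]


def select_branches(names, pattern):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


jet_branches = select_branches(branches, "resolvedJets_*")
print(jet_branches)
```

Note that `nresolvedJets` does not match, because the pattern requires the name to start with `resolvedJets_`. In uproot itself, `tree.keys()` takes a `filter_name` argument that should accept the same kind of pattern.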


If all is right, `tree.keys()` should print a long list of keys telling you what is stored in the ntuple. You won't need all these variables; the important ones are those called `resolvedJets_*`, which contain, for each event, a `vector<float>` holding the jet energy, transverse momentum $p_T$, etc. Another way to explore the TTree is through the `show()` method:

In [11]:
tree.show()
print('\nThe number of entries for this TTree is', tree.num_entries)

name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
runNumber            | int32_t                  | AsDtype('>i4')
eventNumber          | int64_t                  | AsDtype('>i8')
lumiBlock            | int32_t                  | AsDtype('>i4')
coreFlags            | uint32_t                 | AsDtype('>u4')
bcid                 | int32_t                  | AsDtype('>i4')
mcEventNumber        | int32_t                  | AsDtype('>i4')
mcChannelNumber      | int32_t                  | AsDtype('>i4')
mcEventWeight        | float                    | AsDtype('>f4')
NPV                  | int32_t                  | AsDtype('>i4')
actualInteraction... | float                    | AsDtype('>f4')
averageInteractio... | float                    | AsDtype('>f4')
weight_pileup        | float                    | AsDtype('>f4')
correctedAverageMu   | float                    | AsDtype(

The `show()` output also tells you the variable type associated with each branch. :) Cool, right?! Now we can start to explore some numbers from the tree, for example the jet transverse momentum, stored in `resolvedJets_pt`.

In [20]:
jet_pts = tree['resolvedJets_pt'].array()
print(jet_pts)

[[85.9, 70.5, 59.7, 59, 57.9, 52.5, 46.2, ... 24.1, 23.4, 21.5, 21.2, 20.9, 20.4]]


The loaded object is a jagged array: it has one entry per event, and each entry is a variable-length list holding the $p_T$ of all the jets in that event. For example, for a specific event we can get the $p_T$ of the jets with:

In [43]:
evt_num=1
print(jet_pts[evt_num],len(jet_pts[evt_num]))

[146, 85, 73.8, 62.2, 51.9, 50.4, 49.9, ... 23.9, 23.8, 22.6, 22.5, 22.1, 21.8, 21.3] 29


This vector contains the $p_T$ of all the jets in the event with index 1. You can see that this event has a total of 29 jets (this is quite high because of pileup). You can explore other events by changing the index. :)
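
To get a feeling for the per-event structure, you can count the jets in every event and pick out the highest $p_T$. Here is a minimal sketch with plain Python lists standing in for the array; the values in `toy_jet_pts` are made up for illustration and are not read from the ntuple:

```python
# Toy stand-in for jet_pts: one inner list of jet pT values (GeV) per event.
# These numbers are invented for illustration only.
toy_jet_pts = [
    [85.9, 70.5, 59.7],
    [146.0, 85.0, 73.8, 62.2],
    [],
    [283.0, 243.0, 139.0, 40.1, 22.3],
]

# Number of jets in each event.
n_jets = [len(event) for event in toy_jet_pts]

# Highest pT in each event, or None for an event without jets.
leading_pt = [max(event) if event else None for event in toy_jet_pts]

print(n_jets)
print(leading_pt)
```

Assuming `jet_pts` is an awkward array (the default output of recent uproot versions), `ak.num(jet_pts)` and `ak.max(jet_pts, axis=1)` from the `awkward` library should give the same quantities without an explicit loop.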

For the same event, you can also look at other jet variables, for example the $\eta$ and $\phi$ coordinates:

In [44]:
jet_eta = tree['resolvedJets_eta'].array()
jet_phi = tree['resolvedJets_phi'].array()

print(jet_eta[evt_num])
print(jet_phi[evt_num])

[-1.89, -1.98, 2.11, -4.41, -3.65, 4.65, ... 1.14, 0.648, -2.44, -2.31, -1.06, 1.92]
[-2.69, 0.273, 1.24, -2.84, 0.681, 2.66, ... -1.7, -1.69, -2.8, 2.77, -2.33, -2.82]
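
Instead of reading one branch at a time, several branches can be read in a single call with the tree's `arrays()` method. The sketch below wraps this in a helper; `load_branches` and `JET_BRANCHES` are names invented here, and the `entry_stop` argument is used on the assumption that you only want the first events while exploring:

```python
def load_branches(path, tree_name, branch_names, entry_stop=None):
    """Read several branches of one TTree in a single call."""
    # Imported here so that defining the helper does not itself require uproot.
    import uproot

    with uproot.open(path) as root_file:
        tree = root_file[tree_name]
        # The arrays are fully read before the file is closed.
        return tree.arrays(branch_names, entry_stop=entry_stop)


JET_BRANCHES = ["resolvedJets_pt", "resolvedJets_eta", "resolvedJets_phi"]

# Usage, with `filename` defined as earlier in this notebook:
# jets = load_branches(filename, "XhhMiniNtuple", JET_BRANCHES, entry_stop=1000)
# print(jets["resolvedJets_eta"][1])
```

Reading only the branches and entries you need keeps things quick when the file is large.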


We can even apply selections using `numpy`-style array operations. For example, to check the $\eta$ coordinate of all the jets with $p_T > 100\;\text{GeV}$, we can do:

In [47]:
selection=jet_pts>100.
# An array of boolean arrays: True where the jet at that position has pT > 100 GeV
print(selection)

[[False, False, False, False, False, False, ... False, False, False, False, False]]


We can then select all the jets passing this selection with:

In [48]:
jet_pts_100GeV = jet_pts[selection]
print(jet_pts_100GeV)

[[], [146], [283, 243, 139], [166, 138], ... [125], [195, 170], [185, 150, 126]]
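
The empty lists in this output are events where no jet passes the cut: the mask is applied jet by jet, but the event structure is preserved. Here is a minimal sketch of that behaviour with plain Python lists; `apply_mask` is a hypothetical helper and the values are made up:

```python
# Toy jet pT values (GeV), one inner list per event; invented for illustration.
toy_jet_pts = [
    [85.9, 70.5, 59.7],
    [146.0, 85.0, 73.8],
    [283.0, 243.0, 139.0, 40.1],
]


def apply_mask(values, mask):
    """Keep the values whose mask entry is True, event by event."""
    return [
        [value for value, keep in zip(event_values, event_mask) if keep]
        for event_values, event_mask in zip(values, mask)
    ]


# Per-jet boolean mask with the same nested shape as the data.
mask = [[pt > 100.0 for pt in event] for event in toy_jet_pts]
selected = apply_mask(toy_jet_pts, mask)

print(mask)
print(selected)
```

The result still has one entry per event, which is why the same mask can later be applied to other per-jet variables of the same shape.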


If everything is right, you should see that every $p_T$ value in `jet_pts_100GeV` is larger than 100 GeV. You can do the same thing with `jet_eta`, `jet_phi`, etc. For example:

In [49]:
jet_eta_100GeV = jet_eta[selection]
jet_phi_100GeV = jet_phi[selection]
print(jet_eta_100GeV)

[[], [-1.89], [0.73, 1.51, 2.02], ... [0.0325, -0.0159], [-1.94, -1.51, 1.8]]


This will tell you the $\eta$ coordinate of the jets having $p_T$ greater than $100\;\text{GeV}$. From here, you can learn more about uproot and the variables stored in the ntuples. You don't need to understand all of them for now, but try to understand as many as you can by making selections on some of them.
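
As a starting point for such selections, two cuts can be combined, and jet-level cuts can be turned into an event-level selection. The sketch below uses plain Python lists with made-up values; the helper `central_hard_jets` and the $|\eta| < 2.5$ threshold are assumptions chosen only for illustration:

```python
# Toy per-event jet pT (GeV) and eta values; invented for illustration.
toy_pt = [[85.9, 70.5], [146.0, 120.0, 73.8], [283.0, 243.0, 139.0]]
toy_eta = [[-1.2, 0.4], [-1.89, 3.1, 0.2], [0.73, 1.51, 2.02]]


def central_hard_jets(pts, etas, pt_min=100.0, abs_eta_max=2.5):
    """Per event, keep the pT of jets with pT > pt_min and |eta| < abs_eta_max."""
    return [
        [pt for pt, eta in zip(event_pt, event_eta)
         if pt > pt_min and abs(eta) < abs_eta_max]
        for event_pt, event_eta in zip(pts, etas)
    ]


selected = central_hard_jets(toy_pt, toy_eta)

# Event-level selection: indices of events with at least two selected jets.
events_with_two = [i for i, event in enumerate(selected) if len(event) >= 2]

print(selected)
print(events_with_two)
```

On the real arrays, the equivalent jet-level cut would be written as `jet_pts[(jet_pts > 100.) & (abs(jet_eta) < 2.5)]`, assuming `jet_pts` and `jet_eta` are awkward arrays as returned by `array()` above.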