Since we're doing a dijet analysis, we'll want to use TLorentzVectors to compute things like the invariant mass of a two-jet system. But TLorentzVectors are notoriously slow in pyROOT, and even if they weren't, looping event by event over big trees is something you should never do in pyROOT. Pretty much everything besides those CPU-intensive tasks is nicer in pyROOT, though :-P
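
For reference, here is a minimal pure-Python sketch of the quantity we are about to hand off to C++. It is not the code in `hatsTrees.C`; it only illustrates roughly what adding two TLorentzVectors and asking for the mass computes. The function names `four_vector` and `dijet_mass` are made up for this illustration.

```python
import math

def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)

def dijet_mass(jet1, jet2):
    """Invariant mass of the sum of two four-vectors."""
    energy, px, py, pz = [a + b for a, b in zip(jet1, jet2)]
    m2 = energy * energy - px * px - py * py - pz * pz
    # Clamp tiny negative values caused by floating-point rounding
    # (an assumption of this sketch, not necessarily what ROOT does).
    return math.sqrt(max(m2, 0.0))

# Two back-to-back massless jets with pt = 50 each: mass is approximately 100
jet_a = four_vector(50.0, 0.0, 0.0, 0.0)
jet_b = four_vector(50.0, 0.0, math.pi, 0.0)
print(dijet_mass(jet_a, jet_b))
```

Doing this per event in a Python loop is exactly the kind of work that is better left to the compiled class.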

Please take a look at the minimal changes made to `hatsTrees.C` and `hatsTrees.h`, which you can find in the `sample_code` directory. A good philosophy when using `TTree.MakeClass()` is to change as little as possible. Please read the diff below -- it contains useful tips on e.g. setting up the class to take arguments. Without the comments, only about 30 lines of code were added, but they're sufficient for all the heavy lifting in the calculation of complicated physical variables.

In [None]:
!diff hatsTrees.C sample_code/hatsTrees.C  # could try adding --side-by-side to this command

Now that we've prepared our C++ class to do the heavy lifting, we will write a Python script that loads it and uses it to process our big datasets, while leveraging Python for the things that are annoying in C++. We'll design it to be suitable for use in batch submissions. Please follow along by looking at `sample_code/runHatsTrees.py`.
> Now let's go through `runHatsTrees.py`

Note that `runHatsTrees.py` defines an OptionParser. OptionParser is a commonly used piece of Python's standard library that automatically generates a help message for someone trying to use the script. Let's see what it says:

In [None]:
!python sample_code/runHatsTrees.py --help
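
If you have not used OptionParser before, the sketch below shows the general pattern. It is an illustration only, not a copy of `runHatsTrees.py`: the short flags `-i`, `-o` and `-t` match the ones used in this notebook, but the long option names, the `dest` names and the help strings are assumptions.

```python
from optparse import OptionParser

def build_parser():
    """Build a parser with input, output and tree-name options."""
    parser = OptionParser(usage="usage: %prog [options]")
    # Long names, dest names and help text below are hypothetical.
    parser.add_option("-i", "--input", dest="inputDir",
                      help="directory containing the input ntuples")
    parser.add_option("-o", "--output", dest="outputFile",
                      help="name of the output ROOT file")
    parser.add_option("-t", "--tree", dest="treeName",
                      help="name of the input tree, including its directory")
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration
options, args = build_parser().parse_args(
    ["-i", "in_dir", "-o", "out.root", "-t", "ntuplizer/tree"])
print(options.inputDir)
print(options.outputFile)
print(options.treeName)
```

The `help` strings passed to `add_option` are what OptionParser prints when the script is called with `--help`.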

Let's try running it according to the help message (this will take some time)

In [None]:
!python sample_code/runHatsTrees.py -i /store/user/hats/PyRoot/2017/qcd_samples/QCD_HT700to1000_0_0 -o hatsTrees_QCD_HT700to1000_0_0.root -t "ntuplizer/tree"

In [None]:
!which xrdfs

In [None]:
!ls output

How much data did we just process? We can check using the XRootD bindings for Python.

In [None]:
from XRootD import client
xrdClient = client.FileSystem("root://cmseos.fnal.gov/")

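processedDir = "/store/user/hats/PyRoot/2017/qcd_samples/QCD_HT700to1000_0_0/"
(status, files) = xrdClient.dirlist(processedDir)
if not status.ok:
    raise RuntimeError("dirlist failed: " + status.message)
# Sum the size of every file in the directory
# (names chosen to avoid shadowing the builtins `bytes` and `file`)
totalBytes = 0
for entry in files:
    (status, info) = xrdClient.stat(processedDir + entry.name)
    totalBytes += info.size
# format() keeps the output identical under Python 2 and Python 3
print("bytes: {0}".format(totalBytes))
print("gigabytes: {0}".format(totalBytes / float(1024**3)))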


Let's make sure that our output looks reasonable.

In [None]:
import ROOT as r
firstHatsFile = r.TFile("output/hatsTrees_QCD_HT700to1000_0_0.root")
firstHatsFile.ls()

In [None]:
firstHatsTree = firstHatsFile.Get("hatsDijets")
firstHatsTree.Print()

In [None]:
can = r.TCanvas()
can.SetLogy()
firstHatsTree.Draw("dijetMass")
can.Draw()

Now that we've seen that our Python script works when run over one of our input files, we're ready to do a batch submission to process all of our ntuples. However, during this HATS session, we won't actually submit the jobs. The output trees have already been produced, and you can find them here:

In [None]:
!xrdfs root://xrootd.accre.vanderbilt.edu ls /store/user/hats/PyRoot/2017/hatsDijetTrees/ | sort -u

If you are interested in using a python script of this sort in batch submission, please see `sample_code/condorSubmission` for an example.
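
To give a feel for what a batch setup has to do, here is a minimal sketch that builds one command line per input directory. It is not the contents of `sample_code/condorSubmission`; the helper name `build_commands` is hypothetical, and the output naming simply follows the `hatsTrees_<sample>.root` pattern used earlier in this notebook.

```python
import os

def build_commands(input_dirs, tree_name="ntuplizer/tree",
                   script="sample_code/runHatsTrees.py"):
    """Return one runHatsTrees.py command line per input directory."""
    commands = []
    for input_dir in input_dirs:
        clean_dir = input_dir.rstrip("/")
        # Use the last path component as the sample name
        sample = os.path.basename(clean_dir)
        output = "hatsTrees_%s.root" % sample
        commands.append('python %s -i %s -o %s -t "%s"'
                        % (script, clean_dir, output, tree_name))
    return commands

for command in build_commands(
        ["/store/user/hats/PyRoot/2017/qcd_samples/QCD_HT700to1000_0_0/"]):
    print(command)
```

Each resulting line could then serve as the executable line of one batch job.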

> Please continue on in `backgroundHists.ipynb`.