Now that we have hatsTrees with the interesting physics variables calculated, let's do some analysis with them. To combine our MC background samples, we need to compute a weight for each one. Fortunately, the inputs for these weights (the cross sections and the numbers of processed events) are already defined in a Python `.ini` file.

`.ini` files are a common configuration-file format, and Python's standard library can read them with the `ConfigParser` module. The syntax is simple and flexible: named `[sections]` that contain `key = value` lines. This is another example of how Python can keep us from falling into the trap of reinventing the wheel by writing custom code for every simple task, such as parsing text files.
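As a quick illustration of the syntax, here is a minimal sketch that parses a made-up INI snippet held in a string. It assumes Python 3's `configparser` module and its `read_string` method (in Python 2, which the rest of this notebook uses, the module is named `ConfigParser` and has no `read_string`). The section names match the ones used below, but the sample names `SampleA`/`SampleB` and all numbers are hypothetical placeholders.

```python
# Minimal sketch (Python 3): parse a hypothetical INI snippet held in a string.
from configparser import RawConfigParser

INI_TEXT = """
# Lines starting with '#' or ';' are comments.
[hatsXsects]
SampleA = 100.0
SampleB = 2.5

[hatsNprocessed]
SampleA = 20000
SampleB = 50000
"""

demo_config = RawConfigParser()   # separate name so the notebook's `config` is not overwritten
demo_config.read_string(INI_TEXT)

print(demo_config.sections())                           # ['hatsXsects', 'hatsNprocessed']
print(repr(demo_config.get('hatsXsects', 'SampleA')))   # '100.0'  (values are always strings)
print(demo_config.getfloat('hatsXsects', 'SampleA'))    # 100.0
print(demo_config.getint('hatsNprocessed', 'SampleB'))  # 50000
```

Because every value comes back as a string, the numbers have to be converted (with `float`, `int`, or the `getfloat`/`getint` helpers) before doing arithmetic with them.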

Let's take a look at `hatsConfig.ini`.

In [None]:
!cat hatsConfig.ini

In [None]:
from ConfigParser import RawConfigParser
config = RawConfigParser()
config.optionxform = str       # keep option names case-sensitive (ConfigParser lowercases them by default)
config.read("hatsConfig.ini")
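To see why `optionxform` matters, the sketch below (again assuming Python 3's `configparser`, with a placeholder value of 1.0) reads the same one-line section twice. With the default settings the option names are lowercased, so the sample names would no longer match keys such as `'QCD_HT1000to1500'` or the directory names on EOS.

```python
# Sketch (Python 3): compare default option-name handling with optionxform = str.
from configparser import RawConfigParser

TEXT = "[hatsXsects]\nQCD_HT1000to1500 = 1.0\n"   # 1.0 is a placeholder value

default_parser = RawConfigParser()
default_parser.read_string(TEXT)
print(default_parser.options('hatsXsects'))   # ['qcd_ht1000to1500']

case_parser = RawConfigParser()
case_parser.optionxform = str                 # identity transform: keep names as written
case_parser.read_string(TEXT)
print(case_parser.options('hatsXsects'))      # ['QCD_HT1000to1500']
```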

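Here is a compact way to create dicts of the cross sections and of the numbers of processed events. `config.items(section)` returns a list of `(name, value)` pairs with the values as strings, so we convert them to numbers as we go: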

In [None]:
crossSections = dict([sample, float(xsec)] for sample, xsec in config.items('hatsXsects'))
nProcessed    = dict([sample, int(nPro)] for sample, nPro in config.items('hatsNprocessed'))

from pprint import pprint
print "cross sections:" 
pprint(crossSections)
print "number of events processed:"
pprint(nProcessed)
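The same two dicts can also be built with dict comprehensions. The sketch below (Python 3 syntax; `build_sample_tables` is a hypothetical helper, not part of the HATS code) additionally checks that both sections list the same samples, since the weight calculation looks up every cross-section key in `nProcessed` and would otherwise stop with a `KeyError`.

```python
# Sketch: build typed dicts from (name, string) pairs and check that the two tables agree.
def build_sample_tables(xsec_items, nproc_items):
    """Return (cross_sections, n_processed) dicts; raise if a sample appears in only one table."""
    cross_sections = {sample: float(xsec) for sample, xsec in xsec_items}
    n_processed = {sample: int(n) for sample, n in nproc_items}
    unmatched = set(cross_sections) ^ set(n_processed)   # symmetric difference
    if unmatched:
        raise ValueError("samples missing from one table: %s" % sorted(unmatched))
    return cross_sections, n_processed

# Hypothetical input in the same shape that config.items(...) returns
xsecs, nprocs = build_sample_tables([("SampleA", "100.0")], [("SampleA", "20000")])
print(xsecs, nprocs)   # {'SampleA': 100.0} {'SampleA': 20000}
```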

Python `dict`s are extremely useful because they let us look up data by descriptive keys, here the sample names. Let's use our dicts to calculate the weights for our MC background samples.

In [None]:
weights = {}
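luminosity = 1.42    # This is just an example value; its units must be consistent with the cross sections
for sample in crossSections.keys():
    # per-event weight = luminosity * cross section / number of events processed
    weights[sample] = luminosity * crossSections[sample] / nProcessed[sample]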
pprint(weights)
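The weight assigns each simulated event its share of the expected yield: w = L * sigma / N, where L is the integrated luminosity, sigma the sample's cross section and N the number of processed events, so summing w over all N events gives back L * sigma. The sketch below checks this with made-up numbers (L = 10, sigma = 2, N = 4000, in any consistent units); `event_weight` is a hypothetical helper.

```python
# Sketch: per-event MC weight w = L * sigma / N, with hypothetical numbers.
def event_weight(luminosity, cross_section, n_processed):
    """Weight such that n_processed weighted events sum to luminosity * cross_section."""
    return luminosity * cross_section / float(n_processed)

demo_lumi, demo_xsec, demo_nevents = 10.0, 2.0, 4000
w_demo = event_weight(demo_lumi, demo_xsec, demo_nevents)
print(w_demo)                  # 0.005
print(w_demo * demo_nevents)   # 20.0, i.e. demo_lumi * demo_xsec
```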

Now we will use the Python XRootD bindings to access all of our input files. Since our input files live on EOS, we follow the recommendations at http://uscms.org/uscms_at_work/computing/LPC/usingEOSAtLPC.shtml, which instruct us to always list and open files via XRootD. First, we can use the Python XRootD bindings to list our input directory, as we did with shell commands in firstLook.ipynb.

In [None]:
from XRootD import client
xrdClient = client.FileSystem("root://cmseos.fnal.gov//")
hatsTreesDir = "//store/user/hats/PyRoot/2017/hatsDijetTrees"
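status, dirList = xrdClient.dirlist(hatsTreesDir)   # returns (XRootDStatus, DirectoryList)
if not status.ok:                                    # stop early if the listing failed
    raise RuntimeError(status.message)
for entry in dirList:
    print "file host:", entry.hostaddr, "  file name:", entry.name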

For ROOT to open an input file, it needs a full URL of the form `root://host:port//the/location/on/eos/file.root`. So we make a dict that stores what we need to build that URL, using a list comprehension. For each sample, the dict holds a tuple that separates the server part of the URL (`root://host:port`) from the logical path of the sample's directory on EOS.

In [None]:
from os import path
sampleDirURLs = {}
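for sample in crossSections.keys():
    # Unpacking into [matchingDir] raises ValueError unless exactly one directory name contains the sample name
    [matchingDir] = [("root://" + entry.hostaddr, path.join(hatsTreesDir, entry.name)) for entry in dirList if sample in entry.name]
    sampleDirURLs[sample] = matchingDir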
pprint(sampleDirURLs)
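Here is a standalone sketch of the same matching step with a clearer error message. `find_sample_dir`, the `host:port` string and the directory and file names are hypothetical; in the notebook the host comes from `entry.hostaddr` and the names from `dirlist`. It uses `posixpath.join` so the separator is always `/`, which is what a URL needs regardless of the local operating system.

```python
import posixpath

def find_sample_dir(sample, host, base_dir, dir_names):
    """Return (server URL, logical directory path) for the single directory matching `sample`."""
    matches = [name for name in dir_names if sample in name]
    if len(matches) != 1:
        raise ValueError("expected one directory for %s, found %d" % (sample, len(matches)))
    return ("root://" + host, posixpath.join(base_dir, matches[0]))

# Hypothetical directory listing and server address
base_dir = "//store/user/hats/PyRoot/2017/hatsDijetTrees"
names = ["QCD_HT1000to1500", "QCD_HT1500to2000"]
server, lfn_dir = find_sample_dir("QCD_HT1000to1500", "cmseos.fnal.gov:1094", base_dir, names)

# Server part + logical path gives the root://host:port//store/... form ROOT expects
full_url = server + posixpath.join(lfn_dir, "tree_1.root")
print(full_url)
```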

We can make a dict that holds a TChain for each sample, so that we can later draw from them with weights. Also in this cell, we use the Python XRootD bindings to generate our list of input files.

In [None]:
hatsChains = {}
import ROOT as r
r.gDebug = 1   # print extra ROOT debug messages (e.g. while opening remote files); set to 0 to silence them
for sample in crossSections.keys():
    chain = r.TChain('hatsDijets')
    status, fileList = xrdClient.dirlist(sampleDirURLs[sample][1]) # dirlist takes the logical filename
    for hatsFile in fileList:
        chain.Add(sampleDirURLs[sample][0] + path.join(sampleDirURLs[sample][1], hatsFile.name))  # ROOT takes the full url
    hatsChains[sample] = chain
pprint(hatsChains)

Now we can make weighted histograms of all the MC backgrounds using TChain.Draw() and put them into a stack plot. Here we run into a classic PyROOT gotcha: when the last Python reference to a histogram disappears (for example, a variable that only exists inside a loop or function), Python garbage-collects it, and the underlying ROOT histogram is typically deleted along with it. It's best to keep the histograms in a list or dict defined outside the loop.
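The gotcha is ordinary Python reference counting. The pure-Python sketch below uses a stand-in `FakeHist` class (hypothetical, not a ROOT type) and `weakref` to show that an object referenced only inside a function is freed as soon as the function returns, while one stored in an outer dict survives. The immediate freeing shown here is CPython behavior.

```python
import weakref

class FakeHist(object):
    """Hypothetical stand-in for a ROOT histogram."""
    def __init__(self, name):
        self.name = name

def make_and_forget(name):
    h = FakeHist(name)
    return weakref.ref(h)          # a weak reference does not keep h alive

def make_and_keep(name, store):
    h = FakeHist(name)
    store[name] = h                # strong reference held by a dict outside the function
    return weakref.ref(h)

kept = {}
forgotten_ref = make_and_forget("h1")
kept_ref = make_and_keep("h2", kept)
print(forgotten_ref() is None)     # True: freed when the function returned (CPython)
print(kept_ref() is kept["h2"])    # True: still alive because the dict holds it
```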

In [None]:
for sample in hatsChains:
    print "%s: %s files" % (sample, len(hatsChains[sample].GetListOfFiles()))

Now generate the plots. This will take some time.

In [None]:
hists = {}
import sys
for sample in crossSections.keys():
    print "Processing %s" % sample
    varNames=[]
    sys.stderr.write("Sample: %s\n" % sample)
    for var in hatsChains[sample].GetListOfBranches():
        varNames.append(var.GetName())
    for varName in varNames:
        sys.stderr.write("  varName: %s\n" % varName)
        histLabel = "%s_%s" % (varName, sample)
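        # xmin == xmax (0, 0) lets ROOT choose the axis range automatically from the data
        hists[histLabel] = r.TH1F(histLabel, histLabel, 100, 0, 0)
        # the selection string is used as a per-event weight; "goff" fills without drawing to a canvas
        hatsChains[sample].Draw("%s>>%s" % (varName, histLabel), repr(weights[sample]), "goff")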

pprint(hists)

Now that we've made histograms of every variable in every sample, we can combine them into stack plots. We will leave that as an exercise to work on for the rest of HATS. The histograms are stored in a dict keyed by `<variable>_<sample>` (for example, `cosThetaStar_QCD_HT1000to1500`), so you can easily pick out the ones you need.
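As a possible starting point for that exercise, here is a small sketch of navigating the dict. `hists_for_variable` and `variables_for_sample` are hypothetical helpers that rely only on the `<variable>_<sample>` key pattern; the dummy strings stand in for histograms, and `deltaEta` and `QCD_HT1500to2000` are made-up names. In ROOT, the histograms returned by the first helper could then be added one at a time to a `THStack` with its `Add` method.

```python
def hists_for_variable(hists, var_name, samples):
    """Histograms of one variable, in the given sample order (e.g. to add to a THStack)."""
    return [hists["%s_%s" % (var_name, sample)] for sample in samples]

def variables_for_sample(hists, sample):
    """Variable names that have a histogram for `sample`, recovered by stripping the suffix."""
    suffix = "_" + sample
    return sorted(key[:-len(suffix)] for key in hists if key.endswith(suffix))

# Dummy dict with the same key pattern; the values stand in for TH1F objects
dummy_hists = {
    "cosThetaStar_QCD_HT1000to1500": "h1",
    "cosThetaStar_QCD_HT1500to2000": "h2",
    "deltaEta_QCD_HT1000to1500": "h3",
}
print(hists_for_variable(dummy_hists, "cosThetaStar", ["QCD_HT1000to1500", "QCD_HT1500to2000"]))  # ['h1', 'h2']
print(variables_for_sample(dummy_hists, "QCD_HT1000to1500"))  # ['cosThetaStar', 'deltaEta']
```

The suffix matching assumes no sample name is itself a suffix of another sample name; if that ever happens, matching against the full list of sample names would be needed.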

In [None]:
canvas = r.TCanvas()
hists["cosThetaStar_QCD_HT1000to1500"].Draw()
canvas.Draw()