# Introduction to pyROOT

**NOTE** - If you get an error about a "missing kernel", close this notebook and run the `setupLibraries` notebook to create the required `cmsdas-preexercise` kernel.

This exercise is intended to provide you with basic familiarity with pyROOT, which provides bindings for all classes within the ROOT libraries and allows you to replace the usual C++ with the often less cumbersome Python. The goal is to obtain a general understanding of the syntax required to import and make use of the ROOT libraries within a basic Python script. Various examples demonstrate TH1 histogram manipulation, including reading from a .root file, creating, binning, re-binning, scaling, plotting and fitting to a Gaussian.

Whether you use Python or C++ to complete your analysis is a personal preference. However, with the current lack of documentation on pyROOT, many students stick with C++ to ensure access to coding examples and experts. We hope that this basic introduction and the GitHub repository of example scripts, which you are encouraged to add to, will bring together the existing pyROOT community within CMS and foster its growth.

## About Jupyter
Jupyter is a novel way to both execute and teach Python libraries. This `notebook` contains many `cells`, each of which contains a snippet of executable Python code. You ran a cell earlier to create the `cmsdas-preexercise` kernel. Executions of cells within the same notebook happen within the same Python environment, so notebooks can be used to perform data analysis gradually and incrementally. Execute your first Python cell by clicking the following cell and pressing `Shift+Enter`.

In [None]:
x = 2

This cell creates a new variable `x`. The next cell confirms that Python remembers the value you previously gave it. Jupyter puts the output for each cell directly after it, so you can see what code produced which output.

In [None]:
print(x)

# Our first pyROOT example
Let's examine what a typical pyROOT script will look like. As you read through the documentation, make sure to execute each cell in order.

All pyROOT scripts must import the ROOT libraries into the Python process. ROOT isn't a standard part of Python, so we load it from CMSSW via the kernel we installed. To load ROOT, execute the following cell:

In [None]:
import ROOT

When you ran this cell, you probably noticed that the cell temporarily changed to look like the following ![Screen%20Shot%202018-11-04%20at%209.19.43%20PM.png](attachment:Screen%20Shot%202018-11-04%20at%209.19.43%20PM.png) before finally changing to this: ![Screen%20Shot%202018-11-04%20at%209.21.48%20PM.png](attachment:Screen%20Shot%202018-11-04%20at%209.21.48%20PM.png)

While a computation is occurring in the background, the status symbol for that cell will change to `[*]`. Once the computation is complete, the status will change to a number such as `[3]`, where the number in the brackets is the execution counter. Each time a cell is executed, the counter is incremented by one. This is helpful for keeping track of the order in which cells were executed.

Now that ROOT is loaded, we will want to open our input and output files, and load a histogram from the input.

In [None]:
theInfile = ROOT.TFile("samples/infile.root","READ")
theOutfile = ROOT.TFile("outfile.root","RECREATE")
theHist = theInfile.Get("AK8MHist16")

If you're familiar with ROOT, you'll notice that this syntax is very similar to the C++ ROOT syntax. In fact, the pyROOT commands have a one-to-one correspondence to the C++ versions (with few exceptions).



One great advantage of Jupyter is that plots can be displayed directly in the browser, without any additional steps or complicated configuration. Any `TCanvas` that is drawn will be shown immediately after the code that produced it. Execute the following cell to see an example jet mass plot.

In [None]:
ROOT.gStyle.SetOptFit(1111)
theCanvas = ROOT.TCanvas('theCanvas','')
theHist.Draw('hist')
theCanvas.Update()
theCanvas.Draw()

Finally, like the C++ version, we need to write and close any open objects and files. The following cell implements this.

In [None]:
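# Make the output file the current directory before writing objects into it
theOutfile.cd()
theHist.Write()
theCanvas.Write()

# Flush anything still pending and close the output file
theOutfile.Write()
theOutfile.Close()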

Great! We just looked at a very simple pyROOT example. Let's see a more complicated example which builds on the first but also illustrates some commonly used histogram manipulations, such as creating, binning, re-binning, scaling, fitting to a Gaussian and saving to various file types.

# pyROOT Example Two
Like before, we'll want to load our input file and histogram. The `Sumw2()` call tells ROOT to store the sum of squared weights ($\sum w^2$) for each bin of the histogram, so that the bin errors are computed and propagated properly.

In [None]:
theInfile = ROOT.TFile("samples/infile.root","READ")

histName = "AK8MPt400To500Hist15"   
theHist = theInfile.Get(histName)
# Ensure proper error propagation
theHist.Sumw2()
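
To see why the sum of squared weights matters, here is a minimal pure-Python sketch (no ROOT required) of the per-bin bookkeeping. The helper name `bin_content_and_error` and the toy weights are made up for illustration; this only mimics the arithmetic and is not ROOT's actual implementation.

```python
import math

def bin_content_and_error(weights):
    """Return (content, error) for a single bin filled with `weights`."""
    content = sum(weights)                 # bin content is the sum of weights
    sum_w2 = sum(w * w for w in weights)   # the quantity tracked via Sumw2
    return content, math.sqrt(sum_w2)      # bin error is sqrt(sum of w^2)

# Four unweighted entries: the error reduces to the familiar sqrt(N)
unweighted = bin_content_and_error([1.0, 1.0, 1.0, 1.0])

# Four entries with weight 0.5: the error is NOT sqrt(content)
weighted = bin_content_and_error([0.5, 0.5, 0.5, 0.5])

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

For the weighted bin, taking the square root of the content would overestimate the error, which is why the squared weights need to be tracked separately.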

The pyROOT library lets you call any function defined in ROOT. Compare these two calls with their documentation:
* `Scale()` - [doc](https://root.cern.ch/doc/master/classTH1.html#add929909dcb3745f6a52e9ae0860bfbd)
* `Rebin()` - [doc](https://root.cern.ch/doc/master/classTH1.html#aff6520fdae026334bf34fa1800946790)

In [None]:
# Scale the histogram vertically -- this is useful for things like
# scaling a Monte Carlo sample to match the theoretical predictions
theHist.Scale(100)

# Merge every 10 adjacent bins into one, reducing the number of bins
# the histogram was initialized with by a factor of 10
theHist.Rebin(10)

Once you run the cell, you should notice the following output
![Screen%20Shot%202018-11-04%20at%2010.19.34%20PM.png](attachment:Screen%20Shot%202018-11-04%20at%2010.19.34%20PM.png)
This is a convenience feature that saves you from having to explicitly print values: Jupyter prints the value of the last line in the cell (here, the return value of `Rebin()`).
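
As a rough picture of what `Scale()` and `Rebin()` do to the bin contents and errors, here is a pure-Python sketch on a toy six-bin histogram. The helpers `scale` and `rebin` and the numbers are illustrative assumptions, not ROOT's implementation, and the sketch assumes the number of bins is an exact multiple of the group size.

```python
import math

def scale(contents, errors, factor):
    """Multiply every bin content and bin error by `factor`."""
    return [c * factor for c in contents], [e * factor for e in errors]

def rebin(contents, errors, ngroup):
    """Merge every `ngroup` adjacent bins.

    Contents add linearly; errors add in quadrature.
    Assumes len(contents) is an exact multiple of ngroup.
    """
    new_contents, new_errors = [], []
    for start in range(0, len(contents), ngroup):
        group_c = contents[start:start + ngroup]
        group_e = errors[start:start + ngroup]
        new_contents.append(sum(group_c))
        new_errors.append(math.sqrt(sum(e * e for e in group_e)))
    return new_contents, new_errors

# Toy histogram with Poisson-like errors (sqrt of the content)
contents = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errors = [math.sqrt(c) for c in contents]

scaled_c, scaled_e = scale(contents, errors, 2.0)
rebinned_c, rebinned_e = rebin(scaled_c, scaled_e, 3)
print(rebinned_c, rebinned_e)
```

Six bins merged in groups of three leave two bins, in the same way that `Rebin(10)` leaves one tenth of the original number of bins.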

Next we will want to fit a function to our data. In this case, we will use a regular Gaussian distribution, but other distributions (e.g. Poisson) are also available. In ROOT, we do this via a [TF1](https://root.cern.ch/doc/master/classTF1.html) object. We can call the C++ functions described in the docs via pyROOT.

In [None]:
# Fitting the histogram to a Gaussian
theFitFunc = ROOT.TF1("theFitFunc", "gaus", 110, 210)
theFitFunc.SetLineColor(1)
theFitFunc.SetLineWidth(2)
theFitFunc.SetLineStyle(2)
# Perform the fit with our histogram we extracted from simulation
theHist.Fit(theFitFunc,'S')

Once the fit is performed, ROOT prints the fitted values of the parameters of your chosen function. Since this is a Gaussian, ROOT's `gaus` function has the form

$ y = c \, e^{ - \frac{(x - \mu)^2}{2 \sigma^2} } $

where $ \sigma $ is the sigma (or width), $ \mu $ is the mean (or peak position), and $ c $ is a scaling constant that sets the height of the peak. The parameters output above (`Constant`, `Mean` and `Sigma`) are the values that make the Gaussian best fit the data.
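
To get a feel for the three parameters, the sketch below evaluates ROOT's predefined `gaus` formula, `[0]*exp(-0.5*((x-[1])/[2])**2)`, in plain Python. The function name `gaus` and the parameter values are made up for illustration and are not results of the fit above.

```python
import math

def gaus(x, c, mu, sigma):
    """Unnormalized Gaussian: constant * exp(-0.5 * ((x - mean) / sigma)^2)."""
    return c * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Hypothetical parameter values, chosen only for this example
c, mu, sigma = 50.0, 160.0, 15.0

peak = gaus(mu, c, mu, sigma)               # value at the mean
one_sigma = gaus(mu + sigma, c, mu, sigma)  # value one sigma above the mean

print("height at the mean:     ", peak)
print("ratio one sigma away:   ", one_sigma / peak)
```

At the mean the function equals the constant, and one sigma away it has dropped by a factor of $e^{-1/2}$, which is about 0.61.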

Let's now plot our data distribution

In [None]:
ROOT.gStyle.SetOptFit(1111)
c = ROOT.TCanvas('c-' + histName,'')

theHist.Draw('hist')

To overlay the fit on the data, we call `Draw()` again, but with the `same` option. This tells ROOT to draw on top of the existing plot, without clearing it first.

In [None]:
theFitFunc.Draw("same")

Before we display the plot, we want to set the axis ranges to sensible values.

In [None]:
# Set the maximum value for the y axis
max_y = theHist.GetMaximum() 
theHist.SetMaximum(max_y * 1.2)
theHist.GetXaxis().SetRangeUser(80 , 250)

Finally, let's show the result

In [None]:
c.Update()
c.Draw()

Note that ROOT helpfully puts the fit parameters in the statistics box on the plot, as requested by `SetOptFit(1111)`.

When using pyROOT in Jupyter, plots are displayed inline, but it's sometimes useful to output plots to files. The following two lines will output this plot in both PDF and PNG formats.

In [None]:
c.Print('plots/'+ histName + '_' + 'GaussianFit.png', 'png' )
c.Print('plots/'+ histName + '_' + 'GaussianFit.pdf', 'pdf' )
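
The `Print` calls above write into a `plots/` directory, which may not exist yet in your working area. The following is a small sketch of one way to build the output path and create the directory first. The helper name `plot_path` is hypothetical, and a temporary directory is used here only so that the example does not touch your real files.

```python
import os
import tempfile

def plot_path(out_dir, hist_name, suffix, extension):
    """Create `out_dir` if needed and return the full output file path."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    file_name = "{}_{}.{}".format(hist_name, suffix, extension)
    return os.path.join(out_dir, file_name)

# Use a temporary base directory for this demonstration
base = tempfile.mkdtemp()
png_path = plot_path(os.path.join(base, "plots"),
                     "AK8MPt400To500Hist15", "GaussianFit", "png")
print(png_path)
```

In the notebook, the returned path could then be passed to `c.Print(png_path)`.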

Finally, it's good practice to close any open files once the script is done. In particular, if any objects were created in a file, it's good to explicitly write them and close the file. This can help prevent file corruption.

In [None]:
theInfile.Close()

## Conclusion
This extremely brief introduction to pyROOT and Jupyter just scratches the surface of what's possible with either. Other DAS exercises will build on these lessons and move on to more advanced use cases.

For CMSDAS@LPC2022 please submit your answer at the [Google Form sixth set](https://forms.gle/i5pAm573Z5JWb2Mo9).

Question 19.1: What is the mean value of the Gaussian fit of the jet mass spectrum for jets of pt 300-400 GeV?