# RooFit Basics

`RooFit` is an object-oriented analysis environment built on `ROOT`. It provides a collection of classes designed to augment `ROOT` for data modeling.

This section covers a few of the basics of RooFit. There are many more tutorials available at this link: https://root.cern.ch/root/html600/tutorials/roofit/index.html

## Objects
In `RooFit`, any variable, data point, function, PDF (etc.) is represented by a C++ object, available in Python via automatic Python bindings.
The most basic of these is the `RooRealVar`. We will create one that will represent the mass of some hypothetical particle; we give it a name, an initial value and a range.

In [None]:
import ROOT

In [None]:
MH = ROOT.RooRealVar("MH","mass of the Hypothetical Boson (H-boson) in GeV",125,120,130)
MH.Print()

Ok, great. This variable is now an object we can play around with. We can access this object and modify its properties, such as its value. 

In [None]:
MH.setVal(130)
MH.getVal()

In particle detectors we typically do not observe this particle mass, but usually define some observable which is sensitive to this mass. We will assume we can detect and reconstruct the decay products of the H-boson and measure the invariant mass of those particles. We need to make another variable that represents that invariant mass.

In [None]:
mass = ROOT.RooRealVar("m","m (GeV)",100,80,200)

In a perfect world we would measure the exact mass of the particle in every single event. However, our detectors are usually far from perfect, so there will be some resolution effect. We will assume the resolution of our measurement of the invariant mass is 10 GeV and call it "sigma".

In [None]:
sigma = ROOT.RooRealVar("resolution","#sigma",10,0,20)

More exotic variables can be constructed out of these `RooRealVars` using `RooFormulaVars`. For example, suppose we wanted to make a function out of the variables that represented the relative resolution as a function of the invariant mass $m$.

In [None]:
func = ROOT.RooFormulaVar("R","@0/@1", ROOT.RooArgList(sigma,mass))
func.Print("v")

Notice how there is a list of the variables we passed (the servers or "actual vars"). We can now plot the function. `RooFit` has a special plotting object `RooPlot` which keeps track of the objects (and their normalisations) that we want to draw. Since `RooFit` does not know the difference between objects that are and are not dependent, we need to tell it. 

Right now, we have the relative resolution as $R(m,\sigma)$, whereas we want to plot 
$R(m,\sigma(m))$!

In [None]:
can = ROOT.TCanvas()
plot = mass.frame()
func.plotOn(plot)
plot.Draw()
can.Update()
can.Draw()

The main objects we are interested in using from `RooFit` are *probability density functions* (PDFs). We can construct the PDF,

$$
f(m|M_{H},\sigma)
$$

as a simple Gaussian shape for example or a `RooGaussian` in `RooFit` language (think McDonald's logic, everything is a `RooSomethingOrOther`)

In [None]:
gauss = ROOT.RooGaussian("gauss", "f(m|M_{H},#sigma)", mass, MH, sigma)
gauss.Print("V")

Notice how the Gaussian PDF, like the `RooFormulaVar`, depends on our `RooRealVar` objects; these are its servers. Its evaluation will depend on their values.

The main difference between PDFs and functions in `RooFit` is that PDFs are *automatically normalised to unity*; hence they represent a probability density and you don't need to normalise them yourself. Let's plot it for different values of $M_{H}$.

In [None]:
can = ROOT.TCanvas()
plot = mass.frame()
gauss.plotOn(plot)

MH.setVal(120)
gauss.plotOn(plot, ROOT.RooFit.LineColor(ROOT.kBlue))

MH.setVal(125)
gauss.plotOn(plot, ROOT.RooFit.LineColor(ROOT.kRed));

MH.setVal(130)  # upper edge of MH's range [120, 130]; a larger value such as 135 would be clipped to 130
gauss.plotOn(plot, ROOT.RooFit.LineColor(ROOT.kGreen))

plot.Draw()
can.Update()
can.Draw()

Note that as we change the value of `MH`, the PDF gets updated at the same time.
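To make the automatic normalisation concrete, here is a minimal sketch in plain Python (no `ROOT` needed). It assumes the normalisation is taken over the range of the observable, here $[80, 200]$ GeV as for `mass`, and the helper names `gauss_shape`, `range_integral` and `gauss_pdf` are hypothetical, not part of `RooFit`.

```python
import math

def gauss_shape(m, mean, sigma):
    """Unnormalised Gaussian shape, exp(-(m-mean)^2 / (2 sigma^2))."""
    return math.exp(-0.5 * ((m - mean) / sigma) ** 2)

def range_integral(mean, sigma, lo, hi):
    """Analytic integral of the unnormalised shape over [lo, hi]."""
    scale = sigma * math.sqrt(2.0)
    return sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((hi - mean) / scale) - math.erf((lo - mean) / scale)
    )

def gauss_pdf(m, mean, sigma, lo=80.0, hi=200.0):
    """Shape divided by its integral over the observable range."""
    return gauss_shape(m, mean, sigma) / range_integral(mean, sigma, lo, hi)

# Midpoint-rule check that the density integrates to one over [80, 200]
n_steps = 20000
width = (200.0 - 80.0) / n_steps
total = sum(
    gauss_pdf(80.0 + (i + 0.5) * width, 125.0, 10.0) for i in range(n_steps)
) * width
print(round(total, 6))
```

Changing `mean` or `sigma` changes the shape, but the integral over the range stays at one, which is the behaviour a PDF object gives you for free.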

PDFs can be used to generate Monte Carlo data. One of the benefits of `RooFit` is that doing so takes only a single line of code! As before, we have to tell `RooFit` which variables to generate in (i.e. which are the observables for an experiment). In this case, each of our events will be a single value of "mass" $m$. The arguments of the function are the set of observables, followed by the number of events.

In [None]:
gen_data = gauss.generate(ROOT.RooArgSet(mass), 500)
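Conceptually, generating a toy dataset means drawing values of the observable from the PDF within its range. The sketch below does this in plain Python with a simple accept/reject step. It is only an illustration of the idea, not a description of how `RooFit` generates events internally, and `generate_truncated_gauss` is a hypothetical helper.

```python
import random

def generate_truncated_gauss(n_events, mean, sigma, lo, hi, seed=1234):
    """Draw n_events values from a Gaussian, keeping only those inside [lo, hi]."""
    rng = random.Random(seed)  # fixed seed so the toy is reproducible
    events = []
    while len(events) < n_events:
        m = rng.gauss(mean, sigma)
        if lo <= m <= hi:  # reject draws outside the observable range
            events.append(m)
    return events

toy = generate_truncated_gauss(500, mean=125.0, sigma=10.0, lo=80.0, hi=200.0)
print(len(toy), min(toy) >= 80.0, max(toy) <= 200.0)
```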

Now we can plot the data as with other RooFit objects.

In [None]:
can = ROOT.TCanvas()
plot = mass.frame()

gen_data.plotOn(plot)
gauss.plotOn(plot)
gauss.paramOn(plot)

plot.Draw()
can.Update()
can.Draw()

Of course, we are not in the business of generating MC events, but of collecting *real data*! Next we will look at using real data in `RooFit`.

## Datasets

A dataset is essentially just a collection of points in N-dimensional (N-observables) space. There are two basic implementations in `RooFit`:

1) an "unbinned" dataset - `RooDataSet`

2) a "binned" dataset - `RooDataHist`

Both of these use the same basic structure, shown below.

![Alt Text](https://raw.githubusercontent.com/cms-analysis/HiggsAnalysis-CombinedLimit/main/docs/part5/images/datastructure.png)

We will create an empty dataset where the only observable is the mass. Points can be added to the dataset one by one...

In [None]:
mydata = ROOT.RooDataSet("dummy","My dummy dataset", ROOT.RooArgSet(mass)) # We've made a dataset with one observable (mass)

mass.setVal(123.4)
mydata.add(ROOT.RooArgSet(mass))
mass.setVal(145.2)
mydata.add(ROOT.RooArgSet(mass))
mass.setVal(170.8)
mydata.add(ROOT.RooArgSet(mass))

mydata.Print()

There are also other ways to manipulate datasets, as shown in the diagram below.
![Alt Text](https://raw.githubusercontent.com/cms-analysis/HiggsAnalysis-CombinedLimit/main/docs/part5/images/datasets_manip.png)

Luckily, there are also constructors for a `RooDataSet` from a `TTree` and for a `RooDataHist` from a `TH1`, so it's simple to convert from your usual ROOT objects.

We will take an example dataset put together already. The file `tutorial.root` can be downloaded [here](https://github.com/amarini/Prefit2020/blob/master/Session%201/tutorial.root) and is also available in `./`.

In [None]:
file = ROOT.TFile.Open("tutorial.root")
file.ls()

Inside the file, there is something called a `RooWorkspace`. This is just the `RooFit` way of keeping a persistent link between the objects for a model. It is a very useful way to share data and PDFs/functions etc among CMS collaborators.

We will now take a look at it. It contains a `RooDataSet` and one variable. This time we called our variable (or observable) `CMS_hgg_mass`; we will assume that this is the invariant mass of photon pairs, where our H-boson decays to photons.

In [None]:
wspace = file.Get("workspace")
wspace.Print("v")

Now we will have a look at the data. The `RooWorkspace` has several accessor functions, we will use the `RooWorkspace::data` one. 
There are also `RooWorkspace::var`, `RooWorkspace::function` and `RooWorkspace::pdf` with (hopefully) obvious purposes.

In [None]:
hgg_data = wspace.data("dataset")
hgg_mass = wspace.var("CMS_hgg_mass")

plot = hgg_mass.frame()

hgg_data.plotOn(plot, ROOT.RooFit.Binning(160))

can = ROOT.TCanvas()
plot.Draw()
can.Update()
can.Draw()

## Likelihoods and Fitting to data 

The data we have in our file does not look like a Gaussian distribution. Instead, we could probably use something like an exponential to describe it. 

`RooFit` already has an exponential PDF (yes, you guessed it): `RooExponential`. As a PDF it needs only one parameter, the exponential slope $\alpha$, so our PDF is

$$ f(m|\alpha) = \dfrac{1}{N} e^{\alpha m}$$


where $N = \int_{110}^{150} e^{\alpha m} dm$ is the normalisation constant. Note that `RooExponential` evaluates $e^{\alpha m}$, so a falling spectrum corresponds to a negative $\alpha$.

You can find several available `RooFit` functions here: [https://root.cern.ch/root/html/ROOFIT_ROOFIT_Index.html](https://root.cern.ch/root/html/ROOFIT_ROOFIT_Index.html)

There is also support for a generic PDF in the form of a `RooGenericPdf`, check this link: [https://root.cern.ch/doc/v608/classRooGenericPdf.html](https://root.cern.ch/doc/v608/classRooGenericPdf.html)

Now we will create an exponential PDF for our background,

In [None]:
alpha = ROOT.RooRealVar("alpha", "#alpha", -0.05, -0.2, 0.01)
expo = ROOT.RooExponential("exp","exponential function", hgg_mass, alpha)

We can ask `RooFit` to estimate the value of $\alpha$ from this dataset. You will learn more about parameter estimation later, but for now we will just assume you know about maximizing likelihoods. The *maximum likelihood estimator* is common in HEP and is known to give unbiased estimates for quantities such as distribution means; more generally, its bias vanishes as the dataset grows.

This also introduces the other main use of PDFs in `RooFit`. They can be used to construct *likelihoods* easily.

The likelihood $\mathcal{L}$ is defined for a particular dataset (and model) as being proportional to the probability to observe the data assuming some PDF. For our data, the probability to observe an event with a value in an interval bounded by $a$ and $b$ is given by

$$ P\left(m \in [a,b] \right) = \int_{a}^{b} f(m|\alpha)dm  $$


As that interval shrinks we can say this probability just becomes equal to $f(m|\alpha)dm$.

The probability to observe the dataset we have is given by the product of such probabilities for each of our data points, so that 

$$\mathcal{L}(\alpha) \propto \prod_{i} f(m_{i}|\alpha)$$

Note that for a specific dataset, the $dm$ factors which should be there are constant. They can therefore be absorbed into the constant of proportionality!

The maximum likelihood estimator for $\alpha$, usually written as $\hat{\alpha}$, is found by maximising $\mathcal{L}(\alpha)$.

Note that this will not depend on the value of the constant of proportionality so we can ignore it. This is true in most scenarios because usually only the *ratio* of likelihoods is needed, in which the constant factors out. 

Obviously, this multiplication of exponentials can lead to very large (or very small) numbers, which can cause numerical instabilities. To avoid this, we can take the logarithm of the likelihood. It's also common to multiply this by -1 and minimize the resulting **N**egative **L**og **L**ikelihood: $-\log\mathcal{L}(\alpha)$.
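As a rough illustration of what such an NLL looks like when written by hand, the sketch below builds it for an exponential density on $[110, 150]$ and minimises it with a simple golden-section search. It assumes the density is written as $e^{\alpha m}/N(\alpha)$ with a negative slope (matching the negative starting value used for `alpha`), the toy masses are invented for illustration, and all function names are hypothetical.

```python
import math

def expo_norm(alpha, lo=110.0, hi=150.0):
    """Integral of exp(alpha*m) over [lo, hi]; reduces to hi-lo when alpha is 0."""
    if abs(alpha) < 1e-12:
        return hi - lo
    return (math.exp(alpha * hi) - math.exp(alpha * lo)) / alpha

def expo_nll(alpha, data, lo=110.0, hi=150.0):
    """-log L = -sum_i log f(m_i|alpha), with f = exp(alpha*m) / N(alpha)."""
    log_norm = math.log(expo_norm(alpha, lo, hi))
    return -sum(alpha * m - log_norm for m in data)

def golden_section_min(func, a, b, tol=1e-8):
    """Minimise a one-dimensional unimodal function on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if func(c) < func(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return 0.5 * (a + b)

# Hypothetical toy masses in GeV (not the tutorial dataset)
toy_masses = [111.2, 113.5, 114.1, 117.8, 119.0, 121.4, 124.9, 128.3, 133.7, 141.6]
alpha_hat = golden_section_min(lambda a: expo_nll(a, toy_masses), -0.2, 0.01)
print(alpha_hat)
```

Working with the sum of logs rather than the product of densities is what keeps the numbers in a comfortable floating-point range.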

`RooFit` can construct the **NLL** for us.

In [None]:
nll = expo.createNLL(hgg_data)
nll.Print()

Notice that the NLL object knows which `RooRealVar` is the parameter because it doesn't find that one in the dataset. This is how `RooFit` distinguishes between *observables* and *parameters*.

`RooFit` has an interface to Minuit via the `RooMinimizer` class, which takes the NLL as an argument. To minimize, we just call the `RooMinimizer::minimize()` function. **`Minuit2`** is the program and **`migrad`** is the minimization routine, a gradient-based variable-metric (quasi-Newton) method.

In [None]:
minim = ROOT.RooMinimizer(nll)
minim.minimize("Minuit2", "migrad")

`RooFit` has found the best fit value of alpha for this dataset. It also estimates an uncertainty on alpha using the Hessian matrix from the fit.

In [None]:
alpha.Print("v")
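For a single parameter, the Hessian-based uncertainty is just $1/\sqrt{\partial^2(-\log\mathcal{L})/\partial\theta^2}$ evaluated at the minimum. The sketch below checks this numerically on a deliberately simple stand-in model, an exponential density $\lambda e^{-\lambda x}$ on $x \ge 0$, for which both the estimate and the curvature are known analytically. The toy data and the helper `hessian_error` are hypothetical and unrelated to the fit above.

```python
import math

def nll(rate, data):
    """NLL for an exponential density rate*exp(-rate*x) on x >= 0."""
    return -sum(math.log(rate) - rate * x for x in data)

def hessian_error(func, x_hat, step=1e-4):
    """Uncertainty from the curvature: sigma = 1/sqrt(d2(NLL)/dx2) at the minimum."""
    second = (func(x_hat + step) - 2.0 * func(x_hat) + func(x_hat - step)) / step**2
    return 1.0 / math.sqrt(second)

# Hypothetical toy sample, unrelated to the tutorial dataset
toy = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.6, 0.4]
rate_hat = len(toy) / sum(toy)                # analytic maximum-likelihood estimate
sigma_num = hessian_error(lambda r: nll(r, toy), rate_hat)
sigma_exact = rate_hat / math.sqrt(len(toy))  # from d2(NLL)/d(rate)2 = n / rate^2
print(rate_hat, sigma_num, sigma_exact)
```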

We will plot the resulting exponential on top of the data. Notice that the value of $\hat{\alpha}$ is used for the exponential.

In [None]:
plot = hgg_mass.frame()
hgg_data.plotOn(plot, ROOT.RooFit.Binning(160))
expo.plotOn(plot)
expo.paramOn(plot)

can = ROOT.TCanvas()
plot.Draw()
can.Update()
can.Draw()

It looks like there could be a small region near 125 GeV for which our fit does not quite go through the points. Maybe our hypothetical H-boson is not so hypothetical after all!

We will now see what happens if we include some resonant signal into the fit. We can take our Gaussian function again and use that as a signal model. A reasonable value for the resolution of a resonant signal with a mass around 125 GeV decaying to a pair of photons is around a GeV.

In [None]:
sigma.setVal(1.)
sigma.setConstant()

MH.setVal(125)
MH.setConstant()

hgg_signal = ROOT.RooGaussian("signal", "Gaussian PDF", hgg_mass, MH, sigma)

By setting these parameters constant, `RooFit` knows (either when creating the NLL by hand or when using `fitTo`) that there is no need to fit for these parameters.

We need to add this to our exponential model and fit a "Signal+Background" model by creating a `RooAddPdf`. In `RooFit` there are two ways to add PDFs: recursively, where the fraction of yields for the signal and background is a parameter, or absolutely, where each PDF has its own normalization. We're going to use the second one.

In [None]:
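# Four-argument constructor: (name, title, min, max), so the numbers are ranges, not initial values
norm_s = ROOT.RooRealVar("norm_s","N_{s}",10,100)
norm_b = ROOT.RooRealVar("norm_b","N_{b}",0,1000)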

components = ROOT.RooArgList(hgg_signal,expo)
coeffs = ROOT.RooArgList(norm_s,norm_b)

model = ROOT.RooAddPdf("model","f_{s+b}",components,coeffs)
model.Print("v")

Ok, now we will fit the model. Note this time we add the option `Extended()`, which tells `RooFit` that we care about the overall number of observed events in the data $n$ too. It will add an additional Poisson term in the likelihood to account for this so our likelihood this time looks like,

$$L_{s+b}(N_{s},N_{b},\alpha) = \dfrac{ (N_{s}+N_{b})^{n} e^{-(N_{s}+N_{b})} }{n!} \cdot \prod_{i}^{n} \left[ c f_{s}(m_{i}|M_{H},\sigma)+ (1-c)f_{b}(m_{i}|\alpha)  \right] $$


where $c = \dfrac{ N_{s} }{ N_{s} + N_{b} }$,   $f_{s}(m|M_{H},\sigma)$ is the Gaussian signal pdf and $f_{b}(m|\alpha)$ is the exponential pdf. Remember that $M_{H}$ and $\sigma$ are fixed so that they are no longer parameters of the likelihood.
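The Poisson term and the per-event mixture can be combined: multiplying each factor in the product by $N_{s}+N_{b}$ absorbs the $(N_{s}+N_{b})^{n}$ term, giving $-\log L = (N_{s}+N_{b}) + \log n! - \sum_{i} \log\left[N_{s} f_{s}(m_{i}) + N_{b} f_{b}(m_{i})\right]$. The sketch below checks numerically that the two forms agree. The per-event density values are invented for illustration and the function names are hypothetical.

```python
import math

def extended_nll_product_form(n_s, n_b, fs_vals, fb_vals):
    """-log of [Poisson(n | N_s+N_b) * prod_i (c f_s + (1-c) f_b)]."""
    n, total = len(fs_vals), n_s + n_b
    c = n_s / total
    log_poisson = n * math.log(total) - total - math.lgamma(n + 1)
    log_shape = sum(
        math.log(c * fs + (1.0 - c) * fb) for fs, fb in zip(fs_vals, fb_vals)
    )
    return -(log_poisson + log_shape)

def extended_nll_sum_form(n_s, n_b, fs_vals, fb_vals):
    """Equivalent form: (N_s+N_b) + log(n!) - sum_i log(N_s f_s + N_b f_b)."""
    n = len(fs_vals)
    log_terms = sum(
        math.log(n_s * fs + n_b * fb) for fs, fb in zip(fs_vals, fb_vals)
    )
    return (n_s + n_b) + math.lgamma(n + 1) - log_terms

# Hypothetical per-event density values f_s(m_i) and f_b(m_i), for illustration only
fs_vals = [0.30, 0.02, 0.001, 0.25]
fb_vals = [0.02, 0.03, 0.025, 0.02]
nll_a = extended_nll_product_form(3.0, 2.0, fs_vals, fb_vals)
nll_b = extended_nll_sum_form(3.0, 2.0, fs_vals, fb_vals)
print(nll_a, nll_b)
```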

There is a simpler interface for maximum-likelihood fits, which is the `RooAbsPdf::fitTo` method. With this simple method, `RooFit` will construct the negative log-likelihood function from the PDF and minimize it with respect to all of the free parameters in one step.

In [None]:
plot = hgg_mass.frame()
hgg_data.plotOn(plot, ROOT.RooFit.Binning(160))
expo.plotOn(plot)
expo.paramOn(plot)

model.fitTo(hgg_data, ROOT.RooFit.Extended())

model.plotOn(plot, ROOT.RooFit.Components("exp"), ROOT.RooFit.LineColor(ROOT.kGreen));
model.plotOn(plot, ROOT.RooFit.LineColor(ROOT.kRed));
model.paramOn(plot);

can = ROOT.TCanvas()
plot.Draw()
can.Update()
can.Draw()

What if we also fit for the mass ($M_{H}$)? We can easily do this by removing the constant setting on `MH`.

In [None]:
MH.setConstant(False)
model.fitTo(hgg_data, ROOT.RooFit.Extended())

Notice the result for the fitted MH is not 125 and is included in the list of fitted parameters. 
We can get more information about the fit, via the `RooFitResult`, using the option `Save()`.

In [None]:
fit_res = model.fitTo(hgg_data, ROOT.RooFit.Extended(), ROOT.RooFit.Save())

For example, we can get the correlation matrix from the fit result. Note that the order of the parameters is the same as in the "Floating Parameter" list printed above.

In [None]:
cormat = fit_res.correlationMatrix();
cormat.Print()
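As a reminder of what the correlation matrix encodes, each element is the covariance scaled by the two parameter uncertainties, $\rho_{ij} = V_{ij}/\sqrt{V_{ii}V_{jj}}$. A minimal sketch in plain Python, using an invented covariance matrix rather than the one from the fit, and a hypothetical helper name:

```python
import math

def correlation_from_covariance(cov):
    """rho_ij = V_ij / sqrt(V_ii * V_jj) for a square covariance matrix."""
    size = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(size)]
        for i in range(size)
    ]

# Hypothetical 2x2 covariance matrix, not taken from the fit above
cov = [[4.0, -1.0], [-1.0, 1.0]]
corr = correlation_from_covariance(cov)
print(corr)
```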

A nice feature of `RooFit` is that once we have a PDF, data and results like this, we can import this new model into our `RooWorkspace` and show off our new discovery to our LHC friends (if we weren't already too late!). We can also save the "state" of our parameters for later, by creating a snapshot of the current values. 

In [None]:
getattr(wspace, "import")(model)  
params = model.getParameters(hgg_data)
wspace.saveSnapshot("nominal_values", params)
wspace.Print("V")

This is exactly what needs to be done when you want to use shape based datacards in <span style="font-variant:small-caps;">Combine</span> with parametric models.

## A likelihood for a counting experiment
An introductory presentation about likelihoods and interval estimation is available [here](https://indico.cern.ch/event/976099/contributions/4138517/).

We have seen how to create variables and PDFs, and how to fit a PDF to data. But what if we have a counting experiment, or a histogram template shape? And what about systematic uncertainties?  We are going to build a likelihood 
for this:

$\mathcal{L} \propto p(\text{data}|\text{parameters})$

where our parameters are parameters of interest, $\mu$, and nuisance parameters, $\nu$. The nuisance parameters are constrained by external measurements, so we add constraint terms $\pi(\vec{\nu}_0|\vec{\nu})$

So we have
$\mathcal{L} \propto p(\text{data}|\mu,\vec{\nu})\cdot \pi(\vec{\nu}_0|\vec{\nu})$

Now we will try to build the likelihood by hand for a 1-bin counting experiment.
The data is the number of observed events $N$, and the probability is just a Poisson probability $p(N|\lambda) = \frac{\lambda^N e^{-\lambda}}{N!}$, where $\lambda$ is the number of events expected in our signal+background model: $\lambda = \mu\cdot s(\vec{\nu}) + b(\vec{\nu})$. 

In the expression, s and b are the numbers of expected signal and background events, which both depend on the nuisance parameters. We will start by building a simple likelihood function with one signal process and one background process. We will assume there are no nuisance parameters for now. The number of observed events in data is 15, the expected number of signal events is 5 and the expected number of background events 8.1.

It is easiest to use the `RooFit` workspace factory to build our model ([this tutorial](https://root.cern/doc/master/rf511__wsfactory__basic_8py.html) has more information on the factory syntax).

In [None]:
w = ROOT.RooWorkspace("w")

We need to create an expression for the number of events in our model, $\mu s +b$:

In [None]:
w.factory('expr::n("mu*s +b", mu[1.0,0,4], s[5], b[8.1])')

Now we can build the likelihood, which is just our Poisson PDF:

In [None]:
w.factory('Poisson::poisN(N[15],n)')

To find the best fit value for our parameter of interest $\mu$ we need to maximize the likelihood. In practice it is actually easier to minimize the **N**egative **l**og of the **l**ikelihood, or NLL:

In [None]:
w.factory('expr::NLL("-log(@0)",poisN)')

We can now use the `RooMinimizer` to find the minimum of the NLL

In [None]:
nll = w.function("NLL")
minim = ROOT.RooMinimizer(nll)
minim.setErrorLevel(0.5)
minim.minimize("Minuit2","migrad")
bestfitnll = nll.getVal()

Notice that we need to set the error level to 0.5 to get the uncertainties (relying on Wilks' theorem!) - note that there is a more reliable way of extracting the confidence interval (explicitly rather than relying on migrad). We will discuss this a bit later in this section.
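For this one-bin model the result can be cross-checked without `ROOT`. The Poisson likelihood is maximal when $\lambda = N$, so $\hat{\mu} = (N - b)/s$, and the interval corresponding to an error level of 0.5 is where the NLL rises by 0.5 above its minimum. The sketch below finds those two points by bisection. The function names are hypothetical, and the search range $[0, 4]$ simply mirrors the range given to `mu`.

```python
import math

N_OBS, S_EXP, B_EXP = 15, 5.0, 8.1  # values used in the text

def nll(mu):
    """-log Poisson(N | mu*s + b)."""
    lam = mu * S_EXP + B_EXP
    return lam - N_OBS * math.log(lam) + math.lgamma(N_OBS + 1)

def crossing(func, target, lo, hi, iters=100):
    """Bisection for func(x) == target, assuming a single crossing in [lo, hi]."""
    f_lo = func(lo) - target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid) - target
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # crossing is in the upper half
        else:
            hi = mid               # crossing is in the lower half
    return 0.5 * (lo + hi)

mu_hat = (N_OBS - B_EXP) / S_EXP     # analytic best fit
target = nll(mu_hat) + 0.5           # NLL value at the interval edges
mu_lo = crossing(nll, target, 0.0, mu_hat)
mu_hi = crossing(nll, target, mu_hat, 4.0)
print(mu_hat, mu_lo, mu_hi)
```

The interval is not symmetric around `mu_hat`, which is one reason an explicit scan can be more informative than a single symmetric uncertainty.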

Now we will add a nuisance parameter, *lumi*, which represents the luminosity uncertainty. It has a 2.5% effect on both the signal and the background. The parameter will be log-normally distributed: when it's 0, the normalization of the signal and background are not modified; at $+1\sigma$ the signal and background normalizations will be multiplied by 1.025 and at $-1\sigma$ they will be divided by 1.025.  We should modify the expression for the number of events in our model:

In [None]:
w = ROOT.RooWorkspace("w")
w.factory('expr::n("mu*s*pow(1.025,lumi)+b*pow(1.025,lumi)", mu[1.0,0,4], s[5], b[8.1], lumi[0,-4,4])')
w.factory('Poisson::poisN(N[15],n)')

**Important**: `RooFit` does not allow functions to be redefined under the same name in a workspace. That is why we created a new workspace and redefined the Poisson PDF that depends on `n`.
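To see what the `pow(1.025,lumi)` factor does, the short sketch below evaluates it at $-1\sigma$, $0$ and $+1\sigma$ in plain Python. The helper names are hypothetical; the numbers are the ones used in the text.

```python
KAPPA = 1.025  # 2.5% luminosity uncertainty from the text

def lumi_scale(lumi):
    """Multiplicative factor applied to the yields for a given nuisance value."""
    return KAPPA ** lumi

def expected_events(mu, lumi, s=5.0, b=8.1):
    """lambda = (mu*s + b) * kappa^lumi, equivalent to the factory expression."""
    return (mu * s + b) * lumi_scale(lumi)

for value in (-1.0, 0.0, 1.0):
    print(value, lumi_scale(value), expected_events(1.0, value))
```

Because the nuisance parameter enters as an exponent, the yields are always scaled by a positive factor, which is the practical appeal of the log-normal form.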

And we add a unit Gaussian constraint:

In [None]:
w.factory('Gaussian::lumiconstr(lumi,0,1)')

Our full likelihood will now be

In [None]:
w.factory('PROD::likelihood(poisN,lumiconstr)')

and the NLL

In [None]:
w.factory('expr::NLL("-log(@0)",likelihood)')

Now we will extend our model a bit. 

- Expanding on what was demonstrated above, build the likelihood for $N=15$, a signal process *s* with expectation 5 events, a background *ztt* with expectation 3.7 events and a background *tt* with expectation 4.4 events. The luminosity uncertainty applies to all three processes. The signal process is further subject to a 5% log-normally distributed uncertainty *sigth*, *tt* is subject to a 6% log-normally distributed uncertainty *ttxs*, and *ztt* is subject to a 4% log-normally distributed uncertainty *zttxs*. Find the best-fit value and the associated uncertainty.
- Also perform an explicit scan of the $\Delta$NLL (= minus the log of the profile likelihood ratio) and make a graph of the scan. Some example code can be found below to get you started. Hint: you'll need to perform fits for different values of mu, where mu is fixed. In `RooFit` you can set a variable to be constant as `var("VARNAME").setConstant(True)`.
- From the curve that you have created by performing an explicit scan, we can extract the 68% CL interval. You can do so by eye or by writing some code to find the relevant intersections of the curve. 


In [None]:
import numpy as np
import matplotlib.pyplot as plt

w = ROOT.RooWorkspace("w")
w.factory('''
    expr::n(
    "mu*s*pow(1.025,lumi)*pow(1.05,sigth)+ztt*pow(1.025,lumi)*pow(1.04,zttxs)+tt*pow(1.025,lumi)*pow(1.06,ttxs)", 
    mu[1.0,0,4], 
    s[5], 
    ztt[3.7],
    tt[4.4],
    lumi[0,-4,4],
    sigth[0,-4,4],
    zttxs[0,-4,4],
    ttxs[0,-4,4]
    )
    ''')
w.factory('Poisson::poisN(N[15],n)')
w.factory('Gaussian::lumiconstr(lumi,0,1)')
w.factory('Gaussian::sigthconstr(sigth,0,1)')
w.factory('Gaussian::zttxsconstr(zttxs,0,1)')
w.factory('Gaussian::ttxsconstr(ttxs,0,1)')
w.factory('PROD::likelihood(poisN,lumiconstr,sigthconstr,zttxsconstr,ttxsconstr)')
w.factory('expr::NLL("-log(@0)",likelihood)')

nll = w.function("NLL")
minim = ROOT.RooMinimizer(nll)
minim.setErrorLevel(0.5)
minim.setPrintLevel(-1)
minim.setVerbose(False)
minim.minimize("Minuit2","migrad")
bestfitnll = nll.getVal()

def profile_lr(profiled_nll, best_nll):
    return 2*(profiled_nll - best_nll)

xs = np.linspace(0., 3.5, 30)
ys = []
for x in xs:
    mu = w.var("mu")
    mu.setVal(x)
    mu.setConstant()
    minim.minimize("Minuit2","migrad")
    profiled_nll = nll.getVal()
    ys.append(profile_lr(profiled_nll, bestfitnll))

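fig, ax = plt.subplots()
ax.plot(xs, ys, label=r"$2\Delta$NLL scan")
ax.axhline(0.98, color="k")  # approximately the 68% CL threshold for 2*Delta(NLL)
ax.axhline(3.84, color="k")  # 95% CL threshold for 2*Delta(NLL)
ax.set_xlabel("mu")
ax.set_ylim(0, 5)
ax.legend();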
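For the last exercise, one simple way to extract the interval from a scan is to look for sign changes of (curve minus threshold) between neighbouring scan points and interpolate linearly. The sketch below does this on an invented parabola standing in for a $2\Delta$NLL scan; `find_crossings` is a hypothetical helper and the threshold of 1.0 is the approximate 68% CL level for $2\Delta$NLL.

```python
def find_crossings(xs, ys, level):
    """Return x values where the piecewise-linear curve through (xs, ys) crosses level."""
    crossings = []
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if (y0 - level) * (y1 - level) < 0:  # sign change between neighbours
            frac = (level - y0) / (y1 - y0)
            crossings.append(x0 + frac * (x1 - x0))
    return crossings

# Hypothetical parabola standing in for a 2*Delta(NLL) scan
scan_x = [0.1 * i for i in range(31)]
scan_y = [((x - 1.43) / 0.8) ** 2 for x in scan_x]
lower, upper = find_crossings(scan_x, scan_y, 1.0)
print(lower, upper)
```

The same function can be applied to the `xs` and `ys` from the scan above. A finer scan reduces the interpolation error.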