LailinXu/hepstat-tutorial

Tutorials for HEP data analysis

References

Statistics and data analysis

ROOT and RooFit

Schools

  • INFN School Of Statistics, 2019
  • IN2P3 School Of Statistics, 2021

Preparation of the tutorials

You need ROOT (v6.22 or newer) and Python installed for the tutorials here. If they are not available to you, either on your local computer or on a Linux server, follow the instructions here to install the ROOT Docker container.

The Jupyter setup is optional; if you are interested, you can find instructions here to set it up, or use the Docker setup from the instructions above.

Tutorials

Parameter fitting in ROOT/RooFit

Hands-on 1: Basic fitting

Fit example with ROOT, with the following objectives:

  • Fit a histogram with a linear chi-square fit, and compare the results with by-hand calculations
  • Try different fitting options
  • Compare chi-square and likelihood fits
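The by-hand comparison in the first objective can be sketched in plain Python (the bin contents below are made-up numbers, not the tutorial's histogram): for a straight-line model y = a + b*x, the chi-square is quadratic in (a, b), so the minimum has a closed form from the normal equations.

```python
import math

# Made-up histogram: bin centers x, contents y, Poisson errors sigma = sqrt(y).
x = [0.5, 1.5, 2.5, 3.5, 4.5]
y = [12.0, 19.0, 31.0, 38.0, 52.0]
w = [1.0 / v for v in y]  # weights 1/sigma^2 with sigma^2 = y

# Minimize chi2(a, b) = sum_i w_i * (y_i - a - b*x_i)^2.
# Setting d(chi2)/da = d(chi2)/db = 0 gives a 2x2 linear system.
S   = sum(w)
Sx  = sum(wi * xi for wi, xi in zip(w, x))
Sy  = sum(wi * yi for wi, yi in zip(w, y))
Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

det = S * Sxx - Sx * Sx
a = (Sxx * Sy - Sx * Sxy) / det  # intercept
b = (S * Sxy - Sx * Sy) / det    # slope

chi2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
print(a, b, chi2)
```

The same numbers can then be cross-checked against ROOT's `h.Fit("pol1")` output.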

Fit examples with RooFit: a composite p.d.f. with signal and background components

pdf = f_bkg * bkg(x,a0,a1) + (1-f_bkg) * (f_sig1 * sig1(x,m,s1) + (1-f_sig1) * sig2(x,m,s2))

with the following objectives:

  • Construct a simple fit in RooFit and plot the NLL
  • Compare binned and unbinned fit results
  • Compare un-extended and extended likelihood fits
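The NLL scan in the first objective can be illustrated without RooFit (the dataset below is a toy, not the tutorial's): for an unbinned Gaussian likelihood with fixed width, the NLL is a parabola in the mean, minimized at the sample mean.

```python
import math

# Toy unbinned dataset (made-up values) and a fixed Gaussian width.
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4]
sigma = 0.5

def nll(mu):
    """Unbinned negative log-likelihood for a Gaussian with mean mu."""
    return sum(0.5 * ((xi - mu) / sigma) ** 2
               + math.log(sigma * math.sqrt(2.0 * math.pi)) for xi in data)

# Scan the NLL in mu, as RooFit would when plotting the NLL curve.
scan = [(mu / 100.0, nll(mu / 100.0)) for mu in range(400, 601)]
best_mu, best_nll = min(scan, key=lambda p: p[1])
print(best_mu)  # the minimum sits at the sample mean
```

RooFit automates exactly this: `createNLL` builds the function and `plotOn` draws the scan.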

A more advanced fit example with RooFit: a composite p.d.f. with signal and background components, extended

pdf = n_bkg * bkg(x,a0,a1) + n_sig * (f_sig1 * sig1(x,m,s1) + (1-f_sig1) * sig2(x,m,s2))

or using a signal strength

pdf = n_bkg * bkg(x,a0,a1) + mu * n_sig * (f_sig1 * sig1(x,m,s1) + (1-f_sig1) * sig2(x,m,s2))

with the following objectives:

  • Compare plain likelihood fit and profile likelihood fit
  • Fit with nuisance parameters with constraints
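The difference between a plain and a profile likelihood can be sketched with a one-bin counting experiment (all numbers below are invented): n ~ Poisson(mu*s + b), with the background b constrained by an auxiliary Gaussian measurement. The profile likelihood minimizes over b at each value of mu.

```python
import math

n_obs, s = 15, 5.0        # observed events, expected signal for mu = 1
b0, sigma_b = 10.0, 2.0   # background estimate and its uncertainty

def nll(mu, b):
    """Poisson term (up to a constant) plus a Gaussian constraint on b."""
    nu = mu * s + b
    return nu - n_obs * math.log(nu) + 0.5 * ((b - b0) / sigma_b) ** 2

def profile_nll(mu):
    """Profile out the nuisance parameter b by a simple grid minimization."""
    return min(nll(mu, b / 100.0) for b in range(500, 1501))

# Scan mu; the minimum sits at mu-hat = (n_obs - b0) / s = 1 here,
# because there the Poisson term and the constraint are both optimal.
scan = [(mu / 100.0, profile_nll(mu / 100.0)) for mu in range(0, 301)]
mu_hat, nll_min = min(scan, key=lambda p: p[1])
print(mu_hat)
```

In RooFit the same construction is a `RooProdPdf` of the model and the constraint term, profiled with `createProfile`.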

Homework

Fit the Higgs peak in the ATLAS H4l open "data": http://opendata.cern.ch/record/3823, MC: gg->H->ZZ->4l with mH = 125 GeV, from the 2016 ATLAS open data release. The (Monte Carlo) data is a TTree with the lepton four-vector information available. Reconstruct the invariant mass of the four-lepton final state. Example code to process the TTree can be found here.

Tips and requirements:

  • Construct a S+B model: S: signal, Gaussian, B: background, polynomial
  • Restrict to the mass range of 110 GeV to 160 GeV
  • Step 1: Fit the mass peak, compare binned and unbinned fit results, using 20 and 500 events, respectively (four fits in total)
  • Step 2: Fix the mass peak and fit the signal and background yields

When the Higgs boson was discovered in 2012, there were about 20 events in the H4l channel within the mass range of 110 GeV to 160 GeV, see Fig.2 of the ATLAS Higgs discovery paper. With LHC Run-2 data, there are about 500 events in this mass range, see Fig. 5 of https://arxiv.org/abs/2004.03447.
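Reconstructing the four-lepton invariant mass needs only the summed four-momenta. A minimal sketch in plain Python (the lepton kinematics below are invented, not from the open-data TTree):

```python
import math

def invariant_mass(leptons):
    """leptons: list of (E, px, py, pz) four-vectors in GeV."""
    E  = sum(l[0] for l in leptons)
    px = sum(l[1] for l in leptons)
    py = sum(l[2] for l in leptons)
    pz = sum(l[3] for l in leptons)
    return math.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

# Toy example: four massless leptons whose momenta cancel pairwise,
# so the invariant mass equals the total energy, 125 GeV.
leptons = [
    (40.0,  40.0,   0.0, 0.0),
    (40.0, -40.0,   0.0, 0.0),
    (22.5,   0.0,  22.5, 0.0),
    (22.5,   0.0, -22.5, 0.0),
]
m4l = invariant_mass(leptons)
print(m4l)
```

In the real TTree the leptons are stored as (pT, η, ϕ, m), so convert first (px = pT cos ϕ, py = pT sin ϕ, pz = pT sinh η), or use ROOT's TLorentzVector/`SetPtEtaPhiM` to do it for you.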

Hypothesis test, Confidence intervals and Exclusion limits

Create a workspace using HistFactory.

RooWorkspace is a persistable container for RooFit projects. It can contain and own variables, p.d.f.s, functions and datasets. The entire RooWorkspace can be saved into a ROOT TFile and organises the consistent streaming of its contents without duplication.

HistFactory is a package that creates a RooFit probability density function from ROOT histograms of expected distributions and histograms that represent the +/- 1 sigma variations from systematic effects. The resulting probability density function, or a RooWorkspace, can then be used with any of the statistical tools provided within RooStats, such as the profile likelihood ratio, Feldman-Cousins, etc.

In this tutorial, the model is basically the same as in hepstat_tutorial_roofit_extended.py, i.e., a composite p.d.f. with signal and background components

pdf = n_bkg * bkg(x,a0,a1) + mu * n_sig * (f_sig1 * sig1(x,m,s1) + (1-f_sig1) * sig2(x,m,s2))

and our goals are the following:

  • Create a workspace using Workspace Factory
  • Example operations at the workspace level
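For a model of this shape, the workspace factory strings look roughly like the following (the variable ranges and object names here are illustrative guesses, not the tutorial's exact strings); each line is passed to `RooWorkspace::factory()`:

```
x[110, 160]
Gaussian::sig1(x, m[125, 120, 130], s1[2, 0.5, 5])
Gaussian::sig2(x, m, s2[4, 1, 10])
SUM::sig(f_sig1[0.8, 0, 1] * sig1, sig2)
Chebychev::bkg(x, {a0[0.5, -1, 1], a1[0.2, -1, 1]})
SUM::model(n_bkg[500, 0, 10000] * bkg, n_sig[100, 0, 1000] * sig)
```

The `SUM::` lines build the composite p.d.f.; giving `SUM` coefficients in yield form (n_bkg, n_sig) makes the model extended.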

In the above example (the HistFactory example), the workspace is built from parametrized functions. In practice, non-parametrized PDFs, for example taken from Monte Carlo simulations, are used more often. In this example, we build a workspace using histograms, and we also show how to include systematic uncertainties in the likelihood model. Our objectives are:

  • Create a workspace using histograms
  • Include systematic uncertainties

RooStats example: compute the p0 and significance (hypothesis test)

The signal is a simple Gaussian and the background is a smoothly falling spectrum. To estimate the significance, we need to perform a hypothesis test: we want to disprove the null model, i.e. the background-only model, against the alternate model, the background plus the signal. In RooStats, we do this by defining two ModelConfig objects, one for the null model (the background-only model in this case) and one for the alternate model (the signal plus background).

Objectives of this tutorial are the following:

  • Compute the null-hypothesis significance using the asymptotic calculator
  • Compute the significance by hand using the asymptotic formula
  • Compute the significance using the frequentist method

Hands-on 7: CLs upper limits

Use the StandardHypoTestInvDemo tutorial macro to perform an inverted hypothesis test for computing an interval (one-sided upper limits). This macro performs a scan of the p-values to compute the upper limit. Both the asymptotic and frequentist methods will be shown.

Objectives of this tutorial are the following:

  • Create the HypoTestInverter class and configure it
  • Compute the CLs upper limits using the asymptotic formula
  • Compute the CLs upper limits using the frequentist method (time consuming)
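The CLs construction itself is simple to state: CLs = CLs+b / CLb, and the signal hypothesis is excluded at 95% CL when CLs < 0.05. A toy sketch in the Gaussian approximation of a counting experiment (all numbers invented):

```python
import math

def gauss_cdf(x, mean, sigma):
    """P(X <= x) for X ~ N(mean, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

# Toy counting experiment: observed yield, background-only and
# signal-plus-background expectations, Gaussian approximation.
n_obs, b, s = 102.0, 100.0, 30.0
cl_sb = gauss_cdf(n_obs, b + s, math.sqrt(b + s))  # p-value under s+b
cl_b  = gauss_cdf(n_obs, b, math.sqrt(b))          # p-value under b only

cls = cl_sb / cl_b
print(cls, cls < 0.05)  # this signal size is excluded at 95% CL
```

Dividing by CLb is what protects against excluding signals to which the experiment has no sensitivity, the point of the CLs criterion; the HypoTestInverter repeats this for a scan of signal strengths to find the crossing at CLs = 0.05.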

Homework (Optional)

Use the workspaces created from Build a workspace using histograms:

  • Plot the p0 scan as a function of the signal mass
  • Plot the CLs upper limits as a function of the signal mass

Machine learning: TMVA

Consider a typical physics problem at the LHC: searching for a heavy resonance decaying to a pair of top quarks, with both top quarks decaying semileptonically, i.e., pp > Z' > tt, t > Wb, W > lv. The signal is expected to appear as a resonance in the invariant mass of the top-quark pair. The dominant background is SM tt production, which is expected to fall smoothly in the high-mass tail. This would be an easy task if one could reconstruct the invariant mass of the top-quark pair. But how? The final state is llbb plus missing energy due to the neutrinos, so it is not possible to fully reconstruct the invariant mass of the top-quark pair (mtt). Moreover, without such a powerful discriminant as mtt, it can be challenging to separate the signal from the dominant background.

This tutorial aims to solve two problems:

  • Use machine-learning-based regression to reconstruct, or infer, the mass mtt from the kinematic information of the experimental observables.
  • Use machine-learning-based classification to separate the signal from the background.

The following experimental observables are considered:

  • 4-momenta (pT , η, ϕ, m) of the 2 leptons (4+4)
  • 4-momenta (pT, η, ϕ, m) of the jets, at least 2 and up to 3 jets (4+4+4)
  • missing ET and its ϕ angle (2)

Hands-on: Regression with BDT

Hands-on: Classification with BDT

Machine learning: Artificial Neural Networks with PyTorch

Before you start this tutorial, follow the instructions here to set up the PyTorch docker environment.

This ANN-based machine learning tutorial aims to solve the same problem as the TMVA tutorial above. It is interesting to compare the performance of the BDT against the DNN.

Hands-on: Regression with ANN

Hands-on: Classification with ANN
