# An introductory notebook to HEP analysis in Python
In this notebook, you'll explore computing techniques commonly used in High Energy Physics (HEP) analysis. We'll guide you through creating, filling, and plotting a histogram to visualize physics data, such as the number of leptons, all in under 20 lines of code!

<CENTER>
    <a href="http://opendata.atlas.cern" class="icons"><img src="../../images/ATLASOD.gif" style="width:40%"></a>
</CENTER>

This tutorial also serves as an introduction to ROOT, a scientific data analysis framework. ROOT offers a comprehensive set of tools for big data processing, statistical analysis, visualization, and storage, which makes it well suited to modern HEP research.

The following analysis looks at events where [Z bosons](https://en.wikipedia.org/wiki/W_and_Z_bosons) decay to two leptons of the same flavour and opposite charge (e.g., Z → e$^+$e$^-$ or Z → μ$^+$μ$^-$), as shown in the [Feynman diagram](https://en.wikipedia.org/wiki/Feynman_diagram) below.

<CENTER><img src="../../images/Z_ElectronPositron.png" style="width:30%"></CENTER>

## What is the Z Boson?
The Z boson is one of the mediators of the weak force, which is responsible for processes such as [beta decay](https://en.wikipedia.org/wiki/Beta_decay) in atomic nuclei. It interacts with all known fermions (quarks and leptons), but unlike the W boson, it does not change the type (flavor) of particle it interacts with. The Z boson couples to both [left-handed and right-handed](https://en.wikipedia.org/wiki/Chirality_(physics)) particles, making its behavior distinct from the charged W boson.

Since the Z boson is electrically neutral, its decay products must have balanced charges. The decays of the Z boson into leptons (electrons, muons, and taus) are particularly useful for experimental studies because these particles can be precisely measured in detectors, giving a clear signature of the Z boson's presence.
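Because the Z boson is neutral and does not change flavour, its leptonic decay products form a pair with the same flavour and opposite charge. The plain-Python sketch below checks that condition for every pair of leptons in an event. The function `sfos_pairs` and its inputs are hypothetical and chosen for illustration; it assumes flavour codes 11 (electron) and 13 (muon) and charges of ±1. In ATLAS Open Data the per-lepton information is stored in tree branches whose exact names should be checked in the dataset documentation.

```python
# Minimal sketch of a same-flavour, opposite-charge (SFOS) lepton-pair selection.
# Assumption: each lepton is described by a flavour code (11 = electron,
# 13 = muon, following the PDG numbering) and an electric charge of +1 or -1.
from itertools import combinations


def sfos_pairs(flavours, charges):
    """Return the index pairs (i, j) of leptons with equal flavour and opposite charge."""
    pairs = []
    for i, j in combinations(range(len(flavours)), 2):
        same_flavour = flavours[i] == flavours[j]
        opposite_charge = charges[i] + charges[j] == 0
        if same_flavour and opposite_charge:
            pairs.append((i, j))
    return pairs


# Hypothetical event with an e+, an e- and a mu+: only the e+e- pair qualifies.
print(sfos_pairs([11, 11, 13], [+1, -1, +1]))  # [(0, 1)]
```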

## The Decay of the Z Boson
The Z boson decays rapidly due to its high mass, with a mean lifetime of around 3 × 10$^{-25}$ seconds. It can decay into quark-antiquark pairs, which produce hadrons, or into lepton pairs; in this analysis we are particularly interested in the lepton channels because they produce clean final states that are easier to measure.
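The quoted lifetime can be checked with the relation $\tau = \hbar / \Gamma$ between a particle's mean lifetime and its total decay width. The short sketch below plugs in approximate values that are supplied here rather than taken from this notebook: $\hbar \approx 6.582 \times 10^{-25}$ GeV·s and a measured Z width of about 2.495 GeV.

```python
# Sketch: estimate the Z boson's mean lifetime from its total decay width,
# using tau = hbar / Gamma.
HBAR_GEV_S = 6.582119569e-25  # reduced Planck constant, in GeV * s
GAMMA_Z_GEV = 2.4952          # total Z decay width, in GeV (approximate measured value)

tau_z = HBAR_GEV_S / GAMMA_Z_GEV
print(f"Z mean lifetime ~ {tau_z:.2e} s")  # Z mean lifetime ~ 2.64e-25 s
```

A width of a few GeV therefore corresponds to a lifetime far too short for the Z boson to travel any measurable distance before decaying, which is why only its decay products are observed in the detector.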

## Running a Jupyter notebook
A Jupyter notebook consists of cells, each containing lines of Python code or formatted text. Each cell can be run independently, and its output appears directly below it. Conventionally, cells are run in order from top to bottom.
- To run the whole notebook, in the top menu click Cell $\to$ Run All.
- To propagate a change you've made to a piece of code, click Cell $\to$ Run All Below.
- You can also run a single code cell, by clicking Cell $\to$ Run Cells, or using the keyboard shortcut Shift+Enter.
- For more information, refer to [How To Use Jupyter Notebooks](https://www.codecademy.com/article/how-to-use-jupyter-notebooks).

By the end of this notebook you will:
1. Learn to process large data sets using cuts
2. Understand some general principles of a particle physics analysis
3. Discover the Z boson!

## Initializing the notebook
To begin, we need to **import the `ROOT` library**, which will support our analysis:

In [1]:
import ROOT

Welcome to JupyROOT 6.28/04


To enable **interactive visualization** of the histogram we'll create later, we use the `%jsroot on` magic command. This activates JSROOT, a JavaScript-based ROOT viewer, so that you can interact with plots directly within the notebook, for example by zooming in, rotating 3D plots, or hovering over specific parts of a plot.

In [2]:
%jsroot on

## Making a histogram
We begin by fetching the URL of the data file we wish to analyze. For this we are using the [CERN Open Data client](https://cernopendata-client.readthedocs.io/en/latest/).

In [3]:
%pip install cernopendata-client fsspec-xrootd

Note: you may need to restart the kernel to use updated packages.


As explained in the ["How to Use the Data" section](https://opendata.atlas.cern/docs/data/for_education/8TeV_whattodo), we will query the client for the URL using the DOI of the record we want to use: https://opendata.cern.ch/record/3802

In [4]:
!cernopendata-client get-file-locations --doi 10.7483/OPENDATA.ATLAS.7X9L.ZZ8H --protocol xrootd

http://opendata.cern.ch//eos/opendata/atlas/OutreachDatasets/2016-07-29/MC/mc_105987.WZ.root


The data is stored in a ***.root*** file. Such files can hold trees (`TTree` objects), which organise the data into branches and leaves. In this example, we read the file directly from a remote source, using the XRootD (`root://`) form of the URL retrieved in the previous cell.

In [5]:
f = ROOT.TFile.Open("root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2016-07-29/MC/mc_105987.WZ.root")  # 8 TeV sample
# f = ROOT.TFile.Open("/home/student/datasets/MC/mc_105987.WZ.root")  # local file example
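The listing printed by `cernopendata-client` shows an `http://` URL, whereas the cell above opens a `root://` (XRootD) URL pointing at the same `/eos/opendata/...` path. As a sketch, the hypothetical helper `to_xrootd` below performs that rewrite. It is not part of ROOT or `cernopendata-client`, and it assumes the path is identical on both endpoints, as it is for the two URLs shown in this notebook.

```python
# Hypothetical helper (not part of ROOT or cernopendata-client): rewrite an
# http(s) Open Data URL into the root:// (XRootD) form used with ROOT.TFile.Open.
# Assumption: the /eos/opendata/... path is identical on both endpoints.
from urllib.parse import urlparse

XROOTD_SERVER = "root://eospublic.cern.ch"


def to_xrootd(url: str) -> str:
    path = urlparse(url).path            # e.g. "//eos/opendata/..." for the URL below
    return XROOTD_SERVER + "//" + path.lstrip("/")


http_url = ("http://opendata.cern.ch//eos/opendata/atlas/OutreachDatasets/"
            "2016-07-29/MC/mc_105987.WZ.root")
print(to_xrootd(http_url))
```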

After opening the file, we create a canvas on which we can draw a histogram. Its name is _Canvas_ and its title is _A way to plot a variable_. The two following arguments set the width and height of the canvas in pixels.

In [6]:
canvas = ROOT.TCanvas("Canvas","A way to plot a variable",800,600)

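Now we define a histogram that will later be drawn on this canvas. Its name is _variable_, and its title string sets the histogram title (_Number of leptons_) followed by the x- and y-axis labels (_Number of leptons_ and _Events_), separated by semicolons. The three following arguments define 5 bins spanning -0.5 to 4.5, so that each bin is one unit wide and centred on an integer number of leptons (0, 1, 2, 3 or 4).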

In [9]:
hist = ROOT.TH1F("variable", "Number of leptons;Number of leptons;Events", 5, -0.5, 4.5)
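The histogram above uses 5 bins from -0.5 to 4.5, which centres each bin on an integer. The plain-Python sketch below mimics how a fixed-width histogram assigns a value to a bin, following ROOT's numbering convention (bin 0 is underflow, bins 1 to 5 are the regular bins, bin 6 is overflow). The functions `find_bin` and `bin_center` are hypothetical stand-ins written for illustration, not ROOT's implementation.

```python
# Plain-Python sketch of fixed-width binning, mimicking ROOT's bin numbering:
# bin 0 = underflow, bins 1..nbins = regular bins, bin nbins + 1 = overflow.
# find_bin and bin_center are hypothetical helpers, not ROOT functions.


def find_bin(x, nbins=5, xmin=-0.5, xmax=4.5):
    if x < xmin:
        return 0                              # underflow
    if x >= xmax:
        return nbins + 1                      # overflow (upper edge is exclusive)
    width = (xmax - xmin) / nbins
    # min() guards against floating-point rounding just below the upper edge
    return 1 + min(nbins - 1, int((x - xmin) / width))


def bin_center(i, nbins=5, xmin=-0.5, xmax=4.5):
    width = (xmax - xmin) / nbins
    return xmin + (i - 0.5) * width


print([find_bin(n) for n in range(6)])       # [1, 2, 3, 4, 5, 6]
print([bin_center(i) for i in range(1, 6)])  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Had the range been 0 to 4 with 4 bins instead, a value of exactly 4 would land in the overflow bin, because the upper edge of a histogram is exclusive.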

Next, we retrieve the tree called `mini`, which holds the data, from the ***.root*** file and store it in a variable named ***tree***.

In [7]:
tree = f.Get("mini")

To analyze the dataset, we need to extract specific variables. In this case, we will plot the number of leptons: we loop over the events in the tree and fill the histogram `hist` that we already defined with the value of the `lep_n` branch for each event.

In [10]:
for event in tree:
    hist.Fill(event.lep_n)  # lep_n: number of leptons in this event

print("Done!")

Done!
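One of this notebook's goals is to process large data sets using cuts. A cut is a condition that an event must pass before it is filled into the histogram. The sketch below shows the idea in plain Python on a few made-up event records (the dictionaries are hypothetical stand-ins for entries of the `mini` tree). Since a Z → ℓℓ candidate needs at least two leptons, the example keeps only events with `lep_n >= 2`. In the PyROOT loop above, the equivalent would be to guard the fill with a condition such as `if event.lep_n >= 2: hist.Fill(event.lep_n)`.

```python
# Sketch: an event loop that counts events per lepton multiplicity,
# applying a selection cut. The event records are made up for illustration.
from collections import Counter

events = [{"lep_n": 1}, {"lep_n": 2}, {"lep_n": 1}, {"lep_n": 3}, {"lep_n": 2}]

counts = Counter()
for event in events:
    if event["lep_n"] >= 2:        # cut: keep events with at least two leptons
        counts[event["lep_n"]] += 1

print(dict(counts))  # {2: 2, 3: 1}
```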


<div class="alert alert-info">
    <b>‼️ NOTE:</b>
    To learn more about the contents of the ATLAS Open Data datasets, please visit our <a href="https://opendata.atlas.cern/docs/data/for_education/8TeV_details" target="_blank">documentation</a>.
</div>

After filling the histogram, we want to see the results. First we draw the histogram onto the canvas, then we draw the canvas itself so that it is displayed in the notebook.

In [11]:
hist.SetFillColor(2)
hist.Draw()

In [12]:
canvas.Draw()

The next cell rescales the histogram so that its bin contents sum to 1. Each bin then holds the fraction of events with that number of leptons, so every value lies between 0 and 1 and the proportions can be read directly from the plot.
**This is called normalisation.**

In [13]:
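integral = hist.Integral()  # sum of the bin contents (underflow/overflow excluded)
hist.Scale(1 / integral)    # divide every bin by the integral so the bins sum to 1
hist.SetFillColor(2)        # colour index 2 is red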
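Normalisation divides every bin by the histogram's integral (the sum of its bin contents), so the bins add up to 1. The plain-Python sketch below repeats that arithmetic on made-up bin contents, not on the real data. Note that ROOT's `Integral()` sums only the regular bins by default, excluding underflow and overflow.

```python
# Plain-Python sketch of unit-area normalisation (toy bin contents, not real data).
counts = [5.0, 60.0, 30.0, 5.0, 0.0]        # hypothetical events per bin, lep_n = 0..4

integral = sum(counts)                       # plays the role of hist.Integral()
fractions = [c / integral for c in counts]   # plays the role of hist.Scale(1 / integral)

print(fractions)                             # [0.05, 0.6, 0.3, 0.05, 0.0]
print(abs(sum(fractions) - 1.0) < 1e-12)     # True
```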

In [14]:
hist.Draw("hist")  # "hist" option: draw as a plain histogram, without error bars
canvas.Draw()

### Interpreting the histogram
In the plot above, we visualize the distribution of the number of leptons per event. This histogram provides insight into the frequency of events containing different numbers of leptons.
- **X-axis**: Represents the number of leptons detected in each event. The values range from 0 to 4, where each bin corresponds to an integer number of leptons.
- **Y-axis**: Shows how often each number of leptons occurs: the raw number of events in the first plot (with the axis scaled to thousands), and the fraction of all events in the normalised plot.

From the data:
- Most events contain exactly 1 lepton, making this the most common lepton multiplicity in the analysed dataset.
- Events with 0, 2, or 3 leptons are significantly less frequent, as indicated by the lower heights of their corresponding bins.

In the statistics box on the top right:

- We see that the total number of events analyzed is 50,000.
- The mean number of leptons per event is approximately 1.322, meaning that an event contains slightly more than one lepton on average.
- The standard deviation of around 0.56 shows a relatively low spread, meaning that most events are clustered around the mean number of leptons, with little variation.
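ROOT's statistics box reports the number of entries, the mean and the standard deviation. As a sketch, the same quantities can be computed from bin centres and bin contents as below, using made-up counts rather than the real data. By default ROOT computes these statistics from the values recorded at fill time rather than from bin centres, but for integer-valued data in bins centred on integers, as here, the two approaches agree.

```python
# Sketch: entries, mean and standard deviation of a histogram computed from
# bin centres and bin contents (toy numbers for illustration only).
import math

centres = [0, 1, 2, 3, 4]    # lep_n value at the centre of each bin
counts = [5, 60, 30, 5, 0]   # hypothetical events per bin

n = sum(counts)
mean = sum(w * x for w, x in zip(counts, centres)) / n
variance = sum(w * x * x for w, x in zip(counts, centres)) / n - mean ** 2
std_dev = math.sqrt(variance)

print(f"entries={n}, mean={mean:.3f}, std dev={std_dev:.3f}")
# entries=100, mean=1.350, std dev=0.654
```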

This histogram gives us a snapshot of the lepton content in the events, which can be further analyzed to study processes like lepton production in proton-proton collisions at high energies. The distribution is an important aspect of understanding the data and may inform further cuts or selection criteria for a complex physics analysis.