# Discovering the Higgs Boson Through Its Decay into Two Photons (H → γγ)

In this notebook we are going to analyse ATLAS Open Data to find the Higgs boson in its decay into two photons. Let's get started!

<CENTER><img src="../../images/ATLASOD.gif" style="width:40%"></CENTER>

## What is the Higgs boson?
The Higgs boson is a fundamental particle predicted by the Standard Model. It is a manifestation of the Higgs field, which gives mass to fundamental particles. However, it is incredibly difficult to produce. At the LHC, a Higgs particle is produced about once every 10 billion collisions, making it very challenging to detect.

Despite this tiny fraction, years of data collection led to the discovery of the Higgs boson in 2012 by the CMS and ATLAS experiments at CERN. In this tutorial, we will follow their example.

## Higgs boson production
The Higgs boson can be generated through various mechanisms. In particle physics, we use Feynman diagrams to illustrate these production modes. These diagrams help us visualize particle interactions and serve as essential tools for computations. For additional details on Feynman diagrams, see this [link](https://cds.cern.ch/record/2759490/files/Feynman%20Diagrams%20-%20ATLAS%20Cheat%20Sheet.pdf).

There are four primary production modes for the Higgs boson, each represented by its own Feynman diagram:

<div style="display: flex; flex-wrap: wrap; justify-content: center;">
  <div style="text-align: center; width: 40%; margin: 10px;">
    <img src="../uproot_python/images/ImagesHiggs/ggH.png" style="width: 100%;">
    <p>a) Gluon-gluon fusion</p>
  </div>

  <div style="text-align: center; width: 35%; margin: 10px;">
    <img src="../uproot_python/images/ImagesHiggs/VBFH.png" style="width: 100%;">
    <p>b) Vector boson fusion</p>
  </div>

  <div style="text-align: center; width: 40%; margin: 10px;">
    <img src="../uproot_python/images/ImagesHiggs/WH.png" style="width: 100%;">
    <p>c) Vector boson bremsstrahlung</p>
  </div>

  <div style="text-align: center; width: 35%; margin: 10px;">
    <img src="../uproot_python/images/ImagesHiggs/ttbarfusion.png" style="width: 100%;">
    <p>d) Top-antitop fusion</p>
  </div>
</div>

## The decay into two photons

The Higgs boson has an extremely brief lifetime, approximately $10^{-22} \,\text{s}$. It decays almost immediately after it is produced, making direct detection of the particle impossible. However, by using the Standard Model, we can predict the various decay products of the Higgs, such as photons, Z bosons, quarks, and others, each occurring with a different probability. These **decay channels** help us identify the presence of the Higgs boson. In this notebook, we will focus on analyzing one specific decay channel: $H \rightarrow \gamma\gamma$.

<CENTER><img src="../uproot_python/images/feynman_diagrams/Hyy_feynman.png" style="width:40%"></CENTER>

We refer to this as our desired **signal**. Ideally, we aim to identify collisions that produce two photons, which would indicate the presence of a Higgs boson. However, along with our signal, many photons detected do not originate from Higgs boson decay but rather from other processes, forming the **background**.

Backgrounds are classified into two categories: reducible and irreducible. **Reducible backgrounds** can be significantly minimized using experimental techniques such as data cuts, particle identification, and isolation criteria. For instance, in our case, a reducible background might involve events where a jet is misidentified as a photon. By applying stricter criteria to ensure that the detected particles are indeed photons (and not misidentified jets), this background can be reduced.

On the other hand, irreducible backgrounds cannot be easily distinguished from the signal because they involve the same final states or processes that the signal would produce. In the scenario of Higgs decay into two photons, an **irreducible background** would be the direct production of two photons from other Standard Model processes, such as quark-antiquark annihilation. These events are fundamentally indistinguishable from the signal events based on final state particles alone.

    
To address this, we use the invariant mass of the two photons. By conservation of energy and momentum, the invariant mass of the decay products of a Higgs boson equals the Higgs mass (up to the detector resolution), whereas photon pairs from background processes are spread smoothly over a broad range of invariant masses. The final step is to plot the invariant mass of each event and look for a peak around 125 GeV, which corresponds to the mass of the Higgs boson.

Check our [cheat sheet](https://cds.cern.ch/record/2800577/files/Signal%20and%20Background%20Physics%20Cheat%20Sheet.pdf) for more information on signals and backgrounds!

<div class="alert alert-info">
  <b>‼️ NOTE:</b>
  This analysis loosely follows the <a href="https://www.sciencedirect.com/science/article/pii/S037026931200857X" target="_blank">discovery of the Higgs boson by ATLAS</a> (Section 5).
</div>

## Running a Jupyter notebook
A Jupyter notebook consists of cells, each containing lines of Python code. Cells can be run independently of one another, and each cell's output appears directly below it. Conventionally, cells are run in order from top to bottom.

- To run the whole notebook, in the top menu click Cell $\to$ Run All.
- To propagate a change you've made to a piece of code, click Cell $\to$ Run All Below.
- You can also run a single code cell, by clicking Cell $\to$ Run Cells, or using the keyboard shortcut Shift+Enter.

For more information, refer to [How To Use Jupyter Notebooks](https://www.codecademy.com/article/how-to-use-jupyter-notebooks).

By the end of this notebook you will be able to:
1. Process large data sets using cuts
2. Understand some general principles of a particle physics analysis
3. Discover the Higgs boson!

## The Analysis

### Setup
We're going to be using several key tools to perform the analysis:
* **ROOT**: This is a powerful data analysis framework widely used in High Energy Physics (HEP).
* **os**: This module will allow us to work with file paths.

In [1]:
import ROOT
import os

Welcome to JupyROOT 6.28/04


We enable multi-threading within ROOT to improve the performance of data processing. This will allow parallel execution of tasks where possible.

In [2]:
# Enable multi-threading
ROOT.ROOT.EnableImplicitMT()

### Loading data
We define functions to load our data samples. The `get_data_samples` function loads real data, while `get_ggH125_samples` loads simulated events of the Higgs boson decaying into two photons (denoted ggH $\rightarrow$ γγ). These samples are stored remotely and fetched via URLs.


In [3]:
path = "https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/"

def get_data_samples():
    samples = ROOT.std.vector("string")()
    for tag in ["A", "B", "C", "D"]:
        samples.push_back(os.path.join(path, "GamGam/Data/data_{}.GamGam.root".format(tag)))
    return samples

def get_ggH125_samples():
    samples = ROOT.std.vector("string")()
    samples.push_back(os.path.join(path, "GamGam/MC/mc_343981.ggH125_gamgam.GamGam.root"))
    return samples

### Creating DataFrames
We now create `RDataFrame` objects for the data and the Higgs signal samples. Each `RDataFrame` will handle the data analysis, and we'll store these in a dictionary for easier access later.

In [4]:
df = {}
df["data"] = ROOT.RDataFrame("mini", get_data_samples())
df["ggH"] = ROOT.RDataFrame("mini", get_ggH125_samples())
processes = list(df.keys())

### Applying Weights
We apply event weights to both the simulated and real datasets. For the Higgs samples (Monte Carlo simulations), we use scale factors to account for detector effects and event simulation details. For real data, we assign a weight of 1.

In [5]:
# Apply scale factors and MC weight for simulated events and a weight of 1 for the data
for p in ["ggH"]:
    df[p] = df[p].Define("weight", "scaleFactor_PHOTON * scaleFactor_PhotonTRIGGER * scaleFactor_PILEUP * mcWeight")
df["data"] = df["data"].Define("weight", "1.0")

### Preselection and Photon Selection
Next, we apply some basic event selection:
- **Photon trigger filter**: We only keep events that passed the photon trigger.
- **Good photon selection**: We select events with exactly two "tight" photons that pass certain kinematic cuts, because jets and other particles can mimic photon signals and be misidentified as photons. We also exclude a region of the detector, because the transition between the barrel and end-cap of the calorimeter can introduce uncertainties in the energy measurements of particles.
- **Isolation criteria**: Finally, we apply isolation cuts to ensure that the photons are truly isolated from other particles, to make sure the photons detected are not originating from jets.
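
The selection logic in the list above can be sketched in plain Python on a single toy event. This is only an illustration, not the code the notebook runs: the dictionary keys and the example photons are hypothetical, while the thresholds (25 GeV, $|\eta| < 2.37$, the 1.37 to 1.52 transition region, and the 0.065 isolation ratio) are copied from the next cell.

```python
def is_good_photon(photon):
    """Tight ID, pt > 25 GeV (pt stored in MeV), |eta| < 2.37, outside the 1.37-1.52 region."""
    abs_eta = abs(photon["eta"])
    outside_transition = abs_eta < 1.37 or abs_eta > 1.52
    return photon["is_tight"] and photon["pt"] > 25000 and abs_eta < 2.37 and outside_transition


def is_isolated(photon):
    """Both isolation variables must be small relative to the photon pt."""
    return (photon["ptcone30"] / photon["pt"] < 0.065
            and photon["etcone20"] / photon["pt"] < 0.065)


def passes_photon_selection(photons):
    """Keep the event only if it has exactly two good photons and both are isolated."""
    good = [p for p in photons if is_good_photon(p)]
    return len(good) == 2 and all(is_isolated(p) for p in good)


# Hypothetical events (pt and cone variables in MeV)
lead = {"pt": 62000.0, "eta": 0.5, "is_tight": True, "ptcone30": 1000.0, "etcone20": 1500.0}
clean_event = [lead, {"pt": 41000.0, "eta": -1.9, "is_tight": True, "ptcone30": 800.0, "etcone20": 900.0}]
crack_event = [lead, {"pt": 41000.0, "eta": 1.45, "is_tight": True, "ptcone30": 800.0, "etcone20": 900.0}]
jetty_event = [lead, {"pt": 41000.0, "eta": -1.9, "is_tight": True, "ptcone30": 6000.0, "etcone20": 900.0}]

for name, event in [("clean", clean_event), ("crack", crack_event), ("jetty", jetty_event)]:
    print(name, passes_photon_selection(event))
```

The second event fails because one photon falls in the excluded transition region, and the third fails because one photon has too much nearby track activity to count as isolated.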

In [6]:
for p in processes:
    # Apply preselection cut on photon trigger
    df[p] = df[p].Filter("trigP")

    # Find two good photons with tight ID, pt > 25 GeV, |eta| < 2.37, and outside the transition region between barrel and end-cap
    df[p] = df[p].Define("goodphotons", "photon_isTightID && (photon_pt > 25000) && (abs(photon_eta) < 2.37) && ((abs(photon_eta) < 1.37) || (abs(photon_eta) > 1.52))")\
                 .Filter("Sum(goodphotons) == 2")

    # Take only isolated photons
    df[p] = df[p].Filter("Sum(photon_ptcone30[goodphotons] / photon_pt[goodphotons] < 0.065) == 2")\
                 .Filter("Sum(photon_etcone20[goodphotons] / photon_pt[goodphotons] < 0.065) == 2")

### Calculating Invariant Mass
We define a custom function to calculate the invariant mass of the two selected photons using their kinematic properties. This is key to identifying the Higgs boson as its mass is around 125 GeV.

The invariant mass is calculated as follows:
$$m_{\gamma\gamma} = \sqrt{E^2_\text{tot}-\mathbf{p}_\text{tot}\cdot\mathbf{p}_\text{tot}}$$
in units where $c=1$.
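
For massless particles such as photons, the formula above reduces to $m_{\gamma\gamma}^2 = 2\,p_{T,1}\,p_{T,2}\,(\cosh\Delta\eta - \cos\Delta\phi)$, which is the form used in the next cell. The following sketch checks numerically that the two expressions agree. The photon kinematics are hypothetical values chosen for illustration, and the function names are not part of the notebook.

```python
import math


def four_vector(pt, eta, phi):
    """Return (E, px, py, pz) for a massless particle from pt, eta, phi."""
    return (pt * math.cosh(eta),
            pt * math.cos(phi),
            pt * math.sin(phi),
            pt * math.sinh(eta))


def invariant_mass_full(p1, p2):
    """sqrt(E_tot^2 - |p_tot|^2) from the summed four-vectors."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m_squared = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative rounding errors


def invariant_mass_massless(pt1, eta1, phi1, pt2, eta2, phi2):
    """Closed form valid when both particles are massless."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))


# Hypothetical photons: (pt in GeV, eta, phi)
photon1 = (70.0, 0.4, 0.3)
photon2 = (55.0, -0.6, -2.7)

m_full = invariant_mass_full(four_vector(*photon1), four_vector(*photon2))
m_short = invariant_mass_massless(*photon1, *photon2)
print(m_full, m_short)
```

Because the cosine is even and periodic, the closed form does not depend on whether $\Delta\phi$ is first wrapped into $[0, \pi]$.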

In [7]:
ROOT.gInterpreter.Declare(
"""
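#include <cmath> // for M_PI and the std:: math functions
using Vec_t = const ROOT::VecOps::RVec<float>;
// Invariant mass in GeV of two massless particles; pt is given in MeV.
// The energy vector e is not needed in the massless approximation.
float ComputeInvariantMass(Vec_t& pt, Vec_t& eta, Vec_t& phi, Vec_t& e) {
    // std::abs selects the floating-point overload (plain abs may truncate to int)
    float dphi = std::abs(phi[0] - phi[1]);
    dphi = dphi < M_PI ? dphi : 2 * M_PI - dphi;
    return std::sqrt(2 * pt[0] / 1000.0 * pt[1] / 1000.0 * (std::cosh(eta[0] - eta[1]) - std::cos(dphi)));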
}
""");

### Histogram Creation and Mass Cuts
We now compute the invariant mass for each event and apply additional kinematic cuts to refine our selection. Finally, we define histograms to store the diphoton invariant mass for both the data and the simulated Higgs samples.
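
As a rough illustration of the cuts and binning used in the next cell, the sketch below applies the relative transverse-momentum cuts and the mass window to single events and works out which of the 30 histogram bins a given mass falls into. The function names and example numbers are hypothetical; the thresholds and the binning (30 bins from 105 to 160 GeV) are taken from the cell below.

```python
def passes_kinematic_cuts(first_pt_gev, second_pt_gev, m_yy_gev):
    """Relative pt cuts on the two selected photons plus the mass window."""
    return (first_pt_gev / m_yy_gev > 0.35
            and second_pt_gev / m_yy_gev > 0.25
            and 105 < m_yy_gev < 160)


def bin_index(m_yy_gev, n_bins=30, low=105.0, high=160.0):
    """Zero-based index of the histogram bin containing m_yy (for low <= m_yy < high)."""
    width = (high - low) / n_bins
    return int((m_yy_gev - low) // width)


bin_width = (160.0 - 105.0) / 30
print("bin width in GeV:", bin_width)
print(passes_kinematic_cuts(60.0, 40.0, 125.0))
print(passes_kinematic_cuts(40.0, 40.0, 125.0))
print(bin_index(125.0))
```

Dividing the transverse momentum by the invariant mass makes the cuts scale with the mass of the pair, so they shape the mass spectrum less than fixed thresholds would.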

In [8]:
hists = {}
for p in processes:
    # Compute the diphoton invariant mass from the two selected photons
    df[p] = df[p].Define("m_yy", "ComputeInvariantMass(photon_pt[goodphotons], photon_eta[goodphotons], photon_phi[goodphotons], photon_E[goodphotons])")

    # Make additional kinematic cuts and select mass window
    df[p] = df[p].Filter("photon_pt[goodphotons][0] / 1000.0 / m_yy > 0.35")\
                 .Filter("photon_pt[goodphotons][1] / 1000.0 / m_yy > 0.25")\
                 .Filter("(m_yy > 105) && (m_yy < 160)")

    # Book histogram of the invariant mass with this selection
    hists[p] = df[p].Histo1D(
            ROOT.ROOT.RDF.TH1DModel(p, "Diphoton invariant mass; m_{#gamma#gamma} [GeV];Events / bin", 30, 105, 160),
            "m_yy", "weight")

### Running the Event Loop
We now run the event loop to process each event in both the Higgs signal and data samples. During this step, ROOT loops over all events, applies the selection criteria (such as photon quality and isolation), computes the invariant mass of the photon pairs, and fills the histograms with the resulting events. This is where the actual data processing happens.
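
Nothing has been read from the files before this point: the earlier `Filter`, `Define`, and `Histo1D` calls only book operations, and requesting a result (here with `GetValue()`) triggers the loop. The toy class below mimics that idea in plain Python. It is a hypothetical stand-in for illustration and does not reproduce the `RDataFrame` API.

```python
class LazyPipeline:
    """Toy lazily evaluated pipeline: filters are recorded first and applied later."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Booking a filter only records it; no row is inspected yet.
        return LazyPipeline(self._rows, self._steps + (predicate,))

    def count(self):
        # The "event loop": one pass over the rows applying every booked filter.
        return sum(1 for row in self._rows if all(step(row) for step in self._steps))


masses = [98.0, 110.5, 124.8, 125.3, 140.1, 171.0]  # hypothetical m_yy values in GeV
selected = LazyPipeline(masses).filter(lambda m: m > 105).filter(lambda m: m < 160)
n_selected = selected.count()  # the rows are only traversed here
print(n_selected)
```

Booking everything first lets all selections and histograms be evaluated in a single pass over the data.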

In [9]:
# Run the event loop
ggh = hists["ggH"].GetValue()
data = hists["data"].GetValue()

### Plotting
We set some styles for our plot and create a canvas with two pads: one for the main plot and another for the residual plot (data minus fitted background).

In [10]:
# Set styles
ROOT.gStyle.SetOptStat(0)
ROOT.gStyle.SetOptTitle(0)
ROOT.gStyle.SetMarkerStyle(20)
ROOT.gStyle.SetMarkerSize(1.2)
size = 0.08
ROOT.gStyle.SetLabelSize(size, "x")
ROOT.gStyle.SetLabelSize(size, "y")
ROOT.gStyle.SetTitleSize(size, "x")
ROOT.gStyle.SetTitleSize(size, "y")

# Create canvas with pads for main plot and data-minus-background residual
c = ROOT.TCanvas("c", "", 700, 750)

upper_pad = ROOT.TPad("upper_pad", "", 0, 0.29, 1, 1)
lower_pad = ROOT.TPad("lower_pad", "", 0, 0, 1, 0.29)
for p in [upper_pad, lower_pad]:
    p.SetLeftMargin(0.14)
    p.SetRightMargin(0.05)
upper_pad.SetBottomMargin(0)
lower_pad.SetTopMargin(0)

upper_pad.Draw()
lower_pad.Draw()

We fit a model to the data that consists of a signal (the Higgs peak) and a polynomial background. The background is modeled with a cubic (third-order) polynomial, and the signal with a Gaussian centered at 125 GeV. In the cell below, the Gaussian's amplitude, mean, and width are all fixed, so the fit only determines the four polynomial coefficients.

The Gaussian model is used for the signal because the observed width of the peak is dominated by the detector's resolution. The cubic polynomial is chosen for the background because it offers enough flexibility to capture the overall shape without overfitting, that is, without following random statistical fluctuations in the data that correspond to neither the true signal nor the true background.
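
To make the structure of the model concrete, here is the same cubic-plus-Gaussian function written out in plain Python. This is only an illustrative sketch: the polynomial coefficients are rounded copies of the values printed in the fit output further down, so the evaluated background is approximate, and the function names are not part of the notebook.

```python
import math

# Rounded copies of the fitted polynomial coefficients (see the fit output in this notebook)
P0, P1, P2, P3 = 9.43252e4, -1.77723e3, 1.15606e1, -2.56282e-2
# Gaussian parameters that the notebook keeps fixed during the fit
AMPLITUDE, MEAN, SIGMA = 119.1, 125.0, 2.39


def background(x):
    """Cubic polynomial describing the smooth diphoton background."""
    return P0 + P1 * x + P2 * x**2 + P3 * x**3


def signal(x):
    """Gaussian peak describing the Higgs signal."""
    return AMPLITUDE * math.exp(-0.5 * ((x - MEAN) / SIGMA) ** 2)


def model(x):
    """Signal-plus-background model that is fitted to the data."""
    return background(x) + signal(x)


for mass in (110.0, 125.0, 150.0):
    print(mass, background(mass), signal(mass), model(mass))
```

Away from 125 GeV the Gaussian term is negligible, so the model and the background curve coincide there and differ only under the peak.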


In [11]:
data.SetStats(0)
data.SetTitle("")

# Fit signal + background model to data
upper_pad.cd()
fit = ROOT.TF1("fit", "([0]+[1]*x+[2]*x^2+[3]*x^3)+[4]*exp(-0.5*((x-[5])/[6])^2)", 105, 160)
fit.FixParameter(5, 125.0)
fit.FixParameter(4, 119.1)
fit.FixParameter(6, 2.39)
data.Fit("fit", "", "E SAME", 105, 160)
fit.SetLineColor(2)
fit.SetLineStyle(1)
fit.SetLineWidth(2)
fit.Draw("SAME")

# Draw background
bkg = ROOT.TF1("bkg", "([0]+[1]*x+[2]*x^2+[3]*x^3)", 105, 160)
for i in range(4):
    bkg.SetParameter(i, fit.GetParameter(i))
bkg.SetLineColor(4)
bkg.SetLineStyle(2)
bkg.SetLineWidth(2)
bkg.Draw("SAME")


 FCN=19.9699 FROM HESSE     STATUS=NOT POSDEF     23 CALLS         158 TOTAL
                     EDM=5.16612e-12    STRATEGY= 1      ERR MATRIX NOT POS-DEF
  EXT PARAMETER                APPROXIMATE        STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  p0           9.43252e+04   7.20520e+01   2.24889e-02   1.11944e-08
   2  p1          -1.77723e+03   7.78149e-01   4.23724e-04  -2.11331e-06
   3  p2           1.15606e+01   5.36061e-03   2.75626e-06  -2.03300e-04
   4  p3          -2.56282e-02   2.66824e-05   6.11023e-09   3.85682e-02
   5  p4           1.19100e+02     fixed    
   6  p5           1.25000e+02     fixed    
   7  p6           2.39000e+00     fixed    


We draw the data points with error bars and add the Higgs signal on top. We scale the simulated Higgs events to match the data's luminosity and cross-section, then plot the signal with a solid line.
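
The normalisation step can be sketched with plain Python lists standing in for histogram bins. The luminosity and cross-section values are the ones used in the next cell; the bin contents and the function name are hypothetical. Note the assumption built into this approach: dividing by the integral of the selected histogram forces the scaled integral to equal luminosity times cross-section, whereas dividing by the sum of all generated event weights would also fold in the selection efficiency.

```python
def scale_to_expected_yield(bin_contents, luminosity, cross_section):
    """Rescale the bins so that their sum equals luminosity * cross_section."""
    factor = luminosity * cross_section / sum(bin_contents)
    return [content * factor for content in bin_contents]


lumi = 10064.0  # integrated luminosity used in this notebook
xsec = 0.102    # cross-section value used in this notebook
toy_bins = [1.0, 4.0, 9.0, 4.0, 1.0]  # hypothetical unscaled signal histogram

scaled_bins = scale_to_expected_yield(toy_bins, lumi, xsec)
expected_yield = lumi * xsec
print(expected_yield, sum(scaled_bins))
```

The shape of the histogram is unchanged by the scaling; only its overall normalisation is set.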

In [12]:
# Draw data
data.SetMarkerStyle(20)
data.SetMarkerSize(1.2)
data.SetLineWidth(2)
data.SetLineColor(ROOT.kBlack)
data.Draw("E SAME")
data.SetMinimum(1e-3)
data.SetMaximum(8e3)

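# Normalise the simulated signal so that its integral equals luminosity * cross-section
# (lumi = 10064 pb^-1, i.e. the 10 fb^-1 quoted on the plot; 0.102 taken as the cross-section in pb).
# Note: this divides by the integral of the selected events, not by the sum of
# generated weights, so the selection efficiency is not folded into the yield.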
lumi = 10064.0
ggh.Scale(lumi * 0.102 / ggh.Integral())
higgs = ggh
higgs.Draw("HIST SAME")


In the lower pad, we plot the difference between the data and the fitted background to highlight the signal.
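
The quantity shown in the lower pad can be sketched bin by bin as follows. The counts and background values are hypothetical, and the uncertainty on each residual is taken to be the Poisson uncertainty of the data alone, $\sqrt{N}$, which matches the notebook's choice of copying the data bin errors onto the residual histogram.

```python
import math


def residuals(counts, background):
    """Return (data - background, sqrt(data)) for every bin."""
    return [(n - b, math.sqrt(n)) for n, b in zip(counts, background)]


# Hypothetical bin contents around the peak region
counts = [2810.0, 2755.0, 2790.0, 2600.0]
background_values = [2800.0, 2700.0, 2650.0, 2610.0]

result = residuals(counts, background_values)
for difference, error in result:
    print(difference, error)
```

A residual that is large compared with its uncertainty over several neighbouring bins is what a signal peak looks like in this panel.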


In [13]:
# Draw residual (data minus fitted background)
lower_pad.cd()

ratiofit = ROOT.TH1F("ratiofit", "ratiofit", 5500, 105, 160)
ratiofit.Eval(fit)
ratiofit.SetLineColor(2)
ratiofit.SetLineStyle(1)
ratiofit.SetLineWidth(2)
ratiofit.Add(bkg, -1)
ratiofit.Draw()
ratiofit.SetMinimum(-150)
ratiofit.SetMaximum(225)
ratiofit.GetYaxis().SetTitle("Data - bkg")
ratiofit.GetYaxis().CenterTitle()
ratiofit.GetYaxis().SetNdivisions(503, False)
ratiofit.SetTitle("")
ratiofit.GetXaxis().SetTitle("m_{#gamma#gamma} [GeV]")

ratio = data.Clone()
ratio.Add(bkg, -1)
ratio.Draw("E SAME")
# Restore the data uncertainties; ROOT bins are numbered 1..N inclusive
for i in range(1, data.GetNbinsX() + 1):
    ratio.SetBinError(i, data.GetBinError(i))


Finally, we add a legend to explain the components of the plot and a label for the ATLAS experiment.


In [14]:
# Add legend
upper_pad.cd()
legend = ROOT.TLegend(0.60, 0.55, 0.89, 0.85)
legend.SetFillStyle(0)
legend.SetBorderSize(0)
legend.SetTextSize(0.05)
legend.SetTextAlign(32)
legend.AddEntry(data, "Data" ,"lep")
legend.AddEntry(bkg, "Background", "l")
legend.AddEntry(fit, "Signal + Bkg.", "l")
legend.AddEntry(higgs, "Signal", "l")
legend.Draw("SAME")

# Add ATLAS label
text = ROOT.TLatex()
text.SetNDC()
text.SetTextFont(72)
text.SetTextSize(0.05)
text.DrawLatex(0.18, 0.84, "ATLAS")

text.SetTextFont(42)
text.DrawLatex(0.18 + 0.13, 0.84, "Open Data")

text.SetTextSize(0.04)
text.DrawLatex(0.18, 0.78, "#sqrt{s} = 13 TeV, 10 fb^{-1}");

### Final Plot Display
Now we display the final canvas with the fitted data, background, and signal.

In [15]:
%jsroot on
c.Draw()

The plot shows the invariant mass distribution of diphoton pairs. The data points with error bars represent the actual collision data collected by the ATLAS detector, while the solid and dashed lines represent different components of the model used to describe the data.

#### Main Plot (Upper Panel):

- **Black points with error bars**: These represent the observed data from the collision events. Each point corresponds to the number of events in a particular bin of diphoton invariant mass (`m_γγ`).
- **Red solid line (Signal + Background)**: This is the combined fit of the signal (Higgs boson) and background processes to the data. The smooth curve shows how well the model describes the data across the entire mass range.
- **Blue dashed line (Background)**: The blue line represents the background-only contribution. This is modeled by a cubic polynomial, and it describes non-Higgs events that still produce photon pairs.
- **Blue solid line (Signal)**: The blue solid line represents the contribution from the simulated Higgs boson signal. It is shown as a small peak around 125 GeV, corresponding to the expected mass of the Higgs boson.

#### Residual Plot (Lower Panel):
- **Data - Background (black points)**: This shows the difference between the observed data and the fitted background model. If the Higgs signal is present, we expect to see a bump around 125 GeV.
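- **Red line (Signal fit)**: This shows the signal-plus-background fit minus its background component, i.e. the Gaussian signal term of the model. Its peak at around 125 GeV can be compared with the bump in the black points; agreement between the two is evidence for the presence of the Higgs boson in the data.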