# New Physics Searches at the Large Hadron Collider - Introduction

In 2012, experiments at the Large Hadron Collider (LHC) found evidence for the existence of the **Higgs boson**. This particle was the final plank of the **Standard Model** (SM) of particle physics, a predictive theory that explains the behaviour of the fundamental particles that make up our universe. While the SM is a very successful theory, there are many questions it cannot answer, such as what **dark matter** is. For this reason, particle physicists want to test their **models** at even higher energies to find where the SM stops being a good model of reality.

The current era of the LHC (called the **high-luminosity** LHC, more on that later!) will be coming to an end in the late 2030s. There is a proposal to build an even larger collider, called the FCC, that will continue to look for new physics at even higher energies from around 2040 to 2070, starting by colliding **electrons** and positrons together (FCC-ee).

## Your Task

It is 2045 and you are a **phenomenologist** - it is your job to take particle physics models and make predictions for experiments like FCC-ee. You will then compare them against experimental data and work out if we have any evidence of new physics beyond the Standard Model (BSM).

This notebook will guide you through the background relevant to the research project on comparing theoretical SM and BSM predictions to simulated data for $e^+e^- \to e^+e^-$ scattering at the FCC-ee.

## Events at collider experiments

The FCC-ee will collide electrons and **positrons** at high energies and collect information on the debris produced in each collision.

The experiments will have detectors which record individual collision **events**. These detectors will work in very similar ways to the ones at the LHC, such as [ATLAS](https://www.google.com/search?client=firefox-b-d&q=atlasch) and [CMS](https://home.cern/science/experiments/cms).

Below we display a typical event produced at a particle collider. This is one we might have seen in 2012: a $pp$ collision event in which a Higgs boson is produced and then decays to two electron-positron pairs:

$$pp\rightarrow (H\rightarrow e^+ e^- e^+ e^-)\,+\mathrm{background}$$

<img src="h-event-display.jpg" alt="Higgs production event display from the LHC" width="700"/>

Analysing multiple events can allow us to understand the structure of different sectors of the **Standard Model**, and detect new physics beyond the Standard Model.
________

### Cross sections

The quantum nature of the interactions we observe means that we cannot determine the precise scattering process which occurred in the hard interaction.

Experiments analyse the final states of an ensemble of events, and use this to determine the **cross section** ($\sigma$) which expresses the probability of a particular process:

$$\sigma=\frac{1}{L}\frac{\mathrm{d}N}{\mathrm{d}t}$$

Here $L$ is the **luminosity**, the number of collisions of initial-state particles per unit area per unit time, and $N$ is the number of interactions, so $\mathrm{d}N/\mathrm{d}t$ is the interaction rate.

The cross section is typically measured in units of barn (b), where 1 b = $10^{-28}\,\mathrm{m}^{2}$ = $10^{-24}\,\mathrm{cm}^{2}$.


#### Bonus Exercise: Dimensional analysis
What are the dimensions of the luminosity $L$ if the cross section is measured in b?
___


From many events over a period of time, and with an understanding of the luminosity we can calculate the cross section for a process!

For example, by counting all Higgs production events over a certain time, we can calculate the Higgs production cross section.

**Trivia**: The name *cross section* relates to classical ideas of particle collisions - this was previously thought to be an effective area between colliding spheres and the name stuck! 

___
#### Exercise 1a: Computing cross sections

Write a function in Python to calculate the cross section below, assuming a constant rate of interaction.

In [None]:
# Define a function for the cross section, assuming
# time is given in seconds, and the luminosity in pb^-1 s^-1.

# Comments in python start with '#'

def cross_section(luminosity, interactions, time):

    # Your code here

    # Hints:
    # Arithmetic in python:
    # a + b MEANS a plus b
    # a - b MEANS a minus b
    # a * b MEANS a multiplied by b
    # a / b MEANS a divided by b
    # a**b MEANS a raised to the power of b

    return # Replace this comment by the value you want to return

For a luminosity of $L=10\,000$ pb$^{-1}$ s$^{-1}$, and $N=1.5768\times 10^{12}$ interactions over a time $t=1$ year, what is the total cross section?

In [None]:
# Use the cross section function you defined earlier
# with the given values to calculate the cross section.

# Be careful with your units!

# Hints:
# x = some_function() assigns to x the value of a function called 'some_function()'
# Standard form in python: 1E12 = 1 * 10**12
# print(variable_name) will print the value of 'variable_name' to the screen



___

## Discovering new particles

Analysing the total number of events (i.e. the total cross section) is useful, but the detectors can allow us to calculate more complex quantities!

At detectors we can observe quantities relating to the **kinematics** of observed particles. We can calculate the **masses** and different components of the **momenta** of observed particles or particle systems.

We can produce **distributions**, which display the dependence of the cross section on these variables:

$$\frac{\mathrm{d}\sigma}{\mathrm{d} m_{X}}, \quad \frac{\mathrm{d}\sigma}{\mathrm{d} p_{T}}, \quad \dots$$

Below we show an example from the study which discovered the Higgs boson at the LHC - the distribution of the cross section in terms of the **invariant mass** of the pair of photons $\gamma$ produced in a Higgs decay $H\rightarrow \gamma \gamma$ [(ATLAS Collaboration, 2012)](https://www.sciencedirect.com/science/article/pii/S037026931200857X?via%3Dihub):

<img src="new-physics.png" alt="Distribution that helped identify Higgs discovery" width="400"/>

The bump in the data relative to the background was a clear sign of a new particle at mass $m_H=m_{\gamma \gamma}\approx 125$ GeV$/c^2$.

In particle physics, we often use the natural system of units, where $c=\hbar=1$, and write quantities in terms of eV (electron-volts), since the physics occurs at such small scales that normal SI units become unwieldy to work with!

Since particles at collider experiments travel at very high speeds nearing the speed of light, we require results from the **Special Theory of Relativity**, and can write:

$$E=mc^2 \rightarrow m = E/c^2$$

___
#### Exercise 1b: Unit conversions
Write a function that converts masses in eV/$c^2$ to kg.

In [None]:
# Define a function to convert a mass in
# eV/c^2 to kg

# Useful constants
PLANCK_CONSTANT = 6.626E-34 # Js
ELECTRON_VOLT = 1.602E-19 # J/eV
SPEED_OF_LIGHT = 3E+8 # m/s

def convert_ev_to_kg(mass_ev):

    # Your code here

    return # Replace this comment by the value you want to return


What is the mass of the Higgs boson in kg, given $m_H = 125$ GeV/$c^2$?

In [None]:
# Use the conversion function you defined earlier
# to get the Higgs mass in kg

# Be careful again with your units!


___

### Statistical significance and uncertainties (incl. combination of uncertainties)

Why can measuring a bump in the data signify new physics?

This is because we compare the data to **theoretical predictions** based on the physics we already know - a significant deviation of the data from these predictions can indicate new physics!

Q: How can we distinguish noise in the data from new physics?

A: **PRECISION**!

The data exhibits a statistical uncertainty, which can be characterised by a **variance**:

$$\sigma^2[X] = \frac{1}{N}\sum_i (x_i - \mu)^2$$

with $\mu$ the **mean** of the dataset $X=\{x_1, \dots, x_i, \dots, x_N\}$.

You may have seen the standard deviation $\sigma[X]$ before, which is the square root of the variance.
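
As a rough illustration, the variance and standard deviation defined above can be computed in a few lines of Python. This is only a sketch using the standard library; the function names and the sample values are invented for this example.

```python
import math

def variance(values):
    """Variance as defined above: mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def standard_deviation(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))

# Made-up sample values, purely for illustration
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(variance(sample))            # 4.0
print(standard_deviation(sample))  # 2.0
```

Note that this divides by $N$, matching the formula above, rather than by $N-1$ as some statistics libraries do by default.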

The **uncertainty** of the theoretical predictions (see next section) must be taken into account as well.

Uncertainties have to be combined **in quadrature** (Hughes, Hase, 2010), i.e. if there are two independent sources of uncertainty on a result, $\alpha_1$ and $\alpha_2$, the combined uncertainty is **not** in general $\alpha_1 + \alpha_2$.

Instead, the total uncertainty would be:

$$\text{Total uncertainty} = \sqrt{\alpha_1^2 + \alpha_2^2}$$
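
A minimal sketch of this combination in Python is given below. The function name `combine_in_quadrature` and the numbers are made up for illustration, and the uncertainties are assumed to be independent of each other.

```python
import math

def combine_in_quadrature(*uncertainties):
    """Square root of the sum of squares of the given (independent) uncertainties."""
    return math.sqrt(sum(u ** 2 for u in uncertainties))

# Illustrative values only, e.g. a statistical and a theoretical uncertainty
alpha_1 = 3.0
alpha_2 = 4.0

print(combine_in_quadrature(alpha_1, alpha_2))  # 5.0, smaller than alpha_1 + alpha_2 = 7.0
```

Writing the function with `*uncertainties` means the same code also handles three or more sources of uncertainty.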

The typical threshold that is taken to indicate the presence of new physics beyond statistical doubt is five standard deviations - [five sigma](https://home.cern/resources/faqs/five-sigma).

This means that the observed value must differ from the theoretical prediction (or an experimental background fit) by more than five standard deviations to suggest new physics with confidence.

**Trivia**: At five sigma, there is only a 0.00003% chance that the observed difference/bump is a statistical fluctuation!
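
The five sigma criterion can be sketched in code as follows. This is an illustration rather than a full statistical analysis: it assumes the fluctuations follow a Gaussian distribution and counts only upward fluctuations (a one-sided probability). The function names and example numbers are invented here.

```python
import math

def significance(observed, predicted, total_uncertainty):
    """Number of standard deviations between observation and prediction."""
    return abs(observed - predicted) / total_uncertainty

def one_sided_p_value(n_sigma):
    """Probability of an upward Gaussian fluctuation of at least n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

# Made-up numbers: an observation of 130 against a prediction of 100,
# with a total uncertainty of 5 (same arbitrary units)
n_sigma = significance(130.0, 100.0, 5.0)
print(n_sigma)                 # 6.0

# Chance of a fluctuation at exactly the five sigma threshold
print(one_sided_p_value(5.0))  # approximately 2.9e-07, i.e. about 0.00003%
```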

## Theoretical predictions

With a theoretical basis for particle physics, predictions can be produced to higher and higher precision, permitting searches for new physics in experimental data. But how are these predictions produced?

### Standard Model

The Standard Model is a well-tested theory accounting for the physics of three fundamental forces (i.e. the three **sectors**):

$$\mathrm{SM} = \text{Strong Force} \times \text{Weak Force} \times \text{Electromagnetism}$$

The figure below shows the particle content of the Standard Model, with masses, spins and electric charges (in units of the electron charge $e$):

<img src="sm-particles.png" alt="Particle content of SM" width="600"/>

- The quarks lie in the **strong** sector, with the gluon ($g$) being the boson that mediates the strong interaction.
- The leptons (including the neutrinos) lie in the **weak** sector, with the $W,Z$ bosons acting as the weak force mediators.
- All fermions (excluding neutrinos) are in the **electromagnetic** sector, with the photon (i.e. light) being the force mediator.

The **Higgs boson** is responsible for the masses of the quarks, leptons and the $W,Z$ bosons.

The particles present in the SM are coupled and interact in many complex ways; we cannot isolate or choose the sector in which hadron-hadron interactions occur.

Thus we have to **combine** contributions to a process from all different **channels** and **mechanisms**.

**Trivia**: Quarks and gluons are referred to as **partons**, dating back to the parton model of the internal structure of protons and neutrons!

#### Bonus Exercise: Composition of nucleons
Protons and neutrons are composed of quarks and gluons. Considering only $u$ and $d$ quarks, and knowing that the proton has electric charge $+e$ and the neutron has electric charge $0$, what combination of quarks can compose the correct electrical charge for each?
___

### Feynman diagrams

Feynman diagrams are a useful framework for calculating SM cross sections on the theoretical side.

We can separate the interaction between quarks and gluons (the **hard** or **partonic** interaction) from the interaction between protons (the **hadronic** interaction) to create a cross section:

$$\sigma = \int \text{Hadronic factor} \times \text{Partonic factor}$$

Feynman diagrams relate to the partonic part of the cross section - some examples for Higgs production and decay:

<img src="higgs-production.png" alt="Feynman diagrams for Higgs production" width="400"/>

The production in the top-left diagram is mediated by the **strong** force, while the others are mediated by the **weak** force ($V$ stands for either a $W$ or a $Z$ boson).

We only ever see the final states of the interaction in our detector - we have no way of knowing for certain the initial partonic interaction!

Predictions from all channels have to be combined to obtain the correct total cross section!


## Monte Carlo integration

The cross section on the theoretical side is represented by an integral as shown above.

Integrals of simple functions can be calculated **analytically**, e.g.:

$$\int_a^b x \, \mathrm{d}x = \left [\frac{x^2}{2}\right ]_a^b = \frac{b^2-a^2}{2}.$$

The integrals we need to evaluate are much more complex and generally cannot be evaluated analytically, so we evaluate them **numerically**.

One such method is the **Monte Carlo** method, which is suited to our complicated integrals.

This method is simple - we sample points **randomly** in the region required and approximate the integral:

$$\int_\mathrm{region} f(x) \mathrm{d}x \approx V \times \frac{1}{N} \sum_{i=1}^N f(x_i),$$

where $N$ points are sampled, and $V$ is the size of the integration region:

$$V = \int_\mathrm{region} \mathrm{d}x.$$

___
#### Exercise 1c: Monte Carlo integration 
Produce a Monte Carlo approximation for the integral of $f(x)=x$ in the region $0 \leq x \leq 1$, which takes the number of points $N$ as an argument.

In [None]:
# We will need some external libraries
# The numpy - numerical python - library
# contains many useful functions
import numpy as np
from numpy import random

# Define a function to integrate x between
# 0 and 1 with Monte Carlo for sampling 'n_points'

def integrate_x_0_1(n_points):
    
    # Your code here

    # Hints:
    # The numpy function 'random.uniform(a,b)'
    # will generate a uniformly sampled random
    # number between a and b
    
    # Google 'numpy documentation' for help
    # with numpy-specific problems!
    
    # You may need a 'for' loop or a 'while' loop
    # these allow you to execute code for multiple
    # values.
    
    return # Replace comment with return value

Evaluate your function for $N=10$, $N=100$, and $N=1000$. What do you notice?

In [None]:
# Use the function you defined earlier to
# integrate the function for the specified N
# values

# What should the answer be?

#### Bonus Exercise: Extending the function

Can you modify your solution to 1c, allowing the end-points as arguments (such that you can integrate for any region)?
___

### Approximating $\pi$ with Monte Carlo

With the Monte Carlo (MC) method, a fun exercise is to approximate $\pi$!

If we construct a circle with radius $r$, and house it in a square of side-length $2r$, then the ratio of the area of the circle to the square is:

$$\frac{\text{area of circle}}{\text{area of square}} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}.$$

**Trivia**: This result is independent of the radius!

Then, if we sample $N$ points in the square at random, we can expect the following:

$$\frac{\text{area of circle}}{\text{area of square}} = \frac{\pi}{4} \approx \frac{\text{number of points in the circle}}{\text{total number of points sampled ($N$)}}.$$

Below is an example plot ([link](https://www.geeksforgeeks.org/estimating-value-pi-using-monte-carlo/)) for such a procedure:

<img src="MonteCarlo.png" alt="Monte Carlo approximating pi" width="300"/>

This means we can write:

$$\pi \approx 4\times \frac{\text{number of points in the circle}}{\text{total number of points sampled ($N$)}}$$

___
#### Exercise 1d: Approximating $\pi$

Produce a Monte Carlo approximation for $\pi$ as outlined above, taking the total number of points $N$ as an argument.

In [None]:
# We will need the numpy libraries again
import numpy as np
from numpy import random

# Define a function to approximate pi
# with Monte Carlo for sampling 'n_points'

def approximate_pi(n_points):
    
    # Your code here

    # Hints:
    # You can use a 'for' or 'while' loop as
    # for the previous exercise.
    
    # You may need an 'if' statement, this will
    # evaluate code satisfying a true/false
    # condition
    
    return # Replace comment with return value

Evaluate your function for $N=10$, $N=100$, and $N=1000$. What do you notice?

In [None]:
# Use the function you defined earlier to
# approximate pi for the specified N
# values

# What do you notice about the convergence?

___

## Plotting results

As important as being able to produce sophisticated calculations is presenting your findings in a **comprehensible** way: **plotting** theoretical predictions and data in a clear way helps greatly in understanding!

The example below, from [(Andersen et al., 2022)](https://link.springer.com/article/10.1007/JHEP03(2023)001), is a distribution showing two Monte Carlo theoretical predictions and experimental data for Higgs production as analysed at the LHC [(ATLAS Collaboration, 2014)](https://link.springer.com/article/10.1007/JHEP09(2014)112):

<img src="example-distribution.png" alt="Example distribution for Higgs production" width="500"/>

While not a strict checklist, consider whether your plot satisfies the following:
- Clear axis labels, with **units** shown.
- Sensible and accessible choices of colour scheme.
- Clearly highlighted **uncertainties**: in the figure above, theoretical uncertainties (on the theoretical prediction) are shown as shaded bands, statistical uncertainties (on the data) are shown as error bars.
- A robust legend that clearly describes the analysis.
- The **bin edges** are clearly defined and visible with no cluttering.
- Suitable $x$ and $y$ ranges to see the full behaviour of the distributions.
- The plot is not cluttered with too many lines/features (if it is, consider whether you can split one plot into two).

Python has several useful libraries for plotting; we will use `matplotlib`.
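
To give a feel for how the checklist above translates into `matplotlib` calls, here is a small sketch of a binned distribution with a shaded theory band and data error bars. All numbers are toy values invented for this example (they are not real predictions or measurements), and the function name `plot_prediction_vs_data` and the axis labels are likewise only illustrative choices.

```python
import matplotlib.pyplot as plt

def plot_prediction_vs_data():
    """Draw a toy binned distribution: prediction with band, data with error bars."""
    # Toy values invented for illustration only
    bin_edges = [0.0, 50.0, 100.0, 150.0, 200.0]   # GeV
    prediction = [4.0, 2.5, 1.2, 0.5]              # fb/GeV, one value per bin
    theory_unc = [0.6, 0.4, 0.2, 0.1]              # fb/GeV
    data = [4.3, 2.2, 1.3, 0.4]                    # fb/GeV
    data_unc = [0.3, 0.25, 0.2, 0.15]              # fb/GeV

    # Bin centres, used to position the data points
    centres = [0.5 * (lo + hi) for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]

    # For step-style drawing, repeat the last bin value at the final edge
    upper = [p + u for p, u in zip(prediction, theory_unc)]
    lower = [p - u for p, u in zip(prediction, theory_unc)]

    fig, ax = plt.subplots()

    # Prediction as a step histogram with a shaded uncertainty band
    ax.step(bin_edges, prediction + [prediction[-1]], where="post",
            color="tab:blue", label="Prediction (toy)")
    ax.fill_between(bin_edges, lower + [lower[-1]], upper + [upper[-1]],
                    step="post", color="tab:blue", alpha=0.3,
                    label="Theory uncertainty (toy)")

    # Data as points with statistical error bars
    ax.errorbar(centres, data, yerr=data_unc, fmt="o", color="black",
                label="Data (toy)")

    # Clear axis labels with units, sensible ranges and a legend
    ax.set_xlabel(r"$p_T$ [GeV]")
    ax.set_ylabel(r"$\mathrm{d}\sigma/\mathrm{d}p_T$ [fb/GeV]")
    ax.set_xlim(bin_edges[0], bin_edges[-1])
    ax.set_ylim(bottom=0.0)
    ax.legend()

    return fig, ax
```

To display the figure in a notebook, call `fig, ax = plot_prediction_vs_data()` followed by `plt.show()`.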

___
#### Exercise 1e: Monte Carlo plots

Produce a plot similar to the circle with sampled points shown [here](https://www.geeksforgeeks.org/estimating-value-pi-using-monte-carlo/).

In [None]:
import numpy as np
from numpy import random
# We need the plotting libraries
import matplotlib.pyplot as plt

# Define a function to approximate pi
# with Monte Carlo for sampling 'n_points'
# and plotting the results

def approximate_pi(n_points):
    
    # We have added some 'boiler-plate' code
    # to set up the plot here, add more code
    # around your pi approximation code from
    # earlier to make a plot
    
    # Hints:
    # Look for the matplotlib documentation
    # online to help
    
    # The best function to use will be the
    # plt.scatter function - the docs can
    # help with this

    # Set aspect ratios to equal
    plt.gca().set_aspect("equal")
    
    # Set x,y labels
    plt.xlabel("Your x axis label")
    plt.ylabel("Your y axis label")
    
    # Set x,y ranges
    plt.xlim([-1, 1])
    plt.ylim([-1, 1])
    
    # Plot square
    plt.plot([1,-1,-1,1,1],[1,1,-1,-1,1], color="white")
    
    # Plot circle (unfilled, so that sampled points are not hidden)
    circle = plt.Circle((0, 0), 1, edgecolor="black", facecolor="none")
    plt.gca().add_patch(circle)

    # Show plot
    plt.show()    
    
    return # Replace comment with return value


N_POINTS = 10
print(approximate_pi(N_POINTS))

#### Bonus Exercise: Accuracy of the approximation

Modify the above code to produce a chart showing how accurate the approximation to $\pi$ is for different values of $N$, similar to the one from [here](https://www.kaggle.com/code/soachishti/monte-carlo-tutorial-calculating-pi):

<img src="pi-approx.png" alt="Accuracy of pi approximation example" width="400"/>
