Version 0.0.2

For the graphical application and the events for Ispy-WebGL, `for_masterclass.zip` must be unpacked into the 'data' folder.
The '.ig' files must be imported into Ispy-WebGL after the mixing/splitting has been performed. Detailed information about the imported .ig files can be found under 'detailed_information'.

In this task, pupils assign event images to specific decays and, by summarising their results, judge how significant an observed excess of events is.

In [None]:
import sys
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

sys.path.append("..")

The event images used for this task are taken from the [published data sets from 2011/2012](http://opendata.cern.ch/record/5500). The 'data' folder contains example events with a decay into two Z bosons and then four leptons. There are also decays into two photons, or into two W bosons and then two leptons and two neutrinos. The non-$H\rightarrow ZZ \rightarrow 4\ell$ decays are included to show the different decay possibilities. For the subsequent summary of the events, only the decays into four leptons are used. The events are displayed with the [Ispy-webgl interface](http://ispy-webgl.web.cern.ch/ispy-webgl/)<sup>[1](https://iopscience.iop.org/article/10.1088/1742-6596/396/2/022022)</sup>.


The task of the students is to find certain decays in the pre-selected events and to determine the invariant mass for decays into four leptons.
Missing quantities, such as the components of the individual four-momenta, can be determined by the students using basic vector calculus (see the example function `get_energy_px_py_pz` below).
With these and other variables, the students can determine the invariant mass using $$M_{\mathrm{inv}}=\sqrt{\left( \sum_i E_i \right)^2 - \left( \sum_i \vec{p}_i \right)^2}\, .$$ The electric charges of the leptons from a Z boson decay, and whether the leptons of a pair belong to the same family, can be read from the additional information shown in the Ispy-webgl interface when a single particle is selected.

To calculate invariant masses or other quantities they consider important, the students are free to write their own functions or to use the ones already provided:

In [None]:
# May be implemented by pupils
import numpy as np

def get_energy_px_py_pz(pt:list, eta:list, phi:list):
    pt, eta, phi = np.array(pt), np.array(eta), np.array(phi)
    # eta = -ln(tan(theta/2))
    theta = 2 * np.arctan(np.exp(-eta))
    # |p| = pt / sin(theta)
    p = pt / np.sin(theta)
    # pz = |p| * cos(theta) keeps the sign of eta (a square root would lose it)
    px, py, pz = pt * np.cos(phi), pt * np.sin(phi), p * np.cos(theta)
    # m << E
    energy = p
    return energy, px, py, pz
    
def invariant_mass_four_lepton(px: list, py: list, pz:list, energy=None):
    px, py, pz= np.array(px), np.array(py), np.array(pz)
    energy_sum = np.sum(energy) if energy is not None else np.sum(np.sqrt(px ** 2 + py ** 2 + pz ** 2))
    return np.sqrt(energy_sum ** 2 - (np.sum(px) ** 2 + np.sum(py) ** 2 + np.sum(pz) ** 2))

#...

$\eta$ is the pseudorapidity, a spatial coordinate that describes the angle between a particle's momentum and the beam axis; the function above converts it back to the polar angle $\theta$. The beam axis points in the z direction. The transverse momentum lies in the x-y plane and is described by its magnitude and the azimuthal angle $\phi$.
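The relation between $\eta$, $\theta$ and the momentum components can be checked numerically. The following is a minimal sketch; the helper names `eta_to_theta` and `momentum_components` and the input values are illustrative assumptions and not part of the provided code.

```python
import numpy as np

def eta_to_theta(eta):
    """Polar angle theta (radians) from pseudorapidity eta = -ln(tan(theta/2))."""
    return 2.0 * np.arctan(np.exp(-np.asarray(eta, dtype=float)))

def momentum_components(pt, eta, phi):
    """Cartesian momentum components from (pt, eta, phi); the beam axis is z."""
    pt, eta, phi = (np.asarray(v, dtype=float) for v in (pt, eta, phi))
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)  # same as pt / tan(theta); negative for negative eta
    return px, py, pz

# Illustrative values: three particles with the same pt and different eta
pt = np.array([10.0, 10.0, 10.0])
eta = np.array([0.0, 1.0, -1.0])
phi = np.array([0.0, 0.0, 0.0])

theta = eta_to_theta(eta)
px, py, pz = momentum_components(pt, eta, phi)
print(np.degrees(theta))  # eta = 0 corresponds to 90 degrees
print(pz)                 # the sign of pz follows the sign of eta
```

A particle with $\eta = 0$ moves perpendicular to the beam axis, while positive and negative $\eta$ point into opposite halves of the detector.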

The energy calculation ($E^2 = m^2 + p^2 \stackrel{p\gg m}{\approx} p^2$) uses the fact that the momenta considered (> 5 GeV) are much larger than the rest masses of the electron (0.51 MeV) and the muon (105.7 MeV).
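How good this approximation is can be estimated with a few lines of code. This is only a sketch: the function name `relative_energy_error` is an assumption made for illustration, and the masses are the values quoted above, converted to GeV.

```python
import numpy as np

# Rest masses in GeV (values quoted in the text)
M_ELECTRON = 0.51e-3
M_MUON = 105.7e-3

def relative_energy_error(p, m):
    """Relative error made by setting E = p instead of E = sqrt(m^2 + p^2)."""
    exact = np.sqrt(m ** 2 + p ** 2)
    return (exact - p) / exact

p = 5.0  # GeV, the lower momentum bound mentioned in the text
err_mu = relative_energy_error(p, M_MUON)
err_e = relative_energy_error(p, M_ELECTRON)
print(f"muon:     {err_mu:.2e}")
print(f"electron: {err_e:.2e}")
```

Even for the heavier muon at the lower momentum bound, the error stays well below 0.1 %, and it shrinks further with increasing momentum.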

In [None]:
# possible calculations

# my_pt = []
# my_eta = []
# my_phi = []

# my_energy, my_px, my_py, my_pz = get_energy_px_py_pz(pt=my_pt, eta=my_eta, phi=my_phi)
# my_mass = invariant_mass_four_lepton(px=my_px, py=my_py, pz=my_pz, energy=my_energy)
# print(my_mass)

In [None]:
%%html
<iframe src="https://ispy-webgl.web.cern.ch/ispy-webgl/" width="100%" height="700"></iframe>

Students can then use the graphical application to compile the results into a histogram. If ipyparallel is not installed, it is recommended to start the widget from a separate console so that work can continue in parallel.

In [None]:
from include.widget.HiggsWidget import WidgetHiggs as WH

In [None]:
def call_UI(my_function=None):
    WH(b_num=37, hist_range=(70, 181), info=[["2012"], ["B", "C"]],  # bins, range, records: [["year1", "year2"], ["run1", "run2"]]
       mc_dir_="../data/for_widgets/mc_aftH",  # Folder with the background simulations and the 125 GeV signal simulation
       mc_other_dir="../data/for_widgets/other_mc/d---/mc_aftH",  # Folder with further signal simulations
       stat_eval_func=my_function # Statistical evaluation function, see below
       )
call_UI()

Under 'View - MC simulations on' the simulations of Higgs hypotheses for other Higgs masses can be viewed and under 'View - Signal MC scaling on' the simulations can be scaled accordingly.

To evaluate which of the signal simulations is most appropriate, the pupils define (and implement) a quantity that measures the difference between two hypotheses. In 'for_pupils_statistcs_basic_examples_v_0-0-1_EN', the significance value $p_0$, determined from $\chi^2$, is presented for the evaluation of a single hypothesis. Now two hypotheses are to be compared directly: $H_0$, background only (blue), and $H_{1, i}$, background plus a signal hypothesis of mass $m_i$. The quantity can be chosen freely and can be adapted by modifying the following function.

The quantity introduced here is the ratio of two [likelihood functions](https://en.wikipedia.org/wiki/Likelihood_function). A likelihood function can be interpreted as the total probability of obtaining the measurement $\{X_1, X_2, \ldots, X_n \}$, given the known probabilities of the individual events. To use the example from `for_pupils_statistcs_basic_examples_v_0-0-1_EN`: how probable is it to roll $\{1,1,2,6,3\}$ with a fair die?

The answer is $$P_{\mathrm{tot}} = \prod_{X_i\in \{1,1,2,6,3\}} P(X_i) = P(1)\cdot P(1)\cdot P(2)\cdot P(6)\cdot P(3) = \frac{1}{6^5} \, .$$ If the order is not important, an additional factor appears for the number of possible orderings, but this factor cancels later when the ratio is formed.
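The dice example can be reproduced with exact fractions. The sketch below is illustrative only; the function names `ordered_probability` and `number_of_orderings` are assumptions and do not come from the accompanying notebooks.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def ordered_probability(rolls, sides=6):
    """Probability of observing exactly this sequence with a fair die."""
    return Fraction(1, sides) ** len(rolls)

def number_of_orderings(rolls):
    """Number of distinct orderings of the rolls (multinomial coefficient)."""
    n = factorial(len(rolls))
    for count in Counter(rolls).values():
        n //= factorial(count)
    return n

rolls = [1, 1, 2, 6, 3]
p_ordered = ordered_probability(rolls)
n_orderings = number_of_orderings(rolls)
print(p_ordered)                # 1/7776
print(n_orderings)              # 60
print(p_ordered * n_orderings)  # 5/648
```

The factor 60 depends only on the measured values and not on the hypothesis being tested, which is why it drops out of a ratio of two likelihoods.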
The likelihood function can be defined as $$\mathcal{L} = \prod_{i=1}^{N} P(X_i)\, ,$$ and thus has the same form as $P_{\mathrm{tot}}$. If the outcomes are continuous rather than discrete, the probability of the individual events can be expressed as a function. In this application only histograms are considered. For these, $P(X_i)$ can be written as $$ P(X_i) = \frac{A_i^{X_i} }{X_i!} \mathrm{e}^{-A_i} \, ,$$ which corresponds to Poisson statistics. $X_i$ is the number of measured events in a bin of the histogram, and $A_i$ is the expected number of events in that bin. In the ratio of the two likelihood functions, $A_i = A_{i,U}$ is used for the background-only hypothesis and $A_i = A_{i, U+S}$ for background plus signal. For numerical stability, the logarithm of the ratio is calculated.
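Because the $X_i!$ terms are the same under both hypotheses, they cancel in the ratio, leaving $\ln(\mathcal{L}_{U+S}/\mathcal{L}_{U}) = \sum_i \left[ X_i \ln\left(A_{i,U+S}/A_{i,U}\right) - (A_{i,U+S} - A_{i,U}) \right]$. The following sketch checks this numerically; the function names and the three-bin histogram are invented for illustration and are not taken from the data.

```python
import numpy as np
from math import lgamma

def poisson_log_likelihood(observed, expected):
    """ln L = sum_i [X_i ln(A_i) - A_i - ln(X_i!)] for independent Poisson bins."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    log_factorial = np.array([lgamma(x + 1.0) for x in observed])
    return float(np.sum(observed * np.log(expected) - expected - log_factorial))

def log_likelihood_ratio(observed, background, signal):
    """ln(L_{U+S} / L_U); the X_i! terms cancel in the ratio."""
    x, b, s = (np.asarray(v, dtype=float) for v in (observed, background, signal))
    return float(np.sum(x * np.log((b + s) / b) - s))

# Hypothetical three-bin histogram, invented for illustration only
observed = [3, 7, 2]
background = [2.5, 3.0, 2.2]
signal = [0.1, 3.5, 0.2]

direct = (poisson_log_likelihood(observed, np.add(background, signal))
          - poisson_log_likelihood(observed, background))
simplified = log_likelihood_ratio(observed, background, signal)
print(direct, simplified)  # both expressions agree
```

Working with the sum of logarithms avoids multiplying many small probabilities, which could underflow for histograms with many bins.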

In [None]:
# Exemplary code

def statistical_evaluation(measurement,  # Measurement
                           background_simulation,  # Simulation of the background
                           signal_simulation,  # Simulation of the signal
                           background_name="b",  # Name of the background (optional)
                           signal_name="s"  # Name of the signal (optional)
                           ):

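    # Negative log-likelihood for background only, up to a term that depends only on the
    # measurement and cancels in the difference below; bins with m == 0 are skipped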
    b_nll = sum(bac_s - m + m * np.log(m / bac_s) for (m, bac_s) in zip(measurement,
                                                                        background_simulation) if float(m) != 0.0)
    # Negative log-likelihood for background plus signal simulation (same convention)
    bs_nll = sum((bac_s + sig_s) - m + m * np.log(m / (bac_s + sig_s)) for (m, bac_s, sig_s) in zip(measurement, 
                                                                                                    background_simulation, 
                                                                                                    signal_simulation) if float(m) != 0.0)
    
    # Logarithm of the likelihood ratio: 2 * (b_nll - bs_nll) = 2 ln(L_{b+s} / L_b).
    # The factor 2 is included so that the value is easier to compare with other quantities
    nlr_ = 2 * (b_nll - bs_nll)
    # Only an excess of events is of interest; an excess gives a positive value:
    q0_ = np.round(nlr_, 3) if nlr_ > 0 else 0
    
    # naming, optional
    bn_, sn_ = background_name, signal_name
    name_ = f"$ 2 \\ln \\left( \\dfrac{{ \\mathcal{{L}}_{{ {bn_} + {sn_} }} }}{{ \\mathcal{{L}}_{{ {bn_} }} }}  \\right)$"
    
    # The return value must be a tuple of the name (str) and the value (float/int)
    return name_, q0_

The larger the value of the ratio of the two likelihood functions, the better the combination of signal and background describes the measurement compared with the background alone. The choice of a limit value above which the hypothesis $H_0$ is rejected is to a certain extent arbitrary. A high limit reduces, but does not exclude, the probability of rejecting the null hypothesis $H_0$ although it is correct ([type I error](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors#Definition)). The problem of retaining the null hypothesis although the alternative hypothesis is true ([type II error](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors#Definition)) occurs when the signal cannot be distinguished from the statistical fluctuations that always occur in such measurements.
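How the choice of the limit affects the type I error can be illustrated with pseudo-experiments. The sketch below is a simplified illustration and not the evaluation used by the widget: the function names `q0` and `type_one_error_rate` and the bin contents are assumptions, and unlike `statistical_evaluation` above, the statistic here also includes empty bins.

```python
import numpy as np

def q0(measurement, background, signal):
    """2 ln(L_{b+s} / L_b) for Poisson bins, set to zero if negative."""
    m, b, s = (np.asarray(v, dtype=float) for v in (measurement, background, signal))
    value = 2.0 * np.sum(m * np.log((b + s) / b) - s)
    return max(float(value), 0.0)

def type_one_error_rate(background, signal, threshold, n_toys=20000, seed=1):
    """Fraction of background-only pseudo-experiments with q0 above the threshold."""
    rng = np.random.default_rng(seed)
    # Each row is one pseudo-measurement drawn from the background expectation
    toys = rng.poisson(lam=background, size=(n_toys, len(background)))
    values = np.array([q0(toy, background, signal) for toy in toys])
    return float(np.mean(values > threshold))

# Hypothetical expectations per bin, invented for illustration only
background = [2.5, 3.0, 2.2, 1.8]
signal = [0.1, 3.5, 1.0, 0.1]

for threshold in (1.0, 4.0, 9.0):
    rate = type_one_error_rate(background, signal, threshold)
    print(f"threshold {threshold:>4}: type I error rate ~ {rate:.4f}")
```

Raising the threshold lowers the fraction of background-only pseudo-experiments that would wrongly be counted as a signal, at the price of making a real signal harder to confirm.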

The signal distributions are generated from the known theory. A signal scaling factor $\mu$ that deviates from 1 can indicate that the theory does not cover all relevant effects. At the same time, the statistical fluctuation of the measurement must be taken into account; it also shifts the scaling $\mu$, because only integer numbers of events can be measured. The non-integer predictions result from scaling the simulation to the integrated luminosity of the measurement, similar to 'for_pupils_statistcs_basic_examples_v_0-0-1_EN'.
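One possible way to estimate a signal scaling $\mu$ from a histogram is to scan a grid of values and keep the one with the largest Poisson likelihood. This is a sketch under assumptions: the function names, the grid and the bin contents are invented for illustration, and the widget's own scaling option may work differently.

```python
import numpy as np

def log_likelihood(measurement, expected):
    """Poisson ln L without the constant ln(X_i!) term."""
    m = np.asarray(measurement, dtype=float)
    a = np.asarray(expected, dtype=float)
    return float(np.sum(m * np.log(a) - a))

def best_signal_scaling(measurement, background, signal, mu_grid):
    """Return the mu on the grid that maximises ln L for background + mu * signal."""
    b = np.asarray(background, dtype=float)
    s = np.asarray(signal, dtype=float)
    values = [log_likelihood(measurement, b + mu * s) for mu in mu_grid]
    return float(mu_grid[int(np.argmax(values))])

# Hypothetical three-bin histogram, invented for illustration only
measurement = [3, 7, 2]
background = [2.5, 3.0, 2.2]
signal = [0.1, 3.5, 0.2]

mu_grid = np.linspace(0.0, 3.0, 301)  # step size 0.01
mu_hat = best_signal_scaling(measurement, background, signal, mu_grid)
print(mu_hat)
```

With so few events per bin, the preferred $\mu$ changes noticeably if a single event is added or removed, which is the fluctuation effect described above.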


Thus, the students should determine the most appropriate signal simulation, taking into account the scaled signal distributions and the previously defined limit that the ratio must exceed.

In [None]:
call_UI(my_function=statistical_evaluation)