# Discover the Higgs with Deep Neural Networks
# Chapter 7: Application for Higgs Search

In this chapter we will use the neural networks of the previous chapters to search for the Higgs boson.

In [None]:
# Necessary imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import seed
import os

# Import the tensorflow module to create a neural network
import tensorflow as tf
from tensorflow.data import Dataset

# Import function to split data into train and test data
from sklearn.model_selection import train_test_split

# Import some common functions created for this notebook
import common

# Random state
random_state = 21
# Seed NumPy and TensorFlow so that results are reproducible
seed(random_state)
tf.random.set_seed(random_state)

## Data Preparation

### Load the Data

In [None]:
# Define the input samples
sample_list_signal = ['ggH125_ZZ4lep', 'VBFH125_ZZ4lep', 'WH125_ZZ4lep', 'ZH125_ZZ4lep']
sample_list_background = ['llll', 'Zee', 'Zmumu', 'ttbar_lep']

In [None]:
sample_path = 'input'
# Read all the samples
no_selection_data_frames = {}
for sample in sample_list_signal + sample_list_background:
    no_selection_data_frames[sample] = pd.read_csv(os.path.join(sample_path, sample + '.csv'))

### Event Pre-Selection

Import the pre-selection functions saved during the first chapter. If the modules are not found, solve and execute the notebook of the first chapter.

In [None]:
from functions.selection_lepton_charge import selection_lepton_charge
from functions.selection_lepton_type import selection_lepton_type

In [None]:
# Create a copy of the original data frame to investigate later
data_frames = no_selection_data_frames.copy()

# Apply the chosen selection criteria
for sample in sample_list_signal + sample_list_background:
    # Selection on lepton type
    type_selection = np.vectorize(selection_lepton_type)(
        data_frames[sample].lep1_pdgId,
        data_frames[sample].lep2_pdgId,
        data_frames[sample].lep3_pdgId,
        data_frames[sample].lep4_pdgId)
    data_frames[sample] = data_frames[sample][type_selection]

    # Selection on lepton charge
    charge_selection = np.vectorize(selection_lepton_charge)(
        data_frames[sample].lep1_charge,
        data_frames[sample].lep2_charge,
        data_frames[sample].lep3_charge,
        data_frames[sample].lep4_charge)
    data_frames[sample] = data_frames[sample][charge_selection]

### Get Training and Test Data

In [None]:
# Split data to keep 40% for testing
train_data_frames, test_data_frames = common.split_data_frames(data_frames, 0.6)

## Statistical Significance

In the search for new physics, the question arises whether one has really discovered something new or whether the observation is just a random fluctuation in the data. If, for example, 25 events are expected but 30 events are measured, is this simply coincidence or is there an unknown phenomenon behind it? This question can be answered using the statistical significance.

First, a null hypothesis must be chosen. The measurement can then either reject this hypothesis or be compatible with it. However, a final confirmation of a hypothesis is not possible, since one can never be completely sure that deviations from the hypothesis do not exist. For our measurement we choose the following null hypothesis:

$H_0$: The Higgs boson does not exist and the measurement is fully described by the backgrounds.

Now we assume for the moment that the null hypothesis $H_0$ is correct. Under this assumption, we calculate the probability of obtaining a result that deviates at least as much from the prediction of $H_0$ as the actual measurement. Applied to the example above, this means that one expects 25 events and calculates the probability of a deviation of at least 5 events. The probability distribution for such counting experiments is described by the Poisson distribution. Its expectation value $\mu$ is given by the prediction $N_{pred}$ and its standard deviation $\sigma$ by $\sqrt{N_{pred}}$. For a sufficiently large number of expected events, the Poisson distribution approaches a Gaussian distribution.

The following plot visualizes this probability distribution for $N_{pred} = 25$. The probability (p-value) of a deviation of at least 5 events is 32%. This means that if our null hypothesis of 25 events is correct, a measurement at least as extreme as 30 events occurs with a probability of 32%. Thus it is quite likely that this deviation is only a fluctuation, and the null hypothesis of 25 events cannot be rejected.

<div>
<img src='figures/significance_pred_25_meas_30.png' width='500'/>
</div>

Let's assume that we have measured 35 events. The probability of such a fluctuation would be about 4.6%. In many scientific fields, such as medicine, null hypotheses are rejected if the p-value is below 5%.
<div>
<img src='figures/significance_pred_25_meas_35.png' width='500'/>
</div>

If we measured 40 events, this would correspond to a p-value of 0.3%. The null hypothesis could still be true, but the probability that such a measurement is only a fluctuation is very low.

<div>
<img src='figures/significance_pred_25_meas_40.png' width='500'/>
</div>
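
The percentages quoted above are consistent with a two-sided Gaussian approximation with $\sigma = \sqrt{N_{pred}}$ (this is our reading of the numbers, the plots themselves do not state it). A minimal sketch using only Python's standard library; the function name `two_sided_gaussian_p_value` is our own choice:

```python
import math

def two_sided_gaussian_p_value(n_pred, n_meas):
    """P(|N - N_pred| >= |N_meas - N_pred|) for a Gaussian with sigma = sqrt(N_pred)."""
    z = abs(n_meas - n_pred) / math.sqrt(n_pred)
    # For a standard normal variable X: P(|X| >= z) = erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))

for n_meas in (30, 35, 40):
    # prints 0.3173, 0.0455 and 0.0027
    print(f'N_meas = {n_meas}: p = {two_sided_gaussian_p_value(25, n_meas):.4f}')
```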

Instead of the p-value, the deviation is often expressed in units of standard deviations. The resulting significance $Z$ is the number of standard deviations by which the measured value deviates from the prediction.<br>
Thus, the statistical significance $Z$ is given by:<br>
$Z_{stat} = \frac{|N_{pred} - N_{meas}|}{\sqrt{N_{pred}}}$

For our previous examples, the following significances result:
- $N_{pred} = 25$ and $N_{meas} = 30$ $\rightarrow$ $Z_{stat} = 1$
- $N_{pred} = 25$ and $N_{meas} = 35$ $\rightarrow$ $Z_{stat} = 2$
- $N_{pred} = 25$ and $N_{meas} = 40$ $\rightarrow$ $Z_{stat} = 3$
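
The formula for $Z_{stat}$ translates directly into code. A small sketch (the helper name `z_stat` is hypothetical) that reproduces the three values in the list:

```python
import math

def z_stat(n_pred, n_meas):
    """Approximate significance |N_pred - N_meas| / sqrt(N_pred) in standard deviations."""
    return abs(n_pred - n_meas) / math.sqrt(n_pred)

for n_meas in (30, 35, 40):
    print(f'N_pred = 25, N_meas = {n_meas}: Z_stat = {z_stat(25, n_meas):.1f}')
```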

<b>This calculation of the significance is only an approximation. If the number of predicted events becomes too small, the approximation fails. Therefore, one should not use this approximation for $N_{pred}$ below 10 events.</b>
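
One way to see the breakdown numerically is to compare the exact Poisson probability of measuring at least $N_{meas}$ events with the one-sided Gaussian approximation, for two cases that both have $Z_{stat} = 3$. This is an illustrative sketch; the function names are our own:

```python
import math

def poisson_upper_tail(n_meas, mu):
    """Exact P(N >= n_meas) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)  # P(N = 0)
    cdf = 0.0
    for k in range(n_meas):
        cdf += term               # accumulate P(N = k)
        term *= mu / (k + 1)      # P(N = k + 1) from P(N = k)
    return 1.0 - cdf              # 1 - P(N <= n_meas - 1)

def gaussian_upper_tail(n_meas, mu):
    """One-sided Gaussian approximation with sigma = sqrt(mu)."""
    z = (n_meas - mu) / math.sqrt(mu)
    return 0.5 * math.erfc(z / math.sqrt(2))

# Both cases correspond to Z_stat = 3
for mu, n_meas in ((25, 40), (4, 10)):
    print(f'N_pred = {mu}, N_meas = {n_meas}: '
          f'Poisson {poisson_upper_tail(n_meas, mu):.5f}, '
          f'Gaussian {gaussian_upper_tail(n_meas, mu):.5f}')
```

For $N_{pred} = 4$ the exact Poisson tail is several times larger than the Gaussian estimate, so the approximation would overstate how unlikely the measurement is.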

<font color='blue'>
Task:

In the first chapter we obtained a prediction of 390.6 background events and 9.7 Higgs events. The prediction without the Higgs boson is our null hypothesis $H_0$. What significance can we expect for a measurement that contains the Higgs events? Would you reject the null hypothesis?
</font>

<font color='green'>
Answer:

For the significance calculation we have $N_{pred} = N_{bkg}$ and $N_{meas} = N_{bkg} + N_{Higgs}$:<br>
$Z_{stat} = \frac{|N_{bkg} - (N_{bkg} + N_{Higgs})|}{\sqrt{N_{bkg}}}$<br>
$Z_{stat} = \frac{N_{Higgs}}{\sqrt{N_{bkg}}}$<br>
$Z_{stat} = 0.49$
    
A measurement including the Higgs boson would deviate by only about half a standard deviation from the background-only prediction. Such a deviation is fully compatible with a statistical fluctuation, so we could not reject the null hypothesis or claim a Higgs observation.
</font>
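
The arithmetic of this answer can be checked in a few lines, using the yields quoted in the task:

```python
import math

n_bkg = 390.6    # expected background events (from the first chapter)
n_higgs = 9.7    # expected Higgs events (from the first chapter)

# N_pred = N_bkg and N_meas = N_bkg + N_Higgs, so |N_pred - N_meas| = N_Higgs
z = abs(n_bkg - (n_bkg + n_higgs)) / math.sqrt(n_bkg)
print(f'Z_stat = {z:.2f}')
```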

## Higgs Measurement with Neural Networks

Since the significance for the sum of all events is very low, we now apply our neural networks to boost the sensitivity. There are two options to improve the significance $Z$: increase the Higgs signal or decrease the backgrounds. Since our data is fixed, we cannot get more Higgs events and have to decrease the background contribution instead. This can be achieved with the classification from our neural networks. For each event, the neural network returns a score between 0 and 1, and the closer the score is to 1, the more likely the event is a Higgs event. Thus, we can apply a cut on this score, similar to the pre-selection in chapter 1, and use only the events with a classification score higher than the cut value for the significance calculation.

<div>
<img src='figures/significance_cut_value.png' width='500'/>
</div>
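
To make the idea concrete before using the real networks, here is a toy sketch. The scores are drawn from assumed Beta distributions (signal shifted towards 1, background towards 0), not from any of our models, and the event weights are chosen so that the totals match the 9.7 Higgs and 390.6 background events from the first chapter:

```python
import numpy as np

# Toy stand-in for network scores (NOT the real test data)
rng = np.random.default_rng(21)
n_sig_mc, n_bkg_mc = 2000, 20000
sig_scores = rng.beta(3, 2, size=n_sig_mc)  # assumed signal score shape
bkg_scores = rng.beta(2, 3, size=n_bkg_mc)  # assumed background score shape

# Equal per-event weights so the totals match 9.7 Higgs and 390.6 background events
sig_weights = np.full(n_sig_mc, 9.7 / n_sig_mc)
bkg_weights = np.full(n_bkg_mc, 390.6 / n_bkg_mc)

def significance_after_cut(cut_value):
    """Weighted yields above the cut and the resulting Z = N_sig / sqrt(N_bkg)."""
    n_sig = sig_weights[sig_scores > cut_value].sum()
    n_bkg = bkg_weights[bkg_scores > cut_value].sum()
    return n_sig, n_bkg, n_sig / np.sqrt(n_bkg)

for cut_value in (0.0, 0.25, 0.5):
    n_sig, n_bkg, z = significance_after_cut(cut_value)
    print(f'cut {cut_value:.2f}: {n_sig:.2f} signal, {n_bkg:.1f} background, Z = {z:.2f}')
```

Because the cut removes a larger fraction of the background than of the signal, $Z$ grows with the cut value in this toy, as long as enough background remains for the approximation to hold.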

To avoid a bias from the training, we only use the test data frames, which have not been used for any training so far.

In [None]:
# The training input variables
training_variables = ['lep1_pt', 'lep2_pt', 'lep3_pt', 'lep4_pt']

In [None]:
# Extract the values, weights, and classification of the test dataset
test_values, test_weights, test_classification = common.get_dnn_input(test_data_frames, training_variables, sample_list_signal, sample_list_background)

For the significance calculation we split the test data into signal and background events.

In [None]:
# Split the data in signal and background
test_signal_values = test_values[test_classification > 0.5]
test_signal_weights = test_weights[test_classification > 0.5]
test_bkg_values = test_values[test_classification < 0.5]
test_bkg_weights = test_weights[test_classification < 0.5]

Now we can use our neural networks to improve the significance. Let's try this procedure with our very first model, created in chapter 2.

In [None]:
# Load the model of chapter 2
model_chapter2 = tf.keras.models.load_model('models/chapter2_model')

As in the previous chapters, we apply our model, but now separately to signal and background events. To simplify the next steps, we transform the predictions into one-dimensional numpy arrays.

In [None]:
# Model prediction from chapter 2
test_signal_chapter2_prediction = model_chapter2.predict(test_signal_values)
test_bkg_chapter2_prediction = model_chapter2.predict(test_bkg_values)

# Transform prediction to a one-dimensional array
test_signal_chapter2_prediction = np.array([element[0] for element in test_signal_chapter2_prediction])
test_bkg_chapter2_prediction = np.array([element[0] for element in test_bkg_chapter2_prediction])

<font color='blue'>
Task:

In the following cell you can find the significance calculation for a given cut value. Vary the cut value and describe which effects you can see.
</font>

<font color='green'>
Answer:

A cut value of 0 is passed by all events, and thus we get the significance we calculated for the full data set.<br>
A cut value of 0.5 already rejects 69% of the background events while keeping 81% of the Higgs events, resulting in a significance of 0.7 sigma.<br>
However, the higher the cut value, the fewer Higgs events pass it. Thus, the significance improvement reaches its limit at a certain point. At a cut value of 0.8, only 2.9 Higgs events are expected to enter the significance calculation.
</font>

In [None]:
cut_value = 0.0
# Number of signal and background events passing the prediction selection
n_signal = test_signal_weights[test_signal_chapter2_prediction > cut_value].sum()
n_bkg = test_bkg_weights[test_bkg_chapter2_prediction > cut_value].sum()

# Significance
significance = n_signal / np.sqrt(n_bkg)

print(f'The prediction selection is passed by {round(n_signal, 2)} signal and {round(n_bkg, 2)} background events.')
print(f'This results in a significance of {round(significance, 3)}')

So what would be the best cut value and its corresponding significance?

<font color='blue'>
Task:

Define a function which applies a model on given signal and background events and calculates the significance for different cut values. Do the calculation in a for loop and break if the number of background events is lower than 10. Apply this function for the model of chapter 2.
</font>

In [None]:
def get_significances(model, signal_values, bkg_values, signal_weights, bkg_weights):
    # Model prediction
    signal_prediction = model.predict(signal_values)
    bkg_prediction = model.predict(bkg_values)

    # Transform prediction to a one-dimensional array
    signal_prediction = np.array([element[0] for element in signal_prediction])
    bkg_prediction = np.array([element[0] for element in bkg_prediction])
    
    # Calculate the significance for different cut values in a for loop
    cut_values = []
    significances = []
    for cut_value in np.linspace(0, 1, 1000):
        # Number of signal and background events passing the prediction selection
        n_signal = signal_weights[signal_prediction > cut_value].sum()
        n_bkg = bkg_weights[bkg_prediction > cut_value].sum()

        # Break if less than 10 background events
        if n_bkg < 10:
            break

        # Significance calculation
        significance = n_signal / np.sqrt(n_bkg)
        
        # Append the cut value and the significances to their lists
        cut_values.append(cut_value)
        significances.append(significance)
    return cut_values, significances

In [None]:
# Calculate the significances by the model of chapter 2
model_chapter2_cut_values, model_chapter2_significances = get_significances(model_chapter2, test_signal_values, test_bkg_values, test_signal_weights, test_bkg_weights)
print(model_chapter2_significances[:50])

Save this function for the next chapter.

In [None]:
%%writefile functions/get_significances.py
import numpy as np

In [None]:
from inspect import getsource, getmodulename
%save -a functions/get_significances.py getsource(get_significances)

Now let's plot the significance for different cut values.

In [None]:
# Plot the significances
fig, ax = plt.subplots(figsize=(7, 6))
ax.plot(model_chapter2_cut_values, model_chapter2_significances)
ax.set_title('Significances for model of chapter 2')
ax.set_xlabel('cut at prediction value')
ax.set_ylabel('significance')
ax.set_xlim(0, 1)
_ = plt.show()

<font color='blue'>
Task:

What is the best significance one can get with the model of chapter 2?
</font>

In [None]:
print(f'The best significance by the model of chapter 2 is {round(max(model_chapter2_significances), 3)}')
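
The maximum alone does not tell us which cut value produces it. A small helper (hypothetical name `best_cut`) can look up the position of the maximum with `np.argmax`; the lists below are made-up toy numbers, not results of the chapter 2 model:

```python
import numpy as np

def best_cut(cut_values, significances):
    """Return (cut value, significance) at the highest significance."""
    # np.argmax returns the index of the first maximum
    best_index = int(np.argmax(significances))
    return cut_values[best_index], significances[best_index]

# Hypothetical toy scan (NOT the chapter 2 results)
cuts = [0.0, 0.25, 0.5, 0.75]
zs = [0.49, 0.58, 0.70, 0.66]
print(best_cut(cuts, zs))  # (0.5, 0.7)
```

Applied to the lists returned by `get_significances`, e.g. `best_cut(model_chapter2_cut_values, model_chapter2_significances)`, it reports both the optimal cut and its significance.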

<font color='blue'>
Task:

Let's assume you have an extremely powerful neural network for the Higgs search. What would be the best possible significance you could get?
</font>

Hint: You still need at least 10 background events to apply our significance calculation.

<font color='green'>
Answer:

A perfect classifier would keep all 9.7 Higgs events while reducing the background to the minimum of 10 events. The best possible significance would then be:<br>
$Z_{stat;best} = \frac{9.7}{\sqrt{10}} = 3.07$
</font>

<font color='blue'>
Task:

Load the neural networks created in chapter 4 and chapter 5 and calculate their significances for different cut values. Compare the significances of all three models in one plot. Describe what you can see and compare their maximal significances.
</font>

In [None]:
# Load the models of chapter 4 and chapter 5
model_chapter4 = tf.keras.models.load_model('models/chapter4_model')
model_chapter5 = tf.keras.models.load_model('models/chapter5_model')

In [None]:
# Calculate the significances by the model of chapter 4 and chapter 5
model_chapter4_cut_values, model_chapter4_significances = get_significances(model_chapter4, test_signal_values, test_bkg_values, test_signal_weights, test_bkg_weights)
model_chapter5_cut_values, model_chapter5_significances = get_significances(model_chapter5, test_signal_values, test_bkg_values, test_signal_weights, test_bkg_weights)

In [None]:
# Plot the significances
fig, ax = plt.subplots(figsize=(7, 6))
ax.plot(model_chapter2_cut_values, model_chapter2_significances, label='chapter 2: first model')
ax.plot(model_chapter4_cut_values, model_chapter4_significances, label='chapter 4: early stopping')
ax.plot(model_chapter5_cut_values, model_chapter5_significances, label='chapter 5: event weights')
ax.set_xlabel('cut value')
ax.set_ylabel('significance')
ax.set_xlim(0, 1)
ax.legend()
_ = plt.show()

In [None]:
print(f'The best significance by the model of chapter 2 is {round(max(model_chapter2_significances), 3)}')
print(f'The best significance by the model of chapter 4 is {round(max(model_chapter4_significances), 3)}')
print(f'The best significance by the model of chapter 5 is {round(max(model_chapter5_significances), 3)}')

<font color='green'>
Answer:

All three significance distributions start at 0.49 and reach their maximum around a cut value of 0.75. The distributions for the models of chapter 2 and chapter 4 look very similar at first sight. Since in chapter 4 one third of the data was used for performance validation instead of training, a decrease in significance could be expected. However, this is compensated by training for the optimal duration. The clearest difference arises from the use of event weights: the neural network of chapter 5 yields the highest significance for all cut values, and its maximal significance is also clearly higher than for the other two models.
</font>