### 1.2 The WH 1-Lepton Channel

The process we are searching for is shown in the Feynman diagram below. A Higgs boson is radiated off a $W^\pm$ boson and subsequently decays to a pair of _b_-quarks. The $W^\pm$ boson decays to a charged lepton and the corresponding neutrino.

<img src="../docs/images/one-lepton.png" width="350" />


The final state products of a 1-lepton channel $H\rightarrow b\bar{b}$ process are:
   * A neutrino [characterised as missing transverse energy in the detector].
   * A charged lepton ($e$ or $\mu$) [characterised by its transverse momentum and direction].
   * 2 _b_-jets [characterised by their transverse momenta, directions, the distance between them and their reconstructed mass].


### 1.3 Separating Signal from Background


We identify what particles were created in collisions using kinematic and topological variables, which describe the properties of the objects reconstructed by the ATLAS detector. The signal and background processes all have a distinct signature based upon the underlying production and decay of the process, which can be identified from studying these variables. A list of the variables that can be used in this exercise is shown below: 



| Variable        | Description           | Label  |
| ------------- |:-------------:| -----:|
|$n_J$                   | Number of jets in the event (this is always 2 in this exercise) | nJ |
|$n_{\text{Tags}}$       | Number of *b*-tagged jets in the event (this is always 2 in this exercise) | nTags |
|$\Delta R(b_1b_2)$      | Angular distance between the two *b*-tagged jets | dRBB |
| $p_T^{B_1}$            | Reconstructed transverse momentum of the *b*-tagged jet with the highest $p_{T}$                      | pTB1 |
| $p_T^{B_2}$            | Reconstructed transverse momentum of the *b*-tagged jet with the 2nd highest $p_{T}$                      | pTB2 |
| $p_T^V$                | Reconstructed transverse momentum of the vector boson                      | pTV |
| $m_{BB}$               | Reconstructed mass of the Higgs boson from the two b-tagged jets                     | mBB |
| $m_{top}$              | Reconstructed top quark mass                     | Mtop |
| $m_{T}^{W}$              | Reconstructed transverse mass of the W boson                     | mTW |
| $E^{Miss}_{T}$         | Missing transverse energy                        | MET |
| $dY(W, H)$             | Separation in rapidity between the W boson and the Higgs candidate                        | dYWH |
| $d\phi(W, H)$          | Angular separation in the transverse plane between the W boson and the Higgs candidate                        | dPhiVBB |
| $MV1^{B1}_{\text{cont}}$        | The *b*-tagging classifier output for the leading jet (the higher the value, the more likely it is a *b*-jet) | MV1cB_cont |
| $MV1^{B2}_{\text{cont}}$        | The *b*-tagging classifier output for the sub-leading jet (the higher the value, the more likely it is a *b*-jet) | MV1cB2_cont |
| $n^{\text{Jets}}_{\text{cont}}$        | Number of additional jets found in the event | nTrackJetsOR |


Table 1: Kinematic and topological parameters used to identify events.
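To make two of the table entries concrete, here is a minimal Python sketch of how an angular distance such as `dRBB` and a transverse mass such as `mTW` are commonly defined. It assumes the standard textbook conventions, $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$ and $m_T = \sqrt{2\,p_T^{\ell}\,E_T^{\text{miss}}\,(1 - \cos\Delta\phi)}$; the function names are illustrative, and the exact definitions used to fill the dataset may differ.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal angle difference, wrapped into the range [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two objects in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def transverse_mass(pt_lep, phi_lep, met, phi_met):
    """Transverse mass of a lepton + missing-energy system (massless approximation)."""
    dphi = delta_phi(phi_lep, phi_met)
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(dphi)))


# Illustrative values only: a 40 GeV lepton back-to-back with 40 GeV of missing energy.
print(transverse_mass(40.0, 0.0, 40.0, math.pi))
print(delta_r(0.5, 0.1, -0.3, 2 * math.pi - 0.1))
```

Wrapping the azimuthal difference matters: two jets at $\phi = 0.1$ and $\phi = 2\pi - 0.1$ are close together, not almost a full turn apart.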



### 1.4 Sensitivity

The key metric used to judge how well the model performs is the _signal sensitivity_: given the expected number of signal events compared to the number of background events, it quantifies how likely you are to see the process of interest. Functions that calculate the _signal sensitivity_ are provided, but if you want to read more about how it is calculated, please refer to the profile likelihood ratio test (1(b) on p23) in this <a href="https://www.pp.rhul.ac.uk/~cowan/stat/cowan_munich16.pdf">talk</a>.
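As an illustration of the idea, the sketch below implements the widely used asymptotic (Asimov) approximation for the expected discovery significance of a counting experiment, $Z = \sqrt{2\left[(s+b)\ln(1+s/b) - s\right]}$, which reduces to $s/\sqrt{b}$ when $s \ll b$. This is an assumption about the kind of formula involved, not a copy of the functions provided with the notebooks. Combining bins in quadrature is also an assumption here, and the names `asimov_significance` and `combined_significance` are hypothetical.

```python
import math


def asimov_significance(s, b):
    """Expected significance for s signal events on top of b background events."""
    if b <= 0:
        raise ValueError("background yield must be positive")
    if s <= 0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def combined_significance(signal_bins, background_bins):
    """Combine independent bins by adding their significances in quadrature."""
    total = 0.0
    for s, b in zip(signal_bins, background_bins):
        if b > 0:  # skip bins with no background estimate
            total += asimov_significance(s, b) ** 2
    return math.sqrt(total)


# Illustrative yields only: for s << b the result is close to s / sqrt(b).
print(asimov_significance(10.0, 100.0), 10.0 / math.sqrt(100.0))
print(combined_significance([2.0, 5.0, 3.0], [200.0, 50.0, 10.0]))
```

Splitting the events into bins of a good discriminant typically gives a larger combined value than a single inclusive count, which is why the shape of the classifier output matters and not only a single cut on it.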

### 1.5 Tasks

Baseline
- Produce an optimised cut-based analysis that uses the di-jet mass as the discriminant, to serve as a baseline for comparison (you shouldn’t spend more than 1-2 days on this). A notebook that reads in the data, visualises the various distributions and calculates the signal sensitivity is provided as a starting point here:
	+ https://github.com/samvanstroud/in2HEP/blob/practicalMLproject/practicalMLproject/ATLAS_Cut_Based.ipynb
- Produce a simple optimised NN-based supervised classifier to separate signal from background. A notebook that reads in the data, draws the classifier output and calculates the _signal sensitivity_ is provided as a starting point here: 
	+ https://github.com/samvanstroud/in2HEP/blob/practicalMLproject/practicalMLproject/ATLAS_NN.ipynb
- Determine what the optimal settings are for the following parameters:
	+ Pre-processing of input variables
	+ Number of nodes in layers
	+ Number of hidden layers
	+ Training parameters
	+ Activation functions
	+ Optimisation algorithms
	+ Loss function
- Determine:
	+ Improvement over a cut-based approach
	+ If we have enough training statistics (vary number of input events used for training separately for signal and background from 0 -> 100%, does it plateau?)
	+ The importance of the input variables (remove one variable at a time and retrain, how much does the sensitivity degrade by?).
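
One possible way to approach the cut-based baseline is a brute-force scan over di-jet mass windows, keeping the window with the highest significance. The sketch below is a toy under stated assumptions: the events are randomly generated stand-ins (not the exercise data), the significance is the asymptotic counting-experiment approximation rather than the function provided in the notebook, and `scan_mbb_window` is a hypothetical helper name.

```python
import math
import random


def significance(s, b):
    """Asymptotic counting-experiment significance; 0 if either yield is empty."""
    if s <= 0 or b <= 0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def scan_mbb_window(sig, bkg, edges):
    """Try every (low, high) pair taken from `edges` and return (best_z, low, high).

    `sig` and `bkg` are lists of (mBB, event_weight) tuples.
    """
    best = (0.0, None, None)
    for i, low in enumerate(edges):
        for high in edges[i + 1:]:
            s = sum(w for m, w in sig if low <= m < high)
            b = sum(w for m, w in bkg if low <= m < high)
            z = significance(s, b)
            if z > best[0]:
                best = (z, low, high)
    return best


# Toy inputs (assumed shapes): a signal peak near 125 GeV on a flat background.
rng = random.Random(0)
toy_sig = [(rng.gauss(125.0, 15.0), 0.01) for _ in range(2000)]
toy_bkg = [(rng.uniform(0.0, 300.0), 0.5) for _ in range(5000)]
edges = list(range(0, 301, 20))

best_z, low, high = scan_mbb_window(toy_sig, toy_bkg, edges)
print(f"best window: [{low}, {high}) GeV, significance = {best_z:.2f}")
```

The same loop structure extends to cuts on additional variables, although the number of combinations grows quickly. Optimising the cuts and quoting the sensitivity on the same events gives an optimistic estimate, so a held-out sample is the safer choice.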

Possible extensions:
- Investigate which events are selected in the most sensitive region (the high NN output region), are these similar to those selected in the cut-based approach?
- Determine training uncertainty (how much does performance vary when re-training with an identical configuration a large number of times).
- What happens to the performance if we add some distortion to all the distributions (to replicate what would happen if our simulation is not an accurate depiction of the collision data)?
- What is the best way to perform the hyper-parameter optimisation (Bayesian, stochastic)? Compare to a grid search.
- Train a multi-classifier or MVA cascade for the different backgrounds (V+bb, tt, diboson). How does this compare to the single signal-vs-background classifier?
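
For the training-uncertainty extension, the bookkeeping can be sketched as follows: retrain with an identical configuration but a different random seed each time, then summarise the spread of the resulting sensitivities. The function `train_and_score` is a hypothetical placeholder for whatever code trains the network and returns its sensitivity; the stand-in used below only draws a random number so that the example runs on its own.

```python
import random
import statistics


def training_spread(train_and_score, n_repeats=20):
    """Run the same training n_repeats times, varying only the seed.

    Returns (mean, standard deviation, list of individual scores).
    """
    scores = [train_and_score(seed) for seed in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores), scores


def fake_train_and_score(seed):
    """Placeholder for a real training run: a made-up sensitivity with random jitter."""
    return 1.5 + random.Random(seed).gauss(0.0, 0.05)


mean, spread, scores = training_spread(fake_train_and_score, n_repeats=20)
print(f"sensitivity = {mean:.3f} +/- {spread:.3f} over {len(scores)} trainings")
```

A difference between two configurations that is smaller than this spread is hard to distinguish from training noise, which is worth keeping in mind when comparing hyper-parameter settings or removing input variables.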





**Based upon material originally produced by hackingEducation for use in outreach**  
<img src="../docs/images/logo-black.png" width="50" align = 'left'/>