# Lab 5

- Do all features provide discrimination power between signal and background?
- Are there correlations among these features?
- Compute expected discovery sensitivity by normalizing each sample appropriately.
- Develop a plan to optimize the discovery sensitivity by applying selections to these features.

Pickle files: training sample and normalization

HDF5 files: pseudo-experiments

In [1]:
import pickle

# Open the file of interest and load it with pickle;
# the with-block closes the file afterwards
with open("qcd_100000_pt_1000_1200.pkl", "rb") as infile:
    new_dict = pickle.load(infile)

# List all keys (feature names) of the loaded sample
new_dict.keys()

# Print two variables, mass and d2, of the first 10 jets
# for i in range(10):
#     print(new_dict['mass'][i],new_dict['d2'][i])

Index(['pt', 'eta', 'phi', 'mass', 'ee2', 'ee3', 'd2', 'angularity', 't1',
       't2', 't3', 't21', 't32', 'KtDeltaR'],
      dtype='object')

# Background
Goal: create a simple 'analysis' that optimizes the selection of signal while minimizing background.

We are looking for the standard model Higgs boson in pp collisions at √s = 13 TeV. The Higgs bosons are produced with large transverse momentum (pT) and decay to a bottom quark-antiquark pair. We use a simulated set of LHC events:
- signal: Higgs decay to bb, forming jets
- background: generic QCD processes resulting in jets

Mass of the Higgs boson: $m_h = 125.09\pm 0.3$ GeV. Higgs production is rare: by a rough estimate, about 1 billion proton collisions are needed to produce one Higgs. At 13 TeV, the total cross section for pp → h production is closer to 40 pb, but this estimate is not bad. Sensitivity: assuming the Higgs boson exists, how significant an excess would I expect to record? We want high sensitivity so that the signal is visible above the background.
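
To make the sensitivity idea concrete, here is a minimal sketch of two common approximations to the expected significance of a counting experiment with expected signal yield `s` and background yield `b`. The function names are hypothetical and the yields are made-up placeholders, not values from this lab's samples.

```python
import math

def simple_significance(s, b):
    """Gaussian approximation s / sqrt(b); reasonable when b is large and s << b."""
    return s / math.sqrt(b)

def asimov_significance(s, b):
    """Asimov approximation: sqrt(2 * ((s + b) * ln(1 + s / b) - s))."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# Placeholder yields, assumed for illustration only
n_signal = 50.0
n_background = 2000.0

print(simple_significance(n_signal, n_background))
print(asimov_significance(n_signal, n_background))
```

Both numbers are in units of standard deviations. For small s/b the two agree closely, and the Asimov value is always slightly lower.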

[Generated data source](https://github.com/AlexSchuy/qsvm_jet_tagging/blob/master/qsvm_jet_tagging/generation/generate_samples.py)

[General info](https://arxiv.org/pdf/1709.04533.pdf)

[Subjet structure](https://arxiv.org/abs/1201.0008)

[Examples](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/PERF-2017-04/#tables)

ATLAS is a general-purpose detector for proton-proton collisions.
- Pseudorapidity η is a geometric quantity (equal to 0 when perpendicular to the z-axis). It is a function of the polar angle θ that goes from +∞ to −∞ as θ goes from 0 to π. The azimuthal angle ϕ goes around the beam.
- A boosted object is a particle produced with momentum much larger than its mass, so its decay products are collimated and can end up in a single jet.
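
As a quick numeric check of how pseudorapidity depends on the polar angle, the sketch below evaluates $\eta = -\ln\tan(\theta/2)$ at a few angles. The helper name `pseudorapidity` is an assumption made for illustration.

```python
import math

def pseudorapidity(theta):
    """Return eta = -ln(tan(theta / 2)) for a polar angle theta in radians, 0 < theta < pi."""
    return -math.log(math.tan(theta / 2.0))

# Evaluate eta from perpendicular to the beam (90 degrees) towards the beam axis
for degrees in (90, 45, 10, 1):
    theta = math.radians(degrees)
    print(degrees, round(pseudorapidity(theta), 3))
```

The value grows without bound as θ approaches the beam axis and changes sign under θ → π − θ.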

Standard model particles:

<img src="./images/standard_model.jpg" width="600"/>

## Jet

A jet is a group of particles that travel in the same direction. It is easiest to define with a cone of radius $R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ around a particle, including every particle inside the cone in the jet. Once jets are found they serve multiple purposes; one relies on the jet's 4-momentum, defined as the sum of the 4-momenta of all the particles in the jet, which is used to approximate the momentum of the primordial parton. Here we also want to see jet substructure: with increased angular resolution one can approach a complete classification of all the particles in the jet, rather than just the aggregate energy in a region. Substructure is also helpful for general particle searches.
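
The jet 4-momentum described above can be sketched as a component-wise sum. The function names and the two example particles are assumptions for illustration; each particle is written as `(E, px, py, pz)` in GeV.

```python
import math

def jet_four_momentum(particles):
    """Sum (E, px, py, pz) component-wise over the particles in the jet."""
    return tuple(sum(component) for component in zip(*particles))

def transverse_momentum(p4):
    """pT = sqrt(px^2 + py^2)."""
    _, px, py, _ = p4
    return math.hypot(px, py)

def invariant_mass(p4):
    """m = sqrt(E^2 - |p|^2), clipped at zero to guard against rounding."""
    e, px, py, pz = p4
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

# Two massless particles (made-up values)
particles = [(5.0, 3.0, 4.0, 0.0), (5.0, 3.0, -4.0, 0.0)]
jet = jet_four_momentum(particles)
print(jet, transverse_momentum(jet), invariant_mass(jet))
```

Each particle here is massless, yet the summed jet has a nonzero invariant mass because the particles are not exactly collinear. This is why jet mass carries information about the parent particle.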

Iterative clustering:
1. Calculate the pairwise distance $d_{ij}$ between every pair of objects.
2. Merge the two closest objects.
3. Repeat until no two objects are closer than some given R.

The result is n jets of size R.
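
The three steps above can be sketched as follows. This is a toy version under stated assumptions, not the algorithm used to build the lab's jets: objects are `(pt, eta, phi)` tuples, the distance is the angular separation ΔR, and merging uses a pt-weighted average direction as a simplification. All names and input values are hypothetical.

```python
import math

def delta_r(a, b):
    """Angular distance between two objects given as (pt, eta, phi)."""
    d_eta = a[1] - b[1]
    d_phi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)

def merge(a, b):
    """Combine two objects: add pt, take the pt-weighted eta and phi."""
    pt = a[0] + b[0]
    eta = (a[0] * a[1] + b[0] * b[1]) / pt
    # Measure phi relative to a so the wrap-around at +-pi is handled
    d_phi = (b[2] - a[2] + math.pi) % (2.0 * math.pi) - math.pi
    phi = (a[2] + b[0] * d_phi / pt + math.pi) % (2.0 * math.pi) - math.pi
    return (pt, eta, phi)

def cluster(particles, radius):
    """Merge the closest pair repeatedly until no pair is closer than radius."""
    objects = list(particles)
    while len(objects) > 1:
        # Step 1: compute all pairwise distances and keep the closest pair
        dist, i, j = min(
            (delta_r(objects[i], objects[j]), i, j)
            for i in range(len(objects))
            for j in range(i + 1, len(objects))
        )
        # Step 3: stop once the closest pair is at least `radius` apart
        if dist >= radius:
            break
        # Step 2: replace the two closest objects by their merged object
        merged = merge(objects[i], objects[j])
        objects = [o for k, o in enumerate(objects) if k not in (i, j)]
        objects.append(merged)
    return objects

# Made-up particles forming two well-separated groups
particles = [
    (50.0, 0.10, 0.00),
    (30.0, 0.15, 0.05),
    (40.0, -1.50, 2.00),
    (10.0, -1.45, 2.10),
]
jets = cluster(particles, radius=0.4)
print(len(jets))
```

Recomputing every pairwise distance on each pass keeps the sketch short but scales poorly; it is meant only to mirror the three steps.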

JADE algorithm: $d_{ij} = m_{ij}^2/Q^2$, where $m_{ij}$ is the invariant mass of the two objects and $Q$ is the total energy of the event.

## Exploring the data
Each sample contains 14 features (only high-level jet information):
'pt', 'eta', 'phi', 'mass', 'ee2', 'ee3', 'd2', 'angularity', 't1', 't2', 't3', 't21', 't32', 'KtDeltaR'

* Transverse momentum: $p_T = \sqrt{p_x^2 + p_y^2}$, the magnitude of the momentum transverse to the beam; should also vary depending on the mass, so it may be distinguishable
* Pseudorapidity: $\eta = \ln\cot\frac{\theta}{2}$, a function of θ equal to 0 when perpendicular to the z-axis; not as distinguishable given jets are grouped on θ
* Phi: $\phi = \tan^{-1}\frac{p_y}{p_x}$, dependent on the x and y momentum, so determinable
* Mass: most particle masses differ by enough GeV to tell them apart; the Higgs boson (about 125 GeV) is closest in mass to the Z boson (about 91 GeV).
* ee2: 2-point energy correlation function (ECF), computed with CalcEECorr; varies by jet
* ee3: 3-point energy correlation function (ECF)
* d2: ee3/ee2\**3; built to separate two-prong jets from one-prong jets, so it should be distinguishable
* angularity: sum_theta/mass
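* t1: 1-subjettiness $\tau_1$: how well the jet's constituents line up along a single axis
* t2: 2-subjettiness $\tau_2$: the same measure with two subjet axes; small when the jet is well described by two subjets
* t3: 3-subjettiness $\tau_3$: the same measure with three subjet axes
* t21: t2/t1; small values indicate a two-prong jet
* t32: t3/t2; small values indicate a three-prong jet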
* KtDeltaR: delta-R of two subjets within the large-R jet; could be discriminating depending on the decay and energy of the particle type
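
Three of the features are ratios of others. Assuming the definitions listed above hold for the stored columns, they can be recomputed from their ingredients as a consistency check. The helper name `derived_features` and the example values are hypothetical.

```python
def derived_features(jet):
    """Recompute the ratio features from their ingredients for one jet (a dict)."""
    return {
        "d2": jet["ee3"] / jet["ee2"] ** 3,
        "t21": jet["t2"] / jet["t1"],
        "t32": jet["t3"] / jet["t2"],
    }

# Made-up values for illustration, not taken from the lab's samples
example_jet = {"ee2": 0.1, "ee3": 0.002, "t1": 0.8, "t2": 0.4, "t3": 0.3}
print(derived_features(example_jet))
```

Because these ratios are functions of other columns, strong correlations between, say, `t21` and `t2` are expected by construction.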

In [10]:
# Exploring data

# Print three variables, mass, d2 and t1, of the first 10 jets
for i in range(10):
    print(new_dict['mass'][i],new_dict['d2'][i], new_dict['t1'][i])

272.01088105661836 3.244342529835572 0.9616974917894365
139.7944080424717 6.481473283811561 0.8960029743624829
245.4131462450914 2.6347878568119545 0.8616781285948955
89.97591625467423 13.389844946183699 0.9529173295276647
85.89395606420216 11.342156212232954 0.8687701882337261
298.82680391098984 2.525105538419591 0.7177227592967348
120.11684917359369 3.0180758748270313 1.1359647157110184
144.69776811306357 7.180292494621935 1.0246857236856954
99.67094594080851 7.004058591448358 1.0012372993211158
246.64401978773606 1.2917788826098395 1.1627538638154173
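
As a first look at correlations between features, the sketch below computes a Pearson correlation coefficient between mass and d2 using only the ten background jets printed above (values rounded). Ten jets are far too few for a meaningful estimate; this only illustrates the calculation, and the helper name `pearson` is an assumption.

```python
import math

# First 10 background jets, rounded from the printed output above
mass = [272.01, 139.79, 245.41, 89.98, 85.89, 298.83, 120.12, 144.70, 99.67, 246.64]
d2 = [3.244, 6.481, 2.635, 13.390, 11.342, 2.525, 3.018, 7.180, 7.004, 1.292]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

r_mass_d2 = pearson(mass, d2)
print(r_mass_d2)
```

In these ten jets the coefficient comes out negative: the heavier jets tend to have smaller d2. The same function can be applied to any pair of columns in the full sample.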
