# Coding Dojo Working Notebook: Plotting Jet Kinematic Variables

## Jet Basics

Protons are composed of quarks and gluons. When protons collide at high energies at the LHC, these constituents can fly apart. However, color confinement dictates that color-charged particles such as quarks and gluons cannot exist in isolation. The freed quarks and gluons therefore undergo a process called hadronization, in which they combine with quarks and antiquarks spontaneously created from the vacuum, producing color-neutral particles.

This hadronization process results in a shower of particles known as a jet. To distinguish between particles that come from quarks/gluons (jets) and those from other processes, we employ clustering algorithms which compute a measure of distance and combine particles that meet specific criteria. The measure of distance between a pair of particles is given by:

$$
d_{ij} = \min\left(p_{ti}^{2p}, p_{tj}^{2p}\right)\frac{\Delta R_{ij}^2}{R^2}
$$

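Here $\Delta R_{ij}$ is the angular separation between particles $i$ and $j$, $R$ is the jet radius parameter, and $d_{iB} = p_{ti}^{2p}$ is the distance of particle $i$ to the beam. At each step, if the smallest distance is a $d_{ij}$, particles $i$ and $j$ are combined; if it is a $d_{iB}$, particle $i$ is declared a jet and removed from the list. The procedure repeats until no particles remain. The parameter $p$ depends on the algorithm being used:
* KT algorithm: $p=1$
* Cambridge/Aachen (CA) algorithm: $p=0$
* anti-KT (AK) algorithm: $p=-1$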

For the anti-KT algorithm, jet cones can have one of two radii: $R=0.8$ for AK8 ("fat jets") and $R=0.4$ for AK4.
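
To make the distance measure concrete, here is a minimal sketch that evaluates it for one pair of particles. It assumes the standard beam distance $d_{iB} = p_{ti}^{2p}$ and uses pseudorapidity in $\Delta R$ (for massless particles this coincides with rapidity). The two particles and the function names are hypothetical, chosen only for illustration.

```python
import math

def delta_r2(eta1, phi1, eta2, phi2):
    """Squared angular separation, with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return (eta1 - eta2) ** 2 + dphi ** 2

def d_ij(pt_i, pt_j, dr2, R=0.4, p=-1):
    """Pairwise distance: min(pt_i^2p, pt_j^2p) * dR^2 / R^2."""
    return min(pt_i ** (2 * p), pt_j ** (2 * p)) * dr2 / R ** 2

def d_iB(pt_i, p=-1):
    """Beam distance (assumed standard definition): pt_i^2p."""
    return pt_i ** (2 * p)

# Hypothetical particles: (pt [GeV], eta, phi)
hard = (100.0, 0.0, 0.0)
soft = (5.0, 0.1, 0.2)

dr2 = delta_r2(hard[1], hard[2], soft[1], soft[2])
pair = d_ij(hard[0], soft[0], dr2)          # anti-KT (p = -1), R = 0.4
beam = min(d_iB(hard[0]), d_iB(soft[0]))    # smallest beam distance
merge = pair < beam                         # True: the soft particle joins the hard one
```

With $p=-1$ the hard particle dominates the `min`, so nearby soft particles are merged into it first. This is why anti-KT jets grow around their hardest constituents.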

In this Coding Dojo, you will be plotting the kinematic variables of AK4 jets, namely the mass, the $p_T$, the pseudorapidity ($\eta$) and the azimuthal angle ($\phi$).
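
As a reminder of what these four variables mean, the sketch below computes them from the Cartesian components of a four-momentum. The function name and the example numbers are hypothetical. The data file used in this notebook already stores the variables directly, so this is only for orientation.

```python
import math

def kinematics(px, py, pz, energy):
    """Return (pt, eta, phi, mass) for a four-momentum in GeV. Requires pt > 0."""
    pt = math.hypot(px, py)                      # transverse momentum
    phi = math.atan2(py, px)                     # azimuthal angle in (-pi, pi]
    eta = math.asinh(pz / pt)                    # same as -ln(tan(theta / 2))
    m2 = energy ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    mass = math.sqrt(max(m2, 0.0))               # guard against tiny negative m^2
    return pt, eta, phi, mass

# Hypothetical jet: purely transverse, so eta is 0
pt, eta, phi, mass = kinematics(3.0, 4.0, 0.0, 13.0)
```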

## Setup

If you do not already have the data, go back to your terminal and download it. (Hint: Look up what `wget` does and how it works.)

## Instructions
For AK4 jets, plot histograms for each of the kinematic variables using the tools specified in the following table.

<table>
  <tr>
    <th>AK4 Jet Kinematic Variable</th>
    <th>Tools to Plot With</th>
  </tr>
  <tr>
    <td>Jet eta</td>
    <td>PyROOT</td>
  </tr>
  <tr>
    <td>Jet phi</td>
    <td>RDataFrame</td>
  </tr>
  <tr>
    <td>Jet pt</td>
    <td>Uproot, Hist</td>
  </tr>
  <tr>
    <td>Jet mass</td>
    <td>Matplotlib</td>
  </tr>
</table>

If you finish before time is up, there is an extra exercise.

## Hints:
1. Check the content of data.root (e.g. What is the name of the TTree?)
    - The file contains multiple TTrees
    - Only the **first** one is relevant to this exercise
2. Check the content of TTree (e.g. What branches does TTree have? What is the branch name of the quantity that we want to plot?)
3. Fill a histogram using the branch and draw

In [None]:
# check the content of data.root:


## `PyROOT` for AK4 Jet $\eta$

**To-do**: Plot a histogram for jet $\eta$ using PyROOT

**Link to PyROOT tutorials:** https://github.com/Ari-mu-l/software-carpentry

Import relevant package(s)

Read the ROOT File

Load the Tree from the ROOT file (Use the name of the first TTree from the step "check the content of data.root")

Check the branch names of the TTree and select the desired branch (jet eta)

- Hint1: Use the Print() function

- Hint2: We often act on an object with the format of \<object\>.\<some function\> like hist.Draw()

- Hint3: If you cannot figure it out, try to google for answers!

Create a canvas to plot the histogram

Create an empty Histogram to plot the AK4 Jet $\eta$

- Requirement: Range from -6 to 6. Split into 100 bins.

Fill the histogram with jet $\eta$

- The code has been written for you.

- Read the code and explain what each line is doing with the help of the documentation: https://root.cern.ch/doc/master/classTTree.html .
  If it doesn't contain everything you need, use google.

Write it down in this cell (double click to edit) or leave comments in the code:

In [None]:
nEntries = tree.GetEntries()

for i in range(nEntries):
    if tree.GetEntry(i) > 0:
        for j in range(len(tree.Jet_eta)):  # Loop through all the jets in each event
            hist.Fill(tree.Jet_eta[j])
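
The structure of the loop above can be mimicked without ROOT. The sketch below uses a hypothetical list of lists in place of the tree (one inner list of jet $\eta$ values per event) and a plain NumPy array of counts in place of the ROOT histogram. It is only an analogy for the event loop and jet loop, not a replacement for the PyROOT code.

```python
import numpy as np

# Hypothetical stand-in for the tree: one list of jet eta values per event.
events = [[0.5, -1.25], [], [2.45], [-0.3, 0.1, 4.9]]

edges = np.linspace(-6, 6, 101)        # 101 edges define 100 bins from -6 to 6
counts = np.zeros(100, dtype=int)

for jets in events:                    # outer loop: one iteration per event
    for eta in jets:                   # inner loop: one iteration per jet
        idx = np.searchsorted(edges, eta, side="right") - 1
        if 0 <= idx < 100:             # skip values outside the histogram range
            counts[idx] += 1
```

Note that the number of histogram entries equals the number of jets, not the number of events: an event with no jets adds nothing, and an event with three jets adds three entries.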

Draw the histogram on canvas

## `RDataFrame` for AK4 Jet $\phi$

**To-do**: Plot the distribution of jet $\phi$

**Requirement**: the histogram should range from -2 to 2 and have 10 bins

**Documentation** of relevant functions: https://root.cern/doc/master/classROOT_1_1RDataFrame.html

Import relevant package (hint: ROOT)

Load the TTree into an RDataFrame (i.e. create an RDataFrame from the content of the TTree)

Check the column names

Fill a histogram with the desired branch

- Requirement: the histogram should range from -2 to 2 and have 10 bins
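
It helps to know what this binning implies before plotting. Assuming $\phi$ follows the usual convention of spanning $[-\pi, \pi]$, a histogram from -2 to 2 will not contain every jet. The sketch below illustrates this with NumPy on hypothetical, evenly spaced $\phi$ values. It is not RDataFrame code.

```python
import numpy as np

# Hypothetical phi values spread evenly over the full range [-pi, pi)
phi = np.linspace(-np.pi, np.pi, 1000, endpoint=False)

# 10 bins from -2 to 2; values outside the range are not counted
counts, edges = np.histogram(phi, bins=10, range=(-2, 2))

bin_width = edges[1] - edges[0]        # 4 / 10 = 0.4
fraction = counts.sum() / phi.size     # roughly 2/pi of the values are in range
```

In ROOT, the entries outside the range are not lost: they are stored in the underflow and overflow bins, which are not drawn by default.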

Plot histogram

## `Hist` for Jet $p_T$

**To-do**: Plot the distribution of Jet $p_T$

**Requirement**: the histogram should range from 0 to 500 and have 500 bins

**Link to Uproot tutorial**: https://hsf-training.github.io/hsf-training-scikit-hep-webpage/

Import relevant packages (hint: Hist from hist, awkward, uproot)

Load data.root data into a variable with uproot

Use the `.array()` method with `library='ak'` to extract the Jet_pt data. Look for Jet_pt in the 'Events' TTree.

Flatten the Jet_pt data

Create, fill and plot a histogram:
- Requirement: the histogram should range from 0 to 500 and have 500 bins

## `Matplotlib` for AK4 Jet mass

This is probably the longest method of making plots with our data. So we've provided detailed instructions and starter code. 

Reuse the code from the Hist section, i.e.:

1. Open the file with uproot
2. Get the Jet_mass branch as an awkward array.
3. Make the histogram by flattening and using `<myhist>.fill()`. *However, this time do not use `<myhist>.plot()`*.

`Hist` calls `matplotlib.pyplot` internally and allows you to make quick plots. Now that you have plotted a histogram with hist objects in the previous section, it is your turn to make it look pretty.

**To Do**

Instead of plotting the histogram with `hist.Hist.plot1d()`, we are going to use `<myhist>.to_numpy()` to get the bin contents and bin edges, in the same form that `np.histogram()` returns them.

**Requirements**:

1. Make the histogram filled with red color
1. Add a legend
1. Set the y-scale to "log"
1. Add a title at the top of the plot that says "Jet Mass Distribution"
1. Add a label on the x axis that says "Jet Mass [GeV]"
1. Add a label on the y axis that says "Count"

**HINT**

Use google for adding stuff in matplotlib. Take this code as a starting point.
```python
# To plot a pre-binned histogram with matplotlib, reuse its bin contents and edges
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# <myhist> is the hist histogram you filled; to_numpy() returns (bin contents, bin edges)
values, edges = <myhist>.to_numpy()
# One entry per bin (placed at its left edge), weighted by that bin's content
ax.hist(x=edges[:-1], bins=edges, weights=values)

plt.show()
```
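
If the `weights` argument in the starter code looks mysterious, the following sketch may help. `ax.hist` bins its input with `np.histogram`, so the same call can be checked with NumPy alone. The bin edges and contents here are hypothetical.

```python
import numpy as np

# Hypothetical pre-binned histogram: 4 bins with known contents.
edges = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
values = np.array([5.0, 12.0, 7.0, 1.0])

# One fake entry per bin, placed at the bin's left edge and weighted by
# that bin's content. Re-binning with the same edges returns the contents.
rebinned, _ = np.histogram(edges[:-1], bins=edges, weights=values)
```

Each left edge falls inside its own bin, so `rebinned` equals `values` and the plot shows the original histogram without the raw data being binned again.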

**Optional**

If you haven't heard about it, try out `mplhep`. If it's not available, install it with `pip install mplhep` and use the following:

```python
import mplhep as hep
hep.style.use("CMS")
```

This will now make your plots look nicer and closer to publication style!

For extra points try to add the *CMS Experiment* label at the top with `mplhep`. (Google is your friend!!)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import uproot
import hist
import awkward as ak

Reuse the code from the `Hist` section.

Use the starter code to extract the bin contents and the bin edges from the `hist` histogram you've made and plot a histogram with matplotlib.

Remember to meet the requirements above.

If all goes well, you should have something like the plot below (after using `mplhep`):

![jetmass.png](jetmass.png)
