# Jet Types and Algorithms

The jet algorithms take as input a set of 4-vectors. At CMS, the most popular jet type is the "Particle Flow Jet", which attempts to use the entire detector at once and derive single four-vectors representing specific particles. For this reason, clustering these four-vectors is (ideally) directly comparable to clustering generator-level four-vectors.

## Particle Flow Jets (PFJets)

Particle Flow candidates (PFCandidates) combine information from various detectors to make a combined estimation of particle properties based on their assigned identities (photon, electron, muon, charged hadron, neutral hadron).

PFJets are created by clustering PFCandidates into jets, and contain information about the contribution of each particle class: electromagnetic/hadronic, charged/neutral, etc.

The jet response is high. The jet pT resolution is good: starting at 15-20% at low pT and asymptotically reaching 5% at high pT.

## Monte Carlo Generator-level Jets (GenJets)

GenJets are pure Monte Carlo simulated jets. They are useful for analysis with MC samples. GenJets are formed by clustering the four-momenta of Monte Carlo truth particles. This may include “invisible” particles (muons, neutrinos, WIMPs, etc.).

As there are no detector effects involved, the jet response (or jet energy scale) is 1, and the jet resolution is perfect, by definition.

GenJets include information about the 4-vectors of the constituent particles, the hadronic and electromagnetic components of the energy etc.
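To make "response" and "resolution" concrete, here is a minimal sketch. It assumes the common convention that the response of a matched jet pair is the ratio of reconstructed to generator-level pT, and that the resolution is the relative width of the response distribution. The function name and the numbers are hypothetical and chosen only for illustration.

```python
import statistics

def jet_response(reco_pt, gen_pt):
    """Ratio of reconstructed to generator-level pT for one matched jet pair."""
    if gen_pt <= 0:
        raise ValueError("gen_pt must be positive")
    return reco_pt / gen_pt

# Hypothetical matched (reco pT, gen pT) pairs in GeV, for illustration only.
matched_pairs = [(95.0, 100.0), (190.0, 200.0), (310.0, 300.0)]

responses = [jet_response(reco, gen) for reco, gen in matched_pairs]
mean_response = statistics.mean(responses)
# Relative width of the response distribution (assumed definition of resolution).
resolution = statistics.pstdev(responses) / mean_response

# Comparing GenJets with themselves: response is exactly 1 with zero spread.
gen_only = [jet_response(gen, gen) for _, gen in matched_pairs]
gen_resolution = statistics.pstdev(gen_only)
```

In a real study the pairs would come from matching reconstructed jets to GenJets (for example in ΔR), and the width would usually be extracted from a fit rather than a plain standard deviation.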

## Calorimeter Jets (CaloJets)

CaloJets are formed from energy deposits in the calorimeters (hadronic and electromagnetic), with no tracking information considered. In the barrel region, a calorimeter tower consists of a single HCAL cell and the associated 5x5 array of ECAL crystals (the HCAL-ECAL association is similar but more complicated in the endcap region). The four-momentum of a tower is assigned from the energy of the tower, assuming zero mass, with the direction corresponding to the tower position from the interaction point.
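The massless-tower assignment described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the tower direction is assumed to be given as pseudorapidity and azimuthal angle measured from the interaction point.

```python
import math

def tower_four_momentum(energy, eta, phi):
    """Return (px, py, pz, E) for a massless tower pointing at (eta, phi)."""
    # Massless: |p| = E, so the transverse momentum is E / cosh(eta).
    pt = energy / math.cosh(eta)
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return (px, py, pz, energy)

# Hypothetical tower: 50 GeV deposited at eta = 0.5, phi = 1.0.
px, py, pz, e = tower_four_momentum(energy=50.0, eta=0.5, phi=1.0)

# Invariant mass squared vanishes up to floating-point rounding.
mass_squared = e**2 - (px**2 + py**2 + pz**2)
```

A jet built by summing many such towers generally acquires a non-zero mass, because the towers point in different directions.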

In CMS, CaloJets are used less often than PFJets. Examples of their use include performance studies to disentangle tracker and calorimeter effects, and trigger-level analyses where the tracker is neglected to reduce the event processing time. ATLAS makes much more use of CaloJets, as their version of particle flow is not as mature as CMS's.

## Exercise: Reconstructed vs. Generator-Level Jets

Execute the following cell to make some more jet histograms, this time from QCD MC. 

In [None]:
%%bash
python $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_make_histograms.py --files=$CMSSW_BASE/src/Analysis/JMEDAS/data/MiniAODs/RunIIFall17MiniAODv2/QCD_Pt_300to470.txt --outname=$CMSSW_BASE/src/Analysis/JMEDAS/notebooks/files/qcd_300to470.root --maxevents=10000 --maxFiles 20 --maxjets=2


...and then make the plots:

In [None]:
import ROOT
f = ROOT.TFile("$CMSSW_BASE/src/Analysis/JMEDAS/notebooks/files/qcd_300to470.root")

h_ptAK4   = f.Get("h_ptAK4")
h_etaAK4  = f.Get("h_etaAK4")
h_phiAK4  = f.Get("h_phiAK4")
h_mAK4    = f.Get("h_mAK4")

h_ptAK4Gen   = f.Get("h_ptAK4Gen")
h_etaAK4Gen  = f.Get("h_etaAK4Gen")
h_phiAK4Gen  = f.Get("h_phiAK4Gen")
h_mAK4Gen    = f.Get("h_mAK4Gen")

h_ptAK4Gen.SetLineStyle(2) 
h_etaAK4Gen.SetLineStyle(2) 
h_phiAK4Gen.SetLineStyle(2) 
h_mAK4Gen.SetLineStyle(2) 

h_ptAK4Gen.SetLineColor(2) 
h_etaAK4Gen.SetLineColor(2) 
h_phiAK4Gen.SetLineColor(2) 
h_mAK4Gen.SetLineColor(2)

c = ROOT.TCanvas('c', 'c', 800, 800)

c.Divide(2,2)
c.cd(1)
ROOT.gPad.SetLogy()
h_ptAK4.Draw()
h_ptAK4Gen.Draw("same")
h_ptAK4.GetXaxis().SetRangeUser(0, 1000)
leg = ROOT.TLegend(0.6, 0.6, 0.8, 0.8)
leg.AddEntry(h_ptAK4, "RECO", "l")
leg.AddEntry(h_ptAK4Gen, "GEN", "l")
leg.SetFillColor(0)
leg.SetLineColor(0)
leg.Draw("same")
c.cd(2)
h_etaAK4.Draw()
h_etaAK4Gen.Draw("same")
c.cd(3)
h_phiAK4.Draw()
h_phiAK4Gen.Draw("same")
h_phiAK4.SetMinimum(0)
c.cd(4)
h_mAK4.Draw()
h_mAK4Gen.Draw("same")
h_mAK4.GetXaxis().SetRangeUser(0, 200)
ROOT.gPad.SetLogy()

ROOT.enableJSVis()
c.Draw()


As you can see, the agreement isn't very good! Can you guess why?

<details>
<summary>
    <font color='blue'>Show answer...</font>
</summary>
We need to apply the jet energy corrections (JEC), which are described in the next exercise. But before we do that, we'll go over the jet clustering algorithms used in CMS.
</details>

# Jet Clustering Algorithms

The majority of jet algorithms at CMS use a so-called "clustering sequence". This is essentially a pairwise examination of the input four-vectors. If a pair satisfies some criteria, the two are merged. The process is repeated until the entire list of constituents is exhausted. In addition, there are several ways to determine the "area" of the jet over which the input constituents lie. This is very important in correcting for pileup, as we will see, because some algorithms tend to "consume" more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so knowing the jet area is essential for correcting this effect.
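The pairwise merging described above can be sketched in a few lines of Python. The text does not spell out the merging criterion, so this sketch assumes the generalised-kt distance measure from the anti-kt paper cited below: $d_{ij} = \min(p_{\mathrm{T}i}^{2p}, p_{\mathrm{T}j}^{2p})\,\Delta R_{ij}^2 / R^2$ and $d_{iB} = p_{\mathrm{T}i}^{2p}$, with $p=-1$ for anti-kt. All function names are hypothetical, and the brute-force loop is for illustration only; it is not how FastJet or CMSSW implement clustering.

```python
import math

def massless(pt, eta, phi):
    """Build (px, py, pz, E) for a massless particle."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(eta), pt * math.cosh(eta))

def kinematics(p):
    """Return (pt, rapidity, phi) of a four-vector (px, py, pz, E)."""
    px, py, pz, e = p
    return (math.hypot(px, py),
            0.5 * math.log((e + pz) / (e - pz)),
            math.atan2(py, px))

def cluster(particles, radius=0.4, power=-1):
    """Sequential recombination; power=-1 corresponds to anti-kt."""
    pseudojets = list(particles)
    jets = []
    while pseudojets:
        kin = [kinematics(p) for p in pseudojets]
        weights = [pt ** (2 * power) for pt, _, _ in kin]
        # Start from the smallest beam distance d_iB = pt^(2*power).
        beam = min(range(len(pseudojets)), key=lambda i: weights[i])
        smallest, pair = weights[beam], None
        for i in range(len(pseudojets)):
            for j in range(i + 1, len(pseudojets)):
                dphi = abs(kin[i][2] - kin[j][2])
                dphi = min(dphi, 2.0 * math.pi - dphi)  # phi wraps around
                dr2 = (kin[i][1] - kin[j][1]) ** 2 + dphi ** 2
                dij = min(weights[i], weights[j]) * dr2 / radius ** 2
                if dij < smallest:
                    smallest, pair = dij, (i, j)
        if pair is None:
            # A beam distance is smallest: promote that pseudojet to a jet.
            jets.append(pseudojets.pop(beam))
        else:
            # A pair distance is smallest: merge by adding four-vectors.
            i, j = pair
            merged = tuple(a + b for a, b in zip(pseudojets[i], pseudojets[j]))
            del pseudojets[j]  # j > i, so delete j first
            del pseudojets[i]
            pseudojets.append(merged)
    return jets

# Hypothetical event: two nearby particles and one far away in phi.
particles = [massless(100.0, 0.0, 0.0),
             massless(20.0, 0.1, 0.1),
             massless(50.0, 0.0, 3.0)]
jets = cluster(particles)
jet_pts = sorted((kinematics(j)[0] for j in jets), reverse=True)
```

With these inputs the two nearby particles end up in one jet and the distant particle forms a second jet on its own. Setting `power=1` or `power=0` in the same function gives kt-like or Cambridge/Aachen-like ordering under the same assumed distance measure.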



<img src="files/JHEP04_2008_063.jpg" alt="" style="width: 600px;"/>
Figure: Comparison of jet areas for four different jet algorithms, from "The anti-kt Clustering Algorithm" by Cacciari, Salam, and Soyez [JHEP04, 063 (2008), arXiv:0802.1189].

Some excellent references about jet algorithms can be found here:

- [Toward Jetography](http://arxiv.org/abs/0906.1833) by Gavin Salam.
- [Jets in Hadron-Hadron Collisions](http://arxiv.org/abs/0712.2447) by Ellis, Huston, Hatakeyama, Loch, and Toennesmann.
- [The Catchment Area of Jets](http://arxiv.org/abs/0802.1188) by Cacciari, Salam, and Soyez.
- [The anti-kt Clustering Algorithm](http://arxiv.org/abs/0802.1189) by Cacciari, Salam, and Soyez.


## Exercise: Comparing jet areas between AK4 and AK8

Run the cell below to plot a comparison of the jet areas between AK4 and AK8 jets. A priori, what type of distribution do you expect?

In [None]:
import math
h_areaAK4 = f.Get("h_areaAK4")
h_areaAK8 = f.Get("h_areaAK8")
h_areaAK8.SetLineStyle(4)
h_areaAK8.SetLineColor(4)

h_areaAK4.Scale( 1.0 / h_areaAK4.Integral() )
h_areaAK8.Scale( 1.0 / h_areaAK8.Integral() )

c_area = ROOT.TCanvas('c_area', 'c_area')
frame = h_areaAK4.Clone()
frame.Reset()
frame.SetTitle("Jet Areas")
frame.SetMaximum(h_areaAK4.GetMaximum() * 1.2)
frame.Draw('axis')
h_areaAK4.Draw('hist same')
h_areaAK8.Draw("hist same")

l = ROOT.TLegend(0.6, 0.7, 0.8, 0.8)
l.SetFillColor(0)
l.SetBorderSize(0)
l.AddEntry(h_areaAK4, "AK4", "l")
l.AddEntry(h_areaAK8, "AK8", "l")
l.Draw()

c_area.Draw()

Try modifying the above cell to add vertical lines at area values corresponding to $\pi R^2$. Do the histogram peaks line up with these values?

<details>
<summary>
    <font color='blue'>Show answer...</font>
</summary>
The area plot should look like this:

![Jet areas](../files/jet_areas.png)
The histograms indeed peak at the expected value of $\pi R^2$. 
</details>
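As a quick numerical check of the $\pi R^2$ expectation, the sketch below computes the nominal areas for AK4 ($R=0.4$) and AK8 ($R=0.8$), together with the pileup offset implied by the statement that pileup energy scales with jet area. The pileup density `rho` is a hypothetical value chosen only for illustration, and the simple product `rho * area` is an assumed form of that proportionality.

```python
import math

def expected_area(radius):
    """Area of a circle of the given radius in the (y, phi) plane."""
    return math.pi * radius ** 2

area_ak4 = expected_area(0.4)  # about 0.50
area_ak8 = expected_area(0.8)  # about 2.01

# Hypothetical pileup pT density in GeV per unit area, for illustration only.
rho = 20.0
offset_ak4 = rho * area_ak4
offset_ak8 = rho * area_ak8
```

These two area values are where vertical lines (for example `ROOT.TLine` objects drawn on the `c_area` canvas) would go. Because the area grows as $R^2$, an AK8 jet picks up roughly four times the pileup contribution of an AK4 jet under this assumption.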


# Jet ID

In order to avoid using fake jets, which can originate from a hot calorimeter cell or electronic read-out box, we need to require some basic quality criteria for jets. These criteria are collectively called "jet ID". Details on the jet ID for PFJets can be found in the following twiki:

https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetID

The JetMET POG recommends the "loose" jet ID for most physics analyses in CMS. Some important observations from the above twiki:

- Jet ID is defined for uncorrected jets only. Never apply jet ID on corrected jets. This means that in your analysis you should apply jet ID first, and then apply JECs on those jets that pass jet ID.
- Loose jet ID is fully efficient (>99.9%) for real, high-$p_{\mathrm{T}}$ jets used in most physics analyses. Its background rejection power is similarly high.
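The ordering in the first bullet (jet ID on uncorrected jets first, JECs afterwards) can be sketched as follows. The `RawJet` class, the function names, and the flat correction factor are hypothetical stand-ins, not CMSSW objects. The thresholds are the illustrative (and possibly outdated) loose-ID values used in the examples later in this section.

```python
from dataclasses import dataclass

@dataclass
class RawJet:
    """Hypothetical stand-in for an uncorrected PF jet."""
    pt: float           # uncorrected pT in GeV
    nhf: float          # neutral hadron energy fraction
    nef: float          # neutral electromagnetic energy fraction
    chf: float          # charged hadron energy fraction
    cef: float          # charged electromagnetic energy fraction
    nconstituents: int  # number of constituents
    nch: int            # charged multiplicity

def passes_loose_id(jet):
    """Loose-ID-style cuts evaluated on uncorrected quantities."""
    return (jet.nhf < 0.99 and jet.nef < 0.99 and jet.chf > 0.0
            and jet.cef < 0.99 and jet.nconstituents > 1 and jet.nch > 0)

def corrected_pt(jet, jec_factor):
    """Apply a multiplicative correction factor to the raw pT."""
    return jet.pt * jec_factor

raw_jets = [
    # A typical real jet: mixed composition, many constituents.
    RawJet(pt=100.0, nhf=0.10, nef=0.25, chf=0.60, cef=0.05,
           nconstituents=20, nch=12),
    # A fake jet from a single hot cell: all neutral hadronic energy.
    RawJet(pt=80.0, nhf=1.00, nef=0.00, chf=0.00, cef=0.00,
           nconstituents=1, nch=0),
]

# Step 1: jet ID on uncorrected jets. Step 2: correct only the survivors.
selected = [jet for jet in raw_jets if passes_loose_id(jet)]
corrected = [corrected_pt(jet, jec_factor=1.1) for jet in selected]
```

In practice the correction factor depends on the jet kinematics and comes from the JEC machinery; the constant 1.1 here only marks where that step happens.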

## Applying Jet ID

There are several ways to apply jet ID. In the exercises above, we ran the cuts "on-the-fly" in our Python FWLite macro (the first option here). The others are listed for your convenience.

The following examples use somewhat out of date numbers. See the above link to the JetID twiki for the current numbers.

To apply the cuts on a pat::Jet (as in MiniAOD) in Python, you can do:
<details>
<summary>
    <font color='blue'>Show...</font>
</summary>
<code>
# Apply jet ID to the uncorrected jet
# (uncorrJet is assumed to hold the jet's uncorrected four-vector)
nhf = jet.neutralHadronEnergy() / uncorrJet.E()
nef = jet.neutralEmEnergy() / uncorrJet.E()
chf = jet.chargedHadronEnergy() / uncorrJet.E()
cef = jet.chargedEmEnergy() / uncorrJet.E()
nconstituents = jet.numberOfDaughters()
nch = jet.chargedMultiplicity()
goodJet = \
  nhf < 0.99 and \
  nef < 0.99 and \
  chf > 0.00 and \
  cef < 0.99 and \
  nconstituents > 1 and \
  nch > 0
</code>
</details>

To apply the cuts on a pat::Jet (as in MiniAOD) in C++, you can do:
<details>
<summary>
    <font color='blue'>Show...</font>
</summary>
<code>
// Apply jet ID to the uncorrected jet
// (uncorrJet is assumed to hold the jet's uncorrected four-vector)
double nhf = jet.neutralHadronEnergy() / uncorrJet.E();
double nef = jet.neutralEmEnergy() / uncorrJet.E();
double chf = jet.chargedHadronEnergy() / uncorrJet.E();
double cef = jet.chargedEmEnergy() / uncorrJet.E();
int nconstituents = jet.numberOfDaughters();
int nch = jet.chargedMultiplicity();
bool goodJet = 
  nhf < 0.99 &&
  nef < 0.99 &&
  chf > 0.00 &&
  cef < 0.99 &&
  nconstituents > 1 &&
  nch > 0;
</code>
</details>

To create selected jets in cmsRun:
<details>
<summary>
    <font color='blue'>Show...</font>
</summary>
<code>
from PhysicsTools.SelectorUtils.pfJetIDSelector_cfi import pfJetIDSelector
process.tightPatJetsPFlow = cms.EDFilter("PFJetIDSelectionFunctorFilter",
                                         filterParams = pfJetIDSelector.clone(quality=cms.string("TIGHT")),
                                         src = cms.InputTag("slimmedJets")
                                         )
</code>
</details>

It is also possible to use the `PFJetIDSelectionFunctor` C++ selector (from either C++ or Python), but it was developed primarily in the days before PF, when applying CaloJet ID was not easy. The functionality for more complicated selections still exists for PFJets, but it is almost never used beyond the few lines above. If you would still like to use that C++ class, it is documented as an example here.
