# Jet types at the LHC

Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS mostly uses anti-$k_{\mathrm{T}}$ jets with a distance parameter (often called cone size) of $R=0.4$ (AK4 jets) to reconstruct them. We have algorithms that identify jets from heavy-flavour (b or c) quarks (the domain of the BTV POG), that distinguish quark- from gluon-originated jets, and that separate jets from the main $pp$ collision from jets formed largely from pileup particles.

However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets. 
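A common rule of thumb makes "collimated" quantitative: for a particle of mass $m$ and transverse momentum $p_{\mathrm{T}} \gg m$ decaying to two massless products carrying momentum fractions $z$ and $1-z$, the angular separation is approximately $\Delta R \approx m / (p_{\mathrm{T}}\sqrt{z(1-z)})$, which is at least $2m/p_{\mathrm{T}}$. The minimal sketch below (rounded masses hard-coded as an assumption) estimates the $p_{\mathrm{T}}$ above which even a symmetric decay is merged into a single jet of radius $R$, i.e. $\Delta R \lesssim R$. For the three-body top decay it is only a rough guide.

```python
import math

# Rounded particle masses in GeV (illustrative values only)
MASSES = {"W": 80.4, "Z": 91.2, "H": 125.0, "top": 173.0}

def approx_decay_dr(mass, pt, z=0.5):
    """Approximate Delta R between two massless decay products carrying
    momentum fractions z and 1-z (small-angle limit, pt >> mass)."""
    return mass / (pt * math.sqrt(z * (1.0 - z)))

def min_pt_for_single_jet(mass, radius):
    """pT above which a symmetric (z = 0.5) decay has Delta R ~ 2 m / pT < radius,
    so that both products end up in one jet of that radius."""
    return 2.0 * mass / radius

for name, mass in MASSES.items():
    print("%-4s: Delta R ~ %.2f at pT = 500 GeV; merged in R=0.4 above ~%4.0f GeV, in R=0.8 above ~%4.0f GeV" % (
        name, approx_decay_dr(mass, 500.0),
        min_pt_for_single_jet(mass, 0.4), min_pt_for_single_jet(mass, 0.8)))
```

Asymmetric decays ($z$ far from 0.5) are wider than $2m/p_{\mathrm{T}}$, so these thresholds are optimistic lower bounds.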

Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, which we call "large-radius" or "fat" jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses anti-$k_{\mathrm{T}}$ jets with $R=1.0$ (AK10).
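To see what the radius parameter does, here is a minimal pure-Python sketch of anti-$k_{\mathrm{T}}$ sequential recombination, using the distance measures $d_{ij} = \min(p_{\mathrm{T},i}^{-2}, p_{\mathrm{T},j}^{-2})\,\Delta R_{ij}^2/R^2$ and $d_{iB} = p_{\mathrm{T},i}^{-2}$. It is a teaching sketch, not FastJet: the particles are hypothetical massless $(p_{\mathrm{T}}, \eta, \phi)$ tuples, $\eta$ stands in for rapidity, and merged objects use a simplified $p_{\mathrm{T}}$-weighted recombination instead of the four-vector sum used in CMS.

```python
import math

def wrap_phi(phi):
    """Map an azimuthal angle into [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def delta_r2(a, b):
    """Squared distance in (eta, phi) between two (pt, eta, phi) tuples."""
    return (a[1] - b[1]) ** 2 + wrap_phi(a[2] - b[2]) ** 2

def combine(a, b):
    """Simplified pT-weighted recombination (an assumption; CMS/FastJet add four-vectors)."""
    pt = a[0] + b[0]
    eta = (a[0] * a[1] + b[0] * b[1]) / pt
    phi = wrap_phi(a[2] + b[0] * wrap_phi(b[2] - a[2]) / pt)
    return (pt, eta, phi)

def anti_kt(particles, radius):
    """Cluster (pt, eta, phi) tuples with the anti-kT distances; O(N^3), toy sizes only."""
    objects = list(particles)
    jets = []
    while objects:
        best = None  # (distance, i, j); j is None for the beam distance d_iB
        for i, a in enumerate(objects):
            d_ib = a[0] ** -2
            if best is None or d_ib < best[0]:
                best = (d_ib, i, None)
            for j in range(i + 1, len(objects)):
                b = objects[j]
                d_ij = min(a[0] ** -2, b[0] ** -2) * delta_r2(a, b) / radius ** 2
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(objects.pop(i))  # closest to the beam: promote to a final jet
        else:
            merged = combine(objects[i], objects[j])
            objects.pop(j)  # j > i, so remove j first to keep index i valid
            objects.pop(i)
            objects.append(merged)
    return sorted(jets, key=lambda jet: jet[0], reverse=True)

# Hypothetical toy event: two nearby hard particles and one well-separated particle
toy_particles = [(100.0, 0.0, 0.0), (50.0, 0.3, 0.2), (30.0, 0.0, 1.2)]
for radius in (0.4, 1.5):
    print(radius, anti_kt(toy_particles, radius))
```

With $R=0.4$ the toy event gives two jets, while a very large radius merges everything into one. Because the distance is weighted by $p_{\mathrm{T}}^{-2}$ of the harder object, soft particles cluster around hard ones first, which is why anti-$k_{\mathrm{T}}$ jets are approximately circular with radius $R$.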

This topic was explained in more detail in the slides [ADD LINK TO SLIDES]. You can also read these excellent overviews of jet substructure techniques:

- [Boosted objects: a probe of beyond the Standard Model physics](http://arxiv.org/abs/1012.5412) by Abdesselam et al.
- [Looking inside jets: an introduction to jet substructure and boosted-object phenomenology](https://arxiv.org/abs/1901.10342) by Marzani, Soyez, and Spannowsky.

## Jet types and algorithms in CMS

The standard jet algorithms are all implemented in the CMS reconstruction software, [CMSSW](https://github.com/cms-sw/cmssw). A few algorithms with specific parameters (namely AK4, AK8, and CA15) have become standard tools in CMS; these jet types are extensively studied by the JetMET POG and are highly recommended. These algorithms are included in the centrally produced CMS samples at the AOD, miniAOD, and nanoAOD data tiers (miniAOD and nanoAOD are the most commonly used for analysis, while AOD is rarely used these days and is not widely available on the grid). Other algorithms can be implemented and tested using the **JetToolbox** (discussed later in the tutorial).

In this part of the tutorial, you will learn how to access the jet collections included in the CMS datasets, how to compare the different jet types, and how to create your own collections.


### AOD 

[This twiki](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideDataFormatRecoJets) summarizes the labels under which each jet collection can be retrieved from the event record in AOD files. This format is currently used only for specialized studies; for most analyses you can use the other formats.

### MiniAOD

There are three main jet collections stored in the MiniAOD format, as described [here](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2017#Jets).
 * **slimmedJets**: AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. This is the default jet collection for CMS analyses in Run II. In this collection you can find the output of the following algorithms, as well as other jet-related quantities:
   * b-tagging 
   * Pileup jet ID
   * Quark/gluon likelihood
 * **slimmedJetsPUPPI**: are AK4 energy-corrected jets using the PUPPI algorithm for pileup removal. This collection will be the default for Run III analyses.
 * **slimmedJetsAK8**: AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. This has been the default collection for boosted jets in Run II. In this collection you can find the following, as well as other jet-related quantities:
   * Softdrop mass
   * n-subjettiness and energy correlation variables
   * Access to softdrop subjets
   * Access to the associated AK8 CHS jet four-momentum, including the softdrop and pruned masses, and n-subjettiness.
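N-subjettiness quantifies how well a jet is described by $N$ subjet axes: $\tau_N = \frac{1}{d_0}\sum_k p_{\mathrm{T},k}\min(\Delta R_{1,k},\ldots,\Delta R_{N,k})$ with $d_0 = \sum_k p_{\mathrm{T},k} R_0$ (here with angular exponent $\beta = 1$). Two-prong jets such as $W \to q\bar{q}$ tend to have small $\tau_{21} = \tau_2/\tau_1$. The sketch below assumes the axes are already given (in practice the N-subjettiness implementation finds them from the jet constituents, which this toy does not attempt) and uses hypothetical $(p_{\mathrm{T}}, \eta, \phi)$ constituents.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in (eta, phi), with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def tau_n(constituents, axes, r0=0.8):
    """N-subjettiness (beta = 1) for (pt, eta, phi) constituents and N (eta, phi) axes."""
    d0 = r0 * sum(pt for pt, _, _ in constituents)
    weighted = sum(
        pt * min(delta_r(eta, phi, ax_eta, ax_phi) for ax_eta, ax_phi in axes)
        for pt, eta, phi in constituents
    )
    return weighted / d0

def tau21(constituents, one_axis, two_axes, r0=0.8):
    """Ratio tau_2 / tau_1; small values indicate a two-prong jet."""
    tau1 = tau_n(constituents, one_axis, r0)
    return tau_n(constituents, two_axes, r0) / tau1 if tau1 > 0 else 0.0

# Hypothetical two-prong jet: two equal-pT constituents separated by Delta R = 0.3
two_prong = [(100.0, 0.0, 0.0), (100.0, 0.3, 0.0)]
print(tau_n(two_prong, [(0.15, 0.0)]))               # one axis between the prongs
print(tau_n(two_prong, [(0.0, 0.0), (0.3, 0.0)]))    # one axis on each prong
print(tau21(two_prong, [(0.15, 0.0)], [(0.0, 0.0), (0.3, 0.0)]))
```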

### Examples of how to access jet collections in miniAOD samples

Below are two examples of how to access jet collections from these samples. You do not need to modify any code in this exercise; instead, look at the code to get an idea of how you could access this information if needed.

##### In C++

Please take a look at the file `$CMSSW_BASE/src/Analysis/JMEDAS/src/jmedas_miniAODAnalyzer.C` with your favourite code viewer.

You can run this code from your terminal using the python config file `$CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest.py`. This script only prints out some information about the jets in the sample. Again, the most important part of this exercise is to get familiar with how to access jet collections from miniAOD. Take a good look at the printouts this script produces in your terminal.

```bash
cmsRun $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest.py
```

##### In Python

Now take a look at the file `$CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest_purePython.py`.

This code can be run with plain python in your terminal. As in the C++ case, the job prints out some information about jets. The most important part of the exercise is to get familiar with how to access jet collections from miniAOD using python.

```bash
python $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest_purePython.py
```

### NanoAOD

In nanoAOD, only AK4 CHS jets (`Jet`) and AK8 PUPPI jets (`FatJet`) are stored. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ thresholds may differ). The full set of variables for each jet collection can be found on this [website](https://cms-nanoaod-integration.web.cern.ch/integration/master-102X/mc102X_doc.html).

NanoAOD is a "flat tree" format, meaning that you can access the information directly with plain ROOT, or even with simple python tools (like numpy or pandas). This format is becoming more and more popular within CMS due to its simplicity and accessibility. An extremely simple example can be found below (or, if you prefer your favourite editor, open `$CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_nanoAODtest.py`). Try running it now in your cmslpc session with python.

*Aside*: there are several more advanced tools that allow you to do more sophisticated analyses with the nanoAOD format, including [RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html), [NanoAOD-tools](https://github.com/cms-nanoAOD/nanoAOD-tools), or [Coffea](https://github.com/CoffeaTeam/coffea). We encourage you to look at them and use the one you like most.

```python
from __future__ import print_function   # print() works in both Python 2 and Python 3
import ROOT

# Open a centrally produced NanoAOD file over xrootd and retrieve the flat 'Events' tree
inputFile = ROOT.TFile.Open('root://xrootd-cms.infn.it//store/mc/RunIIAutumn18NanoAODv4/QCD_HT1000to1500_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/Nano14Dec2018_102X_upgrade2018_realistic_v16-v1/110000/C23F77AA-3909-C74E-ADCD-8266FF68AB5D.root')
events = inputFile.Get('Events')

for iev in range(min(10, events.GetEntries())):   # only look at the first 10 events
    events.GetEntry(iev)

    print("\nEvent %d: run %6d, lumi %4d, event %12d" % (iev, events.run, events.luminosityBlock, events.event))

    # AK4 CHS jets: 'nJet' counts the jets, the 'Jet_*' branches are arrays of length nJet
    for ijet in range(events.nJet):
        if events.Jet_pt[ijet] < 20: continue   # cut on this jet's pT, not on the whole array
        print('AK4 jet %d: pt %.1f, eta %.2f, phi %.2f, mass %.1f, puId %d, DeepCSV b-tag disc. %.3f' % (
            ijet, events.Jet_pt[ijet], events.Jet_eta[ijet], events.Jet_phi[ijet],
            events.Jet_mass[ijet], events.Jet_puId[ijet], events.Jet_btagDeepB[ijet]))

    # AK8 PUPPI jets: same structure with 'nFatJet' and the 'FatJet_*' branches
    for ijet in range(events.nFatJet):
        if events.FatJet_pt[ijet] < 20: continue
        print('AK8 jet %d: pt %.1f, eta %.2f, phi %.2f, mass %.1f, deepAK8 W tag disc. %.3f, deepAK8 top tag disc. %.3f' % (
            ijet, events.FatJet_pt[ijet], events.FatJet_eta[ijet], events.FatJet_phi[ijet],
            events.FatJet_mass[ijet], events.FatJet_deepTag_WvsQCD[ijet], events.FatJet_deepTag_TvsQCD[ijet]))
```
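The loop above reads one event at a time. Columnar tools such as those mentioned in the aside instead treat each branch as an array: a counter branch like `nJet` gives the number of jets per event, and the `Jet_*` branches can be viewed as the values for all jets of all events concatenated. The numpy sketch below uses hypothetical toy values (not read from the file above) to apply the same $p_{\mathrm{T}} > 20$ GeV cut without an explicit event loop, counting the selected jets and summing their $p_{\mathrm{T}}$ (a simple $H_{\mathrm{T}}$) in each event.

```python
import numpy as np

# Hypothetical toy content standing in for three events of a NanoAOD tree
nJet = np.array([3, 1, 2])                               # counter branch: jets per event
Jet_pt = np.array([45.0, 18.0, 25.0, 60.0, 12.0, 33.0])  # all jets of all events, concatenated

def select_jets_per_event(n_jet, jet_pt, pt_min=20.0):
    """Return (number of jets above pt_min, scalar pT sum of those jets) for each event."""
    n_events = len(n_jet)
    # Event index of every jet entry, e.g. [0, 0, 0, 1, 2, 2]
    event_index = np.repeat(np.arange(n_events), n_jet)
    passed = jet_pt > pt_min
    n_good = np.bincount(event_index[passed], minlength=n_events)
    ht = np.bincount(event_index[passed], weights=jet_pt[passed], minlength=n_events)
    return n_good, ht

n_good, ht = select_jets_per_event(nJet, Jet_pt)
print(n_good)
print(ht)
```

Libraries such as those in the aside handle this bookkeeping (and the actual file reading) for you; the sketch only shows why a flat tree makes such vectorized selections natural.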


### JetToolbox *(for reading only)*
#### The example commands in this part do not work at the moment, so please just read the descriptions below and keep in mind that the JetToolbox may be useful to you in the future!

Although JME generally recommends using AK4 CHS and AK8 PUPPI jets for Run II analyses (moving fully to AK4 PUPPI jets for Run III), some analyses need something else, and the same holds for the standard algorithms stored in mini/nanoAOD samples. For users who want to test a different jet collection or algorithm, JetMET has developed a user-friendly tool to compute them: [JetToolbox](https://twiki.cern.ch/twiki/bin/view/CMS/JetToolbox).

The JetToolbox is *not* part of CMSSW because JME wants the freedom to incorporate and test as many tools as possible without these algorithms being part of any central samples or code. This is why, in a real analysis, you would need to clone the [JetToolbox repository](https://github.com/cms-jet/JetToolbox) inside your CMSSW src folder like this:
~~~
cd $CMSSW_BASE/src/
git clone git@github.com:cms-jet/JetToolbox.git JMEAnalysis/JetToolbox -b jetToolbox_102X_v3
scram b
~~~

In this tutorial, this step was done for you in the initial setup. _You do not need to do it now_. You can find more information about how to set up the JetToolbox in the [README.md](https://github.com/cms-jet/JetToolbox) of the github repository or in the [twiki](https://twiki.cern.ch/twiki/bin/view/CMS/JetToolbox).

For instance, imagine that you want to create a new jet collection using the Cambridge/Aachen clustering algorithm with a distance parameter of $R=1.0$ (`ca10`), PUPPI for pileup removal, a kinematic selection, the pruning and softdrop grooming algorithms, and non-default parameters for the softdrop algorithm. Then in your python config file you need to include something like this:

```python
from JMEAnalysis.JetToolbox.jetToolbox_cff import jetToolbox     # load the JetToolbox (path as cloned above)

# 'process' is the cms.Process defined earlier in your config file
jetToolbox( process, 'ca10', 'jetSequence', 'out',               # algorithm + distance parameter, sequence name, output name
  Cut="pt>170 && abs(eta)<2.5",                                  # selection
  dataTier="miniAOD",                                            # input file: miniAOD or nanoAOD?
  PUMethod='Puppi',                                              # pileup removal method, e.g. 'Puppi' or 'CHS'
  addPruning=True, addSoftDrop=True, betaCut=1.0,                # add basic grooming; betaCut sets the softdrop beta
  JETCorrPayload='AK8PFPuppi', JETCorrLevels=['L2Relative', 'L3Absolute']   # AK8 PUPPI corrections used as a proxy
)
```
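The `betaCut` argument above sets the angular exponent $\beta$ of softdrop. Softdrop declusters a Cambridge/Aachen-reclustered jet step by step and keeps both branches only if $\min(p_{\mathrm{T},1}, p_{\mathrm{T},2})/(p_{\mathrm{T},1}+p_{\mathrm{T},2}) > z_{\mathrm{cut}}\,(\Delta R_{12}/R_0)^{\beta}$; otherwise the softer branch is dropped and declustering continues on the harder one. The sketch below illustrates only that condition: it assumes the clustering history is already available as a hypothetical nested list of $(p_{\mathrm{T}}, \eta, \phi)$ tuples, uses a simplified $p_{\mathrm{T}}$-weighted recombination, and returns the groomed branch rather than a groomed mass (which would need full four-vectors). The defaults $z_{\mathrm{cut}}=0.1$, $\beta=0$ are the values commonly used for CMS AK8 jets.

```python
import math

def wrap_phi(phi):
    """Map an azimuthal angle into [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def kinematics(node):
    """(pt, eta, phi) of a node: leaves are tuples, internal nodes are [branch1, branch2]."""
    if isinstance(node, tuple):
        return node
    a, b = kinematics(node[0]), kinematics(node[1])
    pt = a[0] + b[0]
    eta = (a[0] * a[1] + b[0] * b[1]) / pt
    phi = wrap_phi(a[2] + b[0] * wrap_phi(b[2] - a[2]) / pt)  # simplified pT-weighted recombination
    return (pt, eta, phi)

def soft_drop(node, z_cut=0.1, beta=0.0, r0=0.8):
    """Decluster a (hypothetical) C/A clustering tree until the softdrop condition holds."""
    while isinstance(node, list):
        j1, j2 = kinematics(node[0]), kinematics(node[1])
        z = min(j1[0], j2[0]) / (j1[0] + j2[0])
        dr = math.hypot(j1[1] - j2[1], wrap_phi(j1[2] - j2[2]))
        if z > z_cut * (dr / r0) ** beta:
            return node                                  # condition passed: keep both branches
        node = node[0] if j1[0] >= j2[0] else node[1]    # drop the softer branch
    return node                                          # only a single constituent survived

# Hypothetical C/A history: a soft wide-angle particle attached to a hard two-prong core
tree = [(5.0, 0.7, 0.0), [(100.0, 0.0, 0.0), (80.0, 0.2, 0.1)]]
print(kinematics(soft_drop(tree))[0])               # soft particle removed, two-prong core kept
print(kinematics(soft_drop(tree, z_cut=0.5))[0])    # a harsher z_cut also removes the second prong
```

Larger $\beta$ makes the condition angle-dependent, so wide-angle soft branches are removed more aggressively than collinear ones; $\beta = 0$ removes soft branches regardless of angle.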

A full list of algorithms and parameters can be found in the [JetToolbox twiki](https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetToolbox#Arguments). Take a look at the more complete example in `$CMSSW_BASE/src/Analysis/JMEDAS/scripts/ClusterWithToolboxAndMakeHistos.py`.


Now you can run the script in your terminal (or in the next cell) to create several new jet collections:

```bash
cmsRun $CMSSW_BASE/src/Analysis/JMEDAS/scripts/ClusterWithToolboxAndMakeHistos.py
```

Running the jetToolbox as explained above creates a miniAOD-like output. To tell you the names of the newly created jet collections, the jetToolbox prints out information like this:
```
|---- jetToolBox: JETTOOLBOX RUNNING ON MiniAOD FOR AK8 JETS USING Puppi
|---- jetToolBox: Applying these corrections: ('AK8PFPuppi', ['L2Relative', 'L3Absolute'], 'None')
|---- jetToolBox: Creating packedPatJetsAK8PFPuppiSoftDrop collection with SoftDrop subjets.
|---- jetToolBox: Running ak8PFJetsPuppiSoftDropMass, selectedPatJetsAK8PFPuppiSoftDropPacked:SubJets, ak8PFJetsPuppiPrunedMass, ak8PFJetsPuppiTrimmedMass, ak8PFJetsPuppiFilteredMass, NjettinessAK8Puppi, nb1AK8PuppiSoftDrop.
|---- jetToolBox: Creating selectedPatJetsAK8PFPuppi collection.
```

The name of the new jet collection in this case is `selectedPatJetsAK8PFPuppi`. To access it, you can follow the same procedure you used for `slimmedJets` or `slimmedJetsAK8` in the miniAOD example above.
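If you run the JetToolbox often, you may want to extract the collection names from its printout automatically. A minimal sketch, assuming the printout has been saved to a text file (for example by redirecting the `cmsRun` output) and that the lines follow the `Creating ... collection` wording shown above; other JetToolbox versions may phrase it differently, and the helper name is hypothetical.

```python
import re

# Pattern based on the printout shown above; adjust it if your JetToolbox version words it differently
CREATING = re.compile(r"jetToolBox: Creating (\S+) collection")

def created_collections(log_text):
    """Return the collection names announced by 'Creating <name> collection' lines, in order."""
    return CREATING.findall(log_text)

example_log = """\
|---- jetToolBox: JETTOOLBOX RUNNING ON MiniAOD FOR AK8 JETS USING Puppi
|---- jetToolBox: Creating packedPatJetsAK8PFPuppiSoftDrop collection with SoftDrop subjets.
|---- jetToolBox: Creating selectedPatJetsAK8PFPuppi collection.
"""

print(created_collections(example_log))
```

For a saved log, pass the file content instead, e.g. `created_collections(open('jtb.log').read())`.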

Because the nanoAOD format is becoming very popular in CMS, there is a "beta" version of the jetToolbox that creates nanoAOD-like samples including new jet collections. For example, to create a nanoAOD-like sample, you need to include a file like the one below in your folder `$CMSSW_BASE/src/Analysis/JMEDAS/python/` (here named `nanoAOD_jetToolbox_cff.py`, matching the `--customise` argument of the `cmsDriver.py` command further down):

<details>
    <summary><font color='blue'>SHOW CODE</font></summary>
<p>

```python
import FWCore.ParameterSet.Config as cms
from PhysicsTools.NanoAOD.common_cff import *
### NanoAOD v5 (for 2016, 2017, 2018); for a different recipe please modify accordingly
from Configuration.Eras.Modifier_run2_nanoAOD_94X2016_cff import run2_nanoAOD_94X2016
from Configuration.Eras.Modifier_run2_nanoAOD_94XMiniAODv2_cff import run2_nanoAOD_94XMiniAODv2
from Configuration.Eras.Modifier_run2_nanoAOD_102Xv1_cff import run2_nanoAOD_102Xv1
from JMEAnalysis.JetToolbox.jetToolbox_cff import jetToolbox

# ---------------------------------------------------------
# This is the part the user should modify
def setupCustomizedJetToolbox(process):

    #### AK4 PUPPI jets

    ak4btagdiscriminators = [
            'pfDeepCSVJetTags:probb',
            'pfDeepCSVJetTags:probbb',
            'pfDeepCSVJetTags:probc',
            'pfDeepCSVJetTags:probudsg',
    ]
    ak4btaginfos = [ 'pfDeepCSVTagInfos' ] 

    jetToolbox(process, 'ak4', 'dummyseq', 'noOutput',
               dataTier='nanoAOD',
               PUMethod='Puppi', JETCorrPayload='AK4PFPuppi',
               #addQGTagger=True,
               runOnMC=True,
               Cut='pt > 15.0 && abs(eta) < 2.4',
               bTagDiscriminators=ak4btagdiscriminators,
               bTagInfos=ak4btaginfos,
               verbosity=4
               )

    #### AK8 PUPPI jets
    ak8btagdiscriminators = [
                        'pfBoostedDoubleSecondaryVertexAK8BJetTags',
                        'pfMassIndependentDeepDoubleBvLJetTags:probQCD',
                        'pfMassIndependentDeepDoubleBvLJetTags:probHbb',
                        'pfMassIndependentDeepDoubleCvLJetTags:probQCD',
                        'pfMassIndependentDeepDoubleCvLJetTags:probHcc',
                        'pfMassIndependentDeepDoubleCvBJetTags:probHbb',
                        'pfMassIndependentDeepDoubleCvBJetTags:probHcc',
            ]

    jetToolbox(process, 'ak8', 'adummyseq', 'noOutput',
               dataTier='nanoAOD',
               PUMethod='Puppi', JETCorrPayload='AK8PFPuppi',
               runOnMC=True,
               Cut='pt > 170.0 && abs(eta) < 2.4',
               bTagDiscriminators=ak8btagdiscriminators,
               addSoftDrop=True,
               addSoftDropSubjets=True,
               addPruning=True,
               addNsub=True,
               addEnergyCorrFunc=True,
               )
    return process

# ---------------------------------------------------------

def nanoJTB_customizeMC(process):
    run2_nanoAOD_94X2016.toModify(process, setupCustomizedJetToolbox)
    run2_nanoAOD_94XMiniAODv2.toModify(process, setupCustomizedJetToolbox)
    run2_nanoAOD_102Xv1.toModify(process, setupCustomizedJetToolbox)
    process.NANOAODSIMoutput.fakeNameForCrab = cms.untracked.bool(True)  # needed for crab publication
    return process

```
            
</p>
</details>

This file is already included in this tutorial. _Do not try to copy this file into your working directory._

Then, to create a nanoAOD-like file, run the following command in your terminal:

```bash
cmsDriver.py nanoAOD_jetToolbox_cff -s NANO --mc --eventcontent NANOAODSIM --datatier NANOAODSIM -n 100 \
  --conditions 102X_upgrade2018_realistic_v20 --era Run2_2018,run2_nanoAOD_102Xv1 \
  --customise_commands="process.add_(cms.Service('InitRootHandlers', EnableIMT = cms.untracked.bool(False)))" \
  --customise Analysis/JMEDAS/nanoAOD_jetToolbox_cff.nanoJTB_customizeMC \
  --filein /store/mc/RunIIAutumn18MiniAOD/QCD_HT1000to1500_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/102X_upgrade2018_realistic_v15-v1/210000/DA20DC21-E781-C540-9FCD-7BCF2144CA4E.root \
  --fileout file:jetToolbox_nano_mc.root
```

The output file, `jetToolbox_nano_mc.root`, will be a nanoAOD file with all the content of the central nanoAOD sample plus an extra collection whose name starts with `selectedPatJetsAK8PFPuppi` (or the name indicated in the jetToolbox printouts).