# Exercise 2

__What is this?__ This is a CMSSW configuration file.

__Does it work on Jupyter?__
Well... yes and no. Every CMSSW configuration file is a valid Python script, so you can execute parts of it in Jupyter and inspect the effects. To run it on data, though, you still need to export it to plain Python and run it with `cmsRun`.

__How do I export the notebook?__
Simply run the following in a shell (it writes `Exercise2.py`, which `cmsRun` can then execute):

`jupyter nbconvert --to script Exercise2.ipynb`


## Part I - a crash course on CMSSW configs

Every CMSSW config must import the CMS standard configuration module and define a process. The process is the object that contains all the modules that _can_ be run, as well as the __Path__s and __Sequence__s that _must_ be run.
The process must have a name, and that name must be unique in the data processing chain: if the data have already been processed by a process named `FOO`, you cannot run them again through a process with the same name.
You must also specify the era, since different taggers may have different trainings for different eras; see https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCmsDriverEras

In [1]:
import FWCore.ParameterSet.Config as cms
from Configuration.StandardSequences.Eras import eras
process = cms.Process("newPAT",eras.Run2_2016)

Calling `process.load(fragment_name)` acts much like `import` in normal Python, but all the CMSSW modules defined in the Python fragment are loaded directly into the process.
For our purposes we need a few services that define the detector geometry and the magnetic field map, plus the conditions and the message logger.

In [5]:
process.load("Configuration.Geometry.GeometryRecoDB_cff")
process.load("Configuration.StandardSequences.FrontierConditions_GlobalTag_cff")
process.load("Configuration.StandardSequences.MagneticField_cff")
process.load("FWCore.MessageService.MessageLogger_cfi")
process.MessageLogger.cerr.FwkReport.reportEvery = 10
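
To make the analogy with `import` concrete, here is a toy model in plain Python that runs without CMSSW. `ToyProcess`, `toy_fragment_cfi` and `toyLogger` are hypothetical names invented for this sketch; the real `cms.Process.load` is more selective about what it attaches to the process, so treat this only as an illustration of the idea.

```python
import importlib
import sys
import types


class ToyProcess(object):
    """Hypothetical stand-in for cms.Process; it only stores named objects."""

    def __init__(self, name):
        self.name = name

    def load(self, fragment_name):
        # Import the fragment like a normal Python module...
        fragment = importlib.import_module(fragment_name)
        # ...then attach every public object it defines to the process.
        # (Assumption: the real framework only attaches CMSSW objects.)
        for attr_name, value in vars(fragment).items():
            if not attr_name.startswith("_"):
                setattr(self, attr_name, value)


# Build a fake fragment in memory so the example runs anywhere.
toy_fragment = types.ModuleType("toy_fragment_cfi")
toy_fragment.toyLogger = {"reportEvery": 1}
sys.modules["toy_fragment_cfi"] = toy_fragment

toy_process = ToyProcess("newPAT")
toy_process.load("toy_fragment_cfi")

# After loading, the fragment's objects can be customised via the process.
toy_process.toyLogger["reportEvery"] = 10
print(toy_process.toyLogger)
```

This mirrors the pattern in the cell above: load a fragment first, then tweak one of the objects it defined through an attribute of the process.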

The `GlobalTag` defines a specific set of conditions (alignment, jet energy corrections, etc.) valid for data or MC and for a specific run range. You can look up the valid global tag for the data you are analyzing [here](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions?redirectedfrom=CMS.SWGuideFrontierConditions).

In [6]:
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:run2_mc')

You can also define the input files, the number of events to run on, and whether you want a full summary of what has been run.

In [7]:
#Source
process.source = cms.Source(
    "PoolSource",
    fileNames = cms.untracked.vstring(
        'root://xrootd-cms.infn.it//store/mc/RunIISummer16MiniAODv3/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/MINIAODSIM/PUMoriond17_94X_mcRun2_asymptotic_v3-v1/00000/0A8930BA-88BE-E811-8BDD-20CF3027A582.root'
    )
)

#Events to run
process.maxEvents = cms.untracked.PSet( 
    input = cms.untracked.int32(100) 
)

#Long summary
process.options = cms.untracked.PSet( 
    wantSummary = cms.untracked.bool(True) 
)

This is how you define the output EDM file:

In [11]:
process.out = cms.OutputModule(
    "PoolOutputModule",
    fileName = cms.untracked.string('updated_btagging.root'),
    ## save only events passing the full path
    #SelectEvents = cms.untracked.PSet( SelectEvents = cms.vstring('p') ),
    outputCommands = cms.untracked.vstring(
        'drop *', ## Do not keep anything
        'keep *_slimmedJets_*_*' #keep only the slimmed jets
    )
)

__The format of the `keep` statement:__ Stars are wildcards that match anything, as in the glob patterns you use in your shell. There are four fields separated by underscores, in the same order as presented by the `edmDumpEventContent` command. They represent:
   1. The type of the object
   2. The name (a.k.a. _label_) of the module producing it
   3. The _instance_. If a module produces multiple objects, it will make them with the same name, but different instances (and, potentially, types)
   4. The process name. This is used in case you want to re-produce some objects in your cfg (e.g. the whole HLT simulation) and save only the new ones
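
The four-field pattern can be tried out in plain Python. The sketch below uses the standard `fnmatch` module to imitate how a list of `drop`/`keep` commands selects branches, assuming that commands are applied in order and the last matching one wins. The helper `is_kept` and the example branch names are illustrative assumptions, not the framework's own implementation.

```python
from fnmatch import fnmatchcase


def is_kept(branch_name, output_commands):
    """Return True if the last command matching branch_name is a 'keep'."""
    kept = True  # assumed default when no command matches
    for command in output_commands:
        action, pattern = command.split()
        if fnmatchcase(branch_name, pattern):
            kept = (action == "keep")
    return kept


commands = ["drop *", "keep *_slimmedJets_*_*"]

# Branch names follow type_label_instance_process; an empty instance
# shows up as two consecutive underscores.
branches = [
    "patJets_slimmedJets__PAT",
    "patJets_slimmedJetsAK8__PAT",
    "patMuons_slimmedMuons__PAT",
]

decisions = {branch: is_kept(branch, commands) for branch in branches}
for branch in branches:
    print("%-30s %s" % (branch, "keep" if decisions[branch] else "drop"))
```

Note that the label field must match exactly between the underscores, so `*_slimmedJets_*_*` does not pick up a label such as `slimmedJetsAK8`.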

In [8]:
from PhysicsTools.PatAlgos.tools.helpers import getPatAlgosToolsTask
patAlgosToolsTask = getPatAlgosToolsTask(process)

More information on what a `cms.Task` is can be found [here](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideAboutPythonConfigFile#Task_Objects).

The EndPath specifies what needs to be run at the end of the processing of each event.

In [12]:
process.outpath = cms.EndPath(process.out, patAlgosToolsTask)

## Part II - remaking b-tag discriminators from MiniAOD

Everything is handled by a single helper function:

In [13]:
from PhysicsTools.PatAlgos.tools.jetTools import updateJetCollection

Getting the full set of optional arguments is, unfortunately, a bit cumbersome. This approach, though, _should_ work similarly for all PAT-based modifier functions.

In [25]:
# Written so that it runs under both Python 2 and Python 3
print(updateJetCollection.__doc__)
for par_name, par in updateJetCollection._parameters.items():
    print('   - %s:  %s' % (par_name, par.description))


    Tool to update a jet collection in your PAT Tuple (primarily intended for MiniAOD for which the default input argument values have been set).
    
   - labelName:  Label name of the new patJet collection.
   - postfix:  Postfix from usePF2PAT.
   - btagPrefix:  Prefix to be added to b-tag discriminator and TagInfo names
   - jetSource:  Label of the input collection from which the new patJet collection should be created
   - pfCandidates:  Label of the input collection for candidatecandidatese used in b-tagging
   - explicitJTA:  Use explicit jet-track association
   - pvSource:  Label of the input collection for primary vertices used in b-tagging
   - svSource:  Label of the input collection for IVF vertices used in b-tagging
   - elSource:  Label of the input collection for electrons used in b-tagging
   - muSource:  Label of the input collection for muons used in b-tagging
   - runIVF:  Re-run IVF secondary vertex reconstruction
   - tightBTagNTkHits:  Enable legacy tight b-tag
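
Since the same introspection should apply to other PAT tools, it can be wrapped in a small helper. The sketch below relies only on the two attributes used in the cell above (`_parameters` and, for each parameter, `description`). `ToyTool` and `ToyParameter` are hypothetical stand-ins so that the example runs without CMSSW, and `describe_tool` is a name chosen here, not part of PAT.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-ins that mimic only the attributes used by the helper.
ToyParameter = namedtuple("ToyParameter", ["value", "description"])


class ToyTool(object):
    """Toy tool used to demonstrate the introspection helper."""

    def __init__(self):
        self._parameters = OrderedDict([
            ("labelName", ToyParameter("", "Label name of the new collection.")),
            ("postfix", ToyParameter("", "Postfix appended to module names.")),
        ])


def describe_tool(tool):
    """Return the docstring and the parameter descriptions as one string."""
    lines = [(tool.__doc__ or "").strip()]
    for par_name, par in tool._parameters.items():
        lines.append("   - %s:  %s" % (par_name, par.description))
    return "\n".join(lines)


description = describe_tool(ToyTool())
print(description)
```

Inside a CMSSW environment the same helper could be tried as `print(describe_tool(updateJetCollection))`, assuming the tool exposes these attributes as shown above.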

In [26]:
updateJetCollection(
    process,
    jetSource = cms.InputTag('slimmedJets'),
    jetCorrections = ('AK4PFchs', cms.vstring(['L1FastJet', 'L2Relative', 'L3Absolute']), 'None'),
    btagDiscriminators = ['pfCombinedSecondaryVertexV2BJetTags'], ## to add discriminators
    btagPrefix = 'TEST'
)

**************************************************************
b tagging needs to be run on uncorrected jets. Hence, the JECs
will first be undone for 'updatedPatJets' and then applied to
'updatedPatJetsTransientCorrected'.
**************************************************************


Here you should write the necessary code to store the new discriminators. What will their names be?

In [27]:
process.out.outputCommands.append('keep *_selectedUpdatedPatJets_*_*')
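
As a hint for the naming question, the sketch below spells out one plausible convention in plain Python. It assumes that each discriminator name is simply `btagPrefix` followed by the original name (as the parameter description "Prefix to be added to b-tag discriminator and TagInfo names" suggests) and that the output collection is the `selectedUpdatedPatJets` label used in the cell above. The helper `expected_names` is hypothetical; check the result against the actual output file, for example with `edmDumpEventContent`.

```python
# Collection label taken from the keep statement above (default labelName/postfix).
OUTPUT_COLLECTION = "selectedUpdatedPatJets"


def expected_names(btag_prefix, discriminators):
    """Guess the stored names, assuming a plain 'prefix + name' convention."""
    return {
        "keep": "keep *_%s_*_*" % OUTPUT_COLLECTION,
        "discriminators": [btag_prefix + name for name in discriminators],
    }


names = expected_names("TEST", ["pfCombinedSecondaryVertexV2BJetTags"])
print(names["keep"])
for discriminator in names["discriminators"]:
    print(discriminator)
```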

Now you can convert the notebook to run on the data!