In [None]:
import hist
import warnings
import numpy as np
import awkward as ak
import matplotlib.pyplot as plt
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema
import correctionlib

warnings.filterwarnings("ignore", message="Missing cross-reference index ")

fname = "root://xcache//store/mc/RunIISummer20UL17NanoAODv2/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/106X_mc2017_realistic_v8-v1/120000/2A2F4EC9-F9BB-DF43-B08D-525B5389937E.root"
events = NanoEventsFactory.from_root(fname, schemaclass=NanoAODSchema).events()

## **Heavy-flavour tagging**

We have seen that a jet is a narrow cone of hadrons, leptons, and other particles produced by the hadronization of a quark (u, d, s, c, b; the top quark decays before it can hadronize) or a gluon. In CMS, jets are reconstructed with the anti-kT algorithm, which clusters particles associated with ECAL or HCAL hits or with tracks into a jet cone.

Heavy-flavour jet identification exploits the properties of the hadrons produced in the jet to discriminate jets initiated by heavy-flavour quarks (b, c) from those arising from light partons.

B-tagging is an essential tool for studying physics processes with b jets in the final state.

<div>
<img src="https://camo.githubusercontent.com/1fefe17531f39b727c6b85b0de50e9cc131969b9b7b85accd5723c22306d36c8/68747470733a2f2f692e696d6775722e636f6d2f4f57685831334f2e6a7067" width="500"/>
</div>

B-jet tagging relies on b-hadron properties:
* Long B hadron lifetimes of ~1.5 ps → decay lengths from ~0.5 mm up to a few mm in the CMS detector, leading to a distinct secondary vertex (SV) displaced from the primary vertex (PV) of the pp collision

* Tracks inside b and c jets are displaced with respect to the PV (because the hadron decays away from the PV) → measured by the impact parameter (IP)

* Bottom and charm quarks are more massive than first-generation quarks (about 4.2 and 1.3 GeV, respectively) → B and D hadron decay products have higher transverse momenta, resulting in wider jet cones and a larger number of jet constituents.
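
The decay lengths quoted in the first bullet follow from the lifetime and the Lorentz boost, $L = \beta\gamma\, c\tau = (p/m)\, c\tau$. The sketch below works through the arithmetic; the B hadron mass (~5.28 GeV) and the 50 GeV momentum are illustrative assumptions, not values taken from this notebook.

```python
C_MM_PER_PS = 0.299792458  # speed of light in mm/ps


def mean_decay_length_mm(p_gev, mass_gev, lifetime_ps):
    """Mean lab-frame decay length L = (p/m) * c * tau, in mm."""
    beta_gamma = p_gev / mass_gev
    return beta_gamma * C_MM_PER_PS * lifetime_ps


# Proper decay length c*tau for a lifetime of 1.5 ps
c_tau = C_MM_PER_PS * 1.5

# Illustrative B hadron: mass ~5.28 GeV, momentum 50 GeV (assumed values)
decay_length = mean_decay_length_mm(50.0, 5.28, 1.5)

print(f"c*tau = {c_tau:.2f} mm, L(p = 50 GeV) = {decay_length:.1f} mm")
```

With these inputs the proper decay length is about 0.45 mm and the boosted decay length about 4 mm, consistent with the range quoted above.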

<div>
<img src="https://i.imgur.com/l0s4MHK.png" width="500"/>
</div>

A variety of b-tagging algorithms have been developed by ATLAS and CMS. We will use *DeepJet* (https://arxiv.org/abs/2008.10519):
* DeepJet uses four types of input: jet-level variables, charged candidates (tracks), neutral candidates, and secondary vertices.
* Each candidate collection is treated as a sequence and processed by recurrent layers (LSTMs). The outputs obtained from the different inputs are then concatenated together with the jet-level variables, and the resulting full jet information is passed through fully connected layers before classification.

<div>
<img src="https://i.imgur.com/kBzsPHM.png" width="700"/>
</div>

We can access the DeepJet discriminator through the `btagDeepFlavB` field of the Jet collection:

In [None]:
events.Jet.btagDeepFlavB

As with muon identification and isolation, the tagger is used at fixed working points (loose, medium, tight), each defined by a threshold on the discriminator:

```
deepjet_btag_wps = {
    "2016APV": {
        "L": 0.0508,
        "M": 0.2598,
        "T": 0.6502
    },
    "2016": {
        "L": 0.048,
        "M": 0.2489,
        "T": 0.6377
    },
    "2017": {
        "L": 0.0532,
        "M": 0.304,
        "T": 0.7476
    },
    "2018": {
        "L": 0.049,
        "M": 0.2783,
        "T": 0.71
    }
}
```
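
A minimal sketch of how a working point turns the discriminator into a tag decision. The scores below are toy values standing in for `events.Jet.btagDeepFlavB`, and the use of a strict `>` comparison is an assumption; the same comparison can be applied directly to the awkward array.

```python
import numpy as np

# 2017 DeepJet working points, copied from the table above
wps_2017 = {"L": 0.0532, "M": 0.304, "T": 0.7476}

# Toy discriminator values standing in for events.Jet.btagDeepFlavB
btag_score = np.array([0.01, 0.12, 0.35, 0.80, 0.99])

# A jet is tagged at a working point if its score exceeds the threshold
passes = {wp: btag_score > cut for wp, cut in wps_2017.items()}

for wp, mask in passes.items():
    print(wp, int(mask.sum()), "of", len(mask), "jets tagged")
```

Tighter working points tag fewer jets: they trade b-tagging efficiency for a lower mistag rate.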

<div>
<img src="https://i.imgur.com/2YAsGjA.png" width="500"/>
</div>



## B-tagging scale factors

The b-tagging efficiencies measured in data differ somewhat from those predicted by the simulation, so scale factors need to be applied to simulated events to account for the difference. The BTV POG measures these b-tagging efficiency scale factors ($ \text{SF}= \varepsilon_{\text{DATA}}/\varepsilon_{\text{MC}} $) for b and light-flavor jets. In general, the scale factors depend on the jet flavor, jet $ p_{\text{T}} $, and jet $ \eta $. As long as there are no dedicated scale factor measurements for charm jets, their scale factors can be taken to be the same as for b jets.

Most methods require knowledge of the MC b-tagging efficiencies, which depend on the event kinematics, so it is important to emphasize that **the BTV POG only provides the scale factors, and it is the analyst's responsibility to compute the MC b-tagging efficiencies for each jet flavor in their signal and background MC samples before applying the scale factors**. The calculation of the MC b-tagging efficiencies is described [here](https://github.com/deoache/wprime_plus_b/blob/refactor/corrections/binder/btag_eff.ipynb).
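
As a rough illustration of what an MC efficiency map contains, the sketch below computes the tagged fraction in bins of $p_{\text{T}}$ and $|\eta|$ for toy jets of a single flavor. The jets, the binning, and the uniform random scores are all made up for the example; this is not the procedure of the linked notebook, and in a real analysis the map is built separately for each hadron flavor from the analysis MC samples.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy jets of one flavor: pT, |eta| and a random discriminator score (all invented)
n_jets = 10_000
pt = rng.uniform(20.0, 300.0, n_jets)
abseta = rng.uniform(0.0, 2.5, n_jets)
score = rng.uniform(0.0, 1.0, n_jets)
tagged = score > 0.304  # 2017 medium working point

# Hypothetical binning, chosen only for illustration
pt_bins = np.array([20.0, 50.0, 100.0, 300.0])
eta_bins = np.array([0.0, 1.2, 2.5])

# Count all jets and tagged jets in each (pT, |eta|) bin
n_all, _, _ = np.histogram2d(pt, abseta, bins=(pt_bins, eta_bins))
n_tag, _, _ = np.histogram2d(pt[tagged], abseta[tagged], bins=(pt_bins, eta_bins))

# Efficiency = tagged / all, leaving empty bins at zero
eff_map = np.divide(n_tag, n_all, out=np.zeros_like(n_tag), where=n_all > 0)
print(eff_map)
```

Because the toy scores are uniform, every bin comes out close to 0.7; real efficiencies vary with flavor and kinematics, which is why the map is needed.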

In [None]:
from coffea import util

# efficiency lookup table
efflookup = util.load("btag_eff_deepJet_M_2017.coffea")
efflookup

In [None]:
# select b and c jets (hadronFlavour is 5 for b, 4 for c and 0 for light jets)
bc_jets = events.Jet[events.Jet.hadronFlavour > 0]

# b-tagging efficiency for bc jets
eff = efflookup(bc_jets.pt, np.abs(bc_jets.eta), bc_jets.hadronFlavour)
eff

b-tagging weights are computed as (see https://twiki.cern.ch/twiki/bin/viewauth/CMS/BTagSFMethods):

  $$w = \prod_{i=\text{tagged}} \frac{SF_{i} \cdot \varepsilon_i}{\varepsilon_i} \prod_{j=\text{not tagged}} \frac{1 - SF_{j} \cdot \varepsilon_j}{1-\varepsilon_j} \;\;\; [1]$$
  
  where $\varepsilon_i$ is the MC b-tagging efficiency and $\text{SF}_i$ is the b-tagging scale factor of jet $i$. Both are functions of the jet flavor, jet $p_T$, and jet $\eta$. Note that the first product runs over the jets tagged at the chosen working point and the second over the jets not tagged at that working point. **This is not to be confused with the flavor of the jets**.
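
Formula [1] can be sketched for the jets of a single event as follows. The function name `btag_event_weight` and all numbers are hypothetical and only illustrate the arithmetic; in practice the scale factors come from the BTV POG and the efficiencies from the MC efficiency map.

```python
import numpy as np


def btag_event_weight(tagged, sf, eff):
    """Event weight from formula [1] for the jets of one event."""
    tagged = np.asarray(tagged, dtype=bool)
    sf = np.asarray(sf, dtype=float)
    eff = np.asarray(eff, dtype=float)
    # Tagged jets contribute SF * eff / eff = SF
    w_tagged = np.prod(sf[tagged])
    # Untagged jets contribute (1 - SF * eff) / (1 - eff)
    w_untagged = np.prod((1.0 - sf[~tagged] * eff[~tagged]) / (1.0 - eff[~tagged]))
    return w_tagged * w_untagged


# Toy event with three jets (all numbers invented)
tagged = [True, False, False]
sf = [0.95, 0.90, 1.10]
eff = [0.70, 0.60, 0.01]

w = btag_event_weight(tagged, sf, eff)
print(w)
```

An event without jets gets a weight of 1, since both products are empty. Every jet enters exactly one of the two products, whatever its flavor.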

### **Quiz**: 

Using the following dataset

`root://eospublic.cern.ch//eos/root-eos/HiggsTauTauReduced/DYJetsToLL.root`

compute the b-tagging weights using formula [1]. The b-tagging SF (`deepJet_comb`) can be obtained from the file `btagging.json.gz`. Its inputs are:
* systematic (string): "central"
* working_point (string): "M"
* hadron flavor (int)
* abseta (real) (Range:  [0.0, 2.5))
* pt (real)