# Applying corrections to columnar data

This guide shows how to use the `coffea.lookup_tools` package, which reads a variety of common correction file formats into a standardized lookup table format.
It also covers some CMS-specific tools for jet corrections (`coffea.jetmet_tools`) and for b-tagging efficiencies and uncertainties (`coffea.btag_tools`).

**Test data**:
We'll use NanoEvents to construct some test data.

In [1]:
from coffea.nanoevents import NanoEventsFactory

fname = "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root"
events = NanoEventsFactory.from_root(fname).events()

The entry point for `coffea.lookup_tools` is the [extractor class](https://coffeateam.github.io/coffea/api/coffea.lookup_tools.extractor.html#coffea.lookup_tools.extractor).

In [2]:
from coffea.lookup_tools import extractor

In [3]:
%%bash
# download some sample correction sources
mkdir -p data
pushd data
PREFIX=https://raw.githubusercontent.com/CoffeaTeam/coffea/master/tests/samples
curl -Os $PREFIX/testSF2d.histo.root
curl -Os $PREFIX/testBTagSF.btag.csv
curl -Os $PREFIX/EIDISO_WH_out.histo.json
curl -Os $PREFIX/Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi.jec.txt
curl -Os $PREFIX/Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi.junc.txt
curl -Os $PREFIX/DeepCSV_102XSF_V1.btag.csv.gz
popd

~/coffea/coffea/binder/data ~/coffea/coffea/binder
~/coffea/coffea/binder


## Opening a root file and using it as a lookup table

In [tests/samples](https://github.com/CoffeaTeam/coffea/tree/master/tests/samples), there is an example file, `testSF2d.histo.root`, that contains a `TH2F` histogram named `scalefactors_Tight_Electron`. The following code reads that histogram into an [evaluator](https://coffeateam.github.io/coffea/api/coffea.lookup_tools.evaluator.html#coffea.lookup_tools.evaluator) instance under the key `testSF2d`, and then applies it to some electrons.

In [4]:
ext = extractor()
# several histograms can be imported at once using wildcards (*)
ext.add_weight_sets(["testSF2d scalefactors_Tight_Electron data/testSF2d.histo.root"])
ext.finalize()

evaluator = ext.make_evaluator()

print("available evaluator keys:")
for key in evaluator.keys():
    print("\t", key)
print("testSF2d:", evaluator['testSF2d'])
print("type of testSF2d:", type(evaluator['testSF2d']))

available evaluator keys:
	 testSF2d
testSF2d: 2 dimensional histogram with axes:
	1: [-2.5   -2.    -1.566 -1.444 -0.8    0.     0.8    1.444  1.566  2.
  2.5  ]
	2: [ 10.  20.  35.  50.  90. 150. 500.]

type of testSF2d: <class 'coffea.lookup_tools.dense_lookup.dense_lookup'>
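
The weight-set strings passed to `add_weight_sets` in this guide have three whitespace-separated fields: a local name, the name of the object inside the file, and the file path. The sketch below is not coffea's implementation. It is a small standard-library illustration of the naming rule suggested by the keys printed in this guide, and `resolve_key` is a hypothetical helper introduced only for this purpose.

```python
from fnmatch import fnmatchcase


def resolve_key(description, name_in_file):
    """Guess the evaluator key for one object found in a correction file.

    Hypothetical helper: it mirrors the keys printed in this guide,
    not necessarily every rule the real extractor applies.
    """
    local_name, pattern, _path = description.split()
    if not fnmatchcase(name_in_file, pattern):
        return None  # this object was not requested
    if local_name == "*":
        return name_in_file  # keep the name stored in the file
    if "*" in pattern:
        return local_name + name_in_file  # local name acts as a prefix
    return local_name  # a single named object is renamed outright


print(resolve_key(
    "testSF2d scalefactors_Tight_Electron data/testSF2d.histo.root",
    "scalefactors_Tight_Electron",
))
print(resolve_key("testBTag * data/testBTagSF.btag.csv", "CSVv2_0_comb_central_0"))
print(resolve_key("* * data/EIDISO_WH_out.histo.json", "EIDISO_WH/eta_pt_ratio_value"))
```

Under these assumptions, the three calls reproduce the three naming styles seen in this guide: a renamed single histogram, a prefixed set of keys, and keys taken unchanged from the file.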


In [5]:
print("Electron eta:", events.Electron.eta)
print("Electron pt:", events.Electron.pt)
print("Scale factor:", evaluator["testSF2d"](events.Electron.eta, events.Electron.pt))

Electron eta: [[], [1.83], [-0.293, -0.904], [-2.19, 1.65], ... [-0.0595], [], [0.381], [], []]
Electron pt: [[], [29.6], [60.1, 51.7], [10.7, 8.6], [], ... [], [15.6], [], [7.68], [], []]
Scale factor: [[], [0.909], [0.953, 0.972], [0.807, 0.827], ... [0.941], [], [0.946], [], []]
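
To make the lookup step concrete, here is a minimal pure-Python sketch of what a two-dimensional binned lookup does: find the bin along each axis, then read the stored value. The bin edges are copied from the printout above. The bin contents are placeholders (the real ones live in the ROOT file), and the helper names `find_bin`, `lookup` and `lookup_jagged` are hypothetical. Out-of-range inputs are clamped to the nearest bin, which is an assumption that appears consistent with the output above, where an electron with pt 8.6 (below the lowest edge of 10) still receives a scale factor.

```python
from bisect import bisect_right

# Bin edges as printed by the evaluator above.
eta_edges = [-2.5, -2.0, -1.566, -1.444, -0.8, 0.0, 0.8, 1.444, 1.566, 2.0, 2.5]
pt_edges = [10.0, 20.0, 35.0, 50.0, 90.0, 150.0, 500.0]

# Placeholder bin contents, NOT the real scale factors.
values = [
    [1.0 + 0.01 * i + 0.001 * j for j in range(len(pt_edges) - 1)]
    for i in range(len(eta_edges) - 1)
]


def find_bin(edges, x):
    """Index of the bin containing x, clamped to the first or last bin."""
    index = bisect_right(edges, x) - 1
    return min(max(index, 0), len(edges) - 2)


def lookup(eta, pt):
    """Value stored in the (eta, pt) bin."""
    return values[find_bin(eta_edges, eta)][find_bin(pt_edges, pt)]


def lookup_jagged(etas, pts):
    """Apply the lookup per event, preserving the jagged structure."""
    return [
        [lookup(eta, pt) for eta, pt in zip(event_eta, event_pt)]
        for event_eta, event_pt in zip(etas, pts)
    ]


print(lookup_jagged([[], [1.83], [-0.293, -0.904]], [[], [29.6], [60.1, 51.7]]))
```

The real evaluator performs the same kind of operation on whole awkward arrays at once rather than looping over events in Python.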


## Reading a CMS b-tag scale factor csv file

These files have the following structure:

In [6]:
%%bash
head -5 data/testBTagSF.btag.csv

CSVv2;OperatingPoint, measurementType, sysType, jetFlavor, etaMin, etaMax, ptMin, ptMax, discrMin, discrMax, formula 
0, mujets, central, 1, -2.4, 2.4, 20, 1000, 0, 1, "0.884016*((1.+(0.0331508*x))/(1.+(0.0285096*x)))" 
0, mujets, central, 0, -2.4, 2.4, 20, 1000, 0, 1, "0.884016*((1.+(0.0331508*x))/(1.+(0.0285096*x)))" 
0, mujets, down, 1, -2.4, 2.4, 20, 30, 0, 1, "(0.884016*((1.+(0.0331508*x))/(1.+(0.0285096*x))))-0.063606932759284973" 
0, mujets, down, 1, -2.4, 2.4, 30, 50, 0, 1, "(0.884016*((1.+(0.0331508*x))/(1.+(0.0285096*x))))-0.034989029169082642" 


The extractor assumes that `*.csv` files have this structure and interprets them accordingly. The resulting scale factors can be used to calculate b-tagging corrections or uncertainties. **Note**: a high-level b-tagging correction class is also available; see the later sections of this guide.
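
As a rough illustration of this structure, the sketch below parses the header and the first data row shown above using only the standard library, then evaluates the row's formula. This is not how coffea reads the file. The helpers `parse_line`, `applies_to` and `evaluate` are hypothetical, and treating `x` as the jet pt is an assumption made for this example.

```python
import csv

# Header and first data row, copied from the file excerpt above.
header = ("CSVv2;OperatingPoint, measurementType, sysType, jetFlavor, etaMin, "
          "etaMax, ptMin, ptMax, discrMin, discrMax, formula")
row = ('0, mujets, central, 1, -2.4, 2.4, 20, 1000, 0, 1, '
       '"0.884016*((1.+(0.0331508*x))/(1.+(0.0285096*x)))"')


def parse_line(line):
    """Split one comma-separated line, honoring the quoted formula field."""
    return [field.strip() for field in next(csv.reader([line], skipinitialspace=True))]


columns = parse_line(header)
# The first header field carries the tagger name before the semicolon.
tagger, columns[0] = columns[0].split(";")
record = dict(zip(columns, parse_line(row)))


def applies_to(record, eta, pt):
    """True if the jet falls inside the row's eta and pt ranges."""
    return (float(record["etaMin"]) <= eta < float(record["etaMax"])
            and float(record["ptMin"]) <= pt < float(record["ptMax"]))


def evaluate(record, x):
    """Evaluate the row's formula at x (assumed here to be the jet pt).

    eval() is used only because the formula comes from a trusted file.
    """
    return eval(record["formula"], {"__builtins__": {}}, {"x": x})


key = "_".join([tagger, record["OperatingPoint"], record["measurementType"],
                record["sysType"], record["jetFlavor"]])
print(key, applies_to(record, 1.2, 45.0), evaluate(record, 45.0))
```

The joined string follows the same pattern as the evaluator keys printed in the next cell, which additionally carry the local-name prefix given to `add_weight_sets`.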

In [7]:
ext = extractor()
ext.add_weight_sets(["testBTag * data/testBTagSF.btag.csv"])
ext.finalize()

evaluator = ext.make_evaluator()

print("available evaluator keys:")
for i, key in enumerate(evaluator.keys()):
    print("\t", key)
    if i > 5:
        print("\t ...")
        break
print("testBTagCSVv2_1_comb_up_0:", evaluator['testBTagCSVv2_1_comb_up_0'])
print("type of testBTagCSVv2_1_comb_up_0:", type(evaluator['testBTagCSVv2_1_comb_up_0']))



available evaluator keys:
	 testBTagCSVv2_0_comb_central_0
	 testBTagCSVv2_0_comb_central_1
	 testBTagCSVv2_0_comb_down_0
	 testBTagCSVv2_0_comb_down_1
	 testBTagCSVv2_0_comb_up_0
	 testBTagCSVv2_0_comb_up_1
	 testBTagCSVv2_0_incl_central_2
	 ...
testBTagCSVv2_1_comb_up_0: 3 dimensional histogram with axes:
	1: [-2.4  2.4]
	2: [  20.   30.   50.   70.  100.  140.  200.  300.  600. 1000.]
	3: [0. 1.]

type of testBTagCSVv2_1_comb_up_0: <class 'coffea.lookup_tools.dense_evaluated_lookup.dense_evaluated_lookup'>


In [8]:
# note: in a real situation you would want to apply the SF on the appropriate jet flavor
scalefactor = evaluator['testBTagCSVv2_1_comb_up_0'](events.Jet.eta, events.Jet.pt, events.Jet.btagCSVV2)
print(scalefactor)

[[0.979, 0.961, 0.966, 0.93, 0.918], [0.984, ... 0.945, 0.94, 0.918], [0.935, 0.933]]


## Importing JSON-encoded histograms

Some corrections are provided in a JSON format, with a structure like
```
data[category][name][axis1 bin][axis2 bin]["value"]
```

In [9]:
%%bash
head -10 data/EIDISO_WH_out.histo.json

{
	"EIDISO_WH" : {
		"eta_pt_ratio" : {
			"eta:[ 0.00, 0.80]":{
				"pt:[25.00,27.00]":{
					"value":0.903,
					"error":0.051
				},
				"pt:[27.00,30.00]":{
					"value":0.921,


The extractor assumes `*.json` files follow this format.
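
The nested structure can be walked by hand, which may help when checking a file before loading it. The following sketch uses only the standard library and the bins visible in the excerpt above (the second pt bin is left without an `error` entry because the excerpt is cut off there). The helpers `parse_bin_label` and `find` are hypothetical, and unlike the real evaluator this sketch returns `None` for values outside the listed bins.

```python
import json
import re

# The part of the file visible in the excerpt above.
text = """
{
  "EIDISO_WH": {
    "eta_pt_ratio": {
      "eta:[ 0.00, 0.80]": {
        "pt:[25.00,27.00]": {"value": 0.903, "error": 0.051},
        "pt:[27.00,30.00]": {"value": 0.921}
      }
    }
  }
}
"""
data = json.loads(text)

BIN_LABEL = re.compile(r"(\w+):\[\s*([-+\d.]+)\s*,\s*([-+\d.]+)\s*\]")


def parse_bin_label(label):
    """Turn 'eta:[ 0.00, 0.80]' into ('eta', 0.0, 0.8)."""
    name, low, high = BIN_LABEL.fullmatch(label).groups()
    return name, float(low), float(high)


def find(data, category, name, x, y):
    """Return the content of the bin containing (x, y), or None."""
    for x_label, y_bins in data[category][name].items():
        _, x_low, x_high = parse_bin_label(x_label)
        if not x_low <= x < x_high:
            continue
        for y_label, content in y_bins.items():
            _, y_low, y_high = parse_bin_label(y_label)
            if y_low <= y < y_high:
                return content
    return None


print(find(data, "EIDISO_WH", "eta_pt_ratio", 0.381, 26.0))
```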

In [10]:
ext = extractor()
ext.add_weight_sets(["* * data/EIDISO_WH_out.histo.json"])
ext.finalize()
    
evaluator = ext.make_evaluator()

print("available evaluator keys:")
for key in evaluator.keys():
    print("\t", key)
print("EIDISO_WH/eta_pt_ratio_value:", evaluator['EIDISO_WH/eta_pt_ratio_value'])
print("type of EIDISO_WH/eta_pt_ratio_value:", type(evaluator['EIDISO_WH/eta_pt_ratio_value']))

available evaluator keys:
	 EIDISO_WH/eta_pt_ratio_value
	 EIDISO_WH/eta_pt_ratio_error
EIDISO_WH/eta_pt_ratio_value: 2 dimensional histogram with axes:
	1: [-2.5  -2.17 -1.8  -1.57 -1.44 -0.8   0.    0.8   1.44  1.57  1.8   2.17
  2.5 ]
	2: [ 25.  27.  30.  32.  35.  40.  50. 200.]

type of EIDISO_WH/eta_pt_ratio_value: <class 'coffea.lookup_tools.dense_lookup.dense_lookup'>


In [11]:
sf_out = evaluator['EIDISO_WH/eta_pt_ratio_value'](events.Electron.eta, events.Electron.pt)
sf_err_out = evaluator['EIDISO_WH/eta_pt_ratio_error'](events.Electron.eta, events.Electron.pt)
print(sf_out)
print(sf_err_out)

[[], [0.793], [0.904, 0.909], [0.788, 0.83], ... [], [0.862], [], [0.903], [], []]
[[], [0.061], [0.018, 0.023], [0.186, 0.035], ... [0.051], [], [0.051], [], []]


## Import CMS Jet Energy Scales and Uncertainties
In CMS, the jet energy scale and resolution corrections, as well as their uncertainties, are available in a standalone text file format:

In [12]:
%%bash
head -5 data/Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi.jec.txt

{2 JetEta JetPt 1 JetPt max(0.0001,[0]+((x-[1])*([2]+((x-[1])*([3]+((x-[1])*[4])))))) Correction L2Relative}
  -5.191  -4.889     0.001   8.95328     7           8   8.9532766      2.638647259      7.816547526   -0.08418211224   -0.04894120539    0.03974185455
  -5.191  -4.889   8.95328   10.4135     7   8.9532766   10.413492      2.538089424      8.953276588   -0.04139022814    0.03931172893   -0.02027504013
  -5.191  -4.889   10.4135   12.1083     7   10.413492   12.108301      2.498345769        10.413492   -0.05627613174   -0.01124129111    0.00462608126
  -5.191  -4.889   12.1083   14.3796     7   12.108301   14.379634      2.393199605      12.10830116   -0.05451625473 -0.0004401913391   0.001090215218


In [13]:
%%bash
head -3 data/Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi.junc.txt

{1 JetEta 1 JetPt "" Correction Uncertainty}
-5.4 -5.0 150 9.0 0.1183 0.1183 11.0 0.1098 0.1098 13.5 0.1033 0.1033 16.5 0.0988 0.0988 19.5 0.0963 0.0963 22.5 0.0947 0.0947 26.0 0.0935 0.0935 30.0 0.0922 0.0922 34.5 0.0910 0.0910 40.0 0.0893 0.0893 46.0 0.0870 0.0870 52.5 0.0850 0.0850 60.0 0.0832 0.0832 69.0 0.0820 0.0820 79.0 0.0811 0.0811 90.5 0.0806 0.0806 105.5 0.0805 0.0805 123.5 0.0809 0.0809 143.0 0.0817 0.0817 163.5 0.0827 0.0827 185.0 0.0832 0.0832 208.0 0.0836 0.0836 232.5 0.0845 0.0845 258.5 0.0849 0.0849 286.0 0.0855 0.0855 331.0 0.0864 0.0864 396.0 0.0875 0.0875 468.5 0.0886 0.0886 549.5 0.0896 0.0896 639.0 0.0906 0.0906 738.0 0.0917 0.0917 847.5 0.0926 0.0926 968.5 0.0936 0.0936 1102.0 0.0945 0.0945 1249.5 0.0955 0.0955 1412.0 0.0964 0.0964 1590.5 0.0972 0.0972 1787.0 0.0981 0.0981 2003.0 0.0990 0.0990 2241.0 0.0998 0.0998 2503.0 0.1007 0.1007 2790.5 0.1015 0.1015 3107.0 0.1023 0.1023 3455.0 0.1031 0.1031 3837.0 0.1041 0.1041 4257.0 0.1051 0.1051 4719.0 0.1060 0.1060 5226

The extractor assumes that `*.txt` files follow one of these formats.
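
To show how the `.jec.txt` layout can be read, here is a standard-library sketch that parses the header and the first data row shown above and evaluates the correction. The column interpretation is an assumption based on the header: two binned variables with a low and high edge each, a count of the values that follow, an allowed range for the evaluation variable, and then the formula parameters `[0]` to `[4]`. The function names are hypothetical and the formula is transcribed by hand rather than parsed.

```python
header = ("{2 JetEta JetPt 1 JetPt "
          "max(0.0001,[0]+((x-[1])*([2]+((x-[1])*([3]+((x-[1])*[4])))))) "
          "Correction L2Relative}")
row = ("-5.191  -4.889     0.001   8.95328     7           8   8.9532766      "
       "2.638647259      7.816547526   -0.08418211224   -0.04894120539    0.03974185455")


def parse_header(line):
    """Return (binned variables, evaluation variables, formula string)."""
    tokens = line.strip("{} \n").split()
    n_binned = int(tokens[0])
    binned = tokens[1:1 + n_binned]
    n_eval = int(tokens[1 + n_binned])
    start = 2 + n_binned
    eval_vars = tokens[start:start + n_eval]
    formula = tokens[start + n_eval]
    return binned, eval_vars, formula


def parse_row(line, n_binned, n_eval):
    """Return (bin edges, allowed x range, parameters) for one data row."""
    numbers = [float(token) for token in line.split()]
    bin_edges = numbers[:2 * n_binned]      # (low, high) per binned variable
    n_values = int(numbers[2 * n_binned])   # how many values follow
    rest = numbers[2 * n_binned + 1:]
    assert len(rest) == n_values
    return bin_edges, rest[:2 * n_eval], rest[2 * n_eval:]


def l2_relative(pt, x_range, p):
    """Hand-transcribed header formula, with x clamped to its allowed range."""
    x = min(max(pt, x_range[0]), x_range[1])
    d = x - p[1]
    return max(0.0001, p[0] + d * (p[2] + d * (p[3] + d * p[4])))


binned, eval_vars, formula = parse_header(header)
bin_edges, x_range, params = parse_row(row, len(binned), len(eval_vars))
print(binned, eval_vars, bin_edges)
print(l2_relative(8.5, x_range, params))
```

The real evaluator first selects the row whose `JetEta` and `JetPt` bins contain the jet, then evaluates that row's formula, for all jets at once.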

In [14]:
ext = extractor()
ext.add_weight_sets([
    "* * data/Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi.jec.txt",
    "* * data/Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi.junc.txt",
])
ext.finalize()

evaluator = ext.make_evaluator()

print("available evaluator keys:")
for key in evaluator.keys():
    print("\t", key)

print()
print("Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi:")
print(evaluator['Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi'])
print("type of Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi:")
print(type(evaluator['Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi']))
print()
print("Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi:")
print(evaluator['Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi'])
print("type of Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi:")
print(type(evaluator['Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi']))

available evaluator keys:
	 Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi
	 Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi

Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi:
binned dims: ['JetEta', 'JetPt']
eval vars  : ['JetPt']
parameters : ['p0', 'p1', 'p2', 'p3', 'p4']
formula    : max(0.0001,p0+((JetPt-p1)*(p2+((JetPt-p1)*(p3+((JetPt-p1)*p4))))))
signature  : (JetEta,JetPt)

type of Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi:
<class 'coffea.lookup_tools.jme_standard_function.jme_standard_function'>

Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi:
binned dims   : ['JetEta']
eval vars     : ['JetPt']
signature     : (JetEta,JetPt)

type of Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi:
<class 'coffea.lookup_tools.jec_uncertainty_lookup.jec_uncertainty_lookup'>


In [15]:
jec_out = evaluator['Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi'](events.Jet.eta, events.Jet.pt)
junc_out = evaluator['Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi'](events.Jet.eta, events.Jet.pt)
print("Correction factor:", jec_out)
# junc_out is a double-jagged array, with the last index being the up, down values
# they can be separated via indexing, e.g.:
print("Uncertainty +:", junc_out[:, :, 0])
print("Uncertainty -:", junc_out[:, :, 1])

Correction factor: [[1.2, 1.27, 1.41, 1.98, 2], [1.09, 1.28, ... 1.67, 1.46, 1.37, 1.16], [1.36, 1.15]]
Uncertainty +: [[1.01, 1.05, 1.08, 1.14, 1.15], [1.01, ... 1.13, 1.04, 1.04, 1.04], [1.03, 1.03]]
Uncertainty -: [[0.989, 0.955, 0.924, 0.858, 0.848], ... 0.957, 0.961, 0.963], [0.969, 0.97]]
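
The up and down values printed above are multiplicative factors of the form 1 + u and 1 - u. As a hedged illustration of where u could come from, the sketch below reads the first three (pt, up, down) triplets from the `.junc.txt` row shown earlier and interpolates between them. The order of the triplet columns, the linear interpolation and the constant extrapolation outside the tabulated range are all assumptions of this sketch, and the helper names are hypothetical.

```python
from bisect import bisect_right

# Start of the .junc.txt row shown earlier, truncated to three triplets.
# The third number (150) is the value count of the full row, so it does not
# match this shortened copy.
row = "-5.4 -5.0 150 9.0 0.1183 0.1183 11.0 0.1098 0.1098 13.5 0.1033 0.1033"


def parse_junc_row(line):
    """Return the eta range and the pt, up and down columns of one row."""
    numbers = [float(token) for token in line.split()]
    eta_range = (numbers[0], numbers[1])
    triplets = numbers[3:]
    return eta_range, triplets[0::3], triplets[1::3], triplets[2::3]


def interpolate(x, xs, ys):
    """Linear interpolation, held constant outside the tabulated range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    fraction = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + fraction * (ys[i + 1] - ys[i])


def jes_factors(pt, pts, up, down):
    """Multiplicative (up, down) factors for one jet pt."""
    return 1.0 + interpolate(pt, pts, up), 1.0 - interpolate(pt, pts, down)


eta_range, pts, up, down = parse_junc_row(row)
print(eta_range, jes_factors(10.0, pts, up, down))
```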


## Applying energy scale transformations to Jets

The `coffea.jetmet_tools` package provides the convenience classes `JECStack` and `CorrectedJetsFactory`, which together apply the specified corrections and compute their uncertainties in one call.

In [16]:
from coffea.jetmet_tools import FactorizedJetCorrector, JetCorrectionUncertainty
from coffea.jetmet_tools import JECStack, CorrectedJetsFactory
import awkward as ak
import numpy as np

ext = extractor()
ext.add_weight_sets([
    "* * data/Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi.jec.txt",
    "* * data/Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi.junc.txt",
])
ext.finalize()

jec_stack_names = ["Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi",
                   "Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi"]

evaluator = ext.make_evaluator()

jec_inputs = {name: evaluator[name] for name in jec_stack_names}
jec_stack = JECStack(jec_inputs)
### more possibilities are available if you send in more pieces of the JEC stack
# mc2016_ak8_jxform = JECStack(["more", "names", "of", "JEC parts"])

print(dir(evaluator))
print()

name_map = jec_stack.blank_name_map
name_map['JetPt'] = 'pt'
name_map['JetMass'] = 'mass'
name_map['JetEta'] = 'eta'
name_map['JetA'] = 'area'

jets = events.Jet
    
jets['pt_raw'] = (1 - jets['rawFactor']) * jets['pt']
jets['mass_raw'] = (1 - jets['rawFactor']) * jets['mass']
jets['pt_gen'] = ak.values_astype(ak.fill_none(jets.matched_gen.pt, 0), np.float32)
jets['rho'] = ak.broadcast_arrays(events.fixedGridRhoFastjetAll, jets.pt)[0]
name_map['ptGenJet'] = 'pt_gen'
name_map['ptRaw'] = 'pt_raw'
name_map['massRaw'] = 'mass_raw'
name_map['Rho'] = 'rho'
    
events_cache = events.caches[0]
corrector = FactorizedJetCorrector(
    Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi=evaluator['Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi'],
)
uncertainties = JetCorrectionUncertainty(
    Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi=evaluator['Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi']
)

jet_factory = CorrectedJetsFactory(name_map, jec_stack)
corrected_jets = jet_factory.build(jets, lazy_cache=events_cache)


print()
print('starting columns:',ak.fields(jets))
print()

print('untransformed pt ratios',jets.pt/jets.pt_raw)
print('untransformed mass ratios',jets.mass/jets.mass_raw)

print('transformed pt ratios',corrected_jets.pt/corrected_jets.pt_raw)
print('transformed mass ratios',corrected_jets.mass/corrected_jets.mass_raw)

print()
print('transformed columns:', ak.fields(corrected_jets))
print()

print('JES UP pt ratio',corrected_jets.JES_jes.up.pt/corrected_jets.pt_raw)
print('JES DOWN pt ratio',corrected_jets.JES_jes.down.pt/corrected_jets.pt_raw)

['Fall17_17Nov2017_V32_MC_L2Relative_AK4PFPuppi', 'Fall17_17Nov2017_V32_MC_Uncertainty_AK4PFPuppi']


starting columns: ['area', 'bRegCorr', 'bRegRes', 'btagCMVA', 'btagCSVV2', 'btagDeepB', 'btagDeepC', 'btagDeepFlavB', 'btagDeepFlavC', 'chEmEF', 'chHEF', 'cleanmask', 'electronIdx1', 'electronIdx1G', 'electronIdx2', 'electronIdx2G', 'electronIdxG', 'eta', 'genJetIdx', 'genJetIdxG', 'hadronFlavour', 'jercCHF', 'jercCHPUF', 'jetId', 'mass', 'muEF', 'muonIdx1', 'muonIdx1G', 'muonIdx2', 'muonIdx2G', 'muonIdxG', 'muonSubtrFactor', 'nConstituents', 'nElectrons', 'nMuons', 'neEmEF', 'neHEF', 'partonFlavour', 'phi', 'pt', 'puId', 'qgl', 'rawFactor', 'pt_raw', 'mass_raw', 'pt_gen', 'rho']

untransformed pt ratios [[1.12, 1.09, 1.2, 1.35, 1.27], [1.03, ... 1.28, 1.1, 1.13, 0.989], [1.13, 0.978]]
untransformed mass ratios [[1.12, 1.09, 1.2, 1.35, 1.27], [1.03, ... 1.28, 1.1, 1.13, 0.989], [1.13, 0.978]]
transformed pt ratios [[1.2, 1.3, 1.46, 2.09, 2.1], [1.09, 1.29, ... 1.84, 1.47, 1.36, 1.16], 
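
The arithmetic behind the columns prepared in the cell above can be written out for a single jet. In this sketch the stored correction is first undone with `rawFactor`, as in the cell above, and a new correction factor is then applied to the raw pt. Treating the JES variations as the corrected pt scaled by 1 ± u is a simplifying assumption of this example, the function `undo_and_reapply` is hypothetical, and the input numbers are made up for illustration.

```python
def undo_and_reapply(pt, raw_factor, jec, jes_unc):
    """Recompute a jet pt from its raw value and vary it by the JES uncertainty.

    pt         : pt stored in the input file (already corrected once)
    raw_factor : fraction removed to get back to the raw pt
    jec        : new correction factor to apply to the raw pt
    jes_unc    : relative JES uncertainty u (assumed symmetric here)
    """
    pt_raw = (1.0 - raw_factor) * pt  # same expression as in the cell above
    pt_corrected = pt_raw * jec
    return {
        "pt_raw": pt_raw,
        "pt": pt_corrected,
        "pt_jes_up": pt_corrected * (1.0 + jes_unc),
        "pt_jes_down": pt_corrected * (1.0 - jes_unc),
    }


# Made-up example values, not taken from the sample file.
jet = undo_and_reapply(pt=50.0, raw_factor=0.2, jec=1.1, jes_unc=0.05)
print(jet)
print("stored pt / raw pt:", 50.0 / jet["pt_raw"])
```

The last line corresponds to the "untransformed pt ratios" printed above, which equal 1 / (1 - rawFactor) for each jet.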

## Applying CMS b-tagging corrections
The `coffea.btag_tools` module provides the high-level utility [BTagScaleFactor](https://coffeateam.github.io/coffea/api/coffea.btag_tools.BTagScaleFactor.html#coffea.btag_tools.BTagScaleFactor) which calculates per-jet weights for b-tagging as well as light flavor mis-tagging efficiencies. Uncertainties can be calculated as well.

In [17]:
from coffea.btag_tools import BTagScaleFactor

btag_sf = BTagScaleFactor("data/DeepCSV_102XSF_V1.btag.csv.gz", "medium")

print("SF:", btag_sf.eval("central", events.Jet.hadronFlavour, abs(events.Jet.eta), events.Jet.pt))
print("systematic +:", btag_sf.eval("up", events.Jet.hadronFlavour, abs(events.Jet.eta), events.Jet.pt))

SF: [[1.52, 1.56, 1.59, 1.6, 1.6], [0.969, 1.57, ... 1.59, 1.6, 1.6, 1.6], [1.6, 1.6]]
systematic +: [[1.72, 1.77, 1.79, 1.8, 1.8], [1.01, 1.78, ... 1.8, 1.8, 1.8, 1.8], [1.8, 1.8]]
