Phase I Deep CSV Training #18315
Changes from 2 commits
2218370
8e18e0b
7808091
39fb835
c9793cb
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -101,6 +101,17 @@ | |
,'pfCombinedCvsBJetTags' | ||
# ChargeTagging | ||
,'pfChargeBJetTags' | ||
#Deep Flavour | ||
,'pfDeepCSVJetTags:probb' | ||
,'pfDeepCSVJetTags:probc' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess you are aiming at the Phase I case only given the fact that there is no There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, it is more a "one size fit all" solution, this will work in both, and in the future the Phase 0 checks should be less and less relevant. |
||
,'pfDeepCSVJetTags:probudsg' | ||
,'pfDeepCSVJetTags:probbb' | ||
# DeepCMVA | ||
,'pfDeepCMVAJetTags:probb' | ||
,'pfDeepCMVAJetTags:probc' | ||
,'pfDeepCMVAJetTags:probudsg' | ||
,'pfDeepCMVAJetTags:probbb' | ||
,'pfDeepCMVAJetTags:probcc' | ||
] | ||
|
||
# uncomment the following lines to add ak4PFJets with new b-tags to your PAT output | ||
|
@@ -126,18 +137,19 @@ | |
process.patJetsAK8PFCHS.addTagInfos = True | ||
|
||
# uncomment the following lines to add subjets of ak8PFJetsCHSSoftDrop with new b-tags to your PAT output | ||
from pdb import set_trace | ||
addJetCollection( | ||
process, | ||
labelName = 'AK8PFCHSSoftDropSubjets', | ||
jetSource = cms.InputTag('ak8PFJetsCHSSoftDrop','SubJets'), | ||
jetCorrections = ('AK4PFchs', cms.vstring(['L1FastJet', 'L2Relative', 'L3Absolute']), 'Type-2'), # Using AK4 JECs for subjets which might not be completely appropriate | ||
algo = 'AK', # needed for subjet flavor clustering | ||
rParam = 0.8, # needed for subjet flavor clustering | ||
btagDiscriminators = btagDiscriminators, | ||
explicitJTA = True, # needed for subjet b tagging | ||
svClustering = True, # needed for subjet b tagging | ||
fatJets = cms.InputTag("ak8PFJetsCHS"), # needed for subjet flavor clustering | ||
groomedFatJets = cms.InputTag("ak8PFJetsCHSSoftDrop") # needed for subjet flavor clustering | ||
groomedFatJets = cms.InputTag("ak8PFJetsCHSSoftDrop"), # needed for subjet flavor clustering | ||
rParam = 0.8, # needed for subjet flavor clustering | ||
Comment: Was this reshuffling driven by aesthetic reasons, or was there a problem? Reply: This was actually quite useful for debugging an error. If you issue "step" to pdb on the previous configuration, it enters the InputTag creation rather than the PAT sequence modifier, hence I left it there in case we need to debug the sequence in the future. |
||
) | ||
process.patJetsAK8PFCHSSoftDropSubjets.addTagInfos = True | ||
|
||
|
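The new `btagDiscriminators` entries appear to follow the CMSSW `module:instance` string convention, e.g. `pfDeepCSVJetTags:probb` names the `probb` output of the `pfDeepCSVJetTags` producer. A minimal pure-Python sketch (no CMSSW needed; the helper `group_by_producer` is hypothetical) that resolves the part of the list visible in this hunk into producers and their requested outputs:

```python
from collections import OrderedDict


def group_by_producer(discriminators):
    """Map each producer label to the output (instance) labels requested from it.

    Entries without a ':' (e.g. 'pfChargeBJetTags') are treated as a single
    default output and recorded here as an empty string.
    """
    grouped = OrderedDict()
    for name in discriminators:
        module, _, instance = name.partition(':')
        grouped.setdefault(module, []).append(instance)
    return grouped


# Only the entries visible in the diff hunk above.
btagDiscriminators = [
    'pfCombinedCvsBJetTags',
    'pfChargeBJetTags',
    'pfDeepCSVJetTags:probb',
    'pfDeepCSVJetTags:probc',
    'pfDeepCSVJetTags:probudsg',
    'pfDeepCSVJetTags:probbb',
    'pfDeepCMVAJetTags:probb',
    'pfDeepCMVAJetTags:probc',
    'pfDeepCMVAJetTags:probudsg',
    'pfDeepCMVAJetTags:probbb',
    'pfDeepCMVAJetTags:probcc',
]

for module, outputs in group_by_producer(btagDiscriminators).items():
    print(module, outputs)
```

Grouping makes the asymmetry in the new entries easy to spot: DeepCSV requests four outputs, while DeepCMVA additionally requests `probcc`.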
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,5 +3,12 @@ | |
pfDeepCSVJetTags = cms.EDProducer( | ||
'DeepFlavourJetTagsProducer', | ||
src = cms.InputTag('pfDeepCSVTagInfos'), | ||
checkSVForDefaults = cms.bool(False), | ||
meanPadding = cms.bool(False), | ||
NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json') | ||
) | ||
|
||
from Configuration.Eras.Modifier_phase1Pixel_cff import phase1Pixel | ||
phase1Pixel.toModify(pfDeepCSVJetTags, NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepCSV_PhaseI.json')) | ||
Comment: What is the plan to put these conditions in the GT? Reply: Can JSON files such as the one used by the DeepCSV tagger be put in the CondDB? We can put GBRForest payloads in the CondDB for BDT-based taggers, but for a general MVA technique I'm not sure what our options are (of course, there are conditions used for the CSV algorithm based on a dedicated MVA computer code in |
||
phase1Pixel.toModify(pfDeepCSVJetTags, checkSVForDefaults = cms.bool(True)) | ||
phase1Pixel.toModify(pfDeepCSVJetTags, meanPadding = cms.bool(True)) | ||
Comment: Why are these parameters changing for 2017? Reply: The model (lines 12 and 14) has been explicitly trained for Phase I and therefore should be used for Phase I only; there is no guarantee that the same training will perform better on Phase 0 as well. |
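The configuration in this hunk keeps the Phase 0 defaults in the producer definition and lets `phase1Pixel.toModify` override them only when the Phase I era is active, which is what makes it "one size fits all". The sketch below mimics that behaviour outside CMSSW; `FakeModifier`, `build_deepcsv`, and the plain dict standing in for the `cms.EDProducer` are hypothetical stand-ins, not the real `FWCore.ParameterSet` API.

```python
class FakeModifier:
    """Hypothetical stand-in for a CMSSW era Modifier such as phase1Pixel."""

    def __init__(self, active):
        self.active = active

    def toModify(self, params, **overrides):
        # Apply the overrides only when this era is part of the running process.
        if self.active:
            params.update(overrides)


def build_deepcsv(phase1):
    # Phase 0 defaults, as in the pfDeepCSVJetTags definition above.
    params = {
        'src': 'pfDeepCSVTagInfos',
        'checkSVForDefaults': False,
        'meanPadding': False,
        'NNConfig': 'RecoBTag/Combined/data/DeepFlavourNoSL.json',
    }
    phase1Pixel = FakeModifier(active=phase1)
    # The three separate toModify calls from the diff, folded into one here.
    phase1Pixel.toModify(params,
                         NNConfig='RecoBTag/Combined/data/DeepCSV_PhaseI.json',
                         checkSVForDefaults=True,
                         meanPadding=True)
    return params


print(build_deepcsv(phase1=False)['NNConfig'])
print(build_deepcsv(phase1=True)['NNConfig'])
```

Folding the three calls into one is only a stylistic choice in this mock. The real `Modifier.toModify` is expected to accept several keyword arguments in one call as well, but that should be checked against the CMSSW documentation before relying on it.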
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,23 +1,9 @@ | ||
#! /bin/env bash | ||
|
||
sed -i 's|Jet_eta|jetEta|g' $1 | ||
sed -i 's|Jet_pt|jetPt|g' $1 | ||
#sed -i 's|jet_eta|jetEta|g' $1 | ||
sed -i 's|jet_eta|jetAbsEta|g' $1 | ||
sed -i 's|jet_pt|jetPt|g' $1 | ||
sed -i 's|TagVarCSV_||g' $1 | ||
sed -i 's|TagVarCSVTrk_||g' $1 | ||
sed -i 's|prob_|prob|g' $1 | ||
|
||
#bugfixes | ||
sed -i 's|jetNTracks|jetNSelectedTracks|g' $1 | ||
sed -i 's|jetNSelectedTracksEtaRel|jetNTracksEtaRel|g' $1 | ||
|
||
python <<EOF | ||
import json | ||
with open('$1') as infile: | ||
jmap = json.loads(infile.read()) | ||
|
||
for var in jmap['inputs']: | ||
var['offset'] *= -1 | ||
var['scale'] = 1./var['scale'] | ||
|
||
with open('$1', 'w') as out: | ||
out.write(json.dumps(jmap, indent=2, separators = (',', ': '))) | ||
EOF | ||
sed -i 's|trackJetDistVal|trackJetDist|g' $1 |
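The conversion script in this hunk chains `sed` renames over the JSON file and then rewrites the normalisation of every input with an embedded Python snippet (the flattened diff does not preserve which of these lines were removed and which were kept). A minimal pure-Python sketch of both steps, assuming the input JSON stores a mean in `offset` and a standard deviation in `scale`, and that the consumer evaluates `(x + offset) * scale` (an lwtnn-style convention assumed here, not confirmed by the diff). The function names and the sample JSON are hypothetical.

```python
import json

# Ordered (old, new) substitutions, mirroring the sed calls in the hunk.
# Order matters: the second "bugfix" rule undoes the first one for
# jetNTracksEtaRel, which must keep its original name.
RENAMES = [
    ('Jet_eta', 'jetEta'),
    ('Jet_pt', 'jetPt'),
    ('jet_eta', 'jetAbsEta'),
    ('jet_pt', 'jetPt'),
    ('TagVarCSV_', ''),
    ('TagVarCSVTrk_', ''),
    ('prob_', 'prob'),
    ('jetNTracks', 'jetNSelectedTracks'),
    ('jetNSelectedTracksEtaRel', 'jetNTracksEtaRel'),
    ('trackJetDistVal', 'trackJetDist'),
]


def apply_renames(text):
    """Apply each substitution to the whole text, like successive `sed -i` calls."""
    for old, new in RENAMES:
        text = text.replace(old, new)
    return text


def flip_normalisation(jmap):
    """Turn (mean, std) into (offset, scale) so that (x + offset) * scale == (x - mean) / std."""
    for var in jmap['inputs']:
        var['offset'] *= -1
        var['scale'] = 1.0 / var['scale']
    return jmap


def convert(text):
    jmap = json.loads(apply_renames(text))
    return json.dumps(flip_normalisation(jmap), indent=2, separators=(',', ': '))


# Hypothetical input, only to exercise the two steps.
sample = json.dumps({'inputs': [
    {'name': 'TagVarCSV_jetNTracks', 'offset': 2.0, 'scale': 4.0},
    {'name': 'TagVarCSV_jetNTracksEtaRel', 'offset': 1.0, 'scale': 0.5},
]})
print(convert(sample))
```

The sample shows why the two "bugfix" rules come as a pair: after the first rule, both names contain `jetNSelectedTracks`, and only the second rule restores `jetNTracksEtaRel`.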

These had better be AOD files, to match the purpose.

It seems more appropriate to define a new vstring for Phase 1, or otherwise to uniformly migrate PhysicsTools/PatAlgos/python/patTemplate_cfg.py to use the 2017 GT and era.

Would it make sense to have two sets of input files, one for Phase 0 and another for Phase 1, selected by an era modifier? For this to be of some use, we would need a version of PhysicsTools/PatAlgos/python/patTemplate_cfg.py that uses the 2017 GT and era. But wouldn't this imply duplicating all the PAT test cfg files? It would be nice to have some way of running all PAT tests in multiple eras without duplicating all the cfg files.

If PAT-level workflows are now era-dependent, then yes (though this probably breaks lots of analysis cfgs, no?). Presumably that means having a few input arguments (era, file) that get passed down to the configs.
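One way to pass era and input files down to the PAT test configs, as suggested above, sketched with the standard-library `argparse` rather than CMSSW's own command-line helpers. Everything here is hypothetical: the option names, the era keys, and the placeholder GT and file strings stand in for values that are not given in this thread.

```python
import argparse

# Hypothetical per-era settings; the strings are placeholders, not real GTs or datasets.
ERA_SETTINGS = {
    'phase0': {'globaltag': 'PHASE0_GT_PLACEHOLDER',
               'files': ['file:phase0_AOD_placeholder.root']},
    'phase1': {'globaltag': 'PHASE1_GT_PLACEHOLDER',
               'files': ['file:phase1_AOD_placeholder.root']},
}


def parse_options(argv=None):
    parser = argparse.ArgumentParser(description='Era-aware PAT test options (sketch)')
    parser.add_argument('--era', choices=sorted(ERA_SETTINGS), default='phase0')
    parser.add_argument('--inputFiles', nargs='*', default=None,
                        help='override the default input files for the chosen era')
    return parser.parse_args(argv)


def resolve(options):
    """Return (globaltag, files) for the chosen era, honouring any file override."""
    settings = ERA_SETTINGS[options.era]
    files = options.inputFiles or settings['files']
    return settings['globaltag'], files


gt, files = resolve(parse_options(['--era', 'phase1']))
print(gt, files)
```

Defaulting to Phase 0 keeps configs that pass no arguments behaving as before, and a single shared template could then be run once per era instead of duplicating every test cfg file.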