Phase I Deep CSV Training #18315

Merged 5 commits on Apr 23, 2017
8 changes: 4 additions & 4 deletions PhysicsTools/PatAlgos/python/patInputFiles_cff.py
@@ -11,7 +11,7 @@
#, numberOfFiles = 1
#, useDAS = True
#)
'/store/relval/CMSSW_8_0_0/RelValTTbar_13/MINIAODSIM/PU25ns_80X_mcRun2_asymptotic_v4-v1/10000/A65CD249-BFDA-E511-813A-0025905A6066.root'
'/store/relval/CMSSW_9_1_0_pre2/RelValTTbar_13/MINIAODSIM/90X_upgrade2017_realistic_v20-v2/00000/16132980-3019-E711-AD34-0025905A6110.root'
)

# /RelValProdTTbar_13/CMSSW_8_0_0-80X_mcRun2_asymptotic_v4-v1/AODSIM
@@ -24,7 +24,7 @@
#, numberOfFiles = 1
#, useDAS = True
#)
'/store/relval/CMSSW_8_0_0/RelValProdTTbar_13/AODSIM/80X_mcRun2_asymptotic_v4-v1/10000/DE81ABBF-1DDA-E511-8AF8-0026189438B5.root'
'/store/relval/CMSSW_9_1_0_pre2/RelValTTbar_13/GEN-SIM-RECO/90X_upgrade2017_realistic_v20-v2/00000/2257937F-3019-E711-BF48-0CC47A4D7678.root'
Contributor:
These had better be AOD files, to match the purpose.

Contributor:
It seems more appropriate to define a new vstring for Phase 1, or otherwise to also migrate PhysicsTools/PatAlgos/python/patTemplate_cfg.py uniformly to the 2017 GT and era.

Contributor:
Would it make sense to have two sets of input files, one set for Phase 0 and another for Phase 1, set by an era modifier? For this to be of some use, we would need a version of PhysicsTools/PatAlgos/python/patTemplate_cfg.py that uses the 2017 GT and era. But wouldn't this imply duplicating all the PAT test cfg files? It would be nice to have some way of running all PAT tests in multiple eras but without duplicating all the cfg files.

Contributor:
If PAT-level workflows are now era-dependent, then yes (though this probably breaks lots of analysis cfgs, no?).

Presumably that means having a few input arguments (era, file) that get passed down to the configs.
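The idea discussed in this thread (one cfg, with the input files chosen by an era argument instead of duplicated cfgs) can be sketched in plain Python. This is not the CMSSW `Modifier` API: `INPUT_FILES`, `pick_input_files`, and the `"phase0"`/`"phase1"` keys are hypothetical names; the file paths are the AODSIM and GEN-SIM-RECO entries from this diff.

```python
# Hypothetical per-era input-file table. The keys are illustrative labels,
# not CMSSW era names.
INPUT_FILES = {
    "phase0": [
        "/store/relval/CMSSW_8_0_0/RelValProdTTbar_13/AODSIM/"
        "80X_mcRun2_asymptotic_v4-v1/10000/DE81ABBF-1DDA-E511-8AF8-0026189438B5.root",
    ],
    "phase1": [
        "/store/relval/CMSSW_9_1_0_pre2/RelValTTbar_13/GEN-SIM-RECO/"
        "90X_upgrade2017_realistic_v20-v2/00000/2257937F-3019-E711-BF48-0CC47A4D7678.root",
    ],
}


def pick_input_files(era):
    """Return a copy of the input files for one era.

    A single test cfg could call this with an era argument (for example
    taken from the command line) instead of being duplicated per era.
    """
    try:
        return list(INPUT_FILES[era])
    except KeyError:
        raise ValueError(
            f"unknown era {era!r}; expected one of {sorted(INPUT_FILES)}"
        ) from None
```

Returning a copy keeps callers from accidentally editing the shared table.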

)

# /RelValTTbar_13/CMSSW_8_0_0-80X_mcRun2_asymptotic_v4-v1/GEN-SIM-RECO
@@ -37,7 +37,7 @@
#, numberOfFiles = 1
#, useDAS = True
#)
'/store/relval/CMSSW_8_0_0/RelValTTbar_13/GEN-SIM-RECO/80X_mcRun2_asymptotic_v4-v1/10000/1C687FB0-7BD9-E511-AFED-0CC47A78A4BA.root'
'/store/relval/CMSSW_9_1_0_pre2/RelValTTbar_13/GEN-SIM-RECO/90X_upgrade2017_realistic_v20-v2/00000/2257937F-3019-E711-BF48-0CC47A4D7678.root'
)

# /RelValTTbar_13/CMSSW_8_0_0-PU25ns_80X_mcRun2_asymptotic_v4_FastSim-v2/GEN-SIM-DIGI-RECO
@@ -76,7 +76,7 @@
#, numberOfFiles = 1
#, useDAS = True
#)
'/store/relval/CMSSW_8_0_0/SingleMu/MINIAOD/80X_dataRun2_v5_RelVal_mu2012D-v3/10000/06A44F40-ECDD-E511-89D7-0CC47A78A3D8.root'
'/store/relval/CMSSW_9_1_0_pre2/SingleMuon/MINIAOD/90X_dataRun2_relval_v6_RelVal_sigMu2016E-v1/00000/96231232-361A-E711-96B5-0CC47A7C3430.root'
)

# /SingleMu/CMSSW_8_0_0-80X_dataRun2_v5_RelVal_mu2012D-v3/RECO
22 changes: 15 additions & 7 deletions PhysicsTools/PatAlgos/python/producersLayer1/jetProducer_cfi.py
@@ -49,14 +49,12 @@
# CTagging
cms.InputTag('pfCombinedCvsLJetTags'),
cms.InputTag('pfCombinedCvsBJetTags'),
# The following code is commented-out to avoid breaking any unit test
# waiting for a set of AOD RelVals which have the jet tags in the event content
# DeepFlavour
# cms.InputTag('pfDeepCSVJetTags:probb'),
# cms.InputTag('pfDeepCSVJetTags:probc'),
# cms.InputTag('pfDeepCSVJetTags:probudsg'),
# cms.InputTag('pfDeepCSVJetTags:probbb'),
# cms.InputTag('pfDeepCSVJetTags:probcc'),
cms.InputTag('pfDeepCSVJetTags:probb'),
cms.InputTag('pfDeepCSVJetTags:probc'),
cms.InputTag('pfDeepCSVJetTags:probudsg'),
cms.InputTag('pfDeepCSVJetTags:probbb'),
cms.InputTag('pfDeepCSVJetTags:probcc'),
# DeepCMVA
# cms.InputTag('pfDeepCMVAJetTags:probb'),
# cms.InputTag('pfDeepCMVAJetTags:probc'),
@@ -101,4 +99,14 @@
resolutions = cms.PSet()
)

from Configuration.Eras.Modifier_phase1Pixel_cff import phase1Pixel
_phaseI_taggers = cms.VInputTag(
*[i for i in _patJets.discriminatorSources if i.value() != 'pfDeepCSVJetTags:probcc']
)
phase1Pixel.toModify(
_patJets,
discriminatorSources = _phaseI_taggers
)


patJets = _patJets.clone()
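The `phase1Pixel.toModify` call in this hunk replaces the discriminator list only when the Phase I pixel modifier is active. A toy model of that behavior in plain Python illustrates the effect; `ToyModifier` and `ToyPSet` are hypothetical stand-ins, not the real `Modifier` or `cms.EDProducer` classes, and plain strings replace `cms.InputTag` objects. Dropping `probcc` presumably reflects that the Phase I DeepCSV model has no such output; that is an inference from the diff, not something it states.

```python
class ToyModifier:
    """Hypothetical stand-in for an era modifier such as phase1Pixel."""

    def __init__(self, active):
        self.active = active

    def toModify(self, obj, **changes):
        # Overwrite attributes only when the era is switched on.
        if self.active:
            for name, value in changes.items():
                setattr(obj, name, value)


class ToyPSet:
    """Hypothetical holder for a module's parameters."""

    def __init__(self, **params):
        self.__dict__.update(params)


DEEPCSV = ["pfDeepCSVJetTags:" + p
           for p in ("probb", "probc", "probudsg", "probbb", "probcc")]


def configure(phase1_active):
    """Build the toy patJets parameters and apply the Phase I customization."""
    patJets = ToyPSet(discriminatorSources=list(DEEPCSV))
    # Same filter as in the diff: keep every source except probcc.
    phaseI_taggers = [t for t in patJets.discriminatorSources
                      if t != "pfDeepCSVJetTags:probcc"]
    ToyModifier(phase1_active).toModify(patJets,
                                        discriminatorSources=phaseI_taggers)
    return patJets.discriminatorSources
```

With the modifier inactive, all five DeepCSV outputs remain; with it active, only the first four do.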
@@ -62,7 +62,7 @@ def applySubstructure( process ) :
jetSource = cms.InputTag('ak8PFJetsPuppi'),
algo= 'AK', rParam = 0.8,
jetCorrections = ('AK8PFPuppi', cms.vstring(['L2Relative', 'L3Absolute']), 'None'),
btagDiscriminators = ([x.getModuleLabel() for x in patJetsDefault.discriminatorSources] + ['pfBoostedDoubleSecondaryVertexAK8BJetTags']),
btagDiscriminators = ([x.value() for x in patJetsDefault.discriminatorSources] + ['pfBoostedDoubleSecondaryVertexAK8BJetTags']),
genJetCollection = cms.InputTag('slimmedGenJetsAK8')
)
process.patJetsAK8Puppi.userData.userFloats.src = [] # start with empty list of user floats
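The switch from `getModuleLabel()` to `value()` in this hunk matters for multi-output producers such as `pfDeepCSVJetTags`. The module label drops the `:probb`-style product instance, so several DeepCSV outputs would collapse into the same name. The `ToyInputTag` class below is a hypothetical stand-in that mimics only the `module:instance:process` string convention; it is not the real `cms.InputTag`.

```python
class ToyInputTag:
    """Hypothetical stand-in that splits 'module:instance:process'."""

    def __init__(self, value):
        parts = value.split(":")
        self.module = parts[0]
        self.instance = parts[1] if len(parts) > 1 else ""
        self.process = parts[2] if len(parts) > 2 else ""

    def getModuleLabel(self):
        return self.module

    def value(self):
        # Rebuild the full string, dropping empty trailing fields.
        fields = [self.module, self.instance, self.process]
        while len(fields) > 1 and not fields[-1]:
            fields.pop()
        return ":".join(fields)


sources = [ToyInputTag(s) for s in ("pfCombinedCvsBJetTags",
                                    "pfDeepCSVJetTags:probb",
                                    "pfDeepCSVJetTags:probc")]
old_names = [x.getModuleLabel() for x in sources]  # instance label lost
new_names = [x.value() for x in sources]           # 'module:instance' kept
```

Here `old_names` contains `pfDeepCSVJetTags` twice, while `new_names` keeps the two DeepCSV outputs distinct.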
6 changes: 3 additions & 3 deletions PhysicsTools/PatAlgos/python/tools/jetTools.py
@@ -381,7 +381,7 @@ def setupBTagging(process, jetSource, pfCandidates, explicitJTA, pvSource, svSou
if btagInfo == 'pfDeepCMVATagInfos':
addToProcessAndTask(btagPrefix+btagInfo+labelName+postfix,
btag.pfDeepCMVATagInfos.clone(
pfDeepCSVTagInfos = cms.InputTag(btagPrefix+'pfDeepCSVTagInfos'+labelName+postfix),
deepNNTagInfos = cms.InputTag(btagPrefix+'pfDeepCSVTagInfos'+labelName+postfix),
ipInfoSrc = cms.InputTag(btagPrefix+"pfImpactParameterTagInfos"+labelName+postfix),
muInfoSrc = cms.InputTag(btagPrefix+"softPFMuonsTagInfos"+labelName+postfix),
elInfoSrc = cms.InputTag(btagPrefix+"softPFElectronsTagInfos"+labelName+postfix)),
@@ -391,7 +391,7 @@ def setupBTagging(process, jetSource, pfCandidates, explicitJTA, pvSource, svSou
if btagInfo == 'pfDeepCMVANegativeTagInfos':
addToProcessAndTask(btagPrefix+btagInfo+labelName+postfix,
btag.pfDeepCMVATagInfos.clone(
pfDeepCSVTagInfos = cms.InputTag(btagPrefix+'pfDeepCSVTagInfos'+labelName+postfix),
deepNNTagInfos = cms.InputTag(btagPrefix+'pfDeepCSVTagInfos'+labelName+postfix),
ipInfoSrc = cms.InputTag(btagPrefix+"pfImpactParameterTagInfos"+labelName+postfix),
muInfoSrc = cms.InputTag(btagPrefix+"softPFMuonsTagInfos"+labelName+postfix),
elInfoSrc = cms.InputTag(btagPrefix+"softPFElectronsTagInfos"+labelName+postfix)),
@@ -401,7 +401,7 @@ def setupBTagging(process, jetSource, pfCandidates, explicitJTA, pvSource, svSou
if btagInfo == 'pfDeepCMVAPositiveTagInfos':
addToProcessAndTask(btagPrefix+btagInfo+labelName+postfix,
btag.pfDeepCMVATagInfos.clone(
pfDeepCSVTagInfos = cms.InputTag(btagPrefix+'pfDeepCSVTagInfos'+labelName+postfix),
deepNNTagInfos = cms.InputTag(btagPrefix+'pfDeepCSVTagInfos'+labelName+postfix),
ipInfoSrc = cms.InputTag(btagPrefix+"pfImpactParameterTagInfos"+labelName+postfix),
muInfoSrc = cms.InputTag(btagPrefix+"softPFMuonsTagInfos"+labelName+postfix),
elInfoSrc = cms.InputTag(btagPrefix+"softPFElectronsTagInfos"+labelName+postfix)),
16 changes: 14 additions & 2 deletions PhysicsTools/PatAlgos/test/patTuple_addBTagging_cfg.py
@@ -101,6 +101,17 @@
,'pfCombinedCvsBJetTags'
# ChargeTagging
,'pfChargeBJetTags'
#Deep Flavour
,'pfDeepCSVJetTags:probb'
,'pfDeepCSVJetTags:probc'
Contributor:
I guess you are aiming at the Phase I case only, given that there is no probcc here. This looks fine to me.

Contributor (author):
Well, it is more of a "one size fits all" solution: this will work in both, and in the future the Phase 0 checks should become less and less relevant.
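The "one size fits all" argument can be phrased as a set check: a discriminator list works in both eras if every requested entry is produced in both. The per-era output sets below are assumptions inferred from the diff (Phase 0 provides `probcc`, Phase I does not); `missing_in` is a hypothetical helper.

```python
PREFIX = "pfDeepCSVJetTags:"

# Assumed DeepCSV outputs per era, inferred from the diff
# (probcc is dropped from the Phase I discriminator list).
PHASE0_OUTPUTS = {PREFIX + p
                  for p in ("probb", "probc", "probudsg", "probbb", "probcc")}
PHASE1_OUTPUTS = PHASE0_OUTPUTS - {PREFIX + "probcc"}

# The DeepCSV entries requested in patTuple_addBTagging_cfg.py.
REQUESTED = [PREFIX + p for p in ("probb", "probc", "probudsg", "probbb")]


def missing_in(requested, available):
    """Return the requested discriminators that an era would not produce."""
    return [name for name in requested if name not in available]
```

Both eras report nothing missing for `REQUESTED`; appending `probcc` would make the Phase I check fail, which is the reason it is left out of the shared list.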

,'pfDeepCSVJetTags:probudsg'
,'pfDeepCSVJetTags:probbb'
# DeepCMVA
,'pfDeepCMVAJetTags:probb'
,'pfDeepCMVAJetTags:probc'
,'pfDeepCMVAJetTags:probudsg'
,'pfDeepCMVAJetTags:probbb'
,'pfDeepCMVAJetTags:probcc'
]

# uncomment the following lines to add ak4PFJets with new b-tags to your PAT output
@@ -126,18 +137,19 @@
process.patJetsAK8PFCHS.addTagInfos = True

# uncomment the following lines to add subjets of ak8PFJetsCHSSoftDrop with new b-tags to your PAT output
from pdb import set_trace
addJetCollection(
process,
labelName = 'AK8PFCHSSoftDropSubjets',
jetSource = cms.InputTag('ak8PFJetsCHSSoftDrop','SubJets'),
jetCorrections = ('AK4PFchs', cms.vstring(['L1FastJet', 'L2Relative', 'L3Absolute']), 'Type-2'), # Using AK4 JECs for subjets which might not be completely appropriate
algo = 'AK', # needed for subjet flavor clustering
rParam = 0.8, # needed for subjet flavor clustering
btagDiscriminators = btagDiscriminators,
explicitJTA = True, # needed for subjet b tagging
svClustering = True, # needed for subjet b tagging
fatJets = cms.InputTag("ak8PFJetsCHS"), # needed for subjet flavor clustering
groomedFatJets = cms.InputTag("ak8PFJetsCHSSoftDrop") # needed for subjet flavor clustering
groomedFatJets = cms.InputTag("ak8PFJetsCHSSoftDrop"), # needed for subjet flavor clustering
rParam = 0.8, # needed for subjet flavor clustering
Contributor:
Was this reshuffling driven by aesthetic reasons, or was there a problem?

Contributor (author):
This was actually pretty useful for debugging an error. If you issue "step" in pdb on the previous configuration, it enters the InputTag creation rather than the PAT sequence modifier; hence I left it there in case we need to debug the sequence in the future.
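The pdb behavior described in this reply follows from Python's evaluation order: every argument of a call, including expressions such as `cms.InputTag(...)`, is evaluated left to right before the called function's body runs, so `step` first enters those constructors. A minimal sketch, with hypothetical stand-ins `make_tag` (for `cms.InputTag`) and `add_jet_collection` (for `addJetCollection`) that record when they run:

```python
trace = []


def make_tag(*labels):
    """Hypothetical stand-in for cms.InputTag: records when it is evaluated."""
    trace.append("InputTag" + repr(labels))
    return labels


def add_jet_collection(process, **kwargs):
    """Hypothetical stand-in for addJetCollection."""
    trace.append("addJetCollection")
    return kwargs


add_jet_collection(
    None,
    jetSource=make_tag("ak8PFJetsCHSSoftDrop", "SubJets"),
    rParam=0.8,
    groomedFatJets=make_tag("ak8PFJetsCHSSoftDrop"),
)
# Both make_tag calls are recorded before add_jet_collection's body runs.
```

Setting a breakpoint on the function itself (pdb's `break` command accepts a function name) is one way to stop directly inside it without stepping through the argument constructors.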

)
process.patJetsAK8PFCHSSoftDropSubjets.addTagInfos = True

13 changes: 10 additions & 3 deletions RecoBTag/Combined/plugins/DeepFlavourJetTagsProducer.cc
@@ -76,6 +76,8 @@ class DeepFlavourJetTagsProducer : public edm::stream::EDProducer<> {
// ----------member data ---------------------------
const edm::EDGetTokenT< INFOS > src_;
edm::FileInPath nnconfig_;
bool check_sv_for_defaults_;
bool mean_padding_;
lwt::LightweightNeuralNetwork *neural_network_;
lwt::ValueMap inputs_; //typedef of unordered_map<string, float>
vector<string> outputs_;
@@ -97,6 +99,8 @@ class DeepFlavourJetTagsProducer : public edm::stream::EDProducer<> {
DeepFlavourJetTagsProducer::DeepFlavourJetTagsProducer(const edm::ParameterSet& iConfig) :
src_( consumes< INFOS >(iConfig.getParameter<edm::InputTag>("src")) ),
nnconfig_(iConfig.getParameter<edm::FileInPath>("NNConfig")),
check_sv_for_defaults_(iConfig.getParameter<bool>("checkSVForDefaults")),
mean_padding_(iConfig.getParameter<bool>("meanPadding")),
neural_network_(NULL),
inputs_(),
outputs_(),
@@ -134,7 +138,8 @@ DeepFlavourJetTagsProducer::DeepFlavourJetTagsProducer(const edm::ParameterSet&
<< ". Please check the spelling" << std::endl;
}
var.index = (tokens.size() == 2) ? stoi(tokens.at(1)) : -1;
var.default_value = -1*input.offset; //set default to -offset so that when scaling (val+offset)*scale the outcome is 0
var.default_value = (mean_padding_) ? 0. : -1*input.offset; //set default to -offset so that when scaling (val+offset)*scale the outcome is 0
//for mean padding it is set to zero so that undefined values are assigned -mean/scale

variables_.push_back(var);
}
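The two default choices in this hunk can be checked numerically. The sketch applies the input transform `(val+offset)*scale` quoted in the code comment and assumes, for illustration only, that `offset = -mean` and `scale = 1/std` for hypothetical training statistics `mean` and `std`; these names are not fields of the JSON model.

```python
def normalize(val, offset, scale):
    """Input transform quoted in the producer's comment: (val + offset) * scale."""
    return (val + offset) * scale


def default_value(offset, mean_padding):
    """Mirror of the C++ line: 0 with mean padding, otherwise -offset."""
    return 0.0 if mean_padding else -1 * offset


# Illustrative statistics (assumed): offset = -mean, scale = 1/std.
mean, std = 2.0, 4.0
offset, scale = -mean, 1.0 / std

standard = normalize(default_value(offset, mean_padding=False), offset, scale)
padded = normalize(default_value(offset, mean_padding=True), offset, scale)
```

Without mean padding a missing input becomes exactly 0 after normalization; with mean padding the raw default is 0, so the network sees `-mean/std`. That matches the comment's "-mean/scale" if "scale" there is read as the standard deviation, under the assumption above.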
@@ -184,9 +189,11 @@ DeepFlavourJetTagsProducer::produce(edm::Event& iEvent, const edm::EventSetup& i
TaggingVariableList vars = info.taggingVariables();
//if there are no tracks there's no point in doing it
bool notracks = (vars.get(reco::btau::jetNSelectedTracks) == 0);
bool novtx = (vars.get(reco::btau::jetNSecondaryVertices) == 0);
bool defaulted = (check_sv_for_defaults_) ? (notracks && novtx) : notracks;
lwt::ValueMap nnout; //returned value

if(!notracks) {
if(!defaulted) {
for(auto& var : variables_) {
if(var.index >= 0){
std::vector<float> vals = vars.getList(var.id, false);
@@ -207,7 +214,7 @@ DeepFlavourJetTagsProducer::produce(edm::Event& iEvent, const edm::EventSetup& i

//dump the NN output(s)
for(size_t i=0; i<outputs_.size(); ++i) {
(*output_tags[i])[key] = (notracks) ? -1 : nnout[outputs_[i]];
(*output_tags[i])[key] = (defaulted) ? -1 : nnout[outputs_[i]];
//std::cout << i << ": " << nnout[outputs_[i]] << std::endl;
}
}
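The new `defaulted` flag changes when the tagger output is forced to -1. With `checkSVForDefaults` off, a jet is defaulted whenever it has no selected tracks (the previous behavior); with it on, the jet must also have no secondary vertex. A plain-Python rendering of the same boolean expression, printed as a small truth table (`is_defaulted` is a hypothetical name):

```python
from itertools import product


def is_defaulted(n_tracks, n_sv, check_sv_for_defaults):
    """Same logic as the C++: True means the outputs are set to -1."""
    notracks = n_tracks == 0
    novtx = n_sv == 0
    return (notracks and novtx) if check_sv_for_defaults else notracks


for check, n_tracks, n_sv in product((False, True), (0, 3), (0, 1)):
    print(f"checkSV={check!s:5} tracks={n_tracks} sv={n_sv} "
          f"-> defaulted={is_defaulted(n_tracks, n_sv, check)}")
```

The only case that changes is a jet with no selected tracks but at least one secondary vertex: with the flag on, it is passed through the network instead of receiving -1.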
6 changes: 3 additions & 3 deletions RecoBTag/Combined/python/deepFlavour_cff.py
@@ -1,6 +1,6 @@
import FWCore.ParameterSet.Config as cms
from RecoBTag.Combined.pfDeepCSVTagInfos_cfi import pfDeepCSVTagInfos
from RecoBTag.Combined.DeepCMVATagInfoProducer_cfi import pfDeepCMVATagInfos
from RecoBTag.Combined.pfDeepCMVATagInfos_cfi import pfDeepCMVATagInfos
from RecoBTag.Combined.pfDeepCSVJetTags_cfi import pfDeepCSVJetTags
from RecoBTag.Combined.pfDeepCMVAJetTags_cfi import pfDeepCMVAJetTags

@@ -52,8 +52,8 @@
##
pfDeepFlavourTask = cms.Task(
pfDeepCSVTagInfos,
## pfDeepCMVATagInfos, #SKIP for the moment
pfDeepCMVATagInfos, #SKIP for the moment
pfDeepCSVJetTags
## , pfDeepCMVAJetTags
, pfDeepCMVAJetTags
)
pfDeepFlavour = cms.Sequence(pfDeepFlavourTask)
2 changes: 2 additions & 0 deletions RecoBTag/Combined/python/pfDeepCMVAJetTags_cfi.py
@@ -3,5 +3,7 @@
pfDeepCMVAJetTags = cms.EDProducer(
'DeepFlavourJetTagsProducer',
src = cms.InputTag('pfDeepCMVATagInfos'),
checkSVForDefaults = cms.bool(False),
meanPadding = cms.bool(False),
NNConfig = cms.FileInPath('RecoBTag/Combined/data/Model_DeepCMVA.json')
)
7 changes: 7 additions & 0 deletions RecoBTag/Combined/python/pfDeepCSVJetTags_cfi.py
@@ -3,5 +3,12 @@
pfDeepCSVJetTags = cms.EDProducer(
'DeepFlavourJetTagsProducer',
src = cms.InputTag('pfDeepCSVTagInfos'),
checkSVForDefaults = cms.bool(False),
meanPadding = cms.bool(False),
NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json')
)

from Configuration.Eras.Modifier_phase1Pixel_cff import phase1Pixel
phase1Pixel.toModify(pfDeepCSVJetTags, NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepCSV_PhaseI.json'))
Contributor:
What is the plan for putting these conditions in the GT?

Contributor:
Can JSON files such as the one used in the DeepCSV tagger be put in the CondDB? We can put GBRForest payloads in the CondDB for BDT-based taggers, but for a general MVA technique I'm not sure what our options are. (Of course, there are conditions used for the CSV algorithm, based on the dedicated MVA computer code in PhysicsTools/MVAComputer, but the general trend is to move away from that code since it is not very user-friendly and there are no longer developers around who really understand its inner workings.)

phase1Pixel.toModify(pfDeepCSVJetTags, checkSVForDefaults = cms.bool(True))
phase1Pixel.toModify(pfDeepCSVJetTags, meanPadding = cms.bool(True))
Contributor:
Why are these parameters changing for 2017?
Other than the logic related to the 4 pixel layers, the same (best) kind of training should be provided for 2017 and for the past.

Contributor (author):
The model (lines 12 and 14) has been trained explicitly for Phase I and therefore should be used for Phase I only; there is no guarantee that the same training would also perform better on Phase 0.
Line 13 is a change in how the defaults are treated. We did not check its effect on the Phase 0 training, and we would like to keep it that way for compatibility reasons.
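To make the reply concrete: the Phase I customization amounts to three overrides of the base `pfDeepCSVJetTags` parameters, while Phase 0 keeps the defaults. A dictionary-merge sketch in plain Python (not the CMSSW API; `deepcsv_params` is a hypothetical helper) with the values copied from the cfi in this diff:

```python
# Base parameters of pfDeepCSVJetTags, as in pfDeepCSVJetTags_cfi.py.
BASE = {
    "src": "pfDeepCSVTagInfos",
    "checkSVForDefaults": False,
    "meanPadding": False,
    "NNConfig": "RecoBTag/Combined/data/DeepFlavourNoSL.json",
}

# The three phase1Pixel.toModify overrides.
PHASE1_OVERRIDES = {
    "NNConfig": "RecoBTag/Combined/data/DeepCSV_PhaseI.json",
    "checkSVForDefaults": True,
    "meanPadding": True,
}


def deepcsv_params(phase1):
    """Return the effective parameters; the overrides apply only for Phase I."""
    return {**BASE, **PHASE1_OVERRIDES} if phase1 else dict(BASE)
```

Keeping the overrides in one place makes it easy to see that the Phase 0 configuration is untouched, which is the compatibility point made in the reply.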

24 changes: 5 additions & 19 deletions RecoBTag/Combined/scripts/format_deepflavour_json.sh
@@ -1,23 +1,9 @@
#! /bin/env bash

sed -i 's|Jet_eta|jetEta|g' $1
sed -i 's|Jet_pt|jetPt|g' $1
#sed -i 's|jet_eta|jetEta|g' $1
sed -i 's|jet_eta|jetAbsEta|g' $1
sed -i 's|jet_pt|jetPt|g' $1
sed -i 's|TagVarCSV_||g' $1
sed -i 's|TagVarCSVTrk_||g' $1
sed -i 's|prob_|prob|g' $1

#bugfixes
sed -i 's|jetNTracks|jetNSelectedTracks|g' $1
sed -i 's|jetNSelectedTracksEtaRel|jetNTracksEtaRel|g' $1

python <<EOF
import json
with open('$1') as infile:
    jmap = json.loads(infile.read())

for var in jmap['inputs']:
    var['offset'] *= -1
    var['scale'] = 1./var['scale']

with open('$1', 'w') as out:
    out.write(json.dumps(jmap, indent=2, separators = (',', ': ')))
EOF
sed -i 's|trackJetDistVal|trackJetDist|g' $1
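A Python sketch of what the script does to a model file, useful for checking the order of the renames. Because the substitutions run sequentially, the second "bugfix" line undoes the over-eager first one for `jetNTracksEtaRel`. The rename list is a subset copied from the script, and `flip_inputs` mirrors the embedded Python heredoc (negate each offset, invert each scale); `apply_renames` and `flip_inputs` are hypothetical names.

```python
import json

# (pattern, replacement) pairs applied in order, like consecutive `sed -i` calls.
RENAMES = [
    ("TagVarCSV_", ""),
    ("TagVarCSVTrk_", ""),
    ("prob_", "prob"),
    ("jetNTracks", "jetNSelectedTracks"),
    ("jetNSelectedTracksEtaRel", "jetNTracksEtaRel"),
    ("trackJetDistVal", "trackJetDist"),
]


def apply_renames(text, renames=RENAMES):
    """Apply literal global substitutions sequentially, like sed 's|a|b|g'."""
    for old, new in renames:
        text = text.replace(old, new)
    return text


def flip_inputs(model):
    """Mirror of the heredoc: negate each offset and invert each scale."""
    for var in model["inputs"]:
        var["offset"] *= -1
        var["scale"] = 1.0 / var["scale"]
    return model


raw = ('{"inputs": [{"name": "TagVarCSV_jetNTracks", "offset": 2.0, "scale": 4.0},'
       ' {"name": "TagVarCSV_jetNTracksEtaRel", "offset": 1.0, "scale": 0.5}]}')
model = flip_inputs(json.loads(apply_renames(raw)))
```

After the renames, `jetNTracks` becomes `jetNSelectedTracks`, while `jetNTracksEtaRel` survives unchanged thanks to the corrective second substitution.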