-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add DeepJet inputs #39
Conversation
Note: size and descriptions for 5000 events when DeepJet inputs (and output probabilities) are added. (Cf. old size and descriptions from recalculating variables. See also the same number of events, without DeepJet here: size and description.) |
python/addBTV.py
Outdated
@@ -6,7 +6,7 @@ | |||
from PhysicsTools.PatAlgos.tools.helpers import addToProcessAndTask, getPatAlgosToolsTask | |||
|
|||
|
|||
def update_jets_AK4(process): | |||
def update_jets_AK4(process, add_DeepJet): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would shuffle to logic around a bit here. Have the DeepCSV/DeepJet output nodes in as the default (~10 branches anyway) and keep the DeepCSV/DeepJet inputs as toggles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. And would you prefer one individual toggle per type of tagger or use only the keepInputs
toggle for enabling all tagger inputs at the same time? Don't know if there are use cases where one only needs e.g. DeepCSV but not DeepJet, or DDX...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking as we move on we might want to be dropping DeepCSV, also not all studies will need all taggers. I think my preference would be passing a list with up to ['DeepCSV', 'DeepJet', 'DDX']
and pattern match it after, but having it as 3 bools is not that bad either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so that would make 2^3=8 possible combinations to be added in pfnano_cff.py
for each MC or Data configuration, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you go with bools, I'd say it's not necessary that all combinations are part of the cff, can be added as/if needed
python/addBTV.py
Outdated
float, | ||
doc="DeepJet gluon tag probability", | ||
precision=10), | ||
# anstein: could add discriminators here (or just leave it as it is, only output nodes with individual probabilities) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add the discriminators, it's just 3 branches anyway, B, Cvl, CvB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw that these discriminators are already included via nano (declared inside jets_cff.py
):
'Jet_btagDeepFlavB',
'Jet_btagDeepFlavC', # this one only until 106x
'Jet_btagDeepFlavCvB',
'Jet_btagDeepFlavCvL',
'Jet_btagDeepFlavQG',
So I can probably keep it as it is? (btagDeepFlavC
will be duplicated, depending on the era (<= 106x), discriminators are always in, not dependent on the era, individual probabilities I add manually, not dependent on the era)
See
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jets_cff.py#L233-L239
for the discriminators
and
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jets_cff.py#L268
for btagDeepFlavC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 indeed, in fact if you could check there aren't any duplicate branches in the current output fill overall that would be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def duplicates(mykeys):
seen = set()
for x in mykeys:
if x in seen:
print(x)
seen.add(x)
return
duplicates(keys)
used for the configuration PFnano_customizeMC_add_DeepJet
gives me this output
FatJet_btagDDBvLV2
FatJet_btagDDCvBV2
FatJet_btagDDCvLV2
Jet_btagDeepFlavC
FatJet_nBHadrons
FatJet_nCHadrons
SubJet_nBHadrons
SubJet_nCHadrons
Jet_btagDeepFlavC
was expected, as explained above, the others are not related to what I added or modified in this PR I guess.
plugins/DeepJetTableProducer.cc
Outdated
const edm::EDGetTokenT<TagInfoCollection> tag_info_src_; | ||
// anstein: could be used later to manually change number of features and constituents per feature | ||
/* | ||
constexpr static unsigned n_features_global_ = 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks better to use these, than to hardcode the number later on in each entry
plugins/DeepJetTableProducer.cc
Outdated
|
||
|
||
// ============================================================== Cpfs =================================================================== | ||
input_name = "DeepJet_Cpfcan_puppiw_" + s; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these can be formated better? Defining and overwriting description
and inpu_name
for each entry is redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, looks good to me. I left a couple of comments, mostly to clean up commented out code, but then we can merge
plugins/DeepJetTableProducer.cc
Outdated
|
||
edm::EDGetTokenT<edm::View<T>> jet_token_; | ||
|
||
// anstein: from ONNX producer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// anstein: from ONNX producer |
plugins/DeepJetTableProducer.cc
Outdated
typedef std::vector<reco::DeepFlavourTagInfo> TagInfoCollection; | ||
const edm::EDGetTokenT<TagInfoCollection> tag_info_src_; | ||
|
||
// anstein: number of constituents will be used, number of features per constituent not necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// anstein: number of constituents will be used, number of features per constituent not necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can remove commented out stuff here
plugins/DeepJetTableProducer.cc
Outdated
for (unsigned int p = 0; p < n_cpf_; p++) { | ||
auto s = std::to_string(p); | ||
|
||
djTable->addColumn<float>("DeepJet_Cpfcan_puppiw_" + s, Cpfcan_puppiw_nCpf[p], "charged candidate PUPPI weight of the " + s + ". cpf", nanoaod::FlatTable::FloatColumn, 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
djTable->addColumn<float>("DeepJet_Cpfcan_puppiw_" + s, Cpfcan_puppiw_nCpf[p], "charged candidate PUPPI weight of the " + s + ". cpf", nanoaod::FlatTable::FloatColumn, 10); | |
djTable->addColumn<float>("DeepJet_Cpfcan_puppiw_" + s, | |
Cpfcan_puppiw_nCpf[p], | |
"charged candidate PUPPI weight of the " + s + ". cpf", | |
nanoaod::FlatTable::FloatColumn, 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we format like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we can! :) will change all affected lines and split them to improve readability.
plugins/DeepJetTableProducer.cc
Outdated
djTable->addColumn<float>("DeepJet_sv_d3dsig_" + s, sv_d3dsig_nSV[p], "3D impact parameter (flight distance) significance of the " + s + ". SV", nanoaod::FlatTable::FloatColumn, 10); | ||
djTable->addColumn<float>("DeepJet_sv_costhetasvpv_" + s, sv_costhetasvpv_nSV[p], "cosine of the angle between the " + s + ". SV flight direction and the direction of the " + s + ". SV momentum", nanoaod::FlatTable::FloatColumn, 10); | ||
/* | ||
// anstein: only relevant if also included in the tag info, not yet, maybe in future versions of the tagger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep these, as they might be relevant later, but as a rule most comments could/should be removed
All requested changes should be in now, but as I'm only Author, not Collaborator, I think someone else has to merge this PR. |
add_DeepJet
andkeep_inputs
add_DeepJet
) - I guess Keep full output nodes for DeepCSV and DeepJet #35 is relatedadd_DeepJet_noclip
, which adds more variables on top ofadd_DeepJet
: uses raw quantities without any preprocessing bycatch_infs_and_bound
(variables are not clipped / scaled)@andrzejnovak