-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
switch to tau id MVA2017v2 in AOD and MiniAOD #26541
switch to tau id MVA2017v2 in AOD and MiniAOD #26541
Conversation
The code-checks are being triggered in jenkins. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26541/9428
|
A new Pull Request was created by @swozniewski for master. It involves the following packages: RecoTauTag/Configuration @perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@cmsbuild please test |
The tests are being triggered in jenkins. |
@swozniewski |
Comparison job queued. |
Comparison is ready Comparison Summary:
|
@slava77 The main purpose of this PR is to have this discriminator also available by default in all MiniAOD to have the most useful baseline for all ultra-legacy analyses and possible new trainings that will be performed with the ultra-legacy samples. @rmanzoni @roger-wolf |
thank you @steggema ! I've added the link to the PR description. |
@@ -526,7 +526,7 @@ | |||
PFTauProducer = cms.InputTag('hpsPFTauProducer'), | |||
Prediscriminants = requireDecayMode.clone(), | |||
loadMVAfromDB = cms.bool(True), | |||
mvaName = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2017v1"), | |||
mvaName = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2017v2"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the rationale to keep the same name of the producer with "v1"?
Should it be changed as well to v2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbluj may know better, but we have two things to consider,
a) the list of input variables, which I think can be considered to be represented by the tag "MVArun2v1DBoldDMwLT" and then resolved to "mvaOpt = cms.string("DBoldDMwLTwGJ")"
b) the version of the training, which is 2017v2
So I think it may be fine to keep the version as Sebastian has it. OTOH, maybe a case can be made for the training tag (2017v2) representing everything, but that may need to lead to even larger overall changes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm what @steggema wrote. Name of module in the python configuration represents version of MVA, i.e. list of inputs, while name of payloads stands for a given training of the MVA.
From practical perspective, it is also useful to keep names of modules and change only names of payloads as it reduces number of modifications (in terms of no. of modified lines and files).
BTW, when set of produces has been switched from original training with 2015 MC (w/o specified version in names of payload) to training with early 2017 MC (2017v1) names of producers were kept unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back to CMSSW_7_6_2, when this naming pattern was introduced, the previous payloads had the version of the MVA in the name (or at least it seemed like it).
Please share a pointer to a past example of a different case of ID where the payload does not contain the "vX" of the MVA in its name, just to be clear of a somewhat established practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR #21022 the 2017v1
yet, it was still a "v1".
I just wanted to understand if this "vX" is becoming more formalized just now.
Perhaps for consistency of future changes you can make a note in some tau POG area to have that as a reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean I agree that some version bookkeeping will be useful for final users who usually know name of a producer or/and access name within PAT (and a release version), but not a name of a related payload. It is up to conveners to decide if it makes sense to change name of producer or it will be enough to provide a table in Tau POG recommendation TWiki.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I can wait until Thursday to signoff on this PR (considering the holiday tomorrow).
I hope that by then the decision to keep the current naming or to change it can be made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steggema and I agreed to keep the current naming.
The reason is not to force analysts to adjust their code now while we expect to move to recommend a newer DNN-based Tau ID sometime soon, which will demand code changes.
With that, we’ll have a chance to start right away with a more coherent version naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will sign off, but some better formalized naming would be quite helpful.
Without being aware of this discussion and digging into the git history, it would appear more and more that the configuration may be buggy.
@swozniewski I has been not able to check the PR earlier, but it looks that DBnewDMwLT and DBdR03oldDMwLT are not migrated to the 2017v2 training (whose are not present for 2017v1). Could you confirm that this migration is considered, please? And sorry for the late hint. |
@mbluj I was not aware that there again is a training in 2017v2 in contrast to 2017v1 for these two IDs. I will add this now and create a PR to cms-tau-pog:CMSSW_10_6_X_tau-pog_MVA2017v2, where you can check it. I assume that the naming scheme is the same. |
The tests are being triggered in jenkins. |
The second commit addresses Michals comment that 2017v2 is also available for DBnewDMwLT, DBdR03oldDMwLT and updates them. The raw output has been compared to the existing values in NanoAOD on some test events. |
Comparison job queued. |
Comparison is ready Comparison Summary:
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
Switch from MVA2017v1 to MVA2017v2 tau ID in AOD and MiniAOD following the recommendation in https://indico.cern.ch/event/810741/contributions/3384093/attachments/1827349/2991114/TauID_CMSweek_10042019.pdf#page=3
PR validation:
Matrix tests passed.
Output of raw value has been compared to corresponding NanoAOD sample where MVA2017v2 is already available.
According differences in raw and WP values of IsolationMVArun2v1DBoldDMwLT are expected.