Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch to tau id MVA2017v2 in AOD and MiniAOD #26541

Merged
merged 2 commits into from
May 3, 2019

Conversation

swozniewski
Copy link
Contributor

@swozniewski swozniewski commented Apr 25, 2019

PR description:

Switch from MVA2017v1 to MVA2017v2 tau ID in AOD and MiniAOD following the recommendation in https://indico.cern.ch/event/810741/contributions/3384093/attachments/1827349/2991114/TauID_CMSweek_10042019.pdf#page=3

PR validation:

Matrix tests passed.
Output of raw value has been compared to corresponding NanoAOD sample where MVA2017v2 is already available.

According differences in raw and WP values of IsolationMVArun2v1DBoldDMwLT are expected.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26541/9428

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @swozniewski for master.

It involves the following packages:

RecoTauTag/Configuration

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Apr 25, 2019

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 25, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/34351/console Started: 2019/04/25 22:13

@slava77
Copy link
Contributor

slava77 commented Apr 25, 2019

@swozniewski
please add a link to some slides shown in a TAU POG (sub)group meeting to the PR description to document the changes.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-26541/34351/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 187 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3211964
  • DQMHistoTests: Total failures: 1607
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3210153
  • DQMHistoTests: Total skipped: 204
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@steggema
Copy link
Contributor

@swozniewski
please add a link to some slides shown in a TAU POG (sub)group meeting to the PR description to document the changes.

@slava77
We have recommended the 2017v2 discriminator for a while now. The most recent set of slides on the recommendations is here: https://indico.cern.ch/event/810741/contributions/3384093/attachments/1827349/2991114/TauID_CMSweek_10042019.pdf#page=3

The main purpose of this PR is to have this discriminator also available by default in all MiniAOD to have the most useful baseline for all ultra-legacy analyses and possible new trainings that will be performed with the ultra-legacy samples. @rmanzoni @roger-wolf

@swozniewski
Copy link
Contributor Author

thank you @steggema ! I've added the link to the PR description.

@@ -526,7 +526,7 @@
PFTauProducer = cms.InputTag('hpsPFTauProducer'),
Prediscriminants = requireDecayMode.clone(),
loadMVAfromDB = cms.bool(True),
mvaName = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2017v1"),
mvaName = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2017v2"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the rationale to keep the same name of the producer with "v1"?
Should it be changed as well to v2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the impression that this version number is not really used because the year does not even show up, so v1 would not stand for 2017v1. But the naming history goes beyond my scope. @steggema @rmanzoni @mbluj could you comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbluj may know better, but we have two things to consider,

a) the list of input variables, which I think can be considered to be represented by the tag "MVArun2v1DBoldDMwLT" and then resolved to "mvaOpt = cms.string("DBoldDMwLTwGJ")"
b) the version of the training, which is 2017v2

So I think it may be fine to keep the version as Sebastian has it. OTOH, maybe a case can be made for the training tag (2017v2) representing everything, but that may need to lead to even larger overall changes...

Copy link
Contributor

@mbluj mbluj Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm what @steggema wrote. Name of module in the python configuration represents version of MVA, i.e. list of inputs, while name of payloads stands for a given training of the MVA.
From practical perspective, it is also useful to keep names of modules and change only names of payloads as it reduces number of modifications (in terms of no. of modified lines and files).
BTW, when set of produces has been switched from original training with 2015 MC (w/o specified version in names of payload) to training with early 2017 MC (2017v1) names of producers were kept unchanged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back to CMSSW_7_6_2, when this naming pattern was introduced, the previous payloads had the version of the MVA in the name (or at least it seemed like it).
Please share a pointer to a past example of a different case of ID where the payload does not contain the "vX" of the MVA in its name, just to be clear of a somewhat established practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR #21022 the 2017v1

yet, it was still a "v1".
I just wanted to understand if this "vX" is becoming more formalized just now.
Perhaps for consistency of future changes you can make a note in some tau POG area to have that as a reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean I agree that some version bookkeeping will be useful for final users who usually know name of a producer or/and access name within PAT (and a release version), but not a name of a related payload. It is up to conveners to decide if it makes sense to change name of producer or it will be enough to provide a table in Tau POG recommendation TWiki.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can wait until Thursday to signoff on this PR (considering the holiday tomorrow).
I hope that by then the decision to keep the current naming or to change it can be made.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steggema and I agreed to keep the current naming.

The reason is not to force analysts to adjust their code now while we expect to move to recommend a newer DNN-based Tau ID sometime soon, which will demand code changes.

With that, we’ll have a chance to start right away with a more coherent version naming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will sign off, but some better formalized naming would be quite helpful.
Without being aware of this discussion and digging into the git history, it would appear more and more that the configuration may be buggy.

@mbluj
Copy link
Contributor

mbluj commented Apr 30, 2019

@swozniewski I has been not able to check the PR earlier, but it looks that DBnewDMwLT and DBdR03oldDMwLT are not migrated to the 2017v2 training (whose are not present for 2017v1). Could you confirm that this migration is considered, please? And sorry for the late hint.

@swozniewski
Copy link
Contributor Author

@mbluj I was not aware that there again is a training in 2017v2 in contrast to 2017v1 for these two IDs. I will add this now and create a PR to cms-tau-pog:CMSSW_10_6_X_tau-pog_MVA2017v2, where you can check it. I assume that the naming scheme is the same.

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 30, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/34426/console Started: 2019/04/30 17:12

@cmsbuild
Copy link
Contributor

Pull request #26541 was updated. @perrotta, @cmsbuild, @slava77 can you please check and sign again.

@swozniewski
Copy link
Contributor Author

The second commit addresses Michals comment that 2017v2 is also available for DBnewDMwLT, DBdR03oldDMwLT and updates them. The raw output has been compared to the existing values in NanoAOD on some test events.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-26541/34426/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 590 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3211964
  • DQMHistoTests: Total failures: 3531
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3208229
  • DQMHistoTests: Total skipped: 204
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@slava77
Copy link
Contributor

slava77 commented May 2, 2019

+1

for #26541 b11ce72

  • code changes are in line with the PR description and the follow up review
  • jenkins tests pass and comparisons with the baseline show changes only in tau MVA and ID variables
    • changes start from reco workflows in hpsPFTauDiscriminationByIsolationMVArun2v1DBoldDMwLTraw, hpsPFTauDiscriminationByIsolationMVArun2v1DBdR03oldDMwLTraw, and hpsPFTauDiscriminationByIsolationMVArun2v1DBnewDMwLTraw raw discriminant values
    • the differences then propagate to the tau IDs, including the miniAOD and nanoAOD outputs in cases where the corresponding tau "reco" MVA tag modules are running in the tested workflow.

The performance is changing somewhat moderately, e.g. in 250202.181 (ttbar with PU 2018 pmx setup)
wf250202 181_tau_summary

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2019

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

fabiocos commented May 3, 2019

@peruzzim @fgolf FYI

@fabiocos
Copy link
Contributor

fabiocos commented May 3, 2019

+1

@cmsbuild cmsbuild merged commit 2413cca into cms-sw:master May 3, 2019
@mbluj mbluj deleted the CMSSW_10_6_X_tau-pog_MVA2017v2 branch October 10, 2023 10:15
@mbluj mbluj restored the CMSSW_10_6_X_tau-pog_MVA2017v2 branch October 10, 2023 10:16
@mbluj mbluj deleted the CMSSW_10_6_X_tau-pog_MVA2017v2 branch October 10, 2023 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants