Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tau anti electron discriminators in RECO and MiniAOD #27465

Merged

Conversation

swozniewski
Copy link
Contributor

PR description:

This PR updates the version of the tau AntiElectronDiscriminator in RECO and miniAOD to the latest 2018 version. This is to have the latest tau discriminator versions ready for a coming integration of tau ID versions into global tag.
The NanoAOD config is updated such that 2018 version is taken from PAT taus and the old 2015 version is recalculated. For already existing input samples from former productions a switch is introduced to do it the other way round (as it was before this PR).

PR validation:

Reco and PAT step were run on a test set and output compared to corresponding NanoAOD sample that already contains both versions of AntiEleDiscriminator. NanoAOD step was also run on a test set with different config eras and output checked.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2019

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2019

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27465/10792

  • This PR adds an extra 28KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2019

A new Pull Request was created by @swozniewski for master.

It involves the following packages:

PhysicsTools/NanoAOD
RecoTauTag/Configuration
RecoTauTag/RecoTau

@perrotta, @cmsbuild, @fgolf, @slava77, @santocch, @peruzzim can you please review it and eventually sign? Thanks.
@gpetruc this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Jul 9, 2019

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/1374/console Started: 2019/07/09 17:36

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2019

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2019

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2019

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5a518/1374/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 218 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3160399
  • DQMHistoTests: Total failures: 1146
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3158919
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

Copy link
Contributor

@slava77 slava77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementations to change WPeffXX to WPeffYY appear to be extremely verbose.
Unless I missed some irregularity in the pattern, it seems more practical to compactify

hpsPFTauDiscriminationByMVA6LooseElectronRejection.mapping[5].cut = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_NoEleMatch_wGwoGSF_EC_WPEff96")
hpsPFTauDiscriminationByMVA6LooseElectronRejection.mapping[6].cut = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_woGwGSF_EC_WPEff96")
hpsPFTauDiscriminationByMVA6LooseElectronRejection.mapping[7].cut = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_wGwGSF_EC_WPEff96")
hpsPFTauDiscriminationByMVA6LooseElectronRejection.mapping[0].cut = cms.string("RecoTauTag_antiElectronMVA6v3_noeveto_gbr_NoEleMatch_woGwoGSF_BL_WPeff90")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't all of this be rolled up in a simple loop? smth like

for m in hpsPFTauDiscriminationByMVA6LooseElectronRejection.mapping : 
    m.cut = m.cut.replace("WPeff98", "WPeff90")

)
# Loose
patTauDiscriminationByLooseElectronRejectionMVA62015 = patTauDiscriminationByVLooseElectronRejectionMVA62015.clone(
mapping = cms.VPSet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't all of the below be rolled up in a simple loop? smth like

patTauDiscriminationByLooseElectronRejectionMVA62015 = patTauDiscriminationByVLooseElectronRejectionMVA62015.clone()
for m in patTauDiscriminationByLooseElectronRejectionMVA62015.mapping : 
    m.cut = m.cut.replace("WPeff99", "WPeff96")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, why not. Update follows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cms.string does not seem to support replace or conversion to python string. Do you have an idea, how to do this in an elegant way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it gets too ugly, we can also wait for the new tau discriminator format, where these WPs and categories will be separated in the configurations and then 'multiplied' by the module. https://github.com/swozniewski/cmssw/blob/restructure_PFTauDiscriminators_v3/RecoTauTag/Configuration/python/HPSPFTaus_cff.py#L320-L376

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@swozniewski swozniewski Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint, this could be used! However, this function puts the string in single quotes, which I have to remove:
m.cut = m.cut.pythonValue().strip("'").replace("WPeff98", "WPeff90")
Does not look very nice but would work...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also a .configValue return.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use m.cut.value() which returns the held python string. Remember, you can use python to look at any of the classes to find what methods they have

bash>python
>>> import FWCore.ParameterSet.Config as cms
>>> help(cms.string)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good

@mbluj
Copy link
Contributor

mbluj commented Jul 31, 2019

Hello Marco,
I think that there is some misunderstanding, let try to explain what was done and then answer your comments.
What this PR does is an update of anti-electron tauID to the newer training in AOD and hence in MiniAOD and NanoAOD which base on AOD. But, the anti-electron tauID with the new training has been already run on top of MiniAOD and stored to NanoAOD, Therefore, to keep content of NanoAOD unchanged when produced with new inputs it was mandatory to swap new and old anti-e tauID in NanoAOD sequences, i.e. read new from MiniAOD and produce old on top of it. For old eras noting is changed.

1. We have to really try to limit this kind of PR, that change things "in cascade" through several data formats. It's very difficult to review. Are the changes to nanoAOD only needed for future miniAOD productions, or are you also touching the output on past productions?

I think it is impossible to avoid such "cascade" PRs if changes are introduced to any pre-Nano level in case they were already intdouced to NanoAOD. I think it happens because release cycle of Nano is shorther than for AOD and MiniAOD.
Answering your question: the changes do not modify NanoAOD content regardless if it is produced on top of new MiniAOD samples with >=110X or old ones with 94X/102X as long as appropriate era modifiers are used.

2. What do you plan to do for 102X?

Nothing, i.e. no backport. It is as changes in AOD/MiniAOD are not foreseen/allowed for closed releases.

Finally, Tau POG plans to clear NanoAOD from outdated Run-2 tauIDs, but it is beyond scope of this PR (and will be performed via NanoAOD repo)

@peruzzim
Copy link
Contributor

Thanks for clarifying. I am going to test this and then sign, but I don't foresee any issues.

@slava77
Copy link
Contributor

slava77 commented Aug 5, 2019

Thanks for clarifying. I am going to test this and then sign, but I don't foresee any issues.

@peruzzim
please clarify on the status of your checks, or perhaps sign if all is OK.
Thank you.

@fabiocos
Copy link
Contributor

@peruzzim any news about this?

@slava77
Copy link
Contributor

slava77 commented Aug 19, 2019

@peruzzim any news about this?

ping

@cmsbuild cmsbuild mentioned this pull request Aug 19, 2019
@peruzzim
Copy link
Contributor

+xpog

I have not detected changes that would break the existing setup, the status of discriminants stored in nanoAOD for different eras will be rediscussed for the future when x-checking the sync between 11_0 and previous releases.

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@kpedro88
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 808a7f6 into cms-sw:master Aug 22, 2019
@fabiocos
Copy link
Contributor

@mbluj @swozniewski @slava77 @perrotta @peruzzim the PPD/PdmV validation report about CMSSW_11_0_0_pre7 gave positive feedback on this PR. Is its backport to 10_6_X forseen? With protections like those added in #27882?

@mbluj
Copy link
Contributor

mbluj commented Sep 20, 2019

@mbluj @swozniewski @slava77 @perrotta @peruzzim the PPD/PdmV validation report about CMSSW_11_0_0_pre7 gave positive feedback on this PR. Is its backport to 10_6_X forseen? With protections like those added in #27882?

No, backport of this is not expected. There are two reasons: the main one is the fact that DeepTauID overperforms all current BDT-based TauIDs (including anti-e 2018) so it is going to be the main POG recomended set of TauID, the other reason is that anti-e MVA 2018 is already available for users via NanoAOD (or can be run on top of MiniAOD by whose are not using NanoAOD).

@swozniewski swozniewski deleted the CMSSW_11_0_X_tau-pog_updateAntiEleDiscr branch June 19, 2020 13:22
@slava77
Copy link
Contributor

slava77 commented Jul 31, 2020

@mbluj @swozniewski @slava77 @perrotta @peruzzim the PPD/PdmV validation report about CMSSW_11_0_0_pre7 gave positive feedback on this PR. Is its backport to 10_6_X forseen? With protections like those added in #27882?

No, backport of this is not expected. There are two reasons: the main one is the fact that DeepTauID overperforms all current BDT-based TauIDs (including anti-e 2018) so it is going to be the main POG recomended set of TauID, the other reason is that anti-e MVA 2018 is already available for users via NanoAOD (or can be run on top of MiniAOD by whose are not using NanoAOD).

I'm curious if this expectation changed since Sep 20 of last year.
While investigating differences between the variables in re-miniAOD of UL (with features collected in #27889) I noticed that againstElectron*MVA6 discriminants differ the most

@mbluj
Copy link
Contributor

mbluj commented Aug 4, 2020

@mbluj @swozniewski @slava77 @perrotta @peruzzim the PPD/PdmV validation report about CMSSW_11_0_0_pre7 gave positive feedback on this PR. Is its backport to 10_6_X forseen? With protections like those added in #27882?

No, backport of this is not expected. There are two reasons: the main one is the fact that DeepTauID overperforms all current BDT-based TauIDs (including anti-e 2018) so it is going to be the main POG recomended set of TauID, the other reason is that anti-e MVA 2018 is already available for users via NanoAOD (or can be run on top of MiniAOD by whose are not using NanoAOD).

I'm curious if this expectation changed since Sep 20 of last year.
While investigating differences between the variables in re-miniAOD of UL (with features collected in #27889) I noticed that againstElectron*MVA6 discriminants differ the most

I think that the issue is related with the fact that for some time payloads of BDT-based tauIDs (including againstElectron*MVA6 ones) in 11XY series are being mapped via a GT and the newest training is used in this mapping. In 106X series mappig is set by python config and newest training is used for IsolationMVArun2* discriminants, but not againstElectron*MVA6 ones. To agree behavior of re-miniAOD of UL with 106X and 11XY one needs rerun the againstElectron*MVA6 discriminants with the newest training (and adjust NanoAOD) with 106X - can be done if requested, but an effort needed.

@slava77
Copy link
Contributor

slava77 commented Aug 4, 2020

I think that the issue is related with the fact that for some time payloads of BDT-based tauIDs (including againstElectron*MVA6 ones) in 11XY series are being mapped via a GT and the newest training is used in this mapping. In 106X series mappig is set by python config and newest training is used for IsolationMVArun2* discriminants, but not againstElectron*MVA6 ones. To agree behavior of re-miniAOD of UL with 106X and 11XY one needs rerun the againstElectron*MVA6 discriminants with the newest training (and adjust NanoAOD) with 106X - can be done if requested, but an effort needed.

Thank you for clarifying that the main difference is in the payloads.
This means that an update for UL workflows can be rather straightforward (configuration/GT switch).
Would any analysis using UL re-miniAOD be recommended to use the againstElectron*MVA6 ?
If so, then an update is in order. If these are instead unused tags computed just in case, then it may be fine to leave them without an update.

@mbluj
Copy link
Contributor

mbluj commented Aug 5, 2020

I think that the issue is related with the fact that for some time payloads of BDT-based tauIDs (including againstElectron*MVA6 ones) in 11XY series are being mapped via a GT and the newest training is used in this mapping. In 106X series mappig is set by python config and newest training is used for IsolationMVArun2* discriminants, but not againstElectron*MVA6 ones. To agree behavior of re-miniAOD of UL with 106X and 11XY one needs rerun the againstElectron*MVA6 discriminants with the newest training (and adjust NanoAOD) with 106X - can be done if requested, but an effort needed.

Thank you for clarifying that the main difference is in the payloads.
This means that an update for UL workflows can be rather straightforward (configuration/GT switch).
Would any analysis using UL re-miniAOD be recommended to use the againstElectron*MVA6 ?
If so, then an update is in order. If these are instead unused tags computed just in case, then it may be fine to leave them without an update.

After some thinking, we find that it makes sense to update againstElectron*MVA6 to most recent training also for UL re-miniAOD with 10_6_X (at least for coherency with 11_X_Y). A related PR will be prepared soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants