Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add boosted taus to NanoAOD #33528

Merged

Conversation

swozniewski
Copy link
Contributor

@swozniewski swozniewski commented Apr 26, 2021

PR description:

Adds quantities for boosted taus to nanoAOD.
For this, a new cff-file similar to the one of std taus is added, but with less content.
Sequences are modified such that boosted taus are included by default, but not for modifiers corresponding to past productions.
Histograms are added to NanoAOD DQM

Using 1000evts of a RelValTTbar sample:

  • CPU time of nanoAOD step increases by ~1%
  • nanoAOD filesize increases by 1.2%

To be backported to 10_6_X targeting nanoAOD v9.

PR validation:

  • checked that new quantities are written out
  • code/format checks passed, unit tests passed, limited matrix tests passed

* Merge pull request cms-sw#33150 from cms-tau-pog/CMSSW_11_3_X_tau-pog_tauIDtoolsDev

Updates to tauID python tool

* Initial working commit of boosted taus in CMSSW_11_3 NanoAOD

* Remove 2015 anti-E for 10_6v2 era compatibility in boosted taus

* Initial working commit of boosted taus in CMSSW_11_3 NanoAOD

* Remove 2015 anti-E for 10_6v2 era compatibility in boosted taus

* Remove commented boosted tau nanoAOD code

* Remove main nanoAOD config comments

* Remove leading charged hadronic candidate dxy and dz

* Update boosted tau configuration to remove excess ID variables

* Fix removal of boosted tau vars base

* Remove boosted tau sequences from previous eras

* Remove redundant decay mode information

* some polishing

* change Gen information for boosted taus

* Update nanoDQM for boosted taus

Co-authored-by: cmsbuild <cmsbuild@cern.ch>
Co-authored-by: Andrew Loeliger <aloelige@cern.ch>
Co-authored-by: Andrew David Loeliger <andrew.loeliger@cern.ch>
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33528/22296

@gouskos
Copy link
Contributor

gouskos commented Apr 26, 2021

Hi @swozniewski can you please add the increase in size/evt and processing time/evt?

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @swozniewski for master.

It involves the following packages:

PhysicsTools/NanoAOD

@cmsbuild, @mariadalfonso, @gouskos, @fgolf can you please review it and eventually sign? Thanks.
@gpetruc, @swertz this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@gouskos
Copy link
Contributor

gouskos commented Apr 26, 2021

please test

@@ -371,6 +395,7 @@ def nanoAOD_customizeCommon(process):
addParticleNet=nanoAOD_addDeepInfoAK8_switch.nanoAOD_addParticleNet_switch,
jecPayload=nanoAOD_addDeepInfoAK8_switch.jecPayload)
(run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94X2016 | run2_nanoAOD_94XMiniAODv2 | run2_nanoAOD_102Xv1 | run2_nanoAOD_106Xv1).toModify(process, lambda p : nanoAOD_addTauIds(p))
(~(run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94X2016 | run2_nanoAOD_94XMiniAODv2 | run2_nanoAOD_102Xv1 | run2_nanoAOD_106Xv1)).toModify(process, lambda p : nanoAOD_addBoostedTauIds(p))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swozniewski in the CMSSW master we put the latest-greatest developments i.e., we do not protect against previews CMSSW versions. Please remove the modifiers here and elsewhere in this file. The modifiers would be need when you make the PR in CMSSW_106X.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gouskos I'm not sure how much we care in master, but as outlined during the meeting, previous miniAOD versions do not provide the correct input for boosted taus. Assuming that 11_X is not used with those, shall I still remove the modifiers? For the backport to 10_6_X, I should keep them, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swagata87 After the discussion in the meeting I understood the situation, which for the sake of completeness and bookeeping let me briefly summarise it here. The first miniAOD version that provides the correct inputs for BoostedTau is the UL miniAOD sent for production Dec 2020.
As it is now, you do not want to include boostedTaus in previous campaigns, right? If yes, then I think is safer to keep this part of the code as it is also for the master.

For 106X yes - you should keep them and add "run2_nanoAOD_devel". [Effectively, you want to always recalculate the inputs if not in "run2_nanoAOD_devel"] So I think it should look like:
(~(run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94X2016 | run2_nanoAOD_94XMiniAODv2 | run2_nanoAOD_102Xv1 | run2_nanoAOD_106Xv1) & run2_nanoAOD_devel).toModify(process, lambda p : nanoAOD_addBoostedTauIds(p))
[but needs to be tested]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we think it's only safe to include them from the latest campaign on. Ok, so I'll keep it. Is there sth left for this PR to master (in case I missed sth)?
It's actually not 're'-calculating the inputs. If boosted taus are written out, we also need to run this ID. My feeling is that boosted taus should be produced independently from run2_nanoAOD_devel (but maybe I'm wrong about the purpose of this modifier). Could you please confirm what dependency is envisaged? Then I'll add it to the backport and could also open the PR for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we think it's only safe to include them from the latest campaign on. Ok, so I'll keep it. Is there sth left for this PR to master (in case I missed sth)?

It looks good to me - we just need to run the standard nanoAOD tests, but there is an issue we are trying to solve before testing the PR. I would suggest to move to the 106X PR in the meantime

It's actually not 're'-calculating the inputs. If boosted taus are written out, we also need to run this ID. My feeling is that boosted taus should be produced independently from run2_nanoAOD_devel (but maybe I'm wrong about the purpose of this modifier). Could you please confirm what dependency is envisaged? Then I'll add it to the backport and could also open the PR for it.

Thanks for the clarification! So then what you have there should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a bit delayed by an incompatibility in 10_6_X and I had to include a fix. I'll create the backport PR now.

@@ -371,6 +395,7 @@ def nanoAOD_customizeCommon(process):
addParticleNet=nanoAOD_addDeepInfoAK8_switch.nanoAOD_addParticleNet_switch,
jecPayload=nanoAOD_addDeepInfoAK8_switch.jecPayload)
(run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94X2016 | run2_nanoAOD_94XMiniAODv2 | run2_nanoAOD_102Xv1 | run2_nanoAOD_106Xv1).toModify(process, lambda p : nanoAOD_addTauIds(p))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not introduced by you, but why are these modifiers needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, this is to produce IDs on top of miniAOD input, which were not yet part of miniAOD at that time but requested for nanoAOD. Meanwhile, those are obtained from miniAOD directly.


(run2_nanoAOD_92X | run2_miniAOD_80XLegacy | run2_nanoAOD_94X2016 | run2_nanoAOD_94X2016 | \
run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94XMiniAODv2 | \
run2_nanoAOD_102Xv1).toReplaceWith(nanoSequenceFS, nanoSequenceFS.copyAndExclude([genVertexTable, genVertexT0Table]))

#remove boosted tau from previous eras
(run2_miniAOD_80XLegacy | run2_nanoAOD_92X | run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94X2016 | run2_nanoAOD_94XMiniAODv2 | run2_nanoAOD_102Xv1 | run2_nanoAOD_106Xv1).toReplaceWith(nanoSequenceFS, nanoSequenceFS.copyAndExclude([boostedTauMC]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my other comments about the modifiers [see L. 398]

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b9b754/14587/summary.html
COMMIT: 96962a9
CMSSW: CMSSW_12_0_X_2021-04-26-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/33528/14587/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2877605
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2877582
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 8.928 KiB( 37 files compared)
  • DQMHistoSizes: changed ( 1325.81,... ): 4.464 KiB Physics/NanoAODDQM
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: found differences in 7 / 37 workflows

@swozniewski
Copy link
Contributor Author

Hi @swozniewski can you please add the increase in size/evt and processing time/evt?

I've added the numbers to the PR description.

Comment on lines 75 to 108
boostedTausMCMatchLepTauForTable = cms.EDProducer("MCMatcher", # cut on deltaR, deltaPt/Pt; pick best by deltaR
src = boostedTauTable.src, # final reco collection
matched = cms.InputTag("finalGenParticles"), # final mc-truth particle collection
mcPdgId = cms.vint32(11,13), # one or more PDG ID (11 = electron, 13 = muon); absolute values (see below)
checkCharge = cms.bool(False), # True = require RECO and MC objects to have the same charge
mcStatus = cms.vint32(), # PYTHIA status code (1 = stable, 2 = shower, 3 = hard scattering)
maxDeltaR = cms.double(0.3), # Minimum deltaR for the match
maxDPtRel = cms.double(0.5), # Minimum deltaPt/Pt for the match
resolveAmbiguities = cms.bool(True), # Forbid two RECO objects to match to the same GEN object
resolveByMatchQuality = cms.bool(True), # False = just match input in order; True = pick lowest deltaR pair first
)

#This requires genVisTaus in taus_cff.py
boostedTausMCMatchHadTauForTable = cms.EDProducer("MCMatcher", # cut on deltaR, deltaPt/Pt; pick best by deltaR
src = boostedTauTable.src, # final reco collection
matched = cms.InputTag("genVisTaus"), # generator level hadronic tau decays
mcPdgId = cms.vint32(15), # one or more PDG ID (15 = tau); absolute values (see below)
checkCharge = cms.bool(False), # True = require RECO and MC objects to have the same charge
mcStatus = cms.vint32(), # CV: no *not* require certain status code for matching (status code corresponds to decay mode for hadronic tau decays)
maxDeltaR = cms.double(0.3), # Maximum deltaR for the match
maxDPtRel = cms.double(1.), # Maximum deltaPt/Pt for the match
resolveAmbiguities = cms.bool(True), # Forbid two RECO objects to match to the same GEN object
resolveByMatchQuality = cms.bool(True), # False = just match input in order; True = pick lowest deltaR pair first
)

boostedTauMCTable = cms.EDProducer("CandMCMatchTableProducer",
src = boostedTauTable.src,
mcMap = cms.InputTag("boostedTausMCMatchLepTauForTable"),
mcMapVisTau = cms.InputTag("boostedTausMCMatchHadTauForTable"),
objName = boostedTauTable.name,
objType = cms.string("Tau"),
branchName = cms.string("genPart"),
docString = cms.string("MC matching to status==2 taus"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are all copied from PhysicsTools/NanoAOD/python/taus_cff.py
the only change are the src.

will be better if we clone them w/o reimplementing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sounds reasonable. I will change that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the last commit a also reuse the WPmasks from std taus.
An import that I found to be unnecessary was removed.

Comment on lines +61 to +69
rawMVAoldDM2017v2=Var("tauID('byIsolationMVArun2017v2DBoldDMwLTraw2017')",float, doc="byIsolationMVArun2017v2DBoldDMwLT raw output discriminator (2017v2)",precision=10),
rawMVAnewDM2017v2 = Var("tauID('byIsolationMVArun2017v2DBnewDMwLTraw2017')",float,doc='byIsolationMVArun2017v2DBnewDMwLT raw output discriminator (2017v2)',precision=10),
rawMVAoldDMdR032017v2 = Var("tauID('byIsolationMVArun2017v2DBoldDMdR0p3wLTraw2017')",float,doc='byIsolationMVArun2017v2DBoldDMdR0p3wLT raw output discriminator (2017v2)'),
idMVAnewDM2017v2 = _tauId7WPMask("by%sIsolationMVArun2017v2DBnewDMwLT2017", doc="IsolationMVArun2017v2DBnewDMwLT ID working point (2017v2)"),
idMVAoldDM2017v2=_tauId7WPMask("by%sIsolationMVArun2017v2DBoldDMwLT2017",doc="IsolationMVArun2017v2DBoldDMwLT ID working point (2017v2)"),
idMVAoldDMdR032017v2 = _tauId7WPMask("by%sIsolationMVArun2017v2DBoldDMdR0p3wLT2017",doc="IsolationMVArun2017v2DBoldDMdR0p3wLT ID working point (2017v2)"),
rawAntiEle2018 = Var("tauID('againstElectronMVA6Raw')", float, doc= "Anti-electron MVA discriminator V6 raw output discriminator (2018)", precision=10),
rawAntiEleCat2018 = Var("tauID('againstElectronMVA6category')", int, doc="Anti-electron MVA discriminator V6 category (2018)"),
idAntiEle2018 = _tauId5WPMask("againstElectron%sMVA6", doc= "Anti-electron MVA discriminator V6 (2018)"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tau as redefined in
https://cms-nanoaod-integration.web.cern.ch/integration/33513/mc106Xul17v2_size_report.html#Tau

the v2 variables are associated to tauID("...raw2017v2")
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/taus_cff.py#L101
while here you only have 2017. Would it be better rawMVAoldDM2017v2--> rawMVAoldDM2017 to avoid any confusion ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input names differ from the standard taus because the toolchain for getting the IDs is not the same. But in fact this is v2 as well (see middle of the name (...MVArun2017v2...)). Therefore the name of the nanoAOD branch is correct.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33528/22431

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2021

Pull request #33528 was updated. @cmsbuild, @mariadalfonso, @gouskos, @fgolf can you please check and sign again.

@mariadalfonso
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented May 4, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b9b754/14834/summary.html
COMMIT: 05ec716
CMSSW: CMSSW_12_0_X_2021-05-03-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/33528/14834/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2662646
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2662611
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 8.932 KiB( 36 files compared)
  • DQMHistoSizes: changed ( 1325.81,... ): 4.464 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 155 log files, 37 edm output root files, 37 DQM output files
  • TriggerResults: found differences in 7 / 36 workflows

@gouskos
Copy link
Contributor

gouskos commented May 6, 2021

+xpog

The PR adds the boostedTau collection in nanoAOD.
NanoAOD-integration tests passed: https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/-/issues/77#note_4458840
increase in size and timing in sync with PR description [~1% for both]

@cmsbuild
Copy link
Contributor

cmsbuild commented May 6, 2021

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Copy link
Contributor

qliphy commented May 7, 2021

+1

@cmsbuild cmsbuild merged commit a9c4b93 into cms-sw:master May 7, 2021
cmsbuild added a commit that referenced this pull request May 12, 2021
…W_11_3_X_tau-pog_boostedTaus

Add boosted taus to nanoAOD (backport of #33528)
@mbluj mbluj deleted the CMSSW_11_3_X_tau-pog_boostedTaus branch October 10, 2023 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants