Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to MiniAOD to satisfy JME/SMP/B2G requests for NanoAOD #20638

Merged
merged 7 commits into from Oct 5, 2017

Conversation

rappoccio
Copy link
Contributor

@rappoccio rappoccio commented Sep 23, 2017

JME / SMP / B2G preparation for the nanoAOD.

  1. Added soft-dropped subjets from gen jets.

This is for nanoAOD to be fully usable for substructure analyses. The CPU time added is negligible (<1 ms/event) because I am not computing the areas for the gen jets (not needed). I also preclustered them with AK8. There is expected to be a small increase in size.

The preclustering required more code than I intended, since, to first order, this is merely a configuration change. However many dictionaries were missing to do the preclustering on GenJets, and the VirtualJetProducer could not understand the vector<FwdPtr<GenParticle>>.

  1. Adding 2 AK8 jets with lower pt for SMP-J measurements. We now have:
  • all AK8 PF jets with pt > 100 GeV with b-tagging information, (New: lower pt threshold)
    • if pt > 170 GeV, add grooming, tau1-tau4, and N2 (add tau4)
    • if pt > 250 GeV, add N3 (unchanged in this PR)
  • 3 AK8 jets with pt > 30 with only b-tagging information, no grooming nor substructure. (New)
  • all AK8 Gen jets with pt > 120 GeV with grooming (As in 1 from this PR).
  • 3 AK8 Gen jets with pt > 10 with no grooming nor substructure (New).

This is expected to add 3 AK8 jets and 3 AK8 genjets per event.

@arizzi, @gpetruc, @clelange also watching.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-20638/933

Code check has found code style and quality issues which could be resolved by applying a patch in https://cmssdt.cern.ch/SDT/code-checks/PR-20638/933/git-diff.patch
e.g. curl https://cmssdt.cern.ch/SDT/code-checks/PR-20638/933/git-diff.patch | patch -p1

You can run scram build code-checks to apply code checks directly (this will soon be required for PRs to be considered)

@slava77
Copy link
Contributor

slava77 commented Sep 24, 2017 via email

@rappoccio
Copy link
Contributor Author

rappoccio commented Sep 24, 2017

Did I just draw the short straw to take care of the code checks from the entire DataFormats/Candidate, /HepMCCandidate and /PatCandidate because I added dictionaries? Ha!

@slava77
Copy link
Contributor

slava77 commented Sep 24, 2017 via email

@rappoccio
Copy link
Contributor Author

Oops, I have a few open. So I don't really have to do all the code checks for the DataFormats? :) Glee!

@rappoccio
Copy link
Contributor Author

It depends on #20637, where the same code checks have failed for the dataformats.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@rappoccio
Copy link
Contributor Author

Nice, it looks like the patch figures out which files you have actually modified and only patches those. That being said, perhaps a central campaign to get those "overrides" implemented is in order.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-20638/934

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @rappoccio for master.

It involves the following packages:

CommonTools/RecoAlgos
DataFormats/HepMCCandidate
DataFormats/PatCandidates
PhysicsTools/PatAlgos
RecoJets/JetProducers

@perrotta, @efeyazgan, @perrozzi, @thuer, @monttj, @cmsbuild, @slava77, @govoni can you please review it and eventually sign? Thanks.
@TaiSakuma, @gouskos, @yslai, @abbiendi, @ahinzmann, @seemasharmafnal, @mmarionncern, @imarches, @makortel, @acaudron, @jhgoh, @jdolen, @ferencek, @rovere, @jdamgov, @nhanvtran, @gkasieczka, @schoef, @JyothsnaKomaragiri, @mverzett, @cbernet, @gpetruc, @mariadalfonso, @pvmulder this is something you requested to watch as well.
@davidlange6, @slava77 you are the release manager for this.

cms-bot commands are listed here

@davidlange6
Copy link
Contributor

please test

@rappoccio
Copy link
Contributor Author

Any other ideas on this? Or are we basically good to go for merging?

@arizzi
Copy link
Contributor

arizzi commented Sep 29, 2017

total size in crease is ~500bytes/ev right?

@rappoccio
Copy link
Contributor Author

For the AK8 jets, right. The total is 800 bytes/ev (other additions for the soft drop subjets for reco and genjets).

@arizzi
Copy link
Contributor

arizzi commented Sep 29, 2017

I start thinking if for the SMP request to have always at least 3 jets, but no details at all on them, isn't better to go for a separate collection with just the needed information as we do for calo jets.
Can you present the various size increases and their use cases in tuesday's XPOG?
I've to admit I'm a bit lost in the various feature additions listed on the top of this thread (and the impact of each of them)

@rappoccio
Copy link
Contributor Author

rappoccio commented Sep 29, 2017

The added information is (with splitlevel=0 for AK8 jets)

RECO Jets:

  • <= 3 AK8 jets with pt > 30 GeV (minimal information)
  • All AK8 jets with 100 < pt < 170 GeV (minimal information) :
  • All AK8 jets with pt > 170 (grooming, subjet information, N2)
  • All AK8 jets with pt > 230 (adding N3 variable)
    ===> These add 0.4 kb/event

Gen Jets:

  • <= 3 AK8 jets with pt > 10 GeV
  • All AK8 jets with 80 < pt < 150 GeV (no grooming)
  • All AK8 jets with pt > 150 GeV (adding grooming AND subjets)
    ===> These add 0.3 k/bevent

Previous configuration:
http://gimli.physics.buffalo.edu/nominal.html

New configuration, all additions implemented, splitlevel=0:
http://gimli.physics.buffalo.edu/SMPJ_2AK8_unsplit.html

@rappoccio
Copy link
Contributor Author

(Also sent this in email to you and Mikko, but here for the overall record).

Having a separate collection is a possibility, but we are already dropping most / all of the information for low pt jets (we keep JECs, a few b-tagging discriminators, and nothing else). The remaining question is whether they compress better in their own collection, but there is also the operational difficulty to consider.

From my perspective, we definitely need the AK8 Gen Jets to correctly do substructure analyses, if we adopt the nanoAOD (otherwise, we are currently running them all ourselves, reducing the overall buy-in of nanoAOD). The additional AK8 RECO jets are worth adding if we can get buy-in from SMPJ and the JME groups (or, at least, JERC). Overall this will hopefully give us the JECs faster. I realize this is something like $40kCHF per year extra, but it is probably worth that to shave 2-3 weeks off the JECs every time (accumulates probably to 1.5-2 months time saving over the year.... this is a competitive advantage).

@slava77
Copy link
Contributor

slava77 commented Sep 29, 2017 via email

@rappoccio
Copy link
Contributor Author

@perrotta
Copy link
Contributor

I also computed CPU with 50 evts of wf 10224 (TTbar with PU: is it the same you are looking at, @rappoccio?), and I can confirm that there is no increase in timing. Perhaps also some gain, from the removal of btagging modules (in particular pfCombinedCvsLJetTagsAK8Puppi):

 removed      -0.00%         0.04 ms/ev ->         0.00 ms/ev pfJetBProbabilityBJetTagsAK8Puppi
 removed      -0.00%         0.12 ms/ev ->         0.00 ms/ev pfInclusiveSecondaryVertexFinderCvsLTagInfosAK8Puppi
   added      +0.00%         0.00 ms/ev ->         0.03 ms/ev slimmedGenJetsAK8SoftDropSubJets
 removed      -0.00%         0.30 ms/ev ->         0.00 ms/ev pfCombinedCvsBJetTagsAK8Puppi
   added      +0.00%         0.00 ms/ev ->         0.05 ms/ev ak8GenJetsNoNuConstituents
 removed      -0.00%         0.02 ms/ev ->         0.00 ms/ev pfTrackCountingHighEffBJetTagsAK8Puppi
 removed      -0.00%         0.05 ms/ev ->         0.00 ms/ev softPFElectronBJetTagsAK8Puppi
 removed      -0.00%         0.02 ms/ev ->         0.00 ms/ev pfJetProbabilityBJetTagsAK8Puppi
 removed      -0.00%         0.02 ms/ev ->         0.00 ms/ev pfSimpleInclusiveSecondaryVertexHighEffBJetTagsAK8Puppi
 removed      -0.00%         0.04 ms/ev ->         0.00 ms/ev softPFMuonBJetTagsAK8Puppi
 removed      -0.00%         0.02 ms/ev ->         0.00 ms/ev pfSimpleSecondaryVertexHighEffBJetTagsAK8Puppi
 removed      -0.01%         0.46 ms/ev ->         0.00 ms/ev pfCombinedCvsLJetTagsAK8Puppi
   added      +0.00%         0.00 ms/ev ->         0.21 ms/ev ak8GenJetsNoNuSoftDrop

In my test the additional event size is a bit bigger than what you report above (miniAOD only, as reco output is unchanged)

272.6 ->      1090.8        818     120.0   1.12     patJets_slimmedJetsAK8__RECO.
  0.0 ->       180.0        180     NEWO   0.25     recoGenJets_slimmedGenJetsAK8SoftDropSubJets__RECO.
102.2 ->       335.1        233     106.5   0.32     recoGenJets_slimmedGenJetsAK8__RECO.

73168 ->       74403       1235             1.7     ALL BRANCHES

However, if the workflow you use is the same, I would rather thrust your 9k events than my current 50 ones for it.

@slava77
Copy link
Contributor

slava77 commented Sep 30, 2017 via email

@perrotta
Copy link
Contributor

@rappoccio
Copy link
Contributor Author

@perrotta I am using WF 25402.0 (TTbar_13+FS_TTbar_13_UP15_PU25+HARVESTUP15FS+MINIAODMCUP15FS) because it had relvals available already.

@perrotta
Copy link
Contributor

perrotta commented Oct 3, 2017

Trying to summarize: the overall event size increase measured in 25402 (by Sal) and wf 10224 (by me) is similar in relative terms: 1.8% in 25402 and 1.7% in 10224.
miniAOD in 10224 has a larger size since the begining (73.2 kB/evt vs 42.7 kB/evt, if I am calculating correctly from the numbers scattered along this thread) and therefore a larger absolute increase in the event size is visible in 10224.

Code-wise this PR is already fine for reco. @rappoccio @arizzi : if I understand correctly, you still need to finalize your discussion. Once you have reached any conclusion, could you please post it here, so that we can decide how to go on with this?

@arizzi
Copy link
Contributor

arizzi commented Oct 3, 2017

from XPOG perspective it is ok to add this, on longer term we may look for strategy to reduce the size but for now it is ok.

@perrotta
Copy link
Contributor

perrotta commented Oct 3, 2017

+1

  • One new product added to miniAOD output (slimmedGenJetsAK8SoftDropSubJets), and new thresholds applied for Ak8 jets
  • Overall cpu needs are kept low by removing unused btagging discriminators
  • There are no changes in the reco output
  • The miniAOD size increase by some 1.7% in the tested workflows with TTbar at high pu: this is deemed acceptable by XPOG (Updates to MiniAOD to satisfy JME/SMP/B2G requests for NanoAOD #20638 (comment)) although some strategy to reduce the size could eventually be put in practice

@dmitrijus
Copy link
Contributor

+1

@davidlange6
Copy link
Contributor

merge

@cmsbuild cmsbuild merged commit 5a475cf into cms-sw:master Oct 5, 2017
@@ -263,6 +263,8 @@ namespace pat {
if (specificJPT_.empty()) throw cms::Exception("Type Mismatch") << "This PAT jet was not made from a JPTJet.\n";
return specificJPT_[0];
}
/// check to see if the PFSpecific object is stored
bool hasPFSpecific() const { return !specificPF_.empty(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this needed?
The implementation appears to be the same as bool isPFJet()

cutLoose = cms.string(""),
nLoose = cms.uint32(0),
clearDaughters = cms.bool(False), #False means rekeying
dropSpecific = cms.bool(True), # Save space
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like extra care now needs to be taken to access things like
patJets_slimmedJetsAK8__reRECO.obj.neutralHadronEnergy()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants