New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Override split level in miniaod #19680
Conversation
A new Pull Request was created by @arizzi for master. It involves the following packages: Configuration/Applications @perrotta, @monttj, @cmsbuild, @franzoni, @slava77, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
The tests are being triggered in jenkins. |
+1 The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: |
Comparison job queued. |
Comparison is ready Comparison Summary:
|
output.fastCloning= cms.untracked.bool(False) | ||
output.overrideInputFileSplitLevels = cms.untracked.bool(True) | ||
from PhysicsTools.PatAlgos.slimming.miniAOD_tools import miniAOD_customizeOutput | ||
miniAOD_customizeOutput(output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it another case of inconsistent indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is interesting that the python intepreter understands things correctly while github doesn't
MiniAODOverrideBranchesSplitLevel = cms.untracked.VPSet( [ | ||
cms.untracked.PSet(branch = cms.untracked.string("patMuons_slimmedMuons__*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("patElectrons_slimmedElectrons__*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("patTaus_slimmedTaus__*"),splitLevel=cms.untracked.int32(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the list of branches here substantially different from what's in MicroEventContent
?
Are there any branches that have to stay in the full split or is the real intent of the changes here to make everything in miniAOD to be splitLevel 0?
It looks like most of the miniAOD branches in this PR are set to use the no-split mode.
It seems more practical to use the output module splitLevel parameter to set the default and then override a few branches to have the full split (like packedCandidates),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some reduced verbosity would be nice, especially for readability.
E.g.
def split0PSet(name):
return cms.untracked.PSet(branch = cms.untracked.string(name),splitLevel=cms.untracked.int32(0))
MiniAODOverrideBranchesSplitLevel = cms.untracked.VPSet( [
split0PSet("patMuons_slimmedMuons__*"),
split0PSet("patElectrons_slimmedElectrons__*")
Yes, there are branches that have to stay in full split, and in fact we got better performance with everything in full split than with everything in no-split (that's the reason we use full split in first place!) Concerning what is "cheaper" to do... there are 40 branches out of 80 to keep in full split and 40 to have un-split... so there is no cheaper choice is exactly the same (so given that full-split for all is better than no split for all, I'd keep default-full-split and white list the one to not-split) |
OTOH, if one would restrict only to the branches that show some significant difference with full-split / no-split we could reduce the list to ~10 branches (and in that case is probably cheaper, in term of number of python lines, to default no-split and override only a few). PS: I was expecting more saving when running on less events (1k) that is more "production like" given that production uses fastmerge... but this is not what I see. I'll do more tests. |
cms.untracked.PSet(branch = cms.untracked.string("patPackedCandidates_lostTracks_eleTracks_*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("recoConversions_reducedEgamma_reducedSingleLegConversions_*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("recoGsfElectronCores_reducedEgamma_reducedGedGsfElectronCores_*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("recoPhotonCores_reducedEgamma_reducedGedPhotonCores_*"),splitLevel=cms.untracked.int32(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about reducedOOT stuff?
cms.untracked.PSet(branch = cms.untracked.string("l1extraL1JetParticles_l1extraParticles_Central_*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("l1extraL1EmParticles_l1extraParticles_Isolated_*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("l1extraL1MuonParticles_l1extraParticles__*"),splitLevel=cms.untracked.int32(0)), | ||
cms.untracked.PSet(branch = cms.untracked.string("l1extraL1JetParticles_l1extraParticles_Tau_*"),splitLevel=cms.untracked.int32(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how permissive is this input for wildcards?
Can we have just one *_l1extraParticles_*_*
?
-1 Tested at: 2312220 The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: You can see the results of the tests here: I found follow errors while testing this PR Failed tests: RelVals
When I ran the RelVals I found an error in the following worklfows: runTheMatrix-results/136.788_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017/step3_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017.log The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: |
Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped) |
@cmsbuild please test the last few two failed with what seems like some random timeout or similar kill. |
The tests are being triggered in jenkins. |
+1 The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: |
Comparison job queued. |
Comparison is ready Comparison Summary:
|
+1
Some of the largest reductions (above 0.5% of the output file size)
|
@davidlange6 @monttj @franzoni |
merge |
This PR implements a dedicated list of what to split/not-to-split for MiniAOD.
Different branches can be optimal with full split or no-split depending on the number of items per event and/or the number of empty (zeroed) datamembers (sub-branches).
The list is obtained simply comparing what takes more or less space running with full split vs no-split.
Savings of 2.6kb/ev on ttbar with PU (out of ~45kb/ev) are expected