forward compression settings if fast cloning is disabled on first file of a merge #36287
Conversation
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36287/26989
A new Pull Request was created by @dan131riley (Dan Riley) for master. It involves the following packages:
@perrotta, @smuzaffar, @Dr15Jones, @makortel, @cmsbuild, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
@cmsbuild, please test
@@ -74,6 +74,8 @@ def mergeProcess(*inputFiles, **options):
        process.add_(Service("InitRootHandlers", EnableIMT = CfgTypes.untracked.bool(False)))
    else:
        outMod = OutputModule("PoolOutputModule")
        outMod.mergeJob = CfgTypes.untracked.bool(True)
        outMod.eventAuxiliaryBasketSize = CfgTypes.untracked.int32(2*1024*1024)
Also, the basket size for the EventAuxiliary branch is bumped in order to improve the data layout for sites with large block sizes.
Should we think of increasing this by default too? Or are the benefits more or less specific to outputs of merge jobs?
I believe this is mostly specific to the merge jobs: we generally don't have single jobs producing 20 GB files, and we don't fast-clone the EventAuxiliary branch, so it should be sufficient to do this just for merges.
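To illustrate the point being discussed, here is a small back-of-the-envelope sketch (plain Python, not CMSSW code) of why a 2 MiB basket for the EventAuxiliary branch helps large merged files: fewer, larger baskets mean fewer scattered reads on filesystems with large block sizes. The per-entry size and default basket size below are illustrative assumptions, not values from the PR.

```python
# Illustrative arithmetic: how many baskets a branch needs for a given
# basket size. All concrete sizes here are assumptions for the example.
def n_baskets(n_events, entry_size_bytes, basket_size_bytes):
    """Number of baskets needed to hold n_events fixed-size entries."""
    entries_per_basket = max(1, basket_size_bytes // entry_size_bytes)
    return -(-n_events // entries_per_basket)  # ceiling division

SMALL_BASKET = 16 * 1024        # a typical small basket size (assumption)
MERGE_BASKET = 2 * 1024 * 1024  # the 2 MiB value set in this PR

events = 1_000_000              # a large merge output (assumption)
entry = 100                     # assumed bytes per EventAuxiliary entry

print(n_baskets(events, entry, SMALL_BASKET))  # many small baskets
print(n_baskets(events, entry, MERGE_BASKET))  # far fewer, larger baskets
```

With these assumed numbers the small basket yields thousands of baskets while the 2 MiB basket needs only a few dozen, which is the layout improvement described in the PR.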
desc.addUntracked("mergeJob", false)
    ->setComment(
        "If set to true and fast copying is disabled, copy input file compression and basket sizes to the output "
        "file.");
I'm wondering if a more descriptive name would be clearer (e.g. something along the lines of the monstrous useCompressionBasketSizeDefaultsFromInput)?
Or do you see the usage more as "now we know the job is a merge job, and can do various optimizations based on that"? (the code hints a bit towards that)
Indeed, I think there's more that could be done to optimize merge jobs where fast-cloning isn't possible, so I'd rather make this a hint to the output module, not a specific directive.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-251054/20840/summary.html
Comparison Summary:
+1
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.
PR description:
This PR addresses some issues found with respect to memory usage in the Tier-0 merge jobs:
https://hypernews.cern.ch/HyperNews/CMS/get/tier0-Ops/2304.html
It was observed that in merge jobs where fast cloning was disabled, the output file was being written with the default (zlib) compression and sub-optimal basket sizes. With this PR, if the job is a merge job and fast cloning is disabled for the first file, the compression settings and basket sizes are forwarded from the input file to the output file. Also, the basket size for the EventAuxiliary branch is bumped in order to improve the data layout for sites with large block sizes.
This PR will actually make the merge job memory consumption worse, but will improve the compression and layout of the output files.
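The decision logic described above can be sketched in plain Python (this is an illustration of the behavior, not the actual CMSSW/ROOT API; all field and function names below are assumptions):

```python
# Sketch of the forwarding rule: when the job is a merge job and fast
# cloning was disabled for the first input file, reuse that file's
# compression settings and basket sizes for the output instead of the
# defaults (zlib compression, small baskets). Field names are illustrative.
def output_settings(input_file, merge_job, fast_cloning_enabled, defaults):
    """Choose compression/basket settings for the output file."""
    if merge_job and not fast_cloning_enabled:
        # Fast cloning would have copied baskets verbatim; since it is off,
        # at least carry over the input's compression and basket choices.
        return {
            "compression_algorithm": input_file["compression_algorithm"],
            "compression_level": input_file["compression_level"],
            "basket_sizes": dict(input_file["basket_sizes"]),
        }
    return dict(defaults)

defaults = {"compression_algorithm": "zlib", "compression_level": 1,
            "basket_sizes": {}}
first_input = {"compression_algorithm": "lzma", "compression_level": 4,
               "basket_sizes": {"Events": 32 * 1024}}

forwarded = output_settings(first_input, merge_job=True,
                            fast_cloning_enabled=False, defaults=defaults)
print(forwarded["compression_algorithm"])  # lzma, not the zlib default
```

When fast cloning is enabled (or the job is not a merge), the sketch falls back to the defaults, matching the pre-existing behavior the PR leaves untouched.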
PR validation:
Tests are ok, changes are purely technical in the setting of various ROOT IO parameters.