Use lzma/l4 for raw data #28109
please test
… On Oct 2, 2019, at 6:34 PM, David Lange ***@***.***> wrote:
PR description:
Proposal to use LZMA level 4 to compress the RAW data tier. Tests on a double-muon RAW file from run 324970 (containing the highest-lumi part of the run) show a 15% reduction in file size at the cost of an extra 0.25 seconds/event when writing (repacking is noticeably slower) and an extra 0.05 seconds/event when reading (e.g., 5x more read-back overhead).
It will also be interesting to test zstandard once it is part of ROOT (6.20 or 6.22, it seems), but LZMA appears to be a much better trade-off of storage against write/read resources given our usual usage of RAW data.
Searches of GitHub suggest we have only rediscovered this change as a way to reduce RAW data size with minimal cost in CPU, so perhaps there is a good reason not to do it.
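For anyone who wants a feel for the size-versus-time trade-off described above, here is a minimal sketch using only the Python standard library. It is not the ROOT/CMSSW code path: the payload is synthetic (not RAW data), `make_payload`, `measure` and `compare` are hypothetical helper names, and zlib level 6 is only a stand-in baseline, since the baseline setting is not stated in this thread. Numbers from it will not reproduce the figures quoted in the description.

```python
import lzma
import random
import time
import zlib


def make_payload(n_bytes: int, seed: int = 0) -> bytes:
    """Build a synthetic, partly repetitive payload (an assumption, NOT real RAW data)."""
    rng = random.Random(seed)
    # 64 random 8-byte "words", repeated in random order, so the data is compressible.
    words = [bytes(rng.getrandbits(8) for _ in range(8)) for _ in range(64)]
    out = bytearray()
    while len(out) < n_bytes:
        out += rng.choice(words)
    return bytes(out[:n_bytes])


def measure(name, compress, decompress, payload):
    """Time one compress/decompress round trip and record the compressed size."""
    t0 = time.perf_counter()
    packed = compress(payload)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == payload  # the round trip must be lossless
    return {"name": name, "size": len(packed), "write_s": t1 - t0, "read_s": t2 - t1}


def compare(payload):
    """Compare a zlib baseline (assumed) against LZMA preset 4."""
    return [
        measure("zlib-6", lambda b: zlib.compress(b, 6), zlib.decompress, payload),
        measure("lzma-4", lambda b: lzma.compress(b, preset=4), lzma.decompress, payload),
    ]


if __name__ == "__main__":
    data = make_payload(1_000_000)
    for r in compare(data):
        print(f"{r['name']}: {r['size']} bytes, "
              f"write {r['write_s']:.4f} s, read {r['read_s']:.4f} s")
```

Running the same comparison on a real RAW file through the actual output module is the only way to check the 15% and seconds/event figures; this sketch only shows how such a comparison can be structured.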
Commit Summary
• add lzma/l4 for raw data
File Changes
• M Configuration/EventContent/python/EventContent_cff.py (2)
Patch Links:
• https://github.com/cms-sw/cmssw/pull/28109.patch
• https://github.com/cms-sw/cmssw/pull/28109.diff
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28109/12116
The tests are being triggered in jenkins.
A new Pull Request was created by @davidlange6 (David Lange) for master. It involves the following packages: Configuration/EventContent. @cmsbuild, @franzoni, @fabiocos, @kpedro88, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
Comparison job queued.
Comparison is ready.
+operations
The information about the compression algorithm used for the data tier is untracked, but it is effectively stored in the TFile through https://cmssdt.cern.ch/lxr/source/IOPool/Output/src/RootOutputFile.cc#0120
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)
+1