Use lzma/l4 for raw data #28109

davidlange6 · 2019-10-02T16:34:49Z

PR description:

Proposal to use LZMA level 4 to compress the RAW data tier. Tests on a double-muon RAW file from run 324970 (containing the highest-luminosity part of the run) show a 15% reduction in file size at the cost of an extra 0.25 seconds/event when writing (repacking is noticeably slower) and an extra 0.05 seconds/event when reading (i.e., about 5x more read-back overhead).

It will also be interesting to test zstandard once it is part of ROOT (6.20 or 6.22, it seems), but LZMA appears to be a much better trade-off between storage and write/read resources given our usual usage of RAW data.

Searches of GitHub suggest we have only rediscovered this change as a way to reduce RAW data size with minimal cost in CPU, so perhaps there is a good reason not to do it.
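The size-versus-time trade-off described above can be illustrated with a minimal standalone sketch. It uses only the Python standard library (`zlib` level 6 and `lzma` preset 4) on a synthetic payload, not ROOT or real RAW data, so the numbers it produces are not comparable to the figures quoted in this PR. The helper names `make_payload` and `measure` are hypothetical.

```python
import lzma
import random
import time
import zlib


def make_payload(n_bytes=200_000, seed=1):
    """Build a synthetic, mildly compressible byte string (NOT real RAW data)."""
    rng = random.Random(seed)
    symbols = b"\x00\x00\x00\x01\x02\x03\x10\xff"  # skewed toward zero bytes
    return bytes(rng.choice(symbols) for _ in range(n_bytes))


def measure(compress, decompress, payload):
    """Return compressed/original size ratio plus write and read wall times."""
    t0 = time.perf_counter()
    packed = compress(payload)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != payload:
        raise ValueError("round trip did not reproduce the payload")
    return {
        "ratio": len(packed) / len(payload),
        "write_s": t1 - t0,
        "read_s": t2 - t1,
    }


payload = make_payload()
results = {
    "zlib-6": measure(lambda b: zlib.compress(b, 6), zlib.decompress, payload),
    "lzma-4": measure(lambda b: lzma.compress(b, preset=4), lzma.decompress, payload),
}

for name, r in results.items():
    print(f"{name}: ratio={r['ratio']:.3f} "
          f"write={r['write_s']:.4f}s read={r['read_s']:.4f}s")
```

Running the same kind of comparison on a representative RAW file, with per-event timing, is what would actually support or refute the numbers in the description.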

davidlange6 · 2019-10-02T16:48:36Z

please test

Commit Summary
• add lzma/l4 for raw data

File Changes
• M Configuration/EventContent/python/EventContent_cff.py (2)

Patch Links:
• https://github.com/cms-sw/cmssw/pull/28109.patch
• https://github.com/cms-sw/cmssw/pull/28109.diff

cmsbuild · 2019-10-02T18:24:47Z

The code-checks are being triggered in jenkins.

cmsbuild · 2019-10-02T18:31:42Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28109/12116

This PR adds an extra 16KB to repository

cmsbuild · 2019-10-02T18:32:07Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/2762/console Started: 2019/10/02 20:32

cmsbuild · 2019-10-02T18:32:09Z

A new Pull Request was created by @davidlange6 (David Lange) for master.

It involves the following packages:

Configuration/EventContent

@cmsbuild, @franzoni, @fabiocos, @kpedro88, @davidlange6 can you please review it and eventually sign? Thanks.
@Martin-Grunewald this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

cmsbuild · 2019-10-02T21:25:42Z

+1
Tested at: e3607f0
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8f9ffa/2762/summary.html

cmsbuild · 2019-10-02T21:25:45Z

Comparison job queued.

cmsbuild · 2019-10-02T23:29:03Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8f9ffa/2762/summary.html

Comparison Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 34
DQMHistoTests: Total histograms compared: 2956833
DQMHistoTests: Total failures: 1
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 2956491
DQMHistoTests: Total skipped: 341
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
Checked 147 log files, 16 edm output root files, 34 DQM output files

fabiocos · 2019-10-09T08:35:26Z

+operations

the information about the compression algorithm used for the data tier is untracked, but effectively stored in the TFile through https://cmssdt.cern.ch/lxr/source/IOPool/Output/src/RootOutputFile.cc#0120
which activates
https://root.cern.ch/doc/v608/src_2TFile_8cxx_source.html#l02136
with the status word defined in the description record
https://root.cern.ch/doc/v608/src_2TFile_8cxx_source.html#l00063
So I understand that the update looks transparent to the input system.
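As a rough illustration of why a reader does not need to be told which algorithm was used: ROOT packs the algorithm and the level into a single integer setting (algorithm × 100 + level), and that integer is what gets recorded with the file. The sketch below reproduces only that arithmetic in plain Python; it does not call ROOT. The algorithm codes (1 for ZLIB, 2 for LZMA) are taken from ROOT's `ECompressionAlgorithm` enumeration as I understand it, and the function names are hypothetical.

```python
# Algorithm codes assumed from ROOT's ECompressionAlgorithm (kZLIB, kLZMA).
ZLIB = 1
LZMA = 2


def encode_settings(algorithm, level):
    """Pack algorithm and level the way ROOT's combined setting does."""
    if not 0 <= level <= 9:
        raise ValueError("compression level must be between 0 and 9")
    return algorithm * 100 + level


def decode_settings(settings):
    """Split a combined setting back into (algorithm, level)."""
    return divmod(settings, 100)


lzma_l4 = encode_settings(LZMA, 4)
print(lzma_l4)                   # combined setting for LZMA level 4
print(decode_settings(lzma_l4))  # back to (algorithm, level)
```

Because the combined setting travels with the file, the writer can switch from one algorithm to another without any matching configuration change on the reading side, which is consistent with the conclusion above.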

cmsbuild · 2019-10-09T08:35:54Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

fabiocos · 2019-10-09T08:38:50Z

+1

add lzma/l4 for raw data

e3607f0

cmsbuild added this to the CMSSW_11_0_X milestone Oct 2, 2019

cmsbuild added code-checks-pending comparison-pending operations-pending orp-pending pending-signatures tests-pending labels Oct 2, 2019

cmsbuild added code-checks-approved and removed code-checks-pending labels Oct 2, 2019

cmsbuild added tests-started and removed tests-pending labels Oct 2, 2019

cmsbuild added tests-approved and removed tests-started labels Oct 2, 2019

cmsbuild added comparison-available and removed comparison-pending labels Oct 2, 2019

cmsbuild added fully-signed operations-approved and removed operations-pending pending-signatures labels Oct 9, 2019

cmsbuild added orp-approved and removed orp-pending labels Oct 9, 2019

cmsbuild merged commit 12f2f04 into cms-sw:master Oct 9, 2019
