LZMA and ZSTD support for Streamer format #27288
1. Avoid copying the buffered content just to join it with the header. Instead, only the header is copied. This allows the buffer to be reduced to only 50k instead of the default 7 MB!
2. Reduce the ROOT TBuffer (TBufferFile object) size after serialization of the INI file.
3. LZMA compression option was added. To facilitate automatic detection in the reader, a 4-byte header is present at the beginning of the LZMA content (this does not conflict with the zlib header, which starts with a different sequence). Compression/decompression functions are based on the ROOT implementation, but calculation of the CRC32/64 checksum is turned off at this time. Error handling is also improved and the buffer size limitation is removed.
4. Similar to LZMA, a Zstandard (ZSTD) option was added, including an autodetection header. Depends on ZSTD in cmsdist, which was (just) added to 11_0_X IB builds.
Compression_algorithm parameter was added to the output module, using same options as PoolOutput modul
…uggested by scram b code-checks-all
@smorovic, CMSSW_11_0_X branch is closed for direct updates. cms-bot is going to move this PR to master branch.
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27288/10497
A new Pull Request was created by @smorovic (Srecko Morovic) for master. It involves the following packages: EventFilter/Utilities @perrotta, @smuzaffar, @Dr15Jones, @emeschi, @cmsbuild, @slava77, @mommsen can you please review it and eventually sign? Thanks. cms-bot commands are listed here
@cmsbuild please test
The tests are being triggered in jenkins.
Would be nice to see tests running the different compression options.
@@ -18,77 +18,100 @@
 #include "DataFormats/Provenance/interface/SelectedProducts.h"
 #include "FWCore/Utilities/interface/get_underlying_safe.h"

-const int init_size = 1024 * 1024;
+const int init_size = 0; //will be allocated on first event
constexpr would be better
@@ -18,77 +18,100 @@
 #include "DataFormats/Provenance/interface/SelectedProducts.h"
 #include "FWCore/Utilities/interface/get_underlying_safe.h"

-const int init_size = 1024 * 1024;
+const int init_size = 0; //will be allocated on first event
+const unsigned int reserve_size = 50000;
constexpr
Better to add both inside struct SerializeDataBuffer to avoid having these names be in the global namespace.
Thanks for comments. I moved them inside this struct.
Comparison job queued.
Comparison is ready Comparison Summary:
Pull request #27288 was updated. @perrotta, @smuzaffar, @Dr15Jones, @emeschi, @cmsbuild, @slava77, @mommsen can you please check and sign again.
@cmsbuild please test
The tests are being triggered in jenkins.
Comparison job queued.
Comparison is ready Comparison Summary:
+1
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
PR adds new compression formats and also optimizes buffer usage in Streamer event writing.
Output buffering changes:
Output buffering was changed to avoid an extra copy of the payload. Instead, the small header is copied to the front of the payload after compression. The initial default buffer of 7 MB is not needed; the buffer can be sized dynamically (usually to much less). The new initial size is 50 kB, reserved for the event header.
The ROOT TBuffer (TBufferFile object) is reduced in size after serialization of the INI file. Because INI payloads in HLT are typically large, this significantly reduces the memory usage of the process during event processing.
In testing with a realistic 2018 HLT payload and data in a 4-thread HLT (pp Physics) CMSSW job, these two changes reduced the total memory footprint from 2.7 GB to 2 GB (per process).
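The idea of reserving room in front of the payload, so that only the header has to be copied, can be sketched as follows. This is a minimal illustration under stated assumptions: the class `FrontReservedBuffer` and all of its member names are hypothetical and are not the `SerializeDataBuffer` code touched by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical buffer (illustrative names only): the first `reserve` bytes
// are kept free so a header can later be written directly in front of the
// payload, without moving the payload itself.
class FrontReservedBuffer {
public:
  explicit FrontReservedBuffer(std::size_t reserve) : reserve_(reserve), data_(reserve) {}

  // Append payload bytes after the reserved region; storage grows on demand.
  void appendPayload(const unsigned char* src, std::size_t n) { data_.insert(data_.end(), src, src + n); }

  // Copy the header so that it ends exactly where the payload begins.
  // Returns a pointer to the start of the contiguous header + payload message.
  const unsigned char* prependHeader(const unsigned char* hdr, std::size_t n) {
    if (n > reserve_)
      throw std::length_error("header larger than reserved space");
    headerSize_ = n;
    unsigned char* start = data_.data() + (reserve_ - n);
    std::memcpy(start, hdr, n);
    return start;
  }

  // Size of header + payload, excluding unused reserved bytes.
  std::size_t messageSize() const { return headerSize_ + (data_.size() - reserve_); }

  // Drop the payload but keep the reserved region for the next event.
  void clear() {
    data_.resize(reserve_);
    headerSize_ = 0;
  }

private:
  std::size_t reserve_;
  std::size_t headerSize_ = 0;
  std::vector<unsigned char> data_;
};
```

With this layout only the few header bytes are copied per event, while the (much larger) payload stays where it was written.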
Compression changes:
The LZMA compression algorithm was added. To facilitate automatic detection in the reader, a 4-byte header is present at the beginning of the LZMA content (this does not conflict with the zlib header, which starts with a different sequence). The compression/decompression functions are based on the ROOT implementation, but calculation of the CRC32/64 checksum is turned off (adler32 is already used to checksum the whole payload). Error handling is also improved, and the 16 MB buffer size limitation of the ROOT routine is removed.
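For readers unfamiliar with the adler32 checksum mentioned above, here is a reference sketch of the algorithm as defined in the zlib format specification (RFC 1950). It is an unoptimized illustration; the function name `adler32Reference` is hypothetical and this is not the routine used in CMSSW.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unoptimized Adler-32: two running sums modulo 65521, combined as (b << 16) | a.
std::uint32_t adler32Reference(const unsigned char* data, std::size_t len) {
  constexpr std::uint32_t kMod = 65521;  // largest prime below 2^16
  std::uint32_t a = 1;                   // sum of bytes, starting from 1
  std::uint32_t b = 0;                   // sum of the successive values of a
  for (std::size_t i = 0; i < len; ++i) {
    a = (a + data[i]) % kMod;
    b = (b + a) % kMod;
  }
  return (b << 16) | a;
}
```

Because a checksum of this kind already covers the whole payload, an additional CRC inside the compressed stream would be redundant work.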
Zstandard compression was added. Its compression factor is comparable to zlib, but it is faster at lower compression levels, making it potentially interesting for HLT in situations where CPU usage is tight. A similar 4-byte header ("ZS") is added to the payload.
A "compression_algorithm" parameter was added to the output module to select the algorithm.
Decompression for both new algorithms with format autodetection is added to the StreamerInputSource.
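Format autodetection of this kind can be sketched as a check on the first bytes of the payload. The sketch below makes several assumptions: the "ZS" tag is taken from the description above, the "XZ" tag for LZMA is an illustrative guess (the exact bytes are not stated here), and the enum and function names are hypothetical rather than the StreamerInputSource code. The zlib test follows RFC 1950 (compression method 8, and the first two bytes forming a multiple of 31).

```cpp
#include <cassert>
#include <cstddef>

enum class PayloadFormat { Zlib, Lzma, Zstd, Unknown };

// Hypothetical detector: inspects the leading bytes of a compressed payload.
PayloadFormat detectFormat(const unsigned char* buf, std::size_t len) {
  // Assumed 4-byte tag starting with "XZ" for LZMA (illustrative guess).
  if (len >= 4 && buf[0] == 'X' && buf[1] == 'Z')
    return PayloadFormat::Lzma;
  // 4-byte tag starting with "ZS" for Zstandard, as named in the PR description.
  if (len >= 4 && buf[0] == 'Z' && buf[1] == 'S')
    return PayloadFormat::Zstd;
  // zlib stream header (RFC 1950): low nibble of the first byte is 8 (deflate),
  // and the first two bytes read as a big-endian number are divisible by 31.
  if (len >= 2 && (buf[0] & 0x0F) == 8 && ((buf[0] << 8) | buf[1]) % 31 == 0)
    return PayloadFormat::Zlib;
  return PayloadFormat::Unknown;
}
```

A reader would call such a function once per payload and dispatch to the matching decompression routine, falling back to an error for `Unknown`.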
Several "scram b code-checks-all" changes to the Streamer and Utilities modules are also included in this PR.
PR validation:
Buffering and compression changes were verified by producing output files with the different algorithms (DAQ and Streamer output module) and reading them back with the Streamer input source to check their integrity.