Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LZMA and ZSTD support for Streamer format #27288

Merged
merged 7 commits into from Jun 26, 2019

Conversation

smorovic
Copy link
Contributor

PR description:

PR adds new compression formats and also optimizes buffer usage in Streamer event writing.

Output buffering changes:

  • Output buffering changed to reduce extra copy of the payload. Instead, small header is copied to the front of the payload after compression. Initial default of 7 MB is not needed and can be sized dynamically (usually to much less). New initial size is 50 kb to reserve for event header.

  • Reduced ROOT TBuffer (TBufferFile object) size after serialization of the INI file, which, due to typically large INI payloads in HLT, results in significant reduction in memory usage of the process during event processing.

  • In testing with the realistic 2018 HLT payload and data with the 4-thread HLT (pp Physics) CMSSW job, these two changes reduced total memory footprint from 2.7 GB to 2 GB (per process).

Compression changes:

  • LZMA compression algorithm added. To facilitate automatic detection in reader, a
    4-byte header is present at the beginning of LZMA content (this does not
    conflict with zlib header which starts with a different sequence). Compression
    /decompression functions are based on ROOT implementation, but we turn off
    calculation of CRC32/64 checksum (adler32 is already used to checksum whole payload). Error handling is also improved and 16 MB buffer size limitation in ROOT routine is removed.

  • Zstandard compression is added. Compression factor is comparable to zlib, but is performing faster at lower compression levels making it potentially interesting for HLT in situations where CPU usage is tight. Similar 4-byte header ("ZS") is added to the payload.

  • "compression_algorithm" parameter was added to the output module allowing to choose between algorithms.

  • Decompression for both new algorithms with format autodetection is added to the StreamerInputSource.

  • several "scram b code-checks-all" changes to Streamer and Utililities modules has also been included with the PR.

PR validation:

Buffering and compression changes have been verified to produce correct files by producing output files with different algorithms (DAQ and Streamer output module) and reading them with Streamer input source to verify integrity.

1. avoid copy of the buffered content just to join it with the header.
Instead only copy header. This allows to reduce buffer to only 50k instead
of default 7 MB!

2. reduce ROOT TBuffer (TBufferFile object) size after serialization of
the INI file.

3. LZMA compression option was added. To facilitate automatic detection
in reader, a 4-byte header is present at the beginning of LZMA content (this does
not conflict with zlib header which starts with a different sequence).
Compression/decompression functions are based on ROOT implementation,
but we turn off calculation CRC32/64 checksum at this time. Error
handling is also improved and buffer size limitation is removed.

4. Similar to LZMA, Zstandard (ZSTD) option was added including
autodetection header. Depends on ZSTD in cmsdist, which was (just) added to
11_0_X IB builds.

Compression_algorithm parameter was added to the output module, using
same options as PoolOutput modul
@cmsbuild cmsbuild changed the base branch from CMSSW_11_0_X to master June 20, 2019 12:36
@cmsbuild
Copy link
Contributor

@smorovic, CMSSW_11_0_X branch is closed for direct updates. cms-bot is going to move this PR to master branch.
In future, please use cmssw master branch to submit your changes.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27288/10497

  • This PR adds an extra 44KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @smorovic (Srecko Morovic) for master.

It involves the following packages:

EventFilter/Utilities
IOPool/Streamer

@perrotta, @smuzaffar, @Dr15Jones, @emeschi, @cmsbuild, @slava77, @mommsen can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @makortel, @wddgit this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@smorovic smorovic marked this pull request as ready for review June 20, 2019 14:22
@slava77
Copy link
Contributor

slava77 commented Jun 20, 2019

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 20, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/1081/console Started: 2019/06/20 17:04

Copy link
Contributor

@Dr15Jones Dr15Jones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to see tests running the different compression options.

@@ -18,77 +18,100 @@
#include "DataFormats/Provenance/interface/SelectedProducts.h"
#include "FWCore/Utilities/interface/get_underlying_safe.h"

const int init_size = 1024 * 1024;
const int init_size = 0; //will be allocated on first event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constexpr would be better

@@ -18,77 +18,100 @@
#include "DataFormats/Provenance/interface/SelectedProducts.h"
#include "FWCore/Utilities/interface/get_underlying_safe.h"

const int init_size = 1024 * 1024;
const int init_size = 0; //will be allocated on first event
const unsigned int reserve_size = 50000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constexpr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add both inside struct SerializeDataBuffer to avoid having these names be in the global namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for comments. I moved them inside this struct.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-635200/1081/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3254096
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3253761
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@cmsbuild
Copy link
Contributor

Pull request #27288 was updated. @perrotta, @smuzaffar, @Dr15Jones, @emeschi, @cmsbuild, @slava77, @mommsen can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Jun 24, 2019

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 24, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/1127/console Started: 2019/06/24 10:02

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-635200/1127/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3254598
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3254262
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@slava77
Copy link
Contributor

slava77 commented Jun 24, 2019

+1

for #27288 6b04fa1

  • reco code is essentially not affected; the reco signature requirement was triggered via a shared package EventFilter/Utilities.
  • jenkins tests pass and comparisons show no differences

@mommsen
Copy link
Contributor

mommsen commented Jun 25, 2019

+1

@Dr15Jones
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit cce8a24 into cms-sw:master Jun 26, 2019
@smorovic smorovic deleted the zstd-CMSSW_11_0_X-merged branch October 22, 2019 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants