
Use RAWEventContent parameter set for Repack output module #37791

Merged (1 commit, May 9, 2022)

Conversation

@germanfgv (Contributor)

PR description:

Repack workflows are currently using a default compression configuration with ZLIB instead of LZMA, as was initially identified here. To solve this, the Repack configuration will start using the already defined RAWEventContent output module configuration.
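
In configuration terms, the fix amounts to taking the output module's compression parameters from the shared RAWEventContent definition rather than relying on framework defaults. A minimal sketch (the module name, file name, and output commands are illustrative, not the actual Repack.py code; it assumes RAWEventContent carries `compressionAlgorithm`/`compressionLevel` as in Configuration/EventContent):

```python
import FWCore.ParameterSet.Config as cms
from Configuration.EventContent.EventContent_cff import RAWEventContent

# Illustrative output module: pulling the compression settings from the
# shared RAWEventContent PSet instead of hard-coded defaults.
outputModule = cms.OutputModule(
    "PoolOutputModule",
    compressionAlgorithm=RAWEventContent.compressionAlgorithm,  # "LZMA"
    compressionLevel=RAWEventContent.compressionLevel,          # 4
    outputCommands=cms.untracked.vstring("keep FEDRawDataCollection_*_*_*"),
    fileName=cms.untracked.string("write_PrimaryDataset_RAW.root"),
)
```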

PR validation:

Used RunRepack.py to generate a test configuration. The resulting PSet had the required output module attributes, and the repack job was executed correctly using cmsRun.

@cmsbuild (Contributor) commented May 3, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37791/29698

  • This PR adds an extra 12KB to the repository

@cmsbuild (Contributor) commented May 3, 2022

A new Pull Request was created by @germanfgv (Germán Felipe Giraldo Villa) for master.

It involves the following packages:

  • Configuration/DataProcessing (operations)

@cmsbuild, @perrotta, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @missirol, @fabiocos this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@germanfgv (Contributor, Author)

@cmsbuild please test

@germanfgv (Contributor, Author)

@davidlange6 @qliphy As I should have expected, I'm not authorized to issue the test command. Could you please clarify what the next step should be?

@qliphy (Contributor) commented May 5, 2022

allow @germanfgv test rights

@qliphy (Contributor) commented May 5, 2022

please test

@cmsbuild (Contributor) commented May 5, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-54c14a/24470/summary.html
COMMIT: 2258980
CMSSW: CMSSW_12_4_X_2022-05-04-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37791/24470/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3700704
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3700680
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 206 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@davidlange6 (Contributor) commented May 5, 2022 via email

@germanfgv (Contributor, Author) commented May 6, 2022

@davidlange6 That's a good question. It is difficult to run a large-scale test of this without a release ready for a replay, but I'll prepare a single job with and without these changes and report their performance here.

@davidlange6 (Contributor) commented May 6, 2022 via email

@germanfgv (Contributor, Author)

With the previous default configuration, the job took 397 seconds. This is the time report:

TimeReport> Time report complete in 397.112 seconds
 Time Summary: 
 - Min event:   0.000417233
 - Max event:   15.5655
 - Avg event:   0.0261389
 - Total loop:  391.215
 - Total init:  5.89693
 - Total job:   397.112
 - EventSetup Lock: 0
 - EventSetup Get:  0
 Event Throughput: 29.2065 ev/s
 CPU Summary: 
 - Total loop:     315.266
 - Total init:     2.48223
 - Total extra:    0
 - Total children: 0.148745
 - Total job:      317.748
 Processing Summary: 
 - Number of Events:  11426
 - Number of Global Begin Lumi Calls:  7
 - Number of Global Begin Run Calls: 1

These are the files it produced:

-rw-r--r--. 1 ggiraldo zh 725M May  6 00:42 write_ZeroBias_RAW.root
-rw-r--r--. 1 ggiraldo zh  73M May  6 00:42 write_NoBPTX_RAW.root
-rw-r--r--. 1 ggiraldo zh 323M May  6 00:42 write_MinimumBias_RAW.root
-rw-r--r--. 1 ggiraldo zh 162M May  6 00:42 write_HcalNZS_RAW.root
-rw-r--r--. 1 ggiraldo zh  74M May  6 00:42 write_HLTPhysics_RAW.root

With the new configuration, the job took much longer, 2451 seconds. This is the time report:

TimeReport> Time report complete in 2451.4 seconds
 Time Summary: 
 - Min event:   0.00041604
 - Max event:   25.8817
 - Avg event:   0.204825
 - Total loop:  2444.33
 - Total init:  7.07313
 - Total job:   2451.4
 - EventSetup Lock: 0
 - EventSetup Get:  0
 Event Throughput: 4.67449 ev/s
 CPU Summary: 
 - Total loop:     2382.18
 - Total init:     3.06658
 - Total extra:    0
 - Total children: 0.168931
 - Total job:      2385.25
 Processing Summary: 
 - Number of Events:  11426
 - Number of Global Begin Lumi Calls:  7
 - Number of Global Begin Run Calls: 1

and produced the following (smaller) files:

-rw-r--r--. 1 ggiraldo zh 562M May  6 09:44 write_ZeroBias_RAW.root
-rw-r--r--. 1 ggiraldo zh  57M May  6 09:44 write_NoBPTX_RAW.root
-rw-r--r--. 1 ggiraldo zh 256M May  6 09:44 write_MinimumBias_RAW.root
-rw-r--r--. 1 ggiraldo zh 121M May  6 09:44 write_HcalNZS_RAW.root
-rw-r--r--. 1 ggiraldo zh  58M May  6 09:44 write_HLTPhysics_RAW.root
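
For reference, the two listings above work out as follows (quick arithmetic on the reported sizes and wall-clock times; sizes are the MB values printed by ls, so the ratios are approximate):

```python
# Per-dataset output sizes in MB, as reported by ls in the two runs above.
old_mb = {"ZeroBias": 725, "NoBPTX": 73, "MinimumBias": 323, "HcalNZS": 162, "HLTPhysics": 74}
new_mb = {"ZeroBias": 562, "NoBPTX": 57, "MinimumBias": 256, "HcalNZS": 121, "HLTPhysics": 58}

old_total = sum(old_mb.values())  # 1357 MB
new_total = sum(new_mb.values())  # 1054 MB
saving = 1 - new_total / old_total
slowdown = 2451.4 / 397.112  # wall-clock ratio from the two time reports

print(f"total output: {old_total} MB -> {new_total} MB ({saving:.1%} smaller)")
print(f"wall clock: ~{slowdown:.1f}x longer")
```

So the new configuration writes roughly 22% less data at the cost of about a 6x longer repack job.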

@davidlange6 As you can see, this is indeed worrisome. Can that difference be explained simply by the change from ZLIB to LZMA? Or is there maybe something else in RAWEventContent that's increasing the processing time this much?
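
The general CPU/size tradeoff between a deflate-family codec and LZMA can be reproduced with Python's standard library alone (a rough illustration on synthetic data, not a ROOT or cmsRun measurement; the payload below is seeded pseudo-data standing in for a RAW stream):

```python
import lzma
import random
import time
import zlib

# ~4 MB of compressible pseudo-data (16-symbol alphabet, seeded for
# reproducibility); a stand-in for a RAW payload, not real CMS data.
random.seed(42)
data = bytes(random.choices(b"0123456789ABCDEF", k=4_000_000))

t0 = time.perf_counter()
zl_out = zlib.compress(data, 4)
t_zlib = time.perf_counter() - t0

t0 = time.perf_counter()
lz_out = lzma.compress(data, preset=4)
t_lzma = time.perf_counter() - t0

print(f"zlib level 4:  {len(zl_out)/1e6:.2f} MB in {t_zlib:.2f} s")
print(f"lzma preset 4: {len(lz_out)/1e6:.2f} MB in {t_lzma:.2f} s")
```

On typical data LZMA produces smaller output at a substantially higher CPU cost than zlib at a comparable level, which is consistent with the direction of the numbers above.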

@germanfgv (Contributor, Author)

@drkovalskyi what do you think?

@drkovalskyi (Contributor)

@germanfgv I don't think we use ZLIB by default. Could you please check the exact algorithm and level of compression for files with and without the fix?

@germanfgv (Contributor, Author)

Checking the files created with this fix, I get:

  • Compression Algorithm: LZMA (as expected)
  • Compression Level: 4 (as expected)

But for files created without the fix, I get

  • Compression Algorithm: ZSTD (instead of the expected ZLIB)
  • Compression Level: 4

So we are not moving from ZLIB to LZMA, but from ZSTD to LZMA.
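
One way to verify this without ROOT is to read the compression word directly from the TFile header, where ROOT stores it as 100 * algorithm + level (algorithm 1 = ZLIB, 2 = LZMA, 4 = LZ4, 5 = ZSTD). A sketch, assuming the standard TFile on-disk header layout; `root_compression` is a hypothetical helper, not a ROOT API:

```python
import struct

ALGO = {0: "default", 1: "ZLIB", 2: "LZMA", 3: "old", 4: "LZ4", 5: "ZSTD"}

def decode_fcompress(fcompress):
    """ROOT encodes compression settings as 100 * algorithm + level."""
    return ALGO.get(fcompress // 100, "unknown"), fcompress % 100

def root_compression(path):
    """Read fCompress from a TFile header (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(64)
    if header[:4] != b"root":
        raise ValueError("not a ROOT file")
    (version,) = struct.unpack(">i", header[4:8])
    # Large files (version >= 1000000) use 8-byte fEND/fSeekFree pointers,
    # shifting fCompress from offset 33 to 41.
    offset = 33 if version < 1000000 else 41
    (fcompress,) = struct.unpack(">i", header[offset:offset + 4])
    return decode_fcompress(fcompress)

print(decode_fcompress(204))  # -> ('LZMA', 4)
print(decode_fcompress(504))  # -> ('ZSTD', 4)
```

With ROOT available, `TFile::GetCompressionAlgorithm()` and `TFile::GetCompressionLevel()` report the same information.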

@drkovalskyi (Contributor)

Thanks German. ZSTD is an interesting algorithm. @bbockelm looked at it: https://indico.cern.ch/event/695984/contributions/2872933/attachments/1590457/2516802/ZSTD_and_ZLIB_Updates_-_January_20186.pdf
Maybe Brian can comment on this result.
Anyway, I think we need to add an option to override the compression in Tier0, not just set it via the EventContent. This will allow us to run our own benchmarks.

@davidlange6 (Contributor) commented May 6, 2022 via email

@drkovalskyi (Contributor)

@davidlange6 not sure what's not "simple". Could you please elaborate?

@davidlange6 (Contributor) commented May 9, 2022 via email

@drkovalskyi (Contributor)

Thanks for the clarification. Indeed, disabling fast cloning would explain the need for more CPU. One way or the other, we want to use LZMA for RAW output, so I see no issue with this modification as is. Whether it will be fast cloning or a full recompression depends on the compression of the incoming data, and that is a separate topic that we will discuss with DAQ. For now we need to ensure that RAW data gets proper compression before we write it to tape.

@drkovalskyi (Contributor)

@davidlange6 I checked with DAQ. They say that we will have to recompress the data regardless of its compression. I don't understand how streamer files are organized, but the claim is basically that recompression cannot be avoided. So what did you mean by "fast clone"?

@drkovalskyi (Contributor)

@qliphy, @perrotta we need this fix in production ASAP. We are writing larger files than we should, wasting space. Could you please integrate the change?

@qliphy (Contributor) commented May 9, 2022

urgent

@drkovalskyi I am a bit lost: should we disable fastCloning here?
We can also discuss this at tomorrow's ORP.

cmsbuild added the urgent label May 9, 2022
@drkovalskyi (Contributor)

We should not disable fast cloning. It simply shouldn't kick in by default here, since it shouldn't be possible.

@perrotta (Contributor) commented May 9, 2022

Yes @drkovalskyi, I agree with @qliphy that some summary of the issues reported for this PR, and how we are expected to cope with them, is needed, either here or tomorrow at the ORP meeting. You started from noticing that the increase in CPU time "is indeed worrisome" and ended with a request to merge this "asap". From this thread it is not clear (at least to me) whether those worries can be considered addressed or not...

@davidlange6 (Contributor) commented May 9, 2022 via email

@perrotta (Contributor) commented May 9, 2022

"The increase in CPU is as expected. It's small compared to reco and saves 10+% of tape."

Thank you @davidlange6 for the summary, which was not obvious from the whole thread above.

@perrotta (Contributor) commented May 9, 2022

+1

@cmsbuild (Contributor) commented May 9, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

@davidlange6 (Contributor) commented Oct 11, 2022 via email

@davidlange6 (Contributor) commented Oct 11, 2022 via email
