Use RAWEventContent parameter set for Repack output module #37791
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37791/29698
|
A new Pull Request was created by @germanfgv (Germán Felipe Giraldo Villa) for master. It involves the following packages:
@cmsbuild, @perrotta, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@cmsbuild please test |
@davidlange6 @qliphy As I should have expected, I'm not authorized to issue the test command. Could you clarify what the next step should be? |
allow @germanfgv test rights |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-54c14a/24470/summary.html Comparison Summary:
|
Great to make this change - but I'm curious by what factor this slows down the t0 repack process?
|
@davidlange6 That's a good question. It is difficult to have a large-scale test of this without a release ready for a replay, but I'll prepare a single job with and without these changes and report its performance here. |
Thanks - I'd presume 500 events would be more than sufficient to know.
|
With the previous default configuration, the job took 397 seconds. This is the time report:
TimeReport> Time report complete in 397.112 seconds
Time Summary:
- Min event: 0.000417233
- Max event: 15.5655
- Avg event: 0.0261389
- Total loop: 391.215
- Total init: 5.89693
- Total job: 397.112
- EventSetup Lock: 0
- EventSetup Get: 0
Event Throughput: 29.2065 ev/s
CPU Summary:
- Total loop: 315.266
- Total init: 2.48223
- Total extra: 0
- Total children: 0.148745
- Total job: 317.748
Processing Summary:
- Number of Events: 11426
- Number of Global Begin Lumi Calls: 7
- Number of Global Begin Run Calls: 1
These are the files it produced:
-rw-r--r--. 1 ggiraldo zh 725M May 6 00:42 write_ZeroBias_RAW.root
-rw-r--r--. 1 ggiraldo zh 73M May 6 00:42 write_NoBPTX_RAW.root
-rw-r--r--. 1 ggiraldo zh 323M May 6 00:42 write_MinimumBias_RAW.root
-rw-r--r--. 1 ggiraldo zh 162M May 6 00:42 write_HcalNZS_RAW.root
-rw-r--r--. 1 ggiraldo zh 74M May 6 00:42 write_HLTPhysics_RAW.root
With the new configuration, the job took much longer: 2451 seconds. This is the time report:
TimeReport> Time report complete in 2451.4 seconds
Time Summary:
- Min event: 0.00041604
- Max event: 25.8817
- Avg event: 0.204825
- Total loop: 2444.33
- Total init: 7.07313
- Total job: 2451.4
- EventSetup Lock: 0
- EventSetup Get: 0
Event Throughput: 4.67449 ev/s
CPU Summary:
- Total loop: 2382.18
- Total init: 3.06658
- Total extra: 0
- Total children: 0.168931
- Total job: 2385.25
Processing Summary:
- Number of Events: 11426
- Number of Global Begin Lumi Calls: 7
- Number of Global Begin Run Calls: 1
and produced the following (smaller) files:
-rw-r--r--. 1 ggiraldo zh 562M May 6 09:44 write_ZeroBias_RAW.root
-rw-r--r--. 1 ggiraldo zh 57M May 6 09:44 write_NoBPTX_RAW.root
-rw-r--r--. 1 ggiraldo zh 256M May 6 09:44 write_MinimumBias_RAW.root
-rw-r--r--. 1 ggiraldo zh 121M May 6 09:44 write_HcalNZS_RAW.root
-rw-r--r--. 1 ggiraldo zh 58M May 6 09:44 write_HLTPhysics_RAW.root
@davidlange6 As you can see, this is indeed worrisome. Can that difference be explained simply by the change from ZLIB to LZMA? Or is there maybe something else in RAWEventContent that's increasing the processing time this much? |
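The reported slowdown is consistent with the general zlib-vs-LZMA trade-off: LZMA spends far more CPU per byte but yields smaller output. A standalone illustration using only Python's stdlib codecs (synthetic data, not CMSSW I/O; the sizes and timings are only indicative):

```python
import time
import zlib
import lzma

# Synthetic, compressible payload standing in for a RAW data block (~1.8 MB).
data = b"".join(bytes([i % 251]) * (i % 17 + 1) for i in range(200_000))

def bench(name, compress, decompress):
    start = time.perf_counter()
    blob = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(blob) == data  # round-trip sanity check
    print(f"{name}: {len(data)} -> {len(blob)} bytes in {elapsed:.3f} s")
    return len(blob)

zlib_size = bench("zlib level 6 ", lambda d: zlib.compress(d, 6), zlib.decompress)
lzma_size = bench("lzma preset 4", lambda d: lzma.compress(d, preset=4), lzma.decompress)
# LZMA takes noticeably longer but usually produces the smaller output
# on structured data, mirroring the smaller RAW files seen in the job above.
```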
@drkovalskyi what do you think? |
@germanfgv I don't think we use ZLIB by default. Could you please check the exact algorithm and level of compression for files with and without the fix? |
Checking the files created with this fix, I get:
But for files created without the fix, I get:
So we are not moving from ZLIB to LZMA, but from ZSTD to LZMA. |
Thanks German. ZSTD is an interesting algorithm. @bbockelm looked at it: https://indico.cern.ch/event/695984/contributions/2872933/attachments/1590457/2516802/ZSTD_and_ZLIB_Updates_-_January_20186.pdf |
This is not as simple as a ZSTD vs LZMA speed comparison. To isolate that comparison, use ZSTD explicitly (with a different compression level, e.g.).
On May 6, 2022 5:34 PM, drkovalskyi wrote:
Maybe Brian can comment on this result.
Anyway, I think we need to add an option to override the compression in Tier0, not just set it via the EventContent. This will allow us to make our benchmarks.
|
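For context on the override idea: compressionAlgorithm and compressionLevel are ordinary PoolOutputModule parameters, so a Tier0-side override of the kind suggested could look like the following fragment. This is a sketch only; the module label, file name, and values are illustrative, not this PR's actual change, and it assumes a `process` object is already defined.

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical override on a repack output module; these parameters would
# shadow whatever the RAWEventContent-based defaults set.
process.write_ZeroBias_RAW = cms.OutputModule("PoolOutputModule",
    outputCommands = process.RAWEventContent.outputCommands,
    fileName = cms.untracked.string("write_ZeroBias_RAW.root"),
    compressionAlgorithm = cms.untracked.string("LZMA"),
    compressionLevel = cms.untracked.int32(4),
)
```

Exposing these two parameters as a Tier0 configuration knob would allow benchmarking different algorithms and levels without touching the EventContent definitions.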
@davidlange6 not sure what's not "simple". Could you please elaborate? |
I did already... fast clone vs not fast clone... Try the test I suggested to understand to what extent this makes a big difference.
|
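The fast-clone distinction is easy to mimic outside ROOT: when input and output compression settings match, the compressed payload can be copied as-is, while changing the output algorithm forces a full decompress/recompress cycle. A stdlib-only sketch (illustrative; real ROOT fast cloning copies compressed baskets between files, not whole buffers):

```python
import time
import zlib
import lzma

raw = bytes(range(256)) * 8000      # ~2 MB stand-in for event data
stored = zlib.compress(raw, 6)      # the "input file" payload, zlib-compressed

# Fast clone: output settings match the input, so copy the bytes verbatim.
start = time.perf_counter()
cloned = bytes(stored)              # no codec work at all
t_clone = time.perf_counter() - start

# No fast clone: the output wants LZMA, so decode and re-encode everything.
start = time.perf_counter()
recompressed = lzma.compress(zlib.decompress(stored), preset=4)
t_recompress = time.perf_counter() - start

assert cloned == stored
assert lzma.decompress(recompressed) == raw
print(f"clone: {t_clone:.6f} s, recompress: {t_recompress:.6f} s")
```

The copy is essentially free, while the recompression pays the full LZMA cost, which is why losing fast cloning dominates the CPU increase rather than the codec choice alone.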
Thanks for the clarification. Indeed, disabling fast cloning would explain the need for more CPU. One way or the other, we want to use LZMA for RAW output, so I see no issue with this modification as is. Whether it will be fast cloning or a full recompression depends on the compression of the incoming data; that is a separate topic that we will discuss with DAQ. For now we need to ensure that RAW data gets proper compression before we write it to tape. |
@davidlange6 I checked with DAQ. They say that we will have to recompress data regardless of their compression. I don't understand how streamer files are organized, but basically the claim is that it's not possible to avoid recompression. So what did you mean by "fast clone"? |
urgent @drkovalskyi I am a bit lost, but should we disable fastCloning here? |
We should not disable fast cloning. It simply shouldn't work by default since it shouldn't be possible. |
Yes @drkovalskyi, I agree with @qliphy that some summary of the issues reported for this PR, and of how we are expected to cope with them, is needed, either here or tomorrow at the ORP meeting. You started by noticing that the increase of CPU time "is indeed worrisome" and ended with a request to merge this "asap". From this thread it is not clear (at least to me) whether those worries can be considered addressed or not... |
The increase in CPU is as expected. It's small compared to reco and saves 10+% of tape.
|
Thank you @davidlange6 for the summary, which was not obvious from the thread above. |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged. |
My understanding is that the repack config uses whatever compression is used to make the input file.
|
Recall that in addition to LZMA being slower, the Tier0 is currently fast cloning. 5 Hz is still small compared to reco.
|
PR description:
Repack workflows are currently using a default compression configuration with ZLIB instead of LZMA, as was initially identified here. To solve this, the Repack configuration will start using the already defined RAWEventContent output module configuration.
PR validation:
Used RunRepack.py to generate a test configuration. The resulting PSet had the required output module attributes, and the repack job was executed correctly using cmsRun.
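A lightweight, automated version of the validation check above is to scan the dumped configuration for the compression attributes. This is a sketch: the dump excerpt below is fabricated for illustration, but the attribute names match PoolOutputModule's real parameters.

```python
import re

# Hypothetical excerpt of a dumped repack configuration (e.g. from
# process.dumpPython()); in a real check, read the dump from a file instead.
dump = """
process.write_ZeroBias_RAW = cms.OutputModule("PoolOutputModule",
    compressionAlgorithm = cms.untracked.string('LZMA'),
    compressionLevel = cms.untracked.int32(4),
)
"""

def compression_settings(config_text):
    """Return (algorithm, level) pairs declared on output modules in the dump."""
    algos = re.findall(
        r"compressionAlgorithm\s*=\s*cms\.untracked\.string\('(\w+)'\)", config_text)
    levels = re.findall(
        r"compressionLevel\s*=\s*cms\.untracked\.int32\((\d+)\)", config_text)
    return list(zip(algos, (int(level) for level in levels)))

print(compression_settings(dump))  # [('LZMA', 4)]
```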