[RFC] Use ROOT lossy compression for P3 and position of reco::Track #39554
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39554/32342
A new Pull Request was created by @Dr15Jones (Chris Jones) for master. It involves the following packages:
@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
-1
Failed Tests: Build
Build
I found compilation error when building:
>> Building LCG reflex dict from header file src/DataFormats/GsfTrackReco/src/classes.h
>> Compiling LCG dictionary: tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/a/DataFormatsGsfTrackReco_xr.cc
>> Building shared library tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/libDataFormatsGsfTrackReco.so
Copying tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/libDataFormatsGsfTrackReco.so to productstore area:
>> Checking EDM Class Version for src/DataFormats/GsfTrackReco/src/classes_def.xml in libDataFormatsGsfTrackReco.so
error: class 'reco::GsfTrack' has a different checksum for ClassVersion 20. Increment ClassVersion to 21 and assign it to checksum 1617233394
Suggestion: You can run 'scram build updateclassversion' to generate src/DataFormats/GsfTrackReco/src/classes_def.xml.generated with updated ClassVersion
gmake: *** [tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/libDataFormatsGsfTrackReco.so] Error 1
Leaving library rule at DataFormats/GsfTrackReco
>> Leaving Package DataFormats/GsfTrackReco
>> Package DataFormats/GsfTrackReco built
The branch was updated from 0f99936 to 1cc2bbf.
please test
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39554/32344
Pull request #39554 was updated. @mandrenguyen, @clacaputo can you please check and sign again.
please test
@Dr15Jones We see a small residual asymmetry when comparing miniAOD track pt (packed with rounding) to AOD track pt.
Does the ROOT lossy compression truncate the mantissa? Is there an option to round at the lowest-order bit instead?
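To make the truncation-versus-rounding question concrete, here is a minimal standalone C++ sketch of two ways to keep N mantissa bits of a 32-bit IEEE-754 float. It is not ROOT's implementation; truncateMantissa, roundMantissa and meanRelativeShift are hypothetical helpers. It only illustrates why truncation pulls every value toward zero, giving a one-sided bias of the kind that shows up as an asymmetry, while round-to-nearest leaves the mean essentially unbiased.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: keep only the top `bits` of a float's 23-bit mantissa
// by truncation, i.e. clearing the dropped low bits (rounds toward zero).
inline float truncateMantissa(float x, int bits) {
  assert(bits > 0 && bits < 23);
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  u &= ~((std::uint32_t{1} << (23 - bits)) - 1);  // clear the low mantissa bits
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Hypothetical helper: same precision, but round to nearest (ties away from
// zero) by adding half of the new last-place unit before clearing the low
// bits. A carry out of the mantissa correctly bumps the exponent.
inline float roundMantissa(float x, int bits) {
  assert(bits > 0 && bits < 23);
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  const int dropped = 23 - bits;
  u += std::uint32_t{1} << (dropped - 1);     // add half of the new unit
  u &= ~((std::uint32_t{1} << dropped) - 1);  // then clear the low bits
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Mean of (reduced - original) / original over a uniform scan of [lo, hi).
// A clearly nonzero mean is a systematic, one-sided shift.
inline double meanRelativeShift(int bits, bool useRounding,
                                float lo = 1.0f, float hi = 100.0f, int n = 100000) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(n);
    const float y = useRounding ? roundMantissa(x, bits) : truncateMantissa(x, bits);
    sum += (static_cast<double>(y) - x) / x;
  }
  return sum / n;
}
```

For example, meanRelativeShift(10, false) comes out negative, while meanRelativeShift(10, true) is far closer to zero.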
@cmsbuild please test to refresh
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7f2e52/30198/summary.html
Comparison Summary
@pcanal, we are still waiting for your feedback.
The default lossy encoding for
@Dr15Jones @pcanal In miniAOD, one way this was resolved was to apply the compression to the relevant objects at construction (the pack->unpack cycle in PackedCandidate). Is there a way to apply the compression proposed here at runtime for specific consumers (e.g. just the PAT producers)?
@slava77 This change does not affect the storage of any of the Candidate-related classes, so I'm not quite sure what you are asking.
I'm using the PackedCandidate approach as an analogy: there, the concern about differences between the data as created and as consumed is mitigated by applying the compression at construction. I'm asking whether the compression proposed here can be applied at runtime, possibly per consumer but perhaps per collection, without having to save the data to disk first.
@slava77 I think I understand now. You are concerned that if PAT runs in the same job that creates the tracks, it will see the 'full' values, while if PAT runs in a separate job that reads the tracks from a file, it will see the 'lossy' version.
Yes.
In principle, it would be possible to write an EDProducer that reads in the collection, serializes it into a ROOT TBuffer, deserializes it from the TBuffer into a new collection, and then puts that collection into the Event. That would all happen in memory, so there would be no need to write to and read from a file.
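A minimal sketch of that in-memory round trip, written as standalone C++ so it compiles on its own. It uses neither ROOT nor the CMSSW framework: MiniTrack, the 3-byte float encoding, and lossyRoundTrip are hypothetical stand-ins. In a real EDProducer the serialization would presumably go through ROOT's TBuffer so that the same lossy streamer settings apply; here a hand-written encoder (round to `bits` mantissa bits, assuming bits <= 15) plays that role.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a track's stored momentum (not reco::Track).
struct MiniTrack {
  float px, py, pz;
};

// Round a float to `bits` mantissa bits (nearest, ties away from zero)
// and return the resulting bit pattern.
inline std::uint32_t roundedBits(float x, int bits) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  const int dropped = 23 - bits;
  u += std::uint32_t{1} << (dropped - 1);
  return u & ~((std::uint32_t{1} << dropped) - 1);
}

// Write one float as 3 bytes. With bits <= 15 the lowest 8 bits of the
// rounded pattern are always zero, so dropping them loses nothing further.
inline void writeLossy(std::vector<unsigned char>& buf, float x, int bits) {
  const std::uint32_t u = roundedBits(x, bits) >> 8;
  buf.push_back(static_cast<unsigned char>(u >> 16));
  buf.push_back(static_cast<unsigned char>(u >> 8));
  buf.push_back(static_cast<unsigned char>(u));
}

// Read back one 3-byte float written by writeLossy and advance `pos`.
inline float readLossy(const std::vector<unsigned char>& buf, std::size_t& pos) {
  std::uint32_t u = (std::uint32_t{buf[pos]} << 16) |
                    (std::uint32_t{buf[pos + 1]} << 8) | buf[pos + 2];
  pos += 3;
  u <<= 8;
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Serialize the whole collection into an in-memory buffer and read it back,
// so a same-job consumer sees exactly what a later job would read.
inline std::vector<MiniTrack> lossyRoundTrip(const std::vector<MiniTrack>& in, int bits) {
  assert(bits > 0 && bits <= 15);
  std::vector<unsigned char> buf;
  buf.reserve(in.size() * 9);
  for (const MiniTrack& t : in) {
    writeLossy(buf, t.px, bits);
    writeLossy(buf, t.py, bits);
    writeLossy(buf, t.pz, bits);
  }
  std::vector<MiniTrack> out;
  out.reserve(in.size());
  std::size_t pos = 0;
  while (pos < buf.size()) {
    const float px = readLossy(buf, pos);
    const float py = readLossy(buf, pos);
    const float pz = readLossy(buf, pos);
    out.push_back({px, py, pz});
  }
  return out;
}
```

Because the encoding is idempotent, running lossyRoundTrip on its own output returns identical values. That is the property that lets a consumer in the producing job agree with one that reads the file later.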
This is promising. Are you available to prepare an example, or does one already exist?
Considering that Refs and Ptrs are used all over the place, it's not obvious to me that isolating the miniAOD differences this way is straightforward. So this could become a rather deep rabbit hole.
@Dr15Jones |
Looking back at my old spreadsheet, I see values for 12 bits. From that spreadsheet
Thanks, we are likely OK with the covariance at 10 bits, as in this PR. Since the loss is relatively low, we could go to 13 bits for the momentum to be safe and still get more than a 10% reduction in the AOD size.
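As a rough guide to what these bit counts mean, the worst-case relative error of keeping N explicit mantissa bits follows directly from the float layout: for x = m * 2^e with 1 <= m < 2, one last-place unit is 2^(e - N). The sketch assumes IEEE-754 single precision and either round-to-nearest or truncation of the mantissa; the function names are hypothetical, and it says nothing about how ROOT lays out the reduced values on disk or about the resulting file size.

```cpp
#include <cmath>

// Upper bound on |reduced - x| / |x| when a float keeps `bits` explicit
// mantissa bits: half a last-place unit for round-to-nearest, one full unit
// for truncation. Dividing by x >= 2^e turns the unit 2^(e - bits) into 2^-bits.
inline double maxRelativeError(int bits, bool rounding) {
  return std::ldexp(1.0, rounding ? -(bits + 1) : -bits);
}

// Factor by which the bound grows compared with an unmodified float,
// which keeps all 23 explicit mantissa bits.
inline double lossFactorVsFloat(int bits) {
  return maxRelativeError(bits, true) / maxRelativeError(23, true);
}
```

With rounding this gives about 4.9e-4 for 10 bits, 1.2e-4 for 12 bits and 6.1e-5 for 13 bits; truncation doubles each bound.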
So I started from the step3 output of workflow 11834.21, using 1000 events. I then slow-copied the file using either the standard CMSSW_13_1_0_pre1 code or modified versions of this PR, also built on pre1.
Closing in favor of http://www.github.com/cms-sw/cmssw/pull/41018 |
PR description:
PR validation:
Ran on workflow 11834.21 and saw a more than 10% decrease in AOD file size.
This is intended to be used for validation on a separate IB.