Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Use ROOT lossy compression for P3 and position of reco::Track #39554

Closed
wants to merge 1 commit into from

Conversation

Dr15Jones
Copy link
Contributor

@Dr15Jones Dr15Jones commented Sep 30, 2022

PR description:

  • added classes used by ROOT when doing storage which specify lossy compression
    • Tracking covariance matrix is stored with 10 bit mantissas
    • Tracking momentum is stored with 10 bit mantissas
    • Tracking x, y positions are stored with a fixed 24 bit precision with a min/max of -1100/1100
  • added iotypes rule to use new classes

PR validation:

Ran on workflow 11834.21 and say more than 10% decrease in AOD file size.

This is intended to be used for validation on a separate IB.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39554/32342

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

  • DataFormats/TrackReco (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@VourMa, @JanFSchulte, @rovere, @VinInn, @missirol, @gpetruc, @mmusich, @mtosi this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7f2e52/27883/summary.html
COMMIT: 0f99936
CMSSW: CMSSW_12_6_X_2022-09-30-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39554/27883/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

>> Building LCG reflex dict from header file src/DataFormats/GsfTrackReco/src/classes.h
>> Compiling LCG dictionary: tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/a/DataFormatsGsfTrackReco_xr.cc
>> Building shared library tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/libDataFormatsGsfTrackReco.so
Copying tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/libDataFormatsGsfTrackReco.so to productstore area:
>> Checking EDM Class Version for src/DataFormats/GsfTrackReco/src/classes_def.xml in libDataFormatsGsfTrackReco.so
error: class 'reco::GsfTrack' has a different checksum for ClassVersion 20. Increment ClassVersion to 21 and assign it to checksum 1617233394
Suggestion: You can run 'scram build updateclassversion' to generate src/DataFormats/GsfTrackReco/src/classes_def.xml.generated with updated ClassVersion
gmake: *** [tmp/el8_amd64_gcc10/src/DataFormats/GsfTrackReco/src/DataFormatsGsfTrackReco/libDataFormatsGsfTrackReco.so] Error 1
Leaving library rule at DataFormats/GsfTrackReco
>> Leaving Package DataFormats/GsfTrackReco
>> Package DataFormats/GsfTrackReco built


@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39554/32344

  • This PR adds an extra 28KB to repository

@cmsbuild
Copy link
Contributor

Pull request #39554 was updated. @mandrenguyen, @clacaputo can you please check and sign again.

@Dr15Jones
Copy link
Contributor Author

please test

@slava77
Copy link
Contributor

slava77 commented Jan 24, 2023

@Dr15Jones
do I understand correctly that the proposed schema truncates the mantissa?
Is there an option to do rounding?

We see some small residual asymmetry comparing miniAOD (packed with rounding) and AOD track pt.

@Dr15Jones
Copy link
Contributor Author

@pcanal

Does the ROOT lossy compression the mantissa? Is there an option to do rounding on the lowest order bit instead?

@slava77
Copy link
Contributor

slava77 commented Jan 26, 2023

@cmsbuild please test

to refresh

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7f2e52/30198/summary.html
COMMIT: 4347d97
CMSSW: CMSSW_13_0_X_2023-01-26-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/39554/30198/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 653 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 4017
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3551456
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@Dr15Jones
Copy link
Contributor Author

@pcanal, we are still waiting for your feedback.

@pcanal
Copy link
Contributor

pcanal commented Feb 7, 2023

The default lossy encoding for Double32_t is to literally convert to a float, store this result and lossless-ly compress it. So in this case, yes the mantissa storage is also lossy and does not offer any tweaks. Some of the other options (see TBufferFile::WriteDouble32) offers different tradeoff (in particular by allowing an expected range specification)

@smuzaffar smuzaffar modified the milestones: CMSSW_13_0_X, CMSSW_13_1_X Feb 11, 2023
@slava77
Copy link
Contributor

slava77 commented Feb 14, 2023

@Dr15Jones @pcanal
There was a concern expressed (@mandrenguyen ) that with the lossy compression the results of e.g. PAT step (miniAOD) will be different in PromptReco vs re-miniAOD.

An example resolution in miniAOD to that was to apply compression to relevant objects at construction (pack->unpack cycle in PackedCandidate). Is there a way to apply the compression proposed here for specific consumers (e.g. just PAT producers) at runtime?

@Dr15Jones
Copy link
Contributor Author

@slava77 this change does not affect the storage of any of the Candidate related classes so I'm not quite sure what you are asking.

@slava77
Copy link
Contributor

slava77 commented Feb 14, 2023

@slava77 this change does not affect the storage of any of the Candidate related classes so I'm not quite sure what you are asking.

I'm using PackedCandidate approach as an analogy where the concern about differences in created and consumed data is mitigated/resolved by applying compression at construction.

I'm asking if there is a way to apply compression proposed here during runtime, possibly per consumer but perhaps by collection, without having to save the data on disk first.

@Dr15Jones
Copy link
Contributor Author

@slava77 I think I understand now. You are concerned that if PAT is run in the same job that creates the tracks, they will see the 'full' values while if PAT is run in a separate job and is reading the tracks from a file, they will see the 'lossy' version.

@slava77
Copy link
Contributor

slava77 commented Feb 14, 2023

@slava77 I think I understand now. You are concerned that if PAT is run in the same job that creates the tracks, they will see the 'full' values while if PAT is run in a separate job and is reading the tracks from a file, they will see the 'lossy' version.

yes

@Dr15Jones
Copy link
Contributor Author

Dr15Jones commented Feb 14, 2023

In principal, it would be possible to write a EDProducer which reads in the collection, serializes it to a ROOT TBuffer and then deserializes out of the TBuffer to a new collection and then places that collection into the Event. That would all happen in memory so no need to write/read.

@slava77
Copy link
Contributor

slava77 commented Feb 14, 2023

In principal, it would be possible to write a EDProducer which reads in the collection, serializes it to a ROOT TBuffer and then deserializes out of the TBuffer to a new collection and then places that collection into the Event. That would all happen in memory so no need to write/read.

this is promising. Are you available to prepare an example, or does an example exist already?

@slava77
Copy link
Contributor

slava77 commented Feb 15, 2023

In principal, it would be possible to write a EDProducer which reads in the collection, serializes it to a ROOT TBuffer and then deserializes out of the TBuffer to a new collection and then places that collection into the Event. That would all happen in memory so no need to write/read.

this is promising. Are you available to prepare an example, or does an example exist already?

considering that Refs or Ptrs are used all over the place, it's not really obvious to me that the best isolation of miniAOD differences is straightforward. So, this can become a rather deep rabbit hole.

@slava77
Copy link
Contributor

slava77 commented Mar 7, 2023

PR description:

* added classes used by ROOT when doing storage which specify lossy compression

* added iotypes rule to use new classes

PR validation:

Ran on workflow 11834.21 and say more than 10% decrease in AOD file size.

This is intended to be used for validation on a separate IB.

@Dr15Jones
please summarize the details of what's in this PR; it takes an effort to unfold what's in this PR by navigating to code and then through the thread in #39449 (please also add a reference to the original issue #39449)

@slava77
Copy link
Contributor

slava77 commented Mar 8, 2023

@Dr15Jones
it looks like we are eventually concluding that the proposed scheme with 10 bits for p{x,y,z} introduces rounding errors larger than the track direction resolution at high pt.
We'd likely need to go to 13 bits, but perhaps 12 will still be tolerable.
Do you have size change estimates with 12 and 13 bits?

@Dr15Jones
Copy link
Contributor Author

Dr15Jones commented Mar 8, 2023

it looks like we are eventually concluding that the proposed scheme with 10 bits for p{x,y,z} introduces rounding errors larger than the track direction resolution at high pt.
We'd likely need to go to 13 bits, but perhaps 12 will still be tolerable.
Do you have size change estimates with 12 and 13 bits?

Looking back at my old spreadsheet I see values for 12 bits. From that spreadsheet

Covariance Compression Momentum Compression Position Compression AOD file reduction
[0 0 10] [0 0 10] [-1100 1100 24] 12.76%
[0 0 12] [0 0 12] [-1100 1100 24] 10.85%

@slava77
Copy link
Contributor

slava77 commented Mar 8, 2023

it looks like we are eventually concluding that the proposed scheme with 10 bits for p{x,y,z} introduces rounding errors larger than the track direction resolution at high pt.
We'd likely need to go to 13 bits, but perhaps 12 will still be tolerable.
Do you have size change estimates with 12 and 13 bits?

Looking back at my old spreadsheet I see values for 12 bits. From that spreadsheet
Covariance Compression Momentum Compression Position Compression AOD file reduction
[0 0 10] [0 0 10] [-1100 1100 24] 12.76%
[0 0 12] [0 0 12] [-1100 1100 24] 10.85%

Thanks, we are likely OK with the covariance at 10 as in this PR.
I guess the 1.9% difference roughly splits as 3:15 for momentum:covariance (15 floats in the cov).

Since the loss is relatively low, we could just go to 13 bits in momentum to be safe and still get more than 10% reduction in the AOD size.

@Dr15Jones
Copy link
Contributor Author

So I started from the step3 output of workflow 11834.21 using 1000 events. I then slow copied the file using either standard CMSSW_13_1_0_pre1 code or modified versions of this PR also in pre1.

type file size relative size
pure pre1 591192300 1.0
P3 mantissa 10 (this PR) 508884087 0.861
P3 mantissa 12 510730389 0.864
P3 mantissa 13 511530532 0.865

@rappoccio
Copy link
Contributor

Closing in favor of http://www.github.com/cms-sw/cmssw/pull/41018

@rappoccio rappoccio closed this Mar 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants