Reduction of RAW' data size via adjustment of approx. cluster member data types #39199

abaty · 2022-08-25T16:55:50Z

This PR is aimed at significantly reducing the size of the RAW' data format which uses the Approximated Cluster class, as the current implementation is not fully optimized. This is the subsequent PR for improving the format which is mentioned at the end of the first paragraph in the description of PR #38423.

The space savings is accomplished by changing the full-precision float class members of SiStripApproximatedCluster to integer data types having limited precision. In the case of the cluster width and average charge, values are rounded to the nearest whole number. For the barycenter, rounding to one decimal place was found to give a good tradeoff of tracking performance and space savings . To avoid using a floating point number, the barycenter is multiplied by 10 and stored as a 16-bit integer, and is converted back when regenerating the cluster collection from approximated clusters. This reduces the approximated cluster collection size by 37% over the current implementation, and the SiStrip detector payload by 45% compared to the conventional RAW format. A small loss (~0.5%) of inclusive tracks was observed, but largely disappears when applying analysis-type track selections. (More info on our tests at [1], which was discussed in our data format working group)

This PR also incorporates two suggestions from @makortel for cleaning up the approximated cluster code [2,3].

[1] https://twiki.cern.ch/twiki/pub/Main/HIDetectorReadout2020/ApproxClusters_integer10th_Aug25.pdf
[2] 8871eff
[3] 0492f1f

PR validation:

This PR can be tested using WF 140.58 and checking step2 to verify the smaller approx. cluster size. Step 3 can be used to check physics performance and tracking differences.

@icali @mandrenguyen

cmsbuild · 2022-08-25T17:19:26Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39199/31809

This PR adds an extra 20KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/SiStripCluster/src/classes_def.xml modified in PR(s): CUDA implementation of RecoLocalTracker/SiStripCluster ClustersFromRawProducer #34618

cmsbuild · 2022-08-25T17:19:56Z

A new Pull Request was created by @abaty for master.

It involves the following packages:

DataFormats/SiStripCluster (reconstruction)
RecoLocalTracker/SiStripClusterizer (reconstruction)

@jpata, @cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@echabert, @VourMa, @swertz, @robervalwalsh, @yduhm, @GiacomoSguazzoni, @JanFSchulte, @rovere, @gbenelli, @alesaggio, @threus, @felicepantaleo, @gpetruc, @mmusich, @VinInn, @mtosi this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

mmusich · 2022-08-25T17:53:11Z

test parameters:

workflows = 161, 140.58

mandrenguyen · 2022-08-25T19:07:00Z

@cmsbuild please test

cmsbuild · 2022-08-26T01:41:06Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-315a48/27098/summary.html
COMMIT: 0e3e268
CMSSW: CMSSW_12_5_X_2022-08-25-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/39199/27098/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-315a48/140.58_RunHI2018+RunHI2018+RAWPRIMEHI18+RECOHID18APPROXCLUSTERS+HARVESTDHI18
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-315a48/161.0_HydjetQ_B12_5020GeV_2021_ppReco+HydjetQ_B12_5020GeV_2021_ppReco+DIGIHI2021PPRECO+RAWPRIMESIMHI18+RECOHI2021PPRECOMBAPPROXCLUSTERS+ALCARECOHI2021PPRECO+HARVESTHI2021PPRECO4

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 53
DQMHistoTests: Total histograms compared: 3852985
DQMHistoTests: Total failures: 16210
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3836753
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
Checked 222 log files, 49 edm output root files, 53 DQM output files
TriggerResults: no differences found

DataFormats/SiStripCluster/src/SiStripApproximateCluster.cc

mandrenguyen · 2022-08-26T14:01:21Z

@perrotta @qliphy I want to draw your attention to this PR. This is not the "technical bug fix" that I understand was discussed in the ORP and is planned to be backported from 12_6_X. This PR effectively changes the compression level of the approximated clusters, and does have an effect on the relevant workflows (161 and 140.58) It could be considered a bug-fix in the sense that the default compression level was doing nearly nothing. In any case, this is needed for ion running, so we'll have to find a way to get it into 12_5_X.
For reference, the authors have presented this change at the HIN PAG meeting just now, and agreed to present it at the RECO/AT meeting next week (I still need to create the agenda).

perrotta · 2022-08-26T14:34:54Z

In any case, this is needed for ion running, so we'll have to find a way to get it into 12_5_X.

Technically it can be still be considered a bug fix, thus I don't see problems in backporting it to 12_5_X before 12_5_0.
Given the delays we have in building pre5, it is a pity that you cannot merge it in pre5 already, so that you can profit of the release validation of pre5. Perhaps it is still possible, provided @cms-sw/reconstruction-l2 signs it now...

mandrenguyen · 2022-08-26T14:54:55Z

+reconstruction
Local test show the expected decrease in the size of the approximatedClusters collection (30-40%), depending on the workflows, and the ~1% decrease in generalTracks, as described by the authors.
Only the two heavy-ion workflows with rawprime are affected, aside from a single negligible difference in
39496.0 and 39500.0 which cannot possibly be related

cmsbuild · 2022-08-26T14:55:20Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

perrotta · 2022-08-26T15:13:15Z

+1

Merge for pre5

slava77 · 2022-09-11T15:53:26Z

@abaty @mandrenguyen @icali
was there some discussion/plan to add some information about the cluster shape. Perhaps a few bits could resolve quite a few issues downstream?
connected to #38423 (comment)

IIUC, the current strategy has a massive inefficiency for V0 reconstruction
(assuming that as in pp the cluster shape filter stays)
Perhaps a separate issue for this is going to help to keep track of the progress.

abaty · 2022-09-13T22:53:44Z

@slava77 Currently we are working on another raised issue regarding the backwards compatibility of this strategy, so we have not had time to critically examine the feasibility/accuracy of storing cluster shape info at HLT. We will look into this issue once the backwards compatibility is resolved. For the upcoming HI run, we have at the moment disabled the cluster shape filter so the massive V0 inefficiency is not expected. I agree that making a separate issue to track this for the future would be a good idea.

was there some discussion/plan to add some information about the cluster shape. Perhaps a few bits could resolve quite a few issues downstream? connected to #38423 (comment)

IIUC, the current strategy has a massive inefficiency for V0 reconstruction (assuming that as in pp the cluster shape filter stays) Perhaps a separate issue for this is going to help to keep track of the progress.

abaty added 2 commits August 25, 2022 07:19

tidying up of approx cluster classes, as suggested by makortel

d12d565

approximation of barycenter in approx clusters to a 10th of a strip

0e3e268

cmsbuild added this to the CMSSW_12_5_X milestone Aug 25, 2022

cmsbuild added code-checks-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Aug 25, 2022

cmsbuild added code-checks-approved and removed code-checks-pending labels Aug 25, 2022

cmsbuild added tests-started and removed tests-pending labels Aug 25, 2022

cmsbuild added tests-approved and removed tests-started labels Aug 26, 2022

mandrenguyen reviewed Aug 26, 2022

View reviewed changes

DataFormats/SiStripCluster/src/SiStripApproximateCluster.cc Show resolved Hide resolved

cmsbuild added fully-signed reconstruction-approved and removed reconstruction-pending pending-signatures labels Aug 26, 2022

cmsbuild removed the orp-pending label Aug 26, 2022

cmsbuild added the orp-approved label Aug 26, 2022

cmsbuild merged commit bc79ae1 into cms-sw:master Aug 26, 2022

cmsbuild mentioned this pull request Aug 26, 2022

CUDA implementation of RecoLocalTracker/SiStripCluster ClustersFromRawProducer #34618

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduction of RAW' data size via adjustment of approx. cluster member data types #39199

Reduction of RAW' data size via adjustment of approx. cluster member data types #39199

abaty commented Aug 25, 2022 •

edited

cmsbuild commented Aug 25, 2022

cmsbuild commented Aug 25, 2022

mmusich commented Aug 25, 2022

mandrenguyen commented Aug 25, 2022

cmsbuild commented Aug 26, 2022

mandrenguyen commented Aug 26, 2022

perrotta commented Aug 26, 2022

mandrenguyen commented Aug 26, 2022 •

edited

cmsbuild commented Aug 26, 2022

perrotta commented Aug 26, 2022

slava77 commented Sep 11, 2022 •

edited

abaty commented Sep 13, 2022

Reduction of RAW' data size via adjustment of approx. cluster member data types #39199

Reduction of RAW' data size via adjustment of approx. cluster member data types #39199

Conversation

abaty commented Aug 25, 2022 • edited

PR validation:

cmsbuild commented Aug 25, 2022

cmsbuild commented Aug 25, 2022

mmusich commented Aug 25, 2022

mandrenguyen commented Aug 25, 2022

cmsbuild commented Aug 26, 2022

Comparison Summary

mandrenguyen commented Aug 26, 2022

perrotta commented Aug 26, 2022

mandrenguyen commented Aug 26, 2022 • edited

cmsbuild commented Aug 26, 2022

perrotta commented Aug 26, 2022

slava77 commented Sep 11, 2022 • edited

abaty commented Sep 13, 2022

abaty commented Aug 25, 2022 •

edited

mandrenguyen commented Aug 26, 2022 •

edited

slava77 commented Sep 11, 2022 •

edited