Memory reduction of the Cluster Charge Histogram used in SiStrip gain calibration #20010

Merged
merged 3 commits into cms-sw:master on Sep 15, 2017

Conversation

dimattia
Contributor

@dimattia dimattia commented Aug 1, 2017

This pull request implements the code needed to reduce the number of bins of the cluster charge histogram. The implementation requires booking a TH2S histogram with variable bin size, which incidentally was not supported by the DQMStore. The fix for this is delivered within this pull request, under the DQMServices package.

The motivation for this change has already been presented here:
https://indico.cern.ch/event/649344/contributions/2672267/attachments/1498323/2332518/OptimizeChHisto.pdf

@cmsbuild
Contributor

cmsbuild commented Aug 1, 2017

A new Pull Request was created by @dimattia for master.

It involves the following packages:

CalibTracker/SiStripChannelGain
DQMServices/Core

@ghellwig, @vazzolini, @kmaeshima, @arunhep, @cerminar, @dmitrijus, @cmsbuild, @franzoni, @vanbesien, @lpernie can you please review it and eventually sign? Thanks.
@ghellwig, @barvic, @gbenelli, @tocheng, @jlagram, @OlivierBondu, @mmusich this is something you requested to watch as well.
@davidlange6 you are the release manager for this.

cms-bot commands are listed here

@arunhep
Contributor

arunhep commented Aug 1, 2017

@cmsbuild please test

@cmsbuild
Contributor

cmsbuild commented Aug 1, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/21992/console Started: 2017/08/02 01:16

@cmsbuild
Contributor

cmsbuild commented Aug 2, 2017

@cmsbuild
Contributor

cmsbuild commented Aug 2, 2017

Comparison job queued.

@boudoul
Contributor

boudoul commented Aug 2, 2017

Hi @dimattia, please also change the title of this PR (it is not clear from it that this is for the strips gain calibration), then put the same title in the backport #20011 (adding the string 92X in the title). Thank you.

@@ -267,6 +267,9 @@ void SiStripGainsPCLWorker::processEvent(const TrackerTopology* topo) {
if(Validation) {ClusterChargeOverPath/=(*gainused)[i];}
if(OldGainRemoving){ClusterChargeOverPath*=(*gainused)[i];}
}

// keep processing of pixel cluster charge until here
if(APV->SubDet<=2) continue;
Contributor

Hi @dimattia, a question: can't we get rid of the pixel part completely? There are checks and computations for the pixel above this line, like here:
https://github.com/dimattia/cmssw/blob/8bdcb4372e22de2ac5112b37575b3bead83546a5/CalibTracker/SiStripChannelGain/src/SiStripGainsPCLWorker.cc#L262
or here :
https://github.com/dimattia/cmssw/blob/8bdcb4372e22de2ac5112b37575b3bead83546a5/CalibTracker/SiStripChannelGain/src/SiStripGainsPCLWorker.cc#L238
which are eventually not used. The same comment applies to SiStripGainFromCalibTree.cc.
What do you think? Thanks.

Contributor Author

Ciao @boudoul, in principle yes, because the pixel cluster charge is neither collected in the AlCARECO histograms nor used. However, I would like to keep the Pixel hit processing for a while, because we may discover that, with the new Pixel detector readout, filtering the statistics according to the quality of the pixel hits in the track improves the quality of the dE/dx estimation for the Strip.

@cmsbuild
Contributor

cmsbuild commented Aug 2, 2017

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20010/21992/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 25
  • DQMHistoTests: Total histograms compared: 2651090
  • DQMHistoTests: Total failures: 46542
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2604366
  • DQMHistoTests: Total skipped: 181
  • DQMHistoTests: Total Missing objects: 0
  • Checked 102 log files, 14 edm output root files, 25 DQM output files

@mmusich
Contributor

mmusich commented Aug 2, 2017

Hello @dimattia, @boudoul
for the record, this is what the RSS consumption profile vs. time of step3 of runTheMatrix.py -l 1001.0 -t 4 looks like with this PR, compared with plain CMSSW_9_3_0_pre3, both measured on cmsdev03:
[Plot: RSS trends, CMSSW_9_3_0_pre3 vs. CMSSW_9_3_0_pre3 + #20010]
There is a ~factor 2 reduction in peak RSS consumption, as well as a reduction in processing time, both roughly in line with expectations.

@dimattia
Contributor Author

dimattia commented Aug 2, 2017

Hi @mmusich, thank you for the measurements, which are an independent confirmation of the improvements. Anyway I want to stress that the reduction seen at peak level is not in line with the reduction of the memory that this implementation provides.

Indeed we achieve a reduction of a factor of 4 in memory (moving from 89000x2000 bins to 72500x687 bins); now the ClusterCharge histogram takes 25 MBytes while before it needed 100 MBytes. Therefore your measurements simply point out that the huge increase in memory consumption is not only due to the Histogram size. I think it also comes from some weird feature in the framework.
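
The bin counts quoted above can be cross-checked with a few lines of arithmetic. This is only a minimal sketch: the helper names `binCount` and `binReductionFactor` are hypothetical, and it counts bins only (no under/overflow cells, no per-bin storage size).

```cpp
#include <cassert>
#include <cstdint>

// Number of bins of a 2D histogram with nx by ny bins.
// Under/overflow cells are deliberately ignored in this rough count.
constexpr std::int64_t binCount(std::int64_t nx, std::int64_t ny) {
  return nx * ny;
}

// Ratio between the old and the new bin counts quoted in the comment above.
inline double binReductionFactor() {
  const std::int64_t oldBins = binCount(89000, 2000);  // 178,000,000 bins
  const std::int64_t newBins = binCount(72500, 687);   //  49,807,500 bins
  assert(newBins > 0);
  return static_cast<double>(oldBins) / static_cast<double>(newBins);
}
```

The ratio comes out at about 3.6, i.e. of the order of the factor of 4 quoted above.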

@dimattia dimattia changed the title Charge Histogram bins reduced Memory reduction of the Cluster Charge Histogram used in SiStrip gain calibration Aug 2, 2017
@mmusich
Contributor

mmusich commented Aug 2, 2017

@dimattia (commenting on how to read the plot)

Anyway I want to stress that the reduction seen at peak level is not in line with the reduction of the memory that this implementation provides.

step3 of runTheMatrix.py -l 1001.0 probes a variety of calibration workflows (not only the Strip Gains), and, obviously, the rest of the workflows are untouched. This means that in the plot in #20010 (comment) there is an underlying contribution that doesn't scale with the binning of the CC histogram. For a quick check I think my test is good enough, since anyway wf 1001.0 is what is tested in IB relvals and run in real life applications.

float* binYarray = new float[688];
double p0 = 5.445;
double p1 = 0.002113;
double p2 = 69.01576;
Contributor

@mmusich mmusich Aug 2, 2017

Maybe there is a better place for these magic numbers, in CalibTracker/SiStripChannelGain/interface/APVGainHelpers.h, since they are used both here and in SiStripGainsPCLWorker.cc?
Also, do we ever expect to change them again? Maybe they can be passed as tracked arguments.

Contributor Author

@mmusich these magic numbers are the result of the optimization work pointed out in the PR description. They are not supposed to be changed.

Contributor

@dimattia thanks, do you really need 5 decimals to describe the function? Anyway, moving them into a single place is a good idea.

Contributor Author

@mmusich concerning the place of the "magic numbers": these numbers concern the binning of histograms which, for the time being, are duplicated between the PCLWorker and SiStripGainFromCalibTree. The re-engineering of SiStripGainFromCalibTree.cc will happen later and is not a subject of this PR.
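
For illustration, the "single place" idea discussed in this thread could look roughly like the sketch below: one shared helper that builds the array of variable-width bin edges, which both users would call. Everything here is an assumption: the namespace `chargeBinning`, the functions `makeEdges`, `findBin` and `exampleWidth` are hypothetical names, and the width rule is a stand-in, since the formula that uses p0, p1 and p2 is not shown in this conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

namespace chargeBinning {  // hypothetical namespace, not part of CMSSW

  // Build nBins + 1 ascending bin edges starting at `low`.
  // `width(edge)` returns the width of the bin whose lower edge is `edge`.
  inline std::vector<float> makeEdges(std::size_t nBins, float low,
                                      const std::function<float(float)>& width) {
    assert(nBins > 0);
    std::vector<float> edges;
    edges.reserve(nBins + 1);
    float edge = low;
    for (std::size_t i = 0; i <= nBins; ++i) {
      edges.push_back(edge);
      const float w = width(edge);
      assert(w > 0.f);  // every bin needs a positive width
      edge += w;
    }
    return edges;
  }

  // Index of the bin containing x, or -1 if x is outside
  // the range [edges.front(), edges.back()).
  inline int findBin(const std::vector<float>& edges, float x) {
    if (edges.size() < 2 || x < edges.front() || x >= edges.back()) return -1;
    const auto it = std::upper_bound(edges.begin(), edges.end(), x);
    return static_cast<int>(it - edges.begin()) - 1;
  }

  // Illustrative width rule (an assumption, NOT the rule used in the PR):
  // unit-width bins below 100, then each bin is 1% of its lower edge.
  inline float exampleWidth(float edge) {
    return edge < 100.f ? 1.f : 0.01f * edge;
  }

}  // namespace chargeBinning
```

Calling `chargeBinning::makeEdges(687, 0.f, chargeBinning::exampleWidth)` returns 688 edges, matching the size of the `binYarray` allocation in the diff above; the resulting array is the kind of input a variable-bin histogram booking would take.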

@davidlange6
Contributor

davidlange6 commented Aug 2, 2017 via email

@cmsbuild
Contributor

cmsbuild commented Sep 11, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/22846/console Started: 2017/09/11 12:16

@cmsbuild
Contributor

-1

Tested at: 841ec1b

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
6cfac74
You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20010/22846/git-log-recent-commits
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20010/22846/git-merge-result

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20010/22846/summary.html

I found the following errors while testing this PR

Failed tests: RelVals

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
1306.0 step4

runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15/step4_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15.log

@cmsbuild
Contributor

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

@davidlange6
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Sep 12, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/22882/console Started: 2017/09/12 10:21

@cmsbuild
Contributor

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20010/22882/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 26
  • DQMHistoTests: Total histograms compared: 2642439
  • DQMHistoTests: Total failures: 209
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2642040
  • DQMHistoTests: Total skipped: 189
  • DQMHistoTests: Total Missing objects: 0
  • Checked 107 log files, 14 edm output root files, 26 DQM output files

@mmusich
Contributor

mmusich commented Sep 14, 2017

@dimattia in the interest of converging on this PR, can you apply the code-checks patch suggested above:
curl https://cmssdt.cern.ch/SDT/code-checks/PR-20010/611/git-diff.patch | patch -p1
Thanks

@mmusich
Contributor

mmusich commented Sep 15, 2017

@lpernie @arunhep @dmitrijus
As you have signed the backport of this PR (which is already merged), are there any objections to signing this one as well? (I am not sure whether, by policy, the code-checks failure is a red flag for integration.)

@davidlange6 davidlange6 merged commit eda3314 into cms-sw:master Sep 15, 2017
@arunhep
Contributor

arunhep commented Sep 15, 2017

+1

8 participants