Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HCAL MAHI Code optimization to reduce CPU consuming #23568

Merged
merged 1 commit into from Jun 20, 2018

Conversation

hmiaozh
Copy link
Contributor

@hmiaozh hmiaozh commented Jun 12, 2018

1). Set FullSampleMatrix/FullSampleVector to be dynamic so that it works for different bx and TS configurations and also allocates space only for the minimum needed.
2). Move part of the initialization (in "resetWorkspace()") to the main body, serving as the same purpose as elaborated in 1).
3). Code Cleanup: Removed some unused variables and repeated sentences.

Testing results:
1>. Same reconstruction results.
Mahi_CPUSpeed.pdf

2>. "MahiFit::phase1Apply" CPU consuming reduces ~18% from igprof report.
Two igprof.res files (one for the old one for the new) are put: https://www.dropbox.com/sh/nyjv61mfn29mupt/AAB8Ge1XsKndxHaI7rtIaHgEa?dl=0

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @hmiaozh for master.

It involves the following packages:

RecoLocalCalo/HcalRecAlgos

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@mariadalfonso, @argiro this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Jun 12, 2018

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 12, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/28654/console Started: 2018/06/12 23:25

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23568/28654/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 250 differences found in the comparisons
  • DQMHistoTests: Total files compared: 31
  • DQMHistoTests: Total histograms compared: 2902004
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2901813
  • DQMHistoTests: Total skipped: 190
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 30 files compared)
  • Checked 128 log files, 14 edm output root files, 31 DQM output files

@perrotta
Copy link
Contributor

@hmiaozh : could you please add "Mahi" or "HCAL local reco" or whatever else can be used to identify the field of application of this PR in its title? Thank you.

@abdoulline
Copy link

@hmiaozh, @perrotta
I've come here to ask exactly the same: "MAHI code optimization..."

@hmiaozh hmiaozh changed the title Code optimization to reduce CPU consuming HCAL MAHI Code optimization to reduce CPU consuming Jun 13, 2018
@slava77
Copy link
Contributor

slava77 commented Jun 14, 2018

Reco comparison results: 250 differences found in the comparisons

@hmiaozh @mariadalfonso @deguio
please check and explain the changes

@felicepantaleo
Copy link
Contributor

@vkhristenko @fwyzard fyi

@slava77
Copy link
Contributor

slava77 commented Jun 14, 2018

to elaborate on #23568 (comment)

Testing results:
1>. Same reconstruction results.

this does not match the observed differences in jenkins tests.
There are changes in run-1 workflows (e.g. 1000.0 2011A, 1001.0 2011A, 140.53 HI, 4.53 2012B).
Based on the past changes in HCAL local reco code, these kind of differences were blamed on specifics of reconstructing channels with bad QPLL.
Please check and confirm that this case is also related to such channels.
Thank you.

@hmiaozh
Copy link
Contributor Author

hmiaozh commented Jun 16, 2018

@slava77 Thank you for all the information.
@mariadalfonso @deguio

I went through all the changes. As you already pointed out, they are in run-1 workflows: 1000.0 2011A, 1001.0 2011A, 140.53 HI, 4.53 2012B

Then I did runTheMatrix on those runs and checked the results (Here what I checked here are the hbheprereco output infos) The results are in the attached slides.
PR23568_changes_debug.pdf

As a summary, I did observe differences between Ref and New on those runs. (Although somehow I’m not able to reproduce exactly the same plots as jenkins tests have. I don’t know where the spikes centered at 0 in jenkins plots come from. In my plots, I don’t have them. You can also compare them from the slides)
Then I print out the channels where differences are observed. They all match with the reconstructing channels with bad QPLL EXCEPT channels with phi = 57 (where I already marked red on slides).
I’m not sure whether this is because I misunderstood the numbers in the bad QPLL list. So I read the bad QPLL list from /afs/cern.ch/user/a/abdullin/public/3d_QPLL_HcalRecoParams/QPLL+LBV_channels.txt
At last line, it writes
“ -16 57 3 HE
.....
-20 58 2 HE

Does this “……” cover all channels in HEM with iphi=57? If so, then ALL changes are from channels with bad QPLL. Can somebody confirm this or point me to some infos on this?

A brief explanation on slides:
slide 1 : the plots of differences by jenkins tests for run 1000.0 2011A/1001.0 2011A (they are the same)
slide 2 : the plots of my testing results on the same run
slide 3 : the channels of the differences observed and the channels with bad QPLL.

slides 4-6 : repeat the same pattern as slide1-3 for run 140.53 HI
slides 7-14 : repeat the same pattern as slide1-3 for run 4.53 2012B

slide 15 : comparison plots on a recent one as a cross check, where I run 5k events; and no difference is observed.

@abdoulline
Copy link

@hmiaozh Miao,
admittedly I was lazy enough to expand 3d QPLL channels list, as the (included channels) topology is identical (just phi sectors are different) for all three modules in question. So if you look at the first two QPLL's lists, the answer (confirmation) to your question is: yes, 3d QPLL includes iphi=57.

@slava77
Copy link
Contributor

slava77 commented Jun 18, 2018

Based on the past changes in HCAL local reco code, these kind of differences were blamed on specifics of reconstructing channels with bad QPLL.
Please check and confirm that this case is also related to such channels.

Hi all,

it looks like the set of changes indeed corresponds to the bad QPLL channels.

Which exact change leads to the change in this PR? I wanted to understand if it is going towards resolving #21135

@hmiaozh
Copy link
Contributor Author

hmiaozh commented Jun 19, 2018

@abdoulline Thank you for confirming this so detailed!
@slava77 The optimization is just to set FullSampleMatrix/FullSampleVector to be dynamic so that it works for different bx and TS configurations and also allocates space only for the minimum needed. Logically, it shouldn't cause any change in the fitting results. If the bad QPLL has the wrong center bx, then maybe it is this change in code https://github.com/hmiaozh/cmssw/blob/9cff8f2ec66c3f7df5db31966991cb6b52ec4a7c/RecoLocalCalo/HcalRecAlgos/src/MahiFit.cc#L182
that cause this difference? So before it starts filling from "nnlsWork_.fullTSOffset", this equals to "nnlsWork_.fullTSOffset = fullTSofInterest_ - channelData.soi();", like shown in https://indico.cern.ch/event/733498/contributions/3025199/attachments/1659986/2659045/Mahi_optimization_0601.pdf
slide2. Now I just immigrate it to start from 0. It is not supposed to change. But if data with bad QPLL and the wrong setting of soi, then maybe here's the difference comes in.

As for solving #21135, I plotted out the fitting results, as shown in the following slides: the upper row is current configuration; lower row is with this optimization. I picked up one from each of the three run-1 dataset (1000.0 2011A/1001.0 2011A, 140.53 HI, 4.53 2012B) where there are changes. From the plots, I don't think there's much difference/problem related to bad QPLL can be solved.
Jun19_pulse_QPLL.pdf

So since this optimization follows the same logic as the old, bad QPLL problem might need to be solved with other/deeper/further thoughts.

@slava77
Copy link
Contributor

slava77 commented Jun 19, 2018

+1

for #23568 9cff8f2

  • changes in the code appear to be in line with the intended speedups. Some changes are (after checks) are expected in Run-1 data which had channels with bad/drifting QPLLs (which were fixed in LS1 and none apparently appeared since then).
  • jenkins tests pass and comparisons with the baseline show only small difference in hbhe reco in run-1 data tests

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants