Fix crash of ECAL GPU reco when ECAL is out of the run - 11_3_X #34768

Merged
merged 2 commits into from Aug 4, 2021

thomreis
Contributor

@thomreis thomreis commented Aug 4, 2021

PR description:

Fixes a crash of the EcalUncalibRecHitProducerGPU module when the ECAL is not in the run. The crash occurs because EcalRawToDigiGPU does not initialise the number of channels to zero when there are no FEDs to unpack.
Also improves an error message, which helped to find the origin of the problem.
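
The failure mode described above can be sketched in plain host-side C++. This is a minimal illustration, not the CMSSW code: `UnpackedDigis`, `unpackEvent` and `sumSamples` are hypothetical names, and the real modules work on GPU buffers that are not modelled here. It assumes only what the description states, namely that the channel count must be zero when there is nothing to unpack and that a descriptive error message helps locate a mismatch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for unpacker output that persists between events.
struct UnpackedDigis {
  std::uint32_t nChannels = 0;
  std::vector<std::uint16_t> samples;
};

// Unpack only when there is input; otherwise report zero channels explicitly
// so that no stale count from an earlier event reaches the consumer.
void unpackEvent(std::vector<std::uint16_t> const& rawWords, UnpackedDigis& out) {
  if (!rawWords.empty()) {
    out.samples = rawWords;
    out.nChannels = static_cast<std::uint32_t>(rawWords.size());
  } else {
    out.samples.clear();
    out.nChannels = 0;
  }
  // Postcondition: the count always matches the data actually present.
  assert(out.nChannels == out.samples.size());
}

// Hypothetical consumer: a descriptive consistency check replaces what would
// otherwise be an out-of-bounds read driven by a wrong channel count.
std::uint32_t sumSamples(UnpackedDigis const& in) {
  if (in.nChannels > in.samples.size()) {
    throw std::runtime_error("channel count " + std::to_string(in.nChannels) + " exceeds the " +
                             std::to_string(in.samples.size()) + " samples available");
  }
  std::uint32_t sum = 0;
  for (std::uint32_t i = 0; i < in.nChannels; ++i) {
    sum += in.samples[i];
  }
  return sum;
}
```

Calling `unpackEvent` with an empty input after a non-empty one leaves `nChannels` at zero, so the consumer loops over nothing instead of reading past the end of an empty buffer.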

PR validation:

HLT configuration detailed in #34197 (comment) runs to completion now.

Backport of #34765

@cmsbuild
Contributor

cmsbuild commented Aug 4, 2021

A new Pull Request was created by @thomreis (Thomas Reis) for CMSSW_11_3_X.

It involves the following packages:

  • EventFilter/EcalRawToDigi (reconstruction)
  • RecoLocalCalo/EcalRecProducers (reconstruction)

@perrotta, @jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@rchatter, @argiro, @Martin-Grunewald, @apsallid, @thomreis, @simonepigazzini this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy, @perrotta you are the release manager for this.

cms-bot commands are listed here

@@ -138,6 +138,10 @@ void EcalRawToDigiGPU::acquire(edm::Event const& event,
  if (counter > 0) {
    ecal::raw::entryPoint(
        inputCPU, inputGPU, outputGPU_, scratchGPU, outputCPU_, conditions, ctx.stream(), counter, currentCummOffset);
  } else {
Contributor

I think it would be cleaner to set these to 0 unconditionally, before calling (or not) ecal::raw::entryPoint.

Contributor Author

That can be done as well, but zeroing seems an unnecessary operation when ecal::raw::entryPoint is called.

Contributor

The idea is to make sure the variables are in a clean (initialised) state, independently of the future use.
This matters especially because they are only set in the subsequent call to produce(), and anybody looking at them inside ecal::raw::entryPoint (for example, to debug the next problem in a few weeks', months' or years' time) will likely be confused by seeing random values.
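
The "reset first" variant suggested in this thread can be sketched as follows. This is a simplified host-side illustration under stated assumptions: `Counters`, `unpack` and `acquireResetFirst` are hypothetical names, not the CMSSW interfaces. The unpacking routine returns the counter value it observed on entry, which stands in for what someone debugging inside ecal::raw::entryPoint would see.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the counters kept on the host side.
struct Counters {
  std::uint32_t nChannels = 0;
};

// Hypothetical unpacking routine. It returns the counter value it observed on
// entry, i.e. what a developer inspecting the state at that point would see.
std::uint32_t unpack(std::vector<std::uint32_t> const& words, Counters& out) {
  std::uint32_t const seenOnEntry = out.nChannels;
  out.nChannels = static_cast<std::uint32_t>(words.size());
  return seenOnEntry;
}

// Reset unconditionally, then unpack only if there is input. The counters are
// in a known state on both paths, whatever the previous event left behind.
std::uint32_t acquireResetFirst(std::vector<std::uint32_t> const& words, Counters& out) {
  out = Counters{};
  if (words.empty()) {
    return out.nChannels;  // nothing to unpack; the count is already zero
  }
  return unpack(words, out);
}
```

The trade-off raised in the thread is visible here: the reset is redundant on the path where `unpack` overwrites the counter anyway, but it guarantees that nothing ever observes a leftover value.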

@thomreis
Contributor Author

thomreis commented Aug 4, 2021

type bug-fix

@cmsbuild
Contributor

cmsbuild commented Aug 4, 2021

Pull request #34768 was updated. @perrotta, @jpata, @cmsbuild, @slava77 can you please check and sign again.

@fwyzard
Contributor

fwyzard commented Aug 4, 2021

please test

@thomreis
Contributor Author

thomreis commented Aug 4, 2021

Is "enable gpu" not needed for this one, @fwyzard?

@fwyzard
Contributor

fwyzard commented Aug 4, 2021 via email

@fwyzard
Contributor

fwyzard commented Aug 4, 2021 via email

@fwyzard
Contributor

fwyzard commented Aug 4, 2021 via email

@perrotta
Contributor

perrotta commented Aug 4, 2021

urgent

@cmsbuild cmsbuild added the urgent label Aug 4, 2021
@cmsbuild
Contributor

cmsbuild commented Aug 4, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6de7ff/17525/summary.html
COMMIT: b9a3a80
CMSSW: CMSSW_11_3_X_2021-08-04-1100/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34768/17525/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 9571
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 9571
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2878314
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2878279
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 37 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: no differences found

@slava77
Contributor

slava77 commented Aug 4, 2021

+reconstruction

for #34768 b9a3a80

@cmsbuild
Contributor

cmsbuild commented Aug 4, 2021

This pull request is fully signed and it will be integrated in one of the next CMSSW_11_3_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy, @perrotta (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

perrotta commented Aug 4, 2021

+1

  • Identical to the PR merged in master, and simple enough that we need not wait to check an IB that includes it in master

@cmsbuild cmsbuild merged commit 304db18 into cms-sw:CMSSW_11_3_X Aug 4, 2021
@thomreis thomreis deleted the ecal-gpu-ecalout-fix-113x branch August 5, 2021 06:52