ECAL - Switch off online ECAL RecHit GPU vs. CPU monitoring - 12_4_X #39395


thomreis
Contributor

@thomreis thomreis commented Sep 14, 2022

PR description:

Switches off the online ECAL RecHit GPU vs. CPU comparisons since the RecHit module is identical in both options.
Until PR #39373 is merged the collections are still consumed so the same existing RecHit collection is used for CPU and GPU inputs.
Requires PR #39373 to be merged since the RecHit collection tags have been removed from the configuration.
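As a rough illustration of the change described above, the following sketch drops one comparison and its input tags from a configuration. It uses a plain Python dictionary as a stand-in, and every name in it (`online_config`, `comparison_tasks`, `collection_tags`, the tag values, `switch_off_comparison`) is hypothetical rather than taken from the actual DQM/Integration configuration.

```python
import copy

# Hypothetical stand-in for an online DQM client configuration.
# None of these keys or values come from the real CMSSW files.
online_config = {
    "comparison_tasks": ["digi", "uncalibRecHit", "recHit"],
    "collection_tags": {
        "digiCpu": "hypotheticalCpuDigis",
        "digiGpu": "hypotheticalGpuDigis",
        "uncalibRecHitCpu": "hypotheticalCpuUncalibRecHits",
        "uncalibRecHitGpu": "hypotheticalGpuUncalibRecHits",
        # Same collection on both sides: comparing it to itself is pointless.
        "recHitCpu": "hypotheticalRecHits",
        "recHitGpu": "hypotheticalRecHits",
    },
}


def switch_off_comparison(config, task):
    """Return a copy of config without the given comparison and its CPU/GPU tags."""
    updated = copy.deepcopy(config)
    updated["comparison_tasks"] = [
        name for name in updated["comparison_tasks"] if name != task
    ]
    for key in (task + "Cpu", task + "Gpu"):
        # pop with a default so a tag that was already removed is not an error
        updated["collection_tags"].pop(key, None)
    return updated


reduced = switch_off_comparison(online_config, "recHit")
print(reduced["comparison_tasks"])
```

Removing the tags together with the task mirrors the dependency noted in the description: whatever consumes those tags must stop expecting them first.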

PR validation:

Configured and ran cmsRun with the configuration but no events were processed since the source is a stream that was not available. Should be tested on the playback system as well.

Backport of #39393

@cmsbuild
Contributor

cmsbuild commented Sep 14, 2022

A new Pull Request was created by @thomreis (Thomas Reis) for CMSSW_12_4_X.

It involves the following packages:

  • DQM/Integration (dqm)

@emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @syuvivida, @pmandrik, @micsucmed, @rvenditti can you please review it and eventually sign? Thanks.
@battibass, @threus, @batinkov, @francescobrivio this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@thomreis
Contributor Author

Backport of #39393

@thomreis
Contributor Author

enable gpu

@thomreis
Contributor Author

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a86f4e/27544/summary.html
COMMIT: 23c7e8d
CMSSW: CMSSW_12_4_X_2022-09-14-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39395/27544/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19876
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19868
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3677396
  • DQMHistoTests: Total failures: 19
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3677354
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found
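A quick way to sanity-check the DQMHistoTests numbers in these reports is to confirm that the per-category counts sum to the number of histograms compared. That relation is an observation from the figures quoted in this thread, not documented cms-bot behaviour, and the function and dictionary keys below are illustrative names only.

```python
def counts_add_up(summary):
    """True if failures, nulls, successes, skipped and missing sum to the total."""
    categories = ("failures", "nulls", "successes", "skipped", "missing")
    return summary["compared"] == sum(summary[name] for name in categories)


# Numbers copied from the first GPU comparison summary in this thread.
first_gpu = dict(compared=19876, failures=8, nulls=0,
                 successes=19868, skipped=0, missing=0)

# Numbers copied from the first full comparison summary in this thread.
first_full = dict(compared=3677396, failures=19, nulls=1,
                  successes=3677354, skipped=22, missing=0)

print(counts_add_up(first_gpu), counts_add_up(first_full))
```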

@thomreis thomreis changed the title from "ECAL - Switch of online ECAL RecHit GPU vs. CPU monitoring - 12_4_X" to "ECAL - Switch off online ECAL RecHit GPU vs. CPU monitoring - 12_4_X" on Sep 15, 2022
@syuvivida
Contributor

Tested this PR at the playback of P5 using collision runs 357900, 357815, and cosmic run 356932. The ecalgpu client ran at fu-c2f11-15-01.cms without errors. Plots in the DQMGUI also look fine.

@rovere
Contributor

rovere commented Sep 15, 2022

@thomreis do you need this for the next data-taking?
Do I understand correctly that #39373 is needed before this PR?

Thanks for clarifying!

@thomreis
Contributor Author

Hi @rovere, yes we would need this for data taking when we unpack the ECAL auxiliary collections with the CPU unpacker (see https://its.cern.ch/jira/browse/CMSHLT-2454).

No, PR #39373 is not needed before this PR. However, once #39373 is merged lines 91-95 of this PR could be removed as well (https://github.com/cms-sw/cmssw/pull/39395/files#diff-7e9b17ce9f9a19ae82d8a705a90d5fab2e021a4f0e18f0a84d7a9875621c4224R91-R95).

@perrotta
Contributor

perrotta commented Sep 15, 2022

> Hi @rovere, yes we would need this for data taking when we unpack the ECAL auxiliary collections with the CPU unpacker (see https://its.cern.ch/jira/browse/CMSHLT-2454).
>
> No, PR #39373 is not needed before this PR. However, once #39373 is merged lines 91-95 of this PR could be removed as well.

@thomreis, I don't know whether those lines need to be removed or whether that is just a "nice to have". If we want to speed up the merging, could you please take it for granted that #39373 will be merged in the next few hours and remove those lines now if needed, starting with the master version of this PR?

@cmsbuild
Contributor

Pull request #39395 was updated. @emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @syuvivida, @pmandrik, @micsucmed, @rvenditti can you please check and sign again.

@thomreis
Contributor Author

@perrotta removing the lines is a nice-to-have that cleans up the configuration when the RecHit monitoring is switched off. This is now done in all three PRs with the latest commit.
Please note that this means the configuration will most likely crash on the playback system if #39373 is not there. I am not sure about the matrix tests, though.

@thomreis
Contributor Author

please test with #39373

@cmsbuild
Contributor

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a86f4e/27583/summary.html
COMMIT: 8def2f8
CMSSW: CMSSW_12_4_X_2022-09-15-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39395/27583/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

The relvals timed out after 4 hours.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19876
  • DQMHistoTests: Total failures: 36
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19840
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 1 / 3 workflows

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3677396
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3677372
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@thomreis
Contributor Author

please test with #39373

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a86f4e/27591/summary.html
COMMIT: 8def2f8
CMSSW: CMSSW_12_4_X_2022-09-15-2300/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39395/27591/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19876
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19868
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3677396
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3677372
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@thomreis
Contributor Author

please test with #39373

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a86f4e/27613/summary.html
COMMIT: 8def2f8
CMSSW: CMSSW_12_4_X_2022-09-16-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39395/27613/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19876
  • DQMHistoTests: Total failures: 36
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19840
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 1 / 3 workflows

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3677396
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3677366
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@syuvivida
Contributor

Just a note: the playback test of this PR was done together with PR #39373. Results were OK (no errors from DQM^2, and the ECAL DQM histograms are normal).

@emanueleusai
Member

+1

  • P5 tests OK
  • GPU comparison differences known

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_4_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_6_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1

@cmsbuild cmsbuild merged commit 319d261 into cms-sw:CMSSW_12_4_X Sep 18, 2022