Update all GPU workflows [12.3.x] #37413


fwyzard
Contributor

@fwyzard fwyzard commented Mar 30, 2022

PR description:

Add Pixel-only and HCAL-only validation workflows:

  • Pixel only, on CPU and GPU, with GPU-vs-CPU validation: #.503 (quadruplets), #.507 (triplets)
  • HCAL only, on CPU and GPU, with GPU-vs-CPU validation: #.523

Enable the existing Pixel-only and HCAL-only profiling workflows:

  • Pixel only, on GPU (optionally): #.504 (quadruplets), #.508 (triplets)
  • HCAL only, on GPU (optionally): #.524

Add a single workflow running all GPU-enabled reconstruction (Pixel, ECAL, HCAL):

  • all, on CPU: #.581 (quadruplets) and #.585 (triplets)
  • all, on GPU (optionally): #.582 (quadruplets) and #.586 (triplets)
  • all, on CPU and GPU, with GPU-vs-CPU validation: #.583 (quadruplets) and #.587 (triplets)

Do not customise the HLT, as it already makes full use of GPU reconstruction when the "gpu" modifier is enabled.
Add a short description before each workflow.
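
The numbering scheme listed above can be summarised as a small lookup table. This is a minimal Python sketch, not code from this PR or from CMSSW: the names `GPU_WORKFLOW_OFFSETS` and `matrix_list` are hypothetical, the labels are paraphrased from the description, and the base workflow number 11634 is taken from the validation commands in the PR validation section.

```python
# Hypothetical lookup table paraphrasing the PR description; not part of CMSSW.
GPU_WORKFLOW_OFFSETS = {
    503: "Pixel-only, quadruplets, GPU-vs-CPU validation",
    504: "Pixel-only, quadruplets, GPU profiling",
    507: "Pixel-only, triplets, GPU-vs-CPU validation",
    508: "Pixel-only, triplets, GPU profiling",
    523: "HCAL-only, GPU-vs-CPU validation",
    524: "HCAL-only, GPU profiling",
    581: "all, quadruplets, CPU",
    582: "all, quadruplets, GPU",
    583: "all, quadruplets, GPU-vs-CPU validation",
    585: "all, triplets, CPU",
    586: "all, triplets, GPU",
    587: "all, triplets, GPU-vs-CPU validation",
}


def matrix_list(base, offsets):
    """Build the comma-separated workflow list passed to runTheMatrix.py -l."""
    return ",".join(f"{base}.{offset}" for offset in sorted(offsets))


# Select the "all GPU-enabled reconstruction" workflows (#.581 to #.587).
all_gpu = [offset for offset in GPU_WORKFLOW_OFFSETS if 581 <= offset <= 587]
print(matrix_list(11634, all_gpu))
```

The printed string has the same form as the `-l` argument used for the "All GPU sequences" test in the validation section.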

PR validation:

All new or updated GPU workflows ran successfully in CMSSW_12_4_X_2022-03-23-2300:

Pixel-only

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.501,11634.502,11634.503,11634.504,11634.505,11634.506,11634.507,11634.508
11634.501_TTbar_14TeV+2021_Patatrack_PixelOnlyCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:38:32 2022-date Thu Mar 31 09:35:53 2022; exit: 0 0 0 0
11634.502_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:38:34 2022-date Thu Mar 31 09:35:53 2022; exit: 0 0 0 0
11634.503_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:38:34 2022-date Thu Mar 31 09:35:54 2022; exit: 0 0 0 0
11634.504_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:38:08 2022-date Thu Mar 31 09:35:54 2022; exit: 0 0 0
11634.505_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:40:50 2022-date Thu Mar 31 09:38:08 2022; exit: 0 0 0 0
11634.506_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:41:11 2022-date Thu Mar 31 09:38:32 2022; exit: 0 0 0 0
11634.507_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:41:11 2022-date Thu Mar 31 09:38:34 2022; exit: 0 0 0 0
11634.508_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:40:46 2022-date Thu Mar 31 09:38:34 2022; exit: 0 0 0
8 8 8 8 tests passed, 0 0 0 0 failed

ECAL-only

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.511,11634.512,11634.513,11634.514
11634.511_TTbar_14TeV+2021_Patatrack_ECALOnlyCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:43:11 2022-date Thu Mar 31 09:40:46 2022; exit: 0 0 0 0
11634.512_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:43:20 2022-date Thu Mar 31 09:40:51 2022; exit: 0 0 0 0
11634.513_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:43:41 2022-date Thu Mar 31 09:41:12 2022; exit: 0 0 0 0
11634.514_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:43:18 2022-date Thu Mar 31 09:41:13 2022; exit: 0 0 0
4 4 4 4 tests passed, 0 0 0 0 failed

HCAL-only

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.521,11634.522,11634.523,11634.524
11634.521_TTbar_14TeV+2021_Patatrack_HCALOnlyCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:45:34 2022-date Thu Mar 31 09:43:11 2022; exit: 0 0 0 0
11634.522_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:45:45 2022-date Thu Mar 31 09:43:18 2022; exit: 0 0 0 0
11634.523_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:45:48 2022-date Thu Mar 31 09:43:21 2022; exit: 0 0 0 0
11634.524_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:45:44 2022-date Thu Mar 31 09:43:42 2022; exit: 0 0 0
4 4 4 4 tests passed, 0 0 0 0 failed

All GPU sequences

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.581,11634.582,11634.583,11634.585,11634.586,11634.587
11634.581_TTbar_14TeV+2021_Patatrack_AllCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:52 2022-date Thu Mar 31 10:28:11 2022; exit: 0 0 0 0
11634.582_TTbar_14TeV+2021_Patatrack_AllGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:52 2022-date Thu Mar 31 10:28:12 2022; exit: 0 0 0 0
11634.583_TTbar_14TeV+2021_Patatrack_AllGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:54 2022-date Thu Mar 31 10:28:12 2022; exit: 0 0 0 0
11634.585_TTbar_14TeV+2021_Patatrack_AllTripletsCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:52 2022-date Thu Mar 31 10:28:13 2022; exit: 0 0 0 0
11634.586_TTbar_14TeV+2021_Patatrack_AllTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:33:26 2022-date Thu Mar 31 10:30:53 2022; exit: 0 0 0 0
11634.587_TTbar_14TeV+2021_Patatrack_AllTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:33:26 2022-date Thu Mar 31 10:30:53 2022; exit: 0 0 0 0
6 6 6 6 tests passed, 0 0 0 0 failed

Full offline reconstruction with all GPU sequences

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.591,11634.592,11634.593,11634.595,11634.596,11634.597
11634.591_TTbar_14TeV+2021_Patatrack_FullRecoCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:53:14 2022-date Thu Mar 31 09:48:31 2022; exit: 0 0 0 0
11634.592_TTbar_14TeV+2021_Patatrack_FullRecoGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:53:13 2022-date Thu Mar 31 09:48:32 2022; exit: 0 0 0 0
11634.593_TTbar_14TeV+2021_Patatrack_FullRecoGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:55:43 2022-date Thu Mar 31 09:51:49 2022; exit: 0 0 0 0
11634.595_TTbar_14TeV+2021_Patatrack_FullRecoTripletsCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:55:42 2022-date Thu Mar 31 09:51:49 2022; exit: 0 0 0 0
11634.596_TTbar_14TeV+2021_Patatrack_FullRecoTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:57:05 2022-date Thu Mar 31 09:53:13 2022; exit: 0 0 0 0
11634.597_TTbar_14TeV+2021_Patatrack_FullRecoTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:57:05 2022-date Thu Mar 31 09:53:14 2022; exit: 0 0 0 0
6 6 6 6 tests passed, 0 0 0 0 failed
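
The per-workflow result lines above follow a regular pattern (workflow name, one `StepN-STATUS` token per step, then the exit codes), so they can be checked mechanically. The following is a rough Python sketch under that assumption; `parse_summary_line` and `all_passed` are hypothetical helpers, not part of `runTheMatrix.py`, and the only status value seen in this PR is `PASSED`, so any other value is simply treated as not passing.

```python
import re

# Matches tokens such as "Step0-PASSED"; format inferred from the logs above.
STEP_RE = re.compile(r"Step(\d+)-(\w+)")


def parse_summary_line(line):
    """Return (workflow id, step statuses, exit codes) for one result line."""
    workflow_id = line.split("_", 1)[0]
    statuses = [status for _, status in STEP_RE.findall(line)]
    exit_part = line.rsplit("exit:", 1)[1]
    exit_codes = [int(code) for code in exit_part.split()]
    return workflow_id, statuses, exit_codes


def all_passed(line):
    """True if every step is PASSED and every exit code is zero."""
    _, statuses, exit_codes = parse_summary_line(line)
    return (
        bool(statuses)
        and all(status == "PASSED" for status in statuses)
        and all(code == 0 for code in exit_codes)
    )


# One result line copied from the Pixel-only test above.
line = (
    "11634.504_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU_Profiling"
    "+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano "
    "Step0-PASSED Step1-PASSED Step2-PASSED  - time date "
    "Thu Mar 31 09:38:08 2022-date Thu Mar 31 09:35:54 2022; exit: 0 0 0"
)
print(parse_summary_line(line), all_passed(line))
```

Profiling workflows report three steps and the others four, so the sketch does not assume a fixed number of steps.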

If this PR is a backport, please specify the original PR and why you need to backport it:

Backport of #37411.

@fwyzard
Contributor Author

fwyzard commented Mar 30, 2022

backport #37411

@fwyzard
Contributor Author

fwyzard commented Mar 30, 2022

enable gpu

@cmsbuild cmsbuild added this to the CMSSW_12_3_X milestone Mar 30, 2022
@fwyzard
Contributor Author

fwyzard commented Mar 30, 2022

please test

@cmsbuild
Contributor

cmsbuild commented Mar 30, 2022

A new Pull Request was created by @fwyzard (Andrea Bocci) for CMSSW_12_3_X.

It involves the following packages:

  • Configuration/PyReleaseValidation (pdmv, upgrade)

@jordan-martins, @bbilin, @wajidalikhan, @AdrianoDee, @srimanob, @kskovpen can you please review it and eventually sign? Thanks.
@makortel, @kpedro88, @Martin-Grunewald, @missirol, @fabiocos, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fwyzard fwyzard changed the title Add Pixel-only and HCAL-only validation workflows Add Pixel-only and HCAL-only validation workflows [12.3.x] Mar 30, 2022
@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-275236/23528/summary.html
COMMIT: 73d725f
CMSSW: CMSSW_12_3_X_2022-03-30-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37413/23528/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 2437
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 17437
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3697381
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3697357
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 204 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found
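
In the summary above, the DQMHistoTests failures, nulls, successes and skipped counts add up to the total number of histograms compared (2 + 0 + 3697357 + 22 = 3697381). That relation is inferred from the numbers in this thread, not from the documentation of the comparison tool. A minimal Python sketch of the cross-check, with hypothetical helper names `parse_dqm_counts` and `counts_are_consistent`:

```python
import re

# Captures the label and value of lines like "DQMHistoTests: Total failures: 2".
COUNT_RE = re.compile(r"DQMHistoTests: Total (.+?): (\d+)")


def parse_dqm_counts(summary):
    """Map each 'Total <label>' entry of a comparison summary to its integer value."""
    return {label: int(value) for label, value in COUNT_RE.findall(summary)}


def counts_are_consistent(counts):
    """Check the (assumed) relation failures + nulls + successes + skipped == total."""
    parts = (
        counts["failures"]
        + counts["nulls"]
        + counts["successes"]
        + counts["skipped"]
    )
    return parts == counts["histograms compared"]


# Values copied from the Comparison Summary above.
summary = """\
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3697381
DQMHistoTests: Total failures: 2
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3697357
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
"""

counts = parse_dqm_counts(summary)
print(counts_are_consistent(counts))
```

The same check holds for the GPU Comparison Summary in this comment (2437 + 0 + 17437 + 0 = 19874).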

@cmsbuild
Contributor

Pull request #37413 was updated. @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen can you please check and sign again.

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

please test

Add Pixel-only and HCAL-only validation workflows.
Add a single workflow running all GPU-enabled reconstruction.
Do not customise the HLT, as it already makes full use of GPU
reconstruction when the "gpu" modifier is enabled.
Add a short description before each workflow.
@fwyzard fwyzard force-pushed the add_Pixel_HCAL_validation_workflows_123x branch from ce086a6 to 187ef7e Compare March 31, 2022 08:26
@cmsbuild
Contributor

Pull request #37413 was updated. @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen can you please check and sign again.

@fwyzard fwyzard changed the title Add Pixel-only and HCAL-only validation workflows [12.3.x] Update all GPU workflows [12.3.x] Mar 31, 2022
@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

please test

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 11634.587
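
The `test parameters` comment above is a header followed by `key = value` bullets. As an illustration only, such a comment could be read into a dictionary as below; `parse_test_parameters` is a hypothetical helper and is not how cms-bot itself parses the comment.

```python
def parse_test_parameters(comment):
    """Collect 'key = value' bullet lines that follow a 'test parameters:' header."""
    params = {}
    in_block = False
    for raw in comment.splitlines():
        line = raw.strip()
        if line.lower().startswith("test parameters"):
            in_block = True
            continue
        if not in_block or "=" not in line:
            continue
        # Drop a leading bullet marker, then split on the first '=' only.
        key, value = line.lstrip("•-* ").split("=", 1)
        params[key.strip()] = value.strip()
    return params


# Comment text copied from this thread.
comment = """\
test parameters:

  • enable_tests = gpu
  • workflows_gpu = 11634.587
"""

print(parse_test_parameters(comment))
```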

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

please test

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

@smuzaffar did the last command request rerunning all the tests, or only the new one on GPU?

@smuzaffar
Contributor

@fwyzard, it runs all the default tests/workflows plus the extra test mentioned in the test parameters comment.

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

ah... OK, sorry for the duplicate work :-/

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-275236/23571/summary.html
COMMIT: 187ef7e
CMSSW: CMSSW_12_3_X_2022-03-31-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37413/23571/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-275236/11634.587_TTbar_14TeV+2021_Patatrack_AllTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 1470
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 18404
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3697381
  • DQMHistoTests: Total failures: 14
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3697345
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 204 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@srimanob
Contributor

srimanob commented Apr 3, 2022

+Upgrade

Backport PR.

@fwyzard
Contributor Author

fwyzard commented Apr 5, 2022

urgent

@qliphy
Contributor

qliphy commented Apr 6, 2022

kindly ping @cms-sw/pdmv-l2

@kskovpen
Contributor

kskovpen commented Apr 6, 2022

+pdmv

@cmsbuild
Contributor

cmsbuild commented Apr 6, 2022

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_3_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_4_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

perrotta commented Apr 6, 2022

+1

@cmsbuild cmsbuild merged commit cba6858 into cms-sw:CMSSW_12_3_X Apr 6, 2022
@fwyzard fwyzard deleted the add_Pixel_HCAL_validation_workflows_123x branch July 31, 2022 13:43
7 participants