
Update all GPU workflows #37411

Merged


@fwyzard
Contributor

fwyzard commented Mar 30, 2022

PR description:

Add Pixel-only and HCAL-only validation workflows:

  • Pixel only, on CPU and GPU, with GPU-vs-CPU validation: #.503 (quadruplets), #.507 (triplets)
  • HCAL only, on CPU and GPU, with GPU-vs-CPU validation: #.523

Enable the existing Pixel-only and HCAL-only profiling workflows:

  • Pixel only, on GPU (optionally): #.504 (quadruplets), #.508 (triplets)
  • HCAL only, on GPU (optionally): #.524

Add workflows that run all the GPU-enabled reconstruction (Pixel, ECAL, HCAL) together:

  • all, on CPU: #.581 (quadruplets) and #.585 (triplets)
  • all, on GPU (optionally): #.582 (quadruplets) and #.586 (triplets)
  • all, on CPU and GPU, with GPU-vs-CPU validation: #.583 (quadruplets) and #.587 (triplets)

Do not customise the HLT, as it already makes full use of GPU reconstruction when the "gpu" modifier is enabled.
Add a short description before each workflow.
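In the list above, `#` stands for the base workflow number. The validation runs below use 11634, the 2021 TTbar sample. The suffixes follow a regular pattern that can be read off the workflow names in those logs. Within each block, the CPU, GPU, GPU-vs-CPU validation and (where present) GPU profiling variants take consecutive suffixes. For the Pixel, combined and full-reconstruction blocks, the triplet variants sit four slots higher. The following is a minimal Python sketch, assuming that pattern, that rebuilds the value passed to `runTheMatrix.py -l`. The `SUFFIXES` table is hypothetical, transcribed by hand from the log lines, and the helper names are illustrative rather than part of CMSSW.

```python
# Hypothetical table of workflow suffixes, transcribed from the workflow names
# in the runTheMatrix.py logs of this PR; not read from CMSSW itself.
SUFFIXES = {
    "Pixel": {"CPU": 501, "GPU": 502, "GPU_Validation": 503, "GPU_Profiling": 504,
              "TripletsCPU": 505, "TripletsGPU": 506,
              "TripletsGPU_Validation": 507, "TripletsGPU_Profiling": 508},
    "ECAL": {"CPU": 511, "GPU": 512, "GPU_Validation": 513, "GPU_Profiling": 514},
    "HCAL": {"CPU": 521, "GPU": 522, "GPU_Validation": 523, "GPU_Profiling": 524},
    "All": {"CPU": 581, "GPU": 582, "GPU_Validation": 583,
            "TripletsCPU": 585, "TripletsGPU": 586, "TripletsGPU_Validation": 587},
    "FullReco": {"CPU": 591, "GPU": 592, "GPU_Validation": 593,
                 "TripletsCPU": 595, "TripletsGPU": 596, "TripletsGPU_Validation": 597},
}


def workflow_ids(base, group):
    """Return the workflow IDs of one block as strings such as '11634.503'."""
    return [f"{base}.{suffix}" for suffix in sorted(SUFFIXES[group].values())]


def matrix_list(base, group):
    """Join the IDs into the comma-separated value expected by -l."""
    return ",".join(workflow_ids(base, group))


if __name__ == "__main__":
    print("runTheMatrix.py -w gpu -j 4 -t 4 -l " + matrix_list(11634, "HCAL"))
```

Running it prints the same command used for the HCAL-only batch in the validation below.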

PR validation:

All new or updated GPU workflows ran successfully:

Pixel-only

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.501,11634.502,11634.503,11634.504,11634.505,11634.506,11634.507,11634.508
11634.501_TTbar_14TeV+2021_Patatrack_PixelOnlyCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:38:32 2022-date Thu Mar 31 09:35:53 2022; exit: 0 0 0 0
11634.502_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:38:34 2022-date Thu Mar 31 09:35:53 2022; exit: 0 0 0 0
11634.503_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:38:34 2022-date Thu Mar 31 09:35:54 2022; exit: 0 0 0 0
11634.504_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:38:08 2022-date Thu Mar 31 09:35:54 2022; exit: 0 0 0
11634.505_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:40:50 2022-date Thu Mar 31 09:38:08 2022; exit: 0 0 0 0
11634.506_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:41:11 2022-date Thu Mar 31 09:38:32 2022; exit: 0 0 0 0
11634.507_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:41:11 2022-date Thu Mar 31 09:38:34 2022; exit: 0 0 0 0
11634.508_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:40:46 2022-date Thu Mar 31 09:38:34 2022; exit: 0 0 0
8 8 8 8 tests passed, 0 0 0 0 failed
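Each result line above packs together the workflow ID and name, one status per step, two timestamps and a list of exit codes (presumably one per step). To make these logs easier to read, here is a minimal Python sketch that splits such a line into its fields. The regular expression is inferred from the lines in this PR rather than from any runTheMatrix.py specification. It also assumes the first timestamp is the end time and the second the start time, because the first one is the later of the two in every line here.

```python
import re
from datetime import datetime

# Pattern inferred from the result lines in this PR, e.g.
# "<id>_<name> Step0-PASSED ... - time date <t1>-date <t2>; exit: 0 0 0 0".
LINE_RE = re.compile(
    r"^(?P<wf>\d+\.\d+)_(?P<name>\S+)\s+(?P<steps>(?:Step\d+-\w+\s+)+)"
    r"-\s+time date (?P<end>.+?)-date (?P<start>.+?); exit: (?P<exit>[\d ]+)$"
)
DATE_FMT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Mar 31 09:38:32 2022"


def parse_result(line):
    """Split one runTheMatrix.py result line into its fields."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    # Assumption: the first timestamp is the end time, the second the start time.
    end = datetime.strptime(m["end"], DATE_FMT)
    start = datetime.strptime(m["start"], DATE_FMT)
    return {
        "workflow": m["wf"],
        "name": m["name"],
        "steps": [s.split("-", 1)[1] for s in m["steps"].split()],
        "exit_codes": [int(c) for c in m["exit"].split()],
        "seconds": int((end - start).total_seconds()),
    }


if __name__ == "__main__":
    line = (
        "11634.504_TTbar_14TeV+2021_Patatrack_PixelOnlyGPU_Profiling"
        "+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano "
        "Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:38:08 2022"
        "-date Thu Mar 31 09:35:54 2022; exit: 0 0 0"
    )
    r = parse_result(line)
    print(r["workflow"], r["steps"], r["seconds"])  # 11634.504 ['PASSED', 'PASSED', 'PASSED'] 134
```

Profiling workflows such as 11634.504 stop after three steps, so the parser does not assume a fixed number of steps.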

ECAL-only

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.511,11634.512,11634.513,11634.514
11634.511_TTbar_14TeV+2021_Patatrack_ECALOnlyCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:43:11 2022-date Thu Mar 31 09:40:46 2022; exit: 0 0 0 0
11634.512_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:43:20 2022-date Thu Mar 31 09:40:51 2022; exit: 0 0 0 0
11634.513_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:43:41 2022-date Thu Mar 31 09:41:12 2022; exit: 0 0 0 0
11634.514_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:43:18 2022-date Thu Mar 31 09:41:13 2022; exit: 0 0 0
4 4 4 4 tests passed, 0 0 0 0 failed

HCAL-only

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.521,11634.522,11634.523,11634.524
11634.521_TTbar_14TeV+2021_Patatrack_HCALOnlyCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:45:34 2022-date Thu Mar 31 09:43:11 2022; exit: 0 0 0 0
11634.522_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:45:45 2022-date Thu Mar 31 09:43:18 2022; exit: 0 0 0 0
11634.523_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:45:48 2022-date Thu Mar 31 09:43:21 2022; exit: 0 0 0 0
11634.524_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU_Profiling+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 31 09:45:44 2022-date Thu Mar 31 09:43:42 2022; exit: 0 0 0
4 4 4 4 tests passed, 0 0 0 0 failed

All GPU sequences

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.581,11634.582,11634.583,11634.585,11634.586,11634.587
11634.581_TTbar_14TeV+2021_Patatrack_AllCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:52 2022-date Thu Mar 31 10:28:11 2022; exit: 0 0 0 0
11634.582_TTbar_14TeV+2021_Patatrack_AllGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:52 2022-date Thu Mar 31 10:28:12 2022; exit: 0 0 0 0
11634.583_TTbar_14TeV+2021_Patatrack_AllGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:54 2022-date Thu Mar 31 10:28:12 2022; exit: 0 0 0 0
11634.585_TTbar_14TeV+2021_Patatrack_AllTripletsCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:30:52 2022-date Thu Mar 31 10:28:13 2022; exit: 0 0 0 0
11634.586_TTbar_14TeV+2021_Patatrack_AllTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:33:26 2022-date Thu Mar 31 10:30:53 2022; exit: 0 0 0 0
11634.587_TTbar_14TeV+2021_Patatrack_AllTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 10:33:26 2022-date Thu Mar 31 10:30:53 2022; exit: 0 0 0 0
6 6 6 6 tests passed, 0 0 0 0 failed

Full offline reconstruction with all GPU sequences

$ runTheMatrix.py -w gpu -j 4 -t 4 -l 11634.591,11634.592,11634.593,11634.595,11634.596,11634.597
11634.591_TTbar_14TeV+2021_Patatrack_FullRecoCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:53:14 2022-date Thu Mar 31 09:48:31 2022; exit: 0 0 0 0
11634.592_TTbar_14TeV+2021_Patatrack_FullRecoGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:53:13 2022-date Thu Mar 31 09:48:32 2022; exit: 0 0 0 0
11634.593_TTbar_14TeV+2021_Patatrack_FullRecoGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:55:43 2022-date Thu Mar 31 09:51:49 2022; exit: 0 0 0 0
11634.595_TTbar_14TeV+2021_Patatrack_FullRecoTripletsCPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:55:42 2022-date Thu Mar 31 09:51:49 2022; exit: 0 0 0 0
11634.596_TTbar_14TeV+2021_Patatrack_FullRecoTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:57:05 2022-date Thu Mar 31 09:53:13 2022; exit: 0 0 0 0
11634.597_TTbar_14TeV+2021_Patatrack_FullRecoTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 31 09:57:05 2022-date Thu Mar 31 09:53:14 2022; exit: 0 0 0 0
6 6 6 6 tests passed, 0 0 0 0 failed
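Each batch ends with a summary line that carries one count per step column. Summing these counts across the five batches above quickly confirms that all 28 workflows passed. The following is a minimal Python sketch, assuming each number in a summary line is a per-step tally; the helper names are illustrative.

```python
import re

# Summary lines printed at the end of each batch in this PR, e.g.
# "8 8 8 8 tests passed, 0 0 0 0 failed".
SUMMARY_RE = re.compile(r"^(?P<passed>[\d ]+?) tests passed, (?P<failed>[\d ]+?) failed$")

BATCHES = [
    "8 8 8 8 tests passed, 0 0 0 0 failed",  # Pixel-only
    "4 4 4 4 tests passed, 0 0 0 0 failed",  # ECAL-only
    "4 4 4 4 tests passed, 0 0 0 0 failed",  # HCAL-only
    "6 6 6 6 tests passed, 0 0 0 0 failed",  # all GPU sequences
    "6 6 6 6 tests passed, 0 0 0 0 failed",  # full offline reconstruction
]


def parse_summary(line):
    """Return (passed, failed) as lists with one count per step column."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    return ([int(n) for n in m["passed"].split()],
            [int(n) for n in m["failed"].split()])


def combine(lines):
    """Sum the per-step counts column by column across several batches."""
    parsed = [parse_summary(line) for line in lines]
    passed = [sum(col) for col in zip(*(p for p, _ in parsed))]
    failed = [sum(col) for col in zip(*(f for _, f in parsed))]
    return passed, failed


if __name__ == "__main__":
    print(combine(BATCHES))  # ([28, 28, 28, 28], [0, 0, 0, 0])
```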

@fwyzard
Contributor Author

fwyzard commented Mar 30, 2022

enable gpu

@fwyzard
Contributor Author

fwyzard commented Mar 30, 2022

please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37411/29084

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

  • Configuration/PyReleaseValidation (pdmv, upgrade)

@jordan-martins, @bbilin, @wajidalikhan, @AdrianoDee, @srimanob, @kskovpen can you please review it and eventually sign? Thanks.
@makortel, @kpedro88, @Martin-Grunewald, @missirol, @fabiocos, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Mar 30, 2022

please wait before merging this PR; I will push some more developments

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0acc53/23527/summary.html
COMMIT: 2ea647d
CMSSW: CMSSW_12_4_X_2022-03-30-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37411/23527/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 1036
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 18838
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found
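The DQM counters in these bot summaries can be cross-checked by hand. In every summary in this thread, failures + nulls + successes + skipped equals the number of histograms compared; for example, 1036 + 0 + 18838 + 0 = 19874 above. The following is a minimal Python sketch of that check, which also reports the failing fraction. The regular expression only targets the bullet format shown here and is not part of any cms-bot tooling, and the helper names are illustrative.

```python
import re

# Matches bullets such as "DQMHistoTests: Total failures: 1036".
FIELD_RE = re.compile(r"DQMHistoTests: Total (?P<key>[A-Za-z ]+?): (?P<value>\d+)")

SAMPLE = """
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 1036
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 18838
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
"""


def parse_dqm_summary(text):
    """Collect the 'DQMHistoTests: Total ...' counters into a dict."""
    return {m["key"]: int(m["value"]) for m in FIELD_RE.finditer(text)}


def failure_fraction(counts):
    """Check that the counters add up, then return failures / compared."""
    compared = counts["histograms compared"]
    accounted = (counts["failures"] + counts["nulls"]
                 + counts["successes"] + counts["skipped"])
    if accounted != compared:
        raise ValueError(f"{accounted} histograms accounted for, {compared} compared")
    return counts["failures"] / compared


if __name__ == "__main__":
    print(f"{failure_fraction(parse_dqm_summary(SAMPLE)):.2%}")  # 5.21%
```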

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3585896
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3585866
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37411/29105

@cmsbuild
Contributor

Pull request #37411 was updated. @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen can you please check and sign again.

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

please test

fwyzard changed the title from "Add Pixel-only and HCAL-only validation workflows" to "Update all GPU workflows" Mar 31, 2022
@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

test parameters:

  • workflows_gpu = 11634.587
  • enable_tests = gpu

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2022

@smuzaffar, after this PR is merged, we could replace the GPU tests (11634.506, 11634.512, 11634.522) with just 11634.587.

@smuzaffar
Contributor

please test
there was a bug in the bot, which is now fixed, so let's restart the tests

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0acc53/23562/summary.html
COMMIT: f1df4f8
CMSSW: CMSSW_12_4_X_2022-03-30-2300/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37411/23562/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-0acc53/11634.587_TTbar_14TeV+2021_Patatrack_AllTripletsGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 2246
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 17628
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3591311
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3591281
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@srimanob
Contributor

srimanob commented Apr 3, 2022

+Upgrade

This PR updates GPU workflows.

@fwyzard
Contributor Author

fwyzard commented Apr 4, 2022

@cms-sw/pdmv-l2, could you check this PR and let me know if you have any comments?

@kskovpen
Contributor

kskovpen commented Apr 4, 2022

+pdmv

@cmsbuild
Contributor

cmsbuild commented Apr 4, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Contributor Author

fwyzard commented Apr 5, 2022

urgent

the backport should be in 12.3.0

cmsbuild added the urgent label Apr 5, 2022
10824.506, 10824.507, 10824.508,
10824.512, 10824.513, 10824.514,
10824.522, 10824.523, 10824.524,
10824.582, 10824.583, # 10824.524,
Contributor


Out of curiosity: 10824.524 and 10824.528 here (and the same below, at lines 55-56) are just placeholders, aren't they?
To have the workflow numbers sequential in the tabs here, they should be 10824.584 and 10824.588 instead.
Note: I am NOT asking to modify anything here, just trying to understand the logic.

Contributor Author


you are right, the placeholder comments have the wrong numbers :-/
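The reviewer's point can be checked mechanically. In the combined block, the enabled suffixes are .581-.583 and .585-.587, so the two unused slots are exactly the numbers the placeholder comments should name. The following is a small Python sketch of that check. `missing_suffixes` is a hypothetical helper, and the assumption that the block spans eight slots (.581 to .588, like the Pixel block's .501 to .508) is labelled in the code.

```python
def missing_suffixes(enabled, first, last):
    """Return the suffixes between first and last (inclusive) not in enabled."""
    present = set(enabled)
    return [n for n in range(first, last + 1) if n not in present]


# Suffixes of the combined ("All") block, as listed in the PR description.
ALL_BLOCK = [581, 582, 583, 585, 586, 587]

if __name__ == "__main__":
    # Assumption: the block spans eight slots, .581 to .588, like .501 to .508.
    print([f"10824.{n}" for n in missing_suffixes(ALL_BLOCK, 581, 588)])
    # ['10824.584', '10824.588']
```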

@perrotta
Contributor

perrotta commented Apr 5, 2022

+1

cmsbuild merged commit 70954a6 into cms-sw:master Apr 5, 2022
fwyzard deleted the add_Pixel_HCAL_validation_workflows branch July 31, 2022 13:43

6 participants