Load CUDAService from Services_cff, and only if gpu modifier is active #432

makortel · 2019-12-17T22:28:46Z

PR description:

PR validation:

Unit tests run, profiling workflow runs (without explicit load of CUDAService).

AdrianoDee · 2020-01-15T12:48:46Z

Validation summary

Reference release CMSSW_11_0_0_pre13 at 91be707
Development branch cms-patatrack/CMSSW_11_0_X_Patatrack at e41560b
Testing PRs:

Load CUDAService from Services_cff, and only if gpu modifier is active #432 at 2d741b4

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.501
tracking validation plots and summary for workflow 10824.502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.501
tracking validation plots and summary for workflow 10824.502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.501
tracking validation plots and summary for workflow 10824.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

logs and `nvprof`/`nvvp` profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.501
- ✔️ step3.py: log
development release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
testing release, workflow 10824.5
- ❌ step3.py: log
testing release, workflow 10824.501
- ❌ step3.py: log
testing release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.501
- ✔️ step3.py: log
development release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
testing release, workflow 10824.5
- ❌ step3.py: log
testing release, workflow 10824.501
- ❌ step3.py: log
testing release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.501
- ✔️ step3.py: log
development release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
testing release, workflow 10824.5
- ❌ step3.py: log
testing release, workflow 10824.501
- ❌ step3.py: log
testing release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/9a81de42a2a27335a8b70048031c655b2b7ea4b9/log .

fwyzard · 2020-01-15T14:11:36Z

This breaks the .5 and .501 workflows.
It looks like the reason is that the BeamSpotToCUDA module is included in those, even though they do not enable running on the GPU.

makortel · 2020-01-15T14:59:00Z

Thanks. So this

cmssw/RecoVertex/BeamSpotProducer/python/BeamSpot_cff.py, lines 8 to 11 in e41560b:

    offlineBeamSpotTask = cms.Task(
        offlineBeamSpot,
        offlineBeamSpotCUDA
    )

should be changed to something along the lines of

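    from Configuration.ProcessModifiers.gpu_cff import gpu

    # Default task: CPU beam spot only
    offlineBeamSpotTask = cms.Task(offlineBeamSpot)
    # With the gpu modifier active, also run the CUDA transfer module
    _offlineBeamSpotTask_gpu = offlineBeamSpotTask.clone()
    _offlineBeamSpotTask_gpu.add(offlineBeamSpotCUDA)
    gpu.toReplaceWith(offlineBeamSpotTask, _offlineBeamSpotTask_gpu)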

I'll include that in this PR once I finish updating #429.
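
To make the intent of the proposed change easier to follow outside a CMSSW environment, here is a minimal stand-alone sketch of the pattern. `ToyTask`, `ToyModifier` and `build_beamspot_task` are hypothetical stand-ins written for this illustration; they only mimic the behaviour discussed in this thread and are not the real `cms.Task` or modifier classes.

```python
class ToyTask:
    """Hypothetical stand-in for a task: a set of module labels."""

    def __init__(self, *modules):
        self.modules = set(modules)

    def add(self, *modules):
        self.modules.update(modules)

    def clone(self):
        # Independent copy, so edits to the clone do not touch the original.
        return ToyTask(*self.modules)


class ToyModifier:
    """Hypothetical stand-in for a process modifier such as `gpu`."""

    def __init__(self, active):
        self.active = active

    def toReplaceWith(self, target, replacement):
        # Swap the contents of `target` only when the modifier is active.
        if self.active:
            target.modules = set(replacement.modules)


def build_beamspot_task(gpu_active):
    gpu = ToyModifier(gpu_active)
    task = ToyTask("offlineBeamSpot")
    task_gpu = task.clone()
    task_gpu.add("offlineBeamSpotCUDA")
    gpu.toReplaceWith(task, task_gpu)
    return task


cpu_task = build_beamspot_task(gpu_active=False)
gpu_task = build_beamspot_task(gpu_active=True)
print(sorted(cpu_task.modules))
print(sorted(gpu_task.modules))
```

In this toy model the CUDA module label ends up in the task only when the modifier is active, which is the behaviour the .5 and .501 workflows need.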

makortel · 2020-01-15T20:13:38Z

> I'll include that in this PR once I finish updating #429.

Done.

AdrianoDee · 2020-01-15T20:23:27Z

🚧 Validation running at fu-c2a02-37-02:/data/user/adiflori/patatrack-validation/run.lDELclKDOR ...

AdrianoDee · 2020-01-15T22:23:08Z

Validation summary

Reference release CMSSW_11_0_0_pre13 at 91be707
Development branch cms-patatrack/CMSSW_11_0_X_Patatrack at e41560b
Testing PRs:

Load CUDAService from Services_cff, and only if gpu modifier is active #432 at 458ad2e

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.501
tracking validation plots and summary for workflow 10824.502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.501
tracking validation plots and summary for workflow 10824.502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.501
tracking validation plots and summary for workflow 10824.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

logs and `nvprof`/`nvvp` profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.501
- ✔️ step3.py: log
development release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.501
- ✔️ step3.py: log
testing release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.501
- ✔️ step3.py: log
development release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.501
- ✔️ step3.py: log
testing release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.501
- ✔️ step3.py: log
development release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.501
- ✔️ step3.py: log
testing release, workflow 10824.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/6706d025121ec7ef85a53d727ab75afa038c34c2/log .

makortel · 2020-01-15T22:26:14Z

Thanks, the .5 and .501 workflows work now, and there is no impact on the performance (as expected).

fwyzard · 2020-01-16T10:16:01Z

RecoVertex/BeamSpotProducer/python/BeamSpot_cff.py

-    offlineBeamSpotCUDA
-)
+from Configuration.ProcessModifiers.gpu_cff import gpu
+offlineBeamSpotCUDA = _beamSpotToCUDA.clone()


I'm not suggesting to change this, just trying to understand: would it be equivalent to replace

from RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi import beamSpotToCUDA as _beamSpotToCUDA
offlineBeamSpotCUDA = _beamSpotToCUDA.clone()

with

from RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi import beamSpotToCUDA as offlineBeamSpotCUDA

?

makortel · 2020-01-16

The two are not equivalent. The first one creates a copy (clone) of RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA, so if BeamSpot_cff.py did something like

offlineBeamSpotCUDA.src = "foo"

that change would not propagate to other configurations that use RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA.

The second one uses the very same object as RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA, so any change to offlineBeamSpotCUDA does propagate to other configurations that use RecoVertex.BeamSpotProducer.beamSpotToCUDA_cfi.beamSpotToCUDA, which could be unexpected.

In this specific case there is little practical difference, so cloning is mostly a matter of following the recommended general pattern (it also protects against the case where someone else uses the second approach and modifies a parameter).
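
The difference between the two approaches can be reproduced in plain Python. The sketch below uses a hypothetical `ToyModule` class in place of a real CMSSW module, and the `src` values are placeholders chosen for the illustration; it only shows the object-sharing behaviour described above.

```python
import copy


class ToyModule:
    """Hypothetical stand-in for a configurable module with parameters."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def clone(self):
        # Deep copy, so the clone shares no state with the original.
        return copy.deepcopy(self)


# Plays the role of the object defined in the _cfi file (placeholder value).
beamSpotToCUDA = ToyModule(src="offlineBeamSpot")

# Approach 1: import under a private name, then clone.
cloned = beamSpotToCUDA.clone()
cloned.src = "foo"
src_after_clone_edit = beamSpotToCUDA.src  # original is untouched

# Approach 2: import under a new name; this is the very same object.
aliased = beamSpotToCUDA
aliased.src = "bar"
src_after_alias_edit = beamSpotToCUDA.src  # original has changed

print(src_after_clone_edit)
print(src_after_alias_edit)
```

Editing the clone leaves the shared object alone, while editing the alias changes it for every other user of that object.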

Load CUDAService from Services_cff, and only if gpu modifier is active

2d741b4

Include offlineBeamSpotCUDA only if gpu Modifier is active

458ad2e

fwyzard reviewed Jan 16, 2020

fwyzard merged commit a68d9b8 into cms-patatrack:CMSSW_11_0_X_Patatrack Jan 16, 2020

makortel mentioned this pull request Jan 21, 2020

Simplify offlineBeamSpotCUDA configuration #433

Merged

fwyzard mentioned this pull request Aug 13, 2020

Patatrack integration - GPU beamspot data format and transfer (4/N) cms-sw/cmssw#31130

Merged
