Update the BeamSpotCUDA class #530

fwyzard · 2020-08-04T10:24:41Z

Make the BeamSpotCUDA class movable and explicitly non-copiable (as was already the case due to the device::unique_ptr data member).

Remove the cudaMemcpyAsync from the BeamSpotCUDA data format, and make the copy explicitly in the BeamSpotToCUDA producer.

fwyzard · 2020-08-04T10:24:55Z

Requires (and includes) #529 .

fwyzard · 2020-08-04T10:25:36Z

@AdrianoDee could you launch the validation of this PR ?

fwyzard · 2020-08-04T20:07:45Z

@makortel, assuming the tests run fine, do you think this is a reasonable approach ?

makortel

It is certainly a reasonable approach. I can think some other ways, but I don't have a strong feeling between them. Assuming a long-term goal of a generic SoA that would internally manage its memory, the approach of this PR would be closest to that.

makortel · 2020-08-04T20:46:41Z

CUDADataFormats/BeamSpot/interface/BeamSpotCUDA.h

@@ -21,12 +20,22 @@ class BeamSpotCUDA {
  };

  BeamSpotCUDA() = default;
-  BeamSpotCUDA(Data const* data_h, cudaStream_t stream);
+  BeamSpotCUDA(cudaStream_t stream) { data_d_ = cms::cuda::make_device_unique<Data>(stream); }


Just thinking out loud, an alternative would be to also allocate the data in the EDProducer and move in the cms::cuda::device::unique_ptr. Then the ptr() interface would not be needed.

I'm not convinced if that would be better though. E.g. I'd think that for a generic SoA the EDProducer should not have to deal with the memory allocation.

I suppose the main reason why allocation-in-EDProducer would look feasible here is the fact that BeamSpotCUDA is essentially a wrapper for cms::cuda::device::unique_ptr. Which leads me to another alternative of storing directly something along cms::cuda::device::unique_ptr<BeamSpotPOD> (assuming BeamSpotCUDA::Data would be renamed to, e.g., BeamSpotPOD).

fwyzard · 2020-08-06T15:18:47Z

For whatever reason is was working fine in 11.1.x, re-running in 11.2.x gave the same errors.

The reason is that I was using the default constructor of BeamSpotCUDA by mistake, which did not allocate the device memory.

I've committed two fixes:

the default constructor allocates memory using the default CUDA stream
use the explicit constructor with the CUDA stream from the context :-)

Local test was fine on top of 11.2.x, I'll wait to re-test it once #531 has been merged.

makortel · 2020-08-06T15:26:03Z

the default constructor allocates memory using the default CUDA stream

Could you elaborate why this would be needed? I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

fwyzard · 2020-08-06T15:35:30Z

I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

That's fine by me... in fact, do we need to have a default constructor at all ?

makortel · 2020-08-06T15:39:43Z

I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

That's fine by me... in fact, do we need to have a default constructor at all ?

Default constructor is needed because of e.g. how edm::Wrapper works.

makortel · 2020-08-06T15:44:17Z

I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

That's fine by me... in fact, do we need to have a default constructor at all ?

Default constructor is needed because of e.g. how edm::Wrapper works.

Written that, if we'd really really want, we could work around this requirement (e.g. with std::optional in cms::cuda::Product).

fwyzard · 2020-08-06T16:19:38Z

Default constructor is needed because of e.g. how edm::Wrapper works.

Does it only need to exist, or is it actually called ?
That is, would it make sense to throw an exception in the default constructor ?

makortel · 2020-08-06T18:13:28Z

Default constructor is needed because of e.g. how edm::Wrapper works.

Does it only need to exist, or is it actually called ?
That is, would it make sense to throw an exception in the default constructor ?

The default constructor is actually called.

fwyzard · 2020-08-06T19:20:37Z

The default constructor is actually called.

OK, then what is your preference ?

the default constructor does not allocate device memory (leading to an invalid object)
the default constructor allocates device memory on the default stream (leading to wasted allocations)

?

makortel · 2020-08-06T23:10:29Z

The default constructor leading to "invalid object" is clearly what we do elsewhere. But the more I think about this, the more I start to like of changing

cmssw/CUDADataFormats/Common/interface/Product.h

Line 55 in 2222f36

T data_; //!

to std::optional<T>, which would allow storing objects without a default constructor. I think we should not even need any checks for the optional having a value. (or we could "fake" the optional with std::unique_ptr, although that would lead to additional indirection).

(maybe some day also edm::Wrapper could be changed)

fwyzard · 2020-08-06T23:18:07Z

The default constructor leading to "invalid object" is clearly what we do elsewhere.

OK, I'll make the change here as well.

Using std::optional<T> seems reasonable - we can remove the default constructor here when it's no longer necessary.

Looking at cms::cuda::Product, the header file says that the default constructor is "needed only for ROOT dictionary generation" - so I have the same question: is Product's default constructor actually called at runtime ?

makortel · 2020-08-06T23:25:41Z

Looking at cms::cuda::Product, the header file says that the default constructor is "needed only for ROOT dictionary generation" - so I have the same question: is Product's default constructor actually called at runtime ?

The comment is actually imprecise, the Products default constructor is called at runtime (by edm::Wrapper). Actually trying to remove it won't compile.

Make the BeamSpotCUDA movable and explicitly non-copiable (as was already the case due to the device::unique_ptr data member). Remove the cudaMemcpyAsync from the BeamSpotCUDA data format, and move it to the BeamSpotToCUDA producer.

fwyzard · 2020-08-07T09:57:05Z

Validation summary

Reference release CMSSW_11_2_0_pre2 at 92997c6
Development branch cms-patatrack/CMSSW_11_2_X_Patatrack at 2222f36
Testing PRs:

Update the BeamSpotCUDA class #530 at b1cac7f

Validation plots

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 11634.5
tracking validation plots and summary for workflow 11634.501
tracking validation plots and summary for workflow 11634.502

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 11634.5
tracking validation plots and summary for workflow 11634.501
tracking validation plots and summary for workflow 11634.502

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 11634.5
tracking validation plots and summary for workflow 11634.501
tracking validation plots and summary for workflow 11634.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

logs and `nvprof`/`nvvp` profiles

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 11634.5
- ✔️ step3.py: log
development release, workflow 11634.5
- ✔️ step3.py: log
development release, workflow 11634.501
- ✔️ step3.py: log
development release, workflow 11634.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 11634.511
- ✔️ step3.py: log
development release, workflow 11634.512
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
development release, workflow 11634.521
- ✔️ step3.py: log
development release, workflow 11634.522
- ✔️ step3.py: log
- ✔️ profile.py: log
- ❌ cuda-memcheck --tool initcheck (report, log) found 1010536 errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
development release, workflow 136.885502
development release, workflow 136.885512
development release, workflow 136.885522
testing release, workflow 11634.5
- ✔️ step3.py: log
testing release, workflow 11634.501
- ✔️ step3.py: log
testing release, workflow 11634.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 11634.511
- ✔️ step3.py: log
testing release, workflow 11634.512
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
testing release, workflow 11634.521
- ✔️ step3.py: log
testing release, workflow 11634.522
- ✔️ step3.py: log
- ✔️ profile.py: log
- ❌ cuda-memcheck --tool initcheck (report, log) found 937896 errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
testing release, workflow 136.885502
testing release, workflow 136.885512
testing release, workflow 136.885522

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 11634.5
- ✔️ step3.py: log
development release, workflow 11634.5
- ✔️ step3.py: log
development release, workflow 11634.501
- ✔️ step3.py: log
development release, workflow 11634.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 11634.511
- ✔️ step3.py: log
development release, workflow 11634.512
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
development release, workflow 11634.521
- ✔️ step3.py: log
development release, workflow 11634.522
- ✔️ step3.py: log
- ✔️ profile.py: log
- ❌ cuda-memcheck --tool initcheck (report, log) found 48488 errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
development release, workflow 136.885512
development release, workflow 136.885522
testing release, workflow 11634.5
- ✔️ step3.py: log
testing release, workflow 11634.501
- ✔️ step3.py: log
testing release, workflow 11634.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 11634.511
- ✔️ step3.py: log
testing release, workflow 11634.512
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
testing release, workflow 11634.521
- ✔️ step3.py: log
testing release, workflow 11634.522
- ✔️ step3.py: log
- ✔️ profile.py: log
- ❌ cuda-memcheck --tool initcheck (report, log) found 48128 errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502
testing release, workflow 136.885512
testing release, workflow 136.885522

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 11634.5
- ✔️ step3.py: log
development release, workflow 11634.5
- ✔️ step3.py: log
development release, workflow 11634.501
- ✔️ step3.py: log
development release, workflow 11634.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 11634.511
- ✔️ step3.py: log
development release, workflow 11634.512
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
development release, workflow 11634.521
- ✔️ step3.py: log
development release, workflow 11634.522
- ✔️ step3.py: log
- ✔️ profile.py: log
- ❌ cuda-memcheck --tool initcheck (report, log) found 26512 errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.885502
development release, workflow 136.885512
development release, workflow 136.885522
testing release, workflow 11634.5
- ✔️ step3.py: log
testing release, workflow 11634.501
- ✔️ step3.py: log
testing release, workflow 11634.502
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 11634.511
- ✔️ step3.py: log
testing release, workflow 11634.512
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ❌ cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
testing release, workflow 11634.521
- ✔️ step3.py: log
testing release, workflow 11634.522
- ✔️ step3.py: log
- ✔️ profile.py: log
- ❌ cuda-memcheck --tool initcheck (report, log) found 25608 errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.885502
testing release, workflow 136.885512
testing release, workflow 136.885522

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/0c8200324f38446c04a2bb42be3e2cac651de3d2/log .

makortel · 2020-08-07T13:58:32Z

Looking at cms::cuda::Product, the header file says that the default constructor is "needed only for ROOT dictionary generation" - so I have the same question: is Product's default constructor actually called at runtime ?

The comment is actually imprecise, the Products default constructor is called at runtime (by edm::Wrapper). Actually trying to remove it won't compile.

On a further look it could be the comment could actually more accurate than I indicated above. I don't think we (CMS) really call the default constructor of a data product (edm::Wrapper is constructed on each event either by moving in an object, or by forwarding the arguments to the object's constructor). Or at least DXR did not point any calls to the edm::Wrapper's default constructor.

fwyzard · 2020-08-07T18:10:28Z

I'm not sure why "testing" is always faster than "development", even when there are no relevant changes in the code...

fwyzard · 2020-08-07T18:12:34Z

No impact on physics (as expected), probably no impact on timing (I don't trust the flat 10% improvement, since it applies also to the ECAL and HCAL workflows...)

Make the BeamSpotCUDA movable and explicitly non-copiable (as was already the case due to the device::unique_ptr data member). Remove the cudaMemcpyAsync from the BeamSpotCUDA data format, and move it to the BeamSpotToCUDA producer.

fwyzard added the Beam spot label Aug 4, 2020

fwyzard requested a review from makortel August 4, 2020 10:25

fwyzard mentioned this pull request Aug 4, 2020

Add host-to-device async copy for host::noncached::unique_ptr #529

Merged

makortel approved these changes Aug 4, 2020

View reviewed changes

This comment has been minimized.

Sign in to view

cms-patatrack deleted a comment from AdrianoDee Aug 5, 2020

This comment has been minimized.

Sign in to view

fwyzard force-pushed the patatrack_update_BeamSpotCUDA_data_format branch from 5950cb3 to 15bd6df Compare August 6, 2020 14:51

Update the BeamSpotCUDA class

b1cac7f

Make the BeamSpotCUDA movable and explicitly non-copiable (as was already the case due to the device::unique_ptr data member). Remove the cudaMemcpyAsync from the BeamSpotCUDA data format, and move it to the BeamSpotToCUDA producer.

fwyzard force-pushed the patatrack_update_BeamSpotCUDA_data_format branch from 15bd6df to b1cac7f Compare August 7, 2020 08:44

cms-patatrack deleted a comment from AdrianoDee Aug 7, 2020

fwyzard merged commit 0f6e945 into cms-patatrack:CMSSW_11_2_X_Patatrack Aug 7, 2020

fwyzard deleted the patatrack_update_BeamSpotCUDA_data_format branch August 7, 2020 18:15

fwyzard mentioned this pull request Aug 13, 2020

Patatrack integration - GPU beamspot data format and transfer (4/N) cms-sw/cmssw#31130

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update the BeamSpotCUDA class #530

Update the BeamSpotCUDA class #530

fwyzard commented Aug 4, 2020

fwyzard commented Aug 4, 2020

fwyzard commented Aug 4, 2020

fwyzard commented Aug 4, 2020

makortel left a comment

makortel Aug 4, 2020

This comment has been minimized.

This comment has been minimized.

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020 •

edited

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 7, 2020 •

edited

makortel commented Aug 7, 2020

fwyzard commented Aug 7, 2020

fwyzard commented Aug 7, 2020

Update the BeamSpotCUDA class #530

Update the BeamSpotCUDA class #530

Conversation

fwyzard commented Aug 4, 2020

fwyzard commented Aug 4, 2020

fwyzard commented Aug 4, 2020

fwyzard commented Aug 4, 2020

makortel left a comment

Choose a reason for hiding this comment

makortel Aug 4, 2020

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020 • edited

makortel commented Aug 6, 2020

fwyzard commented Aug 6, 2020

makortel commented Aug 6, 2020

fwyzard commented Aug 7, 2020 • edited

Validation summary

Validation plots

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

logs and nvprof/nvvp profiles

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

Logs

makortel commented Aug 7, 2020

fwyzard commented Aug 7, 2020

fwyzard commented Aug 7, 2020

fwyzard commented Aug 6, 2020 •

edited

fwyzard commented Aug 7, 2020 •

edited

logs and `nvprof`/`nvvp` profiles