Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the BeamSpotCUDA class #530

Conversation

fwyzard
Copy link

@fwyzard fwyzard commented Aug 4, 2020

Make the BeamSpotCUDA class movable and explicitly non-copiable (as was already the case due to the device::unique_ptr data member).

Remove the cudaMemcpyAsync from the BeamSpotCUDA data format, and make the copy explicitly in the BeamSpotToCUDA producer.

@fwyzard
Copy link
Author

fwyzard commented Aug 4, 2020

Requires (and includes) #529 .

@fwyzard fwyzard requested a review from makortel August 4, 2020 10:25
@fwyzard
Copy link
Author

fwyzard commented Aug 4, 2020

@AdrianoDee could you launch the validation of this PR ?

@fwyzard
Copy link
Author

fwyzard commented Aug 4, 2020

@makortel, assuming the tests run fine, do you think this is a reasonable approach ?

Copy link

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is certainly a reasonable approach. I can think some other ways, but I don't have a strong feeling between them. Assuming a long-term goal of a generic SoA that would internally manage its memory, the approach of this PR would be closest to that.

@@ -21,12 +20,22 @@ class BeamSpotCUDA {
};

BeamSpotCUDA() = default;
BeamSpotCUDA(Data const* data_h, cudaStream_t stream);
BeamSpotCUDA(cudaStream_t stream) { data_d_ = cms::cuda::make_device_unique<Data>(stream); }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud, an alternative would be to also allocate the data in the EDProducer and move in the cms::cuda::device::unique_ptr. Then the ptr() interface would not be needed.

I'm not convinced if that would be better though. E.g. I'd think that for a generic SoA the EDProducer should not have to deal with the memory allocation.

I suppose the main reason why allocation-in-EDProducer would look feasible here is the fact that BeamSpotCUDA is essentially a wrapper for cms::cuda::device::unique_ptr. Which leads me to another alternative of storing directly something along cms::cuda::device::unique_ptr<BeamSpotPOD> (assuming BeamSpotCUDA::Data would be renamed to, e.g., BeamSpotPOD).

@AdrianoDee

This comment has been minimized.

@cms-patatrack cms-patatrack deleted a comment from AdrianoDee Aug 5, 2020
@fwyzard

This comment has been minimized.

@fwyzard fwyzard force-pushed the patatrack_update_BeamSpotCUDA_data_format branch from 5950cb3 to 15bd6df Compare August 6, 2020 14:51
@fwyzard
Copy link
Author

fwyzard commented Aug 6, 2020

For whatever reason is was working fine in 11.1.x, re-running in 11.2.x gave the same errors.

The reason is that I was using the default constructor of BeamSpotCUDA by mistake, which did not allocate the device memory.

I've committed two fixes:

  • the default constructor allocates memory using the default CUDA stream
  • use the explicit constructor with the CUDA stream from the context :-)

Local test was fine on top of 11.2.x, I'll wait to re-test it once #531 has been merged.

@makortel
Copy link

makortel commented Aug 6, 2020

  • the default constructor allocates memory using the default CUDA stream

Could you elaborate why this would be needed? I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

@fwyzard
Copy link
Author

fwyzard commented Aug 6, 2020

I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

That's fine by me... in fact, do we need to have a default constructor at all ?

@makortel
Copy link

makortel commented Aug 6, 2020

I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

That's fine by me... in fact, do we need to have a default constructor at all ?

Default constructor is needed because of e.g. how edm::Wrapper works.

@makortel
Copy link

makortel commented Aug 6, 2020

I'd imagine any use of the default-constructed BeamSpotCUDA to be a runtime error.

That's fine by me... in fact, do we need to have a default constructor at all ?

Default constructor is needed because of e.g. how edm::Wrapper works.

Written that, if we'd really really want, we could work around this requirement (e.g. with std::optional in cms::cuda::Product).

@fwyzard
Copy link
Author

fwyzard commented Aug 6, 2020

Default constructor is needed because of e.g. how edm::Wrapper works.

Does it only need to exist, or is it actually called ?
That is, would it make sense to throw an exception in the default constructor ?

@makortel
Copy link

makortel commented Aug 6, 2020

Default constructor is needed because of e.g. how edm::Wrapper works.

Does it only need to exist, or is it actually called ?
That is, would it make sense to throw an exception in the default constructor ?

The default constructor is actually called.

@fwyzard
Copy link
Author

fwyzard commented Aug 6, 2020

The default constructor is actually called.

OK, then what is your preference ?

  • the default constructor does not allocate device memory (leading to an invalid object)
  • the default constructor allocates device memory on the default stream (leading to wasted allocations)

?

@makortel
Copy link

makortel commented Aug 6, 2020

The default constructor leading to "invalid object" is clearly what we do elsewhere. But the more I think about this, the more I start to like of changing


to std::optional<T>, which would allow storing objects without a default constructor. I think we should not even need any checks for the optional having a value. (or we could "fake" the optional with std::unique_ptr, although that would lead to additional indirection).

(maybe some day also edm::Wrapper could be changed)

@fwyzard
Copy link
Author

fwyzard commented Aug 6, 2020

The default constructor leading to "invalid object" is clearly what we do elsewhere.

OK, I'll make the change here as well.

Using std::optional<T> seems reasonable - we can remove the default constructor here when it's no longer necessary.

Looking at cms::cuda::Product, the header file says that the default constructor is "needed only for ROOT dictionary generation" - so I have the same question: is Product's default constructor actually called at runtime ?

@makortel
Copy link

makortel commented Aug 6, 2020

Looking at cms::cuda::Product, the header file says that the default constructor is "needed only for ROOT dictionary generation" - so I have the same question: is Product's default constructor actually called at runtime ?

The comment is actually imprecise, the Products default constructor is called at runtime (by edm::Wrapper). Actually trying to remove it won't compile.

Make the BeamSpotCUDA movable and explicitly non-copiable (as was
already the case due to the device::unique_ptr data member).

Remove the cudaMemcpyAsync from the BeamSpotCUDA data format, and move
it to the BeamSpotToCUDA producer.
@fwyzard fwyzard force-pushed the patatrack_update_BeamSpotCUDA_data_format branch from 15bd6df to b1cac7f Compare August 7, 2020 08:44
@fwyzard
Copy link
Author

fwyzard commented Aug 7, 2020

Validation summary

Reference release CMSSW_11_2_0_pre2 at 92997c6
Development branch cms-patatrack/CMSSW_11_2_X_Patatrack at 2222f36
Testing PRs:

Validation plots

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.885502.png
zoom-136.885502.png
scan-136.885512.png
zoom-136.885512.png
scan-136.885522.png
zoom-136.885522.png

logs and nvprof/nvvp profiles

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 1010536 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 937896 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 48488 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 48128 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 26512 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 25608 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/0c8200324f38446c04a2bb42be3e2cac651de3d2/log .

@cms-patatrack cms-patatrack deleted a comment from AdrianoDee Aug 7, 2020
@makortel
Copy link

makortel commented Aug 7, 2020

Looking at cms::cuda::Product, the header file says that the default constructor is "needed only for ROOT dictionary generation" - so I have the same question: is Product's default constructor actually called at runtime ?

The comment is actually imprecise, the Products default constructor is called at runtime (by edm::Wrapper). Actually trying to remove it won't compile.

On a further look it could be the comment could actually more accurate than I indicated above. I don't think we (CMS) really call the default constructor of a data product (edm::Wrapper is constructed on each event either by moving in an object, or by forwarding the arguments to the object's constructor). Or at least DXR did not point any calls to the edm::Wrapper's default constructor.

@fwyzard
Copy link
Author

fwyzard commented Aug 7, 2020

I'm not sure why "testing" is always faster than "development", even when there are no relevant changes in the code...

@fwyzard
Copy link
Author

fwyzard commented Aug 7, 2020

No impact on physics (as expected), probably no impact on timing (I don't trust the flat 10% improvement, since it applies also to the ECAL and HCAL workflows...)

@fwyzard fwyzard merged commit 0f6e945 into cms-patatrack:CMSSW_11_2_X_Patatrack Aug 7, 2020
@fwyzard fwyzard deleted the patatrack_update_BeamSpotCUDA_data_format branch August 7, 2020 18:15
fwyzard added a commit that referenced this pull request Aug 8, 2020
Make the BeamSpotCUDA movable and explicitly non-copiable (as was
already the case due to the device::unique_ptr data member).

Remove the cudaMemcpyAsync from the BeamSpotCUDA data format, and move
it to the BeamSpotToCUDA producer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants