[13_0_X] Fix cudaMemcpyAsync for localCoordToHostAsync #40870
Conversation
A new Pull Request was created by @AdrianoDee (Adriano Di Florio) for CMSSW_13_0_X. It involves the following packages:
@cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here
type bugfix
test parameters:
please test
cudaCheck(cudaMemcpyAsync(ret.get(), view().xLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits(), view().yLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 2, view().xerrLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 3, view().yerrLocal(), rowSize, cudaMemcpyDefault, stream));
this could be done in a single call as
size_t srcPitch = ptrdiff_t(src.yLocal()) - ptrdiff_t(src.xLocal());
cudaCheck(cudaMemcpy2DAsync(ret.get(), rowSize, view().xLocal(), srcPitch, rowSize, 4, cudaMemcpyDeviceToHost, stream));
But I admit I've never actually tried!
It works :)
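For reference, the suggested cudaMemcpy2DAsync pattern can be exercised outside CMSSW. The sketch below is a standalone toy, not the CMSSW code: the check macro, the buffer sizes and the fill pattern are made-up assumptions. It lays out four padded float columns on the device and gathers them into a packed host buffer with a single strided copy, the role the four per-column cudaMemcpyAsync calls play above.

// Standalone sketch: four padded device columns gathered into a packed host
// buffer with one cudaMemcpy2DAsync. Sizes, the check() macro and the fill
// pattern are illustrative assumptions, not the CMSSW code.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define check(call)                                                         \
  do {                                                                      \
    cudaError_t err_ = (call);                                              \
    if (err_ != cudaSuccess) {                                              \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_));   \
      std::exit(EXIT_FAILURE);                                              \
    }                                                                       \
  } while (0)

int main() {
  constexpr size_t nHits = 1000;   // valid hits per column
  constexpr size_t stride = 1024;  // padded column length in the device SoA
  constexpr size_t nCols = 4;      // xLocal, yLocal, xerrLocal, yerrLocal
  const size_t rowSize = nHits * sizeof(float);    // bytes actually wanted per column
  const size_t srcPitch = stride * sizeof(float);  // byte distance between columns

  // Device buffer: nCols padded columns back to back, filled with 1000*col + i.
  std::vector<float> h_init(nCols * stride);
  for (size_t c = 0; c < nCols; ++c)
    for (size_t i = 0; i < stride; ++i)
      h_init[c * stride + i] = 1000.f * c + i;
  float* d_soa = nullptr;
  check(cudaMalloc(&d_soa, h_init.size() * sizeof(float)));
  check(cudaMemcpy(d_soa, h_init.data(), h_init.size() * sizeof(float), cudaMemcpyHostToDevice));

  cudaStream_t stream;
  check(cudaStreamCreate(&stream));

  // One strided 2D copy: each "row" is one SoA column; source rows are srcPitch
  // apart (padded), destination rows are rowSize apart (packed).
  std::vector<float> h_out(nCols * nHits, -1.f);
  check(cudaMemcpy2DAsync(h_out.data(), rowSize, d_soa, srcPitch, rowSize, nCols,
                          cudaMemcpyDeviceToHost, stream));
  check(cudaStreamSynchronize(stream));

  // Element 7 of the third column should read 2007 if the gather was correct.
  std::printf("h_out[2 * nHits + 7] = %.0f (expected 2007)\n", h_out[2 * nHits + 7]);

  check(cudaStreamDestroy(stream));
  check(cudaFree(d_soa));
  return 0;
}

Note that the reviewer's suggestion infers srcPitch from the difference of two column pointers, which works as long as the SoA stores these columns contiguously with a uniform stride.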
backport of #40869
-1
Failed Tests: RelVals
RelVals
ValueError: Undefined workflows: 11634.592, 11634.596
GPU Comparison Summary:
test parameters:
please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b6113f/30897/summary.html
Comparison Summary:
GPU Comparison Summary:
519c140 to 1cbb29c
Pull request #40870 was updated. @cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please check and sign again.
code-checks
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40870/34371
(not launching the tests for the moment since #40563 (comment))
urgent
This fix should be included in
TSG reported large CPU-vs-GPU differences in HLT results with
[1] The numbers in the table are events accepted by the trigger
[2] # CMSSW_13_0_0_pre2 : /dev/CMSSW_13_0_0/GRun/V2
# CMSSW_13_0_0_pre3 : /dev/CMSSW_13_0_0/GRun/V6
# CMSSW_13_0_0_pre4 : /dev/CMSSW_13_0_0/GRun/V20
hltGetConfiguration /dev/CMSSW_13_0_0/GRun/V20 \
--globaltag 126X_dataRun3_HLT_v1 \
--data \
--no-prescale \
--no-output \
--paths HLT_Ele32_WPTight_Gsf_L1DoubleEG_v* \
--max-events 10000 \
--input \
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/276db176-1f2f-4ab6-83c9-e6c9da0fa82d.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2a67c409-d136-4f42-b0aa-906d185b84f8.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2b6f6c3a-0db3-44a6-bdc5-fe1c6adde598.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2be50467-e1cc-449f-9adc-6f2fb5d3dd72.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2c1f47f5-3bb1-4680-8b77-4aa121546d01.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2cc02a4a-506a-4b62-a56b-387f59ecdc8c.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2d5d783e-1a0f-405c-bfb4-61f87a0e9d2d.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2ec00523-9364-4447-b759-9aead39fe5a7.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2f792d66-d8d6-43ff-898b-1ac5ee551b05.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2fb10ea8-e1ca-4ae1-9e01-000ed7352ee8.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2fb62493-48fb-45a9-b313-60195343e9a5.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/349c288b-8bf7-4edf-ae2c-00e7d5ab4660.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/37b06d7a-b62a-4786-9023-1dac4b2e2c33.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/37b3068a-8982-4775-b1fd-517ce1b85d32.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/395b8860-fdb1-4b6b-83b5-727fe125ef07.root \
> hlt.py
cat <<@EOF >> hlt.py
#process.options.accelerators = ['cpu']
@EOF
cmsRun hlt.py &> hlt.log
+heterogeneous
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b6113f/30946/summary.html
Comparison Summary:
GPU Comparison Summary:
+reconstruction
This pull request is fully signed and it will be integrated in one of the next CMSSW_13_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_13_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
In TrackingRecHitSoADevice::localCoordToHostAsync, used in SiPixelRecHitFromCUDA to fill the legacy hits, the cudaMemcpyAsync calls do not take the SoA layout buffer padding into account, so they copy the wrong portions of memory. This was noted in #40604 and this change fixes it. The fix is quick and dirty, also given that this CUDA-to-legacy copy will be dropped soon.
This is a backport to 13_0_X of #40869.
PR validation:
Run 11634.59x.
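To make the padding issue in the description concrete, here is a minimal host-only sketch; the ToyView struct, nHits and the stride are invented for illustration and are not the CMSSW types. With padded columns, an offset computed from the first column using the packed size nHits lands inside the padding rather than at the start of the next column, which is the kind of wrong memory region the description refers to; taking each column pointer from the view avoids it.

// Host-only illustration of the padding pitfall; ToyView and the sizes are
// hypothetical stand-ins, not the CMSSW SoA.
#include <cstddef>
#include <cstdio>
#include <vector>

struct ToyView {
  std::vector<float> buf;  // two padded columns stored back to back
  std::size_t stride;      // padded column length (> number of valid hits)
  float* xLocal() { return buf.data(); }
  float* yLocal() { return buf.data() + stride; }
};

int main() {
  constexpr std::size_t nHits = 1000;   // valid hits
  constexpr std::size_t stride = 1024;  // padded column length
  ToyView view{std::vector<float>(2 * stride), stride};

  float* packedGuess = view.xLocal() + nHits;  // wrong: ignores the padding
  float* actual = view.yLocal();               // right: column pointer from the view

  // The packed guess starts (stride - nHits) floats too early, inside the padding.
  std::printf("offset error: %td floats\n", actual - packedGuess);
  return 0;
}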