[13_0_X] Fix cudaMemcpyAsync for localCoordToHostAsync #40870
Conversation
A new Pull Request was created by @AdrianoDee (Adriano Di Florio) for CMSSW_13_0_X. It involves the following packages:
@cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here
type bugfix
test parameters:
please test
cudaCheck(cudaMemcpyAsync(ret.get(), view().xLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits(), view().yLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 2, view().xerrLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 3, view().yerrLocal(), rowSize, cudaMemcpyDefault, stream));
this could be done in a single call as
size_t srcPitch = ptrdiff_t(src.yLocal()) - ptrdiff_t(src.xLocal());
cudaCheck(cudaMemcpy2DAsync(ret.get(), rowSize, view().xLocal(), srcPitch, rowSize, 4, cudaMemcpyDeviceToHost, stream));
But I admit I've never actually tried!
It works :)
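For reference, the suggested cudaMemcpy2DAsync pattern can be exercised outside CMSSW. The sketch below is a standalone toy, not the CMSSW code: the check macro, the buffer sizes and the fill pattern are made-up assumptions. It lays out four padded float columns on the device and gathers them into a packed host buffer with a single strided copy, the role the four per-column cudaMemcpyAsync calls play above.

// Standalone sketch: four padded device columns gathered into a packed host
// buffer with one cudaMemcpy2DAsync. Sizes, the check() macro and the fill
// pattern are illustrative assumptions, not the CMSSW code.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define check(call)                                                         \
  do {                                                                      \
    cudaError_t err_ = (call);                                              \
    if (err_ != cudaSuccess) {                                              \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_));   \
      std::exit(EXIT_FAILURE);                                              \
    }                                                                       \
  } while (0)

int main() {
  constexpr size_t nHits = 1000;   // valid hits per column
  constexpr size_t stride = 1024;  // padded column length in the device SoA
  constexpr size_t nCols = 4;      // xLocal, yLocal, xerrLocal, yerrLocal
  const size_t rowSize = nHits * sizeof(float);    // bytes actually wanted per column
  const size_t srcPitch = stride * sizeof(float);  // byte distance between columns

  // Device buffer: nCols padded columns back to back, filled with 1000*col + i.
  std::vector<float> h_init(nCols * stride);
  for (size_t c = 0; c < nCols; ++c)
    for (size_t i = 0; i < stride; ++i)
      h_init[c * stride + i] = 1000.f * c + i;
  float* d_soa = nullptr;
  check(cudaMalloc(&d_soa, h_init.size() * sizeof(float)));
  check(cudaMemcpy(d_soa, h_init.data(), h_init.size() * sizeof(float), cudaMemcpyHostToDevice));

  cudaStream_t stream;
  check(cudaStreamCreate(&stream));

  // One strided 2D copy: each "row" is one SoA column; source rows are srcPitch
  // apart (padded), destination rows are rowSize apart (packed).
  std::vector<float> h_out(nCols * nHits, -1.f);
  check(cudaMemcpy2DAsync(h_out.data(), rowSize, d_soa, srcPitch, rowSize, nCols,
                          cudaMemcpyDeviceToHost, stream));
  check(cudaStreamSynchronize(stream));

  // Element 7 of the third column should read 2007 if the gather was correct.
  std::printf("h_out[2 * nHits + 7] = %.0f (expected 2007)\n", h_out[2 * nHits + 7]);

  check(cudaStreamDestroy(stream));
  check(cudaFree(d_soa));
  return 0;
}

Note that the reviewer's suggestion infers srcPitch from the difference of two column pointers, which works as long as the SoA stores these columns contiguously with a uniform stride.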
backport of #40869
-1
Failed Tests: RelVals
RelVals
ValueError: Undefined workflows: 11634.592, 11634.596
GPU Comparison Summary:
test parameters:
please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b6113f/30897/summary.html
Comparison Summary:
GPU Comparison Summary:
519c140 to 1cbb29c
Pull request #40870 was updated. @cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please check and sign again.
code-checks
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40870/34371
(not launching the tests for the moment since #40563 (comment))
urgent
This fix should be included in
TSG reported large CPU-vs-GPU differences in HLT results with
[1] The numbers in the table are events accepted by the trigger
[2] # CMSSW_13_0_0_pre2 : /dev/CMSSW_13_0_0/GRun/V2
# CMSSW_13_0_0_pre3 : /dev/CMSSW_13_0_0/GRun/V6
# CMSSW_13_0_0_pre4 : /dev/CMSSW_13_0_0/GRun/V20
hltGetConfiguration /dev/CMSSW_13_0_0/GRun/V20 \
--globaltag 126X_dataRun3_HLT_v1 \
--data \
--no-prescale \
--no-output \
--paths HLT_Ele32_WPTight_Gsf_L1DoubleEG_v* \
--max-events 10000 \
--input \
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/276db176-1f2f-4ab6-83c9-e6c9da0fa82d.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2a67c409-d136-4f42-b0aa-906d185b84f8.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2b6f6c3a-0db3-44a6-bdc5-fe1c6adde598.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2be50467-e1cc-449f-9adc-6f2fb5d3dd72.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2c1f47f5-3bb1-4680-8b77-4aa121546d01.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2cc02a4a-506a-4b62-a56b-387f59ecdc8c.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2d5d783e-1a0f-405c-bfb4-61f87a0e9d2d.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2ec00523-9364-4447-b759-9aead39fe5a7.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2f792d66-d8d6-43ff-898b-1ac5ee551b05.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2fb10ea8-e1ca-4ae1-9e01-000ed7352ee8.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2fb62493-48fb-45a9-b313-60195343e9a5.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/349c288b-8bf7-4edf-ae2c-00e7d5ab4660.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/37b06d7a-b62a-4786-9023-1dac4b2e2c33.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/37b3068a-8982-4775-b1fd-517ce1b85d32.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/395b8860-fdb1-4b6b-83b5-727fe125ef07.root \
> hlt.py
cat <<@EOF >> hlt.py
#process.options.accelerators = ['cpu']
@EOF
cmsRun hlt.py &> hlt.log
+heterogeneous
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b6113f/30946/summary.html
Comparison Summary:
GPU Comparison Summary:
+reconstruction
This pull request is fully signed and it will be integrated in one of the next CMSSW_13_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_13_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
In TrackingRecHitSoADevice::localCoordToHostAsync, used in SiPixelRecHitFromCUDA to fill the legacy hits, the cudaMemcpyAsync calls do not take the SoA layout buffer padding into account, so they copy the wrong portions of memory. This was noted in #40604 and this change fixes it. The fix is quick and dirty, also given that this CUDA-to-legacy copy will be dropped soon.
This is a backport to 13_0_X of #40869.
PR validation:
Run 11634.59x.
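To make the padding issue in the description concrete, here is a minimal host-only sketch; the ToyView struct, nHits and the stride are invented for illustration and are not the CMSSW types. With padded columns, an offset computed from the first column using the packed size nHits lands inside the padding rather than at the start of the next column, which is the kind of wrong memory region the description refers to; taking each column pointer from the view avoids it.

// Host-only illustration of the padding pitfall; ToyView and the sizes are
// hypothetical stand-ins, not the CMSSW SoA.
#include <cstddef>
#include <cstdio>
#include <vector>

struct ToyView {
  std::vector<float> buf;  // two padded columns stored back to back
  std::size_t stride;      // padded column length (> number of valid hits)
  float* xLocal() { return buf.data(); }
  float* yLocal() { return buf.data() + stride; }
};

int main() {
  constexpr std::size_t nHits = 1000;   // valid hits
  constexpr std::size_t stride = 1024;  // padded column length
  ToyView view{std::vector<float>(2 * stride), stride};

  float* packedGuess = view.xLocal() + nHits;  // wrong: ignores the padding
  float* actual = view.yLocal();               // right: column pointer from the view

  // The packed guess starts (stride - nHits) floats too early, inside the padding.
  std::printf("offset error: %td floats\n", actual - packedGuess);
  return 0;
}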