[13_0_X] Fix cudaMemcpyAsync for localCoordToHostAsync #40870

Merged: 2 commits, Mar 1, 2023


@AdrianoDee AdrianoDee commented Feb 24, 2023

PR description:

In TrackingRecHitSoADevice::localCoordToHostAsync, which SiPixelRecHitFromCUDA uses to fill the legacy hits, cudaMemcpyAsync does not take the padding of the SoA layout buffer into account, so it copies the wrong portions of memory. This was noted in #40604, and this PR fixes it.

This fix is quick and dirty, also given that this CUDA-to-legacy copy will be dropped soon.

This is a back-port to 13_0_X of #40869.
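To illustrate the kind of problem described above, here is a minimal host-only sketch (no CUDA needed) of a padded SoA buffer. It is not the CMSSW code: the 128-byte alignment, the helper names (`paddedBytes`, `makePaddedSoA`, `copyAssumingContiguous`, `copyPerColumn`) and the fill values are all assumptions made for the example. It shows that a copy which treats the columns as contiguous picks up padding bytes and lands in the wrong column, while a copy that starts from each column's real offset does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical column alignment in bytes; the real SoA layout may use another value.
constexpr std::size_t kAlignment = 128;

// Round a byte count up to the next multiple of kAlignment.
constexpr std::size_t paddedBytes(std::size_t bytes) {
  return (bytes + kAlignment - 1) / kAlignment * kAlignment;
}

// Build a buffer with nCols float columns of nHits elements each.
// Every column starts on an aligned boundary; column c is filled with c + 1.
// Padding bytes are set to 0xFF so that they are easy to spot.
std::vector<unsigned char> makePaddedSoA(std::size_t nHits, std::size_t nCols) {
  const std::size_t pitch = paddedBytes(nHits * sizeof(float));
  std::vector<unsigned char> buf(pitch * nCols, static_cast<unsigned char>(0xFF));
  for (std::size_t c = 0; c < nCols; ++c) {
    for (std::size_t i = 0; i < nHits; ++i) {
      const float value = static_cast<float>(c + 1);
      std::memcpy(buf.data() + c * pitch + i * sizeof(float), &value, sizeof(float));
    }
  }
  return buf;
}

// Padding-unaware copy: assumes column c starts at c * nHits floats.
std::vector<float> copyAssumingContiguous(const std::vector<unsigned char>& soa,
                                          std::size_t nHits, std::size_t nCols) {
  std::vector<float> out(nHits * nCols);
  assert(out.size() * sizeof(float) <= soa.size());
  std::memcpy(out.data(), soa.data(), out.size() * sizeof(float));
  return out;
}

// Padding-aware copy: one copy per column, starting at the column's real offset.
std::vector<float> copyPerColumn(const std::vector<unsigned char>& soa,
                                 std::size_t nHits, std::size_t nCols) {
  const std::size_t rowSize = nHits * sizeof(float);
  const std::size_t pitch = paddedBytes(rowSize);
  std::vector<float> out(nHits * nCols);
  for (std::size_t c = 0; c < nCols; ++c) {
    std::memcpy(out.data() + c * nHits, soa.data() + c * pitch, rowSize);
  }
  return out;
}
```

With 10 hits and 4 columns, each column occupies 40 bytes of data followed by 88 bytes of padding, so the padding-unaware copy fills most of its output with padding and only reaches the start of the second column.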

PR validation:

Run 11634.59x.

@cmsbuild
Contributor

cmsbuild commented Feb 24, 2023

A new Pull Request was created by @AdrianoDee (Adriano Di Florio) for CMSSW_13_0_X.

It involves the following packages:

  • CUDADataFormats/TrackingRecHit (heterogeneous, reconstruction)

@cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please review it and eventually sign? Thanks.
@missirol, @rovere this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@AdrianoDee
Contributor Author

type bugfix

@AdrianoDee
Contributor Author

test parameters:

  • workflows = 11634.596, 11634.592
  • enable = gpu
  • relvals_opt_gpu = --what cleanedupgrade,standard,highstats,pileup,generator,extendedgen,production,ged,machine,premix,nano

@AdrianoDee
Contributor Author

please test

Comment on lines 51 to 55

cudaCheck(cudaMemcpyAsync(ret.get(), view().xLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits(), view().yLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 2, view().xerrLocal(), rowSize, cudaMemcpyDefault, stream));
cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 3, view().yerrLocal(), rowSize, cudaMemcpyDefault, stream));
Contributor

This could be done in a single call:

Suggested change
- cudaCheck(cudaMemcpyAsync(ret.get(), view().xLocal(), rowSize, cudaMemcpyDefault, stream));
- cudaCheck(cudaMemcpyAsync(ret.get() + nHits(), view().yLocal(), rowSize, cudaMemcpyDefault, stream));
- cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 2, view().xerrLocal(), rowSize, cudaMemcpyDefault, stream));
- cudaCheck(cudaMemcpyAsync(ret.get() + nHits() * 3, view().yerrLocal(), rowSize, cudaMemcpyDefault, stream));
+ size_t srcPitch = ptrdiff_t(view().yLocal()) - ptrdiff_t(view().xLocal());
+ cudaCheck(cudaMemcpy2DAsync(ret.get(), rowSize, view().xLocal(), srcPitch, rowSize, 4, cudaMemcpyDeviceToHost, stream));

But I admit I've never actually tried it!

Contributor Author

It works :)
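For readers unfamiliar with pitched copies, the single-call idea in this thread can be sketched on the host without CUDA. The helper below, `memcpy2DHost`, is a hypothetical stand-in that takes its arguments in the same order as `cudaMemcpy2D` (dst, dpitch, src, spitch, width, height) but is synchronous and host-only. `bytePitch` and `gatherColumns` are likewise illustrative names. The sketch assumes, as the suggestion does, that the columns being copied are laid out one after another with the same byte stride.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only emulation of a pitched 2D copy: copies `height` rows of
// `widthBytes` bytes each. Consecutive rows start `srcPitch` bytes apart in
// the source and `dstPitch` bytes apart in the destination.
void memcpy2DHost(void* dst, std::size_t dstPitch,
                  const void* src, std::size_t srcPitch,
                  std::size_t widthBytes, std::size_t height) {
  assert(dstPitch >= widthBytes && srcPitch >= widthBytes);
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(d + row * dstPitch, s + row * srcPitch, widthBytes);
  }
}

// Byte distance between the starts of two consecutive columns of one buffer.
std::size_t bytePitch(const float* first, const float* second) {
  return static_cast<std::size_t>(reinterpret_cast<const unsigned char*>(second) -
                                  reinterpret_cast<const unsigned char*>(first));
}

// Gather nCols equally spaced, padded columns into a tightly packed output.
// col0 and col1 point to the first two columns of the same buffer.
std::vector<float> gatherColumns(const float* col0, const float* col1,
                                 std::size_t nHits, std::size_t nCols) {
  const std::size_t rowSize = nHits * sizeof(float);  // payload bytes per column
  std::vector<float> out(nHits * nCols);
  // Destination rows are packed (pitch == rowSize); source rows are padded.
  memcpy2DHost(out.data(), rowSize, col0, bytePitch(col0, col1), rowSize, nCols);
  return out;
}
```

Here each SoA column plays the role of a "row" of the 2D copy: the width is the payload of one column, the source pitch includes the padding, and the destination pitch does not. If the columns were not equally spaced, one copy per column would be needed instead.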

@AdrianoDee
Contributor Author

backport of #40869

@cmsbuild
Contributor

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b6113f/30871/summary.html
COMMIT: 519c140
CMSSW: CMSSW_13_0_X_2023-02-23-2300/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/40870/30871/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

ValueError: Undefined workflows: 11634.592, 11634.596

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19862
  • DQMHistoTests: Total failures: 317
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19545
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 3 / 3 workflows

@missirol
Contributor

missirol commented Feb 24, 2023

test parameters:

  • enable = gpu
  • workflows_gpu = 10824.592,10824.593,10824.596,10824.597,11634.592,11634.593,11634.596,11634.597
  • relvals_opt_gpu = --what upgrade,standard,highstats,pileup,generator,extendedgen,production,ged,machine,premix,nano

@missirol
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b6113f/30897/summary.html
COMMIT: 519c140
CMSSW: CMSSW_13_0_X_2023-02-24-1100/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/40870/30897/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 9 lines to the logs
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3556944
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3556919
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • You potentially removed 267 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 48 differences found in the comparisons
  • DQMHistoTests: Total files compared: 8
  • DQMHistoTests: Total histograms compared: 495800
  • DQMHistoTests: Total failures: 1164
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 494636
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 7 files compared)
  • Checked 40 log files, 33 edm output root files, 8 DQM output files
  • TriggerResults: found differences in 3 / 7 workflows

@cmsbuild
Contributor

Pull request #40870 was updated. @cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please check and sign again.

@AdrianoDee
Contributor Author

code-checks

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40870/34371

  • This PR adds an extra 12KB to repository

@AdrianoDee
Contributor Author

(Not launching the tests for the moment, because of #40563 (comment).)

@missirol
Contributor

urgent

This fix should be included in CMSSW_13_0_0 (I guess I'm stating the obvious).

TSG reported large CPU-vs-GPU differences in HLT results with 13_0_0_pre4, and a quick check [1] shows that this fix reduces them.

[1]
Input: 10k events of run-362616 (Run2022G).
HLT menu: latest GRun menu of the given pre-release.
Example for 13_0_0_pre4 in [2].

The numbers in the table are events accepted by the trigger HLT_Ele32_WPTight_Gsf_L1DoubleEG_v* (chosen arbitrarily as an example). CMSSW_13_0_0_pre3 is the first (pre-)release which includes #40465.

| CMSSW version | CPU | GPU |
|---|---|---|
| CMSSW_13_0_0_pre2 | 19 | 19 |
| CMSSW_13_0_0_pre3 | 19 | 3 |
| CMSSW_13_0_0_pre4 | 19 | 2 |
| CMSSW_13_0_0_pre4 + #40870 | 19 | 19 |

[2]

# CMSSW_13_0_0_pre2 : /dev/CMSSW_13_0_0/GRun/V2
# CMSSW_13_0_0_pre3 : /dev/CMSSW_13_0_0/GRun/V6
# CMSSW_13_0_0_pre4 : /dev/CMSSW_13_0_0/GRun/V20

hltGetConfiguration /dev/CMSSW_13_0_0/GRun/V20 \
   --globaltag 126X_dataRun3_HLT_v1 \
   --data \
   --no-prescale \
   --no-output \
   --paths HLT_Ele32_WPTight_Gsf_L1DoubleEG_v* \
   --max-events 10000 \
   --input \
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/276db176-1f2f-4ab6-83c9-e6c9da0fa82d.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2a67c409-d136-4f42-b0aa-906d185b84f8.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2b6f6c3a-0db3-44a6-bdc5-fe1c6adde598.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2be50467-e1cc-449f-9adc-6f2fb5d3dd72.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2c1f47f5-3bb1-4680-8b77-4aa121546d01.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2cc02a4a-506a-4b62-a56b-387f59ecdc8c.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2d5d783e-1a0f-405c-bfb4-61f87a0e9d2d.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2ec00523-9364-4447-b759-9aead39fe5a7.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2f792d66-d8d6-43ff-898b-1ac5ee551b05.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2fb10ea8-e1ca-4ae1-9e01-000ed7352ee8.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/2fb62493-48fb-45a9-b313-60195343e9a5.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/349c288b-8bf7-4edf-ae2c-00e7d5ab4660.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/37b06d7a-b62a-4786-9023-1dac4b2e2c33.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/37b3068a-8982-4775-b1fd-517ce1b85d32.root,\
/store/data/Run2022G/EphemeralHLTPhysics0/RAW/v1/000/362/616/00000/395b8860-fdb1-4b6b-83b5-727fe125ef07.root \
   > hlt.py

cat <<@EOF >> hlt.py
#process.options.accelerators = ['cpu']
@EOF

cmsRun hlt.py &> hlt.log

@fwyzard
Contributor

fwyzard commented Feb 28, 2023

+heterogeneous

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b6113f/30946/summary.html
COMMIT: 013581a
CMSSW: CMSSW_13_0_X_2023-02-27-2300/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40870/30946/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3557190
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3557165
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • You potentially removed 227 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 160 differences found in the comparisons
  • DQMHistoTests: Total files compared: 8
  • DQMHistoTests: Total histograms compared: 495800
  • DQMHistoTests: Total failures: 10434
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 485366
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 7 files compared)
  • Checked 40 log files, 33 edm output root files, 8 DQM output files
  • TriggerResults: found differences in 3 / 7 workflows

@clacaputo
Contributor

+reconstruction

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_13_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_13_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

perrotta commented Mar 1, 2023

+1
