Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Alpaka ESProducer model by making the data copies to device implicit #40403

Merged
merged 3 commits into from Feb 3, 2023

Conversation

makortel
Copy link
Contributor

@makortel makortel commented Dec 22, 2022

PR description:

This PR simplifies the Alpaka ESProducer model by making it more strict towards the use case of being able to to use the host backend(s) and a device backend (of one platform) in the same job (for which a test configuration is added).

In this PR this goal is achieved by making each Alpaka ESProducer to produce the "simplified" ESProduct in the host memory, after which the ESProduct is implicitly copied to the memory of the device of the ESProducer's backend (if the job has a device memory space consumer for the ESProduct). The host-side ESProduct is also available to be used.

The TransferToHost class template specialization (introduced) in #39248 is replaced with copyToHostAsync(TQueue&, TSrc const&) -> TDst function overload. For host-to-device copies a similar copyToDeviceAsync(TQueue&, TSrc const&) -> TDst function overload is required. Here TSrc is the data product type in the source memory space, and TDst the type in the destination memory space. (I felt the function overload to be a little bit simpler and more expressive than the earlier class template specialization approach) renamed to CopyToHost, and analoguous CopyToDevice is added for host-to-device copies. (reason for moving back to class template specialization from function overloads is described in #40403 (comment))

In practice the ALPAKA_ACCELERATOR_NAMESPACE::ESProducer::setWhatProduced() (for host-side Records) registers another callback function (implemented as lambda) to copy the host-side data product to the device memory. For host backends the copy is avoided, as the data product produced by the user-registered callback is used directly.

The earlier "ESProduct model 1", where the re-formatted host ESProduct was not accessible for other purposes than the data copy, is removed because of being incompatible with the model here. The PortableCollection-based "model 3" is renamed to "model 1" in the comments throughout the code.

Given that this change comes in quite late in the porting effort, I'd be happy to help any porting project to migrate from the previous ESProducer model (from #39248) to this one.

PR validation:

Unit tests pass on machines with and without GPUs.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40403/33525

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40403/33526

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

  • DataFormats/Portable (heterogeneous)
  • HeterogeneousCore/AlpakaCore (heterogeneous)
  • HeterogeneousCore/AlpakaInterface (heterogeneous)
  • HeterogeneousCore/AlpakaTest (heterogeneous)

@cmsbuild, @makortel, @fwyzard can you please review it and eventually sign? Thanks.
@missirol, @rovere this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

enable gpu

@makortel
Copy link
Contributor Author

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fa4161/29762/summary.html
COMMIT: 047982c
CMSSW: CMSSW_13_0_X_2022-12-22-1100/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40403/29762/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555748
  • DQMHistoTests: Total failures: 157
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555569
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19934
  • DQMHistoTests: Total failures: 85
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19849
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor Author

@fwyzard Do you have any comments?

@makortel
Copy link
Contributor Author

makortel commented Feb 2, 2023

Rebased and fixed conflicts.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 2, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40403/34024

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 2, 2023

Pull request #40403 was updated. @cmsbuild, @makortel, @fwyzard can you please check and sign again.

@makortel
Copy link
Contributor Author

makortel commented Feb 2, 2023

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2023

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fa4161/30363/summary.html
COMMIT: af62750
CMSSW: CMSSW_13_0_X_2023-02-02-1100/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40403/30363/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

The relvals timed out after 4 hours.

Comparison Summary

Summary:

  • You potentially added 20 lines to the logs
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555467
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19862
  • DQMHistoTests: Total failures: 281
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19581
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 3 / 3 workflows

@perrotta
Copy link
Contributor

perrotta commented Feb 3, 2023

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fa4161/30376/summary.html
COMMIT: af62750
CMSSW: CMSSW_13_0_X_2023-02-02-2300/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40403/30376/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555470
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19862
  • DQMHistoTests: Total failures: 64
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19798
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 2 / 3 workflows

@makortel
Copy link
Contributor Author

makortel commented Feb 3, 2023

+heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

perrotta commented Feb 3, 2023

+1

@cmsbuild cmsbuild merged commit 2c5a031 into cms-sw:master Feb 3, 2023
@makortel makortel deleted the alpakaESProducer3 branch February 3, 2023 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants