Simplify Alpaka ESProducer model by making the data copies to device implicit #40403
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40403/33525
Code check has found code style and quality issues which could be resolved by applying following patch(s)
Commits updated from 2e8a2cd to 047982c.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40403/33526
A new Pull Request was created by @makortel (Matti Kortelainen) for master. It involves the following packages:
@cmsbuild, @makortel, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here
enable gpu
@cmsbuild, please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fa4161/29762/summary.html
Comparison Summary:
GPU Comparison Summary:
@fwyzard Do you have any comments?
Rebased and fixed conflicts.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40403/34024
@cmsbuild, please test
-1
Failed Tests: RelVals-INPUT
RelVals-INPUT: The relvals timed out after 4 hours.
Comparison Summary:
GPU Comparison Summary:
@cmsbuild, please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fa4161/30376/summary.html
Comparison Summary:
GPU Comparison Summary:
+heterogeneous
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
This PR simplifies the Alpaka ESProducer model by making it stricter. The model now targets the use case of using the host backend(s) and a device backend (of one platform) in the same job, and a test configuration for that use case is added.
This goal is achieved by making each Alpaka ESProducer produce the "simplified" ESProduct in host memory. The ESProduct is then implicitly copied to the memory of the device of the ESProducer's backend, provided the job has a consumer of the ESProduct in the device memory space. The host-side ESProduct can also be used directly.
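The following standalone C++ model illustrates the ownership and on-demand copy described above. It is a sketch under stated assumptions and not the CMSSW implementation. `HostProduct`, `DeviceProduct` and `ImplicitlyCopiedProduct` are hypothetical names, a `std::vector` stands in for device memory, and the copy is synchronous rather than enqueued on an Alpaka queue. In the PR itself, the copy is performed by a separately registered callback that is needed only when a device-side consumer exists.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the "simplified" ESProduct filled in host memory
// by the user's ESProducer callback.
struct HostProduct {
  std::vector<float> values;
};

// Hypothetical stand-in for the same data in device memory
// (a real device buffer in the actual framework).
struct DeviceProduct {
  std::vector<float> values;
};

// Keeps the host product and creates the device copy only on first request,
// i.e. only if some consumer in the job asks for the device-side product.
class ImplicitlyCopiedProduct {
public:
  explicit ImplicitlyCopiedProduct(HostProduct host) : host_(std::move(host)) {}

  // The host-side product remains available to host consumers, with no copy.
  HostProduct const& host() const { return host_; }

  // A device consumer triggers the host-to-device copy, performed at most once.
  DeviceProduct const& device() {
    if (!device_) {
      device_ = DeviceProduct{host_.values};
      ++copies_;
    }
    return *device_;
  }

  // Number of host-to-device copies performed so far.
  int copies() const { return copies_; }

private:
  HostProduct host_;
  std::optional<DeviceProduct> device_;
  int copies_ = 0;
};
```

Host consumers call `host()` and never trigger a copy. The first `device()` call performs the copy, and later calls reuse the result.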
The TransferToHost class template specialization (introduced in #39248) is renamed to CopyToHost, and an analogous CopyToDevice is added for host-to-device copies. An earlier revision of this PR replaced the class template with a copyToHostAsync(TQueue&, TSrc const&) -> TDst function overload, plus a similar copyToDeviceAsync(TQueue&, TSrc const&) -> TDst overload for host-to-device copies. In those overloads, TSrc is the data product type in the source memory space and TDst is the type in the destination memory space. The overloads initially seemed a little simpler and more expressive than class template specialization. The reason for moving back to class template specialization is described in #40403 (comment).
In practice, ALPAKA_ACCELERATOR_NAMESPACE::ESProducer::setWhatProduced() (for host-side Records) registers an additional callback function, implemented as a lambda, that copies the host-side data product to device memory. For host backends the copy is skipped, and the data product produced by the user-registered callback is used directly.
The earlier "ESProduct model 1", in which the re-formatted host ESProduct could be used only for the data copy, is removed because it is incompatible with the model here. The PortableCollection-based "model 3" is renamed to "model 1" in the comments throughout the code.
Given that this change comes quite late in the porting effort, I'd be happy to help any porting project migrate from the previous ESProducer model (from #39248) to this one.
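The following self-contained sketch shows the trait-based copy and the host-backend shortcut taken by the extra callback in setWhatProduced(). It is not the actual CMSSW interface. The queue types, `HostCalib`/`DeviceCalib`, the `copyAsync` member name and `productForBackend()` are all hypothetical stand-ins, and the "asynchronous" copy is an ordinary synchronous copy. Consult the CMSSW sources for the real CopyToDevice declaration.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the queue types of a host and a device backend.
struct HostQueue {};
struct DeviceQueue {
  int submittedCopies = 0;  // counts copies "enqueued" in this mock queue
};

// Primary template is declared but not defined: a product type can be copied
// to the device only if a specialization exists for it.
template <typename TSrc>
struct CopyToDevice;

// Hypothetical host- and device-side representations of a conditions product.
struct HostCalib {
  std::vector<float> gains;
};
struct DeviceCalib {
  std::vector<float> gains;  // stands in for a device buffer
};

// Specialization for HostCalib: maps the source type to its device type.
template <>
struct CopyToDevice<HostCalib> {
  template <typename TQueue>
  static DeviceCalib copyAsync(TQueue& queue, HostCalib const& src) {
    ++queue.submittedCopies;  // mimics enqueueing the copy in `queue`
    return DeviceCalib{src.gains};
  }
};

// What the extra callback conceptually does: on a host backend the user's
// product is returned by reference (no copy); otherwise it goes through the
// CopyToDevice specialization.
template <typename TQueue>
decltype(auto) productForBackend(TQueue& queue, HostCalib const& host) {
  if constexpr (std::is_same_v<TQueue, HostQueue>) {
    static_cast<void>(queue);  // unused on the host path
    return host;               // decltype(auto) keeps HostCalib const&
  } else {
    return CopyToDevice<HostCalib>::copyAsync(queue, host);
  }
}
```

One property of this pattern is that a product type without a CopyToDevice specialization fails at compile time instead of silently falling back to some other overload.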
PR validation:
Unit tests pass on machines with and without GPUs.