Follow-ups to "new Alpaka GPU framework" #40140

makortel · 2022-11-24T01:31:15Z

This issue is to track the TODO items from #39428 (in the order of appearance)

Document the interfaces in README.md (done in Add README for Alpaka modules #40690)
Some files that depend on Alpaka and FWCore/Utilities, but are specific to CMS. Ideally the package of these files should not depend on FWCore/Framework (which is the case for HeterogeneousCore/AlpakaCore), but given the CMS-specificity HeterogeneousCore/AlpakaInterface doesn't sound ideal either.
- HeterogeneousCore/AlpakaCore/interface/alpaka/ESDeviceProduct.h
- HeterogeneousCore/AlpakaCore/interface/alpaka/typelookup.h
- HeterogeneousCore/AlpakaInterface/interface/TransferToHost.h
Move the multiple device (per backend) support in EventSetup into the EventSetup system itself. This should come as part of making the framework overall aware of memory spaces. (will take long)
- This development allows to remove the check for "if the product was null on any device", which hopefully won't need any other treatment in the mean time
Make EventSetup data transfers asynchronous
device::Event::queue() const and device::Record<T>::queue() const return a non-const reference because Alpaka functions do not accept a temporary (copy) as an argument
device::Event's device-side emplace() and put() do not return edm::OrphanHandle, while the host-side overloads of the functions do. The implication is that for time being the EDProducers can not create edm::Ref (etc) on the device-side products they produce. But this way of dealing with the memory spaces would need a different kind of Ref infrastructure anyway (that would be aware of the memory space)
ALPAKA_ACCELERATOR_NAMESPACE::ProducerBase would benefit from overloading labelsForToken() for the device:: tokens
Should host-synchronous backends have a DeviceProduct branch(es) as well? (done in Simplify Alpaka ESProducer model by making the data copies to device implicit #40403)
- Assuming we continue with allowing different Alpaka backends to result in different data product types, the answer would be clear "no", because we'd have to create one for all backends supported by any $SCRAM_ARCH of the given release.
- Also otherwise it feels silly, and Allow SwitchProducer cases to declare different transient products #40104 relaxes SwitchProducer such that it won't complaint from case-EDProducers(/EDAliases) having different device-side branches
- So probably the comment could just be removed
In principle we could decide to not register the transfer-to-host transformation if (e.g.) the TransferToHost specialization is missing (comment removed in Simplify Alpaka ESProducer model by making the data copies to device implicit #40403)
- On the other hand, that would lead to inconsistency in persistable data products between the host serial backend and device backends, that could lead to problems in e.g. jobs that read multiple files that were produced in different hardware
- So probably the comment could just be removed
If we go with "model 2" of the EventSetup data products (data product class is templated over device type), the data product registration would really have to be per memory space rather than per backend
A way to not do a blocking wait in ALPAKA_ACCELERATOR_NAMESPACE::EDMetadata destructor (inherited from cms::cuda::ProductBase destructor)
How necessary it is to ensure in ALPAKA_ACCELERATOR_NAMESPACE::EDMetadata::synchronize() that the data product and the consumer reside in the same device?
- Currently we guarantee the input data product and the consumer module use the same device, and I'd (perhaps naively) think we should continue doing that. From this perspective the explicit check is redundant.
Improve the EDM stream -> device assignment logic (inherited from cms::cuda::chooseDevice())
If cms::alpakatools::TransferToHost is not specialized for a device-side data product type, the compilation error (from GCC) is about "indeterminate type". With some trickery it might be possible to give a better compilation error message, but I'm not sure if it would be worth of the effort.
ESProducer's produceHost() could, in principle, use pinned and cached memory allocations
- Although a bigger question is a tradeoff between two models for the ESProducers:
  - One ESProducer for the host-side data reformatting whose product is directly consumable by all host backends in addition to the per-device-backend ESProducers. Host memory can not be pinned (without tricks). Expressing the setup in a portable configuration needs some thought.
  - One ESProducer for each backend with both produceHost() and produceDevice(). Host-side data reformatting gets duplicated, and also on host backends the produceDevice() needs to copy the data, or have an explicit #ifdef for reusing the input data product. Could use pinned host memory. Straightforward to express in a portable configuration.
In ESProducer produceDevice() associating the cached memory allocation to the Queue is strictly speaking incorrect, because the lifetime of ESProducts can be fully controlled by the framework (there can't be any asynchronous activity going on anymore when the ESProduct is deleted at the end of IOV). Connects to New Heterogeneous Memory Pool #37952 / [Do Not Merge] New "CUDA" Memory Pool [12.4.X] #39623.
In some ESProduct models the transfer from host to device memory space could be automated (done in Simplify Alpaka ESProducer model by making the data copies to device implicit #40403)
HeterogeneousCore/AlpakaTest/test/BuildFile.xml declares dependence on cuda only to make GPU_X IBs to run the tests on GPU machine. The dependence should be changed to alpaka once the testing infrastructure runs also tests depending on alpaka on GPU_X IBs (done in Make AlpakaTest tests to use their dependencies for controlling their execution, and extend module tests to ROCm. #43204)
HeterogeneousCore/AlpakaTest/test/testAlpakaModules_cfg.py can be simplified once an implementation of the "module maker approach" for portable configuration gets in (done in Enable "alternative module loading" approach for Alpaka modules #40564)
HeterogeneousCore/AlpakaTest/test/writer.py can be simplified once Allow SwitchProducer cases to declare different transient products #40104 gets in

The text was updated successfully, but these errors were encountered:

makortel · 2022-11-24T01:31:24Z

assign heterogeneous

cmsbuild · 2022-11-24T01:31:33Z

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild · 2022-11-24T01:31:35Z

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel · 2022-12-27T22:56:04Z

#40403 addresses the following

Should host-synchronous backends have a DeviceProduct branch(es) as well?

This comment is removed (as suggested in the further discussion in the issue description).

Although a bigger question is a tradeoff between two models for the ESProducers:

One ESProducer for the host-side data reformatting whose product is directly consumable by all host backends in addition to the per-device-backend ESProducers. Host memory can not be pinned (without tricks). Expressing the setup in a portable configuration needs some thought.

One ESProducer for each backend with both produceHost() and produceDevice(). Host-side data reformatting gets duplicated, and also on host backends the produceDevice() needs to copy the data, or have an explicit #ifdef for reusing the input data product. Could use pinned host memory. Straightforward to express in a portable configuration.

The ESProducer model is changed such that the per-backend ESProducer's produce() function contains the code to reformat the host-side data, and the copy-to-device part is implicit (similar to copy-to-host in EDProducers).

In some ESProduct models the transfer from host to device memory space could be automated

The transfer from host to device memory is automated (up to user having to provide an overload of copyToDevice() function, which is provided for all PortableCollection<TLayout>.

makortel · 2023-11-10T09:29:31Z

HeterogeneousCore/AlpakaTest/test/BuildFile.xml declares dependence on cuda only to make GPU_X IBs to run the tests on GPU machine. The dependence should be changed to alpaka once the testing infrastructure runs also tests depending on alpaka on GPU_X IBs

This is done in #43204

cmsbuild added heterogeneous-pending pending-signatures labels Nov 24, 2022

makortel mentioned this issue Jan 17, 2024

Example of Alpaka-based EventSetup data product that uses PortableCollection member variables #43682

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-ups to "new Alpaka GPU framework" #40140

Follow-ups to "new Alpaka GPU framework" #40140

makortel commented Nov 24, 2022 •

edited

makortel commented Nov 24, 2022

cmsbuild commented Nov 24, 2022

cmsbuild commented Nov 24, 2022

makortel commented Dec 27, 2022

makortel commented Nov 10, 2023

Follow-ups to "new Alpaka GPU framework" #40140

Follow-ups to "new Alpaka GPU framework" #40140

Comments

makortel commented Nov 24, 2022 • edited

makortel commented Nov 24, 2022

cmsbuild commented Nov 24, 2022

cmsbuild commented Nov 24, 2022

makortel commented Dec 27, 2022

makortel commented Nov 10, 2023

makortel commented Nov 24, 2022 •

edited