Add a way to specify compute accelerators in the configuration #31760

makortel · 2020-10-12T19:35:14Z

Currently the CUDA tooling relies on auto-discovery of resources (honoring $CUDA_VISIBLE_DEVICES), but we should find a way to specify the following in the configuration:

forcing the SwitchProducer choice (e.g. with SwitchProducerCUDA either cpu or cuda), and
specifying the compute device(s) to be used

The information should propagate to both SwitchProducer(s), that dictate which case(s) of module chains will be run (at the configuration level), and to CUDAService (and similar) that provide finer-grained resource information to the C++ code (list of actual devices).

With $CUDA_VISIBLE_DEVICES alone it is not possible to force a configuration to use cuda on a machine without GPU (the forcing itself would be useful for testing, and running such configuration on a machine without GPU would be an error that should get reported somehow).

makortel · 2020-10-12T19:35:22Z

assign core,heterogeneous

cmsbuild · 2020-10-12T19:35:30Z

New categories assigned: heterogeneous,core

@Dr15Jones,@smuzaffar,@makortel,@makortel,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild · 2020-10-12T19:35:32Z

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

makortel · 2020-10-12T19:36:00Z

cms-patatrack#542 demonstrates a hacky way to force the SwitchProducer choice.

makortel · 2020-10-12T20:11:32Z

Thinking out loud (names are bad and long etc):

The SwitchProducer choice(s) could be specified along

process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda")
# or forward-thinking the possibility of event-by-event decisions
process.options.SwitchProducers.SwitchProducerCUDA.choices = cms.untracked.vstring("cuda")

The compute devices could be specified along

process.CUDAService.devices = cms.untracked.vint32(0, 1, 2)

# to disable
process.CUDAService.devices = cms.untracked.vint32()
# to allow everything available, default?
process.CUDAService.devices = cms.untracked.vint32(-1)

This option would supersede the current CUDAService.enable.

It could be handy if the SwitchProducerCUDA could make use of this parameter as well, along

if empty, or process.CUDAService does not exist, choice will be cpu
if non-empty, choice will be cuda (or "cpu and cuda" if we get to event-by-event choice)
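A minimal sketch of the rule above in plain Python, not CMSSW code. The function name `choose_case` and the use of `None` to stand for a missing process.CUDAService are assumptions made only for illustration.

```python
from typing import Optional, Sequence


def choose_case(devices: Optional[Sequence[int]]) -> str:
    """Return the SwitchProducerCUDA case implied by a device list.

    `devices` is None when process.CUDAService does not exist (an assumption
    of this sketch); otherwise it holds the content of CUDAService.devices.
    """
    if not devices:
        # No service, or an empty list: fall back to the cpu case.
        return "cpu"
    # Any non-empty list, including the proposed (-1) "everything available".
    return "cuda"


print(choose_case(None))
print(choose_case([]))
print(choose_case([0, 1, 2]))
```

An event-by-event variant would return a list of allowed cases instead of a single string.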

Currently the SwitchProducer does not have access to the Process object at the point where the choice is made (and I'm not sure if giving access to the full Process there would be a good idea). Also the code in FWCore/ParameterSet should stay generic, so only SwitchProducerCUDA would know to look for process.CUDAService. I also gave some thought to using process.options to specify the devices (along process.options.offload.cuda.devices = cms.untracked.vint32(0, 1)), but I'm not sure if it would bring any value over using process.CUDAService.

fwyzard · 2020-10-12T20:24:07Z

hi Matti,
why is it better to do something like

process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda")

rather than

SwitchProducerCUDA.choose = cms.untracked.string("cuda")

?

fwyzard · 2020-10-12T20:25:40Z

About:

The compute devices could be specified along

process.CUDAService.devices = cms.untracked.vint32(0, 1, 2)

# to disable
process.CUDAService.devices = cms.untracked.vint32()
# to allow everything available, default?
process.CUDAService.devices = cms.untracked.vint32(-1)

This option would supersede the current CUDAService.enable.

I would prefer to keep CUDAService.enable as it is, and use an empty devices list to specify the default behaviour, that is using all available devices.
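To make the difference from the original proposal concrete, here is a small sketch of this alternative in plain Python. The function `resolve_devices` and its `available` argument (the ordinals found by auto-discovery) are hypothetical and not part of CUDAService.

```python
from typing import List, Sequence


def resolve_devices(
    enable: bool, devices: Sequence[int], available: Sequence[int]
) -> List[int]:
    """Return the device ordinals a job would use under this alternative."""
    if not enable:
        # CUDAService.enable keeps its current meaning: no devices at all.
        return []
    if not devices:
        # An empty list is the default and means "use everything available".
        return list(available)
    # An explicit list is taken as given.
    return list(devices)


print(resolve_devices(True, [], [0, 1, 2]))
print(resolve_devices(True, [1], [0, 1, 2]))
print(resolve_devices(False, [1], [0, 1, 2]))
```

With this reading no special value such as -1 is needed, because disabling and "use all" are expressed by two different parameters.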

fwyzard · 2020-10-12T20:29:25Z

By the way, is this strictly about CUDA, or a more general approach ?

Something like SYCL/oneAPI does not really enumerate the devices as ordinal numbers; rather, it uses a combination of backend (e.g. OpenCL vs CUDA), device type (CPU vs GPU), vendor and device name.

Which makes it more powerful, and a lot more complicated to implement in our software.

fwyzard · 2020-10-12T20:30:52Z

Currently the SwitchProducer does not have access to the Process object at the point where the choice is made (and I'm not sure if giving access to the full Process there would be a good idea).

Would it work to put the whole configuration in the SwitchProducerCUDA, and make the CUDAService pull it from there ?

makortel · 2020-10-12T21:31:03Z

why is it better to do something like

process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda")

rather than

SwitchProducerCUDA.choose = cms.untracked.string("cuda")

?

If by SwitchProducerCUDA.choose you mean something like a class variable (since there is no process.SwitchProducerCUDA), that would not be visible e.g. in edmConfigDump. If you mean an instance variable, then every SwitchProducerCUDA instance in the Process would have to be configured in the same way.
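The distinction between the two readings can be shown with a toy class in plain Python. `ToySwitchProducer` is hypothetical, and using `vars(instance)` as a stand-in for a per-object configuration dump is an assumption of this sketch, not a description of how edmConfigDump works.

```python
class ToySwitchProducer:
    # Class variable: one value shared by every instance. It lives on the
    # class, so it is not part of any single instance's own attributes.
    choose = None

    def __init__(self, label):
        self.label = label


a = ToySwitchProducer("a")
b = ToySwitchProducer("b")

# Setting the class variable changes what every instance sees...
ToySwitchProducer.choose = "cuda"
print(a.choose, b.choose)
# ...but it does not show up in the state of an individual instance.
print("choose" in vars(a))

# Setting an instance variable affects only that instance, so each
# instance would have to be configured separately.
a.choose = "cpu"
print(a.choose, b.choose)
print("choose" in vars(a))
```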

makortel · 2020-10-12T21:36:50Z

By the way, is this strictly about CUDA, or a more general approach ?

I'd like to end up in a solution we think could be later extended to SYCL as well.

Something like SYCL/oneAPI does not really enumerate the devices as ordinal numbers; rather, it uses a combination of backend (e.g. OpenCL vs CUDA), device type (CPU vs GPU), vendor and device name.

Which makes it more powerful, and a lot more complicated to implement in our software.

At the lowest level I think that's fine (just replace cms.vint32 with cms.vstring), but I can imagine e.g. at higher level figuring out the proper strings to be challenging.

makortel · 2020-10-12T21:38:46Z

Currently the SwitchProducer does not have access to the Process object at the point where the choice is made (and I'm not sure if giving access to the full Process there would be a good idea).

Would it work to put the whole configuration in the SwitchProducerCUDA, and make the CUDAService pull it from there ?

The natural dependence goes the other way around. One can use CUDAService without SwitchProducerCUDA, e.g. for a configuration that always requires CUDA. But use of SwitchProducerCUDA in practice implies the use of CUDAService (even if they don't strictly depend on each other, which also causes some duplication of the discovery mechanism).

In the long term we could also end up not using SwitchProducer, e.g. if with SYCL a "one module for all backends" would work better than the current "each backend has its own module" approach (last bullet of #28576 (comment)).

fwyzard · 2020-10-12T21:50:57Z

If by SwitchProducerCUDA.choose you mean something like a class variable (since there is no process.SwitchProducerCUDA), that would not be visible e.g. in edmConfigDump.

If this is the only concern it should be easy to fix: just like the process knows to add

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

it could check if SwitchProducerCUDA.choose is not None, and add instead

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
SwitchProducerCUDA.choose = "cuda"

fwyzard · 2020-10-12T21:58:23Z

Would it work to put the whole configuration in the SwitchProducerCUDA, and make the CUDAService pull it from there ?

The natural dependence goes other way around. One can use CUDAService without SwitchProducerCUDA, e.g. for a configuration that always requires CUDA.

Indeed... at the moment some modules need the CUDAService (disabled) even when we don't use CUDA.

But use of SwitchProducerCUDA in practice implies the use of CUDAService (even if they don't strictly depend on each other, which also causes some duplication of the discovery mechanism).

The reason I suggested it is because it seems difficult for the SwitchProducerCUDA to extract information from the CUDAService, while the CUDAService might have ways of querying the SwitchProducerCUDA configuration.

Otherwise we could add a new process.options.CUDA = cms.untracked.PSet(...) with all the information, and make both the CUDAService and the SwitchProducerCUDA query it ?

makortel · 2020-10-12T22:03:38Z

If by SwitchProducerCUDA.choose you mean something like a class variable (since there is no process.SwitchProducerCUDA), that would not be visible e.g. in edmConfigDump.

If this is the only concern it should be easy to fix: just like the process knows to add
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
it could check if SwitchProducerCUDA.choose is not None, and add instead
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
SwitchProducerCUDA.choose = "cuda"

For edmConfigDump specifically yes. But it would imply that customizations would not be possible with the Process object alone, but in addition one would have to import a specific object (strictly speaking not true, any SwitchProducerCUDA object in the Process could be used to set the class variable, but I'm a bit afraid that would be confusing; also generally one would have to find one such object instead of knowing directly which knob to tune).

makortel · 2020-10-12T22:08:56Z

Otherwise we could add a new process.options.CUDA = cms.untracked.PSet(...) with all the information, and make both the CUDAService and the SwitchProducerCUDA query it ?

Something like that is indeed one option. On the other hand SwitchProducer would have to be extended in some way to be able to read configuration options outside of itself, technically it would not matter much if it reads process.options or process.CUDAService (but there may be other reasons to favor process.options).

fwyzard · 2020-10-13T05:24:13Z

But it would imply that customizations would not be possible with the Process object alone, but in addition one would have to import a specific object

That's not very different from customisations that take the process as input, and have to import the modules / sequences / tasks they add to it:

def customiseLoadCUDAService(process):
    from HeterogeneousCore.CUDAServices.CUDAService_cfi import CUDAService
    process.CUDAService = cms.Service("CUDAService", ...)

    return process

vs

def customiseForceCUDA(process):
    from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
    SwitchProducerCUDA.choose = 'cuda'

    return process

The latter does not really need the process parameter, but it can be added for consistency with all other customisation functions.

fwyzard · 2020-10-13T05:24:47Z

Anyway, whatever the options are, let's just not pick something that ends up with a syntax too cumbersome to use.

fwyzard · 2021-04-30T15:23:09Z

Coming back to this, maybe we should keep these two things separate:

a way to restrict a SwitchProducer to only one (or more) branch(es)
a way to limit the devices available to CMSSW

For example, a SwitchProducerCUDA could be configured to follow only the cpu branch (and thus ignore any GPU), or only the cuda branch (and thus require a GPU to be present) or be left free to choose either.

A hypothetical SwitchProducerAlpaka with the serial, tbb and cuda options could be configured to allow only the tbb or cuda ones, etc.

Independently, a CMSSW job can have access to any number of CPU cores, any number of CUDA GPUs, any number of SYCL devices, etc.

IMHO this is best handled outside of the job (e.g. via cgroups, taskset, or environment variables ¹ ²), because the actual list of available devices is likely to change from machine to machine.

If we do decide to implement some kind of device selection in CMSSW, I'd prefer to make it orthogonal to the SwitchProducer choice. If their combination results in an unrunnable configuration (e.g. by disabling all GPUs while requiring the cuda branch) the jobs can fail, hopefully with a descriptive error.
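A sketch of what keeping the two settings orthogonal could look like, in plain Python. The function `select_case`, its arguments and the error text are hypothetical; the point is only that the two inputs are independent and that an impossible combination fails with a clear message.

```python
from typing import Optional, Sequence


def select_case(forced_case: Optional[str], visible_gpus: Sequence[int]) -> str:
    """Pick the case to run from two independent settings.

    `forced_case` restricts the SwitchProducer (None leaves it free);
    `visible_gpus` is whatever device selection left available to the job.
    """
    if forced_case not in (None, "cpu", "cuda"):
        raise ValueError(f"unknown case {forced_case!r}")
    if forced_case == "cuda" and not visible_gpus:
        # Unrunnable combination: fail with a descriptive error.
        raise RuntimeError(
            "the 'cuda' case is required but no CUDA device is available to the job"
        )
    if forced_case is not None:
        return forced_case
    # Left free to choose: prefer the GPU when one is available.
    return "cuda" if visible_gpus else "cpu"


print(select_case(None, [0]))
print(select_case("cpu", [0]))
print(select_case(None, []))
```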

¹ CUDA_VISIBLE_DEVICES can be used to limit the CUDA devices available to the runtime: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
² SYCL_DEVICE_FILTER can be used to limit the SYCL devices available to the runtime: https://intel.github.io/llvm-docs/EnvironmentVariables.html#sycl_device_filter
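For the CUDA case in footnote ¹, restricting devices from outside the job can be sketched as below. The helper `env_with_visible_cuda_devices` is hypothetical; only the CUDA_VISIBLE_DEVICES variable itself comes from the CUDA documentation.

```python
import os
from typing import Dict, Sequence


def env_with_visible_cuda_devices(devices: Sequence[int]) -> Dict[str, str]:
    """Return a copy of the environment that exposes only `devices` to CUDA."""
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES takes a comma-separated list of device ordinals.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    return env


env = env_with_visible_cuda_devices([0, 1])
print(env["CUDA_VISIBLE_DEVICES"])
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when launching the job, so the job configuration itself stays independent of the machine.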

makortel · 2021-05-04T19:38:36Z

Good point that there can be cases where the SwitchProducer case choice can not be derived from the set of available devices (like Alpaka serial vs tbb).

And perhaps it is indeed best (certainly easiest from the application perspective) not to create a configuration mechanism for arbitrary devices, at least until a real motivation for one comes up.

makortel · 2021-10-01T14:28:27Z

I see I had forgotten to add one idea for "forcing SwitchProducer choice" (from a chat with @Dr15Jones some time ago). We could make each SwitchProducer instance configurable on choice, e.g. setCase_("cuda") or forceCase_("cuda") (in general in some cases being able to force a case instance-by-instance can make sense, e.g. different ways to run on CPU). To make it easy to set it for all SwitchProducers of a given type, we could add a function to Process along process.setSwitchProducerCaseForAll("SwitchProducerCUDA", "cuda").
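A toy sketch of how the per-instance setter and the Process-level helper could fit together, in plain Python. The names setCase_ and setSwitchProducerCaseForAll are the ones floated in this comment; the `Toy*` classes, the `modules` dictionary and the matching by class name are assumptions of the sketch and not the CMSSW implementation.

```python
class ToySwitchProducer:
    """Stand-in for a SwitchProducer whose case can be forced per instance."""

    def __init__(self, **cases):
        self.cases = cases
        self.forced_case = None  # None means "left free to choose"

    def setCase_(self, case):
        if case not in self.cases:
            raise ValueError(f"unknown case {case!r}; known: {sorted(self.cases)}")
        self.forced_case = case


class ToySwitchProducerCUDA(ToySwitchProducer):
    pass


class ToyProcess:
    def __init__(self):
        self.modules = {}

    def setSwitchProducerCaseForAll(self, type_name, case):
        # Force the case on every module of the named SwitchProducer type.
        for module in self.modules.values():
            if type(module).__name__ == type_name:
                module.setCase_(case)


process = ToyProcess()
process.modules["a"] = ToySwitchProducerCUDA(cpu="cpuModuleA", cuda="cudaModuleA")
process.modules["b"] = ToySwitchProducerCUDA(cpu="cpuModuleB", cuda="cudaModuleB")
process.modules["c"] = ToySwitchProducer(serial="serialModuleC")

# Set the case for all producers of one type, then override a single instance.
process.setSwitchProducerCaseForAll("ToySwitchProducerCUDA", "cuda")
process.modules["a"].setCase_("cpu")
```

Because the forced case is stored on each instance, it would be part of that instance's state, which matters for whether a dump of the configuration can show it.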

fwyzard · 2021-10-01T14:31:44Z

Would this be persisted across edmConfigDump or pickling/unpickling ?

makortel · 2021-10-01T14:35:50Z

Good question. My first thought is that it should be persistent in those ways, because it is set explicitly.

(maybe it should also be possible to unset it, e.g. passing None to those functions)

makortel · 2021-10-02T00:29:47Z

Based on #31760 (comment) and #31760 (comment) I crafted #35510.

makortel · 2022-01-12T21:39:18Z

#36699 takes another attempt, this time adding process.options.accelerators = cms.untracked.vstring(), and adding a new concept of ProcessAccelerator.

makortel · 2022-02-23T14:06:05Z

+1

cmsbuild · 2022-02-23T14:06:25Z

This issue is fully signed and ready to be closed.

cmsbuild added core-pending heterogeneous-pending pending-signatures labels Oct 12, 2020

makortel mentioned this issue Feb 8, 2021

Add a way to specify compute accelerators in the configuration cms-sw/framework-team#21

Closed

makortel mentioned this issue Oct 2, 2021

[RFC] Add a mechanism to set the chosen case for SwitchProducer #35510

Closed

makortel mentioned this issue Nov 8, 2021

How to ensure consistency in the configuration of Heterogeneous Producers? #35516

Closed

makortel mentioned this issue Dec 15, 2021

Store "architecture" in event provenance #30044

Open

makortel mentioned this issue Jan 12, 2022

Add a generic mechanism to specify compute accelerators to use in the configuration #36699

Merged

cmsbuild closed this as completed in #36699 Feb 23, 2022

cmsbuild added core-approved fully-signed heterogeneous-approved and removed core-pending pending-signatures heterogeneous-pending labels Feb 23, 2022
