Add a way to specify compute accelerators in the configuration #31760
assign core,heterogeneous

New categories assigned: heterogeneous,core @Dr15Jones,@smuzaffar,@makortel,@makortel,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here

cms-patatrack#542 demonstrates a hacky way to force the SwitchProducer choice.

Thinking out loud (names are bad and long etc.): the SwitchProducer choice(s) could be specified along the lines of

process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda")
# or, forward-thinking, allowing for event-by-event decisions
process.options.SwitchProducers.SwitchProducerCUDA.choices = cms.untracked.vstring("cuda")

The compute devices could be specified along the lines of

process.CUDAService.devices = cms.untracked.vint32(0, 1, 2)
# to disable
process.CUDAService.devices = cms.untracked.vint32()
# to allow everything available (the default?)
process.CUDAService.devices = cms.untracked.vint32(-1)

This option would supersede the current
It could be handy if the
Currently the
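
The device-list semantics proposed above (an explicit list of ordinals, an empty list to disable, and -1 to allow everything available) can be sketched as a small resolution function. This is a hypothetical helper in plain Python, not part of CUDAService; resolve_devices and its arguments are illustrative names, and num_visible is assumed to come from whatever auto-discovery is in place (e.g. honoring CUDA_VISIBLE_DEVICES).

```python
from typing import List, Sequence

ALL_DEVICES = -1  # hypothetical sentinel: "use everything that is available"


def resolve_devices(requested: Sequence[int], num_visible: int) -> List[int]:
    """Map a hypothetical CUDAService.devices value to concrete ordinals.

    requested:   the configured list, e.g. (0, 1, 2), (), or (-1,)
    num_visible: number of devices found by auto-discovery on this machine
    """
    requested = list(requested)
    if not requested:
        # An empty list disables the use of CUDA devices altogether.
        return []
    if requested == [ALL_DEVICES]:
        # -1 alone means "allow every device that is visible".
        return list(range(num_visible))
    if ALL_DEVICES in requested:
        raise ValueError("-1 cannot be combined with explicit device ordinals")
    if len(set(requested)) != len(requested):
        raise ValueError(f"duplicate device ordinals: {requested}")
    missing = [d for d in requested if d < 0 or d >= num_visible]
    if missing:
        # Asking for a device that does not exist is reported as a
        # configuration error instead of being silently ignored.
        raise ValueError(f"requested devices {missing} not available "
                         f"(found {num_visible})")
    return requested


if __name__ == "__main__":
    print(resolve_devices((0, 1, 2), num_visible=4))  # [0, 1, 2]
    print(resolve_devices((), num_visible=4))         # []
    print(resolve_devices((-1,), num_visible=2))      # [0, 1]
```

Raising on unavailable ordinals, rather than dropping them, is one possible design choice; it keeps an explicit request from degrading silently on a smaller machine.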

Hi Matti, process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda") rather than SwitchProducerCUDA.choose = cms.untracked.string("cuda")?

About:
I would prefer to keep

By the way, is this strictly about CUDA, or a more general approach? Something like SYCL/oneAPI does not really enumerate the devices as ordinal numbers; rather, it uses a combination of backend (e.g. OpenCL vs CUDA), device type (CPU vs GPU), vendor and device name, which makes it more powerful, and a lot more complicated to implement in our software.
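
To illustrate the difference described above: a CUDA-style selection is just a list of ordinals, while a SYCL-style selection matches on several attributes at once. The sketch below is a hypothetical Python model, not the SYCL API (where selection is done in C++ through device selectors); the Device fields and the select_devices helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    # Hypothetical description of a compute device, modelled on the
    # attributes mentioned above: backend, device type, vendor and name.
    backend: str      # e.g. "cuda", "opencl"
    device_type: str  # e.g. "gpu", "cpu"
    vendor: str
    name: str


def select_devices(devices: List[Device],
                   backend: Optional[str] = None,
                   device_type: Optional[str] = None,
                   vendor: Optional[str] = None,
                   name: Optional[str] = None) -> List[Device]:
    """Return the devices matching every criterion that is not None."""
    criteria = {"backend": backend, "device_type": device_type,
                "vendor": vendor, "name": name}
    return [d for d in devices
            if all(value is None or getattr(d, field) == value
                   for field, value in criteria.items())]


if __name__ == "__main__":
    machine = [
        Device("opencl", "cpu", "Intel", "Xeon"),
        Device("cuda", "gpu", "NVIDIA", "Tesla T4"),
        Device("opencl", "gpu", "NVIDIA", "Tesla T4"),
    ]
    # An ordinal ("device 1") depends on enumeration order; an attribute
    # match states what is wanted, but the same GPU may appear twice,
    # once per backend.
    print(select_devices(machine, backend="cuda"))
    print(len(select_devices(machine, device_type="gpu")))  # 2
```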

Would it work to put the whole configuration in the

If by

I'd like to end up with a solution that we think could later be extended to SYCL as well.
At the lowest level I think that's fine (just replace

The natural dependence goes the other way around. One can use
In the long term we could also end up not using

If this is the only concern it should be easy to fix: just like the

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

it could check if

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
SwitchProducerCUDA.choose = "cuda"
Indeed... at the moment some modules need the
The reason I suggested it is that it seems difficult for the
Otherwise we could add a new

For

Something like that is indeed one option. On the other hand, since SwitchProducer would have to be extended in some way to be able to read configuration options outside of itself, technically it would not matter much whether it reads

That's not very different from customisations that take the

import FWCore.ParameterSet.Config as cms

def customiseLoadCUDAService(process):
    from HeterogeneousCore.CUDAServices.CUDAService_cfi import CUDAService
    process.CUDAService = cms.Service("CUDAService", ...)
    return process

vs

def customiseForceCUDA(process):
    from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
    SwitchProducerCUDA.choose = 'cuda'
    return process

The latter does not really need the
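
The difference between the two customisation styles above is largely one of scope: the first modifies only the process object it receives, while the second changes a class attribute and therefore affects every SwitchProducerCUDA instance created in the same Python interpreter, whatever process is passed in. The sketch below uses hypothetical stand-ins (Process and SwitchProducerCUDA here are minimal mocks, not the CMSSW classes, and customiseProcessScoped is an illustrative name) to make that visible.

```python
class SwitchProducerCUDA:
    # Mock stand-in: a class attribute shared by all instances.
    # None means "no forced choice, use auto-detection".
    choose = None

    def __init__(self, **cases):
        self.cases = cases

    def forced_case(self):
        # Instances see the class attribute unless they override it.
        return self.choose


class Process:
    # Mock stand-in for a process: just a bag of attributes.
    pass


def customiseProcessScoped(process):
    # Only this process object is modified.
    process.forcedSwitchChoice = "cuda"
    return process


def customiseForceCUDA(process):
    # The process argument is not used: the class itself is modified.
    SwitchProducerCUDA.choose = "cuda"
    return process


if __name__ == "__main__":
    p1 = customiseProcessScoped(Process())
    p2 = Process()
    print(hasattr(p1, "forcedSwitchChoice"))  # True
    print(hasattr(p2, "forcedSwitchChoice"))  # False: p2 is unaffected

    customiseForceCUDA(Process())
    producer = SwitchProducerCUDA(cpu="cpuModule", cuda="cudaModule")
    print(producer.forced_case())             # cuda, for any process
```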

Anyway, whatever the options are, let's just not pick something that ends up with a syntax too cumbersome to use.

Coming back to this, maybe we should keep these two things separate:
For example, a
A hypothetical
Independently, a CMSSW job can have access to any number of CPU cores, any number of CUDA GPUs, any number of SYCL devices, etc. IMHO this is best handled outside of the job (e.g. via cgroups, taskset, or environment variables ¹ ²), because the actual list of available devices is likely to change from machine to machine. If we do decide to implement some kind of device selection in CMSSW, I'd prefer to make it orthogonal to the
¹
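
Handling device selection outside of the job, as suggested above, could look like the following Python sketch, which prepares (but does not run) a cmsRun invocation restricted to a set of CPU cores via taskset and to a set of GPUs via the CUDA_VISIBLE_DEVICES environment variable. The helper name restricted_cmsrun and the configuration file name are hypothetical; taskset -c is the standard Linux utility and CUDA_VISIBLE_DEVICES is the standard CUDA runtime variable.

```python
import os
from typing import Dict, List, Sequence, Tuple


def restricted_cmsrun(config: str,
                      cpu_cores: Sequence[int],
                      gpus: Sequence[int]) -> Tuple[List[str], Dict[str, str]]:
    """Build the command line and environment for a restricted cmsRun job.

    Hypothetical helper: the restriction is applied entirely outside of the
    job, so the configuration itself does not need to know about devices.
    """
    env = dict(os.environ)
    # The CUDA runtime only enumerates the listed devices; an empty value
    # hides all of them, so the job sees no GPU at all.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    # taskset -c sets the CPU affinity of the launched process; threads it
    # creates inherit that affinity.
    cores = ",".join(str(c) for c in cpu_cores)
    cmd = ["taskset", "-c", cores, "cmsRun", config]
    return cmd, env


if __name__ == "__main__":
    cmd, env = restricted_cmsrun("reco_cfg.py", cpu_cores=range(8), gpus=[0, 1])
    print(" ".join(cmd))                # taskset -c 0,1,2,3,4,5,6,7 cmsRun reco_cfg.py
    print(env["CUDA_VISIBLE_DEVICES"])  # 0,1
    # To actually launch it: subprocess.run(cmd, env=env, check=True)
```

The same configuration can then run unchanged on machines with different hardware; only the launcher decides which devices the job may see.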

Good point that there can be cases where the
And perhaps it is indeed best (certainly easiest from the application perspective) not to create a configuration mechanism for devices of any kind (at least until a real motivation for one comes up).

I see I had forgotten to add one idea for "forcing

Would this be persisted across

Good question. My first thought is that it should be persistent in those ways, because it is set explicitly. (Maybe it should also be possible to unset it, e.g. passing

Based on #31760 (comment) and #31760 (comment), I crafted #35510.

#36699 takes another attempt, this time adding

+1

This issue is fully signed and ready to be closed.

Currently the CUDA tooling relies on auto-discovery of resources (honoring $CUDA_VISIBLE_DEVICES), but we should find a way to specify in the configuration
SwitchProducerCUDA (either cpu or cuda), and
The information should propagate both to the SwitchProducer(s), which dictate which case(s) of the module chains will be run (at the configuration level), and to CUDAService (and similar services), which provide finer-grained resource information to the C++ code (the list of actual devices).

With $CUDA_VISIBLE_DEVICES alone it is not possible to force a configuration to use cuda on a machine without a GPU (the forcing itself would be useful for testing, and running such a configuration on a machine without a GPU would be an error that should be reported somehow).
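
The requirement above, that forcing cuda on a machine without a GPU should be reported as an error rather than silently falling back, can be expressed as a small decision function. This is a hypothetical sketch in plain Python: choose_case, ConfigurationError and the case names are illustrative, and num_gpus stands for whatever auto-discovery reports (for example after $CUDA_VISIBLE_DEVICES has been applied).

```python
from typing import Optional, Sequence


class ConfigurationError(RuntimeError):
    """Hypothetical error for configurations that cannot be honored."""


def choose_case(available_cases: Sequence[str],
                forced: Optional[str],
                num_gpus: int) -> str:
    """Pick the SwitchProducer case to run.

    forced:   "cpu", "cuda", or None for auto-detection
    num_gpus: number of GPUs visible to this job
    """
    if forced is not None:
        if forced not in available_cases:
            raise ConfigurationError(f"unknown case '{forced}'")
        if forced == "cuda" and num_gpus == 0:
            # Forcing is useful for testing, but on a machine without a GPU
            # it must be reported instead of falling back to the CPU.
            raise ConfigurationError("'cuda' was forced but no GPU is available")
        return forced
    # Auto-detection: prefer the GPU when one is visible.
    if "cuda" in available_cases and num_gpus > 0:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(choose_case(["cpu", "cuda"], forced=None, num_gpus=0))   # cpu
    print(choose_case(["cpu", "cuda"], forced=None, num_gpus=2))   # cuda
    print(choose_case(["cpu", "cuda"], forced="cpu", num_gpus=2))  # cpu
    try:
        choose_case(["cpu", "cuda"], forced="cuda", num_gpus=0)
    except ConfigurationError as err:
        print("error:", err)
```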