Interaction between the accelerator back-end (CUDA and ROCm) support flags #268919

Open

SomeoneSerge opened this issue Nov 21, 2023 · 4 comments
Labels: 6.topic: cuda, 6.topic: rocm

@SomeoneSerge (Contributor) commented Nov 21, 2023

Issue description

At this point we independently expose the global config.cudaSupport and config.rocmSupport options. We also have a number of package expressions similar to:

{ config
, cudaSupport ? config.cudaSupport
, rocmSupport ? config.rocmSupport
, ...
}:

... {
  meta.broken =
    (cudaSupport && rocmSupport)      # Sometimes the back-ends are mutually exclusive
    || !(cudaSupport || rocmSupport); # Sometimes at least one is required
}

There are many variations, both in the signature, e.g. { config, gpuBackend ? if config.cudaSupport then ... else if ... else ... }: ..., and in the way incompatible combinations are handled, e.g. assert (builtins.elem gpuBackend [ ... ]) instead of broken = ....
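For illustration, a hypothetical variant of the second style might look like this (the gpuBackend argument and the "none" value are assumptions based on the description above, not a specific expression from nixpkgs):

{ config
, gpuBackend ?
    if config.cudaSupport then "cuda"
    else if config.rocmSupport then "rocm"
    else "none"
}:

# Fails evaluation outright if an unsupported back-end is requested,
# instead of marking the package broken.
assert builtins.elem gpuBackend [ "cuda" "rocm" "none" ];

{
  # ... the rest of the derivation, selecting flags based on gpuBackend ...
}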

We might want to find a more consistent approach.

Preliminary proposal

  • Handle interaction at the individual package level, do nothing about the config.xSupport combinations.
  • Rewrite the package expressions with { ..., gpuBackend }: ...
  • In the first cycle, keep the existing {cuda,rocm}Support and {with,enable}{Cuda,Rocm} arguments. Set defaults to null. Whenever they aren't null, display a deprecation warning and set gpuBackend appropriately.
  • Do not assert, use broken instead. This way it's always the end-user's decision as to what builds to attempt (maybe the users extend patches and manage to relax the interaction rules). A sketch of this migration pattern follows the list.
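A minimal sketch of how a single package could implement both points (the gpuBackend argument, the warning text, and the "at least one back-end required" rule are assumptions, not an agreed-upon interface):

{ lib
, config
, gpuBackend ? null
  # Deprecated aliases, kept around for one release cycle.
, cudaSupport ? null
, rocmSupport ? null
}:

let
  # Map the legacy flags onto gpuBackend, warning whenever they are used.
  effectiveBackend =
    if cudaSupport != null || rocmSupport != null then
      lib.warn "cudaSupport/rocmSupport are deprecated; pass gpuBackend instead"
        (if cudaSupport == true then "cuda"
         else if rocmSupport == true then "rocm"
         else "none")
    else if gpuBackend != null then gpuBackend
    else "none";
in
{
  # Mark unsupported combinations broken rather than asserting, so the
  # end-user can still force the build (e.g. after adding patches).
  meta.broken = !(builtins.elem effectiveBackend [ "cuda" "rocm" ]);
}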

CC @NixOS/cuda-maintainers @NixOS/rocm-maintainers

@samuela (Member) commented Nov 21, 2023

For packages that do support building with both cudaSupport and rocmSupport enabled, I could imagine that some users may want to do so, e.g. to manage a single package cache across multiple machines. Unfortunately, packages that disallow building with both, like magma, add a wrinkle to this.

Perhaps we ought to treat it like cudaCapabilities? E.g. config.gpuBackends is a list (not sure if nix has a set type?) that can contain nothing, just "cuda", just "rocm", or both.
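To make that concrete, a hypothetical configuration and consumer could look like the following (config.gpuBackends is not an existing nixpkgs option; this is only a sketch of the proposal):

# Hypothetical: the user enables both back-ends globally.
# pkgs = import <nixpkgs> { config.gpuBackends = [ "cuda" "rocm" ]; };

# In a package expression, the per-backend flags would be derived from the list:
{ config
, gpuBackends ? config.gpuBackends or [ ]
}:

let
  cudaSupport = builtins.elem "cuda" gpuBackends;
  rocmSupport = builtins.elem "rocm" gpuBackends;
in
{
  # A package like magma, which cannot build with both, would then use:
  meta.broken = cudaSupport && rocmSupport;
}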

@SomeoneSerge (Contributor, Author)

Lists sound good! There's a natural designation for the CPU-only builds, and a representation for "enable both backends".

RE: config.gpuBackends, set types

This leads us back to the config.accelerators.{cuda,rocm} question. We might actually prefer lists over sets, with the order representing priority! E.g. import nixpkgs { config.accelerators.enabledBackends = [ "rocm" "cuda" ]; } could mean "enable rocm in every package that supports it, and build cuda too if that's possible".
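As a sketch of the priority idea (config.accelerators.enabledBackends is hypothetical, as is the package-side selection logic):

# Hypothetical global configuration: prefer rocm, and also build cuda where possible.
# pkgs = import nixpkgs { config.accelerators.enabledBackends = [ "rocm" "cuda" ]; };

# A package that supports only one back-end at a time could pick the first
# entry of the ordered list that it knows how to build with:
{ lib
, config
, enabledBackends ? config.accelerators.enabledBackends or [ ]
}:

let
  supportedBackends = [ "rocm" "cuda" ];  # what this package can build with
  chosenBackend =
    lib.findFirst (b: builtins.elem b supportedBackends) "none" enabledBackends;
in
{
  # For a package that requires at least one GPU back-end:
  meta.broken = chosenBackend == "none";
}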

fazo96 mentioned this issue Nov 21, 2023
@Madouura (Contributor)
> could mean "enable rocm in every package that supports it, and build cuda too if that's possible"

To be really honest, that seems a bit useless, because 99.999999999% of users aren't going to have both AMD and NVIDIA GPUs on their system.
Despite its arguable uselessness though, we should definitely prioritize by order.
I like this idea a lot.

@samuela (Member) commented Nov 22, 2023

Hmm, I'm not sure I understand the priority thing. What's the use case for having priority flags on build options? Shouldn't the build always execute with the configuration it's given?
