
cudaPackages: keep unsupported attributes, mark unsupported, explain why #266475

ConnorBaker opened this issue Nov 9, 2023 · 6 comments

ConnorBaker commented Nov 9, 2023

A fair number of tools we use in cudaPackages depend on specific versions of CUDA. Sometimes this is a strict dependency due to API changes or other breakages; other times, it's merely because it is not known whether the code is forward (or backward!) compatible.

The approach that I (@ConnorBaker) have taken so far in cudaPackages has been that when a package is unavailable for some release of CUDA, or is known to be incompatible, it is not present in that CUDA release's package set (e.g., cudaPackages_12_2).

However, this limits the flexibility of downstream users and forecloses a larger pattern: attempting to build against newer or older versions of dependencies.

@SomeoneSerge raised this issue in a comment on a PR about cuda-samples, which we use as a sort of sanity check to make sure that the cudatoolkit builds: #266115 (comment)

> To clarify, it's not "break" in the sense that the derivation has `broken = true` -- I'm replacing `builtins.throw`, which would "break" evaluation entirely.

> Right, we definitely do not want to throw. But `broken = true` is still an alternative to `optionalAttrs`; what do you think about those?

For context: cudaPackages.cuda-samples has a release for each recent release of CUDA except for 11.7. I made the decision to exclude cuda-samples from cudaPackages_11_7 by wrapping it in optionalAttrs.
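The pattern in question looks roughly like this (a minimal sketch, not the actual cuda-modules code; the file path and surrounding arguments are hypothetical):

```nix
# Hypothetical sketch of the exclusion pattern. When the predicate is false,
# lib.optionalAttrs returns {}, so the cuda-samples attribute simply does not
# exist in that CUDA release's package set.
{ lib, callPackage, cudaVersion }:

lib.optionalAttrs (cudaVersion != "11.7") {
  cuda-samples = callPackage ./cuda-samples.nix { };
}
```

Because the attribute is absent rather than marked broken, any expression that references it fails at evaluation time instead of at build time.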

This issue proposes two mechanisms in abstract:

  • A way to build a variable version of a package against a fixed version of a package set
    • e.g., use an older cuda-samples release (11.6) with cudaPackages_11_7
    • e.g., use a newer cuda-samples release (11.8) with cudaPackages_11_7
  • A way to build a fixed version of a package against a variable version of a package set
    • e.g., use the same version of TensorRT with different versions of cudaPackages even if compatibility is not guaranteed by NVIDIA's support matrix
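In terms of the existing override machinery, the two mechanisms might look something like this (a sketch only; the override points and the `tensorrtGeneric` builder are hypothetical names, not existing attributes):

```nix
# Sketch only: these override points and helper names are hypothetical.
{
  # (1) Variable package version against a fixed package set:
  # build an older cuda-samples release inside cudaPackages_11_7.
  cuda-samples_11_6 = cudaPackages_11_7.cuda-samples.overrideAttrs (old: {
    version = "11.6"; # hypothetical: assumes src selection keys off version
  });

  # (2) Fixed package version against a variable package set:
  # instantiate the same TensorRT expression with a different cudaPackages.
  tensorrt_for_12_2 = cudaPackages_12_2.callPackage tensorrtGeneric { };
}
```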

Taken together, these mechanisms would allow users to attempt to build things which would otherwise not be possible (e.g., using a newer release of CUDNN which might support a specific version of CUDA, even if it is not listed in the support matrix).

It's worth keeping in mind that support matrices (specifically as we're talking about them here, with respect to NVIDIA's software) essentially guarantee that the listed combinations work. They do not guarantee that combinations outside those enumerated are broken, only that they are unsupported.

@SomeoneSerge @samuela thoughts?


samuela commented Nov 9, 2023

Is this currently blocking any packages/PRs? It looks like #266115 has already been merged.

@ConnorBaker

No, not blocking anything. Just had this thought since I’ve seen this pattern around as I’ve been rewriting cuda-modules. It could simplify things internally and perhaps make it easier for users to try out officially unsupported combinations without needing to do a number of deep overrides — I mean, assuming that’s something we’d want to let them do!


SomeoneSerge commented Nov 14, 2023

A further advantage of `broken = true` is that, in addition to `.override`, you can still attempt the following:

```console
❯ nix eval -f '<nixpkgs>' --argstr crossSystem aarch64-linux cudaPackages.cuda_compat.meta.name
...
❯ nix build -f '<nixpkgs>' --argstr crossSystem aarch64-linux cudaPackages.cuda_compat.src
...
```

@SomeoneSerge

I just couldn't get torchvision to even evaluate in #269639 because of

```nix
defaultBuild = {
  "tensorrt" =
    if allBuilds ? ${computeName tensorRTDefaultVersion}
    then allBuilds.${computeName tensorRTDefaultVersion}
    else throw "tensorrt-${tensorRTDefaultVersion} does not support your cuda version ${cudaVersion}";
};
```

even though torch/torchvision doesn't actually use tensorrt by default.

I'm pretty sure `throw` is not the tool we need.
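A toy example of the failure mode (hypothetical attribute names; evaluable with `nix-instantiate --eval`):

```nix
# A throw stored in an attrset is harmless until something forces that value,
# but any evaluation that walks all values (e.g. a nix-env -qa style query)
# will abort, even for users who never asked for the broken package.
let
  pkgs = {
    tensorrt = throw "tensorrt does not support this CUDA version";
    torchvision = "builds fine";
  };
in
{
  ok = pkgs.torchvision;            # evaluates: the throw is never forced
  # all = builtins.attrValues pkgs; # uncommenting this aborts evaluation
}
```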

@SomeoneSerge

Some more reflection on the TRT situation: it's different from cuda_compat.

We only enable cuda_compat for aarch64-linux hosts, and we presume them to be Jetsons. We do so because NVIDIA just chooses not to ship cuda_compat for consumer GPUs and x86_64 hosts, although in principle the same hack would work on any platform. Any recent cudatoolkit release clearly corresponds to a version of cuda_compat, so it "exists", albeit buildable only on certain platforms.

With TRT, there are cudatoolkit releases for which TRT isn't a thing (yet). We couldn't even assign the broken/fake derivation any meaningful version. However, I still don't feel like omitting the attribute is a good thing, because it breaks eval so easily: if we leave out cudaPackages.tensorrt, we've got to remove python3Packages.tensorrt as well. I just don't think that's the kind of coupling we want to maintain.
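One way to keep the attribute without a meaningful version would be a placeholder marked broken (a sketch, not an agreed-upon design; the version string and description are invented):

```nix
# Hypothetical placeholder derivation: the attribute stays in the set, eval
# succeeds, and the refusal to build carries an explanation.
{ lib, stdenv, cudaVersion }:

stdenv.mkDerivation {
  pname = "tensorrt";
  version = "0-unsupported"; # invented: no upstream release exists
  dontUnpack = true;
  meta = {
    description = "TensorRT (NVIDIA ships no release for CUDA ${cudaVersion})";
    broken = true; # nixpkgs refuses to build unless allowBroken is set
  };
}
```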

@SomeoneSerge

  • cuda-library-samples

Going through #280386 I see that cuda-library-samples is still wrapped in optionalAttrs (e.g. it's confusingly missing from nix-instantiate in the linked PR) even after #273794.

@SomeoneSerge SomeoneSerge changed the title cudaPackages: implement an allow(Newer|Older) mechanism cudaPackages: keep unsupported attributes, mark unsupported, explain why Jan 12, 2024