Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cudaPackages.nccl: only ever builds with default version of cudaPackages #221895

Closed
ConnorBaker opened this issue Mar 18, 2023 · 4 comments · Fixed by #217619
Closed

cudaPackages.nccl: only ever builds with default version of cudaPackages #221895

ConnorBaker opened this issue Mar 18, 2023 · 4 comments · Fixed by #217619

Comments

@ConnorBaker
Copy link
Contributor

Describe the bug

nccl is only ever built with the default version of cudaPackages because the line which adds the derivation never changes the version of cudaPackages provided:

https://github.com/NixOS/nixpkgs/blob/master/pkgs/top-level/cuda-packages.nix#L48

I confirmed this by making this change to the nccl derivation

cuda-redist = throw "cudaPackages.cudaVersion is ${cudaPackages.cudaVersion}";

and trying to build it with two different versions of cudaPackages. Notice that the error is the same:

[connorbaker@fedora nixpkgs]$ nix build --impure -L .#cudaPackages.nccl
warning: Git tree '/home/connorbaker/Documents/nixpkgs' is dirty
error:
       … while calling the 'derivationStrict' builtin

         at //builtin/derivation.nix:9:12: (source not available)

       … while evaluating derivation 'nccl-2.16.5-1-cuda-11.7'
         whose name attribute is located at /nix/store/bh247jakm9gvjrif38ssqz90zfgqxy05-source/pkgs/stdenv/generic/make-derivation.nix:302:7

       … while evaluating attribute 'nativeBuildInputs' of derivation 'nccl-2.16.5-1-cuda-11.7'

         at /nix/store/bh247jakm9gvjrif38ssqz90zfgqxy05-source/pkgs/stdenv/generic/make-derivation.nix:342:7:

          341|       depsBuildBuild              = lib.elemAt (lib.elemAt dependencies 0) 0;
          342|       nativeBuildInputs           = lib.elemAt (lib.elemAt dependencies 0) 1;
             |       ^
          343|       depsBuildTarget             = lib.elemAt (lib.elemAt dependencies 0) 2;

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: cudaPackages.cudaVersion is 11.7
[connorbaker@fedora nixpkgs]$ nix build --impure -L .#cudaPackages_11_8.nccl
warning: Git tree '/home/connorbaker/Documents/nixpkgs' is dirty
error:
       … while calling the 'derivationStrict' builtin

         at //builtin/derivation.nix:9:12: (source not available)

       … while evaluating derivation 'nccl-2.16.5-1-cuda-11.7'
         whose name attribute is located at /nix/store/bh247jakm9gvjrif38ssqz90zfgqxy05-source/pkgs/stdenv/generic/make-derivation.nix:302:7

       … while evaluating attribute 'nativeBuildInputs' of derivation 'nccl-2.16.5-1-cuda-11.7'

         at /nix/store/bh247jakm9gvjrif38ssqz90zfgqxy05-source/pkgs/stdenv/generic/make-derivation.nix:342:7:

          341|       depsBuildBuild              = lib.elemAt (lib.elemAt dependencies 0) 0;
          342|       nativeBuildInputs           = lib.elemAt (lib.elemAt dependencies 0) 1;
             |       ^
          343|       depsBuildTarget             = lib.elemAt (lib.elemAt dependencies 0) 2;

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: cudaPackages.cudaVersion is 11.7

In python-packages.nix the pattern seems to be passing pythonPackages = self. However, it is relatively rare that the derivation takes pythonPackages as an argument. Instead, it seems to be preferred to have the derivation take, as arguments, the actual python packages required.

To bring the NCCL derivation in line with this pattern, it should take the CUDA packages it requires as arguments.

Notify maintainers

@NixOS/cuda-maintainers

@SomeoneSerge
Copy link
Contributor

We only expose nccl as cudaPackages.nccl. Let's change it to take cudaPackages' individual attributes as arguments, so that overrideScope' works properly. Nccl already uses solely the redist packages, so I guess we needn't worry about cuda10

@ConnorBaker
Copy link
Contributor Author

We only expose nccl as cudaPackages.nccl. Let's change it to take cudaPackages' individual attributes as arguments, so that overrideScope' works properly. Nccl already uses solely the redist packages, so I guess we needn't worry about cuda10

Agreed -- WIP at #217619. Running a nixpkgs-review now :)

@ConnorBaker ConnorBaker changed the title nccl: only ever builds with default version of cudaPackages cudaPackages.nccl: only ever builds with default version of cudaPackages Mar 18, 2023
@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-46/26872/1

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Mar 31, 2023

Well, since gh pushed a notification anyway: that's one long nixpkgs-review:)

ConnorBaker added a commit that referenced this issue Aug 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants