
cudaPackages: overhaul of how we package cuda packages #167016

Merged Apr 9, 2022 (1 commit)

Conversation

@FRidh (Member) commented Apr 3, 2022

Reorganize how we handle cuda packages in Nixpkgs.

TODO:

  • fix eval
  • fix supported versions cudnn / cutensor
  • split cuda-packages into several extensions and put them in the correct places: cudnn, cutensor, cudatoolkit-redist
  • add cuda-samples

Out of scope is fixing all cudatoolkit-redist packages. That can be done in a follow-up PR.

cc @obsidian-systems-maintenance

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

Review threads on pkgs/top-level/cuda-packages.nix (outdated, resolved)
libcusolver = final.addBuildInputs prev.libcusolver [
  prev.libcublas
];

@FRidh (Member Author):

TODO: nsight depends on Qt

@SomeoneSerge (Contributor) commented:

In #163704 (comment 1086739688) I tried to start with a description of a hypothetical public interface that I feel wouldn't block any of our purposes.

IIUC, the scope of the PR at hand is specifically to split cudatoolkit into smaller packages.
This necessarily affects the interface too, so maybe it would be easier to discuss the draft in detail if you gave an introduction in terms of what API we're going to have when this is merged, and where we would push the interface next?

In particular, how do you see (now and in the following PRs) downstream derivations consuming cuda-related pieces; how does a maintainer handle a derivation in a package set that suddenly requires a custom combination of cuda packages; and how does the end user achieve the same in an overlay?

@FRidh (Member Author) commented Apr 3, 2022

IIUC, the scope of the PR at hand is specifically to split cudatoolkit into smaller packages.

This PR packages all the cudatoolkit parts in small packages. Some packages will need to be fixed with overrides. Try e.g.

$ NIXPKGS_ALLOW_UNFREE=1 nix-build -A cudaToolkitPackages.libcusolver

This necessarily affects the interface too, so maybe it would be easier to discuss the draft in detail if you gave an introduction in terms of what API we're going to have when this is merged, and where we would push the interface next?

In particular, how do you see (now and in the following PRs) downstream derivations consuming cuda-related pieces; how does a maintainer handle a derivation in a package set that suddenly requires a custom combination of cuda packages; and how does the end user achieve the same in an overlay?

A package needing one or more CUDA packages takes cudaToolkitPackages as a parameter, and then in buildInputs you do something like

buildInputs = ... ++ (with cudaToolkitPackages; [ cudnn ]);
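
For a fuller picture, here is a minimal sketch of such a consumer; the package name and the particular attributes pulled from the set are made up for illustration, not taken from this PR:

# Hypothetical consumer; "my-cuda-app" and its inputs are placeholders.
{ stdenv, cmake, cudaToolkitPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app";
  version = "0.1";
  src = ./.;

  nativeBuildInputs = [ cmake ];

  # Pull only the pieces the build actually needs out of the set.
  buildInputs = with cudaToolkitPackages; [ cuda_cudart libcublas cudnn ];
}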

@samuela (Member) left a comment:

This is really great stuff @FRidh! Thanks for taking a stab at this! It really is a brave endeavor.

I mostly have a bunch of questions:

  • Where are the JSON manifests coming from? How do we retrieve these? It would be great if we could also have a script to generate them, or at minimum some documentation as to how to build these for new releases, etc.
  • Is there a reason to not collect all of the disparate packages as they exist today into a new folder structure a la python-modules? In other words, derivations are spread out today: cudatoolkit is in one directory, cuDNN is in another, same for cuTENSOR, etc. Could we collect all of those things into a common cuda-packages directory?
  • IIUC this abandons the old runfile way of building cudatoolkit in favor of the redist tarballs? If so, that's awesome!
  • I haven't tested this out locally yet... What do you see as the major blockers to moving forward with this?

Ok, I have a million questions but I'll stop here for now!

Review threads on pkgs/top-level/cuda-packages.nix and pkgs/top-level/all-packages.nix (outdated, resolved)
@samuela (Member) commented Apr 4, 2022

Echoing @SomeoneSerge's comment re API: It would be great to get some example code as to how packages can switch to this new package set. Obv this is draft-stage and all, but just something that we should document for future users as this PR matures and eventually lands! Might even be wiki-worthy material

@samuela (Member) commented Apr 4, 2022

AFAIU right now only a single version of cudnn is supported per cudaPackages set. To work around this, I'd suggest simply including every cudnn version in every cudaPackages set. The cudnn derivation should be smart enough to mark itself as broken when used in conjunction with an incompatible cudatoolkit version. That way users can also override if they would like to use a combo that is not officially supported.
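
A rough sketch of that shape; the attribute names, versions, and the supportedCudaVersions list are hypothetical, just to illustrate the idea:

# Illustrative only: names, versions, and the compatibility check
# are assumptions, not the PR's actual code.
final: prev: {
  cudnn_8_1_1 = final.callPackage ./cudnn.nix { version = "8.1.1"; };
  cudnn_8_3_2 = final.callPackage ./cudnn.nix { version = "8.3.2"; };
  cudnn = final.cudnn_8_3_2; # default for this cudatoolkit
}
# ...and inside cudnn.nix, something along the lines of:
# meta.broken = !lib.elem (lib.versions.majorMinor cudatoolkit.version) supportedCudaVersions;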

@samuela (Member) commented Apr 4, 2022

Ok, so I'm taking a stab at migrating cuda-samples to use this PR, and I'm not able to locate cuBLAS in the new package set. How do I access that? We should prob add that as well?

I'm a dumdum; it's libcublas.

@samuela (Member) commented Apr 4, 2022

FWIW I've found that this is the package set to build on linux:

nix-build --max-jobs 32 \
-A cudaToolkitPackages_11_6.cuda_cccl \
-A cudaToolkitPackages_11_6.cuda_cudart \
-A cudaToolkitPackages_11_6.cuda_cuobjdump \
-A cudaToolkitPackages_11_6.cuda_cupti \
-A cudaToolkitPackages_11_6.cuda_cuxxfilt \
-A cudaToolkitPackages_11_6.cuda_demo_suite \
-A cudaToolkitPackages_11_6.cuda_documentation \
-A cudaToolkitPackages_11_6.cuda_gdb \
-A cudaToolkitPackages_11_6.cuda_memcheck \
-A cudaToolkitPackages_11_6.cuda_nsight \
-A cudaToolkitPackages_11_6.cuda_nvcc \
-A cudaToolkitPackages_11_6.cuda_nvdisasm \
-A cudaToolkitPackages_11_6.cuda_nvml_dev \
-A cudaToolkitPackages_11_6.cuda_nvprof \
-A cudaToolkitPackages_11_6.cuda_nvprune \
-A cudaToolkitPackages_11_6.cuda_nvrtc \
-A cudaToolkitPackages_11_6.cuda_nvtx \
-A cudaToolkitPackages_11_6.cuda_nvvp \
-A cudaToolkitPackages_11_6.cuda_sanitizer_api \
-A cudaToolkitPackages_11_6.cudatoolkit \
-A cudaToolkitPackages_11_6.cudnn \
-A cudaToolkitPackages_11_6.cutensor \
-A cudaToolkitPackages_11_6.fabricmanager \
-A cudaToolkitPackages_11_6.libcublas \
-A cudaToolkitPackages_11_6.libcufft \
-A cudaToolkitPackages_11_6.libcufile \
-A cudaToolkitPackages_11_6.libcurand \
-A cudaToolkitPackages_11_6.libcusolver \
-A cudaToolkitPackages_11_6.libcusparse \
-A cudaToolkitPackages_11_6.libnpp \
-A cudaToolkitPackages_11_6.libnvidia_nscq \
-A cudaToolkitPackages_11_6.libnvjpeg \
-A cudaToolkitPackages_11_6.magma \
-A cudaToolkitPackages_11_6.nccl \
-A cudaToolkitPackages_11_6.nsight_compute \
-A cudaToolkitPackages_11_6.nsight_systems \
-A cudaToolkitPackages_11_6.nvidia_driver \
-A cudaToolkitPackages_11_6.nvidia_fs

The other things are either Windows-only or just aren't relevant to build.

@FRidh (Member Author) commented Apr 4, 2022

[Quoting @samuela's four questions above: where the JSON manifests come from, collecting the cuda derivations into a common cuda-packages directory, dropping the runfile build in favor of the redist tarballs, and the major blockers.]

  • Will add a comment about the manifests. They are fetched from https://developer.download.nvidia.com/compute/cuda/redist/ (see the sketch after this list).
  • Yes, files need to be moved. I tried not to break the old expressions, hence I kept them in different places, but yes, as soon as everything here is working, I'll move the expressions together.
  • cudatoolkit is added in two ways: the old classic way and the new redist way.
  • cudnn support for multiple versions needs to be fixed first. When that is done, I need to rename the set to cudaPackages, add aliases, and update all references. After that, or in the meanwhile, we can work on fixing the redist packages.
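
For illustration, pinning one of those manifests could look roughly like this; the redistrib_11.6.2.json file name is an assumption about NVIDIA's naming, not something prescribed by this PR:

# Sketch only: file name and hash are placeholders.
manifest = fetchurl {
  url = "https://developer.download.nvidia.com/compute/cuda/redist/redistrib_11.6.2.json";
  sha256 = lib.fakeSha256; # replace with the real hash after the first (failing) build
};
# The JSON could then be vendored in-tree and parsed with lib.importJSON
# to generate one derivation per redist tarball.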

@FRidh (Member Author) commented Apr 4, 2022

FWIW I've found that this is the package set to build on linux: [the full nix-build command and note quoted above]

We could add a release-cuda.nix in pkgs/top-level that is buildable by Hydra.
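
A minimal sketch of what that file could contain; the attribute set and config flags below are assumptions, not code from this PR:

# pkgs/top-level/release-cuda.nix (sketch)
{ pkgs ? import ../.. { config = { allowUnfree = true; cudaSupport = true; }; } }:

{
  cudaToolkitPackages_11_6 = {
    inherit (pkgs.cudaToolkitPackages_11_6)
      cudatoolkit cudnn cutensor libcublas libcufft libcusolver;
  };
}

Hydra could then point a jobset at that file.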

@samuela (Member) commented Apr 4, 2022

Awesome, I'm working on

  • Fixing builds of all the redist packages
  • Then, switching the cuda-samples tests to use the redist packages

Perhaps we should start a TODO list:

Perhaps other things I'm forgetting?

@FRidh (Member Author) commented Apr 7, 2022


meta = {
  description = attrs.name;
  license = lib.licenses.unfree;
(Contributor):

Oh, by the way, we need licenses.unfreeRedistributable! I don't remember if I have filters for meta.license.redistributable, but numtide/nixpkgs-unfree iirc does
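Concretely, in the redist expressions that would be a one-line change; a sketch, mirroring the meta block shown in the hunk above:

meta = {
  description = attrs.name;
  license = lib.licenses.unfreeRedistributable; # instead of lib.licenses.unfree
};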

@knedlsepp (Member) commented Apr 7, 2022:

From the previous discussions I understood that redistributable is something we desire, but it seems that isn't the reality.

(Contributor):

We're literally downloading stuff from a .../redist/... URL.
Again, the only part that is not redistributable is libcuda.so, which lives in nvidia_x11.

(Contributor):

But! I thought I pointed at one of the expressions that was new. The cudatoolkit has had an unfree license in previous revisions, so let's have that discussion in a separate PR.

(Member):

I think the question is whether or not patchelf constitutes modification... but IANAL. We can always change the licenses in the future as necessary. I'm not sure how to get a definitive answer on these kinds of things.

@SomeoneSerge (Contributor) commented:

🙃 I would s/cudaPackages \? { }/cudaPackages/ now, before someone starts copying it and neither can be seen as a convention
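
That is, roughly: a consuming expression would declare the dependency as

{ lib, stdenv, cudaPackages }:

rather than

{ lib, stdenv, cudaPackages ? { } }:

so the scope is always supplied by callPackage instead of silently defaulting to an empty set; this is an illustrative reading of the suggestion (with made-up surrounding parameters), not code from the PR.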

@SomeoneSerge (Contributor) commented Apr 7, 2022

https://hercules-ci.com/github/SomeoneSerge/nixpkgs-unfree/jobs/225 - the rebuild with cuda 11.5 - has finished.
I'm still unsure what's wrong with torchvision 🤔

...maybe a hotfix could be to set this environment variable: https://github.com/pytorch/pytorch/blob/8bf8b64b540f53fd78eaaa643605bf6759effbb8/torch/utils/cpp_extension.py#L1654
But it would be better to figure out what exactly we've broken.
EDIT: that variable seems to be set in preBuild, but to an empty list. That list must be coming from the pytorch expression.
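
If the variable in question is TORCH_CUDA_ARCH_LIST, which is an assumption on my part, a stop-gap override might look like this sketch:

python3Packages.pytorch.overrideAttrs (old: {
  # Assumption: force a non-empty arch list until the regression is understood.
  preBuild = (old.preBuild or "") + ''
    export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.6"
  '';
})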

EDIT2:

let
  cfg = { config.allowUnfree = true; config.cudaSupport = true; };
  stable = import <nixpkgs> cfg;
  master = import ./. cfg;
in {
  stableList = stable.python3Packages.pytorch.cudaArchList;
  masterList = master.python3Packages.pytorch.cudaArchList;
}
nix-repl> archlists.masterList
[ ]

nix-repl> archlists.stableList
[ "3.5" "5.0" "5.2" "6.0" "6.1" "7.0" "7.0+PTX" "7.5" "7.5+PTX" ]

There are many different versions of the `cudatoolkit` and related
cuda packages, and it can be tricky to ensure they remain compatible.

- `cudaPackages` is now a package set with `cudatoolkit`, `cudnn`, `cutensor`, `nccl`, as well as `cudatoolkit` split into smaller packages ("redist");
- expressions should now use `cudaPackages` as parameter instead of the individual cuda packages;
- `makeScope` is now used, so it is possible to use `.overrideScope'` to set e.g. a different `cudnn` version (see the sketch after this list);
- `release-cuda.nix` is introduced to easily evaluate cuda packages using Hydra.
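
For instance, a user could pick a different cudnn roughly like this; the cudnn_8_3_2 attribute name is hypothetical:

let
  pkgs = import <nixpkgs> { config.allowUnfree = true; };
  myCudaPackages = pkgs.cudaPackages.overrideScope' (final: prev: {
    cudnn = prev.cudnn_8_3_2; # hypothetical attribute for a specific cudnn version
  });
in
  myCudaPackages.cudnn
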
@sternenseemann (Member) commented Apr 9, 2022

Seems like some old-style references were missed; xgboost fails to evaluate with allowAliases = false now.

Edit: #167985.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-install-a-specific-version-of-cuda-and-cudnn/21725/4
