
cudaPackages: overhaul of how we package cuda packages #167016

Merged Apr 9, 2022 (1 commit)

Conversation

@FRidh (Member) commented Apr 3, 2022

Reorganize how we handle cuda packages in Nixpkgs.

TODO:

  • fix eval
  • fix supported versions cudnn / cutensor
  • split cuda-packages into several extensions and put them in the correct places: cudnn, cutensor, cudatoolkit-redist
  • add cuda-samples

Out of scope is fixing all cudatoolkit-redist packages. That can be done in a follow-up PR.

cc @obsidian-systems-maintenance

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

Review threads on pkgs/top-level/cuda-packages.nix (outdated, resolved)
libcusolver = final.addBuildInputs prev.libcusolver [
  prev.libcublas
];

@FRidh (Member Author):

TODO: nsight depends on Qt

@SomeoneSerge (Contributor) commented:

In #163704 (comment 1086739688) I tried to start with a description of a hypothetical public interface that I feel wouldn't block any of our purposes.

IIUC, the scope of the PR at hand is specifically to split cudatoolkit into smaller packages.
This necessarily affects the interface too, so maybe it would be easier to discuss the draft in detail if you gave an introduction in terms of what API we're going to have when this is merged, and where we would push the interface next?

In particular, how do you see (now and in the following PRs) downstream derivations consuming cuda-related pieces; how does a maintainer handle a derivation in a package set that suddenly requires a custom combination of cuda packages; and how does the end user achieve the same in an overlay?

@FRidh (Member Author) commented Apr 3, 2022

IIUC, the scope of the PR at hand is specifically to split cudatoolkit into smaller packages.

This PR packages all the cudatoolkit parts in small packages. Some packages will need to be fixed with overrides. Try e.g.

$ NIXPKGS_ALLOW_UNFREE=1 nix-build -A cudaToolkitPackages.libcusolver

This necessarily affects the interface too, so maybe it would be easier to discuss the draft in detail if you gave an introduction in terms of what API we're going to have when this is merged, and where we would push the interface next?

In particular, how do you see (now and in the following PRs) downstream derivations consuming cuda-related pieces; how does a maintainer handle a derivation in a package set that suddenly requires a custom combination of cuda packages; and how does the end user achieve the same in an overlay?

A package needing one or more CUDA packages takes cudaToolkitPackages as a parameter, and then in buildInputs you do something like

buildInputs = ... ++ (with cudaToolkitPackages; [ cudnn ]);
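
For a fuller picture, here is a minimal sketch of such a consumer; the package name and the particular attributes pulled from the set are made up for illustration, not taken from this PR:

# Hypothetical consumer; "my-cuda-app" and its inputs are placeholders.
{ stdenv, cmake, cudaToolkitPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app";
  version = "0.1";
  src = ./.;

  nativeBuildInputs = [ cmake ];

  # Pull only the pieces the build actually needs out of the set.
  buildInputs = with cudaToolkitPackages; [ cuda_cudart libcublas cudnn ];
}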

@samuela (Member) left a comment:

This is really great stuff @FRidh! Thanks for taking a stab at this! It really is a brave endeavor.

I mostly have a bunch of questions:

  • Where are the JSON manifests coming from? How do we retrieve these? It would be great if we could also have a script to generate them, or at minimum some documentation as to how to build these for new releases, etc.
  • Is there a reason to not collect all of the disparate packages as they exist today into a new folder structure a la python-modules? In other words, derivations are spread out today: cudatoolkit is in one directory, cuDNN is in another, same for cuTENSOR, etc. Could we collect all of those things into a common cuda-packages directory?
  • IIUC this abandons the old runfile way of building cudatoolkit in favor of the redist tarballs? If so, that's awesome!
  • I haven't tested this out locally yet... What do you see as the major blockers to moving forward with this?

Ok, I have a million questions but I'll stop here for now!

Review threads on pkgs/top-level/cuda-packages.nix and pkgs/top-level/all-packages.nix (outdated, resolved)
@samuela (Member) commented Apr 4, 2022

Echoing @SomeoneSerge's comment re API: It would be great to get some example code as to how packages can switch to this new package set. Obv this is draft-stage and all, but just something that we should document for future users as this PR matures and eventually lands! Might even be wiki-worthy material

@samuela (Member) commented Apr 4, 2022

AFAIU right now only a single version of cudnn is supported per cudaPackages set. To work around this, I'd suggest simply including every cudnn version in every cudaPackages set. The cudnn derivation should be smart enough to mark itself as broken when used in conjunction with an incompatible cudatoolkit version. That way users can also override if they would like to use a combo that is not officially supported.
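
A rough sketch of that shape; the attribute names, versions, and the supportedCudaVersions list are hypothetical, just to illustrate the idea:

# Illustrative only: names, versions, and the compatibility check
# are assumptions, not the PR's actual code.
final: prev: {
  cudnn_8_1_1 = final.callPackage ./cudnn.nix { version = "8.1.1"; };
  cudnn_8_3_2 = final.callPackage ./cudnn.nix { version = "8.3.2"; };
  cudnn = final.cudnn_8_3_2; # default for this cudatoolkit
}
# ...and inside cudnn.nix, something along the lines of:
# meta.broken = !lib.elem (lib.versions.majorMinor cudatoolkit.version) supportedCudaVersions;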

@samuela (Member) commented Apr 4, 2022

Ok, so I'm taking a stab at migrating cuda-samples to use this PR, and I'm not able to locate cuBLAS in the new package set. How do I access that? We should prob add that as well?

I'm a dumdum; it's libcublas.

@samuela (Member) commented Apr 4, 2022

FWIW I've found that this is the package set to build on linux:

nix-build --max-jobs 32 \
-A cudaToolkitPackages_11_6.cuda_cccl \
-A cudaToolkitPackages_11_6.cuda_cudart \
-A cudaToolkitPackages_11_6.cuda_cuobjdump \
-A cudaToolkitPackages_11_6.cuda_cupti \
-A cudaToolkitPackages_11_6.cuda_cuxxfilt \
-A cudaToolkitPackages_11_6.cuda_demo_suite \
-A cudaToolkitPackages_11_6.cuda_documentation \
-A cudaToolkitPackages_11_6.cuda_gdb \
-A cudaToolkitPackages_11_6.cuda_memcheck \
-A cudaToolkitPackages_11_6.cuda_nsight \
-A cudaToolkitPackages_11_6.cuda_nvcc \
-A cudaToolkitPackages_11_6.cuda_nvdisasm \
-A cudaToolkitPackages_11_6.cuda_nvml_dev \
-A cudaToolkitPackages_11_6.cuda_nvprof \
-A cudaToolkitPackages_11_6.cuda_nvprune \
-A cudaToolkitPackages_11_6.cuda_nvrtc \
-A cudaToolkitPackages_11_6.cuda_nvtx \
-A cudaToolkitPackages_11_6.cuda_nvvp \
-A cudaToolkitPackages_11_6.cuda_sanitizer_api \
-A cudaToolkitPackages_11_6.cudatoolkit \
-A cudaToolkitPackages_11_6.cudnn \
-A cudaToolkitPackages_11_6.cutensor \
-A cudaToolkitPackages_11_6.fabricmanager \
-A cudaToolkitPackages_11_6.libcublas \
-A cudaToolkitPackages_11_6.libcufft \
-A cudaToolkitPackages_11_6.libcufile \
-A cudaToolkitPackages_11_6.libcurand \
-A cudaToolkitPackages_11_6.libcusolver \
-A cudaToolkitPackages_11_6.libcusparse \
-A cudaToolkitPackages_11_6.libnpp \
-A cudaToolkitPackages_11_6.libnvidia_nscq \
-A cudaToolkitPackages_11_6.libnvjpeg \
-A cudaToolkitPackages_11_6.magma \
-A cudaToolkitPackages_11_6.nccl \
-A cudaToolkitPackages_11_6.nsight_compute \
-A cudaToolkitPackages_11_6.nsight_systems \
-A cudaToolkitPackages_11_6.nvidia_driver \
-A cudaToolkitPackages_11_6.nvidia_fs

The other things are either Windows-only or just aren't relevant to build.

@FRidh (Member Author) commented Apr 4, 2022

[Quoting @samuela's four questions above: where the JSON manifests come from, collecting the cuda derivations into a common cuda-packages directory, dropping the runfile build in favor of the redist tarballs, and the major blockers.]

  • Will add a comment about the manifests. They are fetched from https://developer.download.nvidia.com/compute/cuda/redist/ (see the sketch after this list).
  • Yes, files need to be moved. I tried not to break the old expressions, hence I kept them in different places, but yes, as soon as everything here is working, I'll move the expressions together.
  • cudatoolkit is added in two ways: the old classic way and the new redist way.
  • cudnn support for multiple versions needs to be fixed first. When that is done, I need to rename the set to cudaPackages, add aliases, and update all references. After that, or in the meanwhile, we can work on fixing the redist packages.
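
For illustration, pinning one of those manifests could look roughly like this; the redistrib_11.6.2.json file name is an assumption about NVIDIA's naming, not something prescribed by this PR:

# Sketch only: file name and hash are placeholders.
manifest = fetchurl {
  url = "https://developer.download.nvidia.com/compute/cuda/redist/redistrib_11.6.2.json";
  sha256 = lib.fakeSha256; # replace with the real hash after the first (failing) build
};
# The JSON could then be vendored in-tree and parsed with lib.importJSON
# to generate one derivation per redist tarball.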

@FRidh (Member Author) commented Apr 4, 2022

FWIW I've found that this is the package set to build on linux: [the full nix-build command and note quoted above]

We could add a release-cuda.nix in pkgs/top-level that is buildable by Hydra.
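
A minimal sketch of what that file could contain; the attribute set and config flags below are assumptions, not code from this PR:

# pkgs/top-level/release-cuda.nix (sketch)
{ pkgs ? import ../.. { config = { allowUnfree = true; cudaSupport = true; }; } }:

{
  cudaToolkitPackages_11_6 = {
    inherit (pkgs.cudaToolkitPackages_11_6)
      cudatoolkit cudnn cutensor libcublas libcufft libcusolver;
  };
}

Hydra could then point a jobset at that file.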

@samuela (Member) commented Apr 4, 2022

Awesome, I'm working on

  • Fixing builds of all the redist packages
  • Then, switching the cuda-samples tests to use the redist packages

Perhaps we should start a TODO list:

Perhaps other things I'm forgetting?

@FRidh (Member Author) commented Apr 7, 2022


meta = {
  description = attrs.name;
  license = lib.licenses.unfree;
(Contributor):

Oh, by the way, we need licenses.unfreeRedistributable! I don't remember if I have filters for meta.license.redistributable, but numtide/nixpkgs-unfree iirc does
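Concretely, in the redist expressions that would be a one-line change; a sketch, mirroring the meta block shown in the hunk above:

meta = {
  description = attrs.name;
  license = lib.licenses.unfreeRedistributable; # instead of lib.licenses.unfree
};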

@knedlsepp (Member) commented Apr 7, 2022:

From the previous discussions I understood that redistributable is something we desire, but it seems that isn't the reality.

(Contributor):

We're literally downloading stuff from a .../redist/... URL.
Again, the only part that is not redistributable is libcuda.so, which lives in nvidia_x11.

(Contributor):

But! I thought I pointed at one of the expressions that was new. The cudatoolkit has had an unfree license in previous revisions, so let's have that discussion in a separate PR.

(Member):

I think the question is whether or not patchelf constitutes modification... but IANAL. We can always change the licenses in the future as necessary. I'm not sure how to get a definitive answer on these kinds of things.

@SomeoneSerge (Contributor) commented:

🙃 I would s/cudaPackages \? { }/cudaPackages/ now, before someone starts copying it and neither can be seen as a convention
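
That is, roughly: a consuming expression would declare the dependency as

{ lib, stdenv, cudaPackages }:

rather than

{ lib, stdenv, cudaPackages ? { } }:

so the scope is always supplied by callPackage instead of silently defaulting to an empty set; this is an illustrative reading of the suggestion (with made-up surrounding parameters), not code from the PR.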

@SomeoneSerge (Contributor) commented Apr 7, 2022

https://hercules-ci.com/github/SomeoneSerge/nixpkgs-unfree/jobs/225 - the rebuild with cuda 11.5 - has finished.
I'm still unsure what's wrong with torchvision 🤔

...maybe a hotfix could be to set this environment variable: https://github.com/pytorch/pytorch/blob/8bf8b64b540f53fd78eaaa643605bf6759effbb8/torch/utils/cpp_extension.py#L1654
But it would be better to figure out what exactly we've broken.
EDIT: that variable seems to be set in preBuild, but to an empty list. That list must be coming from the pytorch expression.
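
If the variable in question is TORCH_CUDA_ARCH_LIST, which is an assumption on my part, a stop-gap override might look like this sketch:

python3Packages.pytorch.overrideAttrs (old: {
  # Assumption: force a non-empty arch list until the regression is understood.
  preBuild = (old.preBuild or "") + ''
    export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.6"
  '';
})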

EDIT2:

let
  cfg = { config.allowUnfree = true; config.cudaSupport = true; };
  stable = import <nixpkgs> cfg;
  master = import ./. cfg;
in {
  stableList = stable.python3Packages.pytorch.cudaArchList;
  masterList = master.python3Packages.pytorch.cudaArchList;
}
nix-repl> archlists.masterList
[ ]

nix-repl> archlists.stableList
[ "3.5" "5.0" "5.2" "6.0" "6.1" "7.0" "7.0+PTX" "7.5" "7.5+PTX" ]

There are many different versions of the `cudatoolkit` and related
cuda packages, and it can be tricky to ensure they remain compatible.

- `cudaPackages` is now a package set with `cudatoolkit`, `cudnn`, `cutensor`, `nccl`, as well as `cudatoolkit` split into smaller packages ("redist");
- expressions should now use `cudaPackages` as parameter instead of the individual cuda packages;
- `makeScope` is now used, so it is possible to use `.overrideScope'` to set e.g. a different `cudnn` version (see the sketch after this list);
- `release-cuda.nix` is introduced to easily evaluate cuda packages using Hydra.
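
For instance, a user could pick a different cudnn roughly like this; the cudnn_8_3_2 attribute name is hypothetical:

let
  pkgs = import <nixpkgs> { config.allowUnfree = true; };
  myCudaPackages = pkgs.cudaPackages.overrideScope' (final: prev: {
    cudnn = prev.cudnn_8_3_2; # hypothetical attribute for a specific cudnn version
  });
in
  myCudaPackages.cudnn
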
@sternenseemann (Member) commented Apr 9, 2022

Seems like some old-style references were missed; xgboost fails to evaluate with allowAliases = false now.

Edit: #167985.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-install-a-specific-version-of-cuda-and-cudnn/21725/4
