Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3Packages.torch: drop submodules in favor of Nixpkgs #239291

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from
62 changes: 62 additions & 0 deletions doc/languages-frameworks/cuda.section.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,3 +54,65 @@ for your specific card(s).

Library maintainers should consult [NVCC Docs](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/)
and release notes for their software package.

## Adding a new CUDA release {#adding-a-new-cuda-release}

> **WARNING**
>
> This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues tagging @NixOS/cuda-maintainers or on [Matrix](https://matrix.to/#/#cuda:nixos.org).

The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications, packaged in a multi-gigabyte monolithic installer. Since CUDA 11.4, NVIDIA has maintained CUDA redistributables (“CUDA-redist”): individually packaged components, meant to facilitate redistribution and inclusion in downstream projects.

All new projects should use the CUDA redistributables, as they are much easier to maintain and update.

### Updating CUDA redistributables {#updating-cuda-redistributables}

1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
2. Copy the `redistrib_*.json` corresponding to the release to `pkgs/development/compilers/cudatoolkit/redist/manifests`.
3. Generate the `redistrib_features_*.json` file by running:

```bash
nix run github:ConnorBaker/cuda-redist-find-features -- <path to manifest>
```

That command will generate the `redistrib_features_*.json` file in the same directory as the manifest.

4. Include the path to the new manifest in `pkgs/development/compilers/cudatoolkit/redist/extension.nix`.

### Updating the CUDA Toolkit {#updating-the-cuda-toolkit}

> **WARNING**
>
> While the CUDA Toolkit is still available in Nixpkgs, it is not recommended for use and should be considered deprecated.
>
> However, to ensure packages relying on the CUDA Toolkit continue to build, the CUDA Toolkit will continue to be updated until a migration path is available.

1. Go to NVIDIA's CUDA Toolkit download page: <https://developer.nvidia.com/cuda-downloads>
2. Select the appropriate OS, architecture, distribution, and version, and installer type.

- For example: Linux, x86_64, Ubuntu, 22.04, runfile (local)
- NOTE: Typically, we use the Ubuntu runfile. It is unclear if the runfile for other distributions will work.

3. Take the link provided by the installer instructions on the webpage after selecting the installer type and get its hash by running:

```bash
nix store prefetch-file --hash-type sha256 <link>
```

4. Update `pkgs/development/compilers/cudatoolkit/versions.toml` to include the release.

### Updating the CUDA package set {#updating-the-cuda-package-set}

1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/all-packages.nix`.

- NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.

2. Successfully build the closure of the new package set, updating `pkgs/development/compilers/cudatoolkit/redist/overrides.nix` as needed. Below are some common failures:

| Unable to ... | During ... | Reason | Solution | Note |
| --- | --- | --- | --- | --- |
| Find headers | `configurePhase` or `buildPhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contain the headers |
| Find libraries | `configurePhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contain CMake configuration files |
| Find libraries | `buildPhase` or `patchelf` | Missing dependency on a `lib` or `static` output | Add the missing dependency | The `lib` or `static` output typically contain the libraries |

In the scenario you are unable to run the resulting binary: this is arguably the most complicated as it could be any combination of the previous reasons. This type of failure typically occurs when a library attempts to load or open a library it depends on that it does not declare in its `DT_NEEDED` section. As a first step, ensure that dependencies are patched with `cudaPackages.autoAddOpenGLRunpath`. Failing that, try running the application with `nixGL` (https://github.com/guibou/nixGL) or a similar wrapper tool. If that works, it likely means that the application is attempting to load a library that is not in the `RPATH` or `RUNPATH` of the binary.
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@

# CMake's enable_language(CUDA) runs a compiler test and it doesn't account for
# CUDAToolkit_ROOT. We have to help it locate libcudart
export NVCC_APPEND_FLAGS+=" -L@cudartRoot@/lib -I@cudartRoot@/include"
export NVCC_APPEND_FLAGS+=" -L@cudartLib@/lib -L@cudartStatic@/lib -I@cudartInclude@/include"
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ setupCUDAToolkitCompilers() {
# CMake's enable_language(CUDA) runs a compiler test and it doesn't account for
# CUDAToolkit_ROOT. We have to help it locate libcudart
if [[ -z "${nvccDontPrependCudartFlags-}" ]] ; then
export NVCC_APPEND_FLAGS+=" -L@cudartRoot@/lib -I@cudartRoot@/include"
export NVCC_APPEND_FLAGS+=" -L@cudartLib@/lib -L@cudartStatic@/lib -I@cudartInclude@/include"
fi
}

Expand Down
Original file line number Diff line number Diff line change
@@ -1,27 +1,69 @@
# Type Aliases
#
# See ./extension.nix:
# - ReleaseAttrs
# - ReleaseFeaturesAttrs
#
# General callPackage-supplied arguments
{ lib
, stdenv
, backendStdenv
, fetchurl
, autoPatchelfHook
, autoAddOpenGLRunpathHook
, markForCudatoolkitRootHook
, lndir
, symlinkJoin
}:
# Function arguments
{
# Short package name (e.g., "cuda_cccl")
# pname : String
pname
, # Long package name (e.g., "CXX Core Compute Libraries")
# description : String
description
, # platforms : List System
platforms
, # version : Version
version
, # releaseAttrs : ReleaseAttrs
releaseAttrs
, # releaseFeaturesAttrs : ReleaseFeaturesAttrs
releaseFeaturesAttrs
,
}:

pname:
attrs:

let
arch = "linux-x86_64";
# Useful imports
inherit (lib.lists) optionals;
inherit (lib.meta) getExe;
inherit (lib.strings) optionalString;
in
backendStdenv.mkDerivation {
inherit pname;
inherit (attrs) version;
# NOTE: Even though there's no actual buildPhase going on here, the derivations of the
# redistributables are sensitive to the compiler flags provided to stdenv. The patchelf package
# is sensitive to the compiler flags provided to stdenv, and we depend on it. As such, we are
# also sensitive to the compiler flags provided to stdenv.
inherit pname version;
strictDeps = true;

src = assert (lib.hasAttr arch attrs); fetchurl {
url = "https://developer.download.nvidia.com/compute/cuda/redist/${attrs.${arch}.relative_path}";
inherit (attrs.${arch}) sha256;
outputs = with releaseFeaturesAttrs;
[ "out" ]
++ optionals hasBin [ "bin" ]
++ optionals hasLib [ "lib" ]
++ optionals hasStatic [ "static" ]
++ optionals hasDev [ "dev" ]
++ optionals hasDoc [ "doc" ]
++ optionals hasSample [ "sample" ];

src = fetchurl {
url = "https://developer.download.nvidia.com/compute/cuda/redist/${releaseAttrs.relative_path}";
inherit (releaseAttrs) sha256;
};

# We do need some other phases, like configurePhase, so the multiple-output setup hook works.
dontBuild = true;

nativeBuildInputs = [
autoPatchelfHook
# This hook will make sure libcuda can be found
Expand All @@ -46,23 +88,87 @@ backendStdenv.mkDerivation {
"$ORIGIN"
];

dontBuild = true;
installPhase = with releaseFeaturesAttrs;
# Pre-install hook
''
runHook preInstall
''
# doc and dev have special output handling. Other outputs need to be moved to their own
# output.
# Note that moveToOutput operates on all outputs:
# https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L105-L107
+ ''
mkdir -p "$out"
rm LICENSE
mv * "$out"
''
# Handle bin, which defaults to out
+ optionalString hasBin ''
moveToOutput "bin" "$bin"
''
# Handle lib, which defaults to out
+ optionalString hasLib ''
moveToOutput "lib" "$lib"
''
# Handle static libs, which isn't handled by the setup hook
+ optionalString hasStatic ''
moveToOutput "**/*.a" "$static"
''
# Handle samples, which isn't handled by the setup hook
+ optionalString hasSample ''
moveToOutput "samples" "$sample"
''
# Post-install hook
+ ''
runHook postInstall
'';

# TODO: choose whether to install static/dynamic libs
installPhase = ''
runHook preInstall
rm LICENSE
mkdir -p $out
mv * $out
runHook postInstall
# The out output leverages the same functionality which backs the `symlinkJoin` function in
# Nixpkgs:
# https://github.com/NixOS/nixpkgs/blob/d8b2a92df48f9b08d68b0132ce7adfbdbc1fbfac/pkgs/build-support/trivial-builders/default.nix#L510
#
# That should allow us to emulate "fat" default outputs without having to actually create them.
#
# It is important that this run after the autoPatchelfHook, otherwise the symlinks in out will reference libraries in lib, creating a circular dependency.
postPhases = [ "postPatchelf" ];
# For each output, create a symlink to it in the out output.
# NOTE: We must recreate the out output here, because the setup hook will have deleted it
# if it was empty.
# NOTE: Do not use optionalString based on whether `outputs` contains only `out` -- phases
# which are empty strings are skipped/unset and result in errors of the form "command not
# found: <customPhaseName>".
postPatchelf = ''
mkdir -p "$out"
for output in $outputs; do
if [ "$output" = "out" ]; then
continue
fi
${getExe lndir} "''${!output}" "$out"
done
'';

# Make the CUDA-patched stdenv available
passthru.stdenv = backendStdenv;

# Setting propagatedBuildInputs to false will prevent outputs known to the multiple-outputs
# from depending on `out` by default.
# https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L196
# Indeed, we want to do the opposite -- fat "out" outputs that contain all the other outputs.
propagatedBuildOutputs = false;

# By default, if the dev output exists it just uses that.
# However, because we disabled propagatedBuildOutputs, dev doesn't contain libraries or
# anything of the sort. To remedy this, we set outputSpecified to true, and use
# outputsToInstall, which tells Nix which outputs to use when the package name is used
# unqualified (that is, without an explicit output).
outputSpecified = true;

meta = {
description = attrs.name;
inherit description platforms;
license = lib.licenses.unfree;
maintainers = lib.teams.cuda.members;
platforms = lib.optionals (lib.hasAttr arch attrs) [ "x86_64-linux" ];
# Force the use of the default, fat output by default (even though `dev` exists, which
# causes Nix to prefer that output over the others if outputSpecified isn't set).
outputsToInstall = [ "out" ];
};
}
Loading
Loading