
cudaPackages: point nvcc at a compatible -ccbin #218265

Merged — 13 commits merged into NixOS:master on Mar 6, 2023

Conversation

@SomeoneSerge (Contributor) commented Feb 25, 2023

EDIT of 2023-04-01: linking the old libstdc++ was a mistake; we should link the newest possible libstdc++, even if we use an older compiler. libstdc++ is backwards-compatible in the sense that if a process loads a newer libstdc++ first, and then loads a library built against an older libstdc++, everything should work just fine.
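
For illustration, a hedged sketch of what "link the newest possible libstdc++" could look like in a downstream derivation; the patchelf invocation and paths are assumptions for illustration, not code from this PR:

preFixup = ''
  # Prefer the current stdenv's (newer) libstdc++ over the one shipped with
  # the older gcc that nvcc uses as -ccbin; since a newer libstdc++ loaded
  # first also satisfies libraries built against older versions, this is safe.
  patchelf --add-rpath "${lib.getLib stdenv.cc.cc}/lib" "$out/lib/"*.so
'';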


This PR is a rather dirty hot-fix needed to resume caching cuda-enabled packages. I hope we can just merge this and remove the scum later.

This PR specifically does not address the issue of downstream packages (like torch) consuming several major versions of gcc and libstdc++ (one from stdenv, and one from cudatoolkit.cc).

Desiderata
  • NVCC uses a compatible backend by default

    E.g. cuda11 uses gcc11 and links to gcc11.cc.lib, even if stdenv is at gcc12. This means successful buildPhase-s

  • No runtime linkage errors

    E.g. no libstdc++.so.6: version `GLIBCXX_3.4.30' not found after a successful build. This means successful checkPhase-s

  • In default configuration, a downstream package's runtime closure only includes one toolchain

    E.g. we don't link cached packages against multiple versions of libstdc++.so at once, and maybe there's a warning if we accidentally try to. This means smaller closures and fewer conflicts

Description of changes

This is a hot-fix to un-break cuda-enabled packages (like tensorflow, jaxlib, faiss, opencv, ...) after the gcc11->gcc12 bump. We should probably build downstream packages entirely with a compatible stdenv (such as gcc11Stdenv for cudaPackages_11), but just pointing nvcc at the right compiler seems to do the trick.

We already used this hack for non-redist cudatoolkit. Now we use it more consistently.

This commit also re-links cuda packages against libstdc++ from the same "compatible" gcc, rather than the current stdenv. We didn't test whether this is necessary; to be revisited in further PRs.

NOTE: long-term, we should make it possible to override -ccbin and use e.g. clang
NOTE: the NVCC_PREPEND_FLAGS line pollutes build logs with warnings when e.g. cmake appends another -ccbin
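
For context, a minimal sketch of how such a setup hook can pin nvcc's host compiler via NVCC_PREPEND_FLAGS; this is an assumption-laden illustration (only the backendStdenv name is taken from this PR), not the exact patch:

postInstall = ''
  mkdir -p "$out/nix-support"
  # nvcc reads NVCC_PREPEND_FLAGS from the environment and prepends it to
  # every invocation, so this -ccbin applies even when a build system such
  # as cmake appends its own (hence the warnings mentioned above).
  cat <<EOF >> "$out/nix-support/setup-hook"
  export NVCC_PREPEND_FLAGS+=' -ccbin=${backendStdenv.cc}/bin'
  EOF
'';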

Things done
  • Built on platform(s)
    • x86_64-linux:
      • python3Packages.jaxlib
      • opencv4
      • python3Packages.opencv4
      • faiss
      • python3Packages.torch
      • python3Packages.torchvision: still fails
  • Tested, as applicable: ...
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.
Related
Notify maintainers

@NixOS/cuda-maintainers @ConnorBaker @mcwitt

@SomeoneSerge added the 6.topic: cuda (Parallel computing platform and API) label on Feb 25, 2023
@SomeoneSerge (Contributor Author) commented:

UPD: torchvision seems to still override -ccbin, but I expect #218035 should fix it

@SomeoneSerge (Contributor Author) commented:

Result of nixpkgs-review pr 218265 run on x86_64-linux

3 packages marked as broken and skipped:
  • cudaPackages.nvidia_driver
  • python310Packages.caffeWithCuda
  • truecrack-cuda
5 packages failed to build:
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_4_0)
  • mathematica-cuda
  • python310Packages.tensorflowWithCuda
  • python310Packages.tensorrt
  • python311Packages.tensorrt
69 packages built:
  • caffeWithCuda
  • colmapWithCuda
  • cudaPackages.cuda_cccl
  • cudaPackages.cuda_cudart
  • cudaPackages.cuda_cuobjdump
  • cudaPackages.cuda_cupti
  • cudaPackages.cuda_cuxxfilt
  • cudaPackages.cuda_demo_suite
  • cudaPackages.cuda_documentation
  • cudaPackages.cuda_gdb
  • cudaPackages.cuda_memcheck
  • cudaPackages.cuda_nsight
  • cudaPackages.cuda_nvcc
  • cudaPackages.cuda_nvdisasm
  • cudaPackages.cuda_nvml_dev
  • cudaPackages.cuda_nvprof
  • cudaPackages.cuda_nvprune
  • cudaPackages.cuda_nvrtc
  • cudaPackages.cuda_nvtx
  • cudaPackages.cuda_nvvp
  • cudaPackages.cuda_sanitizer_api
  • cudatoolkit (cudaPackages.cudatoolkit, cudatoolkit_11)
  • cudaPackages.cudnn (cudaPackages.cudnn_8_7_0)
  • cudaPackages.cudnn_8_4_1
  • cudaPackages.cudnn_8_5_0
  • cudaPackages.cudnn_8_6_0
  • cudaPackages.cutensor
  • cudaPackages.fabricmanager
  • cudaPackages.libcublas
  • cudaPackages.libcufft
  • cudaPackages.libcufile
  • cudaPackages.libcurand
  • cudaPackages.libcusolver
  • cudaPackages.libcusparse
  • cudaPackages.libnpp
  • cudaPackages.libnvidia_nscq
  • cudaPackages.libnvjpeg
  • cudaPackages.nccl
  • cudaPackages.nsight_compute
  • cudaPackages.nsight_systems
  • cudaPackages.nvidia_fs
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • katagoWithCuda
  • librealsenseWithCuda
  • magma (magma-cuda)
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.cupy
  • python310Packages.jaxlibWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.torchWithCuda
  • python311Packages.TheanoWithCuda
  • python311Packages.cupy
  • python311Packages.jaxlibWithCuda
  • python311Packages.pycuda
  • python311Packages.pynvml
  • python311Packages.pyrealsense2WithCuda
  • xgboostWithCuda
  • xpraWithNvenc

@SomeoneSerge (Contributor Author) commented:

Failed derivations

@SomeoneSerge (Contributor Author) commented:

Rebuilding python3Packages.tensorflow, will see if this helped

@ofborg (bot) added the 2.status: merge conflict (This PR has merge conflicts with the target branch) label on Feb 27, 2023
@SomeoneSerge force-pushed the hotfix-nvcc-gcc-incompatibility branch 2 times, most recently from 04142e2 to f63d5d3 on February 28, 2023 00:52

in
# A silly unit-test
assert (formatCapabilities { cudaCapabilities = [ "7.5" "8.6" ]; }) == {
@SomeoneSerge (Contributor Author):

Would be more interesting to test [ "8.6" "7.5" ]. Should this preserve the order? Should this print a warning?

Contributor:

It's my opinion that capabilities should be sorted, so I would want the order of the output to be invariant with respect to the order of the input (which should already be sorted). Although, I'd love to hear other views!

@SomeoneSerge (Contributor Author):

The way we handle this parameter now, the order is significant. It's our semi-implicit convention that the last element goes into PTX. Maybe the takeaway is rather that we don't want this to be implicit :)

Contributor:

Hm, I think you're right there -- the last capability in the list shouldn't be the one which gets turned into a virtual architecture.

Although, I do like the idea of having them ordered so packages can decide what to build for. For example, Magma doesn't support 8.6/8.9, so I can imagine at some point in the future Magma iterating over the list of cuda capabilities to find the greatest lower bound (in Magma's case, 8.0) and building for that architecture.

@SomeoneSerge (Contributor Author):

Left as a TODO
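
For later reference, a sketch of the semi-implicit convention described above, assuming nixpkgs' lib is in scope (the helper names are illustrative, not formatCapabilities' real internals): every capability yields a real sm_* gencode, and only the last list element is additionally carried forward as virtual compute_* PTX, which is why order matters.

let
  caps = [ "7.5" "8.6" ];
  drop = builtins.replaceStrings [ "." ] [ "" ];  # "8.6" -> "86"
  real = map (c: "-gencode=arch=compute_${drop c},code=sm_${drop c}") caps;
  last = drop (lib.last caps);
  ptx = [ "-gencode=arch=compute_${last},code=compute_${last}" ];
in
real ++ ptx  # only the trailing "8.6" gains a PTX (compute_86) entry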

@@ -191,7 +209,7 @@ stdenv.mkDerivation rec {
   preFixup =
     let rpath = lib.concatStringsSep ":" [
       (lib.makeLibraryPath (runtimeDependencies ++ [ "$lib" "$out" "$out/nvvm" ]))
-      "${stdenv.cc.cc.lib}/lib64"
+      "${gcc.cc.lib}/lib64"
Member:

the gcc/stdenv distinction here is subtle enough i believe it deserves a comment

@SomeoneSerge (Contributor Author):

How about this one?

Member:

I guess my confusion was rather why reference gcc directly instead of accessing it through stdenv

@SomeoneSerge (Contributor Author):

Oh, right. So, I should explain that gcc here is what we override in extension.nix based on versions.toml.
Actually, maybe I should override stdenv too? Like so:

https://github.com/SomeoneSerge/nixpkgs/blob/cc4f01552c2dca50b452170df2770edb71148555/pkgs/development/compilers/cudatoolkit/extension.nix#L11-L15
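
Roughly, a sketch of that idea; the names are assumed for illustration and simplified relative to the linked file:

# A stdenv whose C compiler is the nvcc-compatible gcc; gcc11 is assumed
# here to be the right major version for cudaPackages_11.
backendStdenv = pkgs.overrideCC pkgs.stdenv pkgs.gcc11;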

Member:

ooh yeah not a bad idea...

Member:

could we get away with only overriding stdenv and then pulling the gcc version from that stdenv?

@SomeoneSerge (Contributor Author):

@samuela Good. I'm thinking about exposing that stdenv in cudaPackages (rather than cudatoolkit) then. I, however, feel uneasy about exposing it as cudaPackages.stdenv because it might affect people's expectations... E.g. since clangStdenv contains clang, people might think cudaPackages.stdenv contains nvcc

@SomeoneSerge (Contributor Author):

Alt names I'm thinking of: cudaStdenv (might be misinterpreted the same way), backendStdenv (exactly what it is, but hard to pronounce 🤣 ). Do you like any?

Member:

Good point! I agree that cudaStdenv could be slightly misleading. It's a little tricky to name... Maybe matchingStdenv? compatibleStdenv? Idk I'm happy with whatever you feel is most appropriate

@SomeoneSerge (Contributor Author):

I kept the ugly name, because "backend for nvcc" seemed like the clearest description...

pkgs/development/compilers/cudatoolkit/extension.nix (review thread outdated, resolved)
pkgs/development/libraries/science/math/nccl/default.nix (review thread outdated, resolved)
@ofborg (bot) removed the 2.status: merge conflict label on Feb 28, 2023
@ofborg ofborg bot requested review from abbradar and tbenst February 28, 2023 01:58
SomeoneSerge and others added 6 commits March 4, 2023 00:59
This is needed for faster builds when debugging the opencv derivation,
and it's more consistent with other cuda-enabled packages

-DCUDA_GENERATION seems to expect architecture names, so we refactor
cudaFlags to facilitate easier extraction of the configured archnames
Make tensorflow (and a bunch of other things) use a CUDA-compatible
toolchain. Introduces cudaPackages.backendStdenv
Co-authored-by: Connor Baker <ConnorBaker01@Gmail.com>
@SomeoneSerge force-pushed the hotfix-nvcc-gcc-incompatibility branch from 271c5a4 to 22f7656 on March 3, 2023 23:04
@samuela (Member) commented Mar 4, 2023:

Wooo, thanks for seeing this through @SomeoneSerge ! diff LGTM. @ConnorBaker are you ok with these changes? i saw that you two still have some convos open

@SomeoneSerge force-pushed the hotfix-nvcc-gcc-incompatibility branch from 947b833 to ac64f07 on March 4, 2023 01:14
@ofborg ofborg bot requested a review from samuela March 4, 2023 01:28
@ConnorBaker (Contributor) commented:

@samuela looks good to me!

@SomeoneSerge thank you for all the work you put into this :)

@ConnorBaker (Contributor) left a review:

Looks good! Just questions for future reference/work

Comment on lines +83 to +86
minArch' = builtins.head (builtins.sort builtins.lessThan cudaArchitectures);
in
# If this fails some day, something must've changed and we should re-validate our assumptions
assert builtins.stringLength minArch' == 2;
@ConnorBaker (Contributor):

Nit for later: This is lexicographic sorting right? Won't we run into issues starting with Blackwell (post-Hopper) because we'll have capabilities starting with a one? E.g., "100" < "50".

@SomeoneSerge (Contributor Author):

Ouch

@ConnorBaker (Contributor):

Again, just for future stuff! We'd have until at least 2024 before this becomes a problem, and that's assuming they keep the same naming scheme.

My preference is to see this PR merged sooner rather than later so I can work on rebasing my PRs ;)
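
A possible future-proof variant, sketched under the assumption that nixpkgs' lib is in scope:

# Compare architectures numerically so that a three-digit "100" no longer
# sorts before "50" the way it does lexicographically.
minArch' = builtins.head
  (builtins.sort (a: b: lib.strings.toInt a < lib.strings.toInt b) cudaArchitectures);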

-  nativeBuildInputs = [ which addOpenGLRunpath ];
+  nativeBuildInputs = [
+    which
+    addOpenGLRunpath
@ConnorBaker (Contributor):

Would the autoAddOpenGLRunpathHook also work here, or do we need to manually invoke it in postFixup?

@SomeoneSerge (Contributor Author):

autoAddOpenGLRunpathHook should work, but we'd better test
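
If the hook pans out, the sketch is as simple as the following (untested, as noted above):

nativeBuildInputs = [
  which
  autoAddOpenGLRunpathHook  # would replace the manual addOpenGLRunpath calls
];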

@SomeoneSerge (Contributor Author) commented:

Another note for later (but not to delay the merge):

@samuela samuela merged commit e1fbe85 into NixOS:master Mar 6, 2023
@samuela (Member) commented Mar 6, 2023:

In the interest of keeping things moving I went ahead and merged. AFAIU there are still 4 things left as TODOs for future PRs, however:

  • libstdc++.so.6 issues (link)
  • try autoAddOpenGLRunpathHook in nccl (link)
  • lexicographic sorting of arch's is not future proof (link)
  • more arch list sorting (link)

Thanks @SomeoneSerge!

@ConnorBaker (Contributor) commented on:

cat <<EOF >> $out/nix-support/setup-hook
cmakeFlags+=' -DCUDA_TOOLKIT_ROOT_DIR=$out'
cmakeFlags+=' -DCUDA_HOST_COMPILER=${backendStdenv.cc}/bin'
cmakeFlags+=' -DCMAKE_CUDA_HOST_COMPILER=${backendStdenv.cc}/bin'
@SomeoneSerge (Contributor Author) commented Apr 1, 2023:

NOTE: nvidia/thrust treats this as a path to the executable, not the parent directory
TODO: check if maybe nvidia/thrust actually does this right
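
A hedged sketch of the fix that note implies; the exact flag values are an assumption for illustration, not taken from a later commit:

# CMake expects CMAKE_CUDA_HOST_COMPILER to be the compiler executable, so
# append the cc-wrapper binary instead of stopping at the bin directory.
cmakeFlags+=' -DCUDA_HOST_COMPILER=${backendStdenv.cc}/bin/cc'
cmakeFlags+=' -DCMAKE_CUDA_HOST_COMPILER=${backendStdenv.cc}/bin/cc'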
