cudaPackages: point nvcc at a compatible -ccbin #218265
Conversation
Review thread on pkgs/development/compilers/cudatoolkit/redist/build-cuda-redist-package.nix (outdated; resolved)
UPD: result of nixpkgs-review:
- 3 packages marked as broken and skipped
- 5 packages failed to build
- 69 packages built
Force-pushed from 04142e2 to f63d5d3.
Force-pushed from f63d5d3 to ae72b9d.
```nix
in
# A silly unit-test
assert (formatCapabilities { cudaCapabilities = [ "7.5" "8.6" ]; }) == {
```
Would be more interesting to test `[ "8.6" "7.5" ]`. Should this preserve the order? Should this print a warning?
It's my opinion that capabilities should be sorted, so I would want the order of the output to be invariant with respect to the order of the input (which should already be sorted). That said, I'd love to hear other views! Something like the sketch below, perhaps.
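A purely illustrative sketch, assuming `lib` is in scope and that `formatCapabilities` can normalize its input up front (`naturalSort` would also sidestep lexicographic surprises):

```nix
# Sketch only: normalize the requested capabilities before formatting,
# so the output is invariant under input order.
formatCapabilities = { cudaCapabilities, ... }@args:
  let
    sorted = lib.naturalSort cudaCapabilities; # [ "8.6" "7.5" ] -> [ "7.5" "8.6" ]
  in
  # ...then build the usual attrset from `sorted` instead of the raw list:
  args // { cudaCapabilities = sorted; };
```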
The way we handle this parameter now, the order is significant: it's our semi-implicit convention that the last element goes into PTX (see the illustration below). Maybe the takeaway is rather that we don't want this to be implicit :)
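To illustrate the convention using nvcc's standard gencode shapes (my own example, not flags copied from this PR): for `[ "7.5" "8.6" ]` we emit real SASS for both capabilities, and PTX only for the last one:

```nix
[
  "-gencode=arch=compute_75,code=sm_75"
  "-gencode=arch=compute_86,code=sm_86"
  "-gencode=arch=compute_86,code=compute_86" # PTX, derived from the last capability
]
```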
Hm, I think you're right there -- the last capability in the list shouldn't be the one which gets turned into a virtual architecture.
That said, I do like the idea of having them ordered so packages can decide what to build for. For example, Magma doesn't support 8.6/8.9, so I can imagine Magma at some point iterating over the list of cuda capabilities to find the greatest lower bound (in Magma's case, 8.0) and building for that architecture. A sketch of that idea follows.
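A hypothetical sketch of that Magma scenario; every name here is illustrative rather than something this PR introduces:

```nix
{ lib, cudaCapabilities ? [ "7.5" "8.6" ] }:
let
  magmaSupported = [ "6.0" "7.0" "7.5" "8.0" ]; # hypothetical: Magma has no 8.6/8.9
  maxRequested = lib.last cudaCapabilities;     # relies on the sorted convention
  # every supported capability not exceeding the requested maximum
  candidates = builtins.filter (c: lib.versionAtLeast maxRequested c) magmaSupported;
in
lib.last candidates # "8.0" for these inputs: the greatest lower bound
```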
Left as a TODO
Force-pushed from ae72b9d to f9cb4d3.
```diff
@@ -191,7 +209,7 @@ stdenv.mkDerivation rec {
   preFixup =
     let rpath = lib.concatStringsSep ":" [
       (lib.makeLibraryPath (runtimeDependencies ++ [ "$lib" "$out" "$out/nvvm" ]))
-      "${stdenv.cc.cc.lib}/lib64"
+      "${gcc.cc.lib}/lib64"
```
The gcc/stdenv distinction here is subtle enough that I believe it deserves a comment.
How about this one?
I guess my confusion was rather why we reference `gcc` directly instead of accessing it through `stdenv`.
Oh, right. So, I should explain that `gcc` here is what we override in extension.nix based on versions.toml. Actually, maybe I should override `stdenv` too? Like so.
ooh yeah not a bad idea...
Could we get away with only overriding `stdenv` and then pulling the gcc version from that `stdenv`?
@samuela Good. I'm thinking about exposing that `stdenv` in `cudaPackages` (rather than `cudatoolkit`) then. I, however, feel uneasy about exposing it as `cudaPackages.stdenv` because it might affect people's expectations... E.g. since `clangStdenv` contains `clang`, people might think `cudaPackages.stdenv` contains `nvcc`.
Alt names I'm thinking of: `cudaStdenv` (might be misinterpreted the same way), `backendStdenv` (exactly what it is, but hard to pronounce 🤣). Do you like any?
Good point! I agree that `cudaStdenv` could be slightly misleading. It's a little tricky to name... Maybe `matchingStdenv`? `compatibleStdenv`? Idk, I'm happy with whatever you feel is most appropriate.
I kept the ugly name, because "backend for nvcc" seemed like the clearest description...
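Roughly the shape of the idea, as a sketch (the actual expression in this PR may differ):

```nix
# backendStdenv, sketched: a stdenv whose C compiler is the gcc pinned
# per CUDA release (the `gcc` we override in extension.nix).
{ pkgs, gcc }:
pkgs.overrideCC pkgs.stdenv gcc
```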
This is needed for faster builds when debugging the opencv derivation, and it's more consistent with other cuda-enabled packages. -DCUDA_GENERATION seems to expect architecture names, so we refactor cudaFlags to facilitate easier extraction of the configured archnames.
Make tensorflow (and a bunch of other things) use a CUDA-compatible toolchain. Introduces cudaPackages.backendStdenv.
Co-authored-by: Connor Baker <ConnorBaker01@Gmail.com>
Force-pushed from 271c5a4 to 22f7656.
Wooo, thanks for seeing this through @SomeoneSerge! Diff LGTM. @ConnorBaker, are you ok with these changes? I saw that you two still have some convos open.
Force-pushed from 947b833 to ac64f07.
@samuela looks good to me! @SomeoneSerge thank you for all the work you put into this :)
Looks good! Just questions for future reference/work
```nix
minArch' = builtins.head (builtins.sort builtins.lessThan cudaArchitectures);
in
# If this fails some day, something must've changed and we should re-validate our assumptions
assert builtins.stringLength minArch' == 2;
```
Nit for later: this is lexicographic sorting, right? Won't we run into issues starting with Blackwell (post-Hopper), because we'll have capabilities starting with a one? E.g., "100" < "50".
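For instance (my own repl illustration; `lib.toInt` is one possible numeric fix, assuming architectures stay integral strings):

```nix
# builtins.lessThan compares strings lexicographically:
builtins.sort builtins.lessThan [ "86" "75" "100" ]
# => [ "100" "75" "86" ] -- head would be "100", tripping the stringLength assert

# comparing numerically instead:
builtins.sort (a: b: lib.toInt a < lib.toInt b) [ "86" "75" "100" ]
# => [ "75" "86" "100" ]
```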
Ouch
Again, just for future stuff! We'd have until at least 2024 before this becomes a problem, and that's assuming they keep the same naming scheme.
My preference is to see this PR merged sooner rather than later so I can work on rebasing my PRs ;)
```diff
-nativeBuildInputs = [ which addOpenGLRunpath ];
+nativeBuildInputs = [
+  which
+  addOpenGLRunpath
```
Would the `autoAddOpenGLRunpathHook` also work here, or do we need to manually invoke it in `postFixup`?
`autoAddOpenGLRunpathHook` should work, but we'd better test.
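An untested sketch of what the swap might look like (assuming the hook's setup script handles the patching that `postFixup` currently does by hand):

```nix
nativeBuildInputs = [ which autoAddOpenGLRunpathHook ]; # instead of a manual addOpenGLRunpath call
```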
Another note for later (but not to delay the merge):
In the interest of keeping things moving I went ahead and merged. AFAIU there are still 4 things left as TODOs for future PRs, however:
Thanks @SomeoneSerge!
```sh
cat <<EOF >> $out/nix-support/setup-hook
cmakeFlags+=' -DCUDA_TOOLKIT_ROOT_DIR=$out'
cmakeFlags+=' -DCUDA_HOST_COMPILER=${backendStdenv.cc}/bin'
cmakeFlags+=' -DCMAKE_CUDA_HOST_COMPILER=${backendStdenv.cc}/bin'
```
NOTE: `nvidia/thrust` treats this as a path to the executable, not the parent directory. TODO: check whether `nvidia/thrust` actually does this right.
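If a consumer really does want the executable itself, the hook line would presumably need something like the following (hypothetical and unverified; the exact wrapper binary name may differ):

```sh
cmakeFlags+=' -DCMAKE_CUDA_HOST_COMPILER=${backendStdenv.cc}/bin/g++'
```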
EDIT of 2023-04-01: linking the old libstdc++ was a mistake; we should link the newest possible libstdc++, even if we use an older compiler. libstdc++ is backwards-compatible in the sense that if a process loads a newer libstdc++ first and then loads a library built against an older libstdc++, everything should work just fine.
This PR is a rather dirty hot-fix needed to resume caching cuda-enabled packages. I hope we can just merge this and remove the scum later.
This PR specifically does not address the issue of downstream packages (like torch) consuming several major versions of gcc and libstdc++ (one from `stdenv`, and one from `cudatoolkit.cc`).

Desiderata
- NVCC uses a compatible backend by default. E.g. cuda11 uses `gcc11` and links to `gcc11.cc.lib`, even if `stdenv` is at `gcc12`. This means successful buildPhase-s.
- No runtime linkage errors. E.g. no "libstdc++.so.6: version `GLIBCXX_3.4.30' not found" after a successful build. This means successful checkPhase-s.
- In the default configuration, a downstream package's runtime closure only includes one toolchain. E.g. we don't link cached packages against multiple versions of `libstdc++.so` at once, and maybe there's a warning if we accidentally try to. This means smaller closures and fewer conflicts.

Description of changes
This is a hot-fix to un-break cuda-enabled packages (like tensorflow, jaxlib, faiss, opencv, ...) after the gcc11->gcc12 bump. We should probably build whole downstream packages with a compatible stdenv (such as gcc11Stdenv for cudaPackages_11), but just pointing nvcc at the right compiler seems to do the trick.
We already used this hack for non-redist cudatoolkit. Now we use it more consistently.
This commit also re-links cuda packages against libstdc++ from the same "compatible" gcc, rather than the current stdenv. We didn't test whether this is necessary -> needs revisiting in further PRs.
NOTE: long-term we should make it possible to override `-ccbin` and use e.g. clang.
NOTE: the `NVCC_PREPEND_FLAGS` line pollutes build logs with warnings when e.g. cmake appends another `-ccbin`.
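For reference, that line is roughly of the following shape; this is a sketch reconstructed from the description above, not a verbatim copy of the PR:

```sh
# prepended to every nvcc invocation; when cmake later appends its own
# -ccbin, nvcc emits the duplicate-flag warnings mentioned above
export NVCC_PREPEND_FLAGS+=" -ccbin=${backendStdenv.cc}/bin"
```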
Things done
- `python3Packages.jaxlib`, `opencv4`, `python3Packages.opencv4`
- `faiss`, `python3Packages.torch`
- `python3Packages.torchvision`: still fails
- `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- Tested basic functionality of all binary files (usually in `./result/bin/`)

Related
Notify maintainers
@NixOS/cuda-maintainers @ConnorBaker @mcwitt