Use cuda_compat drivers when available #267247
@@ -0,0 +1,27 @@

```shell
# shellcheck shell=bash
# Patch all dynamically linked ELF files with the CUDA driver (libcuda.so)
# coming from the cuda_compat package by adding it to the RUNPATH.
echo "Sourcing auto-add-cuda-compat-runpath-hook"

elfHasDynamicSection() {
    patchelf --print-rpath "$1" >& /dev/null
}

autoAddCudaCompatRunpathPhase() (
    local outputPaths
    mapfile -t outputPaths < <(for o in $(getAllOutputNames); do echo "${!o}"; done)
    find "${outputPaths[@]}" -type f -executable -print0 | while IFS= read -rd "" f; do
        if isELF "$f"; then
            # patchelf returns an error on statically linked ELF files
            if elfHasDynamicSection "$f" ; then
                echo "autoAddCudaCompatRunpathHook: patching $f"
                local origRpath; origRpath="$(patchelf --print-rpath "$f")"
                patchelf --set-rpath "@libcudaPath@${origRpath:+:$origRpath}" "$f"
            elif (( "${NIX_DEBUG:-0}" >= 1 )) ; then
                echo "autoAddCudaCompatRunpathHook: skipping a statically-linked ELF file $f"
            fi
        fi
    done
)

postFixupHooks+=(autoAddCudaCompatRunpathPhase)
```
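The core of the phase is a string operation on the RUNPATH. The following is a minimal sketch, assuming a hypothetical helper name `prependRunpathEntry` (not part of the hook or of nixpkgs) and a placeholder store path standing in for `@libcudaPath@`. It prepends one directory and keeps the `:` separator only when the file already had a RUNPATH, because the glibc loader can interpret an empty RUNPATH entry as the current directory.

```shell
# shellcheck shell=bash
# Hypothetical helper (not from nixpkgs): prepend a directory to a
# colon-separated RUNPATH string without leaving an empty entry behind.
prependRunpathEntry() {
    local newDir="$1" origRpath="$2"
    # ${origRpath:+:$origRpath} expands to ":$origRpath" only when
    # origRpath is non-empty, so no trailing ':' is produced.
    printf '%s\n' "${newDir}${origRpath:+:$origRpath}"
}

# Placeholder path standing in for @libcudaPath@.
prependRunpathEntry /nix/store/example-cuda_compat/compat "/run/opengl-driver/lib"
prependRunpathEntry /nix/store/example-cuda_compat/compat ""
```

The first call prints the compat directory followed by `:/run/opengl-driver/lib`; the second prints only the compat directory, with no trailing colon.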
@@ -44,4 +44,24 @@ final: _: {

```nix
    ./auto-add-opengl-runpath-hook.sh
  )
  {};

  # The autoAddCudaCompatRunpathHook must be added AFTER `setupCudaHook`. Both
  # hooks prepend a path containing `libcuda.so` to the `DT_RUNPATH` section of
  # patched ELF files, but the `cuda_compat` path must take precedence (otherwise
  # it has no effect) and thus appear first. This means this hook must be
  # executed last.
  autoAddCudaCompatRunpathHook =
    final.callPackage
      (
        {makeSetupHook, cuda_compat}:
        makeSetupHook
          {
            name = "auto-add-cuda-compat-runpath-hook";
            substitutions = {
              libcudaPath = "${cuda_compat}/compat";
            };
          }
          ./auto-add-cuda-compat-runpath.sh
      )
      {};
}
```

Comment on lines +47 to +66

Just a thought: maybe we don't want the directly-linked … Here's how it goes (well, it's all assumptions): we always want to use the newest possible …

All in all, I think I agree with you. The driver is provided impurely in … That being said, I think jetpack-nixos is currently lagging behind (it is based on NixOS 22.11), so this PR would still be a temporary (TM) way to get cuda_compat working there without waiting for a newer jetpack? But I would be tempted to get rid of this once #256324 and associated PRs all make it into a NixOS release, and jetpack-nixos supports that release.

I wonder if we still misunderstood each other. To be clear, I hope that this PR's changes will stay and that we won't have to remove them, and that cuda_compat would essentially work the same way as ROCm drivers do (which, afaiu, we link directly and for which we don't need tools like nvidia-docker2). I only wonder about special cases where the user might just hypothetically load a different libcuda …

I interpreted this, at least on jetpack-nixos, as "let's put cuda_compat into /run/opengl-driver because it's more flexible: jetpack-nixos is now responsible for selecting cuda_compat or the normal drivers, and it's easy to change dynamically by toggling what we put in /run/opengl-driver without resorting to …
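The ordering argument in the Nix comment (both hooks prepend, so whichever runs last ends up first) can be checked with a toy simulation. This is a minimal sketch, assuming two stand-in functions, `fakeSetupCudaHook` and `fakeCompatHook` (hypothetical names; the real hooks call `patchelf` on each ELF file), that run in registration order like entries appended to a hooks array. The paths are placeholders; `/run/opengl-driver/lib` is only assumed to be what the other hook adds.

```shell
# shellcheck shell=bash
# Toy model of two fixup hooks that each prepend a directory to DT_RUNPATH.
# Hook names and paths are placeholders, not the real nixpkgs hooks.
runpath="/nix/store/example-app/lib"

prependToRunpath() {
    runpath="$1${runpath:+:$runpath}"
}

fakeSetupCudaHook() { prependToRunpath /run/opengl-driver/lib; }
fakeCompatHook()    { prependToRunpath /nix/store/example-cuda_compat/compat; }

# Hooks run in the order they were appended to the array.
hooks=()
hooks+=(fakeSetupCudaHook)
hooks+=(fakeCompatHook)   # registered last, so its directory ends up first
for h in "${hooks[@]}"; do "$h"; done

echo "$runpath"
```

The dynamic loader searches RUNPATH entries from left to right, so with this order the cuda_compat directory is consulted before the driver directory added by the other hook. Swapping the two `hooks+=` lines would put the compat path second and, as the comment says, make it ineffective whenever the other directory also provides `libcuda.so`.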
I think propagating a different version of `addOpenGLRunpath` might cause conflicts in derivations that explicitly consume their own version.

Initially, I thought that all we'd need is `patchelf --add-needed` with the absolute path to `${cuda_compat}/lib/libcuda.so` in https://github.com/NixOS/nixpkgs/blob/bb142a6838c823a2cd8235e1d0dd3641e7dcfd63/pkgs/development/compilers/cudatoolkit/redist/build-cuda-redist-package.nix. This would ensure that anybody trying to load, for example, `libcudart.so` ends up loading the cuda-compat driver first.

Now I think that might be insufficient (how do we know people don't try to dlopen libcuda directly?), and you're right about overriding `addOpenGLRunpath`. Except we might want to do that on an even more global scale, i.e. we might want to change `pkgs.addOpenGLRunpath` whenever our nixpkgs instantiation happens to target an NVIDIA Jetson (e.g. `config.cudaSupport && hostPlatform.system == "aarch64-linux"`, but maybe also a special flag).

All in all, I think we need to ponder the available options a bit more. Thank you for starting this!
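The `patchelf --add-needed` alternative described in this comment can be sketched as follows. This assumes a hypothetical function `addCompatNeeded` and a `PATCHELF` variable (both illustrative, not from nixpkgs) so that the command can be dry-run with `echo`; the store paths are placeholders. `--add-needed` adds a `DT_NEEDED` entry, and because the entry is an absolute path, the loader opens that exact file instead of searching the RUNPATH. As the comment points out, this only covers libraries loaded through `DT_NEEDED`, not a program that `dlopen`s libcuda itself.

```shell
# shellcheck shell=bash
# Hypothetical sketch: add the cuda_compat driver as an absolute DT_NEEDED
# entry of each given library. PATCHELF defaults to the real patchelf but
# can be set to "echo" to print the commands instead of running them.
addCompatNeeded() {
    local libcuda="$1"; shift
    local lib
    for lib in "$@"; do
        "${PATCHELF:-patchelf}" --add-needed "$libcuda" "$lib"
    done
}

# Dry run with placeholder store paths.
PATCHELF=echo addCompatNeeded \
    /nix/store/example-cuda_compat/lib/libcuda.so \
    /nix/store/example-cudart/lib/libcudart.so
```

The dry run prints one `--add-needed <libcuda> <library>` line per library. Dropping the `PATCHELF=echo` prefix would modify the files for real.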
Wouldn't a simple test that `cudaPackages.cuda_compat` exists, as we do here, be sufficient? Basically, if a cuda_compat redist package is available for CUDA, we use it. Maybe we can add a check that we're not cross-compiling.

I personally think changing `pkgs.addOpenGLRunpath` would make sense, as what we do is mostly to "extend" the usual path `/run/opengl-driver` so that cuda_compat's `libcuda.so` is found. But we noticed that some packages use, e.g., the attribute `addOpenGLRunpath.driverLink`, so we must be careful not to change the interface of the derivation.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no, cf. #266475. I'd say we probably don't want the definition of `addOpenGLRunpath` to depend on the internals of the `cudaPackages` set, but it's OK if it depends on a top-level attribute like `config`. I haven't thought about this much, though.

Yes