
Use cuda_compat drivers when available #267247

Merged
merged 1 commit into NixOS:master from feat/cuda-compat-jetson on Dec 12, 2023

Conversation

@yannham commented Nov 13, 2023

Description of changes

Some Nvidia devices, such as the Jetson family, support a package called the Nvidia compatibility package (cudaPackages.cuda_compat), which allows running executables built against a higher CUDA major version on a system with an older CUDA driver. On such platforms, the consensus among CUDA maintainers is that there is no downside in always enabling it by default.

This PR links the relevant cuda_compat shared libraries by patching the executables' runpaths when cuda_compat is available, in the same way as we already do for OpenGL drivers.

We initially tried to extend and reuse addOpenGLRunpath, because the script does exactly what we want, except that we would like to add several paths, not just one, to the runpath of the affected executables.

However, this derivation is used in many places (both the resulting derivation and attributes such as driverLink), making it hard to reuse or generalize without causing a mass rebuild and without breaking existing packages. Instead, we just override the driverLink attribute, which is a small, localized change that does what we want.
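
To illustrate the idea, here is a minimal sketch (not the exact diff of this PR; the attribute name driverLink mirrors the existing addOpenGLRunpath convention, and the $out/compat layout of cuda_compat is an assumption):

{ lib, cudaPackages }:

let
  # Directories to prepend to DT_RUNPATH: cuda_compat's libcuda.so first (when
  # the package exists for this platform), then the usual impure driver path.
  driverDirs =
    lib.optional (cudaPackages ? cuda_compat) "${cudaPackages.cuda_compat}/compat"
    ++ [ "/run/opengl-driver/lib" ];
in
{
  # This colon-separated string would stand in for the single
  # /run/opengl-driver/lib entry that the runpath-patching setup hook adds.
  driverLink = lib.concatStringsSep ":" driverDirs;
}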

TODO before merging

  • Test on the Nvidia Jetson

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Contributor:

I think propagating a different version of addOpenGLRunpath might cause conflicts in derivations that explicitly consume their own version.

Initially, I thought that all we'd need is patchelf --add-needed with the absolute path to ${cuda_compat}/lib/libcuda.so in https://github.com/NixOS/nixpkgs/blob/bb142a6838c823a2cd8235e1d0dd3641e7dcfd63/pkgs/development/compilers/cudatoolkit/redist/build-cuda-redist-package.nix. This would ensure that anybody trying to load, for example, libcudart.so ends up loading the cuda-compat driver first.

Now I think that might be insufficient (how do we know people don't try to dlopen libcuda directly?), and you're right about overriding addOpenGLRunpath. Except we might want to do that on an even more global scale, i.e. we might want to change pkgs.addOpenGLRunpath whenever our nixpkgs instantiation happens to target an Nvidia Jetson (e.g. config.cudaSupport && hostPlatform.system == "aarch64-linux", but also maybe a special flag).

All in all, I think we need to ponder the available options a bit more. Thank you for starting this!
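
For concreteness, a hedged sketch of the --add-needed idea mentioned above (a hypothetical postFixup, not something this PR does; it assumes patchelf is in nativeBuildInputs and that cuda_compat ships libcuda.so under $out/compat):

{ cuda_compat }:

{
  # Hypothetical fixup sketching the --add-needed idea: record the absolute
  # store path of cuda_compat's libcuda.so in DT_NEEDED, so the dynamic loader
  # resolves libcuda through cuda_compat before anything else.
  postFixup = ''
    for f in "$out"/lib/lib*.so*; do
      [ -e "$f" ] || continue
      if isELF "$f"; then
        patchelf --add-needed "${cuda_compat}/compat/libcuda.so" "$f"
      fi
    done
  '';
}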

Contributor Author:

Wouldn't a simple test that cudaPackages.cuda_compat exists, as we do here, be sufficient? Basically, if there's a cuda_compat redist package available, we enable it. Maybe we could also add a check that we're not cross-compiling.

I personally think changing pkgs.addOpenGLRunpath would make sense, since what we do is mostly "extend" the usual path /run/opengl-driver to find cuda_compat's libcuda.so. But we noticed that some packages use e.g. the attribute addOpenGLRunpath.driverLink, so we must be careful not to change the interface of the derivation.

Contributor:

Wouldn't a simple test that cudaPackages.cuda_compat exists, as we do here, be sufficient

Yes and no, cf. #266475. I'd say we probably don't want the definition of addOpenGLRunpath to depend on the internals of the cudaPackages set, but it's OK if it depends on a top-level attribute like config. I haven't thought about this much, though.

some packages are using e.g. the attribute addOpenGLRunpath.driverLink, so we must be careful to not change the interface of the derivation.

Yes

@SomeoneSerge commented Nov 13, 2023

Actually, no, we do not want addOpenGLRunpath to know about cuda_compat. I thought we might, because it's a dependency of every CUDA package, but there are other ways.

What we really want is buildInputs = optionals isJetson [ cuda_compat ] in every package that uses CUDA, and we want to ensure that cuda_compat is loaded before the impurely deployed driver (so maybe an explicit --add-needed). E.g. propagatedBuildInputs = optionals isJetson [ cuda_compat ] in the setupCudaHook would do something like that (a sketch follows the list below).

Potential blockers:

  • addOpenGLRunpath currently prepends instead of appending, so it'll have higher priority unless we --add-needed by the absolute path, or add cuda_compat's runpath in the postFixups, or change addOpenGLRunpath's priority
  • Many packages won't declare libcuda.so in DT_NEEDED, but will dlopen it, which is how we might end up loading the wrong driver unless priorities are taken care of
    • Most, however, will first load one of the toolkit libraries, so if we just get priorities in those right, we'll be mostly fine
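
A minimal sketch of the propagation suggested above, with assumed names (cudaFlags.isJetsonBuild and the hook script path are illustrative; makeSetupHook forwards propagatedBuildInputs as it does for other hooks):

{ lib, makeSetupHook, cuda_compat, cudaFlags }:

makeSetupHook {
  name = "setup-cuda-hook";
  # On Jetson-class targets, every package that pulls in the CUDA setup hook
  # would also get cuda_compat as an ordinary, propagated build input.
  propagatedBuildInputs = lib.optionals cudaFlags.isJetsonBuild [ cuda_compat ];
} ./setup-cuda-hook.sh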

@ConnorBaker:

@yannham I just rebased the branch on the latest staging to fix the merge conflicts, so you'll want to rebase as well.

yannham force-pushed the feat/cuda-compat-jetson branch 2 times, most recently from 7f232fe to ac5faa6 on November 14, 2023, 16:34
@SomeoneSerge:

E.g. propagatedBuildInputs = optionals isJetson [ cuda_compat ]

There are a bunch of places, currently in build-cuda-redist-package.nix, where we tell autoPatchelf to ignore the missing libcuda.so dependency; we need to make that conditional on cuda_compat availability.
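
A sketch of what "conditional on cuda_compat availability" could look like there (assuming autoPatchelfIgnoreMissingDeps accepts a list of sonames; the exact predicate and names are assumptions):

{ lib, cudaPackages }:

{
  # Only let autoPatchelf ignore a missing libcuda when cuda_compat cannot be
  # linked directly; where cuda_compat exists, a missing libcuda should
  # surface as a real error instead.
  autoPatchelfIgnoreMissingDeps =
    lib.optionals (!(cudaPackages ? cuda_compat)) [ "libcuda.so" "libcuda.so.1" ];
}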

@yannham commented Nov 14, 2023

@SomeoneSerge thanks a lot for your input! I'm not drawing any conclusions here, and you guys are the experts, but I wonder: how is the OpenGL driver different from cuda_compat? Because I believe your recommended alternative way of reaching all CUDA packages could apply to linking to the OpenGL driver as well. That is, instead of using autoAddOpenGLRunpathHook, we could also add it to the build inputs of each and every CUDA-enabled package.

I can only guess, but I suppose it wasn't done this way because:

  1. Absolutely every CUDA-enabled package needs access to the driver at runtime, so there seems to be no exception.
  2. Thus, it's error-prone to rely on maintainers' discipline not to forget to patch binary XXX.
  3. Maybe something related to dlopen being sneaky as well, making point 2 even more annoying, because we might need to patch the build inputs of packages that use CUDA only indirectly through dlopen?

Are those assumptions remotely correct? And if they are, is there anything differentiating cuda_compat from the OpenGL driver in this respect? Because I feel that whatever strategy is deemed best, we should probably apply the same one to both cuda_compat and the classic drivers.

As a side question: if you think patching build inputs is still the way to go, that sounds like a bigger change. If this PR works as it is with a very minimal change, would it be reasonable to take it as a first step, and then open an issue (or whatever tracking method) to implement the right approach as a follow-up?

@yannham commented Nov 15, 2023

In fact, what I'm saying might not make a lot of sense: I thought autoAddOpenGLRunpathHook was somehow automagically added, but it's not; it also needs to be added to the native build inputs of every CUDA package.

Also, there is a kind of bootstrapping issue, because cuda_compat is also patched using the same hook, which creates an infinite loop (as the hook uses cuda_compat itself). One possibility would be to filter the nativeBuildInputs of cuda_compat, but I believe we can't remove the whole autoAddOpenGLRunpathHook, because cuda_compat probably still needs access to the other shared libraries in /run/opengl-driver. I'll try that first, but if it doesn't work, as I expect, then your suggestion of having a separate build input for adding cuda_compat might indeed be easier to work with.

@SomeoneSerge:

how is the OpenGL driver different from cuda_compat?

Indeed, they are similar, which is why we normally deploy both impurely. The exact restrictions that OpenGL/libgbm impose are different.

The cuda_compat package only ships libcuda.so, and ships it as a normal library. For this reason, we want to link it as a normal library (by putting it in buildInputs and letting the build systems discover it, and letting Nix record its exact location in the DT_RUNPATHs). That's why we also want to disable all of the ugly hacks we use to delay its linkage on other systems, where we had to link it impurely.

There will be a few programs that use dlopen(); we'll have to find those manually and patchelf them.

I believe we can't remove the whole autoAddOpenGLRunpathHook

Yes, it does seem like we have to deploy libnvrm* impurely after all.

As suggested before, I would try reverting the addOpenGLRunpath commit, putting cuda_compat in setupCudaHook's propagatedBuildInputs instead, and seeing what works and what doesn't

@yannham commented Nov 17, 2023

So I tried the proposed approach, but it doesn't work (the binary ignores cuda_compat and reports the actual driver version, 11.4). I can't see any libcuda.so in the DT_NEEDED field of saxpy or any of its dependencies (that is, any library present somewhere in the RUNPATH), which makes me think libcuda.so is always dlopen-ed, and thus adding it to the propagated build inputs just doesn't have any effect.

I personally don't see an obvious way of enabling cuda_compat without doing something impure, as we do for OpenGL drivers...

yannham force-pushed the feat/cuda-compat-jetson branch 2 times, most recently from 30b2433 to 85adf60 on November 17, 2023, 16:47
@yannham commented Nov 23, 2023

The latest version attempts to do what @SomeoneSerge suggested, by duplicating the logic of autoAddOpenGLRunpathHook in a separate setup hook so that 1. it's not included automatically for every package using autoAddOpenGLRunpathHook, and 2. we avoid the circular/bootstrapping issue with cuda_compat, which can now be patched with autoAddOpenGLRunpathHook without being patched with the cuda_compat hook.

Unfortunately, I get the following error:

error: build of '/nix/store/g8niakiybnrfpi99c9r8q1gy6blbkjjz-saxpy-unstable-2023-07-11.drv' on 'ssh-ng://nix@ubuntu-hetzner' failed: builder for '/nix/store/g8niakiybnrfpi99c9r8q1gy6blbkjjz-saxpy-unstable-2023-07-11.drv' failed with exit code 1;
       last 10 log lines:
       > Missing variable is:
       > _CMAKE_CUDA_WHOLE_FLAG
       > CMake Error at /nix/store/22615bbcycbnik75zy9jnzadm93xqg1b-cmake-3.27.7/share/cmake-3.27/Modules/CMakeDetermineCompilerABI.cmake:57 (try_compile):
       >   Failed to generate test project build system.
       > Call Stack (most recent call first):
       >   /nix/store/22615bbcycbnik75zy9jnzadm93xqg1b-cmake-3.27.7/share/cmake-3.27/Modules/CMakeTestCUDACompiler.cmake:19 (CMAKE_DETERMINE_COMPILER_ABI)
       >   CMakeLists.txt:2 (project)
       >
       > 
       > -- Configuring incomplete, errors occurred!
       For full logs, run 'nix log /nix/store/g8niakiybnrfpi99c9r8q1gy6blbkjjz-saxpy-unstable-2023-07-11.drv'.

@yannham commented Nov 24, 2023

Additional update: the build works, and the binary runs without any LD_LIBRARY_PATH trickery when using nix develop .#cudaPackages_12_0.saxpy --impure and then calling genericBuild. It also works when building from nix-shell, but for some reason nix-build reports the error above.

@yannham commented Nov 24, 2023

I used breakpointHook to dump the build environment and compare it to the devshell environment. There are a number of differences in the cmakeFlags. I thus used the cmakeFlags from the build environment (which fails) in the develop environment (which succeeds), and the build still succeeds. So it's something else: either another part of the environment that differs (but besides sandbox-specific paths and settings, the rest seems mostly semantically identical), or another sandbox effect.

@SomeoneSerge:

The error looks like cmake trying to enable_language(cuda) and failing some sanity checks... I think I had similar issues when nvcc couldn't find cudart. Could you gist the full log?

PS: sorry, I'm slow to respond lately.

@SomeoneSerge commented Nov 24, 2023

One more thing: addOpenGLRunpath assigns the impure path a higher priority, so it's important that our postFixup happens last, or we might load the old driver. We also need to ensure that we're prepending the compat driver to the runpath, not appending it. Try inspecting patchelf --print-rpath on the resulting binaries.

@yannham commented Nov 24, 2023

Here is the full CMake log. I also tried with --trace-expand added to the CMake flags. It results in a lot of noise but not much more information (we just see which CMake line is executed, but there are no compiler errors or missing library files, for instance).

cmake.log

I also attached the environments (serialized with export -p, and with newlines added in cmakeFlags just to make the diff nicer to read) for the nix build environment that fails and the nix develop environment that succeeds.

nixbuild.env.log

nixdevelop.env.log

As mentioned above, I tried to use the exact exported cmakeFlags from the build environment in the develop environment, and the build still succeeds there, so it seems there's more to it than just the difference in cmakeFlags. Of course, one is sandboxed and the other is not, so there might be many other differences, but at this point I don't really know how to dig further.

Besides trying to strace everything and looking through that sea of verbose logs for a suspicious failure in the sandboxed build environment?

@SomeoneSerge commented Nov 24, 2023

Damn! I just realized that we probably can't have the compat libcuda.so in DT_NEEDED, because then we couldn't, e.g., load the reverse dependencies inside the Nix sandbox. This is because libcuda from compat has the libnvrm* libraries in its DT_NEEDED, but they're impure. E.g. we couldn't import torch if it loads libcublas, which loads libcuda, which then fails to find libnvrm*.

Unless that's all wrong, I guess we just want to modify the runpath...

@yannham commented Nov 24, 2023

Another thought: in jetpack-nixos, the driver's libcuda.so (taken from l4t-cuda) has the libnvrm* libraries linked directly with the right absolute paths. This is why, although those aren't currently put in /run/opengl-driver, the default 11.4 driver still works fine.

Could we do the same thing with the one from the compat library, and patch its libnvrm* dependencies directly with the right paths? It might be made more complicated by the fact that the l4t packages packaged by jetpack-nixos aren't in Nixpkgs, though. So maybe do that as a second step, and for now go with the original (impure) plan?

Another solution, fully on the jetpack-nixos side: we can easily control what goes into /run/opengl-driver. In fact, this is my current patch to bring the libnvrm* libraries there (in one of jetpack-nixos's modules):

     hardware.opengl.package = pkgs.nvidia-jetpack.l4t-3d-core;
     hardware.opengl.extraPackages = with pkgs.nvidia-jetpack; [
+      # l4t-core provides the libnvrm_gpu.so and libnvrm_mem.so, which are
+      # otherwise missing when using cuda_compat for the time being. Ideally,
+      # they should probably be directly set in a DT_NEEDED entry, as is the
+      # case currently with the `libcuda.so` provided by l4t-cuda
+      l4t-core
       l4t-cuda
       l4t-nvsci # cuda may use nvsci
       l4t-gbm l4t-wayland

I wonder if doing something as dumb as including cuda_compat instead of l4t-cuda there would do the trick.
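
Something like this, perhaps (an untested sketch of that variant; whether cuda_compat's compat/ directory layout is actually picked up by the /run/opengl-driver machinery is exactly the open question):

{ pkgs, ... }:

{
  hardware.opengl.package = pkgs.nvidia-jetpack.l4t-3d-core;
  hardware.opengl.extraPackages = with pkgs.nvidia-jetpack; [
    l4t-core                       # provides libnvrm_gpu.so and libnvrm_mem.so
    pkgs.cudaPackages.cuda_compat  # compat libcuda.so in place of l4t-cuda's
    l4t-nvsci                      # cuda may use nvsci
    l4t-gbm
    l4t-wayland
  ];
}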

@SomeoneSerge:

Could we do the same thing with the one from the compat library, and patch its libnvrm* dependencies directly with the right paths

We'd have to get hold of the jetpack .deb files in Nixpkgs, which we currently don't package. I also expect that the libnvrm* files will turn out to be kernel-locked, and we will have to link them impurely. Doesn't hurt to test though

I wonder if doing something as dumb as including cuda_compat instead of l4t-cuda there would do the trick.

Indeed. If we deploy jetpack-nixos, it probably only makes sense to deploy the compat CUDA driver, since it offers pretty much a superset of the features in l4t-cuda's libcuda.so.

However, I think we'd still want to (try to) link cuda_compat directly into the nixpkgs apps, because we'd want them to work on vanilla/Ubuntu Jetsons as well.

@yannham commented Dec 11, 2023

I rebased; now I just need to add a guard so that this hook isn't added when we're not on a Jetson.

Comment on lines +47 to +66

# autoAddCudaCompatRunpathHook hook must be added AFTER `setupCudaHook`. Both
# hooks prepend a path with `libcuda.so` to the `DT_RUNPATH` section of
# patched elf files, but `cuda_compat` path must take precedence (otherwise,
# it doesn't have any effect) and thus appear first. Meaning this hook must be
# executed last.
autoAddCudaCompatRunpathHook =
  final.callPackage
    (
      { makeSetupHook, cuda_compat }:
      makeSetupHook
        {
          name = "auto-add-cuda-compat-runpath-hook";
          substitutions = {
            libcudaPath = "${cuda_compat}/compat";
          };
        }
        ./auto-add-cuda-compat-runpath.sh
    )
    { };
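
For context, a guess at what ./auto-add-cuda-compat-runpath.sh could contain (the actual file may differ; shown as a Nix indented string only for presentation). It prepends the substituted @libcudaPath@ to the DT_RUNPATH of every dynamically linked ELF file in each output:

''
  # Prepend the cuda_compat library directory (substituted in for
  # @libcudaPath@ by makeSetupHook above) to one file's DT_RUNPATH.
  addCudaCompatRunpath() {
    local f="$1" origRpath
    origRpath="$(patchelf --print-rpath "$f")"
    patchelf --set-rpath "@libcudaPath@:$origRpath" "$f"
  }

  # Runs once per output; `prefix` is set by the stdenv fixup phase.
  autoAddCudaCompatRunpathFixup() {
    while IFS= read -r -d "" f; do
      if isELF "$f"; then
        addCudaCompatRunpath "$f"
      fi
    done < <(find "$prefix" -type f -print0)
  }

  fixupOutputHooks+=(autoAddCudaCompatRunpathFixup)
''
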
Contributor:

Just a thought: maybe we don't want the directly-linked cuda_compat to take precedence over the impure path. Rather, we want the impure path to also contain a cuda_compat. I.e. we want jetpack-nixos to deploy the compat libcuda, and we want the wrappers for Ubuntu to also prefer the compat library over the normal one.

Here's how it goes (well, it's all assumptions): we always want to use the newest possible cuda_compat. We're going to link one directly through DT_RUNPATH at build time, which we expect to be newer than the libcuda in the JetPack Ubuntu installation. However, a year or a few down the line, we might want to reuse a package built from an old nixpkgs revision. It's conceivable that we'd be running a much newer kernel module, one for which the cuda_compat in the runpath would be too old. Then we just deploy a newer cuda_compat on our jetpack-nixos and make the app use it. We could do that via LD_LIBRARY_PATH/LD_PRELOAD, or we could try to give the impure runpath higher priority. The latter might hypothetically break because it could rely on a newer libc/libstdc++, but in practice Nvidia seems to always link old versions. LD_* is always an option.

Contributor Author:

All in all, I think I agree with you. The driver is provided impurely in /run/opengl-driver, and as cuda_compat is supposed to be a drop-in replacement for the driver, I don't see a good reason why it should be treated differently.

That being said, I think jetpack-nixos is lagging behind right now (it's based on NixOS 22.11), so this PR would still be a temporary (TM) way to get cuda_compat working there without waiting for a newer jetpack? But I would be tempted to get rid of this once #256324 and the associated PRs all make it into a NixOS release, and jetpack-nixos supports that release.

Contributor:

this PR would still be a temporary (TM) way to

I wonder if we're still misunderstanding each other. To be clear, I hope that this PR's changes will stay and that we won't have to remove them, and that cuda_compat will essentially work the same way ROCm drivers do (which, afaiu, we link directly and for which we don't need tools like nvidia-docker2). I only wonder about special cases where the user might, hypothetically, load a different libcuda.

Contributor Author:

Just a thought: maybe we don't want the directly-linked cuda_compat to take precedence over the impure path. Rather, we want the impure path to also contain a cuda_compat. I.e. we want jetpack-nixos to deploy the compat libcuda, and we want the wrappers for Ubuntu to also prefer the compat library over the normal one.

I interpreted this, at least for jetpack-nixos, as "let's put cuda_compat into /run/opengl-driver because it's more flexible: jetpack-nixos is now responsible for selecting cuda_compat or the normal drivers, and that's easy to change dynamically by toggling what we put in /run/opengl-driver, without resorting to LD_*". In that case, would this PR still be needed? It seems to me that the answer is no, but maybe I'm missing something.

Some Nvidia devices, such as the Jetson family, support the Nvidia compatibility package (cuda_compat), which allows running executables built against a higher CUDA major version on a system with an older CUDA driver. On such platforms, the consensus among CUDA maintainers is that there is no downside in always enabling it by default.

This commit links the relevant cuda_compat shared libraries by patching the CUDA core packages' runpaths when cuda_compat is available, in the same way as we currently do for OpenGL drivers.
@yannham commented Dec 12, 2023

This branch has been rebased, fixed and tested on Jetson. It's good to go

ConnorBaker merged commit 9ad72f8 into NixOS:master on Dec 12, 2023
7 of 8 checks passed
@SomeoneSerge:

Ofborg mentions cuda_compat.src in eval errors, potentially related to this PR: https://gist.github.com/GrahamcOfBorg/ba2bf54f652ba1e89c1552139c48778a

@yannham commented Dec 12, 2023

Argh. Thanks for noticing, @SomeoneSerge. Maybe the isJetsonBuild guard at the usage site of autoAddCudaCompatRunpathHook isn't enough, and something tries to evaluate cudaPackages.cuda_compat when it shouldn't?
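
For reference, the guard in question looks roughly like this (attribute names assumed); the point is that merely evaluating the package set on non-Jetson platforms should never force cudaPackages.cuda_compat:

{ lib, cudaFlags, autoAddCudaCompatRunpathHook }:

{
  # Only pull the hook (and therefore cuda_compat) in when targeting a Jetson.
  nativeBuildInputs = lib.optionals cudaFlags.isJetsonBuild [
    autoAddCudaCompatRunpathHook
  ];
}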

@RaitoBezarius (Member):

@ConnorBaker, please do not merge such PRs before ofborg-eval has finished. Maybe the breakage on master was avoidable here...

@nixos-discourse:

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/on-nixpkgs-and-the-ai-follow-up-to-2023-nix-developer-dialogues/37087/1
