nvidia driver version baked into docker build #856
Comments
You are most likely building on a machine which is configured with the nvidia runtime as the default docker runtime.
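For context, a host set up that way typically carries the default runtime in `/etc/docker/daemon.json`. A minimal sketch of such a configuration, assuming the standard nvidia-docker2 layout, looks roughly like this:

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

With `default-runtime` set to `nvidia`, every `docker build` step also runs under the nvidia runtime, which is how host driver files become visible during the build.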
@3XX0 So do you think that it is currently not using the stub `libcuda.so` because GPU support is enabled at build time?
Yes, it's definitely not using the stub for linking, otherwise you wouldn't see such an error. Usually, if you see empty driver files in your final build image, it means that you used GPU support at build time. These files should be harmless, but they can cause confusion if you are not expecting them. Also, if you see something like …
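One way to check for this (a hedged sketch; `<image:tag>` is a placeholder for your image name) is to run the freshly built image under the plain `runc` runtime, so the nvidia runtime cannot inject anything, and see which driver files were baked in:

```sh
# Run under runc so the nvidia runtime does not mount the host driver libraries;
# anything listed here was baked into the image at build time.
docker run --rm --runtime=runc <image:tag> \
  ls -l /usr/lib/x86_64-linux-gnu/libcuda*
```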
So @douglas-gibbons, TL;DR: …
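As a hedged sketch of the stub-linking approach mentioned above (the base image tag, `main.cpp`, and output path are assumptions, not the author's actual files), a build that stays independent of the host driver links against the stub shipped with the CUDA toolkit rather than the driver library:

```dockerfile
# Sketch: link against the CUDA toolkit's stub libcuda.so so the build never
# touches (or bakes in) the driver library injected by the nvidia runtime.
FROM nvidia/cuda:9.0-devel-ubuntu16.04

COPY main.cpp /src/main.cpp

# /usr/local/cuda/lib64/stubs ships with the devel images and contains a
# driver-agnostic libcuda.so intended only for linking, not for running.
RUN g++ /src/main.cpp -o /usr/local/bin/app \
      -I/usr/local/cuda/include \
      -L/usr/local/cuda/lib64/stubs -lcuda
```

At run time the real `libcuda.so` is mounted in by the nvidia runtime, so the stub never needs to be present in the final image.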
@3XX0 In the image I already see …
Of course.
Whoever built the image you're using as a base.
@3XX0 OK, I'll try to rebuild the full chain myself.
@3XX0 OK, rebuilding the full chain with …
@3XX0 Sometimes I am building on a host that uses docker compose, and docker compose doesn't support runtime selection anymore (see docker/app#241).
@3XX0 Do you think that setting …
Many thanks @3XX0 - I've been trying to use the nvidia runtime as an all-purpose workhorse. You've made me reconsider that approach. While I go about fixing things properly, for now I've just (forgive me) added this to the end of the Dockerfile:
…and that has solved the immediate issue.
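Purely as an assumption about what such a stop-gap cleanup might look like (not the actual snippet from the comment above), removing the leaked driver symlinks at the end of the build would be something along these lines:

```dockerfile
# Assumed cleanup, not the commenter's actual snippet: strip driver libraries
# that leaked in because the image was built under the nvidia runtime. At run
# time the nvidia runtime mounts the correct host driver libraries anyway.
RUN rm -f /usr/lib/x86_64-linux-gnu/libcuda.so*
```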
@3XX0 @flx42 Same here. I don't think that docker/app#241 will be resolved in the mid-term, so it is a pain to change the default runtime between `docker build` and docker compose. If you have any elegant workaround, let me know.
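One possible workaround, hedged and assuming the older Compose file format is acceptable for your services: the 2.3/2.4 file format still accepts a per-service `runtime` key (it was dropped from the v3 format), so the daemon's default runtime can stay `runc` for builds while GPU services still get the nvidia runtime:

```yaml
# Sketch assuming compose file format 2.3+; the image name is hypothetical.
version: "2.4"
services:
  app:
    image: myorg/gpu-app:latest
    runtime: nvidia
```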
If I build a Docker image on Machine A, I should be able to run it on Machine B even if the nvidia drivers are different minor versions. I'm getting errors building a C++ app because `libcuda.so` points to the wrong version.

C++ error:
It seems the root problem is that the nvidia driver library symlinks from our build machines are leaking into our Docker images. See the example `Dockerfile.nvidia` below, built on Machine A with nvidia driver version `libcuda.so.390.77`, which I stripped down to the minimum needed to reproduce. Then I tried running on Machine B with driver version `libcuda.so.390.87`, which demonstrates the issue.

Machine A
Machine B
Why do I have residual symlinks for the nvidia driver (`/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.390.77`) in the Docker image built on Machine A, and how do I remove them? Is this expected behavior? We want to support client machines with different minor driver versions. How do we fix this?
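As a hypothetical sketch of a minimal reproduction along these lines (base image, source file, and link flags are assumptions, not the actual `Dockerfile.nvidia`), building a C++ app that links `-lcuda` on a daemon whose default runtime is nvidia picks up the host's driver symlinks instead of the toolkit stub:

```dockerfile
# Hypothetical reproduction sketch: when the daemon's default runtime is
# nvidia, /usr/lib/x86_64-linux-gnu/libcuda.so (a symlink to the host's
# libcuda.so.390.xx) is visible during this build and gets linked against.
FROM nvidia/cuda:9.0-devel-ubuntu16.04
COPY main.cpp /src/main.cpp
RUN g++ /src/main.cpp -o /usr/local/bin/app -lcuda
```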
Detailed Information below
Machine A
Machine B