
"CUDA driver version is insufficient" despite sufficient drivers #24

Closed

wielandbrendel opened this issue Dec 21, 2015 · 9 comments

@wielandbrendel

I am receiving the following error upon running deviceQuery in nvidia/cuda:7.0-cudnn3-devel:

./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

However, I have run containers with CUDA 7 on the same system before. The host is running driver 346.46, which should be sufficient. The container was started with

docker run --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia0:/dev/nvidia0 --device \
  /dev/nvidia1:/dev/nvidia1 --device /dev/nvidia2:/dev/nvidia2 --device /dev/nvidia3:/dev/nvidia3 --device \
  /dev/nvidiactl:/dev/nvidiactl -it nvidia/cuda:7.0-cudnn3-devel bash

Any idea why that happens or what I should check? A big thanks in advance!

@flx42 (Member) commented Dec 21, 2015

You need to use our nvidia-docker wrapper script.

With our approach, we do not install the driver inside the image. This is the only way to keep CUDA images truly independent of the host's driver version.
As a result, the wrapper mounts the driver libraries from the host into the container when it is started:

NV_LIBS_CUDA="cuda \
nvcuvid \
nvidia-compiler \
nvidia-encode \
nvidia-ml"

If you don't want to use the wrapper script, you could do what TensorFlow does:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker#running-the-container
But that is less portable than using our script.
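
For reference, a rough sketch of that manual approach: pass the device nodes and bind-mount the host's driver libraries into the container. The library paths and the 346.46 version below are assumptions based on this report; adjust them to your distribution and driver:

# Sketch only: bind-mount the host driver libraries (paths/version are assumptions).
docker run -it \
  --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 \
  -v /usr/lib/x86_64-linux-gnu/libcuda.so.346.46:/usr/lib/x86_64-linux-gnu/libcuda.so.346.46:ro \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.346.46:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.346.46:ro \
  nvidia/cuda:7.0-cudnn3-devel bash

Inside the container you may still need to create the libcuda.so.1 symlink and run ldconfig so the CUDA runtime can find the mounted libraries.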

@wielandbrendel (Author)

Sorry for overlooking the wrapper; it indeed works perfectly with the nvidia/cuda:7.0-cudnn3-devel image. However, if I run the same script on an image that is based on nvidia/cuda, I get

[ NVIDIA ] =INFO= Not a CUDA image, nothing to be done

which seems to be because the script checks the version/name of the image it is applied to. How would you recommend handling images based on your images?

@flx42 (Member) commented Dec 21, 2015

That should not happen; the script checks the label present in the base image:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L13

So it should work fine with images based on this one. Do you have a small repro where it doesn't work?
Thanks!
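
As a sanity check, you can verify that a derived image inherited the CUDA label from the base image. The label name com.nvidia.cuda.version is an assumption based on the Dockerfile linked above, and myuser/myimage is a placeholder:

# An empty result here would explain the "Not a CUDA image" message.
docker inspect --format '{{index .Config.Labels "com.nvidia.cuda.version"}}' myuser/myimage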

@flx42 (Member) commented Dec 21, 2015

Ah, I think I found the problem: it happens when you don't have the image locally yet:

$ nvidia-docker run -ti nvidia/cuda:7.5
Error: No such image or container: nvidia/cuda:7.5
[ NVIDIA ] =INFO= Not a CUDA image, nothing to be done

Unable to find image 'nvidia/cuda:7.5' locally
7.5: Pulling from nvidia/cuda
0bf056161913: Pull complete 
[...]

$ nvidia-docker run -ti nvidia/cuda:7.5
[ NVIDIA ] =INFO= Driver version: 352.68
[ NVIDIA ] =INFO= CUDA image version: 7.5

The first time, the image is not present locally, so the label check finds nothing. It's a bug in our label detection code.
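
That is consistent with docker itself: docker inspect only works on images that are already present locally, so a label check keyed off it cannot see anything for an image that still has to be pulled. A minimal reproduction of the failing step (the label name, as above, is an assumption):

# Fails on an image that only exists in the registry, so the wrapper sees no label
# and skips mounting the driver volumes.
docker inspect --format '{{index .Config.Labels "com.nvidia.cuda.version"}}' nvidia/cuda:7.5
# Error: No such image or container: nvidia/cuda:7.5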

@flx42 (Member) commented Dec 21, 2015

We will think of a solution; in the meantime, running docker pull user/myimage before nvidia-docker run user/myimage should work.
Or use the other option I described in my first reply.
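
In other words, the interim workaround is just (image name is a placeholder):

# Pull first so the image and its labels exist locally, then run through the wrapper.
docker pull user/myimage
nvidia-docker run user/myimage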

@wielandbrendel (Author)

There is an additional problem, unrelated to non-pulled images: the error also occurs if the number of flags is too large. For example, the following does not work:

GPU=0,1,2,3 ./nvidia-docker run -m 300M -a stdout -a stdin -i -t -d nvidia/cuda:7.0-cudnn3-devel

but

GPU=0,1,2,3 ./nvidia-docker run -m 300M -a stdout -a stdin -itd nvidia/cuda:7.0-cudnn3-devel

does work (aside from some conflicting-flag issues...). The error I get with the first command is

flag provided but not defined: -m0
See 'docker inspect --help'.
[ NVIDIA ] =INFO= Not a CUDA image, nothing to be done

So the current workaround would be to lower the number of flags used.
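
For context, the mangled -m0 flag points at the wrapper merging adjacent arguments while forwarding them to docker. A generic illustration of how a shell wrapper avoids that class of bug (not the actual nvidia-docker code):

#!/bin/sh
# Forward arguments with "$@": each flag and its value stay separate words.
# Unquoted $* or string concatenation is what typically glues them together.
exec docker "$@"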

@flx42 (Member) commented Dec 22, 2015

Ping @3XX0

@3XX0 (Member) commented Dec 22, 2015

Sorry, it was an issue in the argument parsing; it's fixed now.

Regarding the need to pull before running, this is unfortunately a limitation of nvidia-docker: there is no way to inspect an image that is stored remotely.

@flx42 (Member) commented Jan 4, 2016

Closing, since I believe this is fixed.
