Failed to initialize NVML: GPU access blocked by the operating system #1
Never mind, I fooled myself. My host has 346.47, but the version included in the CUDA installer is 346.46. Thanks!
That's quite a subtle problem, so I'm glad you managed to solve the issue yourself. I'll be pushing an update soon which includes the CUDA driver version in the readmes so that people can check their hosts first.
Got this same error with a mismatch of driver versions: 346.46 vs 346.72. Make sure they match exactly!
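To check for exactly this kind of mismatch, you can compare the kernel-module driver version reported on the host with the one the container's libraries expect. Below is a hedged sketch: the helper just parses the version out of text shaped like `/proc/driver/nvidia/version`, and the two sample strings are illustrative, not real outputs from this thread.

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version from text shaped like the
# first line of /proc/driver/nvidia/version, e.g.
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  346.46  Tue Feb 17 ...
driver_version() {
  echo "$1" | sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# On a real system you would compare:
#   host:      cat /proc/driver/nvidia/version
#   container: docker run --rm <image> cat /proc/driver/nvidia/version
host_info="NVRM version: NVIDIA UNIX x86_64 Kernel Module  346.46  Tue Feb 17"
container_info="NVRM version: NVIDIA UNIX x86_64 Kernel Module  346.72  Tue May 26"

if [ "$(driver_version "$host_info")" = "$(driver_version "$container_info")" ]; then
  echo "driver versions match"
else
  echo "mismatch: $(driver_version "$host_info") vs $(driver_version "$container_info")"
fi
```

With the sample strings above this prints a mismatch, which is the situation described in this comment.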
Useful to know for passers-by: you cannot combine this Dockerfile with a host that has CUDA installed via one of the DEB packages from https://developer.nvidia.com/cuda-downloads, because those usually ship newer driver versions than the runfile this Dockerfile uses. To get around this, either downgrade your host driver to match, or update the Dockerfile to the newer version.
(Not using a Dockerfile here because it makes it really hard to get a file from the host into the container without blowing up the container's disk space.)
@nh2 I followed your guidance. Everything looks fine, but the problem persists. My host is CentOS, while the Docker image is built on Ubuntu. Does that make a difference?
@tengpeng The OS mismatch is almost certainly the problem when it comes to drivers. I am only going to support Ubuntu (but will take PRs for CentOS if anyone wants to maintain that), so I suggest trying NVIDIA's project and letting me know how it goes so that I can update my documentation.
@Kaixhin I tried NVIDIA's Docker. I passed all the tests and I can use the DIGITS setup. However, I can't get the MXNet Docker image working.
@tengpeng Looks like you will have to build your own image, using this Dockerfile for reference. It will need to start from an NVIDIA CentOS image and replace any Ubuntu-specific commands with CentOS-specific equivalents.
If you want to use NVIDIA's tooling and build off an Ubuntu image instead, see https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile; you can start from cuda:7.0-cudnn4-runtime, for example.
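A minimal sketch of that suggestion, assuming the cuda:7.0-cudnn4-runtime image referenced above is available; the packages installed on top are purely illustrative.

```dockerfile
# Build on NVIDIA's maintained CUDA image instead of installing CUDA yourself;
# the nvidia-docker tooling then takes care of matching the host driver.
FROM cuda:7.0-cudnn4-runtime

# Illustrative only: add whatever your framework needs on top.
RUN apt-get update && apt-get install -y --no-install-recommends \
      build-essential git && \
    rm -rf /var/lib/apt/lists/*
```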
@bshillingford Thanks!
@Kaixhin Thanks! Previously, I thought I could call docker/cuda to run docker/mxnet.
@tengpeng Have you fixed the issue? Please share your solution/Dockerfile if you did. I am encountering the same issue. My host is CentOS, with NVIDIA driver 352.79 installed. I am trying to find out how to make MXNet run on it.
@Yunrui I have successfully installed MXNet directly on the host rather than running it in Docker. If I understand the author of this Docker image correctly, it supports Ubuntu only.
Hi, I have the same problem, running on a g2.2xlarge EC2 instance with Ubuntu 14.04. When I run
I get
I compared the drivers; both are the same:
Any help would be appreciated.
@jamborta I cannot give you detailed advice, but NVIDIA now provides its own Docker containers (https://github.com/NVIDIA/nvidia-docker), which you might want to try.
@jamborta I found issues with that driver version on this NVIDIA thread, which is why you will probably want to make your own Docker images using NVIDIA's official tools if you want to use EC2. Otherwise, if you can (I don't use EC2 and don't know what's possible), you could try reinstalling the drivers and/or CUDA on the host using the runfile.
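A hedged sketch of that runfile route: the version and filename below are examples only and must match whatever driver the container image expects; the actual install commands are left as comments because they require root and real hardware.

```shell
#!/bin/sh
# Hypothetical helper: NVIDIA's Linux driver runfiles follow this naming
# convention on the download site; the version used here is an example.
runfile_name() {
  echo "NVIDIA-Linux-x86_64-$1.run"
}

# Illustrative host reinstall sequence (Ubuntu; do not run blindly):
#   sudo service lightdm stop                  # stop the display manager
#   sudo apt-get --purge remove 'nvidia-*'     # drop distro-packaged drivers
#   sudo sh "$(runfile_name 346.46)" --silent  # install the matching runfile
echo "$(runfile_name 346.46)"
```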
@tengpeng I want to use NVIDIA driver version 352.79 to do some research, but the driver in the working Docker image is 346.46. Could you give me the link to your docker-cuda container? Thank you for your replies.
Try updating CUDA to the latest version.
Does the container need to use the same version of CUDA as the host machine? If so, this is an unprecedented kind of runtime dependency for a container: it means a Docker container can only run on specific Docker hosts. Is there even a way to specify such a dependency? I had assumed Docker containers made no assumptions about the underlying Docker host.
@tommyjcarpenter This is a kernel issue; you can find more details on the NVIDIA Docker repo. FYI, all CUDA images in this repo now use their setup, so they are subject to whatever restrictions NVIDIA has needed.
Hi, first of all, thanks for sharing these Dockerfiles. I've been trying to use your kaixhin/cuda image, but I can't access the GPUs within the container. I'm fairly certain both the host and container are running the same CUDA version, 7.0.28, but nvidia-smi always outputs Failed to initialize NVML: GPU access blocked by the operating system. nvidia-smi -a produces the same error, so I can't find a way to get more information about it. Do you have any ideas what this could be caused by? Thanks!
Brendan
Within the Docker container:
On the host: