This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Found no NVIDIA driver on your system #533

Closed
danlou opened this issue Nov 14, 2017 · 4 comments

Comments


danlou commented Nov 14, 2017

I've installed the latest version of nvidia-docker (v2) in order to run PyTorch on the GPU from this container - https://hub.docker.com/r/allennlp/allennlp/.

I've followed your instructions and am able to run nvidia-smi inside the container, which shows driver version 387.12, the same as on my local machine.

Still, whenever I try to use the GPU from PyTorch inside the container, I get the error 'Found no NVIDIA driver on your system', which is odd given that nvidia-smi seems fine. It could be an issue with PyTorch, but that seems unlikely since the same code runs fine on the local machine.

Any suggestions? Thanks
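
For reference, a minimal in-container check along these lines (a sketch assuming the same allennlp/allennlp image, with PyTorch already installed; not part of the original report) shows whether PyTorch can see the driver at all:

import torch

# Hypothetical diagnostic: if CUDA is visible this prints the device name;
# if not, it shows the "no driver" state without raising an exception.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device count:", torch.cuda.device_count())
    print("Device 0:", torch.cuda.get_device_name(0))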

flx42 (Member) commented Nov 14, 2017

What are the steps that you used?

danlou (Author) commented Nov 14, 2017

I installed nvidia-docker on Ubuntu 16.04 (docker version 17.09.0-ce, build afdb6d4) with:

sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd

Started the container with:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -it --rm allennlp/allennlp

And tested this script in ipython:

import torch
x = torch.Tensor(5, 3)  # allocate an uninitialized tensor on the CPU
x.cuda()                # copy it to the GPU; this is where the error is raised

This produced the following traceback:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-795404c7d5aa> in <module>()
----> 1 x.cuda()

/usr/local/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
     64         else:
     65             new_type = getattr(torch.cuda, self.__class__.__name__)
---> 66             return new_type(self.size()).copy_(self, async)
     67 
     68 

/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_new(cls, *args, **kwargs)
    264 @staticmethod
    265 def _lazy_new(cls, *args, **kwargs):
--> 266     _lazy_init()
    267     # We need this method only for lazy init, so we can remove it
    268     del _CudaBase.__new__

/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_init()
     82         raise RuntimeError(
     83             "Cannot re-initialize CUDA in forked subprocess. " + msg)
---> 84     _check_driver()
     85     torch._C._cuda_init()
     86     torch._C._cuda_sparse_init()

/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py in _check_driver()
     56 Found no NVIDIA driver on your system. Please check that you
     57 have an NVIDIA GPU and installed a driver from
---> 58 http://www.nvidia.com/Download/index.aspx""")
     59         else:
     60             # TODO: directly link to the alternative bin that needs install

AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

Also:
$ docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -it --rm allennlp/allennlp nvidia-smi

Tue Nov 14 22:04:15 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.12                 Driver Version: 387.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 00000000:01:00.0  On |                  N/A |
|  0%   28C    P8    15W / 196W |    631MiB /  4034MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
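
One way to narrow this down from inside the container (a hypothetical check, not part of the original report): nvidia-smi comes from the driver's utility files, while PyTorch needs the driver library libcuda.so.1, so the two can disagree inside a container. In Python:

import ctypes

# Hypothetical diagnostic: check whether the CUDA driver library is mounted
# into the container. nvidia-smi can work while libcuda.so.1 is absent,
# which matches the symptom above.
try:
    ctypes.CDLL("libcuda.so.1")
    print("libcuda.so.1 is visible to the container")
except OSError as err:
    print("libcuda.so.1 not found:", err)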

flx42 (Member) commented Nov 14, 2017

OK, this image has neither the label recognized by nvidia-docker 1.0 nor the environment variables recognized by nvidia-docker 2.0.

As mentioned in the README, you can enable images that are not based on our nvidia/cuda images by passing additional environment variables to trigger nvidia-docker 2.0:

docker run -ti --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all allennlp/allennlp

You were pretty close, but you were missing the compute driver capability.
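
After restarting the container with those variables, a quick sanity check (a sketch of the original repro, assuming the same image) would be:

import torch

# With NVIDIA_DRIVER_CAPABILITIES=compute,utility set, libcuda is mounted
# and the original snippet should now succeed.
assert torch.cuda.is_available(), "driver still not visible to PyTorch"
x = torch.Tensor(5, 3).cuda()
print(x.is_cuda)  # expected: True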

flx42 closed this as completed Nov 15, 2017
danlou (Author) commented Nov 15, 2017

That did it, thanks!
