This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Found no NVIDIA driver on your system #533

Closed
danlou opened this issue Nov 14, 2017 · 4 comments

Comments


danlou commented Nov 14, 2017

I've installed the latest version of nvidia-docker (v2) in order to run PyTorch on the GPU from this container - https://hub.docker.com/r/allennlp/allennlp/.

I've followed your instructions and am able to run nvidia-smi inside the container, which shows driver version 387.12, the same as on my local machine.

Still, whenever I try to use the GPU from PyTorch inside the container, I get the error 'Found no NVIDIA driver on your system', which is odd given that nvidia-smi seems fine. It could be an issue with PyTorch, but that seems unlikely since the same code runs fine on the local machine.

Any suggestions? Thanks
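
For reference, a minimal in-container check along these lines (a sketch assuming the same allennlp/allennlp image, with PyTorch already installed; not part of the original report) shows whether PyTorch can see the driver at all:

import torch

# Hypothetical diagnostic: if CUDA is visible this prints the device name;
# if not, it shows the "no driver" state without raising an exception.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device count:", torch.cuda.device_count())
    print("Device 0:", torch.cuda.get_device_name(0))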

flx42 (Member) commented Nov 14, 2017

What are the steps that you used?

danlou (Author) commented Nov 14, 2017

I installed nvidia-docker on Ubuntu 16.04 (docker version 17.09.0-ce, build afdb6d4) with:

sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd

Started the container with:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -it --rm allennlp/allennlp

And tested this script in ipython:

import torch
x = torch.Tensor(5, 3)  # allocate an uninitialized tensor on the CPU
x.cuda()                # copy it to the GPU; this is where the error is raised

This produced the following traceback:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-795404c7d5aa> in <module>()
----> 1 x.cuda()

/usr/local/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
     64         else:
     65             new_type = getattr(torch.cuda, self.__class__.__name__)
---> 66             return new_type(self.size()).copy_(self, async)
     67 
     68 

/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_new(cls, *args, **kwargs)
    264 @staticmethod
    265 def _lazy_new(cls, *args, **kwargs):
--> 266     _lazy_init()
    267     # We need this method only for lazy init, so we can remove it
    268     del _CudaBase.__new__

/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_init()
     82         raise RuntimeError(
     83             "Cannot re-initialize CUDA in forked subprocess. " + msg)
---> 84     _check_driver()
     85     torch._C._cuda_init()
     86     torch._C._cuda_sparse_init()

/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py in _check_driver()
     56 Found no NVIDIA driver on your system. Please check that you
     57 have an NVIDIA GPU and installed a driver from
---> 58 http://www.nvidia.com/Download/index.aspx""")
     59         else:
     60             # TODO: directly link to the alternative bin that needs install

AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

Also:
$ docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -it --rm allennlp/allennlp nvidia-smi

Tue Nov 14 22:04:15 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.12                 Driver Version: 387.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 00000000:01:00.0  On |                  N/A |
|  0%   28C    P8    15W / 196W |    631MiB /  4034MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
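
One way to narrow this down from inside the container (a hypothetical check, not part of the original report): nvidia-smi comes from the driver's utility files, while PyTorch needs the driver library libcuda.so.1, so the two can disagree inside a container. In Python:

import ctypes

# Hypothetical diagnostic: check whether the CUDA driver library is mounted
# into the container. nvidia-smi can work while libcuda.so.1 is absent,
# which matches the symptom above.
try:
    ctypes.CDLL("libcuda.so.1")
    print("libcuda.so.1 is visible to the container")
except OSError as err:
    print("libcuda.so.1 not found:", err)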

flx42 (Member) commented Nov 14, 2017

OK, this image has neither the label recognized by nvidia-docker 1.0 nor the environment variables recognized by nvidia-docker 2.0.

As mentioned in the README, you can enable images that are not based on our nvidia/cuda images by passing additional environment variables to trigger nvidia-docker 2.0:

docker run -ti --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all allennlp/allennlp

You were pretty close, but you were missing the compute driver capability.
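
After restarting the container with those variables, a quick sanity check (a sketch of the original repro, assuming the same image) would be:

import torch

# With NVIDIA_DRIVER_CAPABILITIES=compute,utility set, libcuda is mounted
# and the original snippet should now succeed.
assert torch.cuda.is_available(), "driver still not visible to PyTorch"
x = torch.Tensor(5, 3).cuda()
print(x.is_cuda)  # expected: True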

flx42 closed this as completed Nov 15, 2017
danlou (Author) commented Nov 15, 2017

That did it, thanks!
