This repository was archived by the owner on Jan 22, 2024. It is now read-only.

nvidia-docker on Azure broken after restart #349


Description

@adinin

We run a few machines on Azure with nvidia-docker installed. The instances are restarted overnight (they are used on demand). After starting the instances today, containers launched with nvidia-docker can no longer access the NVIDIA drivers.

Running nvidia-smi on the host returns the expected output:

>nvidia-smi
Thu Mar 23 20:22:57 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | B60C:00:00.0     Off |                    0 |
| N/A   39C    P8    27W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Running nvidia-smi inside the nvidia/cuda container fails with the following:

>nvidia-docker run --rm nvidia/cuda nvidia-smi
container_linux.go:247: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH"
docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH".

We have previously dealt with the NVIDIA drivers disappearing after a restart, but this issue is different: nvidia-smi works perfectly on the host after the restart.
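
In case it helps with triage: as far as we understand, with nvidia-docker 1.0 nvidia-smi is not shipped inside the nvidia/cuda image but is bind-mounted from a driver volume created by nvidia-docker-plugin, so the "not found in $PATH" error presumably means that volume is missing or not being mounted after the restart. Checks we plan to run next (assuming a systemd host and the plugin's default port 3476):

>sudo systemctl status nvidia-docker          # plugin service should be active
>curl -s http://localhost:3476/docker/cli     # should print the --volume-driver/--volume/--device args
>docker volume ls | grep nvidia_driver        # driver volume, e.g. nvidia_driver_375.39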

We tried removing nvidia-docker, uninstalling the NVIDIA drivers, restarting, and reinstalling everything, but that did not help. Everything was installed according to the wiki/azure instructions in this repo.
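
For reference, the install roughly followed the wiki quickstart (the release version below is from memory, so treat it as approximate):

>wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
>sudo dpkg -i /tmp/nvidia-docker_1.0.1-1_amd64.deb
>nvidia-docker run --rm nvidia/cuda nvidia-smi    # sanity check; this worked before the restart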

We would appreciate any help you can provide.
