This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Error response from daemon: OCI runtime create failed + CUDA 9.1 + 396.24 #752

Closed
saurabhjha1 opened this issue Jun 4, 2018 · 10 comments

Comments

@saurabhjha1

Issue Description

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=4290 /var/lib/docker/overlay2/5de1fc82ef1ec5c30c41111e5142a53b668ff9904258e918097696d27b43cad9/merged]\\\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\\\n\\\"\"": unknown.
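For anyone debugging a similar failure, a quick way to check whether the hook itself can see the GPUs, independent of Docker, is to invoke the same CLI directly. This is a sketch using libnvidia-container's usual troubleshooting command, not part of the original report:

```
# Run the CLI that the prestart hook calls, outside of any container.
# -k loads the NVIDIA kernel modules, -d sends debug output to the given path.
# If this also reports "no cuda-capable device is detected", the problem is
# on the host (driver or device nodes), not in Docker.
sudo nvidia-container-cli -k -d /dev/tty info
```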

Relevant Outputs

lsb_release -a output

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.4 LTS
Release:        16.04
Codename:       xenial

Linux Kernel Information
4.4.0-127-generic
docker info output
```
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 3
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-127-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 31.4GiB
Name: spencer
ID: A4LD:VXTZ:Q26L:AACU:GAQU:XM5D:ZOA5:MS4G:DVI6:KG5B:GS4A:RBIK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
```
nvidia-smi output

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24 Driver Version: 396.24 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:01:00.0 On | N/A |
| 23% 40C P8 15W / 250W | 113MiB / 12194MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN V Off | 00000000:02:00.0 Off | N/A |
| 33% 48C P8 30W / 250W | 0MiB / 12066MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1758 G /usr/lib/xorg/Xorg 110MiB |
+-----------------------------------------------------------------------------+
```

**ldconfig output**

```
sudo ldconfig
echo $?
0
```

@flx42
Member

flx42 commented Jun 5, 2018

Fixed after discussing with @saurabhjha1: the issue was virtualgl, which installed /etc/modprobe.d/virtualgl.conf and changed the permissions of the NVIDIA device files to 0660 instead of 0666.
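One way to confirm this failure mode (a sketch, not from the thread) is to look at the permissions on the device nodes the hook needs to open:

```
# With NVreg_DeviceFileMode=0660 the nodes are created as crw-rw---- (group-only
# access) instead of the expected crw-rw-rw- (0666), which is what broke CUDA
# initialization in this case.
ls -l /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
```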

flx42 closed this as completed Jun 5, 2018
@saurabhjha1
Author

saurabhjha1 commented Jun 5, 2018

I was able to solve the issue with the help of Felix Abecassis. We traced the problem to the VirtualGL (vgl) package.

To check whether you have the same issue, run:
`cat /etc/modprobe.d/virtualgl.conf`
If you see output like the following:
`options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=8204 NVreg_DeviceFileMode=0660`
then follow these steps to resolve it:

Change `NVreg_DeviceFileMode=0660` to `NVreg_DeviceFileMode=0666` in /etc/modprobe.d/virtualgl.conf, then you will probably need to run `sudo update-initramfs -u` and reboot.
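In shell form, the same fix might look like this (a sketch; the sed one-liner is just one way to make the edit, and assumes the 0660 line is present exactly as shown above):

```
# Relax the NVIDIA device file mode from 0660 to 0666 in virtualgl.conf,
# rebuild the initramfs so the new modprobe option takes effect at boot,
# then reboot.
sudo sed -i 's/NVreg_DeviceFileMode=0660/NVreg_DeviceFileMode=0666/' /etc/modprobe.d/virtualgl.conf
sudo update-initramfs -u
sudo reboot
```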

@hangxu124

Hi, I updated my nvidia-docker from v1 to v2 and hit the same error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused "process_linux.go:385: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=5742 /var/lib/docker/overlay2/12bf47ad1d254d9707737629f0a2d530d499585c791a8c589c007c3f1b9fd932/merged]\\nnvidia-container-cli: requirement error: invalid expression\\n\""": unknown.

And when I run `cat /etc/modprobe.d/virtualgl.conf` I get `cat: /etc/modprobe.d/virtualgl.conf: No such file or directory`; I don't have a virtualgl.conf file in my modprobe.d.
Is there a way to solve this?
Thanks a lot.

@flx42
Member

flx42 commented Oct 1, 2018

@hangxu124 it's not the same error. You are facing this issue:
#829
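For the `requirement error: invalid expression` variant, the requirement string the hook tries to parse comes from the `NVIDIA_REQUIRE_CUDA` environment variable baked into the CUDA image, so it can be inspected without starting a container. A sketch (the image tag is only an example, and the image must be pulled locally first):

```
# Print the image's environment and show the CUDA requirement string that
# nvidia-container-cli will be asked to parse.
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' nvidia/cuda:10.0-base \
  | grep NVIDIA_REQUIRE_CUDA
```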

@bruinxiong

@saurabhjha1 Sorry, I'm hitting the following error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused "process_linux.go:385: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=18535 /var/lib/docker/overlay2/cc3dc5a868799b61df77807eb1c778bb12c3fecd88e48bb0ec235593b0c8ed47/merged]\\nnvidia-container-cli: initialization error: cuda error: unknown error\\n\""": unknown.

Please give me a suggestion to fix it. Thanks!


@runningXin

@bruinxiong I had the exact same issue; could anyone shed some light on it? Thanks!

@dav1nci

dav1nci commented Mar 1, 2019

@runningXin In my case I found that the Docker image was built with CUDA 10 binaries, but the local machine had CUDA 9, and this mismatch caused the error.
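A rough way to compare the two sides of such a mismatch (a sketch; the image tag is a placeholder, and the CUDA version shown in nvidia-smi's header requires a reasonably recent driver):

```
# Host side: driver version, and on newer drivers the highest CUDA version it supports.
nvidia-smi

# Image side: the CUDA version the image was built for (no GPU needed to check;
# pull the image first if it is not present locally).
docker inspect --format '{{.Config.Env}}' nvidia/cuda:9.0-devel | tr ' ' '\n' | grep CUDA_VERSION
```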

@vincentfenet

Fixed by installing CUDA 10.1 on the host (plus driver 418 as a dependency).

@501st-alpha1

I ran into this issue when installing nvidia-docker just now, and the solution from @saurabhjha1 fixed it, except for me the line was in nvidia-kernel-common.conf instead of virtualgl.conf. I'm on Debian Stretch, with driver version 440.64.
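Since the offending option can live in different files depending on the distribution and installed packages (virtualgl.conf above, nvidia-kernel-common.conf here), a broader check is to search the whole directory (a sketch):

```
# Find any modprobe fragment that restricts the NVIDIA device file mode.
grep -rn NVreg_DeviceFileMode /etc/modprobe.d/
```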

@flx42 Is there some reason that this is still necessary? Could whatever it is that installs that file be updated to do it right the first time, so we don't have to modify it manually?
