This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Error response from daemon: OCI runtime create failed #614

Closed
sconeyard opened this issue Jan 22, 2018 · 8 comments

Comments

@sconeyard

sconeyard commented Jan 22, 2018

1. Issue or feature description

nvidia-docker stopped working.
I had a JupyterHub instance running with nvidia-docker support, and it worked quite well.
Today I logged into the host system and ran sudo apt-get update/upgrade, and suddenly nvidia-docker no longer works. That said, I can't recall whether the upgrade actually changed anything, so it might not be the root of the issue.
The system runs Debian.

2. Steps to reproduce the issue

sudo docker run --rm nvidia/cuda:8.0-devel nvidia-smi

docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:398: container init caused \"process_linux.go:381: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=8.0 --pid=25807 /var/lib/docker/overlay2/8127e7486398ec495fc98de2cee1f18e769ee97f43211ccbc455a058d3b3923a/merged]\\\\nnvidia-container-cli: ldcache error: open failed: /sbin/ldconfig.real: no such file or directory\\\\n\\\"\"": unknown.

3. Information to attach (optional if deemed irrelevant)

$ uname -a
Linux donna 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

$ docker version
Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:11:19 2017
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:09:54 2017
  OS/Arch:	linux/amd64
  Experimental:	false

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.82                 Driver Version: 375.82                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:41:00.0     Off |                  N/A |
|  0%   23C    P0    55W / 250W |      0MiB / 11170MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ nvidia-container-cli -V
version: 1.0.0
build date: 2018-01-11T00:29+00:00
build revision: 4a618459e8ba522d834bb2b4c665847fae8ce0ad
build compiler: x86_64-linux-gnu-gcc-6 6.3.0 20170516
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

@sconeyard
Author

Sorry for the trouble; it seems that I had the wrong sources list installed. To everyone running Debian and having this issue: make sure you install the packages from here: https://nvidia.github.io/nvidia-docker/

@flx42
Member

flx42 commented Jan 22, 2018

You were probably using the Ubuntu packages instead of the Debian ones.
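
One way to check for this kind of mismatch is to compare the distribution named in the apt source list with the one the host reports. The following is only a rough sketch: the default file name /etc/apt/sources.list.d/nvidia-docker.list and the URL layout (nvidia.github.io/<project>/<distro><version>/...) are assumptions about a typical install, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: does the NVIDIA apt source list target the same distro as the host?
# Both default paths below are assumptions; pass your own as arguments.

# Print the host distro ID (e.g. "debian" or "ubuntu") from an os-release file.
host_distro() {
    os_release="${1:-/etc/os-release}"
    # The ID= value may or may not be quoted, so strip any quotes.
    sed -n 's/^ID=//p' "$os_release" | tr -d '"'
}

# Print the distro names found in repo URLs of the assumed form
#   https://nvidia.github.io/nvidia-docker/debian9/$(ARCH)
# The version suffix ("9", "16.04") is dropped so only the name remains.
repo_distros() {
    list_file="${1:-/etc/apt/sources.list.d/nvidia-docker.list}"
    grep -o 'nvidia\.github\.io/[^/]*/[a-z]*' "$list_file" | sed 's|.*/||' | sort -u
}

# Return 0 if every repo entry matches the host distro,
# 1 on a mismatch, 2 if no repo entries were found at all.
check_repo_matches_host() {
    host="$(host_distro "$1")"
    found="$(repo_distros "$2")"
    if [ -z "$found" ]; then
        echo "no nvidia.github.io entries found" >&2
        return 2
    fi
    for d in $found; do
        if [ "$d" != "$host" ]; then
            echo "mismatch: host is '$host' but source list targets '$d'" >&2
            return 1
        fi
    done
    return 0
}
```

Calling check_repo_matches_host with no arguments uses the default paths. A non-zero exit status suggests the source list should be regenerated for the host's distribution.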

flx42 closed this as completed Jan 22, 2018

@khallaghi

Could there be another cause?
I have exactly the same issue on the same platform (Debian stretch), but I installed from the right repository.

@protopyte

@khallaghi I believe so. I first got hit by #677, then this one.
However, my system is not Debian stretch but a mix of testing and unstable.

My workaround was to create /sbin/ldconfig.real as a symlink to /sbin/ldconfig.

@undcloud

undcloud commented Jan 2, 2020

Ran into the same problem, thanks @sleveque.
sudo docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: open failed: /sbin/ldconfig.real: no such file or directory\\n\""": unknown.
ERRO[0000] error waiting for container: context canceled
Solution:
ln -s /sbin/ldconfig /sbin/ldconfig.real
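
A slightly more defensive variant of this workaround might first check that the target does not already exist, so that a real /sbin/ldconfig.real (as shipped on some distributions) is never replaced. This is a minimal sketch. The function name ensure_ldconfig_real is made up for illustration, and the paths are parameters so it can be tried on scratch files first.

```shell
#!/bin/sh
# Create <dst> as a symlink to <src>, but only if <dst> does not exist yet.
# Defaults mirror the paths from the error message in this thread.
ensure_ldconfig_real() {
    src="${1:-/sbin/ldconfig}"
    dst="${2:-/sbin/ldconfig.real}"

    # Leave any existing file or (possibly dangling) symlink untouched.
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "$dst already exists, leaving it alone"
        return 0
    fi

    # Refuse to link to something that is not an executable.
    if [ ! -x "$src" ]; then
        echo "$src is missing or not executable" >&2
        return 1
    fi

    ln -s "$src" "$dst"
}
```

On a real host the call would need root, for example by running the function from a root shell.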

@davidshen84

Hi,

Sorry to post on a closed ticket. Could someone help me understand why we need to create this symlink? It seems neither NVIDIA nor glibc intended to create this link, yet the application depends on it. Is it some legacy naming issue?

Thanks.

@klueska
Contributor

klueska commented Apr 6, 2021

It just depends on what the real binary (not any wrapper shell script) is on your host. You don't have to create a symlink; you can also change the path to it in /etc/nvidia-container-runtime/config.toml
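
As a rough illustration of the config-file route, the sketch below rewrites the ldconfig setting with sed. It assumes the setting appears as a line of the form ldconfig = "@/sbin/ldconfig.real", which matches the --ldconfig=@... argument in the error output above but may differ in your version of the file. The function name set_ldconfig_path is hypothetical.

```shell
#!/bin/sh
# Print a copy of <config> with the ldconfig setting pointing at <new_path>.
# The input file is not modified; redirect the output to apply the change.
# Assumes a line of the form:  ldconfig = "@/some/path"
# and that <new_path> contains no "|" or "&" characters.
set_ldconfig_path() {
    config="$1"
    new_path="$2"
    # Keep the leading "@" and replace only the path that follows it.
    sed "s|^ldconfig[[:space:]]*=[[:space:]]*\"@[^\"]*\"|ldconfig = \"@${new_path}\"|" "$config"
}
```

For example, set_ldconfig_path /etc/nvidia-container-runtime/config.toml /sbin/ldconfig prints the edited file to stdout. Reviewing that output before replacing the original seems safer than editing in place.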

@shahriar8866

I have a Google Compute Engine VM running Debian 11.
To install the driver for a Tesla T4 GPU and enable the GPU for a microk8s node, follow these steps:
1- Make sure all existing NVIDIA and CUDA packages are removed:

  • sudo apt-get remove --purge '^nvidia-.*'
  • sudo apt-get remove --purge '^libnvidia-.*'
  • sudo apt-get remove --purge '^cuda-.*'
  • sudo apt autoremove
  • sudo apt autoclean

2- Install the kernel headers: sudo apt-get install linux-headers-$(uname -r)
3- Make sure you have python3 installed on your VM.
4- Download the GPU driver installation Python script:

  • curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py

5- Run the script: sudo python3 install_gpu_driver.py
6- Test the GPU with nvidia-smi
7- Enable the GPU on microk8s with microk8s enable gpu
8- Test the GPU on microk8s with microk8s kubectl run gpu-test --rm -t -i --restart=Never --image=nvcr.io/nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi

  • If the error nvidia-container-cli.real: ldcache error: open failed: /sbin/ldconfig.real: no such file or directory: unknown occurs, this is a workaround:
  • sudo cp -r /sbin/ldconfig /sbin/ldconfig.real
  • Now retry: microk8s kubectl run gpu-test --rm -t -i --restart=Never --image=nvcr.io/nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
