
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/761bd05e8ceb95e1459db860b160e9dda095254a969ebd9a0b777524f73f9263/log.json: no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: unknown. #166

Closed
wjimenez5271 opened this issue May 5, 2020 · 6 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@wjimenez5271

When following the latest installation instructions at https://github.com/NVIDIA/nvidia-docker/, it says that nvidia-docker2 has been deprecated and that one should install the NVIDIA Container Toolkit instead. I followed the instructions for Ubuntu 18.04 with Docker 19.03; however, this does not seem to install the nvidia-container-runtime binary mentioned in this project's README. As a result, Docker cannot start any container after updating the default runtime in /etc/docker/daemon.json per the README. Is this device plugin not compatible with the latest iteration of the NVIDIA container tooling? Here is the error message:

docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/761bd05e8ceb95e1459db860b160e9dda095254a969ebd9a0b777524f73f9263/log.json: no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: unknown.

and just to show:

ls /usr/bin/nvidia-container-runtime
ls: cannot access '/usr/bin/nvidia-container-runtime': No such file or directory

I also tried nvidia-container-cli, since that is what the current package installs. Is it possible this repo needs to be updated to reflect nvidia-docker2's deprecation?
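For anyone hitting the same error, a quick way to check which NVIDIA container components are actually present on the host (a sketch assuming an Ubuntu/Debian system; adjust the package query for other distros):

# list installed NVIDIA container packages (Debian/Ubuntu)
dpkg -l | grep -E 'nvidia-(docker|container)'

# check which of the binaries are actually on the PATH
which nvidia-container-cli nvidia-container-runtime nvidia-container-toolkit

# show the runtimes Docker currently knows about
docker info | grep -i runtime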

@ardenpm

ardenpm commented May 19, 2020

The docs in this repo specifically state that nvidia-container-toolkit should not be used and that nvidia-docker2 should be used instead (even though deprecated) since K8s isn't aware of the --gpus Docker flag yet (not sure if that is still the case).

So it looks like the instructions for Docker and K8s are currently different. I set up per the instructions in this repo for K8s, but right now I can't run anything in Docker, so I doubt it will work in K8s. When I try to run with the nvidia runtime I get segfaults immediately; still trying to track that down.
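For readers following along, the distinction being discussed looks roughly like this (a sketch; the CUDA image tag is just an example):

# Toolkit path: Docker 19.03+ can expose GPUs directly via the --gpus flag,
# which Kubernetes does not use
docker run --rm --gpus all nvidia/cuda:10.2-base-ubuntu18.04 nvidia-smi

# Runtime path: what the k8s device plugin relies on; requires
# nvidia-container-runtime to be registered in /etc/docker/daemon.json
docker run --rm --runtime=nvidia nvidia/cuda:10.2-base-ubuntu18.04 nvidia-smi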

@klueska
Contributor

klueska commented May 19, 2020

I agree, the docs are confusing and should be synchronized better.

Please see my comment here for an explanation of how nvidia-docker2 and nvidia-container-toolkit are related: #168 (comment)

Regarding the segfault, I'm curious if it could be related to: NVIDIA/nvidia-docker#1280 (comment)

@ardenpm

ardenpm commented May 19, 2020

Indeed, that comment helped make it clear. It was also reassuring to know that behind the scenes it's basically the same, since the deprecation statements on nvidia-docker2 are a bit disconcerting.

Now, on the segfault: this was, and still is, really strange. I think mine was actually different from the one in this issue. nvidia-container-cli would also segfault immediately, even just using the info command, so I don't think it was specific to Docker.

All of my testing there was on the latest CentOS 7, and I wasn't able to resolve the problem. Since I needed to do some testing, I switched to Ubuntu 18.04 and was not able to replicate the issue there at all.

I still have both images in a stopped state on AWS from my testing, so I can probably get more details on the actual segfault stack trace, but I am not sure if others are encountering this. The actual error was related to munmap_chunk: invalid pointer.
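If it helps with tracking it down, one way to capture a backtrace for that kind of crash (a sketch; assumes gdb is available, and coredumpctl only applies where systemd-coredump is enabled):

# run the crashing command under gdb and print a backtrace at the segfault
gdb --batch -ex run -ex bt --args nvidia-container-cli info

# alternatively, inspect the most recent core dump for the binary
coredumpctl gdb nvidia-container-cli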

@kengz

kengz commented Jun 7, 2020

I had the same issue setting up a k8s cluster with GPUs. I went through the comments here and other related issues and put together the steps that made it work; hopefully this is useful to people looking for a solution:

Kubernetes NVIDIA GPU device plugin

  • follow the official NVIDIA GPU device plugin instructions up to the step that configures the runtime

  • as explained in this comment, k8s still needs nvidia-container-runtime; install it:

    # install the old nvidia-container-runtime for k8s
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
      sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
    sudo apt-get update
    sudo apt-get install -y nvidia-container-runtime
  • add the following to /etc/docker/daemon.json, as required by k8s

    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
  • restart docker and test:

    sudo systemctl restart docker
    # test that docker can run with GPU without the --gpus flag
    docker run nvidia/cuda:10.2-runtime-ubuntu18.04 nvidia-smi
  • finally, install the NVIDIA device plugin on your cluster (a verification sketch follows after this list):

    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
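
A quick way to confirm the plugin registered and that a pod can actually request a GPU (a sketch; the pod name gpu-smoke-test and the image tag are just examples):

# the node should now advertise nvidia.com/gpu as an allocatable resource
kubectl describe node | grep -A 2 'nvidia.com/gpu'

# run a throwaway pod that requests one GPU and prints nvidia-smi output
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:10.2-runtime-ubuntu18.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# once the pod completes, check its output
kubectl logs gpu-smoke-test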


This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

@github-actions github-actions bot added the lifecycle/stale label on Feb 29, 2024

This issue was automatically closed due to inactivity.

@github-actions github-actions bot closed this as not planned on Mar 31, 2024