
How to use with Docker 19.03 / nvidia-container-toolkit ? #168

Closed
bryanlarsen opened this issue May 8, 2020 · 5 comments

@bryanlarsen

1. Issue or feature description

The documentation appears to be for use with nvidia-docker2, which has been deprecated and replaced with nvidia-container-toolkit, as far as I can tell.

2. Steps to reproduce the issue

Resulting error:

2020/05/08 17:32:45 Loading NVML
2020/05/08 17:32:45 Failed to initialize NVML: could not load NVML library.
2020/05/08 17:32:45 If this is a GPU node, did you set the docker default runtime to `nvidia`?
2020/05/08 17:32:45 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2020/05/08 17:32:45 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start

nvidia-container-toolkit doesn't provide an "nvidia" runtime, so I can't make it the default.
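
(For reference, a quick way to check which runtimes the Docker daemon currently knows about; this is just a generic docker command, not something from the plugin docs. With only nvidia-container-toolkit installed, nothing registers an nvidia runtime, so the output should look roughly like:

$ docker info | grep -i runtime
 Runtimes: runc
 Default Runtime: runc
)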

3. Information to attach (optional if deemed irrelevant)

docker version: 19.03.8
kubernetes version: 1.18.2
nvidia-container-cli -V: version: 1.0.7
nvidia-driver: 440.59-0ubuntu0.18.04

@klueska (Contributor) commented May 8, 2020

Yeah, with Docker 19.03, where you only need to install nvidia-container-toolkit (instead of nvidia-docker2), the nvidia runtime is no longer set up automatically in /etc/docker/daemon.json.

Try modifying /etc/docker/daemon.json to:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Docker alone doesn't need this setting anymore, but the k8s-device-plugin still needs it in order to work.
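
After editing /etc/docker/daemon.json you would then restart Docker and check that the runtime took effect, e.g. (assuming systemd; adjust for your init system). The output should include something like:

$ sudo systemctl restart docker
$ docker info | grep -i runtime
 Runtimes: nvidia runc
 Default Runtime: nvidia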

@bryanlarsen (Author)

/usr/bin/nvidia-container-runtime doesn't exist. I've got a /usr/bin/nvidia-container-runtime-hook and a /usr/bin/nvidia-container-toolkit. Should I try one of those?

@bryanlarsen (Author)

That gives the error:

docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/e1d6281f9df6c39a38229edc6cf294a0bbeae61e0a374f5bcde1f1c61a16f8a6/log.json: no such file or directory): /usr/bin/nvidia-container-toolkit did not terminate sucessfully: unknown.

It appears that nvidia-container-toolkit and nvidia-docker2 can be installed simultaneously. I did that, which gave me an nvidia-container-runtime binary; I put it in daemon.json, and at first glance it looks like it's working.
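
As a quick sanity check (image tag is just whatever CUDA base image you have handy), a plain docker run should now see the GPU without any extra flags since nvidia is the default runtime, and once the device plugin pod is up the node should advertise GPUs (<node-name> is a placeholder for the actual GPU node):

$ docker run --rm nvidia/cuda:10.2-base nvidia-smi
$ kubectl describe node <node-name> | grep nvidia.com/gpu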

@klueska (Contributor) commented May 8, 2020

Sorry. Yes. I'm confusing 2 different things. I apologize.
Forget everything I said before. The following is what I should have told you in the first place.

If you want to use K8s with docker 19.03 and GPUs, you need to continue using nvidia-docker2, not just nvidia-container-toolkit.
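
On Debian/Ubuntu that typically means installing the nvidia-docker2 package (assuming the nvidia-docker apt repository is already configured on the node):

$ sudo apt-get update && sudo apt-get install -y nvidia-docker2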

That said, nvidia-docker2 was recently rearchitected: it is now just a thin wrapper around nvidia-container-toolkit that provides the nvidia-container-runtime you need to set in /etc/docker/daemon.json.

All this wrapper does is take the runc spec as input, inject the nvidia-container-runtime-hook as a prestart hook, and call out to the native runc, passing it the modified runc spec with that hook set. Moreover, the nvidia-container-runtime-hook is now just a symlink to nvidia-container-toolkit.
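
Schematically (an illustrative sketch of the OCI spec fragment, not the exact output the runtime generates), the injected prestart hook looks roughly like this:

{
    "hooks": {
        "prestart": [
            {
                "path": "/usr/bin/nvidia-container-runtime-hook",
                "args": ["nvidia-container-runtime-hook", "prestart"]
            }
        ]
    }
}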

So you are running on exactly the same stack whether you install nvidia-docker2 or nvidia-container-toolkit; the only difference is that nvidia-docker2 also installs a thin runtime for you that you can set in /etc/docker/daemon.json.
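
You can see the symlink mentioned above directly (output abbreviated):

$ ls -l /usr/bin/nvidia-container-runtime-hook
lrwxrwxrwx ... /usr/bin/nvidia-container-runtime-hook -> /usr/bin/nvidia-container-toolkit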

@bryanlarsen (Author)

Thanks! It appears to be working fine. You're welcome to close the issue, or leave it open as a documentation issue so that your excellent explanation is more visible to others.
