Understanding the relationship between nvidia-docker-toolkit and nvidia-docker-runtime #1035

brannondorsey · 2019-08-02T17:23:26Z

Congrats on the new release of nvidia-container-runtime and deprecation of nvidia-docker2! We write software which allows users to run jobs on their own GPU hardware via NVIDIA Docker, and are trying to understand exactly what this change means for us.

Currently, we instruct users to download and install NVIDIA Docker (nvidia-docker2), which in the past made the nvidia runtime available to Docker. We would then use this runtime when starting containers on their machines. With the new update, however, the nvidia runtime is not registered with the user's Docker engine, so we cannot rely on it. It would be trivial for us to instruct users to download nvidia-docker-runtime instead and register the runtime manually, but I'm looking to gain insight into the best solution moving forward.

Is using the custom nvidia runtime still recommended, or is it preferred to use nvidia-docker-toolkit instead, using the native runc runtime and the new --gpus flag or HostConfig.DeviceRequests when creating containers via POST /containers/create in the API (CHANGELOG notes here)?

I realize this question is similar to one asked previously (#815); however, it seems the details have changed a bit with the recent release. In general, we are looking to modify our software to find the best solution given these requirements:

Easy for novice users to install and get running, ideally an apt-get install from a public repository, or simpler
Forward-thinking and supported for a long time
Allows us to define which GPU devices to make available inside the containers
Has the possibility of being available on Windows or macOS platforms now or in the future

Is nvidia-docker-toolkit (not nvidia-docker2, nvidia-docker-runtime, or some other solution) the preferred solution to accessing NVIDIA GPU hardware inside docker containers given these criteria?

guptaNswati · 2019-08-02T20:32:05Z

Sorry for the confusion. With the deprecation of nvidia-docker2, we have renamed our packages, and I understand it might be a bit confusing.

First, the renamed package nvidia-container-toolkit contains our library (libnvidia-container and the CLI) plus our pre-start hook. There is no longer an NVIDIA-modified runc to call the pre-start hook, because Docker now calls our pre-start hook directly. For backward compatibility and to support existing users, we have also released nvidia-docker2 packages, so that both the --gpus and --runtime options are supported across different Docker versions.

Docker versions
docker (>= 19.03) -> automatically calls into nvidia-container-runtime-hook.

There is no need to register the runtime; that is what the --gpus option does, plus more. So if users have Docker 19.03, they just need to run sudo apt-get install -y nvidia-container-toolkit. We handle everything from there.

Old Docker versions, docker (< 19.03):
-> either install the nvidia-docker2 packages (no need to register the runtime)
-> or install nvidia-container-runtime (you would need to register the runtime)

Check: https://github.com/NVIDIA/nvidia-docker/tree/master#upgrading-with-nvidia-docker2-deprecated
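
For the nvidia-container-runtime route on older Docker versions, registering the runtime typically means adding an "nvidia" entry under "runtimes" in the Docker daemon configuration. The following is a minimal Python sketch of that step, not an official procedure. It assumes the configuration file is /etc/docker/daemon.json and that the runtime binary is installed at /usr/bin/nvidia-container-runtime; both paths are assumptions about a typical install, and the helper name register_nvidia_runtime is hypothetical.

```python
import json

# Assumed install location of the runtime binary; adjust for your system.
NVIDIA_RUNTIME_PATH = "/usr/bin/nvidia-container-runtime"


def register_nvidia_runtime(daemon_config: dict) -> dict:
    """Return a copy of a daemon.json mapping with an "nvidia" runtime added.

    Existing settings and any other registered runtimes are preserved.
    """
    updated = dict(daemon_config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes["nvidia"] = {"path": NVIDIA_RUNTIME_PATH, "runtimeArgs": []}
    updated["runtimes"] = runtimes
    return updated


if __name__ == "__main__":
    # Example input standing in for the parsed contents of daemon.json.
    existing = {"log-driver": "json-file"}
    print(json.dumps(register_nvidia_runtime(existing), indent=4))
```

The resulting mapping would then be written back to the daemon configuration file, and the Docker daemon restarted for the change to take effect.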

The recommended solution is to update to docker 19.03 and install nvidia-container-toolkit.

Hope this clears up your confusion. Closing it for now.

RenaudWasTaken · 2019-08-02T21:54:12Z

Hey that's a cool integration of nvidia-docker! Thanks for sharing that with us!

brannondorsey · 2019-08-02T22:29:19Z

Hey there, thanks for the quick responses and kind words 😸

This is very helpful for us to understand how to move forward, so I appreciate you taking the time to provide guidance @guptaNswati. We create containers programmatically via the Docker engine API, and I assume that the most recent version of the API provides support for doing this with GPU devices as well. Is there any magic to be aware of with the --gpus option that can't be replicated by a POST /containers/create? It seems like HostConfig.DeviceRequests would be the way to go with the new approach, rather than specifying an nvidia runtime on creation. Are there any examples in the docs of the parameter values such a request should use to share the GPU with the container?

In the past, we also detected the presence of nvidia-docker2 by checking if nvidia was an available runtime via a GET /info call (equivalent to docker info). With the new approach, what's the best way to determine if the user has nvidia-container-toolkit installed and correctly configured?
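
One possible way to approach the detection question, sketched in Python under stated assumptions rather than as a confirmed method: combine the ServerVersion and Runtimes fields of a GET /info response with a check for the nvidia-container-runtime-hook binary mentioned earlier in this thread. The function names are hypothetical, and the PATH check is only meaningful when the code runs on the same host as the Docker daemon.

```python
import shutil

# Hook binary named in this thread; assumed to be discoverable on PATH.
HOOK_BINARY = "nvidia-container-runtime-hook"


def parse_major_minor(server_version: str) -> tuple:
    """Extract (major, minor) from strings such as "19.03.1" or "19.03.0-ce"."""
    try:
        major, minor = server_version.split(".")[:2]
        return int(major), int(minor)
    except ValueError:
        # Unparseable version (for example a dev build): treat as unknown/old.
        return (0, 0)


def hook_installed() -> bool:
    """Check whether the hook binary is on PATH of the local machine."""
    return shutil.which(HOOK_BINARY) is not None


def gpu_strategy(info: dict, hook_on_path: bool) -> str:
    """Pick a GPU strategy from a parsed GET /info response.

    Returns "device-requests", "nvidia-runtime", or "none".
    """
    if parse_major_minor(info.get("ServerVersion", "0.0")) >= (19, 3) and hook_on_path:
        return "device-requests"
    if "nvidia" in (info.get("Runtimes") or {}):
        return "nvidia-runtime"
    return "none"


if __name__ == "__main__":
    # Example /info fragment; a real one would come from the Docker engine API.
    sample_info = {"ServerVersion": "19.03.1", "Runtimes": {"runc": {"path": "runc"}}}
    print(gpu_strategy(sample_info, hook_installed()))
```

Falling back to the registered nvidia runtime keeps existing nvidia-docker2 installs working while newer installs use device requests.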

guptaNswati · 2019-08-05T16:57:40Z

Hey, we don't have any documentation on how to use the Docker API directly for GPU support, but take a look at moby/moby@8f936ae. For specifying devices and capabilities, you can pass a list of comma-separated values.
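
As a rough illustration of what a GPU-enabled POST /containers/create body could look like, here is a Python sketch that only builds the JSON payload and sends nothing. The field names (Driver, Count, DeviceIDs, Capabilities) reflect my understanding of HostConfig.DeviceRequests in the Docker Engine API and should be verified against the API reference for your Docker version. The helper names and the image name are hypothetical placeholders.

```python
import json
from typing import List, Optional


def gpu_device_request(device_ids: Optional[List[str]] = None) -> dict:
    """Build one HostConfig.DeviceRequests entry.

    With no device_ids, request all GPUs (Count = -1); otherwise request
    exactly the listed devices, given as indices or UUIDs.
    """
    request = {
        "Driver": "nvidia",
        "Capabilities": [["gpu"]],
    }
    if device_ids:
        request["DeviceIDs"] = list(device_ids)
    else:
        request["Count"] = -1
    return request


def create_container_body(image: str, command: List[str],
                          device_ids: Optional[List[str]] = None) -> dict:
    """Build a request body for POST /containers/create with GPU access."""
    return {
        "Image": image,
        "Cmd": command,
        "HostConfig": {"DeviceRequests": [gpu_device_request(device_ids)]},
    }


if __name__ == "__main__":
    # "example/cuda-image" is a placeholder image name, not a real reference.
    body = create_container_body("example/cuda-image", ["nvidia-smi"], ["0", "1"])
    print(json.dumps(body, indent=2))
```

Passing no device list is intended to mirror --gpus all, while an explicit list restricts the container to the named devices.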

impala454 · 2020-09-28T14:05:08Z

The installation documentation still refers to the deprecated package. It would be nice for that to be updated to reflect the information in this issue. I was confused until I arrived here via Google.

klueska · 2020-10-05T09:18:44Z

@impala454 Unfortunately nvidia-docker2 is not actually deprecated for all use-cases. Please see the following for a bit more clarification:

#1268 (comment)

There are plans to integrate this info into the official documentation, but it hasn't happened yet unfortunately.
/cc @dualvtable

guptaNswati closed this as completed Aug 2, 2019

brannondorsey mentioned this issue Aug 2, 2019

Instruct users to download nvidia-docker2 for now runwayml/learn#46

Merged

wideblue mentioned this issue Aug 6, 2019

Unknown runtime specified nvidia aws-deepracer-community/deepracer-core#45

Closed

ggregoire mentioned this issue Dec 4, 2019

Support for NVIDIA GPUs under Docker Compose docker/compose#6691

Closed

dmandalidis mentioned this issue Dec 15, 2019

Add support for --gpus new in Docker 19.0.3 dmandalidis/docker-client#75

Closed
