This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Understanding the relationship between nvidia-docker-toolkit and nvidia-docker-runtime #1035

Closed
brannondorsey opened this issue Aug 2, 2019 · 6 comments


@brannondorsey

Congrats on the new release of nvidia-container-runtime and deprecation of nvidia-docker2! We write software which allows users to run jobs on their own GPU hardware via NVIDIA Docker, and are trying to understand exactly what this change means for us.

Currently, we instruct users to download and install NVIDIA Docker (nvidia-docker2), which, in the past, made the nvidia runtime available to Docker. We would then use this runtime when starting containers on their machines. With the new update, however, the nvidia runtime is not registered with the user's Docker engine, so we cannot rely on it. It would be trivial for us to instead instruct users to download nvidia-container-runtime and register the runtime manually, but I'm looking to gain insight into the best solution moving forward.

Is using the custom nvidia runtime still recommended, or is it preferable to install nvidia-container-toolkit instead and use the native runc runtime with the new --gpus flag (or HostConfig.DeviceRequests when creating containers via POST /containers/create in the Engine API; see the CHANGELOG notes)?

I realize this question is similar to one asked previously (#815); however, it seems the details have changed a bit with the recent release. In general, we are looking to modify our software to adopt the best solution given these requirements:

  • Easy for novice users to install and get running: ideally an apt-get install from a public repository, or simpler.
  • Forward-looking and likely to be supported for a long time
  • Lets us define which GPU devices are made available inside the containers
  • Has the potential to be available on Windows or macOS, now or in the future

Given these criteria, is nvidia-container-toolkit (rather than nvidia-docker2, nvidia-container-runtime, or some other solution) the preferred way to access NVIDIA GPU hardware inside Docker containers?

@guptaNswati
Contributor

guptaNswati commented Aug 2, 2019

Sorry for the confusion. With the deprecation of nvidia-docker2, we have renamed our packages, and I understand that might be a bit confusing.

First, the renamed nvidia-container-toolkit package contains our library (libnvidia-container and its CLI) plus our pre-start hook. There is no longer an NVIDIA-modified runc to call the pre-start hook, because Docker now calls our pre-start hook directly. For backward compatibility and to support existing users, we have also released nvidia-docker2 packages, so both the --gpus and --runtime options are supported across different Docker versions.

Docker versions
docker (>= 19.03) -> automatically calls into nvidia-container-runtime-hook.

There is no need to register the runtime; the --gpus option does that and more. So if users have Docker 19.03, they just need to run sudo apt-get install -y nvidia-container-toolkit. We handle everything from there.

Older Docker versions (< 19.03):
-> either install the nvidia-docker2 packages (no need to register the runtime)
-> or install nvidia-container-runtime (you need to register the runtime yourself)
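
For the second option, "registering the runtime" means adding an entry to Docker's `runtimes` setting, which maps a runtime name to a `path` (plus optional `runtimeArgs`), usually in `/etc/docker/daemon.json`. A minimal sketch that merges such an entry into an already-parsed config; `register_nvidia_runtime` is a hypothetical helper, and the binary path `/usr/bin/nvidia-container-runtime` is an assumption to confirm on the target host.

```python
import copy
import json
from typing import Any, Dict

# Assumed install location of the runtime binary; confirm with
# `which nvidia-container-runtime` on the target host.
NVIDIA_RUNTIME: Dict[str, Any] = {
    "path": "/usr/bin/nvidia-container-runtime",
    "runtimeArgs": [],
}


def register_nvidia_runtime(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a daemon.json config with an "nvidia" runtime added.

    Other settings and any other registered runtimes are left untouched,
    and the input dict is not modified.
    """
    updated = copy.deepcopy(config)
    updated.setdefault("runtimes", {})["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    return updated


if __name__ == "__main__":
    current = {"log-driver": "json-file"}  # e.g. the result of json.load() on daemon.json
    print(json.dumps(register_nvidia_runtime(current), indent=4))
```

The result would then be written back with `json.dump`, and the Docker daemon restarted (for example `sudo systemctl restart docker`) before containers can select it with `--runtime=nvidia`.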

Check: https://github.com/NVIDIA/nvidia-docker/tree/master#upgrading-with-nvidia-docker2-deprecated

The recommended solution is to update to docker 19.03 and install nvidia-container-toolkit.
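
A client that must serve both kinds of hosts could pick its `docker run` flags from the daemon version (for example `ServerVersion` from `GET /info`, or `docker version --format '{{.Server.Version}}'`). A minimal sketch under two assumptions: the version string starts with `MAJOR.MINOR`, and on pre-19.03 hosts an `nvidia` runtime has already been registered (by nvidia-docker2 or manually). `gpu_run_flags` is a hypothetical helper, and `NVIDIA_VISIBLE_DEVICES=all` is used here to expose every GPU through the legacy runtime.

```python
import re
from typing import List, Tuple


def parse_docker_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like "19.03.1" or "18.09.7-ce"."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Docker version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def gpu_run_flags(server_version: str) -> List[str]:
    """Return `docker run` flags that expose all GPUs on a daemon of this version."""
    if parse_docker_version(server_version) >= (19, 3):
        # Docker >= 19.03: native --gpus support, backed by nvidia-container-toolkit.
        return ["--gpus", "all"]
    # Docker < 19.03: assumes the "nvidia" runtime is registered with the daemon.
    return ["--runtime=nvidia", "-e", "NVIDIA_VISIBLE_DEVICES=all"]


if __name__ == "__main__":
    for v in ("18.09.7", "19.03.1", "20.10.21"):
        print(v, gpu_run_flags(v))
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as `"9.0" > "19.03"`.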

Hope this clears up the confusion. Closing this for now.

@RenaudWasTaken
Contributor

Hey that's a cool integration of nvidia-docker! Thanks for sharing that with us!

@brannondorsey
Author

brannondorsey commented Aug 2, 2019

Hey there, thanks for the quick responses and kind words 😸

This is very helpful for understanding how to move forward, so I appreciate you taking the time to provide guidance, @guptaNswati. We create containers programmatically via the Docker Engine API, and I assume the most recent API version supports doing this with GPU devices as well. Is there any magic in the --gpus option that can't be replicated with POST /containers/create? It seems like HostConfig.DeviceRequests would be the way to go with the new approach, rather than specifying the nvidia runtime at creation. Are there any examples in the docs of the values such a request's parameters should take to share the GPU with the container?

In the past, we also detected the presence of nvidia-docker2 by checking if nvidia was an available runtime via a GET /info call (equivalent to docker info). With the new approach, what's the best way to determine if the user has nvidia-container-toolkit installed and correctly configured?
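
The legacy check described here can be reproduced against the Engine API directly. A minimal sketch using only the Python standard library, assuming the daemon listens on the default Unix socket `/var/run/docker.sock` and the caller has permission to use it; `UnixHTTPConnection`, `fetch_info`, and `has_nvidia_runtime` are hypothetical helper names. `GET /info` returns a `Runtimes` object keyed by runtime name, so the check itself is a key lookup.

```python
import http.client
import json
import socket
from typing import Any, Dict


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to the Docker daemon over its Unix socket."""

    def __init__(self, socket_path: str = "/var/run/docker.sock") -> None:
        super().__init__("localhost")  # Host header only; no TCP is used
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_info(socket_path: str = "/var/run/docker.sock") -> Dict[str, Any]:
    """GET /info (the API behind `docker info`) and decode the JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/info")
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"GET /info failed with HTTP {response.status}")
        return json.loads(response.read())
    finally:
        conn.close()


def has_nvidia_runtime(info: Dict[str, Any]) -> bool:
    """True if the daemon reports a runtime named "nvidia".

    This is only the legacy nvidia-docker2 / nvidia-container-runtime signal;
    it says nothing about whether the --gpus path is set up.
    """
    runtimes = info.get("Runtimes") or {}
    return "nvidia" in runtimes
```

Usage would be `has_nvidia_runtime(fetch_info())`. As far as we can tell, `/info` has no comparable entry for nvidia-container-toolkit, so on Docker 19.03+ the most direct check is probably to create and start a short-lived container with a GPU device request and see whether the daemon accepts it.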

@guptaNswati
Contributor

Hey, we don't have any documentation on using the Docker API directly for GPU support, but take a look at moby/moby@8f936ae. For specifying devices and capabilities, you can pass a list of comma-separated values.
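
Based on the DeviceRequest type introduced in that commit (fields `Driver`, `Count`, `DeviceIDs`, `Capabilities`, `Options`), a `POST /containers/create` body roughly equivalent to `docker run --gpus all` or `--gpus '"device=0,1"'` might look like the sketch below. It is not official documentation: the values (`Count: -1` for all GPUs, `Capabilities: [["gpu"]]`) mirror what we understand the Docker CLI to send, `gpu_device_request` and `create_body` are hypothetical helpers, and `my-gpu-image:latest` is a placeholder image name.

```python
import json
from typing import Any, Dict, List, Optional


def gpu_device_request(
    device_ids: Optional[List[str]] = None,
    capabilities: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build one HostConfig.DeviceRequests entry.

    device_ids=None requests every GPU (like `--gpus all`); otherwise only the
    listed indices or UUIDs are requested (like `--gpus '"device=0,1"'`).
    """
    # Always require "gpu"; extra capabilities such as "compute" or "utility"
    # are ANDed with it inside the same inner list.
    caps = ["gpu"] + [c for c in (capabilities or []) if c != "gpu"]
    return {
        "Driver": "",
        # -1 means "all devices"; use 0 when explicit DeviceIDs are given,
        # since a count and a device list are not meant to be combined.
        "Count": -1 if device_ids is None else 0,
        "DeviceIDs": list(device_ids) if device_ids is not None else None,
        # An OR-list of AND-lists of capabilities.
        "Capabilities": [caps],
        "Options": {},
    }


def create_body(
    image: str, cmd: List[str], device_ids: Optional[List[str]] = None
) -> Dict[str, Any]:
    """JSON body for POST /containers/create, using the default runc runtime."""
    return {
        "Image": image,
        "Cmd": cmd,
        "HostConfig": {"DeviceRequests": [gpu_device_request(device_ids)]},
    }


if __name__ == "__main__":
    body = create_body("my-gpu-image:latest", ["nvidia-smi"], device_ids=["0", "1"])
    print(json.dumps(body, indent=2))
```

The body would be sent with `Content-Type: application/json`, followed by `POST /containers/{id}/start`. `DeviceRequests` first appeared in API version 1.40 (Docker 19.03), so older daemons will not honor it.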

@impala454

The installation documentation still refers to the deprecated package. It would be nice for it to be updated to reflect the information in this issue; I was confused until I arrived here via Google.

@klueska
Contributor

klueska commented Oct 5, 2020

@impala454 Unfortunately nvidia-docker2 is not actually deprecated for all use-cases. Please see the following for a bit more clarification:

#1268 (comment)

There are plans to integrate this info into the official documentation, but it hasn't happened yet unfortunately.
/cc @dualvtable

5 participants