docker-debian-cuda

docker-debian-cuda is a minimal Docker image built from Debian 9 (amd64) with CUDA Toolkit and cuDNN using only Debian packages.

Although the vendor-specific nvidia-docker tool can run CUDA inside Docker images, it achieves the same thing in a less transparent way and is incompatible with other Docker tools. Instead of using yet another wrapper command, we explicitly expose the GPU devices and inject the host's CUDA Driver library. The latest image starts from the official Debian image and follows the NVIDIA deb (network) installation steps for Ubuntu 17.04.

Open source project:

Available tags (based on Debian 9/stretch and NVIDIA deb (network) installation without CUDA Driver):

  • latest points to 9.1_7.0
  • 9.1_7.0, 9.1.85-1_7.0.5.15-1, 9.0_7.0, 9.0.176-1_7.0.5.15-1 [2018-02-15]: CUDA Toolkit (9.1.85-1/9.0.176-1) + cuDNN (7.0.5.15-1) (Dockerfile)

Available tags (based only on Debian 9/stretch packages, including for the NVIDIA CUDA Toolkit):

  • 8.0.44-4_6.0.21-1_375.82-1, 8.0_6.0, 8.0.44-4_7.0.4.31-1_375.82-1, 8.0_7.0 [2017-12-01]: CUDA Toolkit (8.0.44-4) + cuDNN (6.0.21-1/7.0.4.31-1) + CUDA Driver (375.82-1) (Dockerfile)
  • 8.0.44-3_5.1.10-1_375.66-1, 8.0_5.1 [2017-05-31]: CUDA Toolkit (8.0.44-3) + cuDNN (5.1.10-1) + CUDA Driver (375.66-1) (Dockerfile)
  • 8.0.44-3_5.1.10-1_375.39-1 [2017-03-27]: CUDA Toolkit (8.0.44-3) + cuDNN (5.1.10-1) + CUDA Driver (375.39-1)
  • 8.0.44-2_5.1.5-1_375.20-4 [2016-12-21]: CUDA Toolkit (8.0.44-2) + cuDNN (5.1.5-1) + CUDA Driver (375.20-4)
  • 7.5.18-4_5.1.3_361.45.18-2, 7.5_5.1 [2016-09-19]: CUDA Toolkit (7.5.18-4) + cuDNN (5.1.3) + CUDA Driver (361.45.18-2)
  • 7.5.18-2 [2016-07-20]: CUDA Toolkit (7.5.18-2) + cuDNN (4.0.7) + CUDA Driver (352.79-8)

Usage

Host system requirements (e.g. Debian 9 or a similar Ubuntu release; see the installation sketch after this list):

  • GPU card with CUDA Compute Capability 3.5 or higher
  • NVIDIA Kernel Driver (nvidia-kernel-dkms)
  • CUDA Driver library (libcuda1, same version as NVIDIA Kernel Driver)
  • optionally nvidia-smi, nvidia-opencl-icd
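
On Debian 9 these packages can be installed directly from the official repositories; a minimal sketch, assuming the contrib and non-free components are enabled in your APT sources and matching kernel headers are available for DKMS:

$ apt-get update
$ apt-get install linux-headers-amd64 nvidia-kernel-dkms libcuda1 nvidia-smi nvidia-opencl-icd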

To utilize your GPUs, this Docker image needs access to your /dev/nvidia* devices and to the matching version of the CUDA Driver library, for example:

$ docker run -it --rm $(ls /dev/nvidia* | xargs -I{} echo '--device={}') $(ls /usr/lib/x86_64-linux-gnu/{libcuda,libnvidia}* | xargs -I{} echo '-v {}:{}:ro') gw000/debian-cuda

The additional parameters in the above command explicitly expose your GPU devices and the CUDA Driver library from the host system into the container. The vendor-specific nvidia-docker tool achieves the same thing in a less transparent way and is incompatible with other Docker tools.
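
Because only standard docker run parameters are used, the same approach also works with docker-compose. A minimal sketch of a docker-compose.yml, assuming compose file format version 2, an illustrative service name, and listing only libcuda.so.1 (in practice every libcuda*/libnvidia* library from the host needs a matching read-only volume entry):

$ cat > docker-compose.yml << __EOF__
version: '2'
services:
  cuda:
    image: gw000/debian-cuda
    devices:
      - /dev/nvidia0
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools
    volumes:
      - /usr/lib/x86_64-linux-gnu/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1:ro
__EOF__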

Host system

List of devices that should be present on the host system:

$ ll /dev/nvidia*
crw-rw---- 1 root video 250,   0 Jul 13 15:56 /dev/nvidia-uvm
crw-rw---- 1 root video 250,   1 Jul 13 15:56 /dev/nvidia-uvm-tools
crw-rw---- 1 root video 195,   0 Jul 13 15:56 /dev/nvidia0
crw-rw---- 1 root video 195, 255 Jul 13 15:56 /dev/nvidiactl

If /dev/nvidia0 and /dev/nvidiactl are not present, ensure that the kernel module nvidia is loaded automatically and properly configured, and that there is a udev rule to create the devices:

$ echo 'nvidia' > /etc/modules-load.d/nvidia.conf
$ cat > /etc/udev/rules.d/70-nvidia.rules << __EOF__
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 0660 /dev/nvidia* && /bin/chgrp video /dev/nvidia*'"
__EOF__
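
To apply this configuration without rebooting, the module can be loaded and the udev rules re-triggered manually, for example (a reboot achieves the same):

$ modprobe nvidia
$ udevadm control --reload-rules
$ udevadm trigger
$ ls -l /dev/nvidia0 /dev/nvidiactl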

For OpenCL support, the devices /dev/nvidia-uvm and /dev/nvidia-uvm-tools are needed. Ensure that the kernel module nvidia-uvm is loaded automatically and add a custom udev rule to create the devices:

$ echo 'nvidia-uvm' > /etc/modules-load.d/nvidia-uvm.conf
$ cat > /etc/udev/rules.d/70-nvidia-uvm.rules << __EOF__
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0660 /dev/nvidia-uvm* && /bin/chgrp video /dev/nvidia-uvm*'"
__EOF__
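
Similarly, to create and verify the UVM devices immediately (the nvidia-modprobe call matches the udev rule above):

$ modprobe nvidia-uvm
$ nvidia-modprobe -c0 -u
$ ls -l /dev/nvidia-uvm*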

If you would like to monitor real-time temperatures on your host system, use something like:

$ watch -n 5 'nvidia-smi; echo; sensors; for hdd in /dev/sd?; do echo -n "$hdd  "; smartctl -A $hdd | grep Temperature_Celsius; done'

If your NVIDIA Kernel Driver and CUDA Driver versions differ, an error appears in the kernel messages (dmesg) or when running nvidia-smi inside the container (a quick version check is sketched after this list). Possible solutions:

  • upgrade your Nvidia kernel driver on the host directly from Debian 9 packages: nvidia-kernel-dkms, nvidia-alternative, libnvidia-ml1, nvidia-smi
  • upgrade your Nvidia kernel driver on the host by compiling it yourself
  • inject the correct version of CUDA Driver into the container as mentioned above (if it is installed on the host)
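
A quick way to compare the two versions on the host (the package name libcuda1 applies to Debian; it may differ elsewhere):

$ cat /proc/driver/nvidia/version     # version of the loaded NVIDIA Kernel Driver
$ dpkg -s libcuda1 | grep '^Version'  # version of the installed CUDA Driver library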

Decision against nvidia-docker

It is true that NVIDIA recommends using their nvidia-docker command, as part of their vendor lock-in strategy. In reality, nvidia-docker is nothing more than a fancy wrapper that runs the docker command with the additional parameters to mount the devices and host libraries into the container. Recent Docker releases introduced runtimes, and nvidia-container-runtime should make things work, but unfortunately it is not supported by docker-compose and other tools.

Pros for nvidia-docker tool:

  • shorter command (no need to remember those additional parameters)

Cons for nvidia-docker tool:

  • yet another tool that administrators need to learn (why should administrators have to learn anything beyond docker run?)
  • less transparent what is being executed (some believe that some "black magic" behind nvidia-docker handles two instances on the same GPU better, although it works exactly the same)
  • not possible to use with docker-compose and other tools for managing Docker containers
  • only NVIDIA GPUs are supported (what if someone wanted to use a GPU from another vendor, or an FPGA device?)
  • no support for OpenCL
  • vendor lock-in

Feedback

If you encounter any bugs or have feature requests, please file them in the issue tracker, or develop the feature yourself and submit a pull request on GitHub.

License

Copyright © 2016-2018 gw0 [http://gw.tnode.com/] <gw.2018@ena.one>

All code is licensed under the GNU Affero General Public License 3.0+ (AGPL-3.0+). Note that it is mandatory to make all modifications and complete source code publicly available to any user.