This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Use nvidia-smi in Dockerfile #225

Closed
Josca opened this issue Oct 20, 2016 · 8 comments

Comments

Josca commented Oct 20, 2016

Hello,

I would like to call nvidia-smi in a Dockerfile, but the docker build fails. My Dockerfile:
FROM nvidia/cuda:7.5-cudnn5-devel
RUN nvidia-smi
CMD /bin/bash

I am using the build command nvidia-docker build -t gpu ., but an error message is displayed:
/bin/sh: 1: nvidia-smi: not found

When I build another Docker image based on nvidia/cuda:7.5-cudnn5-devel and run a container from that image, the nvidia-smi command works. It seems the NVIDIA GPU and its libraries are not available during the Docker image build.
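
For example, a command along these lines works fine for me:

nvidia-docker run --rm nvidia/cuda:7.5-cudnn5-devel nvidia-smi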

Could you help me?


flx42 commented Oct 20, 2016

It seems the NVIDIA GPU and its libraries are not available during the Docker image build.

This is correct: the driver files (libraries and binaries) are mounted from the host (using a Docker volume) when the container is started.
When doing a docker build, there is a limited set of options for the build environment: you can't import devices, and you can't change the network settings.
Note that nvidia-docker is a passthrough to docker for docker build; this is documented here.

But this shouldn't be an issue: you don't actually need a GPU in your system in order to compile CUDA code. You can install the nvcc toolchain on any machine and compile your code; then, at deployment time, you need a machine with a GPU and you run the container with nvidia-docker.
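
A rough sketch of that workflow (kernel.cu and the cuda-app tag here are just placeholders):

# Compiling only needs the CUDA toolkit from the devel image, not a GPU or driver
FROM nvidia/cuda:7.5-cudnn5-devel
COPY kernel.cu /src/kernel.cu
RUN nvcc -o /usr/local/bin/app /src/kernel.cu
# The GPU and driver files are only needed when the container is actually run
CMD ["/usr/local/bin/app"]

docker build -t cuda-app .        # no GPU needed on the build machine
nvidia-docker run --rm cuda-app   # GPU and driver are mounted in at run time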


flx42 commented Oct 31, 2016

@Josca: does that answer your question? Can we close this issue?

@tiagoshibata

Hi,

I've had problems with missing libraries/directories during a build and arrived at this issue. My exact issue is that I ran ldconfig during a build and NVIDIA's libraries didn't make it into the loader cache, because their directories aren't mounted at build time. Steps to reproduce:

Minimal Dockerfile:

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
RUN ldconfig -v | grep nvidia || true

Output of nvidia-docker build .:

Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
 ---> 0e44f0afa846
Step 2 : RUN ldconfig -v | grep nvidia || true
 ---> Running in 2548bd9799b6
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring

 ---> 4e3b5f01e1cb
Removing intermediate container 2548bd9799b6
Successfully built 4e3b5f01e1cb

Libraries are found when ldconfig -v | grep nvidia is executed in a running container:

root@c6a487836b23:/# ldconfig -v | grep nvidia
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/usr/local/nvidia/lib:
        libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
        libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
        libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
        libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
        libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
        libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
        libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
        libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
        libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
        libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
        libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
        libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
        libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
        libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
/usr/local/nvidia/lib64:
        libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
        libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
        libnvidia-compiler.so.375.10 -> libnvidia-compiler.so.375.10
        libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
        libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
        libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
        libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
        libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
        libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
        libnvidia-opencl.so.1 -> libnvidia-opencl.so.375.10
        libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
        libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
        libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
        libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring

        libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
        libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
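
A possible workaround (just a sketch, untested on my side) is to defer the ldconfig to container start, once the NVIDIA directories are mounted, e.g. with a small entrypoint wrapper:

#!/bin/sh
# entrypoint.sh: refresh the loader cache now that /usr/local/nvidia/* is mounted,
# then hand off to the requested command
ldconfig
exec "$@"

and in the Dockerfile:

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["/bin/bash"]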

Furthermore, there are a few minor annoyances, such as an "Automatic GPU detection failed. Building for all known architectures." message when using CMake with CUDA.
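
A partial workaround for that one is presumably to pass the target architectures explicitly, so the CMake scripts don't try to probe a GPU at configure time; the exact variable depends on the project's CMake setup, but with plain FindCUDA it would look something like:

cmake -DCUDA_NVCC_FLAGS="-gencode arch=compute_52,code=sm_52" ..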

@flx42, is there a technical limitation that prevents the NVIDIA and CUDA related directories from being mounted at build time? IMHO it would be better to have them if possible.


flx42 commented Nov 8, 2016

Yes, there are some minor annoyances, but it's better this way. You want the build to be reproducible, and you don't need a GPU or driver files to compile code. You're not supposed to execute computations or tests in a docker build, whether on the GPU or the CPU.
And yes, there are technical limitations: you can't mount volumes or devices at build time.

@tiagoshibata

OK, thanks for the fast response.


lsb commented Nov 8, 2016

We at Graphistry have found that we need various library paths in the environment when running GPU code (specifically, running via OpenCL) in nvidia-docker.

We've been learning Graphistry and Docker as we go, so we're always open to suggestions, but what we've done is add the library paths to our environment, to compensate for the shell not having them by default.

Our stock GPU container, https://hub.docker.com/r/graphistry/gpu-base/, is built with an environment that gets amended via https://github.com/graphistry/infrastructure/blob/master/container-images/gpu-base/Dockerfile#L10 (this is how I'd do it now, as I've since learned), and that's how we find things work in production for us. Suggestions welcome :)
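
Concretely, the amendment is along these lines (a sketch assuming the standard nvidia-docker mount locations under /usr/local/nvidia):

# Make the driver binaries and libraries visible to shells and to the loader
ENV PATH /usr/local/nvidia/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}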


flx42 commented Nov 9, 2016

@lsb I don't think I follow; this thread is about the driver libraries at build time, not the CUDA binaries/libraries. The CUDA toolkit is always present at build time, and you don't need a GPU or the NVIDIA driver to compile/build.

By the way, we already set those variables in the environment/ld.cache:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/devel/Dockerfile#L27
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L29-L31
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L36


lsb commented Nov 10, 2016

Ah, thank you
