This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Use nvidia-smi in Dockerfile #225

Closed
Josca opened this issue Oct 20, 2016 · 8 comments

Comments

Josca commented Oct 20, 2016

Hello,

I would like to call nvidia-smi in a Dockerfile, but the docker build fails. My Dockerfile:
FROM nvidia/cuda:7.5-cudnn5-devel
RUN nvidia-smi
CMD /bin/bash

I am using the build command nvidia-docker build -t gpu ., but an error message is displayed:
/bin/sh: 1: nvidia-smi: not found

When I build another Docker image based on nvidia/cuda:7.5-cudnn5-devel and run a container from that image, the nvidia-smi command works. It seems the NVIDIA GPU and its libraries are not available during the Docker image build.
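
For example, a command along these lines works fine for me:

nvidia-docker run --rm nvidia/cuda:7.5-cudnn5-devel nvidia-smi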

Could you help me?


flx42 commented Oct 20, 2016

It seems the NVIDIA GPU and its libraries are not available during the Docker image build.

This is correct: the driver files (libraries and binaries) are mounted from the host (using a Docker volume) when the container is started.
When doing a docker build, there is a limited set of options for the build environment: you can't import devices, and you can't change the network settings.
Note that nvidia-docker is a passthrough to docker for docker build; this is documented here.

But this shouldn't be an issue: you don't actually need a GPU in your system in order to compile CUDA code. You can install the nvcc toolchain on any machine and compile your code; then, at deployment time, you need a machine with a GPU and you run the container with nvidia-docker.
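
A rough sketch of that workflow (kernel.cu and the cuda-app tag here are just placeholders):

# Compiling only needs the CUDA toolkit from the devel image, not a GPU or driver
FROM nvidia/cuda:7.5-cudnn5-devel
COPY kernel.cu /src/kernel.cu
RUN nvcc -o /usr/local/bin/app /src/kernel.cu
# The GPU and driver files are only needed when the container is actually run
CMD ["/usr/local/bin/app"]

docker build -t cuda-app .        # no GPU needed on the build machine
nvidia-docker run --rm cuda-app   # GPU and driver are mounted in at run time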


flx42 commented Oct 31, 2016

@Josca: does that answer your question? Can we close this issue?

@tiagoshibata

Hi,

I've had problems with missing libraries/directories during a build and arrived at this issue. My exact issue is that I ran ldconfig during a build and NVIDIA's libraries didn't make it into the loader cache, because their directories aren't mounted at build time. Steps to reproduce:

Minimal Dockerfile:

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
RUN ldconfig -v | grep nvidia || true

Output of nvidia-docker build .:

Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
 ---> 0e44f0afa846
Step 2 : RUN ldconfig -v | grep nvidia || true
 ---> Running in 2548bd9799b6
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring

 ---> 4e3b5f01e1cb
Removing intermediate container 2548bd9799b6
Successfully built 4e3b5f01e1cb

Libraries are found when ldconfig -v | grep nvidia is executed in a running container:

root@c6a487836b23:/# ldconfig -v | grep nvidia
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/usr/local/nvidia/lib:
        libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
        libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
        libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
        libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
        libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
        libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
        libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
        libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
        libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
        libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
        libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
        libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
        libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
        libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
/usr/local/nvidia/lib64:
        libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
        libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
        libnvidia-compiler.so.375.10 -> libnvidia-compiler.so.375.10
        libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
        libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
        libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
        libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
        libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
        libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
        libnvidia-opencl.so.1 -> libnvidia-opencl.so.375.10
        libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
        libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
        libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
        libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring

        libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
        libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
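
A possible workaround (just a sketch, untested on my side) is to defer the ldconfig to container start, once the NVIDIA directories are mounted, e.g. with a small entrypoint wrapper:

#!/bin/sh
# entrypoint.sh: refresh the loader cache now that /usr/local/nvidia/* is mounted,
# then hand off to the requested command
ldconfig
exec "$@"

and in the Dockerfile:

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["/bin/bash"]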

Furthermore, there are a few minor annoyances, such as an "Automatic GPU detection failed. Building for all known architectures." message when using CMake with CUDA.
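
A partial workaround for that one is presumably to pass the target architectures explicitly, so the CMake scripts don't try to probe a GPU at configure time; the exact variable depends on the project's CMake setup, but with plain FindCUDA it would look something like:

cmake -DCUDA_NVCC_FLAGS="-gencode arch=compute_52,code=sm_52" ..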

@flx42, is there a technical limitation that prevents the NVIDIA and CUDA related directories from being mounted at build time? IMHO it would be better to have them if possible.


flx42 commented Nov 8, 2016

Yes, there are some minor annoyances, but it's better this way. You want the build to be reproducible, and you don't need a GPU or driver files to compile code. You're not supposed to execute computations or tests in a docker build, whether on the GPU or the CPU.
And yes, there are technical limitations: you can't mount volumes or devices at build time.

@tiagoshibata

OK, thanks for the fast response.


lsb commented Nov 8, 2016

We at Graphistry have found that we need various library paths in the environment when running GPU code (specifically, running via OpenCL) in nvidia-docker.

We've been learning Graphistry and Docker as we go, so we're always open to suggestions, but what we've done is add the library paths to our environment, to compensate for the shell not having them by default.

Our stock GPU container, https://hub.docker.com/r/graphistry/gpu-base/, is built with an environment that gets amended via https://github.com/graphistry/infrastructure/blob/master/container-images/gpu-base/Dockerfile#L10 (this is how I'd do it now, as I've since learned), and that's how we find things work in production for us. Suggestions welcome :)
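
Concretely, the amendment is along these lines (a sketch assuming the standard nvidia-docker mount locations under /usr/local/nvidia):

# Make the driver binaries and libraries visible to shells and to the loader
ENV PATH /usr/local/nvidia/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}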


flx42 commented Nov 9, 2016

@lsb I don't think I follow; this thread is about the driver libraries at build time, not the CUDA binaries/libraries. The CUDA toolkit is always present at build time, and you don't need a GPU or the NVIDIA driver to compile/build.

By the way, we already set those variables in the environment/ld.cache:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/devel/Dockerfile#L27
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L29-L31
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L36


lsb commented Nov 10, 2016

Ah, thank you
