This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Only works once #34

Closed
andyneff opened this issue Jan 13, 2016 · 11 comments

@andyneff

I'm not very familiar with Docker volumes, but the driver volume appears to be good for only ONE use.

  1. sudo ./nvidia-docker volume setup

    nvidia_driver_352.55
    
  2. docker volume ls

    DRIVER              VOLUME NAME
    local               nvidia_driver_352.55
    
  3. ./nvidia-docker run --rm nvidia/cuda nvidia-smi

    +------------------------------------------------------+                       
    | NVIDIA-SMI 352.55     Driver Version: 352.55         |                       
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 680     Off  | 0000:01:00.0     N/A |                  N/A |
    | 34%   54C    P8    N/A /  N/A |    653MiB /  4093MiB |     N/A      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 580     Off  | 0000:02:00.0     N/A |                  N/A |
    | 46%   54C   P12    N/A /  N/A |      7MiB /  3071MiB |     N/A      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla K20c          Off  | 0000:03:00.0     Off |                  Off |
    | 37%   49C    P0    48W / 225W |     96MiB /  5119MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |    0                  Not Supported                                         |
    |    1                  Not Supported                                         |
    +-----------------------------------------------------------------------------+
    
  4. ./nvidia-docker run --rm nvidia/cuda nvidia-smi

    Error response from daemon: Error looking up volume plugin nvidia-docker: Plugin not found
    
  5. docker volume ls

    DRIVER              VOLUME NAME
    

Tested on Ubuntu 14.04 running Docker 1.9.1 and CentOS 7 running Docker 1.9.0.

It just seems like if sudo nvidia-docker volume setup is in the "Initial setup" section, then it shouldn't need to be run every time I create a new container. Or am I missing something?

@3XX0
Member

3XX0 commented Jan 14, 2016

Yes, this is one of our documented limitations.
This is due to --rm removing the volumes attached to a container (equivalent to docker rm -v).
Here is the corresponding Docker issue: moby/moby#17907

A workaround would be to change volume setup to create a data container referencing the volume.
I'm not thrilled by this solution though...

@andyneff
Author

I'm actually a little bit of a fan of the data container idea.

  1. I'm not 100% sure how the volumes work: are they actually mounts of /var/lib/docker/volumes/foo/_data, or is there more mount magic involved? But if you are doing hard links in there, that suggests to me it's more of a normal directory. At any rate, I remember hearing that, from a security standpoint, it's better to rely on a data container than to mount your host devices directly. Some people may care about that.
  2. If the driver files are copied into a data container, that should alleviate this concern.
  3. You no longer need root to set it up, you just need docker group permissions.

I was already playing with this idea when you mentioned it, and it seems to work well for me :)

I used a Makefile with

install:
        docker build -t nvidia_driver -f Dockerfile_nvidia_driver .
        if docker inspect nvidia_driver_${NVIDIA_VERSION} > /dev/null 2>&1; then \
          docker rm nvidia_driver_${NVIDIA_VERSION}; \
        fi
        docker run -v /usr/bin:/hostbin:ro -v /usr/lib64/nvidia:/hostlib64 --name nvidia_driver_${NVIDIA_VERSION} nvidia_driver

run:
        docker run -it --rm \
                   --volumes-from nvidia_driver_${NVIDIA_VERSION}:ro \
                   $$(ls /dev/nvidia* | sed 's|^|--device |') \
                   cuda_example

And a Dockerfile_nvidia_driver of

FROM centos:7

VOLUME /usr/local/nvidia

CMD mkdir -p /usr/local/nvidia/bin && \
    cp -a /hostbin/nvidia* /usr/local/nvidia/bin/ && \
    cp -ra /hostlib64 /usr/local/nvidia/lib64

Sorry it's a little messy, but it was just a quick PoC to prove to myself that it would work.
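The run target's $$(ls /dev/nvidia* | sed 's|^|--device |') trick can also be written as a small shell function that loops over a glob instead of parsing ls output. This is a sketch under assumptions: device_flags is a hypothetical helper, not part of nvidia-docker, and because its output is word-split it assumes device paths contain no spaces (true for /dev/nvidia*).

```shell
#!/bin/sh
# Sketch: emit one "--device <path>" pair per existing device node, the same
# arguments the Makefile builds with $$(ls /dev/nvidia* | sed 's|^|--device |').
# device_flags is a hypothetical helper name, not part of nvidia-docker.
device_flags() {
    for dev in "$@"; do
        # An unmatched glob stays literal (e.g. "/dev/nvidia*"); skip it.
        [ -e "$dev" ] || continue
        printf '%s %s\n' --device "$dev"
    done
}

# Usage (left unquoted on purpose so the shell splits it into arguments):
#   docker run -it --rm $(device_flags /dev/nvidia*) cuda_example
```

Skipping non-existent paths means a host with no NVIDIA device nodes gets no --device flags at all, instead of a bogus "--device /dev/nvidia*".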

@3XX0
Member

3XX0 commented Jan 14, 2016

Data containers and volumes are exactly the same thing under the hood. Using volumes directly makes more sense because that's where Docker is headed with persistent volumes and the new volume CLI. It also keeps things unified between the standalone version and the plugin version (i.e. nvidia-docker standalone uses a local driver).

Creating an image and a container for the sake of having a volume referenced is not ideal. Besides, you still have to make sure that the container is not deleted.

This really is a Docker issue and will be fixed upstream eventually. In the meantime, I suggest you run your container without --rm, or use nvidia-docker-plugin.
If you really want to lock the volume with a data container, it's just a matter of doing:

volume="$(sudo nvidia-docker volume setup)"
nvidia-docker create --name=LOCK -v $volume:/data:ro tianon/true
nvidia-docker run --rm nvidia/cuda nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi
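For Docker releases before 1.10, the "run without --rm" suggestion could be wrapped like this so cleanup still happens. A sketch, not part of nvidia-docker: run_keep_volumes and the DOCKER override are hypothetical names, and it assumes nvidia-docker passes wait, logs and rm straight through to docker. The key point is that a plain docker rm (without -v) removes the container but leaves the nvidia_driver_* volume in place.

```shell
#!/bin/sh
# Sketch: emulate `run --rm` on Docker < 1.10 without deleting the driver volume.
# Assumptions: nvidia-docker forwards wait/logs/rm to docker unchanged, and
# DOCKER is a hypothetical override (defaults to the nvidia-docker wrapper).
DOCKER="${DOCKER:-nvidia-docker}"

run_keep_volumes() {
    # No --rm: start detached so Docker leaves the volumes alone on exit.
    cid=$("$DOCKER" run -d "$@") || return 1
    # `docker wait` blocks until the container stops and prints its exit code.
    status=$("$DOCKER" wait "$cid")
    # Replay whatever the container printed.
    "$DOCKER" logs "$cid"
    # Plain `rm` (no -v) removes the container but keeps its volumes.
    "$DOCKER" rm "$cid" >/dev/null
    return "${status:-1}"
}

# Usage:
#   run_keep_volumes nvidia/cuda nvidia-smi
```

The function returns the container's own exit code, so it can be used in scripts much like a foreground docker run.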

Regarding copying vs. hard-linking, we chose hard links to keep the ecosystem as light as possible. Copying megabytes of driver files around just to launch a container is not an option.
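To see why hard links keep things light, here is a self-contained illustration with throwaway files (none of it is nvidia-docker code; the file names are made up): the link shares the original's inode, so no data is duplicated.

```shell
#!/bin/sh
# Illustration only: a hard link is a second name for the same inode, so
# exposing a large file this way costs no extra data blocks. The file names
# are made up; nothing here touches real driver files.
inode_of() { ls -i "$1" | awk '{print $1}'; }

demo_dir=$(mktemp -d)
# 4 MiB stand-in for a driver library.
dd if=/dev/zero of="$demo_dir/libfake.so" bs=1024 count=4096 2>/dev/null
# Hard link instead of cp: no data is duplicated.
ln "$demo_dir/libfake.so" "$demo_dir/libfake-link.so"

echo "original inode: $(inode_of "$demo_dir/libfake.so")"
echo "link inode:     $(inode_of "$demo_dir/libfake-link.so")"
# du counts the shared data once, so this reports about 4 MiB, not 8.
du -sk "$demo_dir"
```

One constraint worth keeping in mind: hard links cannot cross filesystem boundaries, so the link and its target must live on the same filesystem.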

@3XX0 3XX0 added the wontfix label Jan 18, 2016
@3XX0
Member

3XX0 commented Jan 18, 2016

Closing since it's an issue with upstream Docker.
I updated the documentation accordingly.

@3XX0 3XX0 closed this as completed Jan 18, 2016
@3XX0
Member

3XX0 commented Feb 5, 2016

Fixed in Docker 1.10; the documentation has been updated.

@orian

orian commented Apr 13, 2016

Just a heads-up: I ran the install instructions from the README and tried to test, and it failed with this error:

docker: Error response from daemon: create nvidia_driver_361.28: create nvidia_driver_361.28: Error looking up volume plugin nvidia-docker: plugin not found.

The solution was:

sudo ./nvidia-docker volume setup

Installed version: nvidia-docker_1.0.0.beta.3-1_amd64.deb

@3XX0
Member

3XX0 commented Apr 13, 2016

Are you running Ubuntu? If so, can you show me the output of:

cat /var/log/upstart/nvidia-docker.log

@guoquan

guoquan commented Apr 25, 2016

Hi @3XX0, I have a similar problem to @orian's.
I am running Ubuntu 14.04.
I installed following the instructions on the wiki:

# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-beta.3/nvidia-docker_1.0.0.beta.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker_1.0.0.beta.3-1_amd64.deb && rm /tmp/nvidia-docker*.deb

and when I test it,

# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi

it gives this error (which brought me to this issue):

Error response from daemon: Error looking up volume plugin nvidia-docker: Plugin Error: Plugin.Activate, 400 Bad Request: malformed Host header

My nvidia-docker.log looks like this:

$ sudo cat /var/log/upstart/nvidia-docker.log
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Loading NVIDIA management library
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Loading NVIDIA unified memory
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Discovering GPU devices
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Provisioning volumes at /var/lib/nvidia-docker/volumes
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Serving plugin API at /var/lib/nvidia-docker
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Serving remote API at localhost:3476

@3XX0
Member

3XX0 commented Apr 25, 2016

@guoquan see #83

@oneklc

oneklc commented Sep 7, 2016

I had this error after upgrading my NVIDIA driver to the latest version (I wanted to use CUDA 8):

nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_367.44: create nvidia_driver_367.44: Error looking up volume plugin nvidia-docker: plugin not found.
See 'docker run --help'.

Running on CentOS 7.
After a reboot, upgrading Docker and nvidia-docker-plugin, and another reboot, I realised that the plugin wasn't running. Starting it with

sudo systemctl start nvidia-docker

fixed my issues
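A guard along these lines could catch the "plugin not found" case before launching a container. It is a sketch that assumes a systemd host with the service named nvidia-docker (as on CentOS 7 here) and that it is run as root; ensure_nvidia_plugin and the SYSTEMCTL override are hypothetical names.

```shell
#!/bin/sh
# Sketch: start nvidia-docker-plugin if it is not running, before launching
# GPU containers. Assumes a systemd host with the service named nvidia-docker;
# run it as root. SYSTEMCTL is a hypothetical override for the systemctl binary.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

ensure_nvidia_plugin() {
    # is-active --quiet prints nothing and exits 0 only if the unit is active.
    if "$SYSTEMCTL" is-active --quiet nvidia-docker; then
        return 0
    fi
    echo "nvidia-docker service is not active; starting it" >&2
    "$SYSTEMCTL" start nvidia-docker
}

# Usage:
#   ensure_nvidia_plugin && nvidia-docker run --rm nvidia/cuda nvidia-smi
```

If the service is also missing after every reboot, sudo systemctl enable nvidia-docker should make it start at boot, assuming the packaged unit file defines an install target. On systemd hosts, journalctl -u nvidia-docker plays the role of the upstart log mentioned earlier in this thread.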

@abelatnvidia

Running on Amazon Linux (AWS AMI), nvidia-docker fails the first time it launches the nvidia/cuda:7.5-devel container:

nvidia-docker run --rm nvidia/cuda:7.5-devel nvidia-smi
Error response from daemon: create nvidia_driver_352.99: Post http://%2Frun%2Fdocker%2Fplugins%2Fnvidia-docker.sock/VolumeDriver.Create: http: ContentLength=44 with Body length 0.

nvidia-docker volume ls
DRIVER              VOLUME NAME
nvidia-docker       nvidia_driver_352.99

Then when I try to launch the container again it succeeds.

Currently using docker version 1.11.2, build b9f10c9/1.11.2

cat /tmp/nvidia-docker.log
nvidia-docker-plugin | 2016/11/09 23:25:35 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/11/09 23:25:36 Loading NVIDIA management library
nvidia-docker-plugin | 2016/11/09 23:25:36 Discovering GPU devices
nvidia-docker-plugin | 2016/11/09 23:25:40 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2016/11/09 23:25:40 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2016/11/09 23:25:40 Serving remote API at localhost:3476
nvidia-docker-plugin | 2016/11/09 23:32:06 Received activate request
nvidia-docker-plugin | 2016/11/09 23:32:06 Plugins activated [VolumeDriver]
nvidia-docker-plugin | 2016/11/09 23:32:07 Received create request for volume 'nvidia_driver_352.99'
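Until the underlying problem is fixed, a retry wrapper is one possible stopgap for the first-launch failure described above. This is a hedged sketch, not a fix: retry_once and its 1-second delay are arbitrary choices, and it assumes the second attempt behaves like the manual retry in this report.

```shell
#!/bin/sh
# Stopgap sketch, not a fix: rerun a command once if it fails, since the report
# above says the first launch fails and a second attempt succeeds.
# retry_once and the 1-second delay are arbitrary choices for illustration.
retry_once() {
    "$@" && return 0
    echo "attempt 1 failed; retrying once" >&2
    sleep 1
    "$@"
}

# Usage:
#   retry_once nvidia-docker run --rm nvidia/cuda:7.5-devel nvidia-smi
```

Note that it reruns on any failure, including a nonzero exit from the container itself, so it only suits commands that are safe to run twice.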
