This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

dial unix /var/lib/nvidia-docker/nvidia-docker.sock: connect: no such file or directory. #105

Closed
bwkchee opened this issue Jun 6, 2016 · 18 comments

Comments

@bwkchee

bwkchee commented Jun 6, 2016

I'm currently running Ubuntu 14.04 with Docker version 1.11.2, build b9f10c9.
Docker is running fine; however, I can't install nvidia-docker via the Debian package:

sudo dpkg -i /tmp/nvidia-docker*.deb
Selecting previously unselected package nvidia-docker.
(Reading database ... 358563 files and directories currently installed.)
Preparing to unpack .../nvidia-docker_1.0.0.rc.2-1_amd64.deb ...
Unpacking nvidia-docker (1.0.0~rc.2-1) ...
Setting up nvidia-docker (1.0.0~rc.2-1) ...
Configuring user and permissions ...
start: Job failed to start
invoke-rc.d: initscript nvidia-docker, action "start" failed.
dpkg: error processing package nvidia-docker (--install):
subprocess installed post-installation script returned error exit status 1
Processing triggers for ureadahead (0.100.0-16) ...
Errors were encountered while processing:
nvidia-docker

The plugin looks like it's installed, and I can execute the nvidia-docker command. However, if I execute nvidia-docker run --rm nvidia/cuda nvidia-smi, the process hangs for a while and then fails with:
docker: Error response from daemon: create nvidia_driver_352.63: Post http://%2Fvar%2Flib%2Fnvidia-docker%2Fnvidia-docker.sock/VolumeDriver.Create: dial unix /var/lib/nvidia-docker/nvidia-docker.sock: connect: no such file or directory.

dmesg output:
[ 2044.961876] init: nvidia-docker main process (10280) terminated with status 1
[ 2044.961894] init: nvidia-docker main process ended, respawning
[ 2044.973468] init: nvidia-docker main process (10289) terminated with status 1
[ 2044.973479] init: nvidia-docker respawning too fast, stopped

/var/log/upstart/docker.log:
time="2016-06-06T09:12:27.675607031-04:00" level=error msg="Handler for GET /v1.23/volumes/nvidia_driver_352.63 returned error: get nvidia_driver_352.63: no such volume"
time="2016-06-06T09:12:27.737727854-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Get, retrying in 1s"
time="2016-06-06T09:12:28.738114122-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Get, retrying in 2s"
time="2016-06-06T09:12:30.738467301-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Get, retrying in 4s"
time="2016-06-06T09:12:34.738802352-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Get, retrying in 8s"
time="2016-06-06T09:12:42.739232247-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Create, retrying in 1s"
time="2016-06-06T09:12:43.739571764-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Create, retrying in 2s"
time="2016-06-06T09:12:45.739925707-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Create, retrying in 4s"
time="2016-06-06T09:12:49.740278324-04:00" level=warning msg="Unable to connect to plugin: /var/lib/nvidia-docker/nvidia-docker.sock:/VolumeDriver.Create, retrying in 8s"
time="2016-06-06T09:12:57.742093178-04:00" level=error msg="Handler for POST /v1.23/containers/create returned error: create nvidia_driver_352.63: Post http://%2Fvar%2Flib%2Fnvidia-docker%2Fnvidia-docker.sock/VolumeDriver.Create: dial unix /var/lib/nvidia-docker/nvidia-docker.sock: connect: no such file or directory"
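
The length of the hang can be read off the docker.log timestamps above: each plugin call (VolumeDriver.Get, then VolumeDriver.Create) is retried with delays of 1, 2, 4 and 8 seconds before Docker gives up. A minimal sketch of that arithmetic, assuming the delay simply doubles from 1s for four retries as the log shows (the function name backoff_total is made up for illustration):

```shell
# Sum the retry delays for one plugin call, assuming the delay starts at 1s
# and doubles on every retry, as in the log lines above.
backoff_total() {
  retries=$1
  delay=1
  total=0
  i=0
  while [ "$i" -lt "$retries" ]; do
    total=$((total + delay))
    delay=$((delay * 2))
    i=$((i + 1))
  done
  echo "$total"
}

per_call=$(backoff_total 4)   # 1 + 2 + 4 + 8
echo "one call: ${per_call}s, Get + Create: $((per_call * 2))s"
```

That is roughly the 30 seconds between the first warning (09:12:27) and the final error (09:12:57).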

@3XX0
Member

3XX0 commented Jun 6, 2016

What about /var/log/upstart/nvidia-docker.log?
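
A quick way to confirm that the plugin is not running, before digging through the logs, is to test for its socket. This is only a sketch: plugin_socket_status is a hypothetical helper name, and the default path is the one from the error message above.

```shell
# Hypothetical helper: report whether the nvidia-docker-plugin socket exists.
# $1 = socket path (defaults to the path from the error message).
plugin_socket_status() {
  sock=${1:-/var/lib/nvidia-docker/nvidia-docker.sock}
  if [ -S "$sock" ]; then
    echo "socket present: $sock"
  else
    echo "socket missing: $sock"
    return 1
  fi
}

# Example: show the end of the plugin log only when the socket is missing.
# plugin_socket_status || sudo tail -n 20 /var/log/upstart/nvidia-docker.log
```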

@3XX0
Member

3XX0 commented Jun 14, 2016

Any update? Can we close this?

@GBJim

GBJim commented Jun 15, 2016

Hi, I get the exact same error message when I run
nvidia-docker run --rm nvidia/cuda nvidia-smi
I'm on Ubuntu 14.04 with Docker version 1.11.1.
These are the last few lines of my /var/log/upstart/nvidia-docker.log:

/usr/bin/nvidia-docker-plugin | 2016/06/15 16:30:11 Successfully terminated
start-stop-daemon: user 'nvidia-docker' not found
start-stop-daemon: user 'nvidia-docker' not found
start-stop-daemon: user 'nvidia-docker' not found
start-stop-daemon: user 'nvidia-docker' not found
start-stop-daemon: user 'nvidia-docker' not found

@3XX0
Member

3XX0 commented Jun 15, 2016

@GBJim you might have upgraded from an old (buggy) package.
Create the user manually and it should work:

sudo useradd -r -M -d /var/lib/nvidia-docker -s /usr/sbin/nologin nvidia-docker
sudo restart nvidia-docker
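
To confirm this diagnosis before creating anything, you can check whether the account exists. A small sketch using id, which exits non-zero for an unknown user; user_exists is an illustrative name, not something shipped by the package:

```shell
# Succeeds (exit 0) if the named account exists, fails otherwise.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# Example: only create the account when it is really missing.
# user_exists nvidia-docker || \
#   sudo useradd -r -M -d /var/lib/nvidia-docker -s /usr/sbin/nologin nvidia-docker
```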

@GBJim

GBJim commented Jun 16, 2016

Hi @3XX0:
Before reading your last post, I upgraded my NVIDIA driver and Docker.
Driver: 352.93, Docker: 1.11.2
Now when I run
nvidia-docker run --rm nvidia/cuda nvidia-smi
I receive a different error:

docker: Error response from daemon: create nvidia_driver_352.93: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

My /var/log/upstart/nvidia-docker.log looks like this:

/usr/bin/nvidia-docker-plugin | 2016/06/16 10:35:00 Received create request for volume 'nvidia_driver_352.93'
/usr/bin/nvidia-docker-plugin | 2016/06/16 10:35:00 Error: mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/352.93: permission denied
/usr/bin/nvidia-docker-plugin | 2016/06/16 10:36:12 Received create request for volume 'nvidia_driver_352.93'
/usr/bin/nvidia-docker-plugin | 2016/06/16 10:36:12 Error: mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/352.93: permission denied
/usr/bin/nvidia-docker-plugin | 2016/06/16 10:39:38 Received create request for volume 'nvidia_driver_352.93'
/usr/bin/nvidia-docker-plugin | 2016/06/16 10:39:38 Error: mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/352.93: permission denied

@GBJim

GBJim commented Jun 16, 2016

@3XX0
I manually ran sudo mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/352.93
Now I can run nvidia-docker successfully :)

@GBJim

GBJim commented Jun 16, 2016

Now I've encountered a new problem:
#113

@3XX0
Member

3XX0 commented Jun 16, 2016

Your first setup was created with root privileges, while the package installs everything as non-root.
The best way forward is to reinstall everything properly:

sudo apt-get remove --purge nvidia-docker
sudo rm -rf /var/lib/nvidia-docker

wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.2/nvidia-docker_1.0.0.rc.2-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
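
Before purging, it can help to see which paths ended up owned by someone other than the service account. A hedged sketch using find's -user test; not_owned_by is a hypothetical helper, and it assumes the package's data lives under /var/lib/nvidia-docker and should belong to the nvidia-docker user, as the commands in this thread suggest:

```shell
# List everything under a directory that is NOT owned by the expected user.
# $1 = directory, $2 = expected owner (user name or numeric UID)
not_owned_by() {
  find "$1" ! -user "$2" -print
}

# Example (may need root to read the directory); anything printed here was
# probably created by hand as a different user:
# not_owned_by /var/lib/nvidia-docker nvidia-docker
```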

@GBJim

GBJim commented Jun 16, 2016

I tried to remove nvidia-docker as you said:
sudo apt-get remove --purge nvidia-docker
I get the following message and am unable to uninstall it:

Error response from daemon: Unable to remove volume, volume still in use: remove nvidia_driver_352.93: volume is in use - [18e96e4ab5f25044add3ae127e128e07cd78924505b26623adc28c0dcfe2b642, 81993490e8d874625bb749c196e3f75a26dcf1b62b3c4e3b45fd74d2451b4f73, f207f22788ac866cb7044b56d3ae5d33b3fde631faa385053105070083a92916, 34bfb612d82b2036d5f00e0facfeb048a4837556229b0c44b8b039ca9eb51524, 91d7ab223e882134718c4879986a107c9a4170b0f2f35f130d6a25a342f18715, d8b64a40a09c8e72a762a8980dcddf02c207ac904cdf3c61b2d63843a662742c, 08caa109c818602d321481c29d6fe9277b6e4c48ec466654a62f8528be947302, c0b8eaa6b9efcd08117f03a013b6a826ee59bc544e35fc06e5268eaaa8120b01, 61d75962d5d6b068bd16ca639ade144aa8139f66d9349f1e315c78defbca180b, 2794f34ba0e31c5b193a306181ee8831b7296cc1cef39a4f0c9e2a1132c8fe68, 3389a7da8dd028627f00b9f394a5c94dabd9c518bc48a97bbb82397ecba64b2b, aa934370bdae733a670af8384631832f605f896b43e1b7646753744ed9f5f56b, 622c8145fdb7273d71e19547113ee164a28261776cb4fe65d7e4445326009614]
dpkg: error processing package nvidia-docker (--purge):
subprocess installed pre-removal script returned error exit status 123
Errors were encountered while processing:
nvidia-docker

@3XX0
Member

3XX0 commented Jun 16, 2016

You have running or stopped containers that depend on nvidia-docker; you need to remove them first.
You can check their state with docker ps -a.
To remove them:

docker rm -f \
18e96e4ab5f25044add3ae127e128e07cd78924505b26623adc28c0dcfe2b642 \
81993490e8d874625bb749c196e3f75a26dcf1b62b3c4e3b45fd74d2451b4f73 \
f207f22788ac866cb7044b56d3ae5d33b3fde631faa385053105070083a92916 \
34bfb612d82b2036d5f00e0facfeb048a4837556229b0c44b8b039ca9eb51524 \
91d7ab223e882134718c4879986a107c9a4170b0f2f35f130d6a25a342f18715 \
d8b64a40a09c8e72a762a8980dcddf02c207ac904cdf3c61b2d63843a662742c \
08caa109c818602d321481c29d6fe9277b6e4c48ec466654a62f8528be947302 \
c0b8eaa6b9efcd08117f03a013b6a826ee59bc544e35fc06e5268eaaa8120b01 \
61d75962d5d6b068bd16ca639ade144aa8139f66d9349f1e315c78defbca180b \
2794f34ba0e31c5b193a306181ee8831b7296cc1cef39a4f0c9e2a1132c8fe68 \
3389a7da8dd028627f00b9f394a5c94dabd9c518bc48a97bbb82397ecba64b2b \
aa934370bdae733a670af8384631832f605f896b43e1b7646753744ed9f5f56b \
622c8145fdb7273d71e19547113ee164a28261776cb4fe65d7e4445326009614
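
Copying that many IDs by hand is error-prone. As a sketch, the 64-character hex IDs can be pulled out of the daemon's error message with grep; extract_container_ids and the file name purge-error.txt are made up for this example:

```shell
# Read text on stdin and print each 64-character lowercase hex ID once.
extract_container_ids() {
  grep -oE '[0-9a-f]{64}' | sort -u
}

# Example, assuming the error message was saved to purge-error.txt:
# extract_container_ids < purge-error.txt | xargs docker rm -f
```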

@GBJim

GBJim commented Jun 16, 2016

I've deleted all volumes in use.
But I still get this error when I try to purge it:

Removing nvidia-docker (1.0.0~rc.2-1) ...
Purging NVIDIA volumes ...
Error response from daemon: Error while removing volume nvidia_driver_352.39: remove nvidia_driver_352.39: VolumeDriver.Remove: internal error, check logs for details
Error response from daemon: Error while removing volume nvidia_driver_352.93: remove nvidia_driver_352.93: VolumeDriver.Remove: internal error, check logs for details
dpkg: error processing package nvidia-docker (--purge):
subprocess installed pre-removal script returned error exit status 123
Errors were encountered while processing:
nvidia-docker

My log says:

/usr/bin/nvidia-docker-plugin | 2016/06/16 13:39:31 Received remove request for volume 'nvidia_driver_352.93'
/usr/bin/nvidia-docker-plugin | 2016/06/16 13:39:31 Error: remove /var/lib/nvidia-docker/volumes/nvidia_driver/352.93: permission denied

@3XX0
Member

3XX0 commented Jun 16, 2016

That's the directory you created earlier as root. Remove it:
sudo rm -rf /var/lib/nvidia-docker/volumes

@GBJim

GBJim commented Jun 16, 2016

I finally removed and reinstalled nvidia-docker.
Now I get this error when running nvidia-docker run --rm nvidia/cuda nvidia-smi:

docker: Error response from daemon: no such volume: nvidia_driver_352.93.

Should I manually create a volume for it?

@3XX0
Member

3XX0 commented Jun 16, 2016

No, I think this one is Docker caching volume information (see #80).
Try restarting Docker to clear the cache: sudo restart docker

@GBJim

GBJim commented Jun 16, 2016

Now it works like a charm! Thank you for all your help!
💃

@3XX0
Member

3XX0 commented Jun 16, 2016

Sorry for the hassle, hopefully upgrades will be smoother now.

@qiaohaijun

hope this solution can solve my issue

@flx42
Member

flx42 commented Mar 7, 2017

@qiaohaijun please don't reply to old issues, especially if you don't have a question.

@NVIDIA NVIDIA locked and limited conversation to collaborators Mar 7, 2017