This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

can not start docker after restart the computer #787

Closed
liuliyalei opened this issue Jul 9, 2018 · 26 comments

@liuliyalei

liuliyalei commented Jul 9, 2018

ubuntu 16.04
Nvidia-docker
I can not start docker, does anyone know why?

qq 20180709142111
qq 20180709141729

@LeitchP

LeitchP commented Jul 25, 2018

I'm getting the same behaviour. I completely removed docker (and I mean completely), then started installing step by step. Everything was working with repeated restarts until I installed NVidia-Docker 2

It stops docker from running in any form. I believe this is due to issue 504:
[https://github.com//issues/504]

I'm running Ubuntu 18.04 LTS
I'm also running the latest Docker-CE build (I can't confirm the exact version right now because it won't run).

@LeitchP

LeitchP commented Jul 26, 2018

If I edit /etc/systemd/system/docker.service.d/override.conf and remove the "--add-runtime=nvidia=..." flag, docker starts, and I can manually run nvidia-docker and see GPU details again.

So I ran the entire original ExecStart and I got the message "unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified as a flag and in the configuration file: runtimes: (from flag: [nvidia], from file: map[nvidia:map[path:nvidia-container-runtime runtimeArgs:[]]])"
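
The message above says the "runtimes" directive reaches dockerd twice: once as a flag and once from /etc/docker/daemon.json. A minimal Python sketch of that check follows. It assumes the only pair of interest is the --add-runtime flag versus the "runtimes" key (the pair named in the message), and the helper names runtime_flags and has_runtime_conflict are hypothetical; dockerd's own validation covers many more directives.

```python
import json
import shlex


def runtime_flags(exec_start):
    """Return runtime names passed to dockerd via --add-runtime."""
    names = []
    args = shlex.split(exec_start)
    for i, arg in enumerate(args):
        if arg.startswith("--add-runtime="):
            # Form: --add-runtime=NAME=PATH
            names.append(arg.split("=", 2)[1])
        elif arg == "--add-runtime" and i + 1 < len(args):
            # Form: --add-runtime NAME=PATH
            names.append(args[i + 1].split("=", 1)[0])
    return names


def has_runtime_conflict(exec_start, daemon_json_text):
    """True when runtimes come from a flag AND daemon.json has a "runtimes" key.

    Assumption: the conflict is per directive, not per runtime name,
    which is how the error message in this thread reads.
    """
    config = json.loads(daemon_json_text)
    return bool(runtime_flags(exec_start)) and "runtimes" in config
```

Feeding it the ExecStart line from the override file and the daemon.json contents shows whether the two sources collide; removing the runtime from either one is enough to clear the conflict.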

@flx42
Member

flx42 commented Jul 26, 2018

@LeitchP this override file was not added by nvidia-docker. How did you configure your machine?

@LeitchP

LeitchP commented Jul 26, 2018

@flx42 Perhaps the file wasn't added by the install, but the install process made the bug. To answer your question, this is a clean install on an Alienware laptop with 2x Geforce GTX 680 cards. These are the steps:
1: clean install of Ubuntu Desktop 18.04
2: followed the steps to set up NVidia drivers
3: followed the steps to install docker-ce
Confirmed multiple times that hello-world executes, even with log off/on and restarts.
4: followed the steps to install nvidia-docker2
Confirmed multiple times that hello-world executes until the session is logged off, restarted, or the docker server is stopped and restarted.

So I'm not saying "override" was "added" by nvidia-docker2; I'm saying that the override file contained a specific reference to add the runtime=nvidia. I'm also saying that I've traced the docker failure to that addition, and that if I stop at step 3 "--add-runtime" isn't present, but after step 4 it is.

I'm being that specific because I haven't traced what is actually doing the modification. For instance, this may be due to Cuda or some other auto-installs as a component/requirement of nvidia-docker2.

The next thing I'm going to try is another clean install of Ubuntu, and monitor that file step by step.
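
One way to monitor a file step by step is to record a fingerprint of it after each install step and compare. The sketch below is only an illustration of that idea: the helper names snapshot and describe_change are hypothetical, and it tracks content only, not ownership or timestamps.

```python
import hashlib
from pathlib import Path


def snapshot(path):
    """Return the SHA-256 hex digest of the file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def describe_change(before, after):
    """Compare two snapshots taken before and after an install step."""
    if before == after:
        return "unchanged"
    if before is None:
        return "created"
    if after is None:
        return "removed"
    return "modified"
```

Taking a snapshot of /etc/systemd/system/docker.service.d/override.conf before and after each numbered step would show which step, if any, creates or modifies it.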

@LeitchP

LeitchP commented Jul 26, 2018

@flx42 Is it the nvidia-docker file, line 16, that was modified by you on the 23rd of February? That contains a string similar to the one in the override file. Or is it the June introduction of daemon.json that replaces (without backup) the existing daemon.json? I'm going to do another clean install tomorrow, so let me know if there are steps I can snapshot for you that might help with this.

@flx42
Member

flx42 commented Jul 26, 2018

Are you referring to this? 2e9f20b#diff-03550f513b5eb839088628d4a360b865
This only adds argument to the docker CLI.

daemon.json is the only file that we ship in our packages.

@LeitchP

LeitchP commented Jul 26, 2018

@flx42 Yes, that's the one I'm referring to. This is the text contained in the override.conf:

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
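
For readers unfamiliar with systemd drop-ins: an empty "ExecStart=" clears the command inherited from the packaged unit, and the next "ExecStart=" line sets the replacement. The Python sketch below mimics that resolution. BASE_UNIT is an assumed stand-in for the packaged docker.service (its real contents are not shown in this thread), the function name effective_exec_start is hypothetical, and the parser ignores line continuations and other systemd syntax.

```python
# Assumed stand-in for the packaged unit; not taken from this thread.
BASE_UNIT = """[Service]
ExecStart=/usr/bin/dockerd -H fd://
"""

# The drop-in quoted in the comment above.
OVERRIDE = """[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
"""


def effective_exec_start(*unit_texts):
    """Apply ExecStart= lines in order; an empty assignment resets the list."""
    commands = []
    for text in unit_texts:
        in_service = False
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # skip blanks and comments
            if line.startswith("[") and line.endswith("]"):
                in_service = line == "[Service]"
                continue
            if in_service and "=" in line:
                key, value = line.split("=", 1)
                if key.strip() == "ExecStart":
                    value = value.strip()
                    if value:
                        commands.append(value)
                    else:
                        commands.clear()  # empty value drops earlier commands
    return commands
```

Calling effective_exec_start(BASE_UNIT, OVERRIDE) leaves only the command from the drop-in, which is why the --add-runtime flag ends up on the daemon's command line.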

@LeitchP

LeitchP commented Jul 26, 2018

Run this:
systemctl status docker.service
Get this:
screenshot from 2018-07-26 14-21-58

@LeitchP

LeitchP commented Jul 26, 2018

If I remove the code you can see (everything after --add) from the override.conf this command simply works. If I then run the line as a whole (in my user context) it works.

I'm wondering if su needs to be a member of the docker group? I don't know enough about Linux to tell.

@LeitchP

LeitchP commented Jul 26, 2018

I was just assuming that the newer Ubuntu version provides more details - but maybe this is a separate issue?

@flx42
Member

flx42 commented Jul 26, 2018

What is the output of dpkg -S /etc/systemd/system/docker.service.d/override.conf?

@LeitchP

LeitchP commented Jul 26, 2018

@flx42
it says "dpkg-query: no path found matching pattern override.conf"

I'm sorry if I am not understanding what you want to see.
I couldn't find "-S" in the help to see what it would do for dpkg, but there was a lower case "-s" for status. But that returned an error saying "Status needs a valid package name" and the file isn't a package.

@flx42
Member

flx42 commented Jul 26, 2018

dpkg -S tells you which package a file on your system comes from.
The output is telling me it was not installed by a package. So it was probably added manually or generated somehow.
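
The same lookup can be scripted. The sketch below wraps dpkg -S and reads its output, assuming the usual "package: /path" line format and treating the "no path found" case as an empty result. The helper names parse_dpkg_search and owning_packages are hypothetical, and owning_packages raises FileNotFoundError on systems without dpkg.

```python
import subprocess


def parse_dpkg_search(output):
    """Extract owning package names from `dpkg -S` output lines like 'pkg: /path'."""
    packages = []
    for line in output.splitlines():
        if line.startswith("dpkg-query:") or line.startswith("diversion by"):
            continue  # error or diversion lines name no owning package
        owners, sep, _path = line.partition(": ")
        if sep:
            for owner in owners.split(","):
                name = owner.strip()
                if name and name not in packages:
                    packages.append(name)
    return packages


def owning_packages(path):
    """Run `dpkg -S path`; an empty list means no installed package owns the file."""
    result = subprocess.run(["dpkg", "-S", path], capture_output=True, text=True)
    return parse_dpkg_search(result.stdout)
```

An empty list for /etc/systemd/system/docker.service.d/override.conf matches the reading above: the file did not come from a package.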

@LeitchP

LeitchP commented Jul 27, 2018

@flx42 I don't follow your logic.

I tell you it is a clean install, and you tell me a file is probably manually generated?

I tell you that the file exists before installing NVIDIA-docker but is modified by the NVIDIA-docker install and you are focused on how the file is generated?

However this confusion happened - and in the interest of getting to the root cause of the problem - I will detail every step, with screen shots:

  1. An entirely fresh install of Ubuntu (you can see versions and even my hardware details). I checked for updates and there were none.
    screenshot from 2018-07-27 10-30-18
  2. Install Chrome from their website download
  3. Confirm the override.conf file does not exist.
    screenshot from 2018-07-27 10-55-23
  4. Went to docker and installed Docker-CE for Ubuntu using the "Install using the repository" steps.
    screenshot from 2018-07-27 10-57-44
    screenshot from 2018-07-27 11-15-23
  5. Checked the override file still isn't there. Ran the Post-installation steps for Linux. This attempts to create a docker group (which already exists for some reason), and adds the current user to that group.
    screenshot from 2018-07-27 11-20-09
    At this point it needs to restart, so I will continue this on with another reply.
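
As a side note on the docker group in step 5, group membership can be checked from the system's group database. This is a sketch only: the function name user_in_group is hypothetical, and it reads the database rather than the current login session, so a user added moments ago shows up here even though already-running sessions will not pick up the change until the next login.

```python
import grp
import pwd


def user_in_group(user, group="docker"):
    """True if `user` is listed in `group` or has it as primary group."""
    try:
        group_entry = grp.getgrnam(group)
    except KeyError:
        return False  # the group does not exist
    if user in group_entry.gr_mem:
        return True
    try:
        # The primary group is not listed in gr_mem, so check it separately.
        return pwd.getpwnam(user).pw_gid == group_entry.gr_gid
    except KeyError:
        return False  # the user does not exist
```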

@flx42
Member

flx42 commented Jul 27, 2018

> However this confusion happened - and in the interest of getting to the root cause of the problem - I will detail every step, with screen shots:

Yes please, that's all I'm asking.

@LeitchP

LeitchP commented Jul 27, 2018

  6. After restart, check the file still isn't there and check docker now works without sudo (the reason for the reboot).
    screenshot from 2018-07-27 12-53-21

That was the completion of the "Manage Docker as a non-root user" of the Docker post install.
7. I performed the next step to "Configure Docker to start on boot". That allows the launching of Docker at startup. I didn't do any further steps than that for docker, and before I did a restart the file still wasn't there.
screenshot from 2018-07-27 13-56-28

@LeitchP

LeitchP commented Jul 27, 2018

The NVidia driver instructions from here were followed:
I got the most recent version of NVIDIA for Linux:
http://uk.download.nvidia.com/XFree86/Linux-x86_64/390.77/NVIDIA-Linux-x86_64-390.77.run

Ran this:
sudo apt-get install build-essential gcc-multilib dkms

I created the "Blacklist for Nouveau Driver" and ran sudo update-initramfs -u
screenshot from 2018-07-27 14-17-38
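
To double-check that a blacklist file does what is intended, its text can be scanned for the directive. This sketch assumes the file uses the standard modprobe.d "blacklist <module>" line; the exact file name and contents used here are not given in this thread, and the function name blacklists_module is hypothetical.

```python
def blacklists_module(conf_text, module="nouveau"):
    """True if the modprobe.d text contains an active 'blacklist <module>' line."""
    for raw in conf_text.splitlines():
        # Drop comments, then split the directive into words.
        words = raw.split("#", 1)[0].split()
        if len(words) == 2 and words[0] == "blacklist" and words[1] == module:
            return True
    return False
```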

At that point the instructions indicate I need to restart.

@LeitchP

LeitchP commented Jul 27, 2018

After restart I continued with the steps:
screenshot from 2018-07-27 14-39-04
screenshot from 2018-07-27 14-39-43
screenshot from 2018-07-27 14-40-52

The override file still isn't there yet.

@LeitchP

LeitchP commented Jul 27, 2018

I had said I downloaded CUDA, but it didn't install because the driver was already installed, and everything after that aborted.

@LeitchP

LeitchP commented Jul 27, 2018

I followed the install instructions in this Git repo (NVIDIA-nvidia-docker).

screenshot from 2018-07-27 15-15-14

screenshot from 2018-07-27 15-16-32

@LeitchP

LeitchP commented Jul 27, 2018

I don't know what's different but not only is override.conf not there,

I don't know if I had tried this and didn't remove it fully: https://docs.docker.com/config/daemon/systemd/#httphttps-proxy

But whatever the case, every time I removed NVIDIA-nvidia-docker2 it would take the amendment to the override.conf out and everything worked, but it broke again every time I installed NVIDIA-nvidia-docker2 back in.

In fact, that would back up your assertion that the file was created manually, as per that link.

All I can say is that on a fresh Ubuntu install, with a fresh setup it appears to be working now, even with restarts.

@flx42
Member

flx42 commented Jul 27, 2018

Did you follow these instructions at one point?
https://github.com/nvidia/nvidia-container-runtime

That's one step too low in the stack for end users.

@LeitchP

LeitchP commented Jul 28, 2018

I think so. That looks familiar - I have my bash history before I cleared the system so I'll confirm.

@LeitchP

LeitchP commented Jul 28, 2018

Yes. Yes, I did follow those instructions.

@LeitchP

LeitchP commented Jul 28, 2018

Also, I apologise - you said I had created that manually and I stated I did not. Given those instructions, I did.

@flx42
Member

flx42 commented Jul 28, 2018

It looks like many users end up in this situation, I need to rephrase the README to redirect them to nvidia-docker instead.

@flx42 flx42 closed this as completed Jul 28, 2018