
Container with NFS volume does not come up on system restart with restart always policy #25584

Closed
ventz opened this issue Aug 10, 2016 · 11 comments

@ventz

ventz commented Aug 10, 2016

UPDATE / SOLUTION FOR ANYONE HAVING THIS PROBLEM
1.) Make sure your NFS mount in /etc/fstab does NOT have the "bg" (background) option
and
2.) Create: /etc/systemd/system/docker.service.d/override.conf
with:

[Unit]
After=nfs.mount

and then
systemctl daemon-reload

and then reboot.

This is needed until docker adds "After=nfs.mount" to the default docker Unit section.

^^^ UPDATE / SOLUTION FOR ANYONE HAVING THIS PROBLEM ^^^
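
For reference, the full workaround as a sequence of commands -- a minimal sketch, assuming the NFS share is mounted at /nfs via /etc/fstab (systemd turns that mount point into a unit named nfs.mount):

# Drop-in directory for the docker service
sudo mkdir -p /etc/systemd/system/docker.service.d

# Order dockerd after the NFS mount unit
printf '[Unit]\nAfter=nfs.mount\n' | sudo tee /etc/systemd/system/docker.service.d/override.conf

sudo systemctl daemon-reload
sudo reboot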

Output of docker version:

docker version
Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:11:10 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:11:10 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 9
 Running: 7
 Paused: 0
 Stopped: 2
Images: 74
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 226
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: weavemesh bridge overlay macvlan null host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-34-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 94.41 GiB
Name: s01
ID: YN2E:WSP3:6TPP:YTX7:KO7E:3G2X:NFGU:62J4:VX75:BVRX:G7L2:BZ3J
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical Dell m610 blade on a m1000 chassis.
Running Ubuntu 16.04 LTS

Steps to reproduce the issue:

  1. Create an NFS share from an L2 or L3 network -- call it /nfs
    Let's assume "/nfs/container01" exists and in that folder there is a "service01" dir.
  2. Start a container with a volume mounted from that share:
    docker run -d --restart=always --net=somenetwork --ip=10.0.0.101 -v /nfs/container01/service01:/etc/service01 alpine
    Note: same behavior with Ubuntu and other systems.
  3. Reboot the VM or physical host running the container.
    The container will NOT start on reboot (a quick check is sketched after this list).
    Same result if you shut down the container.
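
A quick way to confirm the symptom after the reboot (the container name service01 and these checks are illustrative, not part of the original report):

# Restart policy is set, yet the container sits in Exited state after the reboot
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' service01   # prints "always"
docker ps -a --filter "status=exited"                              # the container shows up here
docker start service01                                             # starting it by hand works fine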

Describe the results you received:
Container does NOT start on reboot, even though it exists and the NFS volume is available.

Describe the results you expected:
The container should start. Local containers (those that mount "local" volumes) do start.

Additional information you deem important (e.g. issue happens only occasionally):
Tried creating an override file for systemd:

vim /etc/systemd/system/docker.service.d/override.conf

With:

[Unit]
After=nfs-utils.service

and later also tried instead with:

[Unit]
After=remote-fs.target

and then also tried:

[Unit]
After=nfs.mount

In each case, that was followed by systemctl daemon-reload and a reboot. Same result, no difference.

My assumption is that there is a race condition between docker and NFS. Once the physical host has rebooted, if I start the container manually, it works. It just won't start automatically.
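
One way to inspect that ordering on a systemd host like this one -- a hedged sketch, assuming journald keeps the current boot log:

# Compare when nfs.mount became active with when dockerd started during this boot
journalctl -b -u docker.service -u nfs.mount --no-pager | head -n 40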

@cpuguy83
Member

Are you seeing an error when trying to start the container?

@ventz
Author

ventz commented Aug 10, 2016

Update: It seems like there definitely is a race condition. NFS kicks off, but then backgrounds the mount, and docker's After= ordering is satisfied as soon as the mount process has kicked off, even though the share is not actually mounted yet.

Removing "bg" from the fstab NFS options seems to have helped.

Specifically, having:
$IP:/volume /nfs nfs nfsvers=4,tcp,rw,hard,intr,timeo=600,retrans=2,_netdev,auto 0
^ this works (defaults to 'fg' -- even though it's not there as an explicit option)

What we had before was:
$IP:/volume /nfs nfs nfsvers=4,tcp,rw,hard,intr,bg,timeo=600,retrans=2,_netdev,auto 0
^ this did NOT work (note 'bg')
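
To double-check which options the running mount actually ended up with (findmnt is part of util-linux on Ubuntu 16.04; /nfs is the mount point from the fstab lines above):

# Show the effective mount options; with "bg" the share may not be mounted yet
# when dockerd starts, which is what breaks the automatic restart
findmnt /nfs -o TARGET,SOURCE,OPTIONS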

@cpuguy83
Member

What does your systemd unit file look like?

@ventz
Author

ventz commented Aug 10, 2016

The override looks like this:

/etc/systemd/system/docker.service.d/override.conf

And we added:

[Unit]
After=nfs.mount

[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --dns 8.8.8.8 --dns 8.8.4.4

The key part there is "After=nfs.mount" --> assuming this should probably be added by default, since if there's no /etc/fstab mount, it will just be skipped.

The default that we are overriding by appending the Unit is:

/lib/systemd/system/docker.service

And has:

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd://
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target
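
To verify that the drop-in is actually picked up and the ordering dependency is in place, something like this should work with the systemd shipped in Ubuntu 16.04:

sudo systemctl daemon-reload
systemctl cat docker.service        # prints the packaged unit plus the override.conf drop-in
systemctl show docker -p After      # the After= list should now include nfs.mount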

@cpuguy83
Member

Docker does not care about nfs here. All it does is try to start the container with the host path mounted in (and if the host path doesn't exist, it would create it).
Presumably it's failing because the process in the container is looking for something in the share that does not exist?

Did you check the container logs?
What sort of error are you seeing?

@ventz
Author

ventz commented Aug 10, 2016

@cpuguy83 - Yea, it's failing just because the config is missing (in this case, our test is an openvpn container and it's looking for the server.conf + certs).

But I think we figured out the issue -- it was NFS backgrounding and tricking docker into thinking that it "ran".

So the summary is in two parts:
1.) The unit file needs a "nfs.mount" added in the existing "After=" field.
2.) If you do NFS mounts in /etc/fstab, you cannot have "bg", because it tricks docker into thinking the mount ran successfully, while it actually forks into the background and the NFS mount is not yet ready.

@cpuguy83
Member

So it sounds like we can close this?

@ventz
Author

ventz commented Aug 10, 2016

@cpuguy83 yea, but hopefully you can submit a request to add the parameter to the default Unit file, so that it looks like this:
After=network.target docker.socket nfs.mount

It will not impact people who are not using nfs, as it will just skip over it. But this way, for everyone using NFS, it will actually work.

@cpuguy83
Member

@ventz I'm not sure this is a good option to have in the config file for everyone using Docker, and it is easy enough to add the customization.

@ventz
Author

ventz commented Aug 10, 2016

@cpuguy83 ok -- maybe just adding a section to the Docker docs then?

I updated the top here with the solution so that hopefully others who run into this can find the answer quickly for now.

@sam-thibault
Contributor

This is an old issue. I will close it as stale.

@sam-thibault closed this as not planned on Sep 14, 2023