This repository has been archived by the owner. It is now read-only.

Docker Daemon: Error starting daemon : layer does not exist #1808

Closed
avichalbadaya opened this issue Feb 15, 2017 · 11 comments

Comments

@avichalbadaya avichalbadaya commented Feb 15, 2017

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.6.0
VERSION_ID=1235.6.0
BUILD_ID=2017-01-10-0545
PRETTY_NAME="Container Linux by CoreOS 1235.6.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

$ docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:
 OS/Arch:      linux/amd64

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux?

Running on AWS EC2 instance.

Linux 4.7.3-coreos-r2 #1 SMP x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz GenuineIntel GNU/Linux

Expected Behavior

The Docker daemon service should start successfully and should recover after a reboot.

Actual Behavior

We are seeing very frequent node failures on our CoreOS boxes because the Docker daemon gets into a corrupt state.

Feb 13 18:58:38 worker-green-5 systemd[1]: Starting Docker Application Container Engine...
Feb 13 18:58:46 worker-green-5 dockerd[1208]: time="2017-02-13T18:58:46.462403219Z" level=fatal msg="Error starting daemon: layer does not exist"
Feb 13 18:58:46 worker-green-5 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 18:58:46 worker-green-5 systemd[1]: Failed to start Docker Application Container Engine.
Feb 13 18:58:46 worker-green-5 systemd[1]: docker.service: Unit entered failed state.
Feb 13 18:58:46 worker-green-5 systemd[1]: docker.service: Failed with result 'exit-code'.
Feb 13 18:59:02 worker-green-5 systemd[1]: Starting Docker Application Container Engine...
Feb 13 18:59:02 worker-green-5 dockerd[1241]: time="2017-02-13T18:59:02.694400078Z" level=fatal msg="Error starting daemon: layer does not exist"
Feb 13 18:59:02 worker-green-5 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 18:59:02 worker-green-5 systemd[1]: Failed to start Docker Application Container Engine.
Feb 13 18:59:02 worker-green-5 systemd[1]: docker.service: Unit entered failed state.
Feb 13 18:59:02 worker-green-5 systemd[1]: docker.service: Failed with result 'exit-code'.
Feb 13 19:00:20 worker-green-5 systemd[1]: Starting Docker Application Container Engine...
Feb 13 19:00:21 worker-green-5 dockerd[1295]: time="2017-02-13T19:00:21.077377986Z" level=fatal msg="Error starting daemon: layer does not exist"
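To gauge how often a node hits this failure, the fatal lines in a log like the one above can be counted. This is only a minimal sketch: the function name `count_layer_errors` is hypothetical, and it assumes the message text matches the excerpt exactly.

```shell
# Count daemon start-up failures caused by the "layer does not exist" error.
# Reads journal-style log lines on stdin and prints the number of matches.
count_layer_errors() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function's exit status at 0 while still printing "0".
    grep -c 'Error starting daemon: layer does not exist' || true
}
```

On a systemd host this could be fed from the unit's journal, for example `journalctl -u docker.service --no-pager | count_layer_errors`.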

Other Information

If we clean up the corrupted Docker images in /var/lib/docker using this script, https://github.com/docker/docker/blob/620339f166984540f15aadef2348646eee9a5b42/contrib/nuke-graph-directory.sh, the Docker daemon service can be started. But other services, such as Nomad, don't start. Also, a reboot corrupts the Docker daemon's state again, and it fails to start.

@wkruse wkruse commented Mar 24, 2017

Sounds like a duplicate of #1313. I also ran into it after a CoreOS upgrade. sudo rm -rf /var/lib/docker fixed it for me.

@jacohend jacohend commented Jun 29, 2017

@wkruse He already addressed this under "Other Information": a reboot will make the docker daemon corrupt again, even after cleaning out /var/lib/docker. I'm seeing this under nearly identical circumstances on a CoreOS AMI on AWS.

@vrothberg vrothberg commented Jul 5, 2017

@avichalbadaya, which filesystem are you using for /var/lib/docker?
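One way to answer the filesystem question is to ask `stat` for the filesystem type backing the directory. This is a sketch, not a definitive check: the helper name `fs_type_of` is hypothetical, and it assumes a `stat` that supports `-f -c %T` (GNU coreutils does).

```shell
# Print the type of the filesystem backing a path (default: /var/lib/docker).
# Assumes a stat that supports -f (query the filesystem rather than the file)
# and the %T format (filesystem type in human-readable form).
fs_type_of() {
    stat -f -c %T "${1:-/var/lib/docker}"
}
```

Running `fs_type_of` with no argument inspects /var/lib/docker. Note that GNU `stat` reports the ext family as `ext2/ext3`, so it cannot tell ext3 from ext4.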

@cyphar cyphar commented Jul 5, 2017

We figured this out. Can you see if applying moby/moby@c37bd10 fixes it? Note that this won't fix the actual /var/lib/docker corruption but it should stop it from happening in the future. See moby/moby#32170 for more info.

@euank euank (Contributor) commented Oct 12, 2017

We backported the above referenced patch. It should be available in Container Linux versions >= v1548.0.0 (which means beta now, stable in its next major version).
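To check whether a given node already runs a release at or above that threshold, the VERSION_ID from /etc/os-release can be compared against 1548.0.0. The sketch below is illustrative only: `version_ge` and `os_release_version` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
# True (exit 0) when version $1 is greater than or equal to version $2.
# Under version sort the smaller of the two comes first, so $1 >= $2
# exactly when the first sorted line equals $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print VERSION_ID from an os-release style file (default: /etc/os-release),
# stripping optional double quotes around the value.
os_release_version() {
    sed -n 's/^VERSION_ID=//p' "${1:-/etc/os-release}" | tr -d '"'
}
```

For example, `version_ge "$(os_release_version)" 1548.0.0 && echo "backport should be included"` prints the message only on a release at or above the threshold. The 1235.6.0 release from the original report compares as older.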

Apologies for the delay in getting to this; I was hopeful we could upgrade to 1.13+ instead, but per #1930, it looks like we do need to continue maintaining 1.12.

Thanks for reporting the issue, and thanks for the heads up about that patch @cyphar.

@euank euank closed this Oct 12, 2017

@meridius meridius commented Oct 29, 2017

This has already happened to me a few times on various docker versions, with the --data-root dir located on ext3 and using --storage-driver overlay2.
Every time, the only solution was to rm -fr the contents of the directory my --data-root pointed to. So ... yeah, that sucks.

@euank euank (Contributor) commented Oct 31, 2017

@meridius Can you please also post which Container Linux version you're encountering it on (e.g. the contents of /etc/os-release)? If it's occurring on versions after v1548.0.0, we should re-open this issue.

@meridius meridius commented Oct 31, 2017

I actually run it on Arch Linux, but it seemed you guys have the same problem.

@cyphar cyphar commented Oct 31, 2017

@meridius The bug should've been fixed in v1.13.0. See my above comment.

@dhensen dhensen commented Nov 15, 2017

I'm experiencing this exact same problem on my dev workstation running Arch Linux.

$ uname -r
4.13.12-1-ARCH
$ docker version
Client:
 Version:      17.10.0-ce
 API version:  1.33
 Go version:   go1.9.1
 Git commit:   f4ffd2511c
 Built:        Wed Oct 18 23:08:56 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.10.0-ce
 API version:  1.33 (minimum version 1.12)
 Go version:   go1.9.1
 Git commit:   f4ffd2511c
 Built:        Wed Oct 18 23:09:11 2017
 OS/Arch:      linux/amd64
 Experimental: false

Docker consistently fails to start after reboot, just as mentioned in previous comments. (I only rebooted twice last month, though.)

@cyphar cyphar commented Nov 16, 2017

Please note that this is the bug tracker for CoreOS. If you have Arch bugs, please report them to either Arch or upstream. The cause of this bug was very specific and was definitely fixed in 1.13.0 -- if you're seeing something that looks similar, you probably have a different bug and should report it in the appropriate place.
