Docker Daemon: Error starting daemon : layer does not exist #1808

Closed
avichalbadaya opened this Issue Feb 15, 2017 · 11 comments

avichalbadaya commented Feb 15, 2017

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.6.0
VERSION_ID=1235.6.0
BUILD_ID=2017-01-10-0545
PRETTY_NAME="Container Linux by CoreOS 1235.6.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

$ docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:
 OS/Arch:      linux/amd64

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux?

Running on AWS EC2 instance.

Linux 4.7.3-coreos-r2 #1 SMP x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz GenuineIntel GNU/Linux

Expected Behavior

The Docker daemon service should start successfully and should recover after a reboot.

Actual Behavior

We are seeing very frequent node failures on our CoreOS boxes because the Docker daemon ends up in a corrupt state.

Feb 13 18:58:38 worker-green-5 systemd[1]: Starting Docker Application Container Engine...
Feb 13 18:58:46 worker-green-5 dockerd[1208]: time="2017-02-13T18:58:46.462403219Z" level=fatal msg="Error starting daemon: layer does not exist"
Feb 13 18:58:46 worker-green-5 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 18:58:46 worker-green-5 systemd[1]: Failed to start Docker Application Container Engine.
Feb 13 18:58:46 worker-green-5 systemd[1]: docker.service: Unit entered failed state.
Feb 13 18:58:46 worker-green-5 systemd[1]: docker.service: Failed with result 'exit-code'.
Feb 13 18:59:02 worker-green-5 systemd[1]: Starting Docker Application Container Engine...
Feb 13 18:59:02 worker-green-5 dockerd[1241]: time="2017-02-13T18:59:02.694400078Z" level=fatal msg="Error starting daemon: layer does not exist"
Feb 13 18:59:02 worker-green-5 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 18:59:02 worker-green-5 systemd[1]: Failed to start Docker Application Container Engine.
Feb 13 18:59:02 worker-green-5 systemd[1]: docker.service: Unit entered failed state.
Feb 13 18:59:02 worker-green-5 systemd[1]: docker.service: Failed with result 'exit-code'.
Feb 13 19:00:20 worker-green-5 systemd[1]: Starting Docker Application Container Engine...
Feb 13 19:00:21 worker-green-5 dockerd[1295]: time="2017-02-13T19:00:21.077377986Z" level=fatal msg="Error starting daemon: layer does not exist"
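To confirm that a node is hitting this specific failure rather than some other startup problem, the journal can be searched for the fatal message shown above. The following is a minimal sketch; the helper name has_layer_error is hypothetical and not part of Docker or Container Linux.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and succeeds if it contains
# the fatal message from the report above.
has_layer_error() {
    grep -q 'Error starting daemon: layer does not exist'
}

# Example usage on an affected node (assumes systemd's journalctl is available):
#   journalctl -u docker.service --no-pager | has_layer_error \
#       && echo "docker.service is failing with 'layer does not exist'"
```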

Other Information

If we clean up the corrupted Docker images in /var/lib/docker using this script, https://github.com/docker/docker/blob/620339f166984540f15aadef2348646eee9a5b42/contrib/nuke-graph-directory.sh, the Docker daemon service can be started again. However, other services such as Nomad then fail to start. A reboot also corrupts the Docker state again, and the daemon fails to start.

wkruse commented Mar 24, 2017

Sounds like a duplicate of #1313. I also ran into it after a CoreOS upgrade. Running sudo rm -rf /var/lib/docker fixed it for me.
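A slightly more cautious variant of this workaround is to move the data directory aside instead of deleting it, so the broken state can still be inspected later. This is only a sketch under assumptions: the function name reset_docker_data, the RUN variable, and the .broken suffix are made up for illustration, and the move can fail if mounts are still active under the directory. Either way, all local images and containers become unavailable to the daemon, and as reported above the problem may return after the next reboot.

```shell
#!/bin/sh
# Hypothetical helper. By default it only prints the commands (dry run).
# Set RUN=sudo to actually execute them with sudo, or RUN= to run them directly.
reset_docker_data() {
    data_root="${1:-/var/lib/docker}"
    run="${RUN-echo}"
    $run systemctl stop docker.service
    # Keep the old directory for inspection instead of removing it.
    $run mv "$data_root" "${data_root}.broken"
    $run systemctl start docker.service
}

# Example usage:
#   reset_docker_data              # print the steps for /var/lib/docker
#   RUN=sudo reset_docker_data     # perform them
```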

jacohend commented Jun 29, 2017

@wkruse He already addressed this under "Other Information": a reboot will corrupt the Docker daemon state again, even after cleaning out /var/lib/docker. I'm seeing this under nearly identical circumstances on a CoreOS AMI on AWS.

vrothberg commented Jul 5, 2017

@avichalbadaya, which filesystem are you using for /var/lib/docker?
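One way to answer this on a Linux host is to ask stat for the filesystem type of the directory. A minimal sketch, assuming GNU coreutils stat; the helper name fs_type is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the type of the filesystem that holds the
# given path (default: /var/lib/docker). Relies on GNU stat's -f mode.
fs_type() {
    stat -f -c %T "${1:-/var/lib/docker}"
}

# Example usage:
#   fs_type                  # filesystem backing /var/lib/docker
#   fs_type /some/data-root  # filesystem backing a custom data directory
```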

cyphar commented Jul 5, 2017

We figured this out. Can you see if applying moby/moby@c37bd10 fixes it? Note that this won't fix the actual /var/lib/docker corruption but it should stop it from happening in the future. See moby/moby#32170 for more info.

Contributor

euank commented Oct 12, 2017

We backported the above-referenced patch. It should be available in Container Linux versions >= v1548.0.0 (which means beta now, and stable in its next major version).

Apologies for the delay in getting to this; I was hopeful we could upgrade to 1.13+ instead, but per #1930, it looks like we do need to continue maintaining 1.12.

Thanks for reporting the issue, and thanks for the heads up about that patch @cyphar.
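To check whether a given node is already on a release that should contain the backport, its VERSION_ID from /etc/os-release can be compared against 1548.0.0. The following is a rough sketch; version_ge is a hypothetical helper that assumes both versions have exactly three numeric, dot-separated fields.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version A ($1) is >= version B ($2).
# Assumes both have exactly three numeric fields, e.g. 1548.0.0.
version_ge() {
    old_ifs=$IFS
    IFS=.
    # Split both versions on dots: $1..$3 are A's fields, $4..$6 are B's.
    set -- $1 $2
    IFS=$old_ifs
    [ "$1" -gt "$4" ] && return 0
    [ "$1" -lt "$4" ] && return 1
    [ "$2" -gt "$5" ] && return 0
    [ "$2" -lt "$5" ] && return 1
    [ "$3" -ge "$6" ]
}

# Example usage on a Container Linux node:
#   . /etc/os-release
#   version_ge "$VERSION_ID" 1548.0.0 && echo "backport should be present"
```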

euank closed this Oct 12, 2017

meridius commented Oct 29, 2017

This has already happened to me a few times on various Docker versions, with the --data-root directory located on ext3 and using --storage-driver overlay2.
Every time, the only solution was to rm -fr the contents of the directory my --data-root pointed to. So ... yeah, that sucks.

Contributor

euank commented Oct 31, 2017

@meridius Can you please also post which Container Linux version you're encountering it on (e.g. the contents of /etc/os-release)? If it's occurring on versions after v1548.0.0, we should re-open this issue.

meridius commented Oct 31, 2017

I actually run it on Arch Linux, but it seemed you guys have the same problem.

cyphar commented Oct 31, 2017

@meridius The bug should've been fixed in v1.13.0. See my above comment.

dhensen commented Nov 15, 2017

I'm experiencing this exact same problem on my dev workstation, which runs Arch Linux.

$ uname -r
4.13.12-1-ARCH
$ docker version
Client:
 Version:      17.10.0-ce
 API version:  1.33
 Go version:   go1.9.1
 Git commit:   f4ffd2511c
 Built:        Wed Oct 18 23:08:56 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.10.0-ce
 API version:  1.33 (minimum version 1.12)
 Go version:   go1.9.1
 Git commit:   f4ffd2511c
 Built:        Wed Oct 18 23:09:11 2017
 OS/Arch:      linux/amd64
 Experimental: false

Docker consistently fails to start after a reboot, just as mentioned in previous comments. (I only rebooted twice last month, though.)

cyphar commented Nov 16, 2017

Please note that this is the bug tracker for CoreOS. If you have Arch bugs, please report them to either Arch or upstream. The cause of this bug was very specific and was definitely fixed in 1.13.0. If you're seeing something that looks similar, you probably have a different bug and should report it in the appropriate place.
