This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

slow zookeeper startup due to docker devicemapper error #828

Closed
ryane opened this issue Nov 19, 2015 · 5 comments

@ryane
Contributor

ryane commented Nov 19, 2015

This is one potential cause of the intermittent "wait for zookeeper to listen" failure we sometimes encounter on builds. When zookeeper starts, you may see an error like this in the logs:

Nov 19 03:02:36 resching-control-01.cisco.com systemd[1]: Starting zookeeper...
Nov 19 03:03:03 resching-control-01.cisco.com docker[10779]: Error response from daemon: Cannot destroy container zookeeper: Driver devicemapper failed to remove init filesystem dbcbaa47a36dc59fdbaa8dd9051faaf00ac099a924de11ac222b804ba576ff5a-init: Device is Busy
Nov 19 03:03:03 resching-control-01.cisco.com docker[10779]: Error: failed to remove containers: [zookeeper]
Nov 19 03:03:04 resching-control-01.cisco.com docker[10840]: Trying to pull repository docker.io/ciscocloud/zookeeper ... 0.3: Pulling from ciscocloud/zookeeper

Note that this causes a delay of almost 30 seconds when starting zookeeper, and sometimes zookeeper is still unavailable by the time zookeeper-wait-for-listen.sh times out (60 seconds).
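
To make the timing problem concrete, here is a minimal sketch of a wait-for-listen check with a hard deadline. It is not the contents of zookeeper-wait-for-listen.sh, which this thread does not show; the function name, the polling interval, and the use of a plain TCP connect are assumptions for illustration.

```python
import socket
import time


def wait_for_listen(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connect to host:port succeeds, False on timeout.

    Hypothetical helper: the 60 second default mirrors the timeout mentioned
    above, everything else is an assumption.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: nothing is listening yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a fixed 60 second budget, a delay of nearly 30 seconds before the container even starts leaves roughly half of the window for the image pull and zookeeper's own startup. A call would look like `wait_for_listen("127.0.0.1", 2181)`, where 2181 is ZooKeeper's default client port and may differ in a given deployment.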

@ryane
Contributor Author

ryane commented Nov 19, 2015

We should figure out the cause of the error, but this may also be another reason to investigate #765.

@Zogg
Contributor

Zogg commented Nov 20, 2015

There is one more possible cause of the "wait for zookeeper to listen" failure:

Often nginx-consul fails to start, and consul becomes unreachable as a result.
Sometimes nginx-consul starts (that is, service nginx-consul status reports active (running)) but does not open any ports. In essence, it fails to start but reports itself as started.

Previously, looking at the logs, I've seen errors from zookeeper's container about being unable to reach consul. Is ZK dependent on consul?
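
The "reports active but opens no ports" state described above can be told apart from a real start by checking both the unit state and the listening port. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and the port to probe is a placeholder because the thread does not say which ports nginx-consul should open. It relies only on `systemctl is-active --quiet`, which exits 0 when the unit is active.

```python
import socket
import subprocess


def unit_is_active(unit):
    """True if `systemctl is-active --quiet <unit>` exits with status 0."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def port_is_open(host, port, timeout=1.0):
    """True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(active, listening):
    """Combine the two checks into a single verdict (hypothetical labels)."""
    if active and listening:
        return "ok"
    if active:
        # systemd reports the unit as running, but nothing accepts connections.
        return "active-but-not-listening"
    return "not-running"
```

For example, `diagnose(unit_is_active("nginx-consul"), port_is_open("127.0.0.1", PORT))` with `PORT` set to whichever port the proxy is expected to serve would return `"active-but-not-listening"` in the situation described here.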

@ryane ryane added this to the 0.5.1 milestone Nov 23, 2015
@ryane ryane added the bug label Nov 23, 2015
@avnik
Contributor

avnik commented Nov 24, 2015

So containers start more slowly with LVM?

@ryane ryane modified the milestones: 0.6, 0.5.1 Nov 30, 2015
@Zogg
Contributor

Zogg commented Dec 1, 2015

I've just encountered the "wait for zookeeper to listen" task failure, and the cause was the nginx-consul service problem I mentioned above.

Zookeeper's logs show:

Dec 01 16:54:01 rtmi-control-01.c.asteris-mi.internal docker[28117]: 2015/12/01 16:54:01 [ERR] (view) "service(zookeeper [passing,warning])" health services: error fetching: Get http://consul.service.consul:8500/v1/health/service/zookeeper?wait=60000ms: dial tcp 10.0.0.6:8500: connection refused
Dec 01 16:54:01 rtmi-control-01.c.asteris-mi.internal docker[28117]: 2015/12/01 16:54:01 [ERR] (runner) watcher reported error: health services: error fetching: Get http://consul.service.consul:8500/v1/health/service/zookeeper?wait=60000ms: dial tcp 10.0.0.6:8500: connection refused

Stopping nginx-consul and then starting it again solves the issue. Note: stopping and starting, not restarting.
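
As a small sketch of the workaround above, the stop and the start can be issued as two separate systemctl commands instead of a single restart. The function name and the injectable `run` parameter are assumptions added for illustration; only `systemctl stop` and `systemctl start` are taken as given.

```python
import subprocess


def stop_then_start(unit, run=subprocess.run):
    """Stop a systemd unit, then start it, as two separate commands.

    Hypothetical helper: `run` defaults to subprocess.run and can be
    replaced with a stub for dry runs.
    """
    commands = [
        ["systemctl", "stop", unit],
        ["systemctl", "start", unit],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError if either step fails.
        run(cmd, check=True)
    return commands
```

Calling `stop_then_start("nginx-consul")` on the affected host would reproduce the manual steps. Why a plain restart does not help is not explained in this thread.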

@ryane
Contributor Author

ryane commented Dec 2, 2015

With #873, zookeeper no longer runs under docker.

@ryane ryane closed this as completed Dec 2, 2015
@ryane ryane unassigned avnik Nov 7, 2016

3 participants