Docker daemon hangs when rapidly spawning containers #1117

Closed
pmoust opened this Issue Feb 11, 2016 · 5 comments
pmoust commented Feb 11, 2016

Stable channel, CoreOS 835.12.0.
The Docker daemon hangs when rapidly spawning containers.

Ref: docker/docker#13885 (comment)

Repro:

A_LOT=300 # number of containers to spawn; any large value works
for i in $(seq 1 "$A_LOT"); do
  docker run --rm alpine sh -c "echo $i; sleep 30" &
done
sleep 5   # give the daemon some time to start spawning containers
docker ps # hangs
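
Since the final `docker ps` in the repro blocks indefinitely when the daemon hangs, it can help to wrap the check in a deadline. The following is only a sketch: it assumes GNU coreutils `timeout` is available on the host, and the function name `probe` and the time limits are hypothetical choices, not something from the original report.

```shell
# probe LIMIT CMD [ARGS...]
# Runs CMD with a deadline of LIMIT seconds and prints one word:
#   "ok"        the command finished successfully in time
#   "hung"      the command was still running at the deadline
#   "failed(N)" the command exited on its own with status N
probe() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when it had to stop the command
        echo "hung"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed($status)"
    fi
}
```

With this in place, the last line of the repro could become `probe 10 docker ps`, which reports a hang instead of blocking the terminal.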
pmoust commented Feb 16, 2016

I filed it here (as opposed to just keeping it posted in docker/docker) because the common denominator seems to be overlayfs.

See docker/docker#13885 (comment)

I am able to reproduce across all channels.

pmoust commented Feb 29, 2016

@crawford FWIW, Red Hat submitted a patch on Feb 16 for LVM/devicemapper. I don't know whether it's relevant for CoreOS as well: docker/docker#13885 (comment)

Member
crawford commented Mar 1, 2016

@pmoust it looks like this was fixed in the 4.3 kernel. Can you try to repro on Beta (running 4.3.6)?
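
Before re-running the repro, it may be worth confirming that the host really runs a 4.3 or newer kernel. A rough sketch, assuming GNU `sort` with `-V` (version sort) is available; the helper name `version_ge` is hypothetical:

```shell
# version_ge ACTUAL MINIMUM
# Succeeds (exit 0) when ACTUAL is the same as or newer than MINIMUM,
# using GNU sort's version ordering to compare the two strings.
version_ge() {
    lowest=$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, `version_ge "$(uname -r)" 4.3 && echo "kernel is new enough"` prints the message only on 4.3 or later.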

pmoust commented Mar 2, 2016

@crawford

docker: An error occurred trying to connect: Post http:///var/run/docker.sock/v1.22/containers/723381b31b91ca0b84eab311916ef57c716b068c6d52a24cce6e4f3049524544/start: EOF.
SIGABRT: abort
PC=0x7f0f5c55069b

goroutine 0 [idle]:

goroutine 1 [running]:
runtime.switchtoM()
    /usr/lib/go/src/runtime/asm_amd64.s:198 fp=0xc208018798 sp=0xc208018790
docker: An error occurred trying to connect: Post http:///var/run/docker.sock/v1.22/containers/faf25e2db84e8a7a0d8f8eb346364d19f50916150878d41117983b1c8429d630/start: EOF.
docker: An error occurred trying to connect: Post http:///var/run/docker.sock/v1.22/containers/create: EOF.
See 'docker run --help'.
docker: An error occurred trying to connect: Post http:///var/run/docker.sock/v1.22/containers/create: EOF.
See 'docker run --help'.
runtime.main()
    /usr/lib/go/src/runtime/proc.go:32 +0x58 fp=0xc2080187e0 sp=0xc208018798
runtime.goexit()
    /usr/lib/go/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080187e8 sp=0xc2080187e0

rax     0x0
rbx     0x7ffd962bb580
rcx     0x7f0f5c55069b
rdx     0x6
rdi     0xa0b
rsi     0xa0b
rbp     0x7ffd962bb130
rsp     0x7ffd962bb130
r8      0x7f0f5c8c28e0
r9      0x7f0f5d09f880
r10     0x8
r11     0x202
r12     0x1059300
r13     0x7ffd962bb700
r14     0x0
r15     0x0
rip     0x7f0f5c55069b
rflags  0x202
cs      0x33
fs      0x0
gs      0x0

The output above is from trying to spawn 300 containers, as in the repro.

When spawning 100, I get:

docker: Error response from daemon: device or resource busy.

After that, the daemon is responsive again.

core@pph-01 ~ $ sudo journalctl -u docker
-- Logs begin at Wed 2016-03-02 10:33:40 UTC, end at Wed 2016-03-02 11:09:44 UTC. --
Mar 02 11:00:02 pph-01 systemd[1]: Started Docker Application Container Engine.
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.132307754Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.207303222Z" level=info msg="Firewalld running: false"
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.423448196Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.791863182Z" level=info msg="Loading containers: start."
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.792947146Z" level=info msg="Loading containers: done."
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.793981034Z" level=info msg="Daemon has completed initialization"
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.794610639Z" level=info msg="Docker daemon" commit=88b8b3a execdriver=native-0.2 graphdriver=overlay version=1.10.1
Mar 02 11:00:03 pph-01 dockerd[1405]: time="2016-03-02T11:00:03.814946014Z" level=info msg="API listen on /var/run/docker.sock"
Mar 02 11:01:32 pph-01 systemd[1]: docker.service: Main process exited, code=killed, status=9/KILL
Mar 02 11:01:32 pph-01 systemd[1]: docker.service: Unit entered failed state.
Mar 02 11:01:32 pph-01 systemd[1]: docker.service: Failed with result 'signal'.
Mar 02 11:01:32 pph-01 systemd[1]: Started Docker Application Container Engine.
Mar 02 11:01:34 pph-01 dockerd[2732]: time="2016-03-02T11:01:34.048587919Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Mar 02 11:01:34 pph-01 dockerd[2732]: time="2016-03-02T11:01:34.097744894Z" level=info msg="Firewalld running: false"
Mar 02 11:01:39 pph-01 dockerd[2732]: time="2016-03-02T11:01:39.836851360Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.255597275Z" level=info msg="Loading containers: start."
Mar 02 11:01:40 pph-01 dockerd[2732]: ........................................time="2016-03-02T11:01:40.328452930Z" level=error msg="Failed to load container c057533b806b2b7a2b5936bb561c491b6704d778434e124611a912eb31ff2616: EOF"
Mar 02 11:01:40 pph-01 dockerd[2732]: .........time="2016-03-02T11:01:40.353721114Z" level=error msg="Error unmounting container c77f0e8a4c82096da0c3c889c16af81249729ce7df8a98a0aac6c583224848c2: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.361595493Z" level=error msg="Error unmounting container 5fb0363189478651cd11208b117a431e4979165273f2b722826efd2c533eea49: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.370872051Z" level=error msg="Error unmounting container 6d4cb30f12ef7b33862666a95277ba79e218267932b7303188154ca567069540: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.377371465Z" level=error msg="Error unmounting container 28cd0833fc66c77943dcdc7f5e419a2a585d789ac72a1e28b5f7f4abe37c503a: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.385625964Z" level=error msg="Error unmounting container 80ce0199d0a0cb41623ee1fe5409d109cfc014228150d82beeda08317c5cb239: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.393825672Z" level=error msg="Error unmounting container 26b0b4e0bd016b5e8eb63c886c8cac341f890b12a063e2634e6617d54cb1e919: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.398095296Z" level=error msg="Error unmounting container 2e3b8d6048357c1cb34fcc7a322f355c18a1dde8f6244a69d6c62125bddc1b11: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.403058614Z" level=error msg="Error unmounting container cce358333f28b71de52039304f136828d182bca02bb4e4650d66eceab9347d08: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.425571872Z" level=error msg="Error unmounting container 2f70e710cf3cfa6e279703e99ec69a1156554420718b6b044e5e218c15c52086: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.435999082Z" level=error msg="Error unmounting container c47599615443977fb43bc48df5f012127dd3e2b44497f56f21427f97684ab61a: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.442294474Z" level=error msg="Error unmounting container 9b433c46cc4a5e4e67e90e6bdae77b407f049325478b08560f0d318411393823: not mounted"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.451818433Z" level=info msg="Loading containers: done."
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.452317241Z" level=info msg="Daemon has completed initialization"
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.452387958Z" level=info msg="Docker daemon" commit=88b8b3a execdriver=native-0.2 graphdriver=overlay version=1.10.1
Mar 02 11:01:40 pph-01 dockerd[2732]: time="2016-03-02T11:01:40.460915662Z" level=info msg="API listen on /var/run/docker.sock"
Mar 02 11:01:41 pph-01 dockerd[2732]: time="2016-03-02T11:01:41.845616969Z" level=error msg="Clean up Error! Cannot destroy container 568332e6f6b12f3d6b83d6e484eb7701da1e3e8bcf3f2f0574b9c78b79d1a134: nosuchcontainer: No such container: 56
Mar 02 11:01:41 pph-01 dockerd[2732]: time="2016-03-02T11:01:41.845737389Z" level=error msg="Handler for POST /v1.22/containers/create returned error: device or resource busy"
Mar 02 11:08:08 pph-01 dockerd[2732]: time="2016-03-02T11:08:08.718147395Z" level=error msg="Clean up Error! Cannot destroy container b270ba4a174d144abe7363f818d902227b0563a1078b02667a5b77344131c24a: nosuchcontainer: No such container: b2
Mar 02 11:08:08 pph-01 dockerd[2732]: time="2016-03-02T11:08:08.718258628Z" level=error msg="Handler for POST /v1.22/containers/create returned error: device or resource busy"
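
A journal like the one above is easier to scan when the error lines are grouped by message. The following is a minimal sketch, assuming standard `grep`, `sed`, `sort` and `uniq`; the function name `summarize_errors` and the `<id>` placeholder are hypothetical:

```shell
# summarize_errors
# Reads dockerd log lines on stdin and prints each distinct
# level=error message with a count, most frequent first.
# 64-character container IDs are replaced by "<id>" so that
# identical errors for different containers are grouped together.
summarize_errors() {
    grep 'level=error' |
        sed -e 's/.*msg="//' \
            -e 's/[0-9a-f]\{64\}/<id>/g' \
            -e 's/"$//' |
        sort | uniq -c | sort -rn
}
```

It could be fed with `sudo journalctl -u docker --no-pager | summarize_errors`.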

System

core@pph-01 ~ $ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.10.1
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: host bridge null
Kernel Version: 4.4.1-coreos
Operating System: CoreOS 970.1.0 (Coeur Rouge)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.958 GiB
Name: pph-01
ID: KEF3:6KGA:7YPU:P37W:NXBC:H3GX:KMHP:7BC2:NW6C:G3BO:FKDZ:K26K
Username: pmoust
Registry: https://index.docker.io/v1/
core@pph-01 ~ $ docker version
Client:
 Version:      1.10.1
 API version:  1.22
 Go version:   go1.4.3
 Git commit:   88b8b3a
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.1
 API version:  1.22
 Go version:   go1.4.3
 Git commit:   88b8b3a
 Built:
 OS/Arch:      linux/amd64
@pmoust pmoust referenced this issue in docker/docker Mar 2, 2016
Open

Docker Daemon Hangs under load #13885

@crawford crawford added this to the CoreOS Alpha 1263.0.0 milestone Dec 2, 2016
@dm0- dm0- self-assigned this Dec 6, 2016
@dm0- dm0- referenced this issue in coreos/coreos-overlay Dec 14, 2016
Merged

app-emulation/docker: bump to v1.12.4 #2317

Member
dm0- commented Dec 14, 2016

We are updating to Docker 1.12.4, which contains the upstream fixes for the docker daemon deadlocks (docker/docker#29095, docker/docker#29141). It will be available in the alpha later this week. You can reopen this if problems persist.
