Starting a bunch of containers in parallel with a short task sometimes results in "OCI runtime start failed" #647

Open
1 of 3 tasks
cakoolen opened this issue Apr 11, 2019 · 2 comments


@cakoolen

There appears to be a race condition in container start-up. Starting many containers in parallel can trigger it: some containers do not exit successfully, even though every one of them runs the exact same executable in the exact same image.

The syslog reports the following information:

$ grep "dockerd\[" syslog |grep -e "4a9d55f43a7f8c42ee93a9f3d96bcb9b243f7a2e4e748734753ab4abb5a16b78"
Apr  9 09:16:52 dockerd[1157]: time="2019-04-09T09:16:52+02:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/4a9d55f43a7f8c42ee93a9f3d96bcb9b243f7a2e4e748734753ab4abb5a16b78/shim.sock" debug=false module="containerd/tasks" pid=29033
Apr  9 09:16:59 dockerd[1157]: time="2019-04-09T09:16:59.631212505+02:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=4a9d55f43a7f8c42ee93a9f3d96bcb9b243f7a2e4e748734753ab4abb5a16b78 exec-id=4a9d55f43a7f8c42ee93a9f3d96bcb9b243f7a2e4e748734753ab4abb5a16b78 exec-pid=29124
Apr  9 09:17:00 dockerd[1157]: time="2019-04-09T09:17:00+02:00" level=info msg="shim reaped" id=4a9d55f43a7f8c42ee93a9f3d96bcb9b243f7a2e4e748734753ab4abb5a16b78 module="containerd/tasks"
Apr  9 09:17:01 dockerd[1157]: time="2019-04-09T09:17:01.394884980+02:00" level=error msg="4a9d55f43a7f8c42ee93a9f3d96bcb9b243f7a2e4e748734753ab4abb5a16b78 cleanup: failed to delete container from containerd: no such container"
Apr  9 09:17:01 dockerd[1157]: time="2019-04-09T09:17:01.808935703+02:00" level=error msg="Handler for POST /v1.37/containers/4a9d55f43a7f8c42ee93a9f3d96bcb9b243f7a2e4e748734753ab4abb5a16b78/start returned error: OCI runtime start failed: container process is already dead: unknown"
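To find every container that hit this error rather than grepping for one ID at a time, something like the following might help. It is only a sketch: the function name `failed_container_ids` is made up for illustration, and it assumes the daemon logs the failing request path in the same `/containers/<id>/start` form as the last line above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original report): read dockerd log
# lines on stdin and print the unique IDs of containers whose start request
# returned "OCI runtime start failed".
failed_container_ids() {
  grep 'OCI runtime start failed' \
    | grep -oE '/containers/[0-9a-f]{64}/start' \
    | cut -d/ -f3 \
    | sort -u
}
```

Usage, assuming the daemon logs to a file named syslog as in the grep above: `failed_container_ids < syslog`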
  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

docker run should always return the output of the executed command.

Actual behavior

Sometimes docker returns:
"docker: Error response from daemon: OCI runtime start failed: container process is already dead: unknown."

Steps to reproduce the behavior

Run this script a few times (in parallel):

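#!/bin/bash

# Image used for every container; any image that ships /bin/sh, getent and awk works.
BUILDER_IMAGE=ubuntu

# Launch 101 short-lived containers in the background, all running the same command.
for i in $(seq 0 100); do
  (docker run --rm --entrypoint /bin/sh "$BUILDER_IMAGE" -c "getent passwd 1 | awk -F: '{ print \$1 }'") &
done

# Block until every background container has exited (instead of a fixed sleep).
wait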
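Because the failure is intermittent, it can help to count how many runs actually fail. Below is a rough variant of the script above that captures each container's output in its own log file and counts the logs containing the error. The names `DOCKER`, `IMAGE`, `N`, `run_batch` and `count_failures` are made up for this sketch and are not part of the original script.

```shell
#!/bin/bash
# Sketch only: DOCKER, IMAGE and N are hypothetical knobs for this example.
DOCKER=${DOCKER:-docker}
IMAGE=${IMAGE:-ubuntu}
N=${N:-100}

# Start N containers in parallel, writing each one's stdout/stderr to
# <outdir>/<i>.log, then wait for all of them to finish.
run_batch() {
  local outdir=$1 i
  for i in $(seq 1 "$N"); do
    "$DOCKER" run --rm --entrypoint /bin/sh "$IMAGE" \
      -c "getent passwd 1 | awk -F: '{ print \$1 }'" \
      > "$outdir/$i.log" 2>&1 &
  done
  wait
}

# Print the number of log files in <dir> that contain the error message.
count_failures() {
  local dir=$1 f count=0
  for f in "$dir"/*.log; do
    if [ -f "$f" ] && grep -q "OCI runtime start failed" "$f"; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

Usage: `outdir=$(mktemp -d); run_batch "$outdir"; echo "failed: $(count_failures "$outdir") of $N"`. Keeping one log per container also makes it easy to inspect the output of a specific failed run afterwards.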

Output of docker version:

Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        6247962
 Built:             Tue Feb 26 23:56:24 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       6247962
  Built:            Tue Feb 12 22:47:29 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 9
 Running: 5
 Paused: 0
 Stopped: 4
Images: 237
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: v0.18.0 (expected: fec3683b971d9c3ef73f284f176672c44b448662)
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-47-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.429GiB
Name: HRLLL-RD01
ID: IY6S:PLHU:HDAW:I2MI:IPOW:5IYR:TG52:6MUZ:SL4U:WMHE:UNJU:4KS3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

Linux version:
Physical device:
Dell Latitude
Intel Core I5 vPro 7th gen

@liggitt

liggitt commented Dec 19, 2019

Tracked down to opencontainers/runc#2183 (proposed fix in opencontainers/runc#2185).

@iMoses

iMoses commented Jul 12, 2020

I'm still experiencing this issue.
