Multiple run of docker run --rm hangs in a host #115

Open
2 of 3 tasks
zemanlx opened this issue Sep 26, 2017 · 10 comments

Comments

@zemanlx

zemanlx commented Sep 26, 2017

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behaviour

When I run docker run --rm <image> <command> and the command exits, the container should be removed from the docker ps list and the docker run process should also exit.

Actual behaviour

This happens when multiple containers run in parallel. The command sends its output (and probably exits), but the docker run process does not exit and the container stays in the Dead state. I have a computationally heavy, single-threaded program that I want to run in parallel in multiple containers on a multicore host (64 cores).

I have to use CentOS and Red Hat. I have found that Docker 17.03.2 and lower work flawlessly, but 17.06 and above have this issue. It fails both on Red Hat 7.4 with Docker EE and on CentOS with Docker CE.

BTW I also do not have this issue on Ubuntu 16.04.

Steps to reproduce the behaviour

I have created the script test_parallel.sh to test this behaviour:

#! /usr/bin/env bash

set -evuo pipefail

run_in_parrallel=${1:-60}
echo -e "\n### Show version info ###"
docker info

echo -e "\n### Run ${run_in_parrallel} containers in background ###"
date
for id in $(seq 1 ${run_in_parrallel}); do
  (docker run --rm --name ping-$id -h ping-$id alpine ash -c 'echo "$(date) && ${HOSTNAME}" '&)
done

echo -e "\n### Wait ${run_in_parrallel} seconds ###" 
sleep ${run_in_parrallel}

echo -e "\n### Show what is still not removed ###"
date
docker ps -a

echo -e "\n### Removing dead containers ###"
date
docker rm -f $(docker ps -a | awk '$2~/alpine/ {print $1}')
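The script above relies on a fixed sleep followed by a docker ps -a listing. A minimal sketch of an alternative check, assuming GNU coreutils timeout is installed (it exits with status 124 when the time limit expires): each docker run client is wrapped in a time limit, and the clients that did not exit in time are counted. The function name run_parallel_with_timeout and its two parameters are hypothetical, not part of the original script.

```shell
# Hypothetical helper: start N short-lived "docker run --rm" clients in
# parallel and count how many fail to exit within LIMIT seconds.
# Assumes GNU coreutils "timeout", which exits with 124 on expiry.
run_parallel_with_timeout() {
  n=${1:-60}       # number of containers to start
  limit=${2:-60}   # seconds each client is allowed to run
  pids=""
  hung=0
  for id in $(seq 1 "$n"); do
    timeout "$limit" docker run --rm --name "ping-$id" -h "ping-$id" \
      alpine ash -c 'echo "$(date) && ${HOSTNAME}"' &
    pids="$pids $!"
  done
  # Collect every client; exit status 124 means the time limit was hit.
  for pid in $pids; do
    status=0
    wait "$pid" || status=$?
    if [ "$status" -eq 124 ]; then
      hung=$((hung + 1))
    fi
  done
  echo "$hung of $n docker run clients did not exit within ${limit}s"
}
```

For example, run_parallel_with_timeout 60 60 prints each container's output followed by a one-line summary. A client killed this way may still leave its container in the Dead state, so the cleanup step is still needed.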

Examples of NON-working combinations:

CentOS Linux 7 with Docker 17.06.2-ce
# ./test_parallel.sh 

run_in_parrallel=${1:-60}
echo -e "\n### Show version info ###"

### Show version info ###
docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.06.2-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.456GiB
Name: centos74-dockerce
ID: TSDC:GC3E:3UBM:K3KF:IT7I:U6YP:UHVH:ZFG7:F73R:OHLG:OHJH:KG3A
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false


echo -e "\n### Run ${run_in_parrallel} containers in background ###"

### Run 60 containers in background ###
date
Tue 26 Sep 13:04:56 UTC 2017
for id in $(seq 1 ${run_in_parrallel}); do
  (docker run --rm --name ping-$id -h ping-$id alpine ash -c 'echo "$(date) && ${HOSTNAME}" '&)
done

echo -e "\n### Wait ${run_in_parrallel} seconds ###" 
### Wait 60 seconds ###
sleep ${run_in_parrallel}
Tue Sep 26 13:04:59 UTC 2017 && ping-1
Tue Sep 26 13:05:00 UTC 2017 && ping-10
Tue Sep 26 13:05:01 UTC 2017 && ping-4
Tue Sep 26 13:05:01 UTC 2017 && ping-11
Tue Sep 26 13:05:01 UTC 2017 && ping-2
Tue Sep 26 13:05:02 UTC 2017 && ping-9
Tue Sep 26 13:05:02 UTC 2017 && ping-5
Tue Sep 26 13:05:03 UTC 2017 && ping-6
Tue Sep 26 13:05:03 UTC 2017 && ping-7
Tue Sep 26 13:05:03 UTC 2017 && ping-8
Tue Sep 26 13:05:04 UTC 2017 && ping-16
Tue Sep 26 13:05:04 UTC 2017 && ping-3
Tue Sep 26 13:05:04 UTC 2017 && ping-13
Tue Sep 26 13:05:05 UTC 2017 && ping-30
Tue Sep 26 13:05:05 UTC 2017 && ping-31
Tue Sep 26 13:05:06 UTC 2017 && ping-12
Tue Sep 26 13:05:07 UTC 2017 && ping-23
Tue Sep 26 13:05:07 UTC 2017 && ping-15
Tue Sep 26 13:05:08 UTC 2017 && ping-18
Tue Sep 26 13:05:08 UTC 2017 && ping-17
Tue Sep 26 13:05:08 UTC 2017 && ping-22
Tue Sep 26 13:05:09 UTC 2017 && ping-19
Tue Sep 26 13:05:09 UTC 2017 && ping-25
Tue Sep 26 13:05:10 UTC 2017 && ping-27
Tue Sep 26 13:05:10 UTC 2017 && ping-20
Tue Sep 26 13:05:11 UTC 2017 && ping-14
Tue Sep 26 13:05:11 UTC 2017 && ping-29
Tue Sep 26 13:05:11 UTC 2017 && ping-21
Tue Sep 26 13:05:12 UTC 2017 && ping-28
Tue Sep 26 13:05:12 UTC 2017 && ping-26
Tue Sep 26 13:05:13 UTC 2017 && ping-24
Tue Sep 26 13:05:14 UTC 2017 && ping-45
Tue Sep 26 13:05:15 UTC 2017 && ping-32
Tue Sep 26 13:05:15 UTC 2017 && ping-46
Tue Sep 26 13:05:15 UTC 2017 && ping-35
Tue Sep 26 13:05:16 UTC 2017 && ping-44
Tue Sep 26 13:05:17 UTC 2017 && ping-47
Tue Sep 26 13:05:17 UTC 2017 && ping-40
Tue Sep 26 13:05:17 UTC 2017 && ping-36
Tue Sep 26 13:05:17 UTC 2017 && ping-34
Tue Sep 26 13:05:18 UTC 2017 && ping-43
Tue Sep 26 13:05:18 UTC 2017 && ping-37
Tue Sep 26 13:05:18 UTC 2017 && ping-48
Tue Sep 26 13:05:19 UTC 2017 && ping-58
Tue Sep 26 13:05:20 UTC 2017 && ping-33
Tue Sep 26 13:05:20 UTC 2017 && ping-52
Tue Sep 26 13:05:20 UTC 2017 && ping-53
Tue Sep 26 13:05:21 UTC 2017 && ping-54
Tue Sep 26 13:05:21 UTC 2017 && ping-49
Tue Sep 26 13:05:22 UTC 2017 && ping-55
Tue Sep 26 13:05:22 UTC 2017 && ping-51
Tue Sep 26 13:05:23 UTC 2017 && ping-57
Tue Sep 26 13:05:23 UTC 2017 && ping-41
Tue Sep 26 13:05:23 UTC 2017 && ping-56
Tue Sep 26 13:05:24 UTC 2017 && ping-60
Tue Sep 26 13:05:24 UTC 2017 && ping-50
Tue Sep 26 13:05:25 UTC 2017 && ping-42
Tue Sep 26 13:05:25 UTC 2017 && ping-59
Tue Sep 26 13:05:25 UTC 2017 && ping-39
Tue Sep 26 13:05:25 UTC 2017 && ping-38

echo -e "\n### Show what is still not removed ###"

### Show what is still not removed ###
date
Tue 26 Sep 13:05:58 UTC 2017
docker ps -a
CONTAINER ID        IMAGE               COMMAND                   CREATED              STATUS              PORTS               NAMES
a8d348ec5efb        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-54
71d4446dcea5        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-49
4a9ced854c4d        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-43
b76ca52422cc        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-45
b90c81c8e2f9        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-35
9d137694a4e2        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-36
7afa30aee6e6        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-32
32bfa720a9fa        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-28
7192383e1b57        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-33
5d2e8073bd45        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-27
a8c8cdc05c3e        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-24
a417fbdec214        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-26
7edd8a41cce4        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-20
32159bbc4a0b        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-19
57866869b294        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-16
51600e4ea016        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-7
12200d4fcb4c        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-4
9777631b4d2b        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-3

echo -e "\n### Removing dead containers ###"

### Removing dead containers ###
date
Tue 26 Sep 13:05:58 UTC 2017
docker rm -f $(docker ps -a | awk '$2~/alpine/ {print $1}')
a8d348ec5efb
71d4446dcea5
4a9ced854c4d
b76ca52422cc
b90c81c8e2f9
9d137694a4e2
7afa30aee6e6
32bfa720a9fa
7192383e1b57
5d2e8073bd45
a8c8cdc05c3e
a417fbdec214
7edd8a41cce4
32159bbc4a0b
57866869b294
51600e4ea016
12200d4fcb4c
9777631b4d2b
Red Hat Enterprise Linux Server 7.4 with Docker 17.06.2-ee-3
# ./test_parallel.sh                                                                                                                                                                                                  

run_in_parrallel=${1:-60}
echo -e "\n### Show version info ###"

### Show version info ###
docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 17.06.2-ee-3
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.1.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.456GiB
Name: rh73-dockeree
ID: YOAT:ZK56:J33U:CAQP:P26H:UZ72:2TSG:A7Z7:4N4S:YVYP:N3S6:VTIN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false


echo -e "\n### Run ${run_in_parrallel} containers in background ###"

### Run 60 containers in background ###
date 
Tue 26 Sep 12:32:34 UTC 2017
for id in $(seq 1 ${run_in_parrallel}); do 
  (docker run --rm --name ping-$id -h ping-$id alpine ash -c 'echo "$(date) && ${HOSTNAME}" '&) 
done 

echo -e "\n### Wait ${run_in_parrallel} seconds ###" 

### Wait 60 seconds ###
sleep ${run_in_parrallel}
Tue Sep 26 12:32:36 UTC 2017 && ping-1
Tue Sep 26 12:32:36 UTC 2017 && ping-2
Tue Sep 26 12:32:37 UTC 2017 && ping-3
Tue Sep 26 12:32:37 UTC 2017 && ping-6
Tue Sep 26 12:32:37 UTC 2017 && ping-4
Tue Sep 26 12:32:38 UTC 2017 && ping-5
Tue Sep 26 12:32:39 UTC 2017 && ping-7
Tue Sep 26 12:32:39 UTC 2017 && ping-8
Tue Sep 26 12:32:39 UTC 2017 && ping-9
Tue Sep 26 12:32:40 UTC 2017 && ping-12
Tue Sep 26 12:32:40 UTC 2017 && ping-11
Tue Sep 26 12:32:40 UTC 2017 && ping-10
Tue Sep 26 12:32:41 UTC 2017 && ping-13
Tue Sep 26 12:32:41 UTC 2017 && ping-16
Tue Sep 26 12:32:41 UTC 2017 && ping-17
Tue Sep 26 12:32:41 UTC 2017 && ping-15
Tue Sep 26 12:32:42 UTC 2017 && ping-19
Tue Sep 26 12:32:42 UTC 2017 && ping-18
Tue Sep 26 12:32:43 UTC 2017 && ping-20
Tue Sep 26 12:32:43 UTC 2017 && ping-21
Tue Sep 26 12:32:43 UTC 2017 && ping-23
Tue Sep 26 12:32:44 UTC 2017 && ping-22
Tue Sep 26 12:32:44 UTC 2017 && ping-14
Tue Sep 26 12:32:44 UTC 2017 && ping-24
Tue Sep 26 12:32:45 UTC 2017 && ping-25
Tue Sep 26 12:32:45 UTC 2017 && ping-27
Tue Sep 26 12:32:45 UTC 2017 && ping-29
Tue Sep 26 12:32:45 UTC 2017 && ping-28
Tue Sep 26 12:32:46 UTC 2017 && ping-33
Tue Sep 26 12:32:46 UTC 2017 && ping-31
Tue Sep 26 12:32:47 UTC 2017 && ping-26
Tue Sep 26 12:32:47 UTC 2017 && ping-30
Tue Sep 26 12:32:47 UTC 2017 && ping-34
Tue Sep 26 12:32:48 UTC 2017 && ping-37
Tue Sep 26 12:32:48 UTC 2017 && ping-36
Tue Sep 26 12:32:48 UTC 2017 && ping-38
Tue Sep 26 12:32:49 UTC 2017 && ping-39
Tue Sep 26 12:32:49 UTC 2017 && ping-32
Tue Sep 26 12:32:49 UTC 2017 && ping-41
Tue Sep 26 12:32:49 UTC 2017 && ping-42
Tue Sep 26 12:32:50 UTC 2017 && ping-35
Tue Sep 26 12:32:50 UTC 2017 && ping-40
Tue Sep 26 12:32:51 UTC 2017 && ping-51
Tue Sep 26 12:32:51 UTC 2017 && ping-58
Tue Sep 26 12:32:52 UTC 2017 && ping-55
Tue Sep 26 12:32:52 UTC 2017 && ping-59
Tue Sep 26 12:32:52 UTC 2017 && ping-43
Tue Sep 26 12:32:53 UTC 2017 && ping-48
Tue Sep 26 12:32:53 UTC 2017 && ping-47
Tue Sep 26 12:32:54 UTC 2017 && ping-52
Tue Sep 26 12:32:54 UTC 2017 && ping-45
Tue Sep 26 12:32:54 UTC 2017 && ping-53
Tue Sep 26 12:32:55 UTC 2017 && ping-56
Tue Sep 26 12:32:55 UTC 2017 && ping-44
Tue Sep 26 12:32:55 UTC 2017 && ping-54
Tue Sep 26 12:32:56 UTC 2017 && ping-46
Tue Sep 26 12:32:56 UTC 2017 && ping-57
Tue Sep 26 12:32:56 UTC 2017 && ping-49
Tue Sep 26 12:32:56 UTC 2017 && ping-60
Tue Sep 26 12:32:56 UTC 2017 && ping-50

echo -e "\n### Show what is still not removed ###"

### Show what is still not removed ###
date 
Tue 26 Sep 12:33:36 UTC 2017
docker ps -a
CONTAINER ID        IMAGE               COMMAND                   CREATED              STATUS              PORTS               NAMES
28f19d9879fd        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-47
071e1fb7c308        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-51
b16a2eddcdda        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-7
f12bf78de735        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-5
6dbd5acbad9a        alpine              "ash -c 'echo \"$(d..."   About a minute ago   Dead                                    ping-3

echo -e "\n### Removing dead containers ###"

### Removing dead containers ###
date
Tue 26 Sep 12:33:36 UTC 2017
docker rm -f $(docker ps -a | awk '$2~/alpine/ {print $1}')
28f19d9879fd
071e1fb7c308
b16a2eddcdda
f12bf78de735
6dbd5acbad9a

The docker run client processes hang until the dead containers are removed:

# ps aux |grep docker
root     11294  2.3  1.5 678920 54628 ?        Ssl  13:04   0:07 /usr/bin/dockerd
root     11297  0.3  0.3 530824 13988 ?        Ssl  13:04   0:01 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc
root     14791  0.1  0.2 123060 10388 pts/0    Sl   13:09   0:00 docker run --rm --name ping-5 -h ping-5 alpine ash -c echo "$(date) && ${HOSTNAME}" 
root     14801  0.0  0.3 123060 12456 pts/0    Sl   13:09   0:00 docker run --rm --name ping-7 -h ping-7 alpine ash -c echo "$(date) && ${HOSTNAME}" 
root     14818  0.0  0.2 123060 10404 pts/0    Sl   13:09   0:00 docker run --rm --name ping-10 -h ping-10 alpine ash -c echo "$(date) && ${HOSTNAME}" 
root     14822  0.0  0.2 123060 10428 pts/0    Sl   13:09   0:00 docker run --rm --name ping-11 -h ping-11 alpine ash -c echo "$(date) && ${HOSTNAME}" 
root     14848  0.0  0.2 123060 10616 pts/0    Sl   13:09   0:00 docker run --rm --name ping-17 -h ping-17 alpine ash -c echo "$(date) && ${HOSTNAME}" 
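While the clients are stuck, the two symptoms shown above (Dead containers and docker run processes that never exit) can be listed side by side. A minimal sketch; report_stuck_runs is a hypothetical helper, and it assumes pgrep from procps, whose -a option prints the full command line of each match.

```shell
# Hypothetical helper: show the Dead containers left behind by
# "docker run --rm" and the client processes still waiting on them.
report_stuck_runs() {
  echo "Dead containers:"
  docker ps -a --filter status=dead --format '{{.ID}} {{.Names}}'
  echo "Hung docker run clients:"
  # pgrep exits non-zero when nothing matches; print a marker instead.
  pgrep -af 'docker run --rm' || echo "(none)"
}
```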

An example of a working combination

CentOS Linux 7 with Docker 17.03.2-ce
# ./test_parallel.sh

run_in_parrallel=${1:-60}
echo -e "\n### Show version info ###"

### Show version info ###
docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.03.2-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.456 GiB
Name: centos74-dockerce
ID: TSDC:GC3E:3UBM:K3KF:IT7I:U6YP:UHVH:ZFG7:F73R:OHLG:OHJH:KG3A
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false


echo -e "\n### Run ${run_in_parrallel} containers in background ###"

### Run 60 containers in background ###
date
Tue 26 Sep 12:52:52 UTC 2017
for id in $(seq 1 ${run_in_parrallel}); do
  (docker run --rm --name ping-$id -h ping-$id alpine ash -c 'echo "$(date) && ${HOSTNAME}" '&)
done

echo -e "\n### Wait ${run_in_parrallel} seconds ###" 
                                                                                                                                                                                                                                  
### Wait 60 seconds ###
sleep ${run_in_parrallel}
Tue Sep 26 12:53:01 UTC 2017 && ping-1
Tue Sep 26 12:53:01 UTC 2017 && ping-3
Tue Sep 26 12:53:02 UTC 2017 && ping-42
Tue Sep 26 12:53:02 UTC 2017 && ping-2
Tue Sep 26 12:53:02 UTC 2017 && ping-6
Tue Sep 26 12:53:03 UTC 2017 && ping-7
Tue Sep 26 12:53:03 UTC 2017 && ping-23
Tue Sep 26 12:53:04 UTC 2017 && ping-17
Tue Sep 26 12:53:04 UTC 2017 && ping-11
Tue Sep 26 12:53:04 UTC 2017 && ping-14
Tue Sep 26 12:53:04 UTC 2017 && ping-10
Tue Sep 26 12:53:04 UTC 2017 && ping-21
Tue Sep 26 12:53:05 UTC 2017 && ping-5
Tue Sep 26 12:53:05 UTC 2017 && ping-43
Tue Sep 26 12:53:06 UTC 2017 && ping-57
Tue Sep 26 12:53:06 UTC 2017 && ping-26
Tue Sep 26 12:53:06 UTC 2017 && ping-8
Tue Sep 26 12:53:06 UTC 2017 && ping-47
Tue Sep 26 12:53:06 UTC 2017 && ping-20
Tue Sep 26 12:53:06 UTC 2017 && ping-32
Tue Sep 26 12:53:06 UTC 2017 && ping-58
Tue Sep 26 12:53:06 UTC 2017 && ping-29
Tue Sep 26 12:53:06 UTC 2017 && ping-49
Tue Sep 26 12:53:07 UTC 2017 && ping-12
Tue Sep 26 12:53:07 UTC 2017 && ping-51
Tue Sep 26 12:53:08 UTC 2017 && ping-48
Tue Sep 26 12:53:08 UTC 2017 && ping-52
Tue Sep 26 12:53:08 UTC 2017 && ping-46
Tue Sep 26 12:53:08 UTC 2017 && ping-9
Tue Sep 26 12:53:08 UTC 2017 && ping-18
Tue Sep 26 12:53:09 UTC 2017 && ping-22
Tue Sep 26 12:53:09 UTC 2017 && ping-27
Tue Sep 26 12:53:09 UTC 2017 && ping-41
Tue Sep 26 12:53:09 UTC 2017 && ping-33
Tue Sep 26 12:53:09 UTC 2017 && ping-38
Tue Sep 26 12:53:09 UTC 2017 && ping-15
Tue Sep 26 12:53:10 UTC 2017 && ping-19
Tue Sep 26 12:53:10 UTC 2017 && ping-55
Tue Sep 26 12:53:10 UTC 2017 && ping-39
Tue Sep 26 12:53:10 UTC 2017 && ping-28
Tue Sep 26 12:53:10 UTC 2017 && ping-16
Tue Sep 26 12:53:10 UTC 2017 && ping-30
Tue Sep 26 12:53:10 UTC 2017 && ping-13
Tue Sep 26 12:53:11 UTC 2017 && ping-24
Tue Sep 26 12:53:11 UTC 2017 && ping-4
Tue Sep 26 12:53:11 UTC 2017 && ping-31
Tue Sep 26 12:53:12 UTC 2017 && ping-35
Tue Sep 26 12:53:13 UTC 2017 && ping-60
Tue Sep 26 12:53:13 UTC 2017 && ping-54
Tue Sep 26 12:53:14 UTC 2017 && ping-40
Tue Sep 26 12:53:14 UTC 2017 && ping-50
Tue Sep 26 12:53:14 UTC 2017 && ping-44
Tue Sep 26 12:53:14 UTC 2017 && ping-37
Tue Sep 26 12:53:14 UTC 2017 && ping-53
Tue Sep 26 12:53:14 UTC 2017 && ping-45
Tue Sep 26 12:53:14 UTC 2017 && ping-59
Tue Sep 26 12:53:14 UTC 2017 && ping-25
Tue Sep 26 12:53:16 UTC 2017 && ping-36
Tue Sep 26 12:53:16 UTC 2017 && ping-34
Tue Sep 26 12:53:16 UTC 2017 && ping-56

echo -e "\n### Show what is still not removed ###"

### Show what is still not removed ###
date
Tue 26 Sep 12:53:54 UTC 2017
docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

echo -e "\n### Removing dead containers ###"

### Removing dead containers ###
date
Tue 26 Sep 12:53:54 UTC 2017
docker rm -f $(docker ps -a | awk '$2~/alpine/ {print $1}')
"docker rm" requires at least 1 argument(s).
See 'docker rm --help'.

Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers
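In the working case, the final cleanup line fails only because no alpine containers are left, so docker rm receives no arguments. A minimal sketch of a cleanup step that tolerates an empty list, assuming the --filter ancestor=alpine filter selects the same containers as the awk pattern in the script (it matches by image rather than by the text of the IMAGE column); remove_alpine_containers is a hypothetical name.

```shell
# Hypothetical cleanup helper: force-remove all containers created from
# the alpine image, or print a note instead of calling docker rm with
# no arguments when there are none.
remove_alpine_containers() {
  ids=$(docker ps -aq --filter ancestor=alpine)
  if [ -z "$ids" ]; then
    echo "No alpine containers left to remove"
    return 0
  fi
  # Unquoted on purpose: one argument per container ID.
  docker rm -f $ids
}
```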
@vieux

vieux commented Sep 26, 2017

@dnephin could you please take a look? It might be CLI-related.

@dnephin

dnephin commented Sep 26, 2017

Might be related to moby/moby#32237. I'll see if I can reproduce the issue.

@zemanlx
Author

zemanlx commented Sep 27, 2017

It looks like this works with yesterday's release, 17.09.0-ce. Still, it is worth finding out what causes it, as there is no solution for the EE version.

@dnephin

dnephin commented Sep 28, 2017

This seems related to moby/moby#34999

@kolyshkin

kolyshkin commented Sep 28, 2017

@zemanlx

  1. Can you please check whether there are any "error removing" messages in the docker daemon logs (as shown by journalctl -u docker) after you get some hung CLI processes? If yes, this is definitely "ContainerWait on remove: don't stuck on rm fail" moby/moby#34999.

  2. Can you please check whether the issue is gone after executing sudo sysctl fs.may_detach_mounts=1 (you need a RHEL/CentOS 7.4 kernel for that to work)?
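The two checks above can be scripted roughly as follows. This is a sketch only: show_removal_errors and enable_detach_mounts are hypothetical helper names, and writing the setting to a file under /etc/sysctl.d so it survives a reboot is an extra step the comment does not ask for (the default file name is an arbitrary assumption).

```shell
# Check 1 (hypothetical helper): print daemon log lines about failed
# container or layer removal, e.g. "error removing container".
show_removal_errors() {
  journalctl -u docker --no-pager | grep -i 'error removing'
}

# Check 2 (hypothetical helper): enable fs.may_detach_mounts now and
# persist it. Needs root and a RHEL/CentOS 7.4 kernel that has the knob.
enable_detach_mounts() {
  conf=${1:-/etc/sysctl.d/99-docker-detach-mounts.conf}  # assumed path
  sysctl -w fs.may_detach_mounts=1 || return 1
  echo 'fs.may_detach_mounts = 1' > "$conf"
}
```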

@zemanlx
Author

zemanlx commented Sep 29, 2017

@kolyshkin
Regarding 1:

Sep 29 08:17:49 centos74-dockerce dockerd[1194]: time="2017-09-29T08:17:49.235488909Z" level=error msg="Error removing mounted layer 309490d1fd7c6472ab5dcb8ad546fa2f8f4428917164e212fb606e85d764f14a: remove /var/lib/docker/overlay/d75542fc44aa352f8d89dca4c14cf32a332b2717e859d01386ab68a7a32be1b2/merged: device or resource busy"
Sep 29 08:17:49 centos74-dockerce dockerd[1194]: time="2017-09-29T08:17:49.235560677Z" level=error msg="error removing container" container=309490d1fd7c6472ab5dcb8ad546fa2f8f4428917164e212fb606e85d764f14a error="driver \"overlay\" failed to remove root filesystem for 309490d1fd7c6472ab5dcb8ad546fa2f8f4428917164e212fb606e85d764f14a: remove /var/lib/docker/overlay/d75542fc44aa352f8d89dca4c14cf32a332b2717e859d01386ab68a7a32be1b2/merged: device or resource busy"

Regarding 2: that works.

@kolyshkin

Great, so to fix this we need moby/moby#34999 and moby/moby#34886. The first one will fix the issue on older RHEL7 kernels; the second one will eliminate the cause for those running a RHEL 7.4+ kernel.
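Which of the two fixes applies depends on whether the running kernel exposes fs.may_detach_mounts at all. A minimal sketch for checking that, assuming the knob appears as /proc/sys/fs/may_detach_mounts on kernels that support it; detach_mounts_status is a hypothetical name, and its optional argument exists only so the check can be pointed at another file.

```shell
# Hypothetical helper: report whether fs.may_detach_mounts is missing
# (older RHEL 7 kernel), present but off, or enabled.
detach_mounts_status() {
  knob=${1:-/proc/sys/fs/may_detach_mounts}
  if [ ! -e "$knob" ]; then
    echo "unsupported"
  elif [ "$(cat "$knob")" = "1" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}
```

An "unsupported" result corresponds to the older-kernel case that moby/moby#34999 is meant to cover; "disabled" means the sysctl from check 2 above can be tried.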

@mhingston

I believe I'm getting the same or a similar issue on Ubuntu 16.04 (4.4.0-112-generic #135-Ubuntu SMP) using Docker version 17.12.0-ce, build c97c6d6. I'm using cron on the host to run multiple short-lived containers in parallel with the --rm flag. After a number of hours they hang, and when I try to connect to them with docker exec I get:

connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused": unknown

journalctl -u docker

-- Logs begin at Wed 2018-01-24 23:33:27 GMT, end at Wed 2018-01-24 23:52:27 GMT. --
Jan 24 23:33:44 ubuntu dockerd[1239]: time="2018-01-24T23:33:44.023028332Z" level=error msg="stream copy error: reading from a closed fifo"
Jan 24 23:33:44 ubuntu dockerd[1239]: time="2018-01-24T23:33:44.024091853Z" level=error msg="Error running exec 55b5153888c2631d907c735d659b0ea71b0faa7f84b2fb38a3a143444c7727c6 in container: 
Jan 24 23:33:46 ubuntu dockerd[1239]: time="2018-01-24T23:33:46.127394977Z" level=error msg="stream copy error: reading from a closed fifo"
Jan 24 23:33:46 ubuntu dockerd[1239]: time="2018-01-24T23:33:46.128355348Z" level=error msg="Error running exec 58a0d85452dada5f32f95e4aa65ddbcf350e5eef1a52e5eac64f5e1cefc6c1b3 in container: 
Jan 24 23:35:01 ubuntu dockerd[1239]: time="2018-01-24T23:35:01.871958346Z" level=error msg="900f2ed5df349e02fc0009b44ed2af7e2952b548854107221a3db97cbf8381cc cleanup: failed to delete contain
Jan 24 23:35:01 ubuntu dockerd[1239]: time="2018-01-24T23:35:01.883650809Z" level=error msg="Handler for POST /v1.35/containers/900f2ed5df349e02fc0009b44ed2af7e2952b548854107221a3db97cbf8381c
Jan 24 23:35:02 ubuntu dockerd[1239]: time="2018-01-24T23:35:02.002007849Z" level=error msg="25d5d217785fb75edd42e8c9acd6fe6acd8737eed30f86fcbac424702838e025 cleanup: failed to delete contain
Jan 24 23:35:02 ubuntu dockerd[1239]: time="2018-01-24T23:35:02.044491181Z" level=error msg="Handler for POST /v1.35/containers/25d5d217785fb75edd42e8c9acd6fe6acd8737eed30f86fcbac424702838e02
Jan 24 23:40:01 ubuntu dockerd[1239]: time="2018-01-24T23:40:01.730889040Z" level=error msg="335abc34e59e8c399c6b519586f694bea33371c9b9055532bae2c0d48715a1ac cleanup: failed to delete contain
Jan 24 23:40:01 ubuntu dockerd[1239]: time="2018-01-24T23:40:01.794458271Z" level=error msg="Handler for POST /v1.35/containers/335abc34e59e8c399c6b519586f694bea33371c9b9055532bae2c0d48715a1a
Jan 24 23:40:02 ubuntu dockerd[1239]: time="2018-01-24T23:40:02.009045088Z" level=error msg="6ce8cfa93c5a0bcc52f31b91c3810afa18a213be6d4003fde4455a6cdbe68655 cleanup: failed to delete contain
Jan 24 23:40:02 ubuntu dockerd[1239]: time="2018-01-24T23:40:02.020335077Z" level=error msg="Handler for POST /v1.35/containers/6ce8cfa93c5a0bcc52f31b91c3810afa18a213be6d4003fde4455a6cdbe6865
Jan 24 23:45:01 ubuntu dockerd[1239]: time="2018-01-24T23:45:01.610989953Z" level=error msg="7684ad08d34f522567588f4c1d38fe8ee8d6932f24e1539ab256ebf51015581e cleanup: failed to delete contain
Jan 24 23:45:01 ubuntu dockerd[1239]: time="2018-01-24T23:45:01.622443186Z" level=error msg="Handler for POST /v1.35/containers/7684ad08d34f522567588f4c1d38fe8ee8d6932f24e1539ab256ebf51015581
Jan 24 23:45:01 ubuntu dockerd[1239]: time="2018-01-24T23:45:01.871168666Z" level=error msg="d12ef24ebd8b3653d87faec8c74b5f77611d633e7bfddb3c89ee2ca610bb97a0 cleanup: failed to delete contain
Jan 24 23:45:01 ubuntu dockerd[1239]: time="2018-01-24T23:45:01.880626078Z" level=error msg="Handler for POST /v1.35/containers/d12ef24ebd8b3653d87faec8c74b5f77611d633e7bfddb3c89ee2ca610bb97a
Jan 24 23:50:02 ubuntu dockerd[1239]: time="2018-01-24T23:50:02.583294810Z" level=error msg="c1559b8578771139556601cd8d3cf7be8fb3c30e84f5af07bcf6a77519b45ad5 cleanup: failed to delete contain
Jan 24 23:50:02 ubuntu dockerd[1239]: time="2018-01-24T23:50:02.592404755Z" level=error msg="Handler for POST /v1.35/containers/c1559b8578771139556601cd8d3cf7be8fb3c30e84f5af07bcf6a77519b45ad
Jan 24 23:50:02 ubuntu dockerd[1239]: time="2018-01-24T23:50:02.702518261Z" level=error msg="b477eb5e5ccec23d352c9e95547ab1b47e5bb662ac11a85507ee862fe2019cd2 cleanup: failed to delete contain
Jan 24 23:50:02 ubuntu dockerd[1239]: time="2018-01-24T23:50:02.714101453Z" level=error msg="Handler for POST /v1.35/containers/b477eb5e5ccec23d352c9e95547ab1b47e5bb662ac11a85507ee862fe2019cd

@cpuguy83
Collaborator

@mhingston This is a different issue, which we are also tracking. I believe docker-archive/docker-ce#395 would fix it.

Note that it does not seem to be an issue with 17.12 specifically; it is more likely due to recent kernel changes that exposed a race condition in the runc codebase.

@MrGarry2016

Same issue

Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants