create container stuck by run init #5280

Closed
wu0407 opened this issue Mar 29, 2021 · 4 comments
Labels
kind/bug kind/external Issue in external component being tracked by containerd

wu0407 commented Mar 29, 2021

Description
The docker CLI hangs when creating a container on a host that is already running many containers (128 in total). The hang is caused by runc init processes that never exit.
Debugging runc init shows that it hangs in filter.ExportBPF:
https://github.com/opencontainers/runc/blob/12644e614e25b05da6fd08a38ffa0cfe1903fdec/libcontainer/seccomp/patchbpf/enosys_linux.go#L117-L120

Related issue: opencontainers/runc#2828 (comment)

[root@sh-saas-k8s1-node-dev-14 ~]# ps aux |grep "runc init" |head
root         985  0.0  0.0 158716 23116 ?        Ssl  Mar26   0:00 runc init
root        1829  0.0  0.0 168320 16708 ?        Ssl  02:37   0:00 runc init
root        2795  0.0  0.0 168320 19084 ?        Ssl  Mar26   0:00 runc init
root        3521  0.0  0.0 232448 16240 ?        Ssl  02:37   0:00 runc init
root        5115  0.0  0.0 168320 18208 ?        Ssl  02:37   0:00 runc init
root        5254  0.0  0.0 158716 19120 ?        Ssl  Mar26   0:00 runc init
root        6823  0.0  0.0 160124 18896 ?        Ssl  Mar26   0:00 runc init
root        7184  0.0  0.0 158716 16400 ?        Ssl  02:38   0:00 runc init
root        8608  0.0  0.0 158716 19016 ?        Ssl  Mar26   0:00 runc init
root        9352  0.0  0.0 160124 21072 ?        Ssl  02:38   0:00 runc init
# cat /proc/6823/stat
6823 (runc:[2:INIT]) S 6806 6823 6823 0 -1 4194624 424 0 0 0 2 4 0 0 20 0 5 0 678265920 163966976 4724 18446744073709551615 93973857521664 93973866689428 140731346266928 0 0 0 0 0 2143420159 0 0 0 17 2 0 0 0 0 0 93973868788800 93973874988408 93973895401472 140731346267829 140731346267839 140731346267839 140731346268136 0
# cat /proc/6823/status 
Name:   runc:[2:INIT]
State:  S (sleeping)
Tgid:   6823
Ngid:   0
Pid:    6823
PPid:   6806
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups:
NStgid: 6823    1
NSpid:  6823    1
NSpgid: 6823    1
NSsid:  6823    1
VmPeak:   160124 kB
VmSize:   160124 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     18960 kB
VmRSS:     18896 kB
VmData:   136504 kB
VmStk:       132 kB
VmExe:      8956 kB
VmLib:      2208 kB
VmPTE:       108 kB
VmPMD:        20 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
Threads:        5
SigQ:   1/515167
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: ffffffffffc1feff
  pInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
Seccomp:        0
Speculation_Store_Bypass:       vulnerable
Cpus_allowed:   ffffffff
Cpus_allowed_list:      0-31
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        45
nonvoluntary_ctxt_switches:     20
[root@sh-saas-k8s1-node-dev-14 ~]# lsof -p 6823
COMMAND    PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
runc:[2:I 6823 root  cwd       DIR             0,4741     4096 2378446586 /
runc:[2:I 6823 root  rtd       DIR             0,4741     4096 2378446586 /
runc:[2:I 6823 root  txt       REG              253,1 19719152     278544 /
runc:[2:I 6823 root  mem       REG              253,1  2156160     265623 /usr/lib64/libc-2.17.so
runc:[2:I 6823 root  mem       REG              253,1   266680     273930 /usr/lib64/libseccomp.so.2.3.1
runc:[2:I 6823 root  mem       REG              253,1   142232     265649 /usr/lib64/libpthread-2.17.so
runc:[2:I 6823 root  mem       REG              253,1   163400     265614 /usr/lib64/ld-2.17.so
runc:[2:I 6823 root    0u      CHR                1,3      0t0 2378445574 /dev/null
runc:[2:I 6823 root    1w     FIFO               0,10      0t0 2378435473 pipe
runc:[2:I 6823 root    2w     FIFO               0,10      0t0 2378435474 pipe
runc:[2:I 6823 root    3u     unix 0xffff8810c5ff0c00      0t0 2378426053 socket
runc:[2:I 6823 root    4w     FIFO               0,10      0t0 2378426055 pipe
runc:[2:I 6823 root    5u     FIFO               0,20      0t0 2378426052 /run/docker/runtime-runc/moby/2c3ca7e8e1848756da0d2e6d6721146e6a09b5a86e31b45a047cf63cab1c186b/exec.fifo
runc:[2:I 6823 root    6r     FIFO               0,10      0t0 2378442670 pipe
runc:[2:I 6823 root    7u  a_inode               0,11        0       7487 [eventpoll]
runc:[2:I 6823 root    8w     FIFO               0,10      0t0 2378442670 pipe
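
In the lsof listing above, descriptors 6r and 8w refer to the same pipe inode (2378442670), so the process holds both ends of one pipe. Whether that pipe is the one involved in the hang is not confirmed in this thread. As a rough sketch, the check can be automated; `self_pipes` is a hypothetical helper name, and it assumes the COMMAND column contains no spaces, as in the output above:

```shell
# self_pipes: read `lsof -p PID` output on stdin and print the inode of
# every anonymous pipe that the process has open for both reading and
# writing (hypothetical helper, not part of lsof or runc).
self_pipes() {
  awk '$5 == "FIFO" && $NF == "pipe" {
         mode = substr($4, length($4))   # last char of FD column: r, w or u
         inode = $(NF - 1)
         if (mode == "r" || mode == "u") r[inode] = 1
         if (mode == "w" || mode == "u") w[inode] = 1
       }
       END { for (i in r) if (i in w) print i }'
}
```

Usage: `lsof -p 6823 | self_pipes`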

Steps to reproduce the issue:

  1. Run the following script; it hangs:
for ((i=0;i<=100;i++));do docker run -d --rm nginx;done
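
The loop above blocks indefinitely once a single docker run hangs. A bounded variant can be sketched as follows, assuming GNU coreutils `timeout` is installed; `launch_until_hang` and the 60-second limit are hypothetical choices, not part of the original report:

```shell
# launch_until_hang COUNT LIMIT CMD...
# Run CMD up to COUNT times, giving each run at most LIMIT seconds.
# Prints how many runs completed; returns 1 as soon as one run times
# out or fails for any other reason.
launch_until_hang() {
  count="$1"; limit="$2"; shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    if ! timeout "$limit" "$@" >/dev/null; then
      echo "$i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "$count"
}
```

Usage, matching the 101 iterations of the original loop: `launch_until_hang 101 60 docker run -d --rm nginx`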

Describe the results you received:
The shell hangs.
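
One way to confirm that lingering runc init processes are present is to list those that have been alive longer than a container start should take. This is only a sketch: `stuck_runc_init` and the 60-second threshold are hypothetical, and it assumes a procps-ng `ps` that supports the `etimes` (elapsed seconds) column:

```shell
# stuck_runc_init [THRESHOLD]
# Read "PID ELAPSED_SECONDS COMMAND..." lines on stdin and print the PID
# of every "runc init" process at least THRESHOLD seconds old
# (default 60; hypothetical helper).
stuck_runc_init() {
  threshold="${1:-60}"
  awk -v t="$threshold" '$2 >= t && $3 == "runc" && $4 == "init" { print $1 }'
}
```

Usage: `ps -eo pid=,etimes=,args= | stuck_runc_init 60`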

Describe the results you expected:
All containers are created successfully.

What version of containerd are you using:

$ containerd --version
containerd containerd.io 1.4.4 05f951a3781f4f2c1911b05e61c160e9c30eaa8e

Any other relevant information (runC version, CRI configuration, OS/Kernel version, etc.):

docker version

$ docker version
Client: Docker Engine - Community
Version: 20.10.5
API version: 1.40
Go version: go1.13.15
Git commit: 55c4c88
Built: Tue Mar 2 20:33:55 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 19.03.15
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 99e3ed8919
Built: Sat Jan 30 03:16:33 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.4
GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc:
Version: 1.0.0-rc93
GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
docker-init:
Version: 0.18.0
GitCommit: fec3683

uname -a
$ uname -a
Linux sh-saas-k8stest-node-dev-01 4.4.234-1.el7.elrepo.x86_64 #1 SMP Mon Aug 24 18:12:08 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

stack dump:
containerd.705747.stacks.log


cseufert commented Apr 1, 2021

Have you tried running runc rc92? I have considered it, but I am not keen to do that on a production box.

You can get static binaries for runc here: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc92

wu0407 (Author) commented Apr 1, 2021

> Have you tried running runc rc92? I have considered it, but I am not keen to do that on a production box.
>
> You can get static binaries for runc here: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc92

The issue does not reproduce with runc rc92.


cseufert commented Apr 2, 2021

Good to know, thanks

cpuguy83 (Member) commented Apr 5, 2021

This seems likely due to opencontainers/runc#2865

I don't think we have any actionable items here.
If people are hitting this you'll need to downgrade runc to rc92 or wait for rc94 (or use a build from HEAD which is fixed).
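
To check whether a host runs the release reported as affected in this thread (rc93), something like the following may help. It is a sketch only: `runc_is_affected` is a hypothetical helper, and it assumes the first line of `runc --version` has the form `runc version 1.0.0-rc93`:

```shell
# runc_is_affected: read `runc --version` output on stdin and succeed
# only if it reports 1.0.0-rc93, the release this thread identifies as
# hanging (hypothetical helper; rc92 and rc94 do not match).
runc_is_affected() {
  grep -Eq '^runc version 1\.0\.0-rc93([^0-9]|$)'
}
```

Usage: `runc --version | runc_is_affected && echo "runc rc93 detected: downgrade to rc92 or wait for rc94"`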

cpuguy83 closed this as completed Apr 5, 2021