
Intermittent failures when trying to start a container #856

Closed
2 of 3 tasks
willnx opened this issue Nov 18, 2019 · 1 comment

willnx commented Nov 18, 2019

  • [x] This is a bug report
  • [ ] This is a feature request
  • [x] I searched existing issues before opening this one

Expected behavior

Starting a container works consistently.

Actual behavior

When trying to start a container, it seems to fail at random. This is what I see in the dockerd log when a failure occurs:

Nov 18 14:31:20 <REDACTED> dockerd[28406]: time="2019-11-18T14:31:20.676030461-05:00" level=error msg="Handler for POST /v1.35/containers/17a7bd03184580db71dc8c655ba64cdff41eac960d41aea0fb4f96f64a852e9b/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:319: getting the final child's pid from pipe caused \\\"EOF\\\"\": unknown"

If I wait a couple of minutes, I'm able to re-create the container and start it successfully.
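It is always this same runc error. A minimal sketch of how to filter the daemon log for the distinctive message (using an inline sample line here so it is self-contained; on the host you would pipe in /var/log/messages or `journalctl -u docker` instead):

```shell
# Count occurrences of the runc "pid from pipe" failure in a log stream.
# The sample line stands in for the real dockerd log output.
sample="OCI runtime create failed: process_linux.go:319: getting the final child's pid from pipe caused EOF"
printf '%s\n' "$sample" | grep -c "final child's pid from pipe"   # prints 1
```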

Steps to reproduce the behavior

¯\_(ツ)_/¯ The failure is intermittent.
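Not deterministic, but to catch it you can hammer container starts in a loop until one fails. A sketch of that loop — on the real host `attempt` would be something like `docker run --rm --memory 64m busybox true` (image and flags are only an example, not from this report); here it is a stand-in that fails on its third call so the loop logic itself is runnable:

```shell
# Retry a command until it fails, reporting which attempt broke.
n=0
attempt() { n=$((n + 1)); [ "$n" -lt 3 ]; }  # stand-in for the docker run command
i=0
while attempt; do
  i=$((i + 1))
done
echo "failed on attempt $((i + 1))"   # prints: failed on attempt 3
```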

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:25:41 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:24:18 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 53
  Running: 53
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-957.5.1.el7.x86_64
 Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 47.01GiB
 Name: evsappprd001
 ID: QFDK:HDCA:HMKQ:OXJV:3PGG:SFBI:ZDE6:RIFH:PYIL:C4WV:THE7:VGEQ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.)


willnx commented Nov 18, 2019

I think I see what the problem is. I'm using the --memory flag on a 3.10 kernel 😓.

When I look at /var/log/messages (because the runc code looks like it's just trying to open a file), I see stack traces like this:

Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb9d61e41>] dump_stack+0x19/0x1b
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb97dc1eb>] kmem_cache_create_memcg+0x17b/0x280
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb97dc31b>] kmem_cache_create+0x2b/0x30
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffc07609a1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack]
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffc07612a4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack]
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb9c2de14>] ops_init+0x44/0x150
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb9c2dfc3>] setup_net+0xa3/0x160
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb9c2e765>] copy_net_ns+0xb5/0x180
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb96c6e79>] create_new_namespaces+0xf9/0x180
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb96c70ba>] unshare_nsproxy_namespaces+0x5a/0xc0
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb9696bc3>] SyS_unshare+0x173/0x2e0
Nov 18 14:31:20 evsappprd001 kernel: [<ffffffffb9d74ddb>] system_call_fastpath+0x22/0x27
Nov 18 14:31:20 evsappprd001 kernel: Unable to create nf_conn slab cache

And some google-fu led me to this: moby/moby#37722
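For anyone else on RHEL 7: a quick, hedged check (the helper name is mine, not anything from runc or moby) for whether the running kernel is in the 3.10 line where cgroup kmem accounting, which `--memory` turns on, is known to leak slab caches:

```shell
# kmem_risky KERNEL_RELEASE -> succeeds when the major version is below 4,
# i.e. the RHEL7 3.10 kernels where cgroup kmem accounting is unreliable.
kmem_risky() {
  major="${1%%.*}"          # strip everything after the first dot
  [ "$major" -lt 4 ]
}

kmem_risky "3.10.0-957.5.1.el7.x86_64" && echo "affected kernel line"
```

On an affected host you would call it as `kmem_risky "$(uname -r)"` and consider dropping `--memory` until the kernel is upgraded.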
