Docker completely stuck - unregister_netdevice: waiting for lo to become free. Usage count = 1 #254

Closed
rkononov opened this Issue Jan 28, 2015 · 69 comments

Comments

@rkononov

After a few hours of running phantomjs inside Docker containers (we constantly rotate them), we started getting "unregister_netdevice: waiting for lo to become free. Usage count = 1", and systemd shows that the docker.service unit has become unresponsive; restarting docker.service doesn't help.

Call stack:

Jan 27 20:21:55 ip-10-123-146-10.ec2.internal kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Jan 27 20:22:05 ip-10-123-146-10.ec2.internal kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Jan 27 20:22:15 ip-10-123-146-10.ec2.internal kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Jan 27 20:22:26 ip-10-123-146-10.ec2.internal kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel: INFO: task docker:22299 blocked for more than 120 seconds.
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:       Not tainted 3.17.8+ #2
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel: docker          D ffff8803e0412dc0     0 22299      1 0x00000000
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  ffff88025ba73ca0 0000000000000082 ffff880076a16400 0000000000012dc0
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  ffff88025ba73fd8 0000000000012dc0 ffffffff81a1b480 ffff880076a16400
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  ffffffff81288849 ffffffff81ab1da0 ffffffff81ab1da4 ffff880076a16400
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel: Call Trace:
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff81288849>] ? ida_pre_get+0x69/0x100
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff814e8939>] schedule+0x29/0x70
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff814e8c6e>] schedule_preempt_disabled+0xe/0x10
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff814eab35>] __mutex_lock_slowpath+0x155/0x1e0
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff813e5901>] ? net_alloc_generic+0x21/0x30
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff814eabe3>] mutex_lock+0x23/0x37
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff813e60cc>] copy_net_ns+0x7c/0x140
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff810839e1>] create_new_namespaces+0x101/0x1b0
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff81083b12>] copy_namespaces+0x82/0xb0
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff81063cd5>] copy_process.part.30+0x8a5/0x1a50
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff811ca729>] ? __alloc_fd+0xd9/0x130
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff8106501c>] do_fork+0xbc/0x360
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff811ca7b0>] ? get_unused_fd_flags+0x30/0x40
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff811b69d7>] ? __do_pipe_flags+0x77/0xb0
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff811ca81b>] ? __fd_install+0x5b/0x70
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff81065346>] SyS_clone+0x16/0x20
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff814ed489>] stub_clone+0x69/0x90
Jan 27 20:22:32 ip-10-123-146-10.ec2.internal kernel:  [<ffffffff814ed1a9>] ? system_call_fastpath+0x16/0x1b

CoreOS version:

NAME=CoreOS
ID=coreos
VERSION=522.5.0
VERSION_ID=522.5.0
BUILD_ID=
PRETTY_NAME="CoreOS 522.5.0"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

uname -a:

Linux ip-10-123-146-10.ec2.internal 3.17.8+ #2 SMP Fri Jan 9 23:58:08 UTC 2015 x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz GenuineIntel GNU/Linux

docker -H 0.0.0.0:2375 version:

Client version: 1.3.3
Client API version: 1.15
Go version (client): go1.3.2
Git commit (client): 54d900a
OS/Arch (client): linux/amd64

Also, I've noticed that the docker process became a zombie:
699 ? Zsl 36:30 [docker] <defunct>

Not sure there is an easy way to reproduce :(
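
For context, the workload is basically a loop that keeps starting short-lived containers and tearing them down. A rough sketch of the rotation (simplified; the image name and command below are just placeholders for our phantomjs setup, not the real job):

#!/bin/sh
# simplified sketch of the container rotation; "our-phantomjs-image" and the
# phantomjs command are placeholders
while true; do
  cid=$(docker run -d our-phantomjs-image phantomjs /jobs/render.js)
  sleep 60
  docker rm -f "$cid"
done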

@popsikle

popsikle commented Apr 25, 2015

We are also seeing this. We wrapped phantomjs with a timeout script (http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3), which seemed to help, but we can still reproduce this easily on deploys when lots of containers are running (300+). It looks like the container does not shut down because phantomjs is locked, and docker gets stuck trying to unregister the net device. Even after the container processes are long gone, something in docker hangs on to the device.

I saw a bug upstream somewhere but I can't seem to find it now, and I have seen this behavior on Ubuntu and Debian as well, so I am assuming this is in Docker's hands, but someone may be able to follow up on that better than me.
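
For reference, the wrapper is essentially just "run the command with a watchdog that kills it after N seconds". A stripped-down sketch of that pattern (not the exact script linked above):

#!/bin/sh
# minimal timeout wrapper sketch: run "$@" and kill it if it outlives $TIMEOUT seconds
TIMEOUT=${TIMEOUT:-300}

"$@" &
cmd_pid=$!

# watchdog: TERM the command after the timeout, then KILL if it is still around
( sleep "$TIMEOUT"; kill -TERM "$cmd_pid" 2>/dev/null; sleep 5; kill -KILL "$cmd_pid" 2>/dev/null ) &
watchdog_pid=$!

wait "$cmd_pid"
status=$?
kill "$watchdog_pid" 2>/dev/null
exit "$status"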


@popsikle

popsikle commented May 11, 2015

Upstream ticket: docker/docker#5618

The issue is widespread across Docker installs on multiple OSes.


@popsikle

popsikle commented May 11, 2015

This issue requires a reboot of a node, which messes everything up.


@popsikle

popsikle commented May 26, 2015

@rkononov are you running Ubuntu within your containers? There is a long-running Ubuntu bug report with activity here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152


@rkononov

rkononov commented May 27, 2015

@popsikle yep, we're using Ubuntu.


@lucacri

lucacri commented Aug 9, 2015

+1, using Ubuntu. Very critical bug, unfortunately.


@crawford referenced this issue Sep 24, 2015: "Cloudinit, the next generation" #765 (closed)
@crawford

crawford (Member) commented Oct 20, 2015

Is this still an issue with the later versions of CoreOS and Docker?


@crawford

crawford (Member) commented Nov 24, 2015

Closing due to inactivity.


@crawford closed this Nov 24, 2015

@binarybana

binarybana commented Nov 30, 2015

It looks like this may be a kernel bug with a patch released upstream, though I haven't tested it myself.


@PunKeel

PunKeel commented Dec 6, 2015

I'm having this bug with a home-compiled kernel 4.3 running on KVM + Debian 8.
May I help in testing/finding a solution? (If so, how?)


@ruslan40

ruslan40 commented Jan 9, 2016

Also experiencing this issue.

CentOS 7.2 running the 4.3.3 (elrepo) kernel; the guest is Debian running nginx (the official stock nginx:latest container).

Please let me know if there is any way I can help.


@iameli

iameli commented Jan 19, 2016

Having this problem on CoreOS alpha 921.0.0, on a cluster brought up by your very own kube-aws tool.


@crawford

crawford (Member) commented Jan 19, 2016

@iameli Can you give 928.0.0 a shot? This has the 4.4 kernel with the patch mentioned by @binarybana.


@levipierce

levipierce commented Jan 19, 2016

kudos!!! I have been seeing this all the time!


@iameli

iameli commented Jan 20, 2016

@crawford Sure. The cluster is up and running, I'll check back in here if it breaks.


@meonkeys

meonkeys commented Jan 29, 2016

@crawford I took over for @iameli. FYI, 928.0.0 still exhibits the problem. Happened to one node after about 6 days of uptime. The other node in the cluster is fine, strangely.

I did notice influxdb and heapster running out of their allocated cgroup memory and getting OOM-killed, so I scaled both of those down to zero (since we don't need them as far as I know).

The kernel message is a bit different this time... I don't see a stack trace, but docker does go zombie.

Here's the tail end of dmesg:

[  102.608365] eth0: renamed from veth1ed5641
[  102.642752] docker0: port 3(veth249bda1) entered forwarding state
[  102.646182] docker0: port 3(veth249bda1) entered forwarding state
[  103.078115] IPv6: eth0: IPv6 duplicate address fe80::42:aff:fe02:3904 detected!
[  103.160124] IPv6: eth0: IPv6 duplicate address fe80::42:aff:fe02:3903 detected!
[  116.767074] docker0: port 1(veth836fe1f) entered forwarding state
[  117.567062] docker0: port 2(veth3d78f53) entered forwarding state
[  117.663072] docker0: port 3(veth249bda1) entered forwarding state
[543431.489079] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543441.543100] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543451.593092] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543461.648116] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543471.693077] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543481.738097] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543491.793081] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543501.839081] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543511.884079] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543521.935076] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543531.980110] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543542.025107] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543552.070089] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543562.115107] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543572.160077] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543582.218090] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543592.263117] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543602.309105] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
[543612.354076] unregister_netdevice: waiting for vethe142506 to become free. Usage count = 1
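
Since we can't prevent it yet, the best we can do is catch it early. A rough sketch of the kind of check that could run from cron or a systemd timer (purely illustrative; the log path is made up):

#!/bin/sh
# flag the node if the kernel has started complaining about a stuck net device,
# so it can be drained/rebooted before docker wedges completely
if dmesg | tail -n 200 | grep -q 'unregister_netdevice: waiting for .* to become free'; then
  echo "$(hostname) appears to have a stuck netdevice reference at $(date -u)" >> /var/log/netdev-stuck.log
fi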

@taojin0505

taojin0505 commented Mar 9, 2016

I have this issue on Debian running a CentOS 7 image.

hadoop@jin:~$ uname -a
Linux jin 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) x86_64 GNU/Linux
hadoop@jin:~$ docker version
Client:
Version: 1.10.2
API version: 1.22
Go version: go1.5.3
Git commit: c3959b1
Built: Mon Feb 22 21:22:53 2016
OS/Arch: linux/amd64

Server:
Version: 1.10.2
API version: 1.22
Go version: go1.5.3
Git commit: c3959b1
Built: Mon Feb 22 21:22:53 2016
OS/Arch: linux/amd64
hadoop@jin:~$ sudo tail -n 100 /var/log/syslog
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): carrier is OFF
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): new Veth device (driver: 'unknown' ifindex: 27)
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): exported as /org/freedesktop/NetworkManager/Devices/34
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): device state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
Mar 9 13:45:16 jin NetworkManager[624]: (br-c906938cfc04): bridge port vethe8359a3 was attached
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): enslaved to br-c906938cfc04
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) starting connection 'vethe8359a3'
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 1 of 5 (Device Prepare) scheduled...
Mar 9 13:45:16 jin NetworkManager[624]: devices added (path: /sys/devices/virtual/net/veth625c53c, iface: veth625c53c)
Mar 9 13:45:16 jin NetworkManager[624]: device added (path: /sys/devices/virtual/net/veth625c53c, iface: veth625c53c): no ifupdown configuration found.
Mar 9 13:45:16 jin NetworkManager[624]: devices added (path: /sys/devices/virtual/net/vethe8359a3, iface: vethe8359a3)
Mar 9 13:45:16 jin NetworkManager[624]: device added (path: /sys/devices/virtual/net/vethe8359a3, iface: vethe8359a3): no ifupdown configuration found.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 1 of 5 (Device Prepare) started...
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 2 of 5 (Device Configure) scheduled...
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 1 of 5 (Device Prepare) complete.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 2 of 5 (Device Configure) starting...
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): device state change: prepare -> config (reason 'none') [40 50 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 2 of 5 (Device Configure) successful.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 3 of 5 (IP Configure Start) scheduled.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 2 of 5 (Device Configure) complete.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 3 of 5 (IP Configure Start) started...
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): device state change: config -> ip-config (reason 'none') [50 70 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) Stage 3 of 5 (IP Configure Start) complete.
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): device state change: ip-config -> secondaries (reason 'none') [70 90 0]
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): device state change: secondaries -> activated (reason 'none') [90 100 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (vethe8359a3) successful, device activated.
Mar 9 13:45:16 jin nm-dispatcher: Dispatching action 'up' for vethe8359a3
Mar 9 13:45:16 jin avahi-daemon[649]: Withdrawing workstation service for veth625c53c.
Mar 9 13:45:16 jin kernel: [14057.711885] eth0: renamed from veth625c53c
Mar 9 13:45:16 jin NetworkManager[624]: devices removed (path: /sys/devices/virtual/net/veth625c53c, iface: veth625c53c)
Mar 9 13:45:16 jin NetworkManager[624]: (vethe8359a3): link connected
Mar 9 13:45:16 jin kernel: [14057.731723] IPv6: ADDRCONF(NETDEV_CHANGE): vethe8359a3: link becomes ready
Mar 9 13:45:16 jin docker[926]: time="2016-03-09T13:45:16.831954236+08:00" level=warning msg="Your kernel does not support OOM notifications: There is no path for "memory" in state"
Mar 9 13:45:16 jin docker[926]: time="2016-03-09T13:45:16.831991522+08:00" level=warning msg="Your kernel does not support OOM notifications: There is no path for "memory" in state"
Mar 9 13:45:16 jin kernel: [14057.874783] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
Mar 9 13:45:16 jin NetworkManager[624]: (vethabe6ac9): device is virtual, marking as unmanaged
Mar 9 13:45:16 jin NetworkManager[624]: (vethabe6ac9): carrier is OFF
Mar 9 13:45:16 jin NetworkManager[624]: (vethabe6ac9): new Veth device (driver: 'unknown' ifindex: 28)
Mar 9 13:45:16 jin NetworkManager[624]: (vethabe6ac9): exported as /org/freedesktop/NetworkManager/Devices/35
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device is virtual, marking as unmanaged
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): carrier is OFF
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): new Veth device (driver: 'unknown' ifindex: 29)
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): exported as /org/freedesktop/NetworkManager/Devices/36
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
Mar 9 13:45:16 jin NetworkManager[624]: (br-c906938cfc04): bridge port veth36f649d was attached
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): enslaved to br-c906938cfc04
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) starting connection 'veth36f649d'
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 1 of 5 (Device Prepare) scheduled...
Mar 9 13:45:16 jin NetworkManager[624]: devices added (path: /sys/devices/virtual/net/veth36f649d, iface: veth36f649d)
Mar 9 13:45:16 jin NetworkManager[624]: device added (path: /sys/devices/virtual/net/veth36f649d, iface: veth36f649d): no ifupdown configuration found.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 1 of 5 (Device Prepare) started...
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 2 of 5 (Device Configure) scheduled...
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 1 of 5 (Device Prepare) complete.
Mar 9 13:45:16 jin kernel: [14057.879586] device veth36f649d entered promiscuous mode
Mar 9 13:45:16 jin kernel: [14057.879785] IPv6: ADDRCONF(NETDEV_UP): veth36f649d: link is not ready
Mar 9 13:45:16 jin kernel: [14057.879789] br-c906938cfc04: port 4(veth36f649d) entered forwarding state
Mar 9 13:45:16 jin kernel: [14057.879810] br-c906938cfc04: port 4(veth36f649d) entered forwarding state
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 2 of 5 (Device Configure) starting...
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device state change: prepare -> config (reason 'none') [40 50 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 2 of 5 (Device Configure) successful.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 3 of 5 (IP Configure Start) scheduled.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 2 of 5 (Device Configure) complete.
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 3 of 5 (IP Configure Start) started...
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device state change: config -> ip-config (reason 'none') [50 70 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) Stage 3 of 5 (IP Configure Start) complete.
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device state change: ip-config -> secondaries (reason 'none') [70 90 0]
Mar 9 13:45:16 jin NetworkManager[624]: (veth36f649d): device state change: secondaries -> activated (reason 'none') [90 100 0]
Mar 9 13:45:16 jin NetworkManager[624]: Activation (veth36f649d) successful, device activated.
Mar 9 13:45:16 jin NetworkManager[624]: devices added (path: /sys/devices/virtual/net/vethabe6ac9, iface: vethabe6ac9)
Mar 9 13:45:16 jin NetworkManager[624]: device added (path: /sys/devices/virtual/net/vethabe6ac9, iface: vethabe6ac9): no ifupdown configuration found.
Mar 9 13:45:16 jin nm-dispatcher: Dispatching action 'up' for veth36f649d
Mar 9 13:45:17 jin avahi-daemon[649]: Withdrawing workstation service for vethabe6ac9.
Mar 9 13:45:17 jin kernel: [14057.964090] eth0: renamed from vethabe6ac9
Mar 9 13:45:17 jin NetworkManager[624]: devices removed (path: /sys/devices/virtual/net/vethabe6ac9, iface: vethabe6ac9)
Mar 9 13:45:17 jin kernel: [14057.982965] IPv6: ADDRCONF(NETDEV_CHANGE): veth36f649d: link becomes ready
Mar 9 13:45:17 jin NetworkManager[624]: (veth36f649d): link connected
Mar 9 13:45:17 jin docker[926]: time="2016-03-09T13:45:17.102463190+08:00" level=warning msg="Your kernel does not support OOM notifications: There is no path for "memory" in state"
Mar 9 13:45:17 jin docker[926]: time="2016-03-09T13:45:17.102500358+08:00" level=warning msg="Your kernel does not support OOM notifications: There is no path for "memory" in state"
Mar 9 13:45:17 jin avahi-daemon[649]: Joining mDNS multicast group on interface veth72eaeb0.IPv6 with address fe80::6857:4eff:fe69:5e76.
Mar 9 13:45:17 jin avahi-daemon[649]: New relevant interface veth72eaeb0.IPv6 for mDNS.
Mar 9 13:45:17 jin avahi-daemon[649]: Registering new address record for fe80::6857:4eff:fe69:5e76 on veth72eaeb0..
Mar 9 13:45:17 jin avahi-daemon[649]: Joining mDNS multicast group on interface vethe8359a3.IPv6 with address fe80::3cba:13ff:fe85:7123.
Mar 9 13:45:17 jin avahi-daemon[649]: New relevant interface vethe8359a3.IPv6 for mDNS.
Mar 9 13:45:17 jin avahi-daemon[649]: Registering new address record for fe80::3cba:13ff:fe85:7123 on vethe8359a3.
.
Mar 9 13:45:17 jin avahi-daemon[649]: Joining mDNS multicast group on interface veth1f1871e.IPv6 with address fe80::9415:65ff:feaa:6ef2.
Mar 9 13:45:17 jin avahi-daemon[649]: New relevant interface veth1f1871e.IPv6 for mDNS.
Mar 9 13:45:17 jin avahi-daemon[649]: Registering new address record for fe80::9415:65ff:feaa:6ef2 on veth1f1871e..
Mar 9 13:45:18 jin avahi-daemon[649]: Joining mDNS multicast group on interface veth36f649d.IPv6 with address fe80::f4:3fff:fe56:ea90.
Mar 9 13:45:18 jin avahi-daemon[649]: New relevant interface veth36f649d.IPv6 for mDNS.
Mar 9 13:45:18 jin avahi-daemon[649]: Registering new address record for fe80::f4:3fff:fe56:ea90 on veth36f649d.
.
Mar 9 13:45:31 jin kernel: [14072.313085] br-c906938cfc04: port 1(veth72eaeb0) entered forwarding state
Mar 9 13:45:31 jin kernel: [14072.441032] br-c906938cfc04: port 2(veth1f1871e) entered forwarding state
Mar 9 13:45:31 jin kernel: [14072.632947] br-c906938cfc04: port 3(vethe8359a3) entered forwarding state
Mar 9 13:45:31 jin kernel: [14072.888995] br-c906938cfc04: port 4(veth36f649d) entered forwarding state
Mar 9 13:47:05 jin kernel: [14166.618311] unregister_netdevice: waiting for lo to become free. Usage count = 1
Mar 9 13:55:01 jin CRON[11625]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
hadoop@jin:~$


@ntquyen

ntquyen commented Mar 29, 2016

This issue happens regularly on my CoreOS machines.

$ docker info
Containers: 140
Images: 284
Server Version: 1.9.1
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.3.6-coreos
Operating System: CoreOS 899.13.0
CPUs: 4
Total Memory: 2.864 GiB
Name: dev-coreos-k8s_16
ID: ******
Username: *****
Registry: https://index.docker.io/v1/
$ dmesg
[ 3454.288789] veth15fe3b7: renamed from eth0
[ 3454.428263] docker0: port 8(veth9bc56d3) entered disabled state
[ 3455.853996] docker0: port 8(veth9bc56d3) entered forwarding state
[ 3456.006591] docker0: port 8(veth9bc56d3) entered forwarding state
[ 3456.306205] IPv6: veth15fe3b7: IPv6 duplicate address fe80::42:aff:fe02:4509 detected!
[ 3456.430095] docker0: port 8(veth9bc56d3) entered disabled state
[ 3456.486652] device veth9bc56d3 left promiscuous mode
[ 3456.531678] docker0: port 8(veth9bc56d3) entered disabled state
[ 3466.662046] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3476.940070] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3487.144036] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3497.360033] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3507.498082] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3517.621075] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3527.798045] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3538.028077] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3548.158067] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3558.345041] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3568.542075] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3578.749063] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3588.956034] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3599.095094] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3609.231037] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
[ 3619.422043] unregister_netdevice: waiting for veth15fe3b7 to become free. Usage count = 1
....

Is there a workaround for this issue?
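
For what it's worth, here is roughly what can be checked when a node gets into this state (a rough sketch; the veth name is just the one from my dmesg output above):

# the kernel keeps printing the message, but the device is usually already gone from userspace
ip link show veth15fe3b7 || echo "veth15fe3b7 no longer visible from userspace"

# see whether any process is still holding extra network namespaces open
sudo ls -l /proc/*/ns/net 2>/dev/null | awk '{print $NF}' | sort | uniq -c | sort -rn | head

# last resort: so far the only thing reported in this thread that reliably clears the
# stuck reference is rebooting the node
# sudo systemctl reboot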


@spiffxp

spiffxp commented Apr 13, 2016

Running into this on CoreOS 1010.1.0 with kubernetes 1.2.2 and docker 1.10.3


@binarybana

binarybana commented Apr 14, 2016

@spiffxp Are you using promiscuous mode or veth hairpin mode for the container bridge devices?
cf: kubernetes/kubernetes#20096
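
If you're not sure which mode a node is using, something like this shows whether hairpin mode is set on the bridge ports (a rough check via sysfs; docker0 is an assumption, substitute your bridge name):

# list hairpin_mode for each port on the docker0 bridge
# (1 = hairpin enabled on the veth, 0 = disabled; a promiscuous-bridge setup instead
#  shows "promiscuity 1" on the bridge itself in `ip -d link show docker0`)
for port in /sys/class/net/docker0/brif/*; do
  printf '%s: hairpin_mode=%s\n' "$(basename "$port")" "$(cat "$port/hairpin_mode")"
done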


@bkeroackdsc

bkeroackdsc commented Apr 28, 2016

I'm seeing this on 1010.1.0 via EC2. Everything is at default. I'm pulling and running containers via cloud-config and they're stalling with this issue.


@levipierce

levipierce commented May 3, 2016

Still seeing this as well here in AWS

CoreOS stable (1010.1.0)
Docker version 1.10.3, build 8acee1b


@ajinkyasurya

ajinkyasurya commented May 6, 2016

Been waiting for a fix on this for the last six months now :(
Host: Ubuntu 14.04
Docker image: CentOS 7
$ dmesg
[60244.438117] device vethdb2b4ec entered promiscuous mode
[60244.438227] IPv6: ADDRCONF(NETDEV_UP): vethdb2b4ec: link is not ready
[60244.438232] docker0: port 2(vethdb2b4ec) entered forwarding state
[60244.438236] docker0: port 2(vethdb2b4ec) entered forwarding state
[60244.438456] docker0: port 2(vethdb2b4ec) entered disabled state
[60246.744069] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60256.800077] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60266.856081] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60276.892073] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60286.948071] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60297.000070] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60307.080072] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60317.140069] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60327.192074] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60337.236076] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60347.300075] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60357.368074] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60367.432071] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60377.496073] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60387.560073] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60397.620072] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60407.692074] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60417.752076] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60427.816072] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60437.888072] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60447.956075] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60458.016072] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60468.076072] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60478.132072] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60481.220084] INFO: task docker:1526 blocked for more than 120 seconds.
[60481.223477] Not tainted 3.13.0-74-generic #118-Ubuntu
[60481.226271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[60481.230196] docker D ffff88078fcb3180 0 1526 1 0x00000000
[60481.230202] ffff8807635e9d40 0000000000000082 ffff880763ba1800 ffff8807635e9fd8
[60481.230204] 0000000000013180 0000000000013180 ffff880763ba1800 ffffffff81cdbce0
[60481.230207] ffffffff81cdbce4 ffff880763ba1800 00000000ffffffff ffffffff81cdbce8
[60481.230211] Call Trace:
[60481.230218] [<ffffffff81729499>] schedule_preempt_disabled+0x29/0x70
[60481.230221] [<ffffffff8172b305>] __mutex_lock_slowpath+0x135/0x1b0
[60481.230225] [<ffffffff811a31f9>] ? __kmalloc+0x1e9/0x230
[60481.230227] [<ffffffff8172b39f>] mutex_lock+0x1f/0x2f
[60481.230230] [<ffffffff81620e61>] copy_net_ns+0x71/0x130
[60481.230234] [<ffffffff8108fbc9>] create_new_namespaces+0xf9/0x180
[60481.230236] [<ffffffff8108fcc3>] copy_namespaces+0x73/0xa0
[60481.230239] [<ffffffff81065b96>] copy_process.part.26+0x9a6/0x16b0
[60481.230241] [<ffffffff81066a75>] do_fork+0xd5/0x340
[60481.230243] [<ffffffff81066d66>] SyS_clone+0x16/0x20
[60481.230245] [<ffffffff81735ab9>] stub_clone+0x69/0x90
[60481.230247] [<ffffffff8173575d>] ? system_call_fastpath+0x1a/0x1f
[60488.200069] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60498.256071] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60508.320074] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60518.392070] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60528.452067] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60538.508066] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60548.572067] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60558.624060] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60568.672071] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60578.720071] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60588.788074] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60598.848085] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60601.228081] INFO: task docker:1526 blocked for more than 120 seconds.
[60601.231498] Not tainted 3.13.0-74-generic #118-Ubuntu
[60601.234917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[60601.239668] docker D ffff88078fcb3180 0 1526 1 0x00000000
[60601.239672] ffff8807635e9d40 0000000000000082 ffff880763ba1800 ffff8807635e9fd8
[60601.239675] 0000000000013180 0000000000013180 ffff880763ba1800 ffffffff81cdbce0
[60601.239677] ffffffff81cdbce4 ffff880763ba1800 00000000ffffffff ffffffff81cdbce8
[60601.239680] Call Trace:
[60601.239687] [<ffffffff81729499>] schedule_preempt_disabled+0x29/0x70
[60601.239690] [<ffffffff8172b305>] __mutex_lock_slowpath+0x135/0x1b0
[60601.239694] [<ffffffff811a31f9>] ? __kmalloc+0x1e9/0x230
[60601.239696] [<ffffffff8172b39f>] mutex_lock+0x1f/0x2f
[60601.239699] [<ffffffff81620e61>] copy_net_ns+0x71/0x130
[60601.239703] [<ffffffff8108fbc9>] create_new_namespaces+0xf9/0x180
[60601.239706] [<ffffffff8108fcc3>] copy_namespaces+0x73/0xa0
[60601.239709] [<ffffffff81065b96>] copy_process.part.26+0x9a6/0x16b0
[60601.239711] [<ffffffff81066a75>] do_fork+0xd5/0x340
[60601.239712] [<ffffffff81066d66>] SyS_clone+0x16/0x20
[60601.239715] [<ffffffff81735ab9>] stub_clone+0x69/0x90
[60601.239717] [<ffffffff8173575d>] ? system_call_fastpath+0x1a/0x1f
[60608.904068] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60618.960068] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60629.024094] unregister_netdevice: waiting for lo to become free. Usage count = 1
[60639.088078] unregister_netdevice: waiting for lo to become free. Usage count = 1

Does anyone have a workaround?
An update on this would be appreciated.


@mischief


mischief commented May 6, 2016

Does anyone have a reliable way to reproduce this, or can you share your configuration on top of CoreOS for testing? Also, if you are running into this regularly, can you reproduce it on CoreOS alpha 1032.1.0 with Linux 4.5.2?
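If it helps anyone testing, the kind of repro I have in mind is a plain container-churn loop like the sketch below (not from this thread; the image, batch size, and iteration count are arbitrary placeholders):

#!/bin/bash
# Hypothetical churn loop: start a batch of short-lived containers, then
# force-remove them so their network namespaces are torn down in parallel.
# busybox and the counts below are placeholders, not a known-good repro.
for i in $(seq 1 500); do
  for j in $(seq 1 20); do
    docker run -d --name "churn-${i}-${j}" busybox sleep 5 >/dev/null &
  done
  wait
  docker ps -aq --filter "name=churn-" | xargs -r docker rm -f >/dev/null
  # Stop as soon as the kernel starts complaining.
  dmesg | tail -n 20 | grep -q 'unregister_netdevice: waiting for' && { echo "hit on iteration $i"; break; }
done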

@lsztachanski

lsztachanski commented May 10, 2016

I'm experiencing the same issue with Kubernetes on CoreOS Alpha. Docker on the k8s nodes stops responding when deleting a big deployment (100 pods or more). It happens to one node at a time.

NAME=CoreOS
ID=coreos
VERSION=1032.1.0
VERSION_ID=1032.1.0
BUILD_ID=2016-05-05-1941
PRETTY_NAME="CoreOS 1032.1.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

May 10 07:28:53 tectonic-mgmt2 kernel: unregister_netdevice: waiting for veth19618b9 to become free. Usage count = 1
The veth pair veth19618b9 doesn't exist on the system.
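For anyone comparing notes, a couple of quick checks along these lines (just a sketch; the interface name is taken from the log line above) confirm whether the named veth is really gone from the root namespace while the kernel keeps printing the message:

# Does the interface still exist in the root network namespace?
ip link show veth19618b9 || echo "veth19618b9 is gone from the root namespace"
# List whatever veth interfaces are left and compare against running containers.
ip -o link show type veth
docker ps --format '{{.ID}} {{.Names}}'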


@mikesplain


mikesplain commented May 13, 2016

I'm seeing similar results to @lsztachanski on 899.15.0 using k8s. I've had this happen twice now when I'm using Jenkins Kubernetes to build about 15 containers on a single node and shut them down quickly when the build finishes.

May 13 00:01:41 ip-10-21-40-154.ec2.internal kernel: unregister_netdevice: waiting for veth7091e93 to become free. Usage count = 1

@sethp-jive


sethp-jive commented Jun 1, 2016

@mischief I've had good luck reproducing this with the specific Kubernetes Job mentioned in docker/docker#23175. I can confirm that it's still happening with that Job on what I believe is a relatively stock Kubernetes cluster on a CoreOS 1053.2.0 node running 4.6.0 – I'm happy to share any other information you might find helpful too!


@crawford crawford reopened this Oct 18, 2016

@crawford crawford added this to the CoreOS Alpha 1221.0.0 milestone Oct 18, 2016

@crawford crawford added priority/P1 and removed priority/P2 labels Oct 18, 2016

@stensonb


stensonb commented Oct 18, 2016

I'm seeing issues which I believe are related:

Arch: Bare Metal (HP)
CoreOS: 1185.1.0
Docker: 1.11.2

I'm handling this with a fleet job that greps for "unregister_netdevice: waiting for veth" and reboots (using locksmithctl to get a cluster lock first)...

Here are some of my log entries: https://gist.github.com/stensonb/5e54282c0d351f746bf738590464e342

Notice that the first detection of "unregister_netdevice: waiting for veth" happens at 21:28:27, but the machine isn't rebooted until at least 21:34:51 (it takes almost 6 minutes to "nicely" reboot due to this issue).

I'm happy to provide additional help/logs if that'll help identify this issue.
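Roughly, the watchdog boils down to something like the sketch below. This is only an outline of the idea described above, not the actual fleet unit: the polling interval is arbitrary, and using `locksmithctl reboot` (with a plain `systemctl reboot` fallback) is my assumption for "get a cluster lock first, then reboot".

#!/bin/bash
# Hypothetical watchdog: reboot the node once the kernel starts logging the
# "waiting for veth..." message. Interval and reboot commands are assumptions.
while true; do
  if journalctl -k --since "2 minutes ago" | grep -q 'unregister_netdevice: waiting for veth'; then
    echo "netdevice refcount leak detected, requesting reboot"
    locksmithctl reboot || systemctl reboot
  fi
  sleep 60
done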

@pietervisser


pietervisser commented Oct 19, 2016

@crawford thanks for your reply. I would really like to be able to help here, but we don't know the exact steps to reproduce this issue. Restarting the nginx container on our production setup does trigger the issue sometimes.

If you need additional info, please let me know!

Oct 18 13:10:20 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:10:30 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:10:40 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:10:50 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:11:00 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:11:10 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:11:20 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:11:30 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:11:40 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:11:50 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:12:00 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:12:10 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
Oct 18 13:12:21 coreos-node-01 kernel: unregister_netdevice: waiting for veth3b967c3 to become free. Usage count = 4
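If it's useful, a dumb restart loop like this (only a sketch; "nginx" stands in for the real container name) at least shows how many restarts it takes before the message appears:

# Hypothetical loop: keep restarting the suspect container and stop as soon
# as the kernel logs the refcount message. Container name is a placeholder.
for i in $(seq 1 200); do
  docker restart nginx >/dev/null
  sleep 10
  dmesg | tail -n 5 | grep -q 'unregister_netdevice: waiting for' && { echo "hit after $i restarts"; break; }
done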

@sethp-jive


sethp-jive commented Oct 31, 2016

@crawford as per docker/docker#5618 (comment) it seems the fix for this specific issue would be updating the version of the Linux kernel in CoreOS to 4.4.22 (the issue isn't specific to Docker; it seems concurrent veth calls from any userland software could trigger it).

The workaround in coreos/docker#29 may help here by reducing the frequency with which Docker calls into the kernel concurrently (especially with systemd), but ultimately this lockup is an issue deep inside the IP stack in the kernel (and Docker blocking forever on stuck system calls).


@mischief


mischief commented Oct 31, 2016

@sethp-jive the relevant upstream commit (torvalds/linux@751eb6b) is apparently now in 4.8+, which is in alpha 1214.0.0.


@crawford

Member

crawford commented Nov 1, 2016

@noqcks


noqcks commented Dec 23, 2016

@crawford We're still hitting this on CoreOS-alpha-1262.0.0-hvm (ami-8c76789b), which includes coreos/coreos-overlay#2237.

Dec 22 12:30:10 ip-10-65-73-60.ec2.internal kernel:  Memory cgroup out of memory: Kill process 8771 (phantomjs) score 962 or sacrifice child
Dec 22 12:30:10 ip-10-65-73-60.ec2.internal kernel:  Killed process 8771 (phantomjs) total-vm:2566388kB, anon-rss:389016kB, file-rss:3956kB, shmem-rss:0kB
Dec 22 12:30:10 ip-10-65-73-60.ec2.internal kernel:  oom_reaper: reaped process 8771 (phantomjs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Dec 22 12:30:15 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:21 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:25 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:31 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:35 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:41 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:45 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:51 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:30:55 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:01 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:04 ip-10-0-176-241.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:05 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:11 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:15 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:21 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:25 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:31 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:35 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:41 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:45 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:51 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:31:55 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:02 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:05 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:12 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:15 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:22 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:25 ip-10-216-196-165.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:32 ip-10-65-73-60.ec2.internal kernel:  unregister_netdevice: waiting for lo to become free. Usage count = 1
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:  INFO: task exe:11686 blocked for more than 120 seconds.
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:        Not tainted 4.9.0-coreos #1
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:  exe             D    0 11686   1978 0x00000080
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   ffff9a5b8b3b7800 ffff9a5a383df080 ffff9a59994a8000 ffff9a5ad5e6d7c0
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   ffff9a5ba0258300 ffffadd80225fc68 ffffffffb55c52ed 0000000000000000
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   ffffadd80225fc38 00000000b55c5a6e ffffadd80225fc90 ffff9a5ad5e6d7c0
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:  Call Trace:
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb55c52ed>] ? __schedule+0x23d/0x6e0
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb55c57c6>] schedule+0x36/0x80
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb55c5a6e>] schedule_preempt_disabled+0xe/0x10
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb55c7749>] __mutex_lock_slowpath+0xb9/0x130
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb55c77df>] mutex_lock+0x1f/0x30
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb54b51cf>] copy_net_ns+0x9f/0x170
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb509b56b>] create_new_namespaces+0x11b/0x1e0
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb509b6a3>] copy_namespaces+0x73/0xa0
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb5077cf4>] copy_process.part.32+0x974/0x1be0
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb5079147>] ? _do_fork+0xd7/0x390
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb5079147>] _do_fork+0xd7/0x390
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb50794a9>] SyS_clone+0x19/0x20
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb5003b3a>] do_syscall_64+0x5a/0x160
Dec 22 12:32:35 ip-10-65-73-60.ec2.internal kernel:   [<ffffffffb55ca7eb>] entry_SYSCALL64_slow_path+0x25/0x25
@crawford


Member

crawford commented Dec 27, 2016

The bug that will never die.

@crawford crawford reopened this Dec 27, 2016

@jamesjryan


jamesjryan commented Jan 21, 2017

Still experiencing this under heavy load on Docker version 1.12.6, build 7392c3b/1.12.6, on the 4.4.39-34.54.amzn1.x86_64 AWS Linux AMI.


@cjpetrus


cjpetrus commented Jan 24, 2017

Still experiencing this under heavy load on Docker version 1.12.6, build 7392c3b/1.12.6, on the 4.4.39-34.54.amzn1.x86_64 AWS Linux AMI.


@euank


Member

euank commented Jan 24, 2017

@cjpetrus @jamesjryan It sounds like you're using Amazon Linux, not Container Linux. Perhaps you meant to comment on moby/moby#5618 or the AWS support forums?

@jamesjryan


jamesjryan commented Jan 25, 2017

@euank indeed I did, thanks.


@Random-Liu Random-Liu referenced this issue in kubernetes/node-problem-detector Feb 10, 2017

Merged

Fix kernel monitor issues #81

@truongexp


truongexp commented Feb 20, 2017

I've got the same issue. Attached are all the logs collected by the ecs-logs-collector script.
Any help is much appreciated :)
collect.zip


@AronNovak AronNovak referenced this issue in Gizra/drupal-elm-starter Feb 23, 2017

Merged

Testing WebDriverIO #43

@virtuman


virtuman commented Mar 1, 2017

Same issue on a CentOS 7.3 bare-metal Docker host with Docker 1.13.1.

@euank


Member

euank commented Mar 1, 2017

@truongexp, @virtuman, it looks like neither of you are using Container Linux. Perhaps you meant to post on docker/docker#5618 or to contact your respective distro vendor's issue trackers.

I don't see any reports yet for one of our releases with a 4.9.9 kernel, but it might be too soon for me to get my hopes up.

@istrau2 istrau2 referenced this issue in stellar/docker-stellar-core-horizon Apr 7, 2017

Closed

Container seems to have stalled. How to restart? #12

@vfreex


vfreex commented May 22, 2017

Same issue when I mounted an NFS volume in a privileged container.

@lucab


Member

lucab commented Jun 12, 2017

Another source of leakage has been fixed by torvalds/linux@f647821, which is part of 4.11.

@euank


Member

euank commented Jun 14, 2017

torvalds/linux@b7c8487 is in the same batch of commits, is also in 4.11, and fixes another possible cause.

@igorwwwwwwwwwwwwwwwwwwww igorwwwwwwwwwwwwwwwwwwww referenced this issue in travis-ci/packer-templates Jun 20, 2017

Merged

Bump to linux kernel 4.11.6 #442

@stuart-warren


moby/moby#5618 references torvalds/linux@d747a7a, which is in 4.12.

@euank


Member

euank commented Jul 20, 2017

If you're not using Container Linux, you're looking for moby/moby#5618.

I'm optimistically closing this bug with the assertion that kernel 4.12 (in Container Linux beta/alpha channels now) should have fixed all the common causes.

There hasn't been much noise on this issue actually related to Container Linux for a while now, which I also think is a good sign we might be in the clear to close it.

If you can still reproduce this issue on an up to date Container Linux machine, please do give a shout.

@euank euank closed this Jul 20, 2017

@dockercore


dockercore commented Aug 16, 2017

CentOS 6.8

Error:
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Fix:
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -w net.ipv4.ip_forward=1 >> /dev/null 2>&1


@vagharsh


vagharsh commented Sep 12, 2017

Same issue with CentOS Linux release 7.3.1611 and Docker 1.12.6; the only container running is nginx:stable-alpine.


@amanagarwal2189


amanagarwal2189 commented Oct 25, 2017

RHEL, same issue: Docker EE under heavy load.


@edward-of-clt


edward-of-clt commented Nov 4, 2017

CentOS 7 — kernel:unregister_netdevice: waiting for lo to become free. Usage count = 2


@miglesiassarria


miglesiassarria commented Nov 17, 2017

Same issue on CentOS 7 in AWS:
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1


@bgilbert


Member

bgilbert commented Nov 17, 2017

Locking this issue. If you're not using CoreOS Container Linux, please see moby/moby#5618. If you're seeing this issue on a current release of Container Linux, please open a new bug.

@coreos coreos locked and limited conversation to collaborators Nov 17, 2017
