This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

Swarm overlay networks leave stray virtual network interfaces after removal #1980

Closed
morxa opened this issue Mar 16, 2016 · 2 comments


@morxa

morxa commented Mar 16, 2016

The issue

I'm using Swarm with three agents, a separate master, and etcd for cluster discovery.

I frequently create new overlay networks, run containers in them, and remove each network after its containers have been destroyed. Sometimes I observe stray network interfaces that should have been removed, such as this one:

10: veth2401f79: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default 
    link/ether 46:ce:b7:56:55:f6 brd ff:ff:ff:ff:ff:ff
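Leftover interfaces like this one can be picked out of ip -o link output with a small filter. The sketch below is a hypothetical helper (stray_veths is not part of Docker or iproute2); it assumes the one-line-per-interface format printed by ip -o link and treats every veth* interface in state DOWN as a candidate leak, so it may also report a legitimate interface that happens to be down at that moment.

```shell
#!/bin/bash
# Hypothetical helper: print the names of veth* interfaces that are DOWN,
# reading `ip -o link` style lines (one interface per line) from stdin.
stray_veths () {
  # Field 2 is the interface name followed by a colon, possibly with an
  # @ifN peer suffix; keep only veth names on lines reporting state DOWN.
  awk '$2 ~ /^veth/ && / state DOWN / {
         name = $2
         sub(/:$/, "", name)    # drop the trailing colon
         sub(/@.*/, "", name)   # drop an @ifN peer suffix if present
         print name
       }'
}

# Usage on an agent (assumes iproute2 one-line output):
#   ip -o link | stray_veths
```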

How to reproduce

I can reproduce the issue with the following script, swarm-nettest.sh:

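#!/bin/bash
# Usage: ./swarm-nettest.sh N
# Runs N test units in parallel against the Swarm master.

# Create a network, run two short-lived containers in it, then clean up.
run_unit () {
  local net="test_$1" c1 c2
  docker network create "$net"
  c1=$(docker run -d --net="$net" ubuntu:14.04 sleep 5)
  c2=$(docker run -d --net="$net" ubuntu:14.04 sleep 5)
  docker wait "$c1"
  docker rm "$c1"
  docker wait "$c2"
  docker rm "$c2"
  docker network rm "$net"
}

# Create and immediately remove a network, with no containers attached.
create_network () {
  docker network create "test_$1"
  docker network rm "test_$1"
}

if [ $# -ne 1 ]; then
  echo "usage: $0 N" >&2
  exit 1
fi

for i in $(seq 1 "$1"); do
  run_unit "$i" &
  #create_network "$i" &
done
wait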

Before running the script, I had the following network interfaces on the Swarm agents:

$ ansible -i cluster.list ad-sim -a 'ip link'
ad-sim01 | SUCCESS | rc=0 >>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:a0:4c:30 brd ff:ff:ff:ff:ff:ff
3: docker_gwbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:e7:22:c0:5a brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ca:f5:b0:8f brd ff:ff:ff:ff:ff:ff
6: veth84940a6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 0a:96:51:a4:06:ea brd ff:ff:ff:ff:ff:ff
ad-sim02 | SUCCESS | rc=0 >>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:a0:3f:cc brd ff:ff:ff:ff:ff:ff
3: docker_gwbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:20:80:c3:0b brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ee:9f:1f:a4 brd ff:ff:ff:ff:ff:ff
6: veth2296a98: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 5e:97:19:2a:64:8b brd ff:ff:ff:ff:ff:ff
ad-sim03 | SUCCESS | rc=0 >>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:a0:4c:91 brd ff:ff:ff:ff:ff:ff
3: docker_gwbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:f9:36:95:83 brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:0c:2c:bb:15 brd ff:ff:ff:ff:ff:ff
6: veth9fc6eaf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 4a:0a:86:1c:06:de brd ff:ff:ff:ff:ff:ff

After running ./swarm-nettest.sh 10, I have the following network interfaces on the Swarm agents:

$ ansible -i cluster.list ad-sim -a 'ip link'
ad-sim01 | SUCCESS | rc=0 >>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:a0:4c:30 brd ff:ff:ff:ff:ff:ff
3: docker_gwbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:e7:22:c0:5a brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ca:f5:b0:8f brd ff:ff:ff:ff:ff:ff
6: veth84940a6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 0a:96:51:a4:06:ea brd ff:ff:ff:ff:ff:ff

ad-sim02 | SUCCESS | rc=0 >>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:a0:3f:cc brd ff:ff:ff:ff:ff:ff
3: docker_gwbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:20:80:c3:0b brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ee:9f:1f:a4 brd ff:ff:ff:ff:ff:ff
6: veth2296a98: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 5e:97:19:2a:64:8b brd ff:ff:ff:ff:ff:ff
9: vethef60919: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default 
    link/ether 02:42:0a:00:02:02 brd ff:ff:ff:ff:ff:ff
10: veth2401f79: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default 
    link/ether 46:ce:b7:56:55:f6 brd ff:ff:ff:ff:ff:ff

ad-sim03 | SUCCESS | rc=0 >>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:a0:4c:91 brd ff:ff:ff:ff:ff:ff
3: docker_gwbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:f9:36:95:83 brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:0c:2c:bb:15 brd ff:ff:ff:ff:ff:ff
6: veth9fc6eaf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 4a:0a:86:1c:06:de brd ff:ff:ff:ff:ff:ff
9: veth71b9c51: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default 
    link/ether 02:42:0a:00:03:02 brd ff:ff:ff:ff:ff:ff
10: vethaf86582: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default 
    link/ether 9e:97:6b:cb:71:f8 brd ff:ff:ff:ff:ff:ff

As you can see, ad-sim02 and ad-sim03 each have two additional veth* interfaces, all DOWN with an MTU of 1450, while ad-sim01 is unchanged.
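A before/after comparison like this one can also be done mechanically. This is a minimal sketch, assuming each snapshot was saved on the same agent with ip -o link > FILE before and after running the script; iface_names and new_ifaces are hypothetical helper names, not Docker or iproute2 commands.

```shell
#!/bin/bash
# Hypothetical helper: extract interface names from `ip -o link` style
# lines on stdin and sort them in byte order (as comm expects).
iface_names () {
  awk '{ name = $2; sub(/:$/, "", name); sub(/@.*/, "", name); print name }' \
    | LC_ALL=C sort
}

# Hypothetical helper: print interfaces present in the second snapshot
# file but not in the first. Requires bash for process substitution.
new_ifaces () {
  LC_ALL=C comm -13 <(iface_names < "$1") <(iface_names < "$2")
}

# Usage on one agent:
#   ip -o link > before.txt
#   ./swarm-nettest.sh 10
#   ip -o link > after.txt
#   new_ifaces before.txt after.txt
```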

All containers and all overlay networks have been removed:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
cacea96e9632        swarm               "/swarm join --addr=5"   About an hour ago   Up About an hour    2375/tcp            ad-sim03/swarm
41e0cb36a871        swarm               "/swarm join --addr=5"   About an hour ago   Up About an hour    2375/tcp            ad-sim02/swarm
3e43affbeda1        swarm               "/swarm join --addr=5"   About an hour ago   Up About an hour    2375/tcp            ad-sim01/swarm

$ docker network ls
NETWORK ID          NAME                       DRIVER
7a6cc1bfb574        ad-sim02/none              null                
cc4063de5fe6        ad-sim02/host              host                
a0d4c009773d        ad-sim03/bridge            bridge              
4e13f2513e0c        ad-sim03/none              null                
bd7469ecff2d        ad-sim03/host              host                
76bdb767f678        ad-sim03/docker_gwbridge   bridge              
608e07808654        ad-sim02/bridge            bridge              
1540b7162de8        ad-sim02/docker_gwbridge   bridge              
d2e7ea54885a        ad-sim01/docker_gwbridge   bridge              
8920890377ec        ad-sim01/none              null                
a17c8b04a5a3        ad-sim01/host              host                
1ac2ce472b9a        ad-sim01/bridge            bridge

Interestingly, if I call create_network instead of run_unit in the script, all of the extra interfaces are removed properly. This suggests that the leak is related to attaching containers to the network and detaching them from it.

If I run the script repeatedly, I end up with hundreds of stray virtual network interfaces. At that point, creating a new overlay network takes 10-20 seconds.
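To quantify the slowdown, one could track the number of veth interfaces on each agent alongside how long a network create takes. A rough sketch, assuming the docker client is pointed at the Swarm master (for example via DOCKER_HOST) and that a throwaway network name such as probe_net is unused; count_veths and time_network_create are hypothetical helpers, and date +%s only gives one-second resolution.

```shell
#!/bin/bash
# Hypothetical helper: count veth* interfaces in `ip -o link` style
# output read from stdin (prints 0 for empty input).
count_veths () {
  awk '$2 ~ /^veth/ { n++ } END { print n + 0 }'
}

# Hypothetical helper: print how many whole seconds creating one network
# takes, then remove it again. probe_net is an assumed, unused name.
time_network_create () {
  local net="${1:-probe_net}" start end
  start=$(date +%s)
  docker network create "$net" > /dev/null || return 1
  end=$(date +%s)
  docker network rm "$net" > /dev/null
  echo "$((end - start))"
}

# Usage: run the first line on each agent, the second against the master.
#   ip -o link | count_veths
#   time_network_create
```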

Additional information

$ ansible -i cluster.list all -a 'uname -a'
ad-sim01 | SUCCESS | rc=0 >>
Linux ad-sim01 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

ci | SUCCESS | rc=0 >>
Linux ci 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

ad-sim02 | SUCCESS | rc=0 >>
Linux ad-sim02 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

ad-sim03 | SUCCESS | rc=0 >>
Linux ad-sim03 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

ci is the Swarm master.

$ ansible -i cluster.list all -a 'docker -v'
ad-sim01 | SUCCESS | rc=0 >>
Docker version 1.10.3, build 20f81dd

ci | SUCCESS | rc=0 >>
Docker version 1.10.3, build 20f81dd

ad-sim02 | SUCCESS | rc=0 >>
Docker version 1.10.3, build 20f81dd

ad-sim03 | SUCCESS | rc=0 >>
Docker version 1.10.3, build 20f81dd

All nodes run swarm:latest (291cbe419fe6).

All nodes run with the following /etc/default/docker:

DOCKER_OPTS="
  -H tcp://0.0.0.0:2375
  -H unix:///var/run/docker.sock
  --insecure-registry=ci:5000
  --cluster-store=etcd://<ci_ip>:2379/
  --cluster-advertise=eth0:2375
  --dns <dns_ip>"

# If you need Docker to use an HTTP proxy, it can also be specified here.
export http_proxy="http://127.0.0.1:3128/"
export https_proxy="http://127.0.0.1:3128/"
export HTTP_PROXY="http://127.0.0.1:3128/"
export HTTPS_PROXY="http://127.0.0.1:3128/"

etcd 2.2.5 also runs on ci, with the following config:

export ETCD_INITIAL_CLUSTER="etcd0=http://<ci_ip>:2380"
export ETCD_INITIAL_CLUSTER_STATE="new"
export ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
export ETCD_INITIAL_ADVERTISE_PEER_URLS="http://<ci_ip>:2380"
export ETCD_DATA_DIR="/var/etcd"
export ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
export ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379,http://0.0.0.0:4001"
export ETCD_ADVERTISE_CLIENT_URLS="http://<ci_ip>:2379,http://<ci_ip>:4001"
export ETCD_NAME="etcd0"

@mavenugo
Contributor

This is a duplicate of moby/libnetwork#984. It will be resolved in 1.11.

@nishanttotla
Contributor

Closing due to lack of activity. Please reopen if you wish to continue discussing it.
