
Overlay network cannot accept containers on a newly provisioned machine after many network events have been triggered #1069

Closed
JamieGarside opened this issue Apr 1, 2016 · 6 comments

@JamieGarside

I'm seeing an issue where, after a cluster using overlay networking has been running for a while with many containers brought up and down, machines newly added to the cluster cannot reliably communicate with existing containers running on other machines in the cluster.

As an example, if we spin up a cluster, schedule a container we want to keep running (say, nginx), and then add a new machine to the cluster after a period of time, containers on the new machine cannot reach the nginx container (all attempts to communicate yield "Destination Host Unreachable"). If the nginx container is restarted, however, containers running on the new machine can communicate with it successfully.

After digging through the code, I think this is down to how memberships to the cluster are handled. When a machine is added to the cluster, it joins the existing Serf cluster and relays the containers/networks it knows about to the rest of the cluster (drivers/overlay/overlay.go:166-171). Of course, the new machine doesn't have any containers yet and thus has nothing to relay. When joining Serf, the ignoreOld flag is false (drivers/overlay/ov_serf.go:76), so it gets an event replay from Serf, which normally allows it to reconstruct the state of the cluster.

Once enough network events have been triggered, though, the Serf replay log is truncated (it currently holds 1024 entries), so the new machine never obtains routing information for the older containers running on existing machines in the cluster. Once those old containers are restarted, their routing information is broadcast over Serf again and propagates to the new machine, allowing it to communicate.
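
As a rough illustration of the join behaviour (using the standalone serf CLI rather than the driver code itself, and a placeholder address), joining with replay enabled looks like this; only the user events still held in that bounded buffer are re-delivered to the new agent:

# illustrative only: <existing-member> is a placeholder for any current cluster member.
# -replay corresponds to ignoreOld=false, i.e. "re-deliver past user events",
# but anything that has already fallen out of the event buffer is gone for good.
serf agent -node new-machine -bind 0.0.0.0:7946 \
    -join <existing-member>:7946 -replay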

I've seen this happen with Docker 1.10.x on stock installs of Debian 8, Ubuntu 14.04 LTS and Boot2Docker 1.10.3 as provisioned by docker-machine. The script below gives a reproducible test case:

#!/bin/bash
set -e
MACHINE_NAME=consul-host
MACHINE_ARGS="-s `pwd`/test_machines"

# Create the consul machine
echo "Provisioning Consul machine"
docker-machine ${MACHINE_ARGS} create -d virtualbox ${MACHINE_NAME}

sleep 10

eval $(docker-machine ${MACHINE_ARGS} env ${MACHINE_NAME})
CONSUL_IP=$(docker-machine ${MACHINE_ARGS} ip ${MACHINE_NAME})
export CONSUL=consul://${CONSUL_IP}:8500
export HOST_IP=$CONSUL_IP

# Bootstrap Consul
echo "Bootstrapping Consul"
docker run --name consul -d -p 8400:8400 -p 8500:8500 -p 8600:8600 progrium/consul -server -bootstrap

# Create a new machine for testing
MACHINE_NAME=test-1
echo "Provisioning first testing machine"
docker-machine ${MACHINE_ARGS} create -d virtualbox \
    --swarm-master --swarm \
    --engine-opt cluster-store=${CONSUL} --engine-opt cluster-advertise=eth1:2376 \
    --swarm-discovery ${CONSUL} \
    ${MACHINE_NAME}

sleep 10

eval $(docker-machine ${MACHINE_ARGS} env --swarm ${MACHINE_NAME})

# Should all be up and running
# Create a dummy network
echo "Creating test network"
docker network create testnet

# Spin up a container we want to keep around to demonstrate
echo "Creating nginx container"
docker run --name nginx -d --net testnet nginx

# Now, spin up and down the helloworld image a bunch of times :)
# This will cause the Serf event log to exceed 1024 entries.
# Let's do ~10 at a time
for i in `seq 0 103`; do
    echo "helloworld $i / 103"
    docker run --rm --name demo-container-1 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-2 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-3 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-4 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-5 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-6 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-7 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-8 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-9 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-10 --net testnet hello-world > /dev/null 2>&1 &
    wait
done

# Now, create a new machine and join it to the swarm
MACHINE_NAME=test-2
echo "Provisioning second testing machine"
docker-machine ${MACHINE_ARGS} create -d virtualbox \
    --swarm \
    --engine-opt cluster-store=${CONSUL} --engine-opt cluster-advertise=eth1:2376 \
    --swarm-discovery ${CONSUL} \
    ${MACHINE_NAME}

sleep 10

# Now, bring up debian and try and ping the nginx container
# We're still in the swarm, just use a constraint here
echo "Running test ping"
docker run -t --name deb-test --net testnet -e "constraint:node==test-2" debian ping nginx

I expect the final command to ping the target successfully, but instead ping simply responds with "Destination Host Unreachable", although it does resolve the target address correctly:

PING nginx (10.0.0.2): 56 data bytes
92 bytes from 0243995e1640 (10.0.0.3): Destination Host Unreachable
92 bytes from 0243995e1640 (10.0.0.3): Destination Host Unreachable

After restarting the nginx container, the ping is successful.
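
Concretely, the restart-and-retest step (container and machine names as in the script above; the ping count is added only to keep the output short) is:

docker restart nginx
docker run -t --rm --net testnet -e "constraint:node==test-2" debian ping -c 3 nginx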

I've not seen any other reports of this from a search around, although #962 appears to be quite similar. I'm also not 100% convinced that I have the full picture of what's going on (does the resolvePeer function resolve just a Serf partner, or should it attempt to resolve a container running elsewhere in the cluster?), but in all my tests it appears that the machine joining the cluster is relying on Serf's event replay to figure out the state of the cluster.

@JamieGarside JamieGarside changed the title Overlay networking cannot accept new machines after many network events have been triggered Overlay network cannot accept containers on a newly provisioned machine after many network events have been triggered Apr 1, 2016
@mrjana
Contributor

mrjana commented Apr 12, 2016

@JamieGarside Thanks for the issue report. Yes, Serf's user event buffer is limited. That is why we have the resolvePeer function to handle any misses on the local node. Can you start Docker with debug logging enabled on the host where you are seeing the "Destination Host Unreachable" message and post the output, so we can see whether resolvePeer is running into issues doing a cluster-wide query?
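
(One way to do this on the boot2docker hosts provisioned above; the profile path and init script are boot2docker-specific assumptions rather than anything prescribed here:)

# on test-2: add the daemon debug flag (-D) to EXTRA_ARGS and restart the engine,
# then reproduce the failing ping and collect the daemon log output
docker-machine -s "$(pwd)/test_machines" ssh test-2
# inside the VM:
echo 'EXTRA_ARGS="$EXTRA_ARGS -D"' | sudo tee -a /var/lib/boot2docker/profile
sudo /etc/init.d/docker restart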

@JamieGarside
Author

Thanks for replying @mrjana. The logs from both test machines are attached:

test-1.txt (running nginx)
test-2.txt (attempting to access nginx)

I've done a quick grep over the files and can't find any matches for either "peer" or "query", so I'm assuming the logging statements in both processQuery and resolvePeer aren't being hit. I do see some errors from netlink on test-1, however:

$ grep netlink test-1.txt
time="2016-04-12T08:18:48.539003347Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:19:47.960339227Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:20:48.493243376Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:21:48.281069446Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:22:47.946960276Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:23:51.677480046Z" level=error msg="Failed to receive from netlink: no buffer space available "

I'm assuming this isn't a big problem though, as these errors are on the machine running nginx, not on the machine trying to resolve it.

It does appear that the query is never actually fired, though. I attached another Serf agent to test-1's Serf endpoint, dumping events with this script:


echo "EVENT: ${SERF_EVENT}" >> serf.log

if [[ $SERF_EVENT == "user" ]]; then
    echo "USER EVENT: ${SERF_USER_EVENT}" >> serf.log
fi

if [[ $SERF_EVENT == "query" ]]; then
    echo "QUERY NAME: ${SERF_QUERY_NAME}" >> serf.log
fi

echo "BODY:" >> serf.log
cat /dev/stdin >> serf.log
echo "" >> serf.log

I then started Serf with serf agent -join 192.168.99.134:7946 -event-handler=./serf_event_dump.sh. The resulting log similarly doesn't contain any query events.

@mrjana
Contributor

mrjana commented Apr 12, 2016

@JamieGarside The "Failed to receive from netlink" error could be a potential problem related to this, but it is happening on the wrong host. That is the path where we receive the L3 miss notification for every "Destination Host Unreachable" message you are seeing, and that miss should trigger a cluster query. It seems like you are either not getting the miss notification on test-2, or it is getting discarded. We will try to recreate this and see if we can reproduce it.
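
(A possible way to check whether the miss notifications are being generated on test-2 at all, assuming nsenter and iproute2 are available on the host; the namespace location is an assumption about where the overlay driver keeps its sandbox, not something stated in this thread:)

# on test-2: the overlay's vxlan device lives in a network namespace created by
# the driver (a file under /var/run/docker/netns/ - pick the overlay one).
# Watch for neighbor "miss" (RTM_GETNEIGH) events while re-running the ping;
# each "Destination Host Unreachable" should correspond to one miss here.
ls /var/run/docker/netns/
sudo nsenter --net=/var/run/docker/netns/<overlay-namespace> ip monitor neigh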

@sanimej

sanimej commented Apr 13, 2016

@JamieGarside I was able to recreate the issue. Thanks for reporting it. The problem is caused by a check in the L3 miss handling which inadvertently skips valid misses as well.

@JamieGarside
Author

Thanks for replicating and fixing this, @sanimej. In the meantime, we have a workaround for our case :)

@thaJeztah
Member

This will be fixed in 1.11.1 through moby/moby#22261 (which was just merged).
