
Overlay network cannot accept containers on a newly provisioned machine after many network events have been triggered #1069

Closed
JamieGarside opened this issue Apr 1, 2016 · 6 comments

@JamieGarside

I'm seeing an issue where, after a cluster using overlay networking has been running for a while with many containers brought up and down, machines newly added to the cluster cannot reliably communicate with existing containers running on other machines in the cluster.

As an example, if we spin up a cluster, schedule a container we want to keep running (say, nginx), and then add a new machine to the cluster after a period of time, containers on the new machine cannot reach the nginx container (all attempts to communicate yield "Destination Host Unreachable"). If the nginx container is restarted, however, containers running on the new machine can communicate with it successfully.

After digging through the code, I think this is down to how memberships to the cluster are handled. When a machine is added to the cluster, it joins the existing Serf cluster and relays the containers/networks it knows about to the rest of the cluster (drivers/overlay/overlay.go:166-171). Of course, the new machine doesn't have any containers yet and thus has nothing to relay. When joining Serf, the ignoreOld flag is false (drivers/overlay/ov_serf.go:76), so it gets an event replay from Serf, which normally allows it to reconstruct the state of the cluster.

Once enough network events have been triggered, though, the Serf replay log is truncated (it currently holds 1024 entries), so the new machine never obtains routing information for the older containers running on existing machines in the cluster. Once those old containers are restarted, their routing information is broadcast over Serf again and propagates to the new machine, allowing it to communicate.
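
As a rough illustration of the join behaviour (using the standalone serf CLI rather than the driver code itself, and a placeholder address), joining with replay enabled looks like this; only the user events still held in that bounded buffer are re-delivered to the new agent:

# illustrative only: <existing-member> is a placeholder for any current cluster member.
# -replay corresponds to ignoreOld=false, i.e. "re-deliver past user events",
# but anything that has already fallen out of the event buffer is gone for good.
serf agent -node new-machine -bind 0.0.0.0:7946 \
    -join <existing-member>:7946 -replay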

I've seen this happen with Docker 1.10.x on stock installs of Debian 8, Ubuntu 14.04 LTS and Boot2Docker 1.10.3 as provisioned by docker-machine. The script below gives a reproducible test case:

#!/bin/bash
set -e
MACHINE_NAME=consul-host
MACHINE_ARGS="-s `pwd`/test_machines"

# Create the consul machine
echo "Provisioning Consul machine"
docker-machine ${MACHINE_ARGS} create -d virtualbox ${MACHINE_NAME}

sleep 10

eval $(docker-machine ${MACHINE_ARGS} env ${MACHINE_NAME})
CONSUL_IP=$(docker-machine ${MACHINE_ARGS} ip ${MACHINE_NAME})
export CONSUL=consul://${CONSUL_IP}:8500
export HOST_IP=$CONSUL_IP

# Bootstrap Consul
echo "Bootstrapping Consul"
docker run --name consul -d -p 8400:8400 -p 8500:8500 -p 8600:8600 progrium/consul -server -bootstrap

# Create a new machine for testing
MACHINE_NAME=test-1
echo "Provisioning first testing machine"
docker-machine ${MACHINE_ARGS} create -d virtualbox \
    --swarm-master --swarm \
    --engine-opt cluster-store=${CONSUL} --engine-opt cluster-advertise=eth1:2376 \
    --swarm-discovery ${CONSUL} \
    ${MACHINE_NAME}

sleep 10

eval $(docker-machine ${MACHINE_ARGS} env --swarm ${MACHINE_NAME})

# Should all be up and running
# Create a dummy network
echo "Creating test network"
docker network create testnet

# Spin up a container we want to keep around to demonstrate
echo "Creating nginx container"
docker run --name nginx -d --net testnet nginx

# Now, spin up and down the helloworld image a bunch of times :)
# This will cause the Serf event log to exceed 1024 entries.
# Let's do ~10 at a time
for i in `seq 0 103`; do
    echo "helloworld $i / 103"
    docker run --rm --name demo-container-1 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-2 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-3 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-4 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-5 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-6 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-7 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-8 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-9 --net testnet hello-world > /dev/null 2>&1 &
    docker run --rm --name demo-container-10 --net testnet hello-world > /dev/null 2>&1 &
    wait
done

# Now, create a new machine and join it to the swarm
MACHINE_NAME=test-2
echo "Provisioning second testing machine"
docker-machine ${MACHINE_ARGS} create -d virtualbox \
    --swarm \
    --engine-opt cluster-store=${CONSUL} --engine-opt cluster-advertise=eth1:2376 \
    --swarm-discovery ${CONSUL} \
    ${MACHINE_NAME}

sleep 10

# Now, bring up debian and try and ping the nginx container
# We're still in the swarm, just use a constraint here
echo "Running test ping"
docker run -t --name deb-test --net testnet -e "constraint:node==test-2" debian ping nginx

I expect the final command to ping the target successfully, but instead ping simply responds with "Destination Host Unreachable", although it does resolve the target address correctly:

PING nginx (10.0.0.2): 56 data bytes
92 bytes from 0243995e1640 (10.0.0.3): Destination Host Unreachable
92 bytes from 0243995e1640 (10.0.0.3): Destination Host Unreachable

After restarting the nginx container, the ping is successful.
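
Concretely, the restart-and-retest step (container and machine names as in the script above; the ping count is added only to keep the output short) is:

docker restart nginx
docker run -t --rm --net testnet -e "constraint:node==test-2" debian ping -c 3 nginx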

I've not seen any other reports of this from a search around, although #962 appears to be quite similar. I'm also not 100% convinced that I have the full picture of what's going on (does the resolvePeer function resolve just a Serf partner, or should it attempt to resolve a container running elsewhere in the cluster?), but in all my tests it appears that the machine joining the cluster is relying on Serf's event replay to figure out the state of the cluster.

@JamieGarside JamieGarside changed the title Overlay networking cannot accept new machines after many network events have been triggered Overlay network cannot accept containers on a newly provisioned machine after many network events have been triggered Apr 1, 2016
@mrjana
Contributor

mrjana commented Apr 12, 2016

@JamieGarside Thanks for the issue report. Yes, Serf's user event buffer is limited. That is why we have the resolvePeer function to handle any misses on the local node. Can you start Docker with debug logging enabled on the host where you are seeing the "Destination Host Unreachable" message and post the output, so we can see whether resolvePeer is running into issues doing a cluster-wide query?
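
(One way to do this on the boot2docker hosts provisioned above; the profile path and init script are boot2docker-specific assumptions rather than anything prescribed here:)

# on test-2: add the daemon debug flag (-D) to EXTRA_ARGS and restart the engine,
# then reproduce the failing ping and collect the daemon log output
docker-machine -s "$(pwd)/test_machines" ssh test-2
# inside the VM:
echo 'EXTRA_ARGS="$EXTRA_ARGS -D"' | sudo tee -a /var/lib/boot2docker/profile
sudo /etc/init.d/docker restart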

@JamieGarside
Author

Thanks for replying @mrjana. The logs from both test machines are attached:

test-1.txt (running nginx)
test-2.txt (attempting to access nginx)

I've done a quick grep over the files and can't find any matches for either "peer" or "query", so I'm assuming the logging statements in both processQuery and resolvePeer aren't being hit. I do see some errors from netlink on test-1, however:

$ grep netlink test-1.txt
time="2016-04-12T08:18:48.539003347Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:19:47.960339227Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:20:48.493243376Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:21:48.281069446Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:22:47.946960276Z" level=error msg="Failed to receive from netlink: no buffer space available "
time="2016-04-12T08:23:51.677480046Z" level=error msg="Failed to receive from netlink: no buffer space available "

I'm assuming this isn't a big problem though, as these errors are on the machine running nginx, not on the machine trying to resolve it.

It does appear that the query is never actually fired, though. I attached another Serf agent to test-1's Serf endpoint, dumping events with this script:


echo "EVENT: ${SERF_EVENT}" >> serf.log

if [[ $SERF_EVENT == "user" ]]; then
    echo "USER EVENT: ${SERF_USER_EVENT}" >> serf.log
fi

if [[ $SERF_EVENT == "query" ]]; then
    echo "QUERY NAME: ${SERF_QUERY_NAME}" >> serf.log
fi

echo "BODY:" >> serf.log
cat /dev/stdin >> serf.log
echo "" >> serf.log

I then started Serf with serf agent -join 192.168.99.134:7946 -event-handler=./serf_event_dump.sh. The resulting log similarly doesn't contain any query events.

@mrjana
Contributor

mrjana commented Apr 12, 2016

@JamieGarside The "Failed to receive from netlink" error could be a potential problem related to this, but it is happening on the wrong host. That is the path where we receive the L3 miss notification for every "Destination Host Unreachable" message you are seeing, and that miss should trigger a cluster query. It seems like you are either not getting the miss notification on test-2, or it is getting discarded. We will try to recreate this and see if we can reproduce it.
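
(A possible way to check whether the miss notifications are being generated on test-2 at all, assuming nsenter and iproute2 are available on the host; the namespace location is an assumption about where the overlay driver keeps its sandbox, not something stated in this thread:)

# on test-2: the overlay's vxlan device lives in a network namespace created by
# the driver (a file under /var/run/docker/netns/ - pick the overlay one).
# Watch for neighbor "miss" (RTM_GETNEIGH) events while re-running the ping;
# each "Destination Host Unreachable" should correspond to one miss here.
ls /var/run/docker/netns/
sudo nsenter --net=/var/run/docker/netns/<overlay-namespace> ip monitor neigh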

@sanimej

sanimej commented Apr 13, 2016

@JamieGarside I was able to recreate the issue. Thanks for reporting it. The problem is caused by a check in the L3 miss handling which inadvertently skips valid misses as well.

@JamieGarside
Author

Thanks for replicating and fixing this, @sanimej. In the meantime, we have a workaround for our case :)

@thaJeztah
Member

This will be fixed in 1.11.1 through moby/moby#22261 (which was just merged).
