Feature idea: on startup, read subnet.env and attempt to acquire that lease #610
We are concerned about preserving network connectivity for containers in the face of etcd outages and data loss. For example, during etcd upgrades, Cloud Foundry users have encountered various "split-brain"-like scenarios. These scenarios are most easily resolved by stopping all etcd nodes, removing all the data directories, and then restarting all the nodes. But CF users expect their containers to remain connected to the network throughout. Therefore, we would like etcd clients such as flannel to be resilient to etcd outages and data loss.
In manual testing of etcd outages that included data loss, we've found that flannel networks retain connectivity only if the
Consider this sequence:
In testing, we've found that after step 6, host B correctly re-acquires its existing subnet lease (the one in use by its running containers). Host A does not: instead it picks a new subnet at random, which in practice never matches the IP addresses of the containers it is already hosting. As a result, the containers on host A cannot reach host B.
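To make the proposal concrete, here is a minimal sketch of the startup behavior in Go. It assumes subnet.env holds `KEY=VALUE` lines such as `FLANNEL_SUBNET=10.1.17.1/24`. The helpers `readPreviousSubnet` and `acquireLease` are hypothetical and are not flannel's actual API. The map-based lease store is an illustrative stand-in for the lease records in etcd.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// readPreviousSubnet (hypothetical) scans subnet.env contents and returns the
// network named by FLANNEL_SUBNET, i.e. the lease this host held before restart.
func readPreviousSubnet(contents string) (*net.IPNet, error) {
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		key, value, ok := strings.Cut(strings.TrimSpace(scanner.Text()), "=")
		if !ok || key != "FLANNEL_SUBNET" {
			continue
		}
		// FLANNEL_SUBNET holds the host's address inside the lease (e.g.
		// 10.1.17.1/24); ParseCIDR also returns the containing network 10.1.17.0/24.
		_, ipnet, err := net.ParseCIDR(value)
		if err != nil {
			return nil, fmt.Errorf("parsing FLANNEL_SUBNET %q: %w", value, err)
		}
		return ipnet, nil
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return nil, fmt.Errorf("FLANNEL_SUBNET not found")
}

// acquireLease (hypothetical) grants the preferred subnet if it is unclaimed or
// already held by this host. It returns false when another host owns it, in
// which case the caller would fall back to allocating a fresh subnet.
func acquireLease(leases map[string]string, host string, preferred *net.IPNet) bool {
	key := preferred.String()
	if owner, taken := leases[key]; taken && owner != host {
		return false
	}
	leases[key] = host
	return true
}

func main() {
	env := "FLANNEL_NETWORK=10.1.0.0/16\nFLANNEL_SUBNET=10.1.17.1/24\nFLANNEL_MTU=1472\nFLANNEL_IPMASQ=false\n"
	prev, err := readPreviousSubnet(env)
	if err != nil {
		fmt.Println("no previous subnet, allocating a new one:", err)
		return
	}
	leases := map[string]string{} // etcd data was wiped, so no leases exist yet
	if acquireLease(leases, "hostA", prev) {
		fmt.Println("re-acquired previous lease:", prev)
	} else {
		fmt.Println("previous lease is taken, allocating a new one")
	}
}
```

The key design point is ordering. The daemon tries the subnet its running containers already use before falling back to random allocation. This lets host A behave like host B in the scenario above.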
What do y'all think? Would you be open to a PR?