Prevent flapping slave from rejoining cluster #1428
/cc @ssalinas Is this the right way to go about this?
This adds a node to zk (`/singularity/inactive`) that contains just an array of hosts whose slaves have been marked as inactive. When Singularity checks whether an offer should be accepted, it reads that node from zk. If the offer is from a slave on an inactive host, Singularity discards it.
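For reviewers, here is a rough sketch of the offer check described above. It is not the code in this PR: the class `InactiveHostOfferFilter` and its methods are hypothetical, and an in-memory set stands in for the array stored at `/singularity/inactive`, so no zk connection is needed to run it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names are not taken from the Singularity codebase.
final class InactiveHostOfferFilter {
    // In-memory stand-in for the host array kept in zk.
    private final Set<String> inactiveHosts = new HashSet<>();

    void deactivate(String host) {
        inactiveHosts.add(host);
    }

    void activate(String host) {
        inactiveHosts.remove(host);
    }

    // Keeps the hosts of offers that may be accepted and drops
    // any offer that comes from a host marked as inactive.
    List<String> acceptableOfferHosts(List<String> offerHosts) {
        List<String> accepted = new ArrayList<>();
        for (String host : offerHosts) {
            if (!inactiveHosts.contains(host)) {
                accepted.add(host);
            }
        }
        return accepted;
    }
}
```

Offers are reduced to their hostnames here to keep the sketch short; a real offer carries more than that.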
Complete the functions that are actually used to mark slaves as activated or deactivated. Previously I was manually editing the list in zk.
Add a pair of tests for the `InactiveSlaveManager`. This is mostly just to make sure that I get what's going on; in practice the tests don't do much more than run Curator through a very small trial run.
When a previously-seen slave on a host which is marked as inactive attempts to join the cluster, it is now marked as `DECOMMISSIONED`. Previously, it was ignored and nothing happened to it. This stops it from accepting offers and provides visibility into what is going on w/r/t the flapping slave.
Next step toward being able to mark a slave's machine as up-for-review via the UI. Previously you would have needed to manually edit the node in ZK in order to mark a host as inactive. Notably, un-marking the host as inactive will not allow the slave to begin accepting offers until the slave is also un-marked as decom'ed, or until it disappears and reappears with a new slaveID. So restoring a slave will likely be a two-step process.
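To make the two-step restore concrete, here is a small state model of the behavior described above. Everything in it is hypothetical (`FlappingSlaveModel`, `SlaveState`, and the method names are not from the PR), and it only tracks the two states needed for the illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the restore flow; not the actual Singularity classes.
final class FlappingSlaveModel {
    enum SlaveState { ACTIVE, DECOMMISSIONED }

    private final Set<String> inactiveHosts = new HashSet<>();
    private final Map<String, SlaveState> slaves = new HashMap<>();

    void markHostInactive(String host) {
        inactiveHosts.add(host);
    }

    void markHostActive(String host) {
        inactiveHosts.remove(host);
    }

    // A slave that joins from an inactive host is marked DECOMMISSIONED.
    // Otherwise a slave ID seen for the first time starts out ACTIVE.
    void slaveJoined(String slaveId, String host) {
        if (inactiveHosts.contains(host)) {
            slaves.put(slaveId, SlaveState.DECOMMISSIONED);
        } else {
            slaves.putIfAbsent(slaveId, SlaveState.ACTIVE);
        }
    }

    // Second restore step: clear the decommission on a known slave.
    void clearDecommission(String slaveId) {
        slaves.replace(slaveId, SlaveState.ACTIVE);
    }

    boolean canAcceptOffers(String slaveId, String host) {
        return !inactiveHosts.contains(host)
            && slaves.get(slaveId) == SlaveState.ACTIVE;
    }
}
```

In this model, calling `markHostActive` alone leaves a decommissioned slave unable to accept offers; it needs `clearDecommission` as well, or a rejoin under a new slave ID.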
Adds a button on each slave for marking the host that it's on as active/inactive. When there are hosts that are marked as inactive, it will also display a list of all hosts that have been marked as inactive.
This fixes a handful of phrasing issues in the UI and reorganizes how the data is stored in zk. Rather than a single node which contains an array, there is now a main node whose (empty) children represent the hosts that have been deactivated. There is also an additional method which simply checks whether a host is active. This allows the query to zk to be deferred until Singularity actually receives an offer that reveals a new slave.
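A sketch of the reorganized layout, for illustration only. The class `InactiveHostNodes` is hypothetical, a sorted in-memory set of paths stands in for zk, and the sketch assumes the main node is still `/singularity/inactive`. With one empty child per host, the active check becomes a single existence test on one path instead of reading and parsing an array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; a real version would go through Curator.
final class InactiveHostNodes {
    // Assumed root path; the PR only names it for the earlier array layout.
    private static final String ROOT = "/singularity/inactive";

    // In-memory stand-in for the znodes that exist under ROOT.
    private final Set<String> znodes = new TreeSet<>();

    private static String pathFor(String host) {
        return ROOT + "/" + host;
    }

    // Deactivating a host creates an empty child node named after it.
    void deactivateHost(String host) {
        znodes.add(pathFor(host));
    }

    // Reactivating a host deletes its child node.
    void activateHost(String host) {
        znodes.remove(pathFor(host));
    }

    // Single existence check; no need to read the full list of hosts.
    boolean isHostActive(String host) {
        return !znodes.contains(pathFor(host));
    }

    // Lists child names, as the UI would for its inactive-hosts list.
    List<String> inactiveHosts() {
        List<String> hosts = new ArrayList<>();
        for (String path : znodes) {
            hosts.add(path.substring(ROOT.length() + 1));
        }
        return hosts;
    }
}
```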