Is there a way to temporarily disable scheduling to a node (e.g. for maintenance)? #1508
I am running a swarm cluster where the containers are distributed using the default spread scheduling.
However, at times I would like to be able to temporarily stop scheduling to a particular node, for example to gradually drain it of containers so that it can be taken down for maintenance.
Currently I just stop the Engine on that node.
If I understand https://docs.docker.com/swarm/scheduler/filter/ correctly, labels are set statically on startup. The idea is to be able to dynamically mark a node for maintenance while it's running, i.e. 'take it out of the pool', so this won't help. I am using Swarm as a kind of load balancer and would like to disable a node (for example, to make updates on it) and then put it back in.
If labels could be added dynamically, that could work.
@ikreymer Thanks for the feedback. I think this falls within the scope of #1486, which aims to improve node management. Draining a node that goes down for maintenance is part of that proposal, so we'll make sure we address this scenario as part of the changes.
Using labels for this is impractical (and, as you outlined, they cannot be added dynamically; you have to stop the Engine and restart it).
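For context, in classic Swarm a node label is fixed when the Engine starts, which is exactly why it can't help here. A sketch (the label key/value status=maintenance is just an example):

```sh
# Node labels are passed to the daemon at startup; changing one
# requires stopping and restarting the Engine on that node.
docker daemon --label status=maintenance
```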
In the meantime, there's a hack you can do.
You can work around the inability to change node labels dynamically by running a container with a specific label on each drained machine and applying affinities so that your containers are never co-scheduled next to it.
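A minimal sketch of the marker container, assuming classic Swarm's constraint syntax (the names drain-marker and node-123 are placeholders):

```sh
# On the manager: create -- but don't start -- a marker container
# pinned to the node being drained, tagged with "drained=true".
docker create --name drain-marker \
  -e constraint:node==node-123 \
  --label drained=true \
  busybox
```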
Note that I'm using docker create rather than docker run. The container doesn't need to be running and taking up resources; it just needs to exist on the machine. I'm using busybox as it's a tiny image, but you could use anything else, preferably an image that is already on the machine so there's no need to pull new content.
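Then start your actual workloads with a negative label affinity. A sketch, with web and nginx standing in for your real services:

```sh
# The affinity tells Swarm to avoid any node that hosts a
# container labeled drained=true.
docker run -d --name web \
  -e affinity:drained!=true \
  nginx
```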
This will force Swarm to never run your containers next to one labeled drained=true, so in this example it will skip node-123.
Here's another variant of the workaround that I just used, based on resource constraints.
Given a node with 30 GB of RAM, run a container that reserves all of that memory.
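Something along these lines (a sketch; the container name is arbitrary, and the -m value should match the node's total memory):

```sh
# Pin an idle container to node-123 and reserve the node's entire
# 30 GB of RAM so the scheduler sees no capacity left.
docker run -d --name drain-node-123 \
  -e constraint:node==node-123 \
  -m 30g \
  busybox sleep 999999999
```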
As Swarm checks for remaining resources (RAM in this case), it won't schedule new containers on this node until you remove that container.