Is there a way to temporarily disable scheduling to a node (e.g. for maintenance)? #1508

Closed
ikreymer opened this Issue Dec 7, 2015 · 8 comments

ikreymer commented Dec 7, 2015

I am running a swarm cluster where the containers are distributed using the default spread scheduling.

However, at times I would like to temporarily stop scheduling to a particular node, e.g. to gradually drain it of containers so that it can be taken down for maintenance.

Currently I just stop the swarm-agent container on the node, but this is less than ideal (cleanup tasks may not run). Is there any other way to do this?

dongluochen commented Dec 7, 2015

@ikreymer You can use constraints to exclude the node under maintenance from the scheduler, e.g.:

docker -H swarm_ip:swarm_port run -e constraint:node!=#NODE_UNDER_MAINTENANCE# hello-world

Built-in support for node exclusion may be available in the future.
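For instance (the manager address and node name below are made up for illustration), if the node to drain is registered as node-3 and the Swarm manager listens on 10.0.0.1:4000, every new container would be started with the exclusion constraint:

docker -H tcp://10.0.0.1:4000 run -d -e constraint:node!=node-3 nginx

Standalone Swarm should then place the container on any node other than node-3; listing containers through the manager with docker -H tcp://10.0.0.1:4000 ps shows the chosen node as a prefix of the container name, which makes placement easy to verify.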

ikreymer commented Dec 7, 2015

If I understand https://docs.docker.com/swarm/scheduler/filter/ correctly:

To tag a node with a specific set of key/value pairs, one must pass a list of --label options at docker startup time

It looks like labels are set statically on startup. The idea is to be able to dynamically mark a node for maintenance while it's running, i.e. 'take it out of the pool', so this won't help. I am using swarm as a kind of load balancer and would like to disable a node (for example, to make updates on it) and then put it back in.

If labels could be added dynamically, that could work.
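For reference, this is roughly how those static labels are applied today (the label name here is hypothetical): the Engine has to be started, or restarted, with the --label flag, which is exactly the limitation described above:

docker daemon --label maintenance=true

Containers started with -e constraint:maintenance!=true would then avoid that node, but toggling the label means bouncing the daemon, so it does not give you a dynamic drain.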

abronan commented Dec 7, 2015

@ikreymer Thanks for the feedback. I think this falls within the scope of #1486, which aims to improve node management. Draining a node that goes down for maintenance should be covered by that proposal, so we'll make sure we address this scenario as part of the changes.

Using labels for this is impractical (and, as you outlined, they cannot be added dynamically; you have to stop the Engine and restart it).

ikreymer commented Dec 7, 2015

Thanks for the quick response. Looking at the proposal, I think this is also similar to #1341.

aluzzardi commented Dec 14, 2015

In the meantime, there's a hack you can do.

You can get around the missing functionality of dynamically changing node labels by running containers with a specific set of labels on drained machines and applying affinities to never co-schedule containers next to them.

Example:

  1. Create a container named "node-123-drain" on the machine "node-123" and label it as "drained=true":
docker create --name node-123-drain --label drained=true -e constraint:node==node-123 busybox

Note that I'm using docker create rather than docker run. The container doesn't need to be running or take resources; it just needs to exist on the machine. I'm using "busybox" as it's a tiny image, but you could use anything else, preferably an image that is already on the machine so there's no need to pull new content.

  2. Run all your containers with an affinity so they never get co-scheduled on a machine where a "drain container" is present:
docker run -d -e 'affinity:drained!=true' [...]

This will force Swarm to never run your containers next to one labeled "drained=true", so in this example it will skip "node-123".

  3. Put the machine back in rotation by removing the drain container:
docker rm -f node-123-drain
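
If you end up doing this often, the steps above can be wrapped in small shell helpers. This is only a sketch of the same commands; the manager endpoint and function names are invented:

SWARM_HOST=tcp://10.0.0.1:4000   # hypothetical Swarm manager endpoint

drain_node() {    # usage: drain_node node-123
  docker -H "$SWARM_HOST" create --name "$1-drain" --label drained=true \
    -e constraint:node=="$1" busybox
}

undrain_node() {  # usage: undrain_node node-123
  docker -H "$SWARM_HOST" rm -f "$1-drain"
}

# regular workloads are then scheduled away from drained nodes
docker -H "$SWARM_HOST" run -d -e 'affinity:drained!=true' nginx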
ikreymer commented Dec 14, 2015

Thanks @aluzzardi, that is clever. I will try this.

batmat commented Jan 21, 2016

Another workaround variant I just used, based on resource constraints.
(Note: it requires interacting directly with the docker daemon of the node to be removed.)

Given a node with 30 GB of RAM, run the following container:

$ docker run --name maintenance-filling -m 30G hello-world

As Swarm checks for remaining resources (RAM here), it won't schedule new containers on this node until you remove that maintenance-filling one.

HTH
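
Presumably, putting the node back in rotation is then just a matter of removing the filler container on that same daemon (and the -m value has to be roughly the node's total RAM as seen by Swarm, so it may need adjusting per node):

$ docker rm -f maintenance-filling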

amitshukla commented Jan 29, 2016

The immediate question is resolved, and Node management (#1486) is tracking a complete solution.
