Clearly explain that draining a swarm node does not wait for replicas to start on an active node before stopping tasks on the node being drained #9917

Open
@airmnichols

Description

File: engine/swarm/swarm-tutorial/drain-node.md

States:

"Sometimes, such as planned maintenance times, you need to set a node to DRAIN availability. DRAIN availability prevents a node from receiving new tasks from the swarm manager. It also means the manager stops tasks running on the node and launches replica tasks on a node with ACTIVE availability."

This is misleading: a drain operation has no logic to maintain the configured number of running replicas while the node is being drained.

This should be clearly explained.

Suppose you have a swarm with two worker nodes and have just performed maintenance on worker node 1. All replicas are now running on worker node 2.

If you then drain worker node 2 for patching, it causes downtime, because swarm does not, for example, stop replica 1 on node 2 and start replica 1 on node 1 before moving on to do the same for replica 2.
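
To make the scenario concrete, here is a toy model in Python. It is not swarm's scheduler; it only mirrors the ordering described above. The function names and the `worker1`/`worker2` labels are made up for illustration, and `drain_start_first` is the hypothetical behavior I expected, not something swarm does today.

```python
def drain_stop_first(placement, node, target):
    """Toy model of the behavior reported in this issue: every task on
    `node` is stopped, and only then are replacements started on `target`.
    Returns (lowest running replica count observed, final placement)."""
    running = dict(placement)
    moved = running.get(node, 0)
    running[node] = 0                      # all tasks on the drained node stop
    lowest = sum(running.values())         # replicas running at the worst moment
    running[target] = running.get(target, 0) + moved  # replacements start afterwards
    return lowest, running


def drain_start_first(placement, node, target):
    """Hypothetical rolling drain: start one replacement on `target`
    before stopping each task on `node`."""
    running = dict(placement)
    lowest = sum(running.values())
    while running.get(node, 0) > 0:
        running[target] = running.get(target, 0) + 1  # start the replacement first
        running[node] -= 1                            # then stop the old task
        lowest = min(lowest, sum(running.values()))
    return lowest, running


# Node 1 was just patched, so both replicas sit on node 2.
placement = {"worker1": 0, "worker2": 2}
print(drain_stop_first(placement, "worker2", "worker1")[0])   # 0
print(drain_start_first(placement, "worker2", "worker1")[0])  # 2
```

In the stop-first model the service drops to zero running replicas, which is the downtime I am seeing. In the rolling model the count never goes below the configured two.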

The current design causes downtime for applications.
Support advised that this is expected behavior, and that a workaround is to reconfigure all running services to have more replicas, forcing them to start on another worker node, before issuing a drain command for a node.
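
For reference, the workaround as I understand it could be scripted roughly like this. This is only a sketch: the helper name, the service name `web`, the node name `worker2`, and the choice of one extra replica per service are my assumptions, not something support specified. It only builds the command strings (`docker service scale` and `docker node update --availability drain`) and does not run them.

```python
def workaround_commands(services, node, extra=1):
    """Build the CLI commands for the scale-up-then-drain workaround.

    `services` maps a service name to its current replica count.
    `extra` (an assumption) is how many replicas to add per service so
    that new tasks get scheduled on another active node before the drain.
    """
    cmds = []
    for name, replicas in services.items():
        # Scale up first so additional tasks start elsewhere.
        cmds.append(f"docker service scale {name}={replicas + extra}")
    # Only then mark the node as drained.
    cmds.append(f"docker node update --availability drain {node}")
    return cmds


for cmd in workaround_commands({"web": 2}, "worker2"):
    print(cmd)
```

Having to do this for every running service before every drain is exactly the kind of step the documentation should call out.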
