Additional information about scaling a service #178

Merged · 1 commit · Oct 20, 2016
61 changes: 40 additions & 21 deletions engine/swarm/admin_guide.md
@@ -21,15 +21,15 @@ maintain the swarm.

This article covers the following swarm administration tasks:

* [Using a static IP for manager node advertise address](admin_guide.md#use-a-static-ip-for-manager-node-advertise-address)
* [Adding manager nodes for fault tolerance](admin_guide.md#add-manager-nodes-for-fault-tolerance)
* [Distributing manager nodes](admin_guide.md#distribute-manager-nodes)
* [Running manager-only nodes](admin_guide.md#run-manager-only-nodes)
* [Backing up the swarm state](admin_guide.md#back-up-the-swarm-state)
* [Monitoring the swarm health](admin_guide.md#monitor-swarm-health)
* [Troubleshooting a manager node](admin_guide.md#troubleshoot-a-manager-node)
* [Forcefully removing a node](admin_guide.md#force-remove-a-node)
* [Recovering from disaster](admin_guide.md#recover-from-disaster)
* [Using a static IP for manager node advertise address](#use-a-static-ip-for-manager-node-advertise-address)
* [Adding manager nodes for fault tolerance](#add-manager-nodes-for-fault-tolerance)
* [Distributing manager nodes](#distribute-manager-nodes)
* [Running manager-only nodes](#run-manager-only-nodes)
* [Backing up the swarm state](#back-up-the-swarm-state)
* [Monitoring the swarm health](#monitor-swarm-health)
* [Troubleshooting a manager node](#troubleshoot-a-manager-node)
* [Forcefully removing a node](#force-remove-a-node)
* [Recovering from disaster](#recover-from-disaster)

Refer to [How nodes work](how-swarm-mode-works/nodes.md)
for a brief overview of Docker Swarm mode and the difference between manager and
@@ -91,7 +91,7 @@ guaranteed if you encounter more than two network partitions.
For example, in a swarm with *5 nodes*, if you lose *3 nodes*, you don't have a
quorum. Therefore you can't add or remove nodes until you recover one of the
unavailable manager nodes or recover the swarm with disaster recovery
commands. See [Recover from disaster](admin_guide.md#recover-from-disaster).
commands. See [Recover from disaster](#recover-from-disaster).
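A quorum requires more than half of the manager nodes to be reachable, so an *N*-manager swarm tolerates losing at most `(N-1)/2` managers. With 5 managers, at least 3 must remain reachable; after losing 3, only 2 are left and the quorum cannot be met.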

While it is possible to scale a swarm down to a single manager node, it is
impossible to demote the last manager node. This ensures you maintain access to
@@ -154,7 +154,7 @@ directory:
```

Back up the `raft` data directory often so that you can use it in case of
[disaster recovery](admin_guide.md#recover-from-disaster). Then you can take the `raft`
[disaster recovery](#recover-from-disaster). Then you can take the `raft`
directory of one of the manager nodes to restore to a new swarm.
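
A minimal sketch of one way to take that backup on a manager node, assuming a systemd-based host and the default `/var/lib/docker` data root (adjust the paths if you use a custom data root):

```bash
# Stop Docker so the raft state is not being written while you copy it.
systemctl stop docker

# Archive the whole swarm directory; it contains the raft data
# plus the keys needed to read it.
tar czvf /tmp/swarm-backup-$(date +%F).tar.gz -C /var/lib/docker swarm

# Start Docker again.
systemctl start docker
```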

## Monitor swarm health
@@ -263,13 +263,32 @@ manager node of a single-node swarm. It discards swarm membership information
that existed before the loss of the quorum but it retains data necessary to the
Swarm such as services, tasks and the list of worker nodes.
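
For reference, a minimal sketch of the recovery command this paragraph describes, run on the surviving manager you want to rebuild the swarm from (the advertise address is a placeholder):

```bash
# Re-create a single-manager swarm from this node's existing raft state.
# Old manager membership is discarded; services, tasks, and the worker list are kept.
docker swarm init --force-new-cluster --advertise-addr 192.168.99.100:2377
```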

### Joining a previously failed node

If a node becomes unavailable, it cannot communicate with the rest of the swarm
and its workload is redistributed among the other nodes.
If access to that node is restored, it rejoins the swarm automatically, but with no
workload, because the tasks it was running have already been rescheduled onto other
nodes. The node only receives new workloads when the swarm is rebalanced.
To force the swarm to rebalance, you can
[update](../reference/commandline/service_update.md) or
[scale](../reference/commandline/service_scale.md) the service.
### Forcing the swarm to rebalance

Generally, you do not need to force the swarm to rebalance its tasks. When you
add a new node to a swarm, or a node reconnects to the swarm after a
period of unavailability, the swarm does not automatically give a workload to
the idle node. This is a design decision. If the swarm periodically shifted tasks
to different nodes for the sake of balance, the clients using those tasks would
be disrupted. The goal is to avoid disrupting running services for the sake of
balance across the swarm. When new tasks start, or when a node with running
tasks becomes unavailable, those tasks are given to less busy nodes. The goal
is eventual balance, with minimal disruption to the end user.

If you are concerned about an even balance of load and don't mind disrupting
running tasks, you can force your swarm to rebalance by temporarily scaling
the service upward.
**Member:** Note that this PR, if accepted, would probably allow rebalancing without having to change the scale; moby/swarmkit#1664

**Author:** But I can't talk about it until / unless it is. :)

**Member:** I know; it was just a heads-up 😄

Use `docker service inspect --pretty <servicename>` to see the configured scale
of a service. When you use `docker service scale`, the nodes with the lowest
number of tasks are targeted to receive the new workloads. There may be multiple
under-loaded nodes in your swarm. You may need to scale the service up by modest
increments a few times to achieve the balance you want across all the nodes.

When the load is balanced to your satisfaction, you can scale the service back
down to the original scale. You can use `docker service ps` to assess the current
balance of your service across nodes.
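
A minimal sketch of that workflow, assuming a replicated service named `web` currently running 3 replicas (the service name and replica counts are placeholders):

```bash
# Check the configured number of replicas.
docker service inspect --pretty web

# Scale up temporarily; the new tasks land on the least busy nodes.
docker service scale web=6

# See how tasks are now spread across nodes.
docker service ps web

# Once the spread looks right, return to the original scale.
docker service scale web=3
```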

See also
[`docker service scale`](../reference/commandline/service_scale.md) and
[`docker service ps`](../reference/commandline/service_ps.md).