Merge pull request #51826 from zdover23/wip-doc-2023-05-30-backport-51798-to-pacific

pacific: doc/rados: edit balancer.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
zdover23 committed May 30, 2023
2 parents d8c5d34 + d3881ad commit 1a16f1a
1 changed file: doc/rados/operations/balancer.rst (89 additions, 74 deletions)
Balancer
========

The *balancer* can optimize the allocation of placement groups (PGs) across
OSDs in order to achieve a balanced distribution. The balancer can operate
either automatically or in a supervised fashion.


Status
------

To check the current status of the balancer, run the following command:

.. prompt:: bash $

   ceph balancer status

Automatic balancing
-------------------

When the balancer is in ``upmap`` mode, the automatic balancing feature is
enabled by default. For more details, see :ref:`upmap`. To disable the
balancer, run the following command:

.. prompt:: bash $

ceph balancer off

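The balancer can be re-enabled at any time:

.. prompt:: bash $

   ceph balancer on
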
The balancer mode can be changed from ``upmap`` mode to ``crush-compat`` mode.
``crush-compat`` mode is backward compatible with older clients. In
``crush-compat`` mode, the balancer automatically makes small changes to the
data distribution in order to ensure that OSDs are utilized equally.


Throttling
----------

If the cluster is degraded (that is, if an OSD has failed and the system hasn't
healed itself yet), then the balancer will not make any adjustments to the PG
distribution.

When the cluster is healthy, the balancer will incrementally move a small
fraction of unbalanced PGs in order to improve distribution. This fraction
will not exceed a certain threshold that defaults to 5%. To adjust this
``target_max_misplaced_ratio`` threshold setting, run the following command:

.. prompt:: bash $

ceph config set mgr target_max_misplaced_ratio .07 # 7%

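To check the threshold value that is currently in effect, you can query the configuration database (a general-purpose ``ceph config get`` query, shown here for illustration):

.. prompt:: bash $

   ceph config get mgr target_max_misplaced_ratio
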
The balancer sleeps between runs. To set the number of seconds for this
interval of sleep, run the following command:

.. prompt:: bash $

ceph config set mgr mgr/balancer/sleep_interval 60

To set the time of day (in HHMM format) at which automatic balancing begins,
run the following command:

.. prompt:: bash $

ceph config set mgr mgr/balancer/begin_time 0000

To set the time of day (in HHMM format) at which automatic balancing ends, run
the following command:

.. prompt:: bash $

ceph config set mgr mgr/balancer/end_time 2359

Automatic balancing can be restricted to certain days of the week. To restrict
it to a specific day of the week or later (as with crontab, ``0`` is Sunday,
``1`` is Monday, and so on), run the following command:

.. prompt:: bash $

ceph config set mgr mgr/balancer/begin_weekday 0

To restrict automatic balancing to a specific day of the week or earlier
(again, ``0`` is Sunday, ``1`` is Monday, and so on), run the following
command:

.. prompt:: bash $

ceph config set mgr mgr/balancer/end_weekday 6

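For example, to confine automatic balancing to the early morning hours on weekdays, the scheduling settings above can be combined as follows (an illustrative combination, not a recommendation):

.. prompt:: bash $

   ceph config set mgr mgr/balancer/begin_time 0100
   ceph config set mgr mgr/balancer/end_time 0500
   ceph config set mgr mgr/balancer/begin_weekday 1
   ceph config set mgr mgr/balancer/end_weekday 5
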
Automatic balancing can be restricted to certain pools. By default, the value
of this setting is an empty string, so that all pools are automatically
balanced. To restrict automatic balancing to specific pools, retrieve their
numeric pool IDs (by running the :command:`ceph osd pool ls detail` command),
and then run the following command:

.. prompt:: bash $

   ceph config set mgr mgr/balancer/pool_ids 1,2,3

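To remove this restriction later, clear the option so that it reverts to its default empty string (a generic ``ceph config rm``, shown here as one way to do this):

.. prompt:: bash $

   ceph config rm mgr mgr/balancer/pool_ids
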
Modes
-----

There are two supported balancer modes:

#. **crush-compat**. This mode uses the compat weight-set feature (introduced
in Luminous) to manage an alternative set of weights for devices in the
CRUSH hierarchy. When the balancer is operating in this mode, the normal
weights should remain set to the size of the device in order to reflect the
target amount of data intended to be stored on the device. The balancer will
then optimize the weight-set values, adjusting them up or down in small
increments, in order to achieve a distribution that matches the target
distribution as closely as possible. (Because PG placement is a pseudorandom
process, it is subject to a natural amount of variation; optimizing the
weights serves to counteract that natural variation.)

Note that this mode is *fully backward compatible* with older clients: when
an OSD Map and CRUSH map are shared with older clients, Ceph presents the
optimized weights as the "real" weights.

The primary limitation of this mode is that the balancer cannot handle
multiple CRUSH hierarchies with different placement rules if the subtrees of
the hierarchy share any OSDs. (Such sharing of OSDs is not typical and,
because of the difficulty of managing the space utilization on the shared
OSDs, is generally not recommended.)

#. **upmap**. In Luminous and later releases, the OSDMap can store explicit
mappings for individual OSDs as exceptions to the normal CRUSH placement
calculation. These ``upmap`` entries provide fine-grained control over the
PG mapping. This balancer mode optimizes the placement of individual PGs in
order to achieve a balanced distribution. In most cases, the resulting
distribution is nearly perfect: that is, there is an equal number of PGs on
each OSD (±1 PG, since the total number might not divide evenly).

To use ``upmap``, all clients must be Luminous or newer.

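One way to enforce this requirement is to raise the cluster's minimum required client release, as described in the :ref:`upmap` documentation:

.. prompt:: bash $

   ceph osd set-require-min-compat-client luminous
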
The default mode is ``upmap``. The mode can be changed to ``crush-compat`` by
running the following command:

.. prompt:: bash $

   ceph balancer mode crush-compat

Supervised optimization
-----------------------

Supervised use of the balancer can be understood in terms of three distinct
phases:

#. building a plan
#. evaluating the quality of the data distribution, either for the current PG
distribution or for the PG distribution that would result after executing a
plan
#. executing the plan

To evaluate the current distribution, run the following command:

.. prompt:: bash $

ceph balancer eval

To evaluate the distribution for a single pool, run the following command:

.. prompt:: bash $

ceph balancer eval <pool-name>

To see the evaluation in greater detail, run the following command:

.. prompt:: bash $

ceph balancer eval-verbose ...

To instruct the balancer to generate a plan (using the currently configured
mode), make up a name (any useful identifying string) for the plan, and run the
following command:

.. prompt:: bash $

ceph balancer optimize <plan-name>

To see the contents of a plan, run the following command:

.. prompt:: bash $

ceph balancer show <plan-name>

To display all plans, run the following command:

.. prompt:: bash $

ceph balancer ls

To discard an old plan, run the following command:

.. prompt:: bash $

ceph balancer rm <plan-name>

To see currently recorded plans, examine the output of the following status
command:

.. prompt:: bash $

ceph balancer status

To evaluate the distribution that would result from executing a specific plan,
run the following command:

.. prompt:: bash $

ceph balancer eval <plan-name>

If a plan is expected to improve the distribution (that is, the plan's score is
lower than the current cluster state's score), you can execute that plan by
running the following command:

.. prompt:: bash $

ceph balancer execute <plan-name>

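Putting these steps together, a typical supervised session might look like the following (the plan name ``myplan`` is an arbitrary example):

.. prompt:: bash $

   ceph balancer off                # pause automatic balancing
   ceph balancer eval               # score the current distribution
   ceph balancer optimize myplan    # build a plan using the current mode
   ceph balancer eval myplan        # score the distribution the plan would produce
   ceph balancer execute myplan     # apply the plan if its score is lower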