This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

3.x fair share scheduler documentation
A short description of how the algorithm works, along with the
configuration sections.

Main PR: apache/couchdb#3364
nickva committed Mar 15, 2021
1 parent eaad75f commit db0624a
Showing 2 changed files with 94 additions and 0 deletions.
49 changes: 49 additions & 0 deletions src/config/replicator.rst
@@ -249,3 +249,52 @@ Replicator Database Configuration

.. note::
In version 2.2, the session plugin is considered experimental and is not enabled by default.

.. config:option:: usage_coeff
.. versionadded:: 3.2.0

The usage coefficient decays historic fair share usage every
scheduling cycle. The value must be between 0.0 and 1.0. Lower
values make historic usage decay more quickly, and higher
values mean it is remembered longer::

[replicator]
usage_coeff = 0.5

.. config:option:: priority_coeff
.. versionadded:: 3.2.0

The priority coefficient decays all job priorities so that they slowly
drift towards the front of the run queue. This coefficient defines the maximum
time window over which the algorithm operates. For example, if the
value is too small (0.1), after a few cycles many jobs would end up at
priority 0, which would render the algorithm useless. The default value of
0.98 was picked so that a job which ran for one scheduler cycle and then did not
get to run for 7 hours would still have priority > 0. 7 hours was picked
because it is close to 8 hours, which is the default maximum error backoff
interval::

[replicator]
priority_coeff = 0.98
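
As a rough illustration of the decay described above, the sketch below
assumes each priority is simply multiplied by ``priority_coeff`` once
per scheduling cycle, and that a cycle lasts 60 seconds. Both are
assumptions made for the example, not statements about the scheduler's
exact implementation, and the helper names are hypothetical.

```python
import math

# Assumption for illustration only: one scheduling cycle per 60 seconds.
CYCLE_SECONDS = 60


def remaining_fraction(coeff, cycles):
    """Fraction of an initial value left after `cycles` multiplicative decays."""
    return coeff ** cycles


def cycles_until(coeff, fraction):
    """Smallest number of cycles after which the value is at or below `fraction`."""
    return math.ceil(math.log(fraction) / math.log(coeff))


cycles_in_7_hours = 7 * 3600 // CYCLE_SECONDS  # 420 cycles under the assumption

# With a small coefficient almost nothing survives a few cycles.
print(remaining_fraction(0.1, 3))
# With the default coefficient a small fraction survives the 7 hour window.
print(remaining_fraction(0.98, cycles_in_7_hours))
```

Under these assumptions a coefficient of 0.1 leaves about a thousandth
of the original value after three cycles, while 0.98 still leaves a
small non-zero fraction after 7 hours. Whether that remainder rounds to
a priority above 0 depends on the magnitude of the priority values,
which is not stated here.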

.. _config/replicator.shares:

Fair Share Replicator Share Allocation
======================================

.. config:section:: replicator.shares :: Per-Database Fair Share Allocation
.. config:option:: $replicator_db
.. versionadded:: 3.2.0

Fair share configuration section. More shares result in a
higher chance that jobs from that database get to run. The default
value is 100, the minimum is 1 and the maximum is 1000. The
configuration may be set even if the database does not exist::

[replicator.shares]
_replicator = 100
$another/_replicator = 100
45 changes: 45 additions & 0 deletions src/replication/replicator.rst
@@ -21,6 +21,11 @@ Replicator Database
anymore. There are new replication job states and new API endpoints
``_scheduler/jobs`` and ``_scheduler/docs``.

.. versionchanged:: 3.2.0 Fair share scheduling was
introduced. Multiple ``_replicator`` databases get an equal chance
(configurable) of running their jobs. Previously replication jobs
were scheduled without regard to their originating database.

The ``_replicator`` database works like any other in CouchDB, but
documents added to it will trigger replications. Create (``PUT`` or
``POST``) a document to start replication. ``DELETE`` a replication
@@ -539,6 +544,46 @@ After this operation, replication pulling from server X will be stopped
and the replications in the ``_replicator`` database (pulling from
servers A and B) will continue.

Fair Share Job Scheduling
=========================

When multiple ``_replicator`` databases are used, and the total number
of jobs on any node is greater than ``max_jobs``, replication jobs
are scheduled such that each ``_replicator`` database gets, by
default, an equal chance of running its jobs.

This is accomplished by assigning a number of "shares" to each
``_replicator`` database and then automatically adjusting the
proportion of running jobs to match each database's proportion of
shares. By default, each ``_replicator`` database is assigned 100
shares. It is possible to alter the share assignments for each
individual ``_replicator`` database in the :ref:`[replicator.shares]
<config/replicator.shares>` configuration section.

The fair share behavior is easiest to describe with a set of
examples. Each example assumes the default of ``max_jobs = 500``, and
two replicator databases: ``_replicator`` and ``another/_replicator``.

Example 1: If ``_replicator`` has 1000 jobs and
``another/_replicator`` has 10, the scheduler will run about 490 jobs
from ``_replicator`` and 10 jobs from ``another/_replicator``.

Example 2: If ``_replicator`` has 200 jobs and ``another/_replicator``
also has 200 jobs, all 400 jobs will get to run as the sum of all the
jobs is less than the ``max_jobs`` limit.

Example 3: If both replicator databases have 1000 jobs each, the
scheduler will run about 250 jobs from each database on average.

Example 4: If both replicator databases have 1000 jobs each, but
``_replicator`` was assigned 400 shares while ``another/_replicator``
kept the default 100, then on average the scheduler
would run about 400 jobs from ``_replicator`` and 100 jobs from
``another/_replicator``.

The proportions described in the examples are approximate. They might
oscillate a bit, and might take anywhere from tens of minutes to
an hour to converge.
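
The steady-state numbers in the four examples can be reproduced with a
short calculation. The sketch below is not the scheduler's algorithm
(which converges over many cycles); it only computes the proportions
the scheduler is described as converging towards, assuming that slots
left unused by a database with few jobs are redistributed to the others
in proportion to their shares. The function name and its arguments are
hypothetical.

```python
def expected_running(jobs, shares, max_jobs=500):
    """Approximate steady-state number of running jobs per database.

    jobs:   dict of database name -> number of replication jobs
    shares: dict of database name -> shares (missing entries default to 100)
    """
    alloc = {db: 0.0 for db in jobs}
    active = {db for db, count in jobs.items() if count > 0}
    remaining = float(min(max_jobs, sum(jobs.values())))

    while active and remaining > 1e-9:
        total_shares = sum(shares.get(db, 100) for db in active)
        saturated = set()
        handed_out = 0.0
        for db in active:
            # Offer each database its share-proportional slice of free slots.
            offer = remaining * shares.get(db, 100) / total_shares
            room = jobs[db] - alloc[db]
            take = min(offer, room)
            alloc[db] += take
            handed_out += take
            if room <= offer:
                saturated.add(db)
        remaining -= handed_out
        if not saturated:
            break
        # Databases with no more jobs drop out; leftovers are redistributed.
        active -= saturated

    return {db: round(count) for db, count in alloc.items()}


# Example 1: equal shares, one database has only 10 jobs.
print(expected_running({"_replicator": 1000, "another/_replicator": 10}, {}))
# Example 4: _replicator is given 400 shares, the other keeps the default 100.
print(expected_running(
    {"_replicator": 1000, "another/_replicator": 1000},
    {"_replicator": 400},
))
```

With these inputs the function returns 490 and 10 jobs for Example 1,
and 400 and 100 jobs for Example 4, matching the figures given above.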

Replicating the replicator database
===================================
