
Fair Share Replication Scheduler Implementation (3.x) #3364

Merged
merged 4 commits into 3.x from fair-share-scheduler-3.x on Mar 11, 2021

Conversation


@nickva nickva commented Feb 8, 2021

The fair share replication scheduler allows configuring job priorities
per _replicator db.

Previously, jobs from all the replication dbs were added to the scheduler
and run in round-robin order. This update makes it possible to specify the
relative priority of jobs from different databases. For example, there could be
low, high, and default priority _replicator dbs.
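
For illustration only, such a setup could look roughly like the following ini section (the section name matches the replicator.shares section used by the test script further down; the db names and share values here are made up):

[replicator.shares]
; more shares = a larger slice of the scheduler's run slots
low/_replicator = 25
_replicator = 100
high/_replicator = 400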

The original algorithm comes from the [A Fair Share
Scheduler](https://proteusmaster.urcf.drexel.edu/urcfwiki/images/KayLauderFairShare.pdf
"Fair Share Scheduler") paper by Judy Kay and Piers Lauder. A summary of how
the algorithm works is included in the top level comment in the
couch_replicator_share module.

The main scheduler logic is only minimally modified. Besides the share
accounting performed each cycle, the other changes are:

  • Running and stopping candidates are now picked based on their priority first,
    and then on their last_started timestamp.

  • When jobs finish executing mid-cycle, their charges are accounted for. That
    holds for jobs which terminate normally, are removed by the user, or crash.
    (A toy sketch of both points follows this list.)
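
As a rough illustration of the two bullets above, here is a toy Python sketch. It is not the actual Erlang implementation (that lives in couch_replicator_scheduler and couch_replicator_share); the record fields and function names here are invented for the sketch.

from dataclasses import dataclass

@dataclass
class Job:
    id: str
    db: str              # the _replicator db the job belongs to
    priority: float      # recomputed each cycle from the db's accumulated usage vs. its shares
    last_started: float  # timestamp of the job's last start

def rank(jobs):
    # Candidates are ordered by priority first, then by last_started; the
    # scheduler picks which jobs to start and which to stop from an ordering
    # like this one.
    return sorted(jobs, key=lambda j: (j.priority, j.last_started))

def charge_on_exit(usage, job, ran_for_sec):
    # A job that finishes mid-cycle (normal completion, user removal, or a
    # crash) still gets its run time charged to its db so the share
    # accounting stays accurate.
    usage[job.db] = usage.get(job.db, 0) + ran_for_sec
    return usage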

Other interesting aspects are the interaction with the error back-off mechanism
and how one-shot replications are treated:

  • The exponential error back-off mechanism is unaltered and takes precedence
    over the priority values. That means unhealthy jobs are rejected and
    "penalized" before the priority value is even looked at.

  • One-shot replications, once started, are not stopped during each scheduling
    cycle unless the operator manually adjusts the max_jobs parameter. That
    behavior is necessary to preserve the "snapshot" semantics and is retained in
    this update.

@nickva nickva force-pushed the fair-share-scheduler-3.x branch 9 times, most recently from 43b60b4 to b68113b on February 15, 2021 22:46
@nickva nickva marked this pull request as ready for review February 15, 2021 22:51
@nickva nickva force-pushed the fair-share-scheduler-3.x branch 2 times, most recently from 5a5ebe6 to a9163ec on February 16, 2021 05:34
@nickva nickva changed the title from "Fair share scheduler 3.x" to "Fair Share Replication Scheduler Implementation (3.x)" on Feb 16, 2021
@nickva nickva force-pushed the fair-share-scheduler-3.x branch 3 times, most recently from 339daf3 to 2ef7b50 on February 16, 2021 17:49
@nickva nickva requested a review from davisp February 16, 2021 17:56
@nickva nickva force-pushed the fair-share-scheduler-3.x branch 5 times, most recently from 5b0d79d to a4780e4 on February 19, 2021 22:57

nickva commented Feb 19, 2021

To help test the PR I created a silly script which runs with a local ./dev/run -n1 cluster. It configures replication jobs and shares for three _replicator dbs and then continuously samples the proportion of running jobs in each db:

https://gist.github.com/nickva/7e86b3df19537a60372217e0f68b693a

So for the default parameters it might output something like:

 ** run stats num dbs: 3, total jobs: 100
rdb1/_replicator 66 66%
rdb2/_replicator 23 23%
rdb3/_replicator 11 11%

 ** run stats num dbs: 3, total jobs: 100
rdb1/_replicator 61 61%
rdb2/_replicator 31 31%
rdb3/_replicator 8 8%

With a configuration where the shares are all even (100) but the dbs have an uneven number of jobs, after 5-10 minutes there should be a roughly even share of running jobs from each db, even though one db has 500 jobs and the others have fewer:

# Replicator db names as reported in the output above.
RDB1 = "rdb1/_replicator"
RDB2 = "rdb2/_replicator"
RDB3 = "rdb3/_replicator"

INTERVAL = 20000
CONFIG = {
    "replicator": {
        "max_jobs": "100",
        "max_churn": "20",
        "interval": str(INTERVAL),
    },
    "replicator.shares": {
        RDB1: "100",
        RDB2: "100",
        RDB3: "100",
    },
}
REPS = {
    RDB1: 200,
    RDB2: 300,
    RDB3: 500,
}
 ** run stats num dbs: 3, total jobs: 100
rdb1/_replicator 30 30%
rdb2/_replicator 34 34%
rdb3/_replicator 36 36%

 ** run stats num dbs: 3, total jobs: 100
rdb1/_replicator 29 29%
rdb2/_replicator 33 33%
rdb3/_replicator 38 38%
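
For completeness, here is a minimal sketch of adjusting a share value at runtime through CouchDB's per-node config API. The base URL, node name, and credentials are placeholders for a local ./dev/run -n1 setup; whether a change takes effect immediately or over the next scheduling cycles is up to the implementation.

import json
from urllib.parse import quote

import requests

# Placeholder base URL and admin credentials; adjust to your local setup.
BASE = "http://127.0.0.1:15984"
AUTH = ("adm", "pass")

def set_shares(replicator_db, shares):
    # Keys in the "replicator.shares" section are replicator db names, so the
    # "/" in e.g. "rdb1/_replicator" has to be percent-encoded in the URL.
    key = quote(replicator_db, safe="")
    url = f"{BASE}/_node/_local/_config/replicator.shares/{key}"
    # The config API takes the new value as a JSON-encoded string and returns
    # the previous value.
    resp = requests.put(url, auth=AUTH, data=json.dumps(str(shares)))
    resp.raise_for_status()
    return resp.json()

set_shares("rdb1/_replicator", 200)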

@davisp davisp left a comment


My initial review of this PR didn't find anything major in terms of the implementation. Though I do think we're missing a high level overview comment in the couch_replicator_share module that provides a mental model for how this works. I know there's a paper I can go read for more detail, but a quick two or three paragraphs explaining how shares and charges work, and how they relate, would be a pretty good starting point. Along with maybe a couple of scenarios describing how the priority calculations work out.

Other than that most of my comments were style nits.

Resolved review threads on: rel/overlay/etc/default.ini, src/couch_replicator/src/couch_replicator_scheduler.erl, src/couch_replicator/src/couch_replicator_share.erl
@nickva nickva force-pushed the fair-share-scheduler-3.x branch 4 times, most recently from d066d86 to 78ceeed Compare March 1, 2021 17:33
@davisp davisp left a comment


+1. Awesome work as per usual.


nickva commented Mar 11, 2021

Will merge the PR, noting that I still owe a documentation update for it.

@nickva nickva merged commit 1add534 into 3.x Mar 11, 2021
@nickva nickva deleted the fair-share-scheduler-3.x branch March 11, 2021 18:13
nickva added a commit to apache/couchdb-documentation that referenced this pull request Mar 15, 2021
A short description on how the algorithm works along with the
configuration sections.

Main PR: apache/couchdb#3364
nickva added a commit to apache/couchdb-documentation that referenced this pull request Mar 17, 2021
A short description on how the algorithm works along with the
configuration sections.

Main PR: apache/couchdb#3364
nickva added a commit to nickva/couchdb that referenced this pull request Sep 7, 2022
A short description on how the algorithm works along with the
configuration sections.

Main PR: apache#3364