Fair Share Replication Scheduler Implementation (3.x) #3364
Conversation
To help test the PR I created a simple script which runs against a local cluster: https://gist.github.com/nickva/7e86b3df19537a60372217e0f68b693a

With a configuration where the shares are all even (100) but the dbs have an uneven number of jobs, after 5-10 minutes there should be a roughly even share of running jobs from each db, even though one db has 500 jobs and the others have fewer:

```python
INTERVAL = 20000

CONFIG = {
    "replicator": {
        "max_jobs": "100",
        "max_churn": "20",
        "interval": str(INTERVAL),
    },
    "replicator.shares": {
        RDB1: "100",
        RDB2: "100",
        RDB3: "100",
    },
}

REPS = {
    RDB1: 200,
    RDB2: 300,
    RDB3: 500,
}
```
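For instance, with the hypothetical settings above (equal shares of 100, `max_jobs` of 100), each db should converge toward roughly a third of the running slots. A back-of-envelope sketch of that proportional split (the db names and allocation formula here are illustrative assumptions, not the scheduler's actual code):

```python
# Back-of-envelope sketch only: with equal shares, running slots split
# roughly evenly across dbs regardless of how many jobs each db holds.
max_jobs = 100
shares = {"rdb1": 100, "rdb2": 100, "rdb3": 100}  # placeholder db names
jobs = {"rdb1": 200, "rdb2": 300, "rdb3": 500}

total_shares = sum(shares.values())
targets = {
    # Each db's slice of max_jobs is proportional to its shares,
    # capped by the number of jobs it actually has.
    db: min(jobs[db], max_jobs * shares[db] // total_shares)
    for db in shares
}
print(targets)  # each db ends up with roughly a third of max_jobs
```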
My initial review of this PR didn't find anything major in terms of the implementation. I do think, though, that we're missing a high level overview comment in the couch_replicator_share module providing a mental model for how this works. I know there's a paper I can go read for more detail, but a quick two or three paragraphs explaining how shares and charges work and how they relate would be a pretty good starting point, along with maybe a couple of scenarios describing how the priority calculations work out.

Other than that, most of my comments were style nits.
+1. Awesome work as per usual.
This is needed to prepare for the Fair Share scheduler feature since both the scheduler and the fair share module will end up referencing the #job record.
Fair share replication scheduler allows configuring job priorities per-replicator db.

Previously jobs from all the replication dbs would be added to the scheduler and run in a round-robin order. This update makes it possible to specify the relative priority of jobs from different databases. For example, there could be low, high and default priority `_replicator` dbs.

The original algorithm comes from the [A Fair Share Scheduler](https://proteusmaster.urcf.drexel.edu/urcfwiki/images/KayLauderFairShare.pdf "Fair Share Scheduler") paper by Judy Kay and Piers Lauder. A summary of how the algorithm works is included in the top level comment in the couch_replicator_share module.

There is minimal modification to the main scheduler logic. Besides the share accounting logic each cycle, the other changes are:

* Running and stopping candidates are now picked based on the priority first, and then on their last_started timestamp.
* When jobs finish executing mid-cycle, their charges are accounted for. That holds for jobs which terminate normally, are removed by the user, or crash.

Other interesting aspects are the interaction with the error back-off mechanism and how one-shot replications are treated:

* The exponential error back-off mechanism is unaltered and takes precedence over the priority values. That means unhealthy jobs are rejected and "penalized" before the priority value is even looked at.
* One-shot replications, once started, are not stopped during each scheduling cycle unless the operator manually adjusts the `max_jobs` parameter. That behavior is necessary to preserve the "snapshot" semantics and is retained in this update.
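The authoritative summary lives in the (Erlang) couch_replicator_share module, but the shares/charges relationship described above can be loosely sketched in Python: running jobs charge their db, charges fold into a decaying usage total each cycle, and a db's priority value grows with usage and shrinks with its share allocation (lower value runs sooner). The class, decay constant, and exact formula below are illustrative assumptions following the general Kay-Lauder shape, not the actual implementation:

```python
# Illustrative sketch of fair share accounting, loosely following the
# Kay-Lauder scheme referenced above. This is NOT couch_replicator_share
# (which is Erlang); names, constants and the formula are hypothetical.
DECAY = 0.5  # fraction of usage carried over each scheduling cycle

class DbAccount:
    def __init__(self, shares):
        self.shares = shares
        self.usage = 0.0    # decayed sum of past charges
        self.charges = 0.0  # run time accrued by this db's jobs this cycle

    def charge(self, job_runtime):
        # Each job that ran (or finished mid-cycle) charges its db.
        self.charges += job_runtime

    def end_cycle(self):
        # Fold this cycle's charges into usage, then decay usage so
        # older activity counts for less over time.
        self.usage = (self.usage + self.charges) * DECAY
        self.charges = 0.0

    def priority(self, num_jobs):
        # Lower value runs sooner: heavy recent usage raises the value,
        # a larger share allocation lowers it.
        return num_jobs * self.usage / (self.shares ** 2)
```

With accounting of this shape, stopping candidates would be drawn from the worst (highest) priority values first, with last_started as the tie-breaker, matching the candidate selection described above.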
Will merge the PR, noting that I still owe a documentation update for it.
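Pending that documentation, the configuration shape implied by the test script's CONFIG dict would look roughly like the following ini fragment (the section and option names come from the script above; the db name and share value are placeholders):

```ini
[replicator]
max_jobs = 100
max_churn = 20
interval = 20000

[replicator.shares]
; one entry per replicator db: <db name> = <relative share>
_replicator = 100
```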
A short description on how the algorithm works along with the configuration sections. Main PR: apache/couchdb#3364