Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fair Share Replication Scheduler Implementation
Fair share replication scheduler allows configuring job priorities per-replicator db. Previously jobs from all the replication dbs would be added to the scheduler and run in a round-robin order. This update makes it possible to specify the relative priority of jobs from different databases. For example, there could be low, high and default priority _replicator dbs. The original algorithm comes from the [A Fair Share Scheduler](https://proteusmaster.urcf.drexel.edu/urcfwiki/images/KayLauderFairShare.pdf "Fair Share Scheduler") paper by Judy Kay and Piers Lauder. A summary of how the algorithm works is included in the top level comment in the couch_replicator_share module. There is minimal modification to the main scheduler logic. Besides the share accounting logic each cycle, the other changes are: * Running and stopping candidates are now picked based on the priority first, and then on their last_started timestamp. * When jobs finish executing mid-cycle, their charges are accounted for. That holds for jobs which terminate normally, are removed by the user, or crash. Other interesting aspects are the interaction with the error back-off mechanism and how one-shot replications are treated: * The exponential error back-off mechanism is unaltered and takes precedence over the priority values. That means unhealthy jobs are rejected and "penalized" before the priority value is even looked at. * One-shot replications, once started, are not stopped during each scheduling cycle unless the operator manually adjusts the `max_jobs` parameter. That behavior is necessary to preserve the "snapshot" semantics and is retained in this update.
- Loading branch information