Reduce replicator.retries_per_request value from 10 to 5 #843
Previously, an individual failed request would be retried 10 times in a row with
an exponential backoff starting at 0.25 seconds, so the intervals in seconds
would be:
0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128
for a total of about 250 seconds (roughly 4 minutes). This made sense before
the scheduling replicator, because a replication job that had crashed in the
startup phase enough times would not be retried anymore. With a scheduling
replicator, it makes more sense to stop the whole task and let the scheduling
replicator retry it later.
retries_per_request then becomes something used mainly for short, intermittent
network issues.
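For anyone who prefers the old behavior, the default can still be overridden in
the [replicator] section of the config. A sketch only; the config file path and
location vary by installation:

```ini
; e.g. local.ini (path is deployment-specific); restores the previous retry count
[replicator]
retries_per_request = 10
```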
The new retry schedule is
0.25, 0.5, 1, 2, 4
for a total of about 8 seconds.
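To make the arithmetic explicit, here is a small Python sketch of the two
schedules (an illustration only, not the replicator's actual implementation;
the doubling backoff starting at 0.25 s is taken from the description above):

```python
# Illustration of the backoff arithmetic described above, not CouchDB code.
def backoff_schedule(retries, base=0.25):
    """Wait intervals for `retries` attempts, doubling each time."""
    return [base * 2 ** n for n in range(retries)]

old = backoff_schedule(10)  # 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128
new = backoff_schedule(5)   # 0.25, 0.5, 1, 2, 4

print(sum(old))  # 255.75 -> about 4 minutes
print(sum(new))  # 7.75   -> about 8 seconds
```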
An additional benefit is that when the job stops sooner, the user can find out
about the problem earlier from the _scheduler/docs and _scheduler/jobs status
endpoints and rectify it. Otherwise a single request retrying for 4 minutes
would show up there as a healthy, running job.
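For example, those endpoints can be polled with a quick script (a sketch; the
host, port, and credentials are placeholders, and in most deployments these
endpoints require admin access):

```python
# Hypothetical status check against a local CouchDB node.
import json
from urllib.request import urlopen

for endpoint in ("_scheduler/jobs", "_scheduler/docs"):
    with urlopen(f"http://localhost:5984/{endpoint}") as resp:
        print(endpoint, json.dumps(json.load(resp), indent=2))
```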
Fixes #810