
Reduce replicator.retries_per_request value from 10 to 5 #843

Merged

1 commit merged into apache:master on Sep 27, 2017

Conversation

nickva
Contributor

@nickva nickva commented Sep 27, 2017

Previously an individual failed request would be retried 10 times in a row with
an exponential backoff starting at 0.25 seconds, so the intervals in seconds
would be:

   `0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128`

For a total of about 250 seconds (or about 4 minutes). This made sense before
the scheduling replicator, because if a replication job had crashed enough
times in the startup phase it would not be retried anymore. With a scheduling
replicator, it makes more sense to stop the whole task and let the scheduling
replicator retry later. `retries_per_request` then becomes something used
mainly for short intermittent network issues.
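The setting lives in the `[replicator]` section of the CouchDB configuration and can be overridden in `local.ini`. A minimal sketch, assuming the key name from this PR (the comment wording is illustrative, not taken from the shipped default config):

```ini
[replicator]
; Number of times a single failed request is retried with exponential
; backoff before the whole replication job is stopped and handed back
; to the scheduler (default 5 after this change).
retries_per_request = 5
```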

The new retry schedule is:

   `0.25, 0.5, 1, 2, 4`

For a total of about 8 seconds.
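The two schedules above follow directly from a doubling backoff starting at 0.25 s, truncated at `retries_per_request` attempts. A small sketch of that arithmetic (the helper name is illustrative, not code from the replicator):

```python
def retry_intervals(retries_per_request, initial=0.25):
    """Backoff intervals in seconds: initial, doubling each retry."""
    return [initial * 2 ** n for n in range(retries_per_request)]

old = retry_intervals(10)  # 0.25 .. 128
new = retry_intervals(5)   # 0.25 .. 4

print(old, sum(old))  # total 255.75 s, roughly 4 minutes
print(new, sum(new))  # total 7.75 s, roughly 8 seconds
```

The totals are geometric sums, which is why halving the retry count shrinks the worst-case wait by far more than half.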

An additional benefit is that when the job is stopped more quickly, the user
can find out about the problem sooner from the `_scheduler/docs` and
`_scheduler/jobs` status endpoints and rectify it. Otherwise a single request
retrying for 4 minutes would be reported there as if the job were healthy and
running.

Fixes #810

@nickva
Contributor Author

nickva commented Sep 27, 2017

See associated documentation PR: apache/couchdb-documentation#165

Contributor

@iilyak iilyak left a comment


+1

@janl
Member

janl commented Sep 27, 2017

+1

@nickva nickva merged commit 7267f92 into apache:master Sep 27, 2017
@nickva nickva deleted the issue-810 branch September 27, 2017 15:35