Reduce replicator.retries_per_request value from 10 to 5 #843
Previously, an individual failed request would be retried 10 times in a row with
an exponential backoff starting at 0.25 seconds, so the intervals in seconds
would be:
0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128
for a total of about 250 seconds (roughly 4 minutes). This made sense before
the scheduling replicator, because a replication job that had crashed in the
startup phase enough times would not be retried anymore. With a scheduling
replicator, it makes more sense to stop the whole task and let the scheduling
replicator retry it later.
retries_per_request then becomes something used mainly for short, intermittent
network issues.
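For anyone who prefers the old behavior, the default can still be overridden in
the [replicator] section of the config. A sketch only; the config file path and
location vary by installation:

```ini
; e.g. local.ini (path is deployment-specific); restores the previous retry count
[replicator]
retries_per_request = 10
```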
The new retry schedule is
0.25, 0.5, 1, 2, 4
for a total of about 8 seconds.
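To make the arithmetic explicit, here is a small Python sketch of the two
schedules (an illustration only, not the replicator's actual implementation;
the doubling backoff starting at 0.25 s is taken from the description above):

```python
# Illustration of the backoff arithmetic described above, not CouchDB code.
def backoff_schedule(retries, base=0.25):
    """Wait intervals for `retries` attempts, doubling each time."""
    return [base * 2 ** n for n in range(retries)]

old = backoff_schedule(10)  # 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128
new = backoff_schedule(5)   # 0.25, 0.5, 1, 2, 4

print(sum(old))  # 255.75 -> about 4 minutes
print(sum(new))  # 7.75   -> about 8 seconds
```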
An additional benefit is that when the job stops sooner, the user can find out
about the problem earlier from the _scheduler/docs and _scheduler/jobs status
endpoints and rectify it. Otherwise a single request retrying for 4 minutes
would show up there as a healthy, running job.
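For example, those endpoints can be polled with a quick script (a sketch; the
host, port, and credentials are placeholders, and in most deployments these
endpoints require admin access):

```python
# Hypothetical status check against a local CouchDB node.
import json
from urllib.request import urlopen

for endpoint in ("_scheduler/jobs", "_scheduler/docs"):
    with urlopen(f"http://localhost:5984/{endpoint}") as resp:
        print(endpoint, json.dumps(json.load(resp), indent=2))
```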
Fixes #810