release-24.3: amazon: add settings to control retry behavior #151875
Merged
Backport 1/1 commits from #151817.
/cc @cockroachdb/release
Release justification: Adds a setting, cloudstorage.s3.client_retry_token_bucket.enabled, which can be set to false to mitigate an issue impacting backups on large AWS clusters.

This change adds two settings to control the S3 client retry behavior:
- cloudstorage.s3.max_retries: replaces a constant that was hardcoded to 10. It controls how many times the S3 client will retry an error.
- cloudstorage.s3.client_retry_token_bucket.enabled: when set to false, this disables the client retry token bucket. The token bucket has no time-based refill and can only refill when new operations are started, so it can get stuck in an empty state with fixed-concurrency operations like backup and restore.

There is also a minor improvement to verbose logging in the S3 client:
warnings.
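
To illustrate why a retry token bucket without time-based refill can get stuck, here is a minimal toy model in Go. It is a sketch, not the S3 client's actual implementation: the `retryBucket` type, its methods, and the numbers (capacity 20, retry cost 5, refund 1) are all hypothetical and chosen only to make the behavior visible. For simplicity, the model returns tokens when an operation succeeds.

```go
package main

import "fmt"

// retryBucket is a hypothetical retry quota. Retries spend tokens, and
// tokens come back only as a side effect of operations, never with time.
type retryBucket struct {
	tokens    int // tokens currently available
	capacity  int // upper bound on tokens
	retryCost int // tokens spent per retry
	refund    int // tokens returned per successful operation
}

// acquireRetry reports whether a retry is allowed, spending tokens if so.
func (b *retryBucket) acquireRetry() bool {
	if b.tokens < b.retryCost {
		return false
	}
	b.tokens -= b.retryCost
	return true
}

// recordSuccess returns a small number of tokens, capped at capacity.
func (b *retryBucket) recordSuccess() {
	b.tokens += b.refund
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
}

// grantedRetries simulates a burst of correlated errors: it requests
// `attempts` retries back to back and returns how many were granted.
func grantedRetries(b *retryBucket, attempts int) int {
	granted := 0
	for i := 0; i < attempts; i++ {
		if b.acquireRetry() {
			granted++
		}
	}
	return granted
}

func main() {
	b := &retryBucket{tokens: 20, capacity: 20, retryCost: 5, refund: 1}

	// A burst of 8 correlated errors drains the bucket after 4 retries.
	fmt.Println("retries granted:", grantedRetries(b, 8))
	fmt.Println("tokens left:", b.tokens)

	// Waiting does not help: nothing refills the bucket over time.
	fmt.Println("retry allowed after waiting:", b.acquireRetry())

	// Only operation activity adds tokens back, one refund at a time.
	for i := 0; i < 5; i++ {
		b.recordSuccess()
	}
	fmt.Println("retry allowed after 5 successes:", b.acquireRetry())
}
```

In this model, a workload with a fixed number of in-flight operations that all hit errors at once has nothing left to refill the bucket, so every later retry is refused. That is the stuck state the setting is meant to avoid.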
Informs: #151748
Release note: Tunes S3 client retry behavior to be more reliable in the presence of correlated errors.