This repository has been archived by the owner on Jul 21, 2023. It is now read-only.
Implement retries in S3 requests #41
Labels: api-client-retries (Our Rust cloud API clients need better retry logic), enhancement (New feature or request), good first issue (Good for newcomers)

As a workaround for a mismatch between connection pool timeouts in Hyper and AWS S3, #36 constructs HTTP clients with a carefully chosen idle timeout. For more robustness, we should implement retries.
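Concretely, the workaround pins Hyper's pool idle timeout below S3's server-side timeout, so Hyper closes stale connections before S3 can close them out from under us. A sketch, assuming hyper 0.14 with hyper-tls (the connector prio-server actually uses may differ):

```rust
use std::time::Duration;

use hyper::client::HttpConnector;
use hyper::Client;
use hyper_tls::HttpsConnector;

/// Build a client whose pool idle timeout (10 s) sits comfortably below
/// S3's ~20 s server-side idle timeout, so stale connections are dropped
/// on our side first.
fn make_client() -> Client<HttpsConnector<HttpConnector>> {
    Client::builder()
        .pool_idle_timeout(Duration::from_secs(10))
        .build(HttpsConnector::new())
}
```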
tgeoghegan added the p1 (Must be fixed for the corresponding milestone to be reached.) and internal (Improvements to project infrastructure or code that don't directly affect the end result.) labels, and removed the internal label, on Oct 19, 2020
tgeoghegan added a commit that referenced this issue on Dec 3, 2020:

During Narnia testing, we encountered cases where we failed to upload validation batches to S3 because the ingestion batches were _just_ big enough that roughly ten seconds would elapse between the CreateMultipartUpload call and the first UploadPart. We had previously set the Hyper connection pool idle timeout to 10 seconds to avoid having S3 close connections on us after 20 seconds, so sometimes we would lose the race with Hyper and uploads would spuriously fail. Our fix is to make up to three attempts to upload. This is far from a perfect solution to problem #41, because (1) we should retry only on errors that are known to be retryable, and (2) we should use exponential backoff.
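The shape of that stopgap is a bounded retry loop. A minimal sketch (the name and signature are illustrative, not the actual prio-server code):

```rust
/// Retry `operation` up to `max_attempts` times, returning the first
/// success or the last error. Like the stopgap above, this neither
/// classifies errors nor backs off between attempts.
fn retry_naive<T, E>(
    max_attempts: usize,
    mut operation: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last_error = None;
    for _ in 0..max_attempts {
        match operation() {
            Ok(value) => return Ok(value),
            Err(e) => last_error = Some(e),
        }
    }
    // Panics if max_attempts == 0; callers must pass a positive count.
    Err(last_error.expect("max_attempts must be > 0"))
}
```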
#260 is progress towards this, but I'm leaving this open because that PR only handles HTTP dispatch errors. We should have more comprehensive error handling as described in Amazon's documentation, and should also do exponential backoff. That work is not a P1 for production, though, so I'm downgrading this ticket.
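For reference, here is a sketch of the kind of classification Amazon's documentation describes, written against rusoto_core's `RusotoError` type (an illustration, not the predicate prio-server eventually shipped):

```rust
use rusoto_core::request::BufferedHttpResponse;
use rusoto_core::RusotoError;

/// Rough classification of which Rusoto errors are worth retrying:
/// transport-level dispatch failures, credential fetch failures,
/// HTTP 5xx responses, and throttling (HTTP 429).
fn is_retryable<E>(error: &RusotoError<E>) -> bool {
    match error {
        // The request may never have reached S3; safe to retry.
        RusotoError::HttpDispatch(_) => true,
        // Credential providers can fail transiently (e.g., the
        // instance metadata service).
        RusotoError::Credentials(_) => true,
        // Server errors and throttling are retryable per AWS docs.
        RusotoError::Unknown(BufferedHttpResponse { status, .. }) => {
            status.is_server_error() || status.as_u16() == 429
        }
        // Everything else (validation failures, service-specific
        // errors, parse errors) is treated as permanent.
        _ => false,
    }
}
```

Dispatch and credential failures happen before the request takes effect, so retrying them is safe; 5xx and 429 responses are the server-side cases Amazon's docs call out as retryable.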
tgeoghegan removed the p1 (Must be fixed for the corresponding milestone to be reached.) label on Dec 3, 2020
tgeoghegan added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Dec 11, 2020
More good material on AWS retry best practices: rusoto/rusoto#234 (comment)
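The gist of that guidance is AWS's "exponential backoff with jitter" pattern: grow the delay exponentially, cap it, and randomize the actual sleep so many clients don't re-stampede the service in lockstep. A sketch of the full-jitter variant, using the rand crate:

```rust
use rand::Rng;
use std::time::Duration;

/// "Full jitter" backoff: sleep a uniformly random duration between
/// zero and min(cap, base * 2^attempt).
fn full_jitter_delay(base_ms: u64, cap_ms: u64, attempt: u32) -> Duration {
    // Clamp the shift so it can't overflow; the cap dominates anyway
    // once the exponential term grows past it.
    let exp_ms = cap_ms.min(base_ms.saturating_mul(1u64 << attempt.min(20)));
    Duration::from_millis(rand::thread_rng().gen_range(0..=exp_ms))
}
```

Before attempt `n`, a retry loop would sleep for, say, `full_jitter_delay(100, 10_000, n)`, giving delays that grow from roughly 100 ms toward a 10 s cap.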
tgeoghegan added the api-client-retries (Our Rust cloud API clients need better retry logic) label on Feb 4, 2021
tgeoghegan added a commit that referenced this issue on Apr 21, 2021:

Adapts `aws_credentials::retry_request` to use mod `retries` (and hence `backoff::ExponentialBackoff`) so that we get more graceful retry behavior, in line with AWS developer guidelines. Additionally, our assessment of which errors are retryable is more sophisticated: we now retry on all credential fetch failures, on server errors (i.e., HTTP 5xx), and when throttled. Resolves #41, #354
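For context, this is roughly how the backoff crate is driven (sketched against backoff 0.4's API; the operation and policy values are placeholders, not prio-server's actual configuration):

```rust
use std::time::Duration;

use backoff::{retry, Error as BackoffError, ExponentialBackoff};

// Hypothetical fallible operation standing in for an S3 or credentials call.
fn make_request() -> Result<String, std::io::Error> {
    Err(std::io::Error::new(std::io::ErrorKind::Other, "throttled"))
}

fn main() {
    // Bound the total time spent retrying so a persistent outage still
    // surfaces as an error reasonably quickly.
    let policy = ExponentialBackoff {
        max_elapsed_time: Some(Duration::from_secs(60)),
        ..ExponentialBackoff::default()
    };

    let result = retry(policy, || {
        // A real caller would mark only retryable failures (5xx,
        // throttling, dispatch errors) as transient and everything
        // else as permanent; here everything is treated as transient.
        make_request().map_err(BackoffError::transient)
    });

    println!("{:?}", result);
}
```

On success, `retry` returns the operation's value; on a permanent error, or once `max_elapsed_time` is exhausted, it returns the last error.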