s3manager: no retries on body upload/download failures #466
For now, I made an outer retryer that does its own retry logic if it gets a non-AWS error: https://github.com/grammarly/rocker/blob/master/src/storage/s3/s3.go#L231
Hi @ybogdanov, thanks for contacting us. I think this makes sense. If the network connection fails while the body copy is occurring, the SDK should retry the chunk, with a limit on retry attempts similar to other operations. We may need to be careful about errors that occur while writing the chunk to the destination, since that type of error would most likely be caused by a file I/O error. Your outer retryer is a good workaround until the S3 download manager supports retrying I/O stream errors.
👍
We are going to have to move back to s3gof3r due to these issues. If the number of parts is large enough, and the traffic on the machine is sufficiently high, this occurs often enough that no number of retries is sufficient to download the file.
@eldondevcg are you seeing that, even with retrying failed parts, you're still not able to download an object from S3?
Sorry, we're no longer using this download code, so it's hard to say. The code in #679 was what was encountering this issue, if you want to try to repro. I could speculate about issues within our VPC etc., but since we've changed directions, I can't say for sure. I wanted to try setting KeepAlive here (because I don't believe it was set), but never got around to it.
👍 on fixing this. We're hitting "error running command: read tcp 10.254.0.206:32961->52.216.65.11:443: read: connection reset by peer" crazy frequently when downloading from S3 in a flaky network environment.
@ljfranklin the AWS team doesn't seem to be moving on this ticket much. If you need a lot of multipart upload/download, I highly recommend s3gof3r or similar.
Thanks for the feedback. This will help us prioritize this feature in our backlog. We're also always glad to review pull requests if anyone is interested in submitting a change.
We're seeing this quite frequently as well.
Adds the ability to automatically retry S3 object parts that fail after the initial request response is received. This covers scenarios such as the network connection being terminated or broken. The body parts will be retried based on the S3 service client's MaxRetries configuration. For example, setting aws.Config.MaxRetries to 3 will allow the S3 Downloader to retry part body failures up to 3 times. This feature is enabled automatically; no additional work is needed to take advantage of it. Fix aws#466
In release v1.4.11 we shipped the feature that allows S3 downloads to retry if the connection is broken while downloading the object. Let us know if you have any issues or feedback on this update. Thanks!
Is it just me, or does the uploader not have retry logic implemented yet? Thanks
@fermin-silva the uploader should be retrying each part if the part fails. The retry logic internal to the SDK's UploadPart API operation will automatically retry the part upload if a failure occurs. Are you seeing the uploader not attempting to retry?
Thanks @jasdel for the quick reply. Apparently I jumped to conclusions too quickly, because we are trying to upload a very large (close to 1 TB) file, so you can imagine the size of the log files. I will dig deeper into the logs and get back to you. I saw the retry logic in https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/download.go Thanks again
@fermin-silva thanks for the update. You can also configure the SDK to retry each part request more times by setting the SDK's MaxRetries config attribute:

```go
sess := session.Must(session.NewSession(&aws.Config{
	MaxRetries: aws.Int(retryCount),
}))
uploader := s3manager.NewUploader(sess)
```
Thanks again @jasdel. I was creating it with Sorry for the noise :$
Glad to help. Let us know if you have any questions, issues, or feedback with the SDK, @fermin-silva.
Hi guys,
Looks like you have a fancy retry mechanism, and it works pretty well in most cases. But the case where the request succeeded and the body started downloading, but then the connection closed, is not covered by retries.
As you can see here, https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/download.go#L226, the body is copied outside of the retry handling logic, so a failure at that stage just returns an error instead of retrying the request.
What do you think?