
s3manager: no retries on body upload/download failures #466

Closed

ybogdanov opened this issue Dec 11, 2015 · 17 comments

Labels
feature-request A feature should be added or improved.

Comments

@ybogdanov

Hi guys,

Looks like you have a fancy retry mechanism, and it works pretty well in most cases. But the case where the request succeeded and the body started downloading, but then the connection closed, is not covered by retries.

As you can see here https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/download.go#L226 the body is copied outside of the retry handling logic, so a failure at that stage just returns an error instead of retrying the request.

What do you think?
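
A minimal sketch of the pattern being described (illustrative only; the helper name, bucket, and key are hypothetical, and this is not the SDK's actual download loop):

import (
    "io"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3iface"
)

// downloadOnce (hypothetical helper) mirrors the pattern above: the GetObject
// request itself is retried by the SDK, but the body copy runs outside that
// retry handling, so a connection reset mid-copy simply returns an error.
func downloadOnce(svc s3iface.S3API, dst io.Writer, bucket, key string) error {
    resp, err := svc.GetObject(&s3.GetObjectInput{ // retried internally by the SDK
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    _, err = io.Copy(dst, resp.Body) // not retried: a broken connection here just fails
    return err
}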

@ybogdanov
Author

ybogdanov commented Dec 11, 2015

For now, I made an outer retryer that does its retry logic if it gets a non-AWS error: https://github.com/grammarly/rocker/blob/master/src/storage/s3/s3.go#L231
https://github.com/grammarly/rocker/blob/master/src/storage/s3/retryer.go#L65
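
A rough sketch of that kind of outer retryer, under the assumption that AWS-typed errors have already been through the SDK's internal retries (the name and backoff are illustrative, not the linked implementation):

import (
    "time"

    "github.com/aws/aws-sdk-go/aws/awserr"
)

// withOuterRetry (hypothetical) retries fn only for non-AWS errors, such as
// a connection reset during the body copy. awserr.Error values have already
// been through the SDK's own retry handling, so they are returned as-is.
func withOuterRetry(attempts int, fn func() error) error {
    var err error
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil {
            return nil
        }
        if _, ok := err.(awserr.Error); ok {
            return err
        }
        if i < attempts-1 {
            time.Sleep(time.Duration(i+1) * time.Second) // simple linear backoff
        }
    }
    return err
}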

@jasdel jasdel added and then removed the feature-request label Dec 16, 2015
@jasdel
Contributor

jasdel commented Dec 16, 2015

Hi @ybogdanov, thanks for contacting us. I think this makes sense. If the network connection fails while the body copy is occurring, the SDK should retry the chunk, with a limit on retry attempts similar to other operations.

We may need to be conscious of errors that occur while writing the chunk to the destination, since that type of error would most likely be caused by a file IO failure.

Your outer retryer is a good workaround until the S3 download manager supports retrying IO stream errors.
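
One illustrative way to tell those two failure modes apart (hypothetical, not SDK code) is to wrap the destination writer so write-side errors can be distinguished from network read errors:

import "io"

// trackingWriterAt records errors raised by the destination, so the caller
// can treat them as file IO failures rather than retryable network errors.
type trackingWriterAt struct {
    w        io.WriterAt
    writeErr error
}

func (t *trackingWriterAt) WriteAt(p []byte, off int64) (int, error) {
    n, err := t.w.WriteAt(p, off)
    if err != nil {
        t.writeErr = err // failure came from writing the chunk, not reading the body
    }
    return n, err
}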

@jasdel jasdel added the feature-request label Dec 16, 2015
@eldondevcg

👍
This is a significant issue/necessity for us.

@eldondevcg

We are going to have to move back to s3gof3r due to these issues. If the number of parts is large enough, and the traffic on the machine is sufficiently high, this seems to occur often enough that no number of retries is sufficient to download the file.

@jasdel
Contributor

jasdel commented May 13, 2016

@eldondevcg are you seeing that, even with retrying failed parts, you're still not able to download an object from S3?

@eldondevcg

eldondevcg commented May 13, 2016

Sorry, we're no longer using this download code, so it's hard to say. The code in #679 was what was encountering this issue, if you want to try to repro. I could speculate about issues within our VPC, etc., but since we've changed directions I can't say for sure. I wanted to try setting KeepAlive here (because I don't believe it was set), but never got around to it.

@ljfranklin

👍 on fixing this. We're hitting "error running command: read tcp 10.254.0.206:32961->52.216.65.11:443: read: connection reset by peer" crazy frequently when downloading from S3 in a flaky network environment.

@eldondevcg

@ljfranklin the AWS team doesn't seem to be moving on this ticket much. If you need a lot of multipart upload/download, I highly recommend s3gof3r or similar.

@jasdel
Contributor

jasdel commented Aug 24, 2016

Thanks for the feedback. This will help us prioritize this feature in our backlog. We're also always glad to review pull requests if anyone is interested in submitting a change.

@ctaymor

ctaymor commented Aug 25, 2016

We're seeing this quite frequently as well.

jasdel added a commit to jasdel/aws-sdk-go that referenced this issue Sep 20, 2016
Adds the ability to automatically retry S3 object parts that fail
after the initial request response is provided. This includes scenarios
such as the network connection being terminated or broken.

The body parts will be retried based on the S3 service client's
MaxRetries configuration. For example, setting aws.Config.MaxRetries to
3 will allow the S3 Downloader to retry part body failures up to 3
times.

This feature is enabled automatically. No additional work is needed to
take advantage of it.

Fix aws#466
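
A minimal sketch of the configuration the commit message describes (assumes a shared session; bucket and key omitted):

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// Part body retries follow the client's MaxRetries configuration.
sess := session.Must(session.NewSession(&aws.Config{
    MaxRetries: aws.Int(3), // each failed part body may be retried up to 3 times
}))
downloader := s3manager.NewDownloader(sess)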
jasdel added several more commits that referenced this issue Sep 20, 2016, each with the same message as above.
@jasdel
Contributor

jasdel commented Sep 21, 2016

In release v1.4.11 we released a feature that allows S3 downloads to retry if the connection is broken while downloading the object. Let us know if you have any issues or feedback on this update. Thanks!

@fermin-silva

Is it just me, or does the uploader not have retry logic implemented yet?
I couldn't see any retry logic in upload.go.

Thanks

@jasdel
Contributor

jasdel commented Jun 29, 2017

@fermin-silva the uploader should be retrying each part if the part fails. The retry logic internal to the SDK's UploadPart API operation will automatically retry the part upload if a failure occurs.

Are you seeing the uploader not attempting to retry?

@fermin-silva

Thanks @jasdel for the quick reply. Apparently I jumped to conclusions too quickly, because we are trying to upload a very large file (close to 1 TB), so you can imagine the size of the log files.

Will dig deeper into the log files and get back to you.

I saw retry logic in https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/download.go
but couldn't see anything retry-related in https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/upload.go

Thanks again

@jasdel
Contributor

jasdel commented Jun 30, 2017

@fermin-silva thanks for the update. You can also configure the SDK to retry each part request more times by setting the SDK's MaxRetries config attribute.

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// Every part request made through this session inherits the retry count.
sess := session.Must(session.NewSession(&aws.Config{
    MaxRetries: aws.Int(retryCount),
}))

uploader := s3manager.NewUploader(sess)
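
Hypothetical usage (the file name, bucket, and key are placeholders; assumes os and log are imported). Each part of the upload is retried internally up to the configured retry count:

f, err := os.Open("large-file.bin") // hypothetical local file
if err != nil {
    log.Fatal(err)
}
defer f.Close()

_, err = uploader.Upload(&s3manager.UploadInput{
    Bucket: aws.String("my-bucket"), // hypothetical bucket and key
    Key:    aws.String("large-file.bin"),
    Body:   f,
})
if err != nil {
    log.Fatal(err)
}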

@fermin-silva

fermin-silva commented Jul 12, 2017

Thanks again @jasdel. I was creating it with awsConfig.WithMaxRetries(5), but the upload process was failing for another reason.
I simply saw this issue, saw no retry logic in the uploader, and thought it was not implemented; my bad.

Sorry for the noise :$

@jasdel
Contributor

jasdel commented Jul 12, 2017

Glad to help! Let us know if you have any questions, issues, or feedback on the SDK, @fermin-silva.
