
Will there be serious memory improvements for the S3 Upload Manager? #343

Closed · robin865 opened this issue Jul 23, 2019 · 6 comments
Labels: guidance, refactor

robin865 commented Jul 23, 2019

I intend to give the new developer preview of the Go SDK a try, but I'm wondering if there is a high-level description of some of the more fundamental changes you have made (I already looked at the suggestions mentioned in #187). For instance, have there been any improvements in memory consumption, such as solutions to the problems mentioned in aws/aws-sdk-go#2036?

Will there be a way to stream data directly instead of buffering it in memory? I'm just trying to get an idea of what I should monitor when giving it a try. Currently, the memory issues with the Upload Manager in v1 are my biggest problem, so I'm hoping something will address that.

Cheers

Robin

@jasdel added the guidance and refactor labels and removed the feature-request label on Jul 25, 2019
jasdel (Contributor) commented Jul 25, 2019

Thanks for reaching out to us @Rsm10. We plan to investigate performance improvements for both the S3 upload and download managers, but we've not made these refactoring changes yet. A high-level overview of the current changes to the v2 SDK can be found on the AWS SDK for Go v2 developer blog.

A couple of areas for improvement:

eriksw commented Jul 26, 2019

Not so much a performance improvement, but an opportunity for API cleanup that I hope will be considered before v2 is frozen and can no longer take breaking changes: aws/aws-sdk-go#2500

(Should I make a separate issue re that and v2?)

robin865 (Author) commented

@jasdel So does that mean you would potentially be adding support for aws-chunked? This seems like one obvious way to reduce the amount of data that needs to be kept in memory: instead of needing an entire part in memory for calculating the signature, you would only need a (much smaller) chunk.

To clarify, my two main problems with the v1 SDK are:

  1. The amount of memory consumed is quite large, as buffers don't get reused between requests, though I think v1 has since added a shared buffer pool to its interface to help with this (a sketch of that option follows this list)

  2. To upload a very large object (5 TB, which at the 10,000-part limit forces parts of at least 500 MB, up to the 5 GB maximum part size), the SDK is basically unusable, as reading each multi-gigabyte part into memory is a no-go. This is the use case where aws-chunked would make the most sense to me.
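For problem 1, here is a minimal sketch of what the v1 escape hatch looks like, assuming the pooled-buffer option (s3manager.NewBufferedReadSeekerWriteToPool) added in later v1 releases; exact names and pooling behavior should be checked against the SDK docs, and the bucket, key, and file name are placeholders:

```go
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 16 * 1024 * 1024 // 16 MiB parts
		u.Concurrency = 5
		// Draw part buffers from a shared pool instead of allocating
		// a fresh buffer for every part upload.
		u.BufferProvider = s3manager.NewBufferedReadSeekerWriteToPool(16 * 1024 * 1024)
	})

	f, err := os.Open("large-file.bin") // placeholder input
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"), // placeholder
		Key:    aws.String("large-file.bin"),
		Body:   f,
	}); err != nil {
		log.Fatal(err)
	}
}
```

This caps steady-state allocations at roughly PartSize × Concurrency, but it does not help with problem 2, where the part size itself is the issue.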

jasdel (Contributor) commented Oct 30, 2019

@eriksw This issue should definitely be fixed in the v2 SDK. We also need to investigate whether it can be addressed in the v1 SDK without a breaking change.

@Rsm10 Our performance-improvement plans have mostly focused on the generated API serializers for marshaling and unmarshaling requests, removing the need for reflection in that path. The S3 transfer manager's performance will be a target for v2. With regard to aws-chunked, we need to investigate more; I'm not positive Go's HTTP client supports chunk headers for chunked transfer encoding.
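On the Go HTTP question: the standard library does switch to chunked transfer encoding automatically when a request body has unknown length, but to my knowledge net/http exposes no hook for per-chunk extensions, which is the piece aws-chunked-style signing would need. A minimal sketch (the URL is a placeholder):

```go
package main

import (
	"io"
	"net/http"
)

// putStreaming sends body with "Transfer-Encoding: chunked": for a plain
// io.Reader, http.NewRequest leaves ContentLength at 0, so the transport
// chunks the body on the wire. There is no way to attach chunk extensions
// (e.g. per-chunk signatures) through this API.
func putStreaming(url string, body io.Reader) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPut, url, body)
	if err != nil {
		return nil, err
	}
	return http.DefaultClient.Do(req)
}
```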

robin865 (Author) commented Nov 1, 2019

@jasdel OK, I guess the sort of high-level question I'd ask you to think about is:

"How would I use this SDK to upload a 5 TB object to AWS?"

  • Copying each part to disk first is not an acceptable answer for me, as the performance impact is too great

  • Storing a 5 GB part entirely in memory is also not really acceptable, as it does not scale to concurrent uploads, and 5 GB is still a lot even with a single upload at a time (see the memory math sketch after this list)
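To make the constraint concrete, here is the back-of-the-envelope memory math under S3's documented limits (10,000 parts maximum, 5 GB maximum part size, 5 TB maximum object size):

```go
package main

import "fmt"

func main() {
	const (
		maxParts    = 10_000
		objectSize  = int64(5_000_000_000_000) // 5 TB
		concurrency = 5
	)
	// With at most 10,000 parts, a 5 TB object forces parts of at
	// least 500 MB each.
	minPart := objectSize / maxParts
	// If every in-flight part is fully buffered, peak memory is
	// roughly part size times the number of concurrent part uploads.
	fmt.Printf("minimum part size: %d MB\n", minPart/1_000_000)
	fmt.Printf("peak buffer estimate: %d MB\n", minPart*concurrency/1_000_000)
}
```

So even at the smallest legal part size, a modest concurrency of 5 keeps about 2.5 GB of part data in memory if parts are fully buffered.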

jasdel (Contributor) commented Dec 2, 2019

While reviewing the aws-chunked content encoding, I realized it does not use chunked transfer encoding's chunk headers, but its own encoding scheme. This is an optimization the SDK may be able to implement to improve streaming upload performance, which would improve multipart upload throughput for large files.

https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html
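For reference, the framing described in that document looks like the sketch below: each chunk carries its hex-encoded size and a chunk signature on a metadata line, followed by the payload and a CRLF, and the stream ends with a zero-length chunk. Computing the actual SigV4 chaining signature is omitted here, so sig is a placeholder:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// writeAWSChunk frames one chunk per the aws-chunked scheme:
//   <hex size>;chunk-signature=<sig>\r\n<payload>\r\n
func writeAWSChunk(w io.Writer, payload []byte, sig string) error {
	if _, err := fmt.Fprintf(w, "%x;chunk-signature=%s\r\n", len(payload), sig); err != nil {
		return err
	}
	if _, err := w.Write(payload); err != nil {
		return err
	}
	_, err := io.WriteString(w, "\r\n")
	return err
}

func main() {
	_ = writeAWSChunk(os.Stdout, []byte("hello"), "<sig>") // data chunk
	_ = writeAWSChunk(os.Stdout, nil, "<sig>")             // final zero-length chunk
}
```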

@aws locked and limited conversation to collaborators on Apr 1, 2022
@vudh1 converted this issue into discussion #1650 on Apr 1, 2022
