
Will there be serious memory improvements for the S3 Upload Manager? #343

Closed · robin865 opened this issue Jul 23, 2019 · 6 comments
Labels: guidance, refactor

robin865 commented Jul 23, 2019

I intend to give the new developer preview of the Go SDK a try, but I'm wondering if there is a high-level description of some of the more fundamental changes you have made (I already looked at the suggestions mentioned in #187). For instance, have there been any improvements in memory consumption, such as solutions to the problems mentioned in aws/aws-sdk-go#2036?

Will there be a way to stream data directly instead of buffering it in memory? I'm just trying to get an idea of what I should monitor when giving it a try. Currently, the memory issues with the Upload Manager in v1 are my biggest problem, so I'm hoping something will address that.

Cheers

Robin

@jasdel added the guidance and refactor labels and removed the feature-request label on Jul 25, 2019
jasdel (Contributor) commented Jul 25, 2019

Thanks for reaching out to us @Rsm10. We plan to investigate performance improvements for both the S3 upload and download managers, but we've not made these refactoring changes yet. A high-level overview of the current changes to the v2 SDK can be found on the AWS SDK for Go v2 developer blog.

A couple of areas for improvement:

eriksw commented Jul 26, 2019

Not so much a performance improvement, but an opportunity for API cleanup that I hope will be considered before v2 is frozen and can no longer take breaking changes: aws/aws-sdk-go#2500

(Should I make a separate issue re that and v2?)

robin865 (Author) commented

@jasdel So does that mean you would potentially be adding support for aws-chunked? This seems like one obvious way to reduce the amount of data that needs to be kept in memory: instead of needing an entire part in memory for calculating the signature, you would only need a (much smaller) chunk.

To clarify, my two main problems with the v1 SDK are:

  1. The amount of memory consumed is quite large, as buffers don't get reused between requests, though I think v1 has since added a shared buffer pool to its interface to help with this (a sketch of that option follows this list)

  2. To upload a very large object (5 TB, which at the 10,000-part limit forces parts of at least 500 MB, up to the 5 GB maximum part size), the SDK is basically unusable, as reading each multi-gigabyte part into memory is a no-go. This is the use case where aws-chunked would make the most sense to me.
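For problem 1, here is a minimal sketch of what the v1 escape hatch looks like, assuming the pooled-buffer option (s3manager.NewBufferedReadSeekerWriteToPool) added in later v1 releases; exact names and pooling behavior should be checked against the SDK docs, and the bucket, key, and file name are placeholders:

```go
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 16 * 1024 * 1024 // 16 MiB parts
		u.Concurrency = 5
		// Draw part buffers from a shared pool instead of allocating
		// a fresh buffer for every part upload.
		u.BufferProvider = s3manager.NewBufferedReadSeekerWriteToPool(16 * 1024 * 1024)
	})

	f, err := os.Open("large-file.bin") // placeholder input
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"), // placeholder
		Key:    aws.String("large-file.bin"),
		Body:   f,
	}); err != nil {
		log.Fatal(err)
	}
}
```

This caps steady-state allocations at roughly PartSize × Concurrency, but it does not help with problem 2, where the part size itself is the issue.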

jasdel (Contributor) commented Oct 30, 2019

@eriksw This issue should definitely be fixed in the v2 SDK. We also need to investigate whether it can be addressed in the v1 SDK without a breaking change.

@Rsm10 Our performance-improvement plans have mostly focused on the generated API serializers for marshaling and unmarshaling requests, removing the need for reflection in that path. The S3 transfer manager's performance will be a target for v2. With regard to aws-chunked, we need to investigate more; I'm not positive Go's HTTP client supports chunk headers for chunked transfer encoding.
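On the Go HTTP question: the standard library does switch to chunked transfer encoding automatically when a request body has unknown length, but to my knowledge net/http exposes no hook for per-chunk extensions, which is the piece aws-chunked-style signing would need. A minimal sketch (the URL is a placeholder):

```go
package main

import (
	"io"
	"net/http"
)

// putStreaming sends body with "Transfer-Encoding: chunked": for a plain
// io.Reader, http.NewRequest leaves ContentLength at 0, so the transport
// chunks the body on the wire. There is no way to attach chunk extensions
// (e.g. per-chunk signatures) through this API.
func putStreaming(url string, body io.Reader) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPut, url, body)
	if err != nil {
		return nil, err
	}
	return http.DefaultClient.Do(req)
}
```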

robin865 (Author) commented Nov 1, 2019

@jasdel OK, I guess the sort of high-level question I'd ask you to think about is:

"How would I use this SDK to upload a 5 TB object to AWS?"

  • Copying each part to disk first is not an acceptable answer for me, as the performance impact is too great

  • Storing a 5 GB part entirely in memory is also not really acceptable, as it does not scale to concurrent uploads, and 5 GB is still a lot even with a single upload at a time (see the memory math sketch after this list)
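To make the constraint concrete, here is the back-of-the-envelope memory math under S3's documented limits (10,000 parts maximum, 5 GB maximum part size, 5 TB maximum object size):

```go
package main

import "fmt"

func main() {
	const (
		maxParts    = 10_000
		objectSize  = int64(5_000_000_000_000) // 5 TB
		concurrency = 5
	)
	// With at most 10,000 parts, a 5 TB object forces parts of at
	// least 500 MB each.
	minPart := objectSize / maxParts
	// If every in-flight part is fully buffered, peak memory is
	// roughly part size times the number of concurrent part uploads.
	fmt.Printf("minimum part size: %d MB\n", minPart/1_000_000)
	fmt.Printf("peak buffer estimate: %d MB\n", minPart*concurrency/1_000_000)
}
```

So even at the smallest legal part size, a modest concurrency of 5 keeps about 2.5 GB of part data in memory if parts are fully buffered.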

jasdel (Contributor) commented Dec 2, 2019

While reviewing the aws-chunked content encoding, I realized it does not use chunked transfer encoding's chunk headers, but its own encoding scheme. This is an optimization the SDK may be able to implement to improve streaming upload performance, which would improve multipart upload throughput for large files.

https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html
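For reference, the framing described in that document looks like the sketch below: each chunk carries its hex-encoded size and a chunk signature on a metadata line, followed by the payload and a CRLF, and the stream ends with a zero-length chunk. Computing the actual SigV4 chaining signature is omitted here, so sig is a placeholder:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// writeAWSChunk frames one chunk per the aws-chunked scheme:
//   <hex size>;chunk-signature=<sig>\r\n<payload>\r\n
func writeAWSChunk(w io.Writer, payload []byte, sig string) error {
	if _, err := fmt.Fprintf(w, "%x;chunk-signature=%s\r\n", len(payload), sig); err != nil {
		return err
	}
	if _, err := w.Write(payload); err != nil {
		return err
	}
	_, err := io.WriteString(w, "\r\n")
	return err
}

func main() {
	_ = writeAWSChunk(os.Stdout, []byte("hello"), "<sig>") // data chunk
	_ = writeAWSChunk(os.Stdout, nil, "<sig>")             // final zero-length chunk
}
```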

@aws locked and limited conversation to collaborators on Apr 1, 2022
@vudh1 converted this issue into discussion #1650 on Apr 1, 2022
