New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory management in copyDataToS3File #58962
Fix memory management in copyDataToS3File #58962
Conversation
- ReadBufferFromS3 is created now only when it's necessary and destroyed after that.
This is an automated comment for commit 19bc0b7 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
@@ -91,20 +91,24 @@ namespace | |||
|
|||
struct UploadPartTask | |||
{ | |||
std::unique_ptr<Aws::AmazonWebServiceRequest> req; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here was a problem. The req
contains a ReadBufferFromS3
, and so keeping req
in each task means keeping all such read buffers until the end of copying all the parts (and there can be thousands of parts). Each ReadBufferFromS3
normally takes 1MB of memory so keeping read buffers from already uploaded parts means we could waste multiple gigabytes of memory.
This PR is fixing that - now req
is created and destroyed while running each task in processUploadTask()
without keeping those requests or read buffers until the end of the copying process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice description, probably add it into the src file as comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/IO/S3/copyS3File.cpp
Outdated
} | ||
|
||
virtual std::unique_ptr<Aws::AmazonWebServiceRequest> fillUploadPartRequest(size_t part_number, size_t part_offset, size_t part_size) = 0; | ||
virtual std::unique_ptr<Aws::AmazonWebServiceRequest> makeUploadPartRequest(size_t part_number, size_t part_offset, size_t part_size) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU previously this method was called in one thread from uploadPart() and with the fix it will be called from processUploadTask(*task) scheduled in a thread pool. Looks like it can be declared const
and have a comment that derived class needs to take care about synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Changelog category:
Changelog entry:
Fix memory management in copyDataToS3File.