Optimize memory consumption during backup to S3 #45188
Conversation
Force-pushed from 1f8e6d0 to 39153b0
I like that approach. Actually this is the best way when we upload a file from local disk to S3: when you copy a file, you know its size. Why do you still use an exponentially growing part upload size instead of some fixed one? When you copy from S3 to S3, why do you prefer seekable reading from S3 instead of UploadPartCopyRequest? My concern is the following: with seekable reading you read a file twice, first to calculate the SHA hash and second to upload it to S3. Indeed, this read is made with a fixed buffer size, so memory consumption is low. However, I think it is worth doing it with the existing code in order not to have one more implementation with UploadPartCopyRequest.
Yes you're right, that code was copied from
I don't quite understand. I don't prefer seekable reading over
Sorry. Indeed, you don't. You have a dedicated implementation for S3-to-S3 with UploadPartCopyRequest. This is great.
I did that. Now an uploaded file is split into parts of the same size.
Force-pushed from 87e4f70 to a934fbf
…S3 to decrease memory consumption.
Correctly use the AbortMultipleUpload request. Support std::ios_base::end in StdStreamBufFromReadBuffer::seekpos().
Force-pushed from a934fbf to 1c84518
{
    task->req = fillUploadPartRequest(part_number, part_offset, part_size);

    schedule([this, task, task_finish_notify]()
Help me, please, to understand a thing here: how many concurrent uploads run here? Is it a limited thread pool?
My concern is that it runs a lot of tasks, each of which makes a new connection. It could run out of available inodes or face other limitations inside CH.
Compare with WriteBufferFromS3: it uses threads to write the data it has consumed, and if consuming is not fast, it just uploads in the background with 1 thread. However, WriteBufferFromS3 is also able to produce a lot of tasks. Here the situation is different: a lot of tasks are started at once, and each allocates at least 1M (buffer) plus a new connection.
It's normally used with IOThreadPool. It's limited, of course: by default it runs up to 100 threads, so at most 100 parallel uploads. But the actual number should be smaller because IOThreadPool is also used for other tasks.
In fact it should use its own thread pools, backups_thread_pool and restores_thread_pool, so I changed the code to use the correct thread pools. Those thread pools are created specifically for backups and restores.
for (size_t part_number = 1; position < end_position; ++part_number)
{
    if (multipart_upload_aborted)
Isn't it better to also check multipart_upload_aborted right before processUploadPartRequest, in order to stop work as fast as possible if a non-retryable error occurs?
My concern is that the code has to wait for all tasks that have been scheduled. If 1000 tasks were scheduled and only 100 of them run at a time, each finishing with a connection timeout, it would take 10 x timeout to finish the upload with an error.
It seems like the queue size is pretty big:
auto queue_size = config.getUInt(".threadpool_writer_queue_size", 1000000);
and all tasks are about to be pushed to the scheduler's queue. If the first task fails, there is no need to run the rest.
Isn't it better to also check multipart_upload_aborted right before processUploadPartRequest?
Done, I changed the code.
Force-pushed from 633d795 to 1a680b0
The test
… WriteBufferFromS3RequestsErrors Ref: ClickHouse#45188
StdStreamFromReadBuffer(std::unique_ptr<ReadBuffer> buf, size_t size) : Base(&stream_buf), stream_buf(std::move(buf), size) { }
StdStreamFromReadBuffer(ReadBuffer & buf, size_t size) : Base(&stream_buf), stream_buf(buf, size) { }
I have some questions about this. Base(&stream_buf) will call the following constructor:
template <class _CharT, class _Traits>
basic_streambuf<_CharT, _Traits>::basic_streambuf()
: __binp_(nullptr),
__ninp_(nullptr),
__einp_(nullptr),
__bout_(nullptr),
__nout_(nullptr),
__eout_(nullptr)
{
}
template <class _CharT, class _Traits>
basic_streambuf<_CharT, _Traits>::basic_streambuf(const basic_streambuf& __sb)
: __loc_(__sb.__loc_),
__binp_(__sb.__binp_),
__ninp_(__sb.__ninp_),
__einp_(__sb.__einp_),
__bout_(__sb.__bout_),
__nout_(__sb.__nout_),
__eout_(__sb.__eout_)
{
}
So __ninp_ and __einp_ are nullptr. But std::iostream will call sgetc() to get a char, so we will always call underflow(), which does not advance the position:
inline _LIBCPP_HIDE_FROM_ABI_AFTER_V1
int_type sgetc() {
if (__ninp_ == __einp_)
return underflow();
return traits_type::to_int_type(*__ninp_);
}
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Optimize memory consumption during backup to S3: files are now copied to S3 directly, without using WriteBufferFromS3 (which could use a lot of memory). This PR fixes #43083