Is there a way to configure S3 upload requests concurrency? #48389
I'm trying to export data to S3.

If the file is bigger than `s3_max_single_part_upload_size`, it is sent as a multipart upload. Say the file is split into 500 parts of 8 MB each. ClickHouse starts sending all the parts in parallel; the only control over this is the requests-per-second limit (`s3_max_put_rps`). If a request is not answered within 3 seconds, it is sent again. This leads to a situation where S3 is still processing the previous request for a part but receives a new request for the same part, up to 11 times per part. This happens with a slow network or high load on the S3 side. In the worst case for this example, 500 parts × up to 11 attempts each means 5,500 requests may be sent instead of 500, all of them potentially in parallel, and each consumes resources on the receiving side.

Is there a way to prevent this? Either:

Query:
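The query below is a hypothetical reconstruction for illustration only: the endpoint, credentials, and table name are placeholders, not values from the original report.

```sql
-- Hypothetical example of the kind of export in question.
-- Bucket URL, credentials, and source table are placeholders.
INSERT INTO FUNCTION s3(
    'https://my-bucket.s3.amazonaws.com/export/data.native',
    'AWS_KEY_ID', 'AWS_SECRET', 'Native'
)
SELECT *
FROM my_table
SETTINGS
    s3_max_single_part_upload_size = 33554432, -- 32 MiB: larger files switch to multipart upload
    s3_min_upload_part_size = 8388608;         -- 8 MiB parts, matching the 500 x 8 MB example
```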
Comments

Unfortunately, at the moment there is no way to properly limit the number of in-flight S3 requests, but you can try a number of other things:
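One concrete thing to try along these lines is to cap how fast part uploads are started with the `s3_max_put_rps` throttle already mentioned in the report. A minimal sketch, reusing the placeholder bucket and credentials from above; the numbers are illustrative, and `s3_max_put_burst` is assumed to be available as the companion burst setting:

```sql
-- Illustrative workaround sketch: throttle the rate at which
-- part-upload PUT requests are issued, so fewer uploads pile up
-- at once. This limits request *starts*; it is not a true
-- in-flight request limit.
INSERT INTO FUNCTION s3(
    'https://my-bucket.s3.amazonaws.com/export/data.native',
    'AWS_KEY_ID', 'AWS_SECRET', 'Native'
)
SELECT *
FROM my_table
SETTINGS
    s3_max_put_rps = 20,   -- start at most ~20 PUT requests per second
    s3_max_put_burst = 20; -- assumed companion burst allowance
```

Lowering the PUT rate stretches the upload out in time, which reduces how many parts can be awaiting a response at any moment, but because it only throttles request starts it cannot stop a retry from overlapping a still-running request for the same part.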
A proper solution should appear once IO scheduling is implemented. That is a fairly long task; you can track the implementation in #47009 (and possibly some follow-up PRs).
@serxa Thank you for your willingness to help and for providing several solutions. Unfortunately, none of these could completely resolve my issue:

As a temporary solution, I've limited