-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More parallel execution for queries with FINAL
#36396
Conversation
d299ff3
to
ba4e7b0
Compare
ba4e7b0
to
3b3e3f5
Compare
FINAL
FINAL
FINAL
FINAL
FINAL
6b91b79
to
b3ea5ff
Compare
@Mergifyio update |
✅ Branch has been successfully updated |
extern const int LOGICAL_ERROR; | ||
} | ||
|
||
struct IIndexAccess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, it's good to write a comment to
- every class (both interface and implementation)
- all virtual methods
- and, actually, every method which purpose is not obvious from the first glance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this iface has single implementation.
Do we actually need it? I think that, generally, you should not create an interfece if you don't plan to have > 1 implementations...
|
||
struct IIndexAccess | ||
{ | ||
struct Value : std::vector<Field> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inheritance from std containers is suspicious. Maybe, composition?
return std::accumulate( | ||
parts.begin(), parts.end(), static_cast<size_t>(0), [](size_t sum, const auto & part) { return sum + part.getRowsCount(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, but I would prefer for
loop :)
Hm, can we use it in order to lower amount of memory usage during GROUP BY column_from_order_by. If each thread will have it's own range of column_from_order_by values, it does mean, that threads will not have all possible values in their own hash tables. For example |
// NULL_LAST | ||
if (value[i].isNull()) | ||
value[i] = POSITIVE_INFINITY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear - why this is needed?
enum class Type | ||
{ | ||
Border, | ||
RangeBeginning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can also add RangeEnd
event to remove some ranges from event_queue
We have been running this code for 2 weeks, and from my perspective, it is very promising and delivering huge performance gain 🎉 . Here are some numbers (unfortunately i did not get any number before patch to compare, but i can tell you it was an order of magnitude slower).
And it seems to scale really well. Mostly the two pain point i have are not directly related to this pr :
|
thank you for the feedback!
|
And it can make sense to integrate with parallel replicas processing: #26748 |
if you have a server with high core count, but many load. It makes sense to run with max_final_threads=16 to process many requests in parallel. But if there is some imbalance in data distribution, one server might have twice more data, and this could benefit of having max_final_threads=32 only there. So having a setting saying 'max_ranges_size' instead would achieve that, with max_final_threads as an upper bound ?
That would be great 💯 my question specifically though is do you plan to make it stack ? For cases like :
a good usecase to get first/last entry in time log data
In those case it makes a lot of sense to have GROUP BY operating on the same ranges of final ? though that is not totally trivial actually, because the ranges are computed for full primary key, not the prefix so some aggregation are not local, without using the prefix only to compute ranges. |
final won't run faster if you set
yep, AIO should be able to use splitting regardless of presence of final in query |
@rschu1ze Robert, thank you for the review. I will make changes in a separate pr, to let this one to lend into the release. |
@nickitat, you forgot to take a look at failed tests before merging |
…ckHouse#36396)"" This reverts commit 5bfb152.
…ckHouse#36396)"" This reverts commit 5bfb152.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Now we split data parts into layers and distribute them among threads instead of whole parts to make the execution of queries with
FINAL
more data-parallel.an example of query pipeline
perf tests