New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multithreading after window functions #50771
Multithreading after window functions #50771
Conversation
…s to allow parallel stream processing
…ms-window-function
This is an automated comment for commit ac07b34 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
…ms-window-function' into feat-preserve-num-streams-window-function
also would be good to add perf tests |
01568_window_functions_distributed - I suppose we need similar logic to what we have for aggregation, when remote nodes pass buckets to initiator in order through a single stream. |
…between WindowTransforms (and WindowTransform works on single stream anyway).
@frinkr a few tests left to fix. take a look pls |
…s to allow parallel stream processing
…between WindowTransforms (and WindowTransform works on single stream anyway).
…m:frinkr/ClickHouse into feat-preserve-num-streams-window-function
@nickitat I didn't see how the performance failures (zeros_mt, hasAll, etc.) related to the PR. |
@@ -60,6 +62,8 @@ WindowStep::WindowStep( | |||
|
|||
void WindowStep::transformPipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) | |||
{ | |||
auto num_streams = pipeline.getNumThreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe getNumStreams()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getNumStreams()
return 1 if there is a ORDER BY
in the window,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, we don't preserve the number of stream, but rather always fan out after the WindowStep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I think we should name the setting accordingly
pls create a perf test with the queries you've provided in the description |
@nickitat a new perf test 'window_functions_downstream_multithreading' was added, but the CI reports run errors. How do I know the error details? I run the perf with command |
LGTM. @nickitat How is it going on? |
apart from the setting name lgtm |
How about |
I'm ok with |
…y_plan_enable_multithreading_after_window_functions
@nickitat the setting was renamed. |
Ok, let's fix clang-tidy:
and it will be ready for merging. |
@alexey-milovidov the clang-tidy was fixed. |
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20231028) * fix build due to ClickHouse/ClickHouse#50771 --------- Co-authored-by: kyligence-git <gluten@kyligence.io> Co-authored-by: Chang Chen <baibaichen@gmail.com>
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add option
query_plan_preserve_num_streams_after_window_functions
to preserve the number of streams after evaluating window functions to allow parallel stream processing.Detail
The
WindowTransform
needs to merge streams into one, but the new option will split the result fromWindowTransform
to multiple streams, which enables the parallel processing of down-streams functions/aggregates.Benchmark
Test data
Test queries
Test Env
Test Result