[refine](Operator) When _stop_emplace_flag is not set to true, perform batch processing on the block. #33173
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
TPC-H: Total hot run time: 38418 ms
TPC-DS: Total hot run time: 182907 ms
ClickBench: Total hot run time: 29.67 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
TPC-H: Total hot run time: 38654 ms
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 38869 ms
TPC-DS: Total hot run time: 182934 ms
run buildall
clang-tidy made some suggestions
    return Status::OK();
}

bool DistinctStreamingAggOperatorX::need_more_input_data(RuntimeState* state) const {
    auto& local_state = get_local_state(state);
    return local_state._aggregated_block->empty() && !local_state._child_eos &&
           (_limit == -1 || local_state._output_distinct_rows < _limit);
    const bool need_batch = local_state._stop_emplace_flag
warning: method 'need_more_input_data' can be made static [readability-convert-member-functions-to-static]
    const bool need_batch = local_state._stop_emplace_flag
bool DistinctStreamingAggOperatorX::need_more_input_data(RuntimeState* state) {
be/src/pipeline/exec/distinct_streaming_aggregation_operator.h:105:
- bool need_more_input_data(RuntimeState* state) const override;
+ static bool need_more_input_data(RuntimeState* state) override;
TeamCity be ut coverage result:
run buildall
clang-tidy made some suggestions
    return Status::OK();
}

Status DistinctStreamingAggLocalState::open(RuntimeState* state) {
warning: method 'open' can be made static [readability-convert-member-functions-to-static]
Status DistinctStreamingAggLocalState::open(RuntimeState* state) {
static Status DistinctStreamingAggLocalState::open(RuntimeState* state) {
run buildall
TeamCity be ut coverage result:
run buildall
TeamCity be ut coverage result:
LGTM
PR approved by at least one committer and no changes requested.
run buildall
TeamCity be ut coverage result:
run buildall
TPC-H: Total hot run time: 39195 ms
TeamCity be ut coverage result:
TPC-DS: Total hot run time: 186973 ms
ClickBench: Total hot run time: 31.27 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
…m batch processing on the block. (#33173)
Proposed changes
Setting _stop_emplace_flag is a heuristic decision.
Previously, PipelineX did no batch processing (push one block, then pull one block), so the heuristic was easily misled by the first few small blocks.
This change therefore batches the input and sets the first two values of STREAMING_HT_MIN_REDUCTION to 0.
_stop_emplace_flag is set to true only once enough data has been observed.
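
To make the heuristic concrete, here is a minimal, self-contained sketch of how such a policy could look. It is not the Doris implementation: ToyLocalState, need_more_input, ReductionEntry, kMinReduction, should_stop_emplace and all threshold values are hypothetical names and numbers chosen for illustration, and only _stop_emplace_flag and STREAMING_HT_MIN_REDUCTION come from the PR itself.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the operator's local state. The fields loosely
// mirror names quoted in this PR but are not the real Doris classes.
struct ToyLocalState {
    std::size_t buffered_rows = 0;   // rows currently held in the output block
    bool stop_emplace_flag = false;  // true once the hash table is bypassed
    bool child_eos = false;          // upstream has no more data
};

// Assumed policy: while the flag is unset, keep pulling input until a full
// batch is buffered; once it is set, fall back to one block in, one block out.
inline bool need_more_input(const ToyLocalState& s, std::size_t batch_size) {
    const bool need_batch = s.stop_emplace_flag ? s.buffered_rows == 0
                                                : s.buffered_rows < batch_size;
    return need_batch && !s.child_eos;
}

// Hypothetical reduction table: {minimum hash table bytes, required reduction}.
struct ReductionEntry {
    std::int64_t min_ht_mem;
    double min_reduction;
};

// Illustrative values only. The first two requirements are 0, so a small
// hash table can never trigger the "stop emplacing" decision.
constexpr ReductionEntry kMinReduction[] = {
    {0, 0.0},
    {256 * 1024, 0.0},
    {2 * 1024 * 1024, 2.0},
};

// Returns true when the observed reduction (input rows per distinct row) is
// below the requirement for the current hash table size.
inline bool should_stop_emplace(std::int64_t ht_mem, std::int64_t input_rows,
                                std::int64_t distinct_rows) {
    double required = 0.0;
    for (const auto& entry : kMinReduction) {
        if (ht_mem >= entry.min_ht_mem) {
            required = entry.min_reduction;
        }
    }
    if (distinct_rows == 0) {
        return false;  // nothing observed yet, so there is nothing to judge
    }
    const double observed = static_cast<double>(input_rows) / distinct_rows;
    return observed < required;
}
```

In this sketch a 100 KiB hash table never stops emplacing regardless of the ratio, while a 4 MiB table that sees 1000 input rows and 900 distinct rows does, because the observed reduction of about 1.1 falls below the assumed requirement of 2.0.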