-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35979: [C++] Refactor Acero scalar and hash aggregation into separate files #35980
Conversation
…sting aggregation code implement window_scalar_aggregation implement window_groupby_aggregation add window_*_aggregation tests
|
@@ -29,7 +29,11 @@ endmacro() | |||
|
|||
set(ARROW_ACERO_SRCS | |||
accumulation_queue.cc | |||
aggregate_node.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregate_node.cc
was split into scalar_aggregate_node.cc
, groupby_aggregate_node.cc
and aggregate_internal.cc
- the existing code was moved as-is
Thanks @ildipo! @westonpace Since this is a fairly large PR I think it might be beneficial if reviewers get an understanding of high level design and get on the same page (maybe a meeting? Or some shared doc?). WDYT? Also @ianmcook IIRC you also mentioned interested in reviewing windowing design/functionality |
@@ -0,0 +1,276 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this was inside aggregate_node.cc
@@ -0,0 +1,447 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this was inside aggregate_node.cc
@@ -0,0 +1,322 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this was inside aggregate_node.cc
230124c
to
4b4723b
Compare
@icexelloss happy to meet in person and go over the high level design. Just from a glance it looks like the aggregate node was split into the scalar and group-by cases with common parts put into aggregate_internal (I'm guessing to help share code between the aggregate cases and the window aggregates). Then two new window nodes were added. It seems that both of these nodes rely on the InputReceived being called in a sorted order in the same way that the asof join node does? |
@westonpace that is correct |
I decided to split this into 2 to simplify review. |
int total_output_batches_ = 0; | ||
}; | ||
|
||
class WindowScalarNode : public ExecNode, public TracedNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this?
cpp/src/arrow/acero/options.h
Outdated
@@ -297,6 +297,26 @@ class ARROW_ACERO_EXPORT ProjectNodeOptions : public ExecNodeOptions { | |||
std::vector<std::string> names; | |||
}; | |||
|
|||
/// windows proceed left to right | |||
class ARROW_ACERO_EXPORT WindowAggregateArgs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert this?
@ildipo I am in favor of splitting up aggregate code to make it more reusable. +1 for the refactoring. It looks like there are still some windowing related changes in this PR though. |
@@ -69,3 +69,4 @@ message SegmentedAggregateRel { | |||
// A list of one or more aggregate expressions along with an optional filter. | |||
repeated substrait.AggregateRel.Measure measures = 3; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: revert this change? (Since we don't touch this file at all probably cleaner if we don't leave a new line change)
@ildipo LGTM. Minor comments on the substrait file (since we don't change it probably better not to leave untouched) Lint issue probably need:
Do you mind also updating the title and description to better reflect the reduced change set? |
LGTM. cc @westonpace in case you want to take a look. (Should be pretty straight forward) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@westonpace I plan to merge this later Today if you don't have objections. |
@westonpace Since I didn't hear from you I went ahead and merged this. (No changes to external headers so we should be fine). But let me know in case you have some more comments we can do it as a follow up. |
Conbench analyzed the 7 benchmark runs on commit There were 4 benchmark results indicating a performance regression:
The full Conbench report has more details. |
Rationale for this change
Some refactoring to simplify relations development and pave the way for implementing window aggregation.
What changes are included in this PR?
Existing Acero aggregation (scalar and group-by) sources have been refactored into separate files with no changes.