GH-33976: [Python] Remove usage of TableGroupBy helper in favor of pyarrow.acero.Declaration #34769
@westonpace I don't see the C++
I'm kind of +0 but yes, let's go ahead and remove it. Having a single interface to Acero is probably easier to maintain long-term.
To be clear, I am fine with also keeping it if it would be useful on the C++ side (although also there, it's only a few lines of code through the declaration interface).
The python code looks good. I reviewed some of the group by code again. I thought it was doing a bit more to patch up the output from the exec plan. However, I believe I am remembering an older state of the code. So yes, let's go ahead and remove the C++ side too if you don't mind.
On the C++ side, those are publicly exposed in Acero I assume (use of
Another question:
If you don't mind, keeping the standalone groupby_test would be helpful. I want to encourage splitting plan_test up into tests-per-node (see order_by_node_test and fetch_node_test which are similar in size to groupby_test). Maybe you could rename it to aggregate_node_test?
This looks great. Thanks a lot!
Benchmark runs are scheduled for baseline = c1359c5 and contender = 0bb2d83. 0bb2d83 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have high level of regressions.
Rationale for this change
Now that we have the pyarrow.acero building blocks (GH-33976), we can easily construct the Declaration that arrow::compute::TableGroupBy creates under the hood ourselves in pyarrow.
Are these changes tested?
Existing tests are passing.
Are there any user-facing changes?
No