ARROW-15609: [C++][Compute] Support hash_aggregate with only keys #12369
for (auto& state : states) {
  ARROW_ASSIGN_OR_RAISE(state, InitKernels(kernels, ctx, aggregates, argument_descrs));
}
if (!arguments.empty()) {
Initialize the argument-related variables only if arguments exist.
@@ -2586,7 +2592,8 @@ Result<Datum> GroupBy(const std::vector<Datum>& arguments, const std::vector<Dat

   // start "streaming" execution
   ExecBatch key_batch, argument_batch;
-  while (argument_batch_iterator->Next(&argument_batch) &&
+  while ((argument_batch_iterator == NULLPTR ||
+          argument_batch_iterator->Next(&argument_batch)) &&
argument_batch_iterator == NULLPTR means there are no arguments, so reading and checking argument_batch can be skipped.
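The short-circuit behavior described above can be illustrated with a minimal standalone sketch. The `BatchIterator` class and `CountIterations` function are hypothetical stand-ins written for this example, not Arrow types; the sketch only assumes an iterator with a `Next` method that returns false when exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal batch iterator (not an Arrow class): yields the stored
// values one at a time and returns false once they are exhausted.
class BatchIterator {
 public:
  explicit BatchIterator(std::vector<int> batches) : batches_(std::move(batches)) {}

  bool Next(int* out) {
    assert(out != nullptr);
    if (pos_ >= batches_.size()) return false;
    *out = batches_[pos_++];
    return true;
  }

 private:
  std::vector<int> batches_;
  std::size_t pos_ = 0;
};

// Counts loop iterations when the argument iterator is optional.
// If argument_it is null, the left operand of || is true, so Next() is never
// called on it and only the key iterator drives the loop.
int CountIterations(BatchIterator* argument_it, BatchIterator* key_it) {
  int argument_batch = 0;
  int key_batch = 0;
  int iterations = 0;
  while ((argument_it == nullptr || argument_it->Next(&argument_batch)) &&
         key_it->Next(&key_batch)) {
    ++iterations;
  }
  return iterations;
}
```

With a null argument iterator the loop runs once per key batch; with both iterators present it stops as soon as either one is exhausted.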
So basically, this is equivalent to selecting all unique tuples of the input keys? I suppose we have similar vector kernels already, but being vector kernels they're not usable in the query engine, and this has the added benefit of being incrementally computed.
And it looks like this already works with ExecPlan, so we only needed to change the helper GroupBy function.
Yes, the result is the same.
That's right, and then such a SQL query can be supported.
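As a rough illustration of why a keys-only group-by amounts to selecting the unique key tuples incrementally, here is a small standalone sketch. `KeyBatch` and `DistinctKeys` are hypothetical names invented for this example and use only the standard library; this is not how Arrow's grouper is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical batch holding two key columns of equal length (not an Arrow type).
struct KeyBatch {
  std::vector<int32_t> key0;
  std::vector<int32_t> key1;
};

// Hypothetical keys-only grouper: assigns a dense group id to each distinct
// (key0, key1) tuple in first-seen order. No aggregate state is kept.
class DistinctKeys {
 public:
  using Key = std::pair<int32_t, int32_t>;

  // Consume one batch. State persists across calls, so the set of unique
  // tuples is built up incrementally, batch by batch.
  void Consume(const KeyBatch& batch) {
    assert(batch.key0.size() == batch.key1.size());
    for (std::size_t i = 0; i < batch.key0.size(); ++i) {
      Key key{batch.key0[i], batch.key1[i]};
      auto inserted =
          group_ids_.emplace(key, static_cast<uint32_t>(uniques_.size()));
      if (inserted.second) uniques_.push_back(key);  // first time this tuple is seen
    }
  }

  // Unique key tuples in first-seen order; index i is the tuple for group id i.
  const std::vector<Key>& uniques() const { return uniques_; }

 private:
  std::map<Key, uint32_t> group_ids_;
  std::vector<Key> uniques_;
};
```

Feeding the batches `{1, 1, 2} / {10, 10, 20}` and then `{2, 3} / {20, 30}` leaves three unique tuples: (1, 10), (2, 20), and (3, 30).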
👍 thank you!
Benchmark runs are scheduled for baseline = 3a8e409 and contender = e4e866f. e4e866f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Currently, hash_aggregate requires keys, arguments, and aggregation functions. Sometimes, however, we want to execute group_by without any aggregation functions.