GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel #34885

rtpsw · 2023-04-04T13:49:30Z

See #34786

Closes: [C++] Output schema calculated by Substrait consumer for aggregate rel seems incorrect. #34786


github-actions · 2023-04-04T13:49:54Z

Closes: [C++] Output schema calculated by Substrait consumer for aggregate rel seems incorrect. #34786

github-actions · 2023-04-04T13:49:57Z

⚠️ GitHub issue #34786 has been automatically assigned in GitHub to PR creator.

rtpsw · 2023-04-04T13:50:08Z

cc @westonpace @icexelloss

icexelloss · 2023-04-04T14:36:58Z

cpp/src/arrow/acero/aggregate_node.cc

 }

-Result<const HashAggregateKernel*> GetKernel(ExecContext* ctx, const Aggregate& aggregate,
+void DefaultAggregateOptions(Aggregate* aggregate_ptr,


Why do we need to add this?

icexelloss · 2023-04-04T14:38:07Z

cpp/src/arrow/acero/aggregate_node.cc

+using GetKernel = std::function<Result<const Kernel*>(ExecContext*, Aggregate*,
+                                                      const std::vector<TypeHolder>&)>;
+
+Result<const Kernel*> GetScalarAggregateKernel(ExecContext* ctx, Aggregate* aggregate_ptr,


What is this used for?

icexelloss · 2023-04-04T14:39:01Z

cpp/src/arrow/acero/aggregate_node.cc

+  std::vector<const Kernel*> kernels(in_types.size());
  for (size_t i = 0; i < aggregates.size(); ++i) {
-    ARROW_ASSIGN_OR_RAISE(kernels[i], GetKernel(ctx, aggregates[i], in_types[i]));
+    ARROW_ASSIGN_OR_RAISE(kernels[i], get_kernel(ctx, &aggregates[i], in_types[i]));


Why this change?

icexelloss · 2023-04-04T14:40:03Z

cpp/src/arrow/acero/aggregate_node.cc

      auto ctx = plan_->query_context()->exec_context();
      KernelContext kernel_ctx{ctx};
-      kernel_ctx.SetState(state->agg_states[i].get());
+      kernel_ctx.SetState(state->agg_states[i][0].get());


Why this change?

icexelloss · 2023-04-04T14:40:51Z

cpp/src/arrow/acero/aggregate_node.cc

  Status ResetKernelStates() {
    auto ctx = plan()->query_context()->exec_context();
-    ARROW_RETURN_NOT_OK(InitKernels(agg_kernels_, ctx, aggs_, agg_src_types_));
+    ARROW_RETURN_NOT_OK(InitKernels(InitHashAggregateKernel, agg_kernels_, ctx,


Why do we need to pass /*num_states_per_kernel=*/1?

icexelloss · 2023-04-04T14:43:09Z

@rtpsw Not sure I follow what you are doing. It seems like a lot of refactoring was done. Can you explain your approach?

icexelloss · 2023-04-04T14:47:50Z

cpp/src/arrow/acero/aggregate_node.h

+using compute::KernelState;
+using compute::RowSegmenter;
+
+struct ARROW_ACERO_EXPORT AggregateNodeArgs {


Why do we need this?

rtpsw · 2023-04-04T19:07:47Z

@rtpsw Not sure I follow what you are doing. It seems like a lot of refactoring was done. Can you explain your approach?

In this PR, the main goal is a single method, MakeOutputSchema, that provides the output schema for an aggregation. The problem is that the original code has two aggregation classes, ScalarAggregateNode and GroupByNode, that share little code for constructing the output schema. To set the stage, I started by refactoring the original code so that they share code for this purpose. For this, I needed to encapsulate certain differences between them:

Get kernel: This is encapsulated by GetKernel. The two implementations are GetScalarAggregateKernel and GetHashAggregateKernel. The latter dispatches the function on the types extended with the group-id.
Init kernel: This is encapsulated by InitKernel. The two implementations are InitScalarAggregateKernel and InitHashAggregateKernel. The latter configures the kernel-args using the types extended with the group-id.
Resolve kernels: This is encapsulated by ResolveKernels. The two implementations are ResolveScalarAggregateKernels and ResolveHashAggregateKernels. The latter resolves each kernel using the types extended with the group-id.
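
To illustrate the shape of this encapsulation, here is a minimal standalone sketch. It does not use Arrow; TypeName, Aggregate, Kernel, and GetKernelFn are toy stand-ins invented for the example, and the "uint32" group-id type is an assumption of the sketch. The only point is that the shared loop is written once and the scalar/hash difference is passed in as a callable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not Arrow types.
using TypeName = std::string;
struct Aggregate {
  std::string function;
};
struct Kernel {
  std::string function;
  std::vector<TypeName> signature;  // the types the kernel was dispatched on
};

// The part that differs between the two node kinds is passed as a callable.
using GetKernelFn =
    std::function<Kernel(const Aggregate&, const std::vector<TypeName>&)>;

// Scalar aggregation: dispatch on the argument types as given.
Kernel GetScalarAggregateKernel(const Aggregate& aggregate,
                                const std::vector<TypeName>& in_types) {
  return Kernel{aggregate.function, in_types};
}

// Hash (group-by) aggregation: dispatch on the argument types extended with
// the group-id type ("uint32" is assumed here for the sketch).
Kernel GetHashAggregateKernel(const Aggregate& aggregate,
                              const std::vector<TypeName>& in_types) {
  std::vector<TypeName> extended = in_types;
  extended.push_back("uint32");
  return Kernel{aggregate.function, extended};
}

// Shared code path: the same loop serves both node kinds.
std::vector<Kernel> ResolveKernels(
    const GetKernelFn& get_kernel, const std::vector<Aggregate>& aggregates,
    const std::vector<std::vector<TypeName>>& in_types) {
  assert(aggregates.size() == in_types.size());
  std::vector<Kernel> kernels;
  kernels.reserve(aggregates.size());
  for (std::size_t i = 0; i < aggregates.size(); ++i) {
    kernels.push_back(get_kernel(aggregates[i], in_types[i]));
  }
  return kernels;
}
```

Calling ResolveKernels(GetScalarAggregateKernel, ...) or ResolveKernels(GetHashAggregateKernel, ...) then selects the behavior without duplicating the loop.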

Additional parts of the refactoring are:

Adding MakeAggregateNodeArgs as a common method for setting up the arguments needed for constructing an aggregation node, whether it is a ScalarAggregateNode or a GroupByNode.
Cleaning up ScalarAggregateNode::Make and GroupByNode::Make to use the above consistently.
Adding MakeOutputSchema that uses MakeAggregateNodeArgs to return the output schema that the aggregation node is constructed with.
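
As a rough standalone sketch of this arrangement (again not Arrow code): Field, Schema, AggregateSpec, OutputType, and FindField are hypothetical names made up for the example, and the field order (keys first, then aggregates) and the "count yields int64" rule are assumptions of the sketch, not statements about the real node. The idea shown is only that the schema is computed in one place and MakeOutputSchema reuses it.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not Arrow types.
struct Field {
  std::string name;
  std::string type;
};
using Schema = std::vector<Field>;

struct AggregateSpec {
  std::string function;  // e.g. "sum"
  std::string target;    // input field the aggregate reads
  std::string name;      // output field name
};

// Everything a node constructor would need, computed in one place.
struct AggregateNodeArgs {
  Schema output_schema;
  bool grouped = false;  // true when there are group-by keys
};

// Hypothetical output-type rule, just enough for the sketch.
std::string OutputType(const std::string& function, const std::string& in_type) {
  return function == "count" ? "int64" : in_type;
}

const Field& FindField(const Schema& schema, const std::string& name) {
  for (const Field& field : schema) {
    if (field.name == name) return field;
  }
  throw std::invalid_argument("no such field: " + name);
}

// Common setup used for both the scalar and the grouped case.
AggregateNodeArgs MakeAggregateNodeArgs(const Schema& input,
                                        const std::vector<std::string>& keys,
                                        const std::vector<AggregateSpec>& aggs) {
  AggregateNodeArgs args;
  args.grouped = !keys.empty();
  // Assumed order for this sketch: key fields, then one field per aggregate.
  for (const std::string& key : keys) {
    args.output_schema.push_back(FindField(input, key));
  }
  for (const AggregateSpec& agg : aggs) {
    const Field& target = FindField(input, agg.target);
    args.output_schema.push_back(
        Field{agg.name, OutputType(agg.function, target.type)});
  }
  return args;
}

// The schema a caller sees is exactly the one the node would be built with.
Schema MakeOutputSchema(const Schema& input, const std::vector<std::string>& keys,
                        const std::vector<AggregateSpec>& aggs) {
  return MakeAggregateNodeArgs(input, keys, aggs).output_schema;
}
```

Because both paths go through MakeAggregateNodeArgs, a consumer that asks for the schema cannot drift from what the node itself would produce.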

rtpsw · 2023-04-10T13:13:56Z

Replaced by #34904

…r AggregateRel (#34904) See #34786 * Closes: #34786 Can replace #34885 Lead-authored-by: Yaron Gvili <rtpsw@hotmail.com> Co-authored-by: rtpsw <rtpsw@hotmail.com> Co-authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>


apacheGH-34786: [C++] Fix output schema calculated by Substrait consu…

1e7925d

…mer for AggregateRel

rtpsw requested a review from westonpace as a code owner April 4, 2023 13:49

github-actions bot added the Component: C++ label Apr 4, 2023

github-actions bot added the awaiting review Awaiting review label Apr 4, 2023

icexelloss reviewed Apr 4, 2023


github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Apr 4, 2023

icexelloss reviewed Apr 4, 2023


clean up aggergate node API

85349f1

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 4, 2023

rtpsw mentioned this pull request Apr 5, 2023

GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel #34904

Merged

rtpsw closed this Apr 10, 2023
