Skip to content

Conversation

@rtpsw
Copy link
Contributor

@rtpsw rtpsw commented Apr 4, 2023

@rtpsw rtpsw requested a review from westonpace as a code owner April 4, 2023 13:49
@github-actions
Copy link

github-actions bot commented Apr 4, 2023

@github-actions
Copy link

github-actions bot commented Apr 4, 2023

⚠️ GitHub issue #34786 has been automatically assigned in GitHub to PR creator.

@rtpsw
Copy link
Contributor Author

rtpsw commented Apr 4, 2023

cc @westonpace @icexelloss

@github-actions github-actions bot added the awaiting review Awaiting review label Apr 4, 2023
}

Result<const HashAggregateKernel*> GetKernel(ExecContext* ctx, const Aggregate& aggregate,
void DefaultAggregateOptions(Aggregate* aggregate_ptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add this?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Apr 4, 2023
using GetKernel = std::function<Result<const Kernel*>(ExecContext*, Aggregate*,
const std::vector<TypeHolder>&)>;

Result<const Kernel*> GetScalarAggregateKernel(ExecContext* ctx, Aggregate* aggregate_ptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for?

std::vector<const Kernel*> kernels(in_types.size());
for (size_t i = 0; i < aggregates.size(); ++i) {
ARROW_ASSIGN_OR_RAISE(kernels[i], GetKernel(ctx, aggregates[i], in_types[i]));
ARROW_ASSIGN_OR_RAISE(kernels[i], get_kernel(ctx, &aggregates[i], in_types[i]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

auto ctx = plan_->query_context()->exec_context();
KernelContext kernel_ctx{ctx};
kernel_ctx.SetState(state->agg_states[i].get());
kernel_ctx.SetState(state->agg_states[i][0].get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Status ResetKernelStates() {
auto ctx = plan()->query_context()->exec_context();
ARROW_RETURN_NOT_OK(InitKernels(agg_kernels_, ctx, aggs_, agg_src_types_));
ARROW_RETURN_NOT_OK(InitKernels(InitHashAggregateKernel, agg_kernels_, ctx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why passing do we need to pass /*num_states_per_kernel=*/1?

@icexelloss
Copy link
Contributor

@rtpsw Not sure I follow what you are doing - seems like a lot of refactor is done. Can you explain your approach?

using compute::KernelState;
using compute::RowSegmenter;

struct ARROW_ACERO_EXPORT AggregateNodeArgs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

@rtpsw
Copy link
Contributor Author

rtpsw commented Apr 4, 2023

@rtpsw Not sure I follow what you are doing - seems like a lot of refactor is done. Can you explain your approach?

In this PR, the main goal is a single method MakeOutputSchema providing the output schema for an aggregation. The problem is that the original code has two classes , ScalarAggregateNode and GroupByNode, for aggregation that do not share much code between them for the purpose of constructing the output schema. To prepare the stage, I started with refactoring the original code to make them share code for this purpose.For this, I needed to encapsulate certain differences between them:

  • Get kernel: This is encapsulated by GetKernel. The two implementations are GetScalarAggregateKernel and GetHashAggregateKernel. The latter has the function dispatch on the types extended with the group-id.
  • Init kernel: This is encapsulated by InitKernel. The two implementations are InitScalarAggregateKernel and InitHashAggregateKernel. The latter has the kernel-args configured using the types extended with the group-id.
  • Resolve kernels: This is encapsulated by ResolveKernels. The two implementations are ResolveScalarAggregateKernels and ResolveHashAggregateKernels. The latter resolves each kernel using the types extended with the group-id.

Additional parts of the refactoring are:

  • Adding MakeAggregateNodeArgs as a common method for setting up the arguments needed for constructing an aggregation node, whether it is a ScalarAggregateNode or a GroupByNode.
  • Cleaning up ScalarAggregateNode::Make and GroupByNode::Make to use the above consistently.
  • Adding MakeOutputSchema that uses MakeAggregateNodeArgs to return the output schema that the aggregation node is constructed with.

@rtpsw
Copy link
Contributor Author

rtpsw commented Apr 10, 2023

Replaced by #34904

@rtpsw rtpsw closed this Apr 10, 2023
westonpace added a commit that referenced this pull request Apr 12, 2023
…r AggregateRel (#34904)

See #34786
* Closes: #34786

Can replace #34885

Lead-authored-by: Yaron Gvili <rtpsw@hotmail.com>
Co-authored-by: rtpsw <rtpsw@hotmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…mer for AggregateRel (apache#34904)

See apache#34786
* Closes: apache#34786

Can replace apache#34885

Lead-authored-by: Yaron Gvili <rtpsw@hotmail.com>
Co-authored-by: rtpsw <rtpsw@hotmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…mer for AggregateRel (apache#34904)

See apache#34786
* Closes: apache#34786

Can replace apache#34885

Lead-authored-by: Yaron Gvili <rtpsw@hotmail.com>
Co-authored-by: rtpsw <rtpsw@hotmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Output schema calculated by Substrait consumer for aggregate rel seems incorrect.

2 participants