Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Simplify AggregateNodeOptions aggregates/targets #31909

Closed
asfimport opened this issue May 12, 2022 · 4 comments
Closed

[C++] Simplify AggregateNodeOptions aggregates/targets #31909

asfimport opened this issue May 12, 2022 · 4 comments

Comments

@asfimport
Copy link

Currently AggregateNodeOptions is:


class ARROW_EXPORT AggregateNodeOptions : public ExecNodeOptions {
 public:
  // aggregations which will be applied to the targetted fields
  std::vector<internal::Aggregate> aggregates;
  // fields to which aggregations will be applied
  std::vector<FieldRef> targets;
  // output field names for aggregations
  std::vector<std::string> names;
  // keys by which aggregations will be grouped
  std::vector<FieldRef> keys;
};

It is not very obvious how aggregates and targets are related. My initial read of the comments led me to think that each aggregate would be applied to each target and you would end up with len(aggregates) * len(targets) output fields. In reality the aggregate at index i only applies to the target at index i. It would be simpler to add a FieldRef target to internal::Aggregate (and Aggregate should not be internal).

Alternatively, the entire internal::Aggregate could be replaced by a "call" arrow::compute::Expression

Reporter: Weston Pace / @westonpace
Assignee: Vibhatha Lakmal Abeykoon / @vibhatha

PRs and other links:

Note: This issue was originally created as ARROW-16549. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Vibhatha Lakmal Abeykoon / @vibhatha:
cc @westonpace  I think this make sense. Also it is easier for the user. I think it would be okay to have it as a field within the Aggregate and put it in arrow::compute:: namespace as you're suggesting. But call is much straight forward. 

But to make it clear to the user, I would say the struct is fine for now. WDYT?

@asfimport
Copy link
Author

Vibhatha Lakmal Abeykoon / @vibhatha:
How about including names field within the aggregates? cc @westonpace 

I have added the main change which is moving 'targetinside theAggregate` object. 

But the current change breaks the R-bindings. And looking at this specific piece of code, 

node <- node$Aggregate(

I think, may be we can put the names inside the Aggregate? (optional or user-provided).

Appreiciate your thoughts,   @nealrichardson

@asfimport
Copy link
Author

Weston Pace / @westonpace:
Moving names inside of the Aggregate makes a lot of sense to me. I think that's a good idea.

@asfimport
Copy link
Author

Weston Pace / @westonpace:
Issue resolved by pull request 13150
#13150

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant