Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Avoid unnecessary projections #32727

Closed
asfimport opened this issue Aug 18, 2022 · 1 comment
Closed

[R] Avoid unnecessary projections #32727

asfimport opened this issue Aug 18, 2022 · 1 comment

Comments

@asfimport
Copy link
Collaborator

In ExecPlan$Build(), we call Project in a few places, and there is code to make sure that there is at least one ProjectNode in the query in order to remove augmented fields from a Dataset scan (unless the user has added them). As a result, it is possible to get multiple ProjectNodes in a row that are essentially no-op. One example is with grouped aggregation: there is a projection to get the order of the columns back to what R expects, and then a no-op projection after that:

> mtcars |> arrow_table() |> count(cyl) |> explain()
ExecPlan with 6 nodes:
5:SinkNode{}
  4:ProjectNode{projection=[cyl, n]}
    3:ProjectNode{projection=[cyl, n]}
      2:GroupByNode{keys=["cyl"], aggregates=[
      	hash_sum(n, {skip_nulls=true, min_count=1}),
      ]}
        1:ProjectNode{projection=["n": 1, cyl]}
          0:TableSourceNode{}

IDK how significant of a performance impact this would have, but it certainly looks wasteful and should be avoidable.

Reporter: Neal Richardson / @nealrichardson
Assignee: Neal Richardson / @nealrichardson

PRs and other links:

Note: This issue was originally created as ARROW-17463. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Neal Richardson / @nealrichardson:
Issue resolved by pull request 13954
#13954

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants