In ExecPlan$Build(), we call Project in a few places, and there is code to make sure that the query contains at least one ProjectNode, so that augmented fields from a Dataset scan are removed (unless the user has added them). As a result, it is possible to get multiple consecutive ProjectNodes that are essentially no-ops. One example is grouped aggregation: there is a projection to restore the column order that R expects, and then a no-op projection after that:
I don't know how significant the performance impact would be, but it certainly looks wasteful and should be avoidable.
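To make the idea concrete, here is a minimal toy sketch of what "avoidable" could mean. It is not the arrow R package code or the Acero API: `Node`, `is_noop_project`, and `drop_noop_projects` are hypothetical names, and a projection is modeled as a pure select/reorder by column name (a real ProjectNode can also compute expressions, which this sketch ignores). Under those assumptions, a projection is a no-op when its output columns match the upstream node's output columns in the same order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a plan node; not an Arrow class."""
    kind: str                  # e.g. "scan", "aggregate", "project"
    columns: Tuple[str, ...]   # output column names, in order


def is_noop_project(node: Node, upstream: Node) -> bool:
    # Assumption: a projection only selects/reorders columns by name,
    # so it is a no-op when it emits exactly what it receives.
    return node.kind == "project" and node.columns == upstream.columns


def drop_noop_projects(plan: List[Node]) -> List[Node]:
    """Return the plan with no-op projections removed."""
    result: List[Node] = []
    for node in plan:
        if result and is_noop_project(node, result[-1]):
            continue  # skip: output would be identical to its input
        result.append(node)
    return result


# Grouped aggregation as described above: one projection restores the
# column order, and a second projection changes nothing.
plan = [
    Node("scan", ("g", "x")),
    Node("aggregate", ("sum_x", "g")),
    Node("project", ("g", "sum_x")),  # reorders columns
    Node("project", ("g", "sum_x")),  # no-op
]
simplified = drop_noop_projects(plan)
print([n.kind for n in simplified])
```

In this toy model the reordering projection is kept and only the trailing duplicate is dropped. Whether the real fix should skip adding the extra projection in the first place or prune it afterwards is left open here.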
Reporter: Neal Richardson / @nealrichardson
Assignee: Neal Richardson / @nealrichardson
PRs and other links:
Note: This issue was originally created as ARROW-17463. Please see the migration documentation for further details.