Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-17463: [R] Avoid unnecessary projections #13954

Merged
merged 1 commit into from
Aug 27, 2022

Conversation

nealrichardson
Copy link
Member

Before:

> mtcars |> arrow_table() |> count(cyl) |> explain()
ExecPlan with 6 nodes:
5:SinkNode{}
  4:ProjectNode{projection=[cyl, n]}
    3:ProjectNode{projection=[cyl, n]}
      2:GroupByNode{keys=["cyl"], aggregates=[
      	hash_sum(n, {skip_nulls=true, min_count=1}),
      ]}
        1:ProjectNode{projection=["n": 1, cyl]}
          0:TableSourceNode{}

After:

ExecPlan with 5 nodes:
4:SinkNode{}
  3:ProjectNode{projection=[cyl, n]}
    2:GroupByNode{keys=["cyl"], aggregates=[
    	hash_sum(n, {skip_nulls=true, min_count=1}),
    ]}
      1:ProjectNode{projection=["n": 1, cyl]}
        0:TableSourceNode{}

@github-actions
Copy link

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of needs_projection() seems sound to me and the coverage of cases you noticed that added an extra projection seems excellent. Our existing coverage for datasets is extensive and I'm not worried this will break anything that isn't covered by CI somehow!

@nealrichardson
Copy link
Member Author

CI failures are unrelated (one fixed by #13952, the other is the rtools35 build). Merging.

@nealrichardson nealrichardson merged commit 80bba29 into apache:master Aug 27, 2022
@ursabot
Copy link

ursabot commented Aug 28, 2022

Benchmark runs are scheduled for baseline = 1b9c57e and contender = 80bba29. 80bba29 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Failed ⬇️6.58% ⬆️0.27%] ursa-i9-9960x
[Finished ⬇️0.25% ⬆️0.07%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 80bba299 ec2-t3-xlarge-us-east-2
[Failed] 80bba299 test-mac-arm
[Failed] 80bba299 ursa-i9-9960x
[Finished] 80bba299 ursa-thinkcentre-m75q
[Finished] 1b9c57e2 ec2-t3-xlarge-us-east-2
[Failed] 1b9c57e2 test-mac-arm
[Failed] 1b9c57e2 ursa-i9-9960x
[Finished] 1b9c57e2 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented Aug 28, 2022

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

anjakefala pushed a commit to anjakefala/arrow that referenced this pull request Aug 31, 2022
Before:

```
> mtcars |> arrow_table() |> count(cyl) |> explain()
ExecPlan with 6 nodes:
5:SinkNode{}
  4:ProjectNode{projection=[cyl, n]}
    3:ProjectNode{projection=[cyl, n]}
      2:GroupByNode{keys=["cyl"], aggregates=[
      	hash_sum(n, {skip_nulls=true, min_count=1}),
      ]}
        1:ProjectNode{projection=["n": 1, cyl]}
          0:TableSourceNode{}
```

After:

```
ExecPlan with 5 nodes:
4:SinkNode{}
  3:ProjectNode{projection=[cyl, n]}
    2:GroupByNode{keys=["cyl"], aggregates=[
    	hash_sum(n, {skip_nulls=true, min_count=1}),
    ]}
      1:ProjectNode{projection=["n": 1, cyl]}
        0:TableSourceNode{}
```

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
zagto pushed a commit to zagto/arrow that referenced this pull request Oct 7, 2022
Before:

```
> mtcars |> arrow_table() |> count(cyl) |> explain()
ExecPlan with 6 nodes:
5:SinkNode{}
  4:ProjectNode{projection=[cyl, n]}
    3:ProjectNode{projection=[cyl, n]}
      2:GroupByNode{keys=["cyl"], aggregates=[
      	hash_sum(n, {skip_nulls=true, min_count=1}),
      ]}
        1:ProjectNode{projection=["n": 1, cyl]}
          0:TableSourceNode{}
```

After:

```
ExecPlan with 5 nodes:
4:SinkNode{}
  3:ProjectNode{projection=[cyl, n]}
    2:GroupByNode{keys=["cyl"], aggregates=[
    	hash_sum(n, {skip_nulls=true, min_count=1}),
    ]}
      1:ProjectNode{projection=["n": 1, cyl]}
        0:TableSourceNode{}
```

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Oct 17, 2022
Before:

```
> mtcars |> arrow_table() |> count(cyl) |> explain()
ExecPlan with 6 nodes:
5:SinkNode{}
  4:ProjectNode{projection=[cyl, n]}
    3:ProjectNode{projection=[cyl, n]}
      2:GroupByNode{keys=["cyl"], aggregates=[
      	hash_sum(n, {skip_nulls=true, min_count=1}),
      ]}
        1:ProjectNode{projection=["n": 1, cyl]}
          0:TableSourceNode{}
```

After:

```
ExecPlan with 5 nodes:
4:SinkNode{}
  3:ProjectNode{projection=[cyl, n]}
    2:GroupByNode{keys=["cyl"], aggregates=[
    	hash_sum(n, {skip_nulls=true, min_count=1}),
    ]}
      1:ProjectNode{projection=["n": 1, cyl]}
        0:TableSourceNode{}
```

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants