GH-33987: [R] Support new dplyr .by/by argument #35667

eitsupi · 2023-05-18T12:35:54Z

Rationale for this change

Implement the .by argument for mutate, summarise, filter, and the slice_* family.

What changes are included in this PR?

A .by argument matching dplyr's has been added to mutate, summarise, filter, and the slice_* functions.

Most of the internal functions, such as compute_by, are copied from the existing dplyr backends, dbplyr and dtplyr.
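For illustration, here is a minimal sketch of how the new argument might be used. It assumes an arrow build that includes this PR and dplyr 1.1.0 or later (where .by was introduced). The data frame and its column names are hypothetical and are not taken from the PR's tests.

```r
library(arrow)
library(dplyr)

# Hypothetical example data; column names chosen only for illustration.
df <- data.frame(
  chr = c("a", "a", "b", "b"),
  int = c(1L, 2L, 3L, 4L)
)

# Per-operation grouping with .by: the grouping applies to this verb only.
by_result <- arrow_table(df) %>%
  summarise(total = sum(int), .by = chr) %>%
  collect()

# The equivalent form using persistent grouping via group_by().
group_by_result <- arrow_table(df) %>%
  group_by(chr) %>%
  summarise(total = sum(int)) %>%
  collect()
```

Arrow does not guarantee row order for grouped aggregations, so it is safer to sort both results (for example with arrange(chr)) before comparing them.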

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

Closes: [R] Support new dplyr .by/by argument #33987

github-actions · 2023-05-18T12:36:16Z

Closes: [R] Support new dplyr .by/by argument #33987

github-actions · 2023-05-18T12:36:19Z

⚠️ GitHub issue #33987 has been automatically assigned in GitHub to PR creator.

thisisnic · 2023-05-26T08:11:09Z

@eitsupi - if you give this PR a rebase to fix the CI failures, I'll take a look at this

eitsupi · 2023-05-28T06:34:59Z

> @eitsupi - if you give this PR a rebase to fix the CI failures, I'll take a look at this

Done.

thisisnic

Thanks for making this PR; it's always good to keep up with the changes in the dplyr API, and this is really thorough.

Please could you add or update some of the tests to use more than 1 grouping variable here?

r/R/dplyr-by.R

thisisnic · 2023-05-31T12:06:27Z

r/tests/testthat/test-dplyr-filter.R

+  compare_dplyr_binding(
+    .input %>%
+      filter(int > 2, pnorm(dbl) > .99, .by = chr) %>%
+      collect(),
+    tbl,
+    warning = "Expression pnorm\\(dbl\\) > 0.99 not supported in Arrow; pulling data into R"
+  )


This test is pulling the data into R, so we end up comparing dplyr with dplyr instead of Arrow. Is the purpose of this test to check multiple filters? If so, how about swapping pnorm() out for something else? Or is the intention here different?

eitsupi · 2023-05-31

This is an intentional test to ensure that no grouping occurs during conversion to data.frame.
I added a comment.

thisisnic · 2023-05-31

Great, thanks for clarifying that!
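The intent described in this thread can be sketched roughly as follows. This assumes pnorm() is still unsupported by the Arrow bindings, so that the query on an in-memory Table falls back to R with a warning, as in the test's expected warning. The data are hypothetical and stand in for the `tbl` fixture used in the test suite.

```r
library(arrow)
library(dplyr)

# Hypothetical data; not the `tbl` fixture used in the arrow test suite.
df <- data.frame(
  chr = c("a", "a", "b", "b"),
  int = c(1L, 3L, 4L, 5L),
  dbl = c(0.5, 3, 3, 1)
)

# Assumption: pnorm() has no Arrow binding, so arrow warns and evaluates the
# filter in R. The warning is suppressed here only to keep the output quiet.
fallback_result <- suppressWarnings(
  arrow_table(df) %>%
    filter(int > 2, pnorm(dbl) > .99, .by = chr) %>%
    collect()
)

# Reference result from plain dplyr on the same data.
dplyr_result <- df %>%
  filter(int > 2, pnorm(dbl) > .99, .by = chr)

# .by grouping is per-operation, so the fallback result should not be grouped.
is_grouped_df(fallback_result)
```

If the fallback accidentally turned .by into a persistent group_by(), the collected result would be a grouped data frame and would no longer match what dplyr returns.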

eitsupi · 2023-05-31T14:09:50Z

> Please could you add or update some of the tests to use more than 1 grouping variable here?

I think I did it, but these are pretty much the same test cases. It would be better to parameterize these tests using patrick.
I created an issue for that: #35844.

thisisnic

I'm happy with this, thanks! Will leave it a little longer before merging in case any other R folk want to have a skim over.

eitsupi · 2023-06-06T23:05:41Z

Are there any plans for merging? (I am just worried this has been forgotten)

thisisnic · 2023-06-07T08:20:41Z

Thanks for the reminder @eitsupi!

eitsupi · 2023-06-07T09:39:04Z

Thanks for merging!

ursabot · 2023-06-08T06:36:22Z

Benchmark runs are scheduled for baseline = c62ce6b and contender = a0d28de. a0d28de is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.71% ⬆️0.06%] test-mac-arm
[Finished ⬇️0.33% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.45% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] a0d28dee ec2-t3-xlarge-us-east-2
[Finished] a0d28dee test-mac-arm
[Finished] a0d28dee ursa-i9-9960x
[Finished] a0d28dee ursa-thinkcentre-m75q
[Finished] c62ce6b1 ec2-t3-xlarge-us-east-2
[Finished] c62ce6b1 test-mac-arm
[Finished] c62ce6b1 ursa-i9-9960x
[Finished] c62ce6b1 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

eitsupi requested review from paleolimbot and thisisnic as code owners May 18, 2023 12:35

github-actions bot added Component: R and awaiting review labels May 18, 2023

eitsupi added 2 commits May 28, 2023 06:33

supprt .by

64b78dc

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>

should return original .data

d76a9d3

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>

eitsupi force-pushed the r-dplyr-by branch from e4ee438 to d76a9d3 Compare May 28, 2023 06:33

thisisnic reviewed May 31, 2023

github-actions bot added awaiting changes and removed awaiting review labels May 31, 2023

thisisnic reviewed May 31, 2023

eitsupi added 2 commits May 31, 2023 13:32

improve error message

be64c45

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>

test: add a comment on the intent of the test

4cdd264

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>

github-actions bot added awaiting change review and removed awaiting changes labels May 31, 2023

eitsupi added 2 commits May 31, 2023 13:58

fix tests for error message

4d3e468

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>

add tests for 2 .by and tidyselect .by

587b96f

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>

github-actions bot added awaiting changes and removed awaiting change review labels May 31, 2023

thisisnic approved these changes May 31, 2023

github-actions bot added awaiting merge and removed awaiting changes labels May 31, 2023

eitsupi mentioned this pull request Jun 6, 2023

[R] impliment dplyr::reframe #35929

Open

thisisnic merged commit a0d28de into apache:main Jun 7, 2023
13 checks passed

eitsupi deleted the r-dplyr-by branch June 7, 2023 09:38
