GH-33987: [R] Support new dplyr .by/by argument #35667

Merged (6 commits) on Jun 7, 2023

@eitsupi (Contributor) commented May 18, 2023

Rationale for this change

Implement the .by argument for mutate, summarise, filter and slice_* family.

What changes are included in this PR?

The .by argument that matches dplyr has been added to some functions.

Most of the internal functions, such as compute_by, are copied from the existing dplyr backends, dbplyr and dtplyr.
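
For readers unfamiliar with per-operation grouping, the following is a minimal sketch of the dplyr behaviour this PR mirrors. It uses plain dplyr (assumed to be version 1.1.0 or later) on a made-up tibble; the column names `chr` and `int` are illustrative only, and the snippet does not exercise the Arrow bindings themselves.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where .by / by is available

# Hypothetical example data, not taken from the PR's test suite
df <- tibble(
  chr = c("a", "a", "b", "b", "b"),
  int = c(1L, 2L, 3L, 4L, 5L)
)

# Per-operation grouping: groups apply to this verb only,
# and the result is returned ungrouped.
by_result <- df %>%
  summarise(total = sum(int), .by = chr)

# The persistent-grouping spelling of the same computation.
grouped_result <- df %>%
  group_by(chr) %>%
  summarise(total = sum(int)) %>%
  ungroup()

# The slice_* family spells the argument `by`, without the leading dot.
top_rows <- df %>%
  slice_max(int, n = 1, by = chr)
```

Both spellings compute the same per-group totals here; the `.by` form simply avoids leaving a grouped data frame behind.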

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

@github-actions

⚠️ GitHub issue #33987 has been automatically assigned in GitHub to PR creator.

@thisisnic (Member)

@eitsupi - if you give this PR a rebase to fix the CI failures, I'll take a look at this

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
@eitsupi (Contributor Author) commented May 28, 2023

> @eitsupi - if you give this PR a rebase to fix the CI failures, I'll take a look at this

Done.

@thisisnic (Member) left a comment

Thanks for making this PR; it's always good to keep up with the changes in the dplyr API, and this is really thorough.

Please could you add or update some of the tests to use more than 1 grouping variable here?
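
As a rough illustration of what "more than 1 grouping variable" looks like with `.by`, here is a small sketch in plain dplyr (assumed 1.1.0 or later). The data and the column names `chr`, `lgl`, and `int` are hypothetical and are not the fixtures used in the PR's tests.

```r
library(dplyr)  # assumes dplyr >= 1.1.0

# Hypothetical example data
df <- tibble(
  chr = c("a", "a", "b", "b"),
  lgl = c(TRUE, FALSE, TRUE, TRUE),
  int = 1:4
)

# Multiple grouping columns are passed as a tidy-select expression.
# Each distinct (chr, lgl) pair forms one group; row order is preserved.
multi <- df %>%
  mutate(group_total = sum(int), .by = c(chr, lgl))
```

Here the two rows sharing `chr == "b"` and `lgl == TRUE` fall into the same group, while the two `"a"` rows are split by `lgl`.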

r/R/dplyr-by.R (outdated, resolved)
@github-actions bot added the "awaiting changes" label and removed the "awaiting review" label on May 31, 2023
Comment on lines +444 to +450
  compare_dplyr_binding(
    .input %>%
      filter(int > 2, pnorm(dbl) > .99, .by = chr) %>%
      collect(),
    tbl,
    warning = "Expression pnorm\\(dbl\\) > 0.99 not supported in Arrow; pulling data into R"
  )
Member

This test is pulling the data into R, so we end up comparing dplyr with dplyr instead of Arrow. Is the purpose of this test to test with multiple filters? If so, how about swapping pnorm() out for something else? Or is the intention here different?

Contributor Author

This is an intentional test to ensure that no grouping occurs during conversion to data.frame.
Added a comment.
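
The property being checked can be sketched in plain dplyr (assumed 1.1.0 or later): a `filter()` call with `.by` returns an ungrouped data frame. The data below is hypothetical, and this snippet only illustrates the expected end state; it does not reproduce the Arrow fallback path that the test above exercises.

```r
library(dplyr)  # assumes dplyr >= 1.1.0

# Hypothetical example data with the same column names as the test
df <- tibble(
  chr = c("a", "a", "b", "b"),
  int = c(1L, 3L, 4L, 5L),
  dbl = c(0.5, 3, 2.5, 4)
)

out <- df %>%
  filter(int > 2, pnorm(dbl) > .99, .by = chr)

# No grouping variables should remain on the result.
remaining_groups <- group_vars(out)
```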

Member

Great, thanks for clarifying that!

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on May 31, 2023
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
@eitsupi (Contributor Author) commented May 31, 2023

> Please could you add or update some of the tests to use more than 1 grouping variable here?

I think I did it, but these are pretty much the same test cases. I think it is better to parameterize these tests using patrick.
I created an issue for that #35844.

@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label on May 31, 2023
@thisisnic (Member) left a comment

I'm happy with this, thanks! Will leave it a little longer before merging in case any other R folk want to have a skim over.

@github-actions bot added the "awaiting merge" label and removed the "awaiting changes" label on May 31, 2023
@eitsupi (Contributor Author) commented Jun 6, 2023

Are there any plans for merging? (I am just worried this has been forgotten)

@thisisnic thisisnic merged commit a0d28de into apache:main Jun 7, 2023
13 checks passed
@thisisnic (Member)

Thanks for the reminder @eitsupi!

@eitsupi eitsupi deleted the r-dplyr-by branch June 7, 2023 09:38
@eitsupi (Contributor Author) commented Jun 7, 2023

Thanks for merging!

@ursabot commented Jun 8, 2023

Benchmark runs are scheduled for baseline = c62ce6b and contender = a0d28de. a0d28de is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.71% ⬆️0.06%] test-mac-arm
[Finished ⬇️0.33% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.45% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] a0d28dee ec2-t3-xlarge-us-east-2
[Finished] a0d28dee test-mac-arm
[Finished] a0d28dee ursa-i9-9960x
[Finished] a0d28dee ursa-thinkcentre-m75q
[Finished] c62ce6b1 ec2-t3-xlarge-us-east-2
[Finished] c62ce6b1 test-mac-arm
[Finished] c62ce6b1 ursa-i9-9960x
[Finished] c62ce6b1 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Successfully merging this pull request may close these issues.

[R] Support new dplyr .by/by argument
3 participants