GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list #36305

paleolimbot · 2023-06-26T14:43:53Z

Rationale for this change

As reported by @eitsupi, dplyr adds missing grouping variables to the beginning of the variable list; however, we add them to the end of the variable list. Our general policy is to match dplyr's behaviour everywhere.

Before this PR:

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)

tibble::tibble(int = 1:4, chr = letters[1:4]) |> 
  group_by(chr) |> 
  select(int) |> 
  collect()
#> Adding missing grouping variables: `chr`
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>   chr     int
#>   <chr> <int>
#> 1 a         1
#> 2 b         2
#> 3 c         3
#> 4 d         4

arrow_table(int = 1:4, chr = letters[1:4]) |> 
  group_by(chr) |> 
  select(int) |> 
  collect()
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>     int chr  
#>   <int> <chr>
#> 1     1 a    
#> 2     2 b    
#> 3     3 c    
#> 4     4 d

After this PR:

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)

tibble::tibble(int = 1:4, chr = letters[1:4]) |> 
  group_by(chr) |> 
  select(int) |> 
  collect()
#> Adding missing grouping variables: `chr`
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>   chr     int
#>   <chr> <int>
#> 1 a         1
#> 2 b         2
#> 3 c         3
#> 4 d         4

arrow_table(int = 1:4, chr = letters[1:4]) |> 
  group_by(chr) |> 
  select(int) |> 
  collect()
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>   chr     int
#>   <chr> <int>
#> 1 a         1
#> 2 b         2
#> 3 c         3
#> 4 d         4

Are these changes tested?

Yes, a test was added.
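For readers who want to check the behaviour locally, a minimal sketch of such a check follows. It is not the test added in this PR; the test name and the variables `dplyr_names` and `arrow_names` are made up for illustration. It assumes an arrow build that includes this change, with dplyr and testthat installed.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(testthat)

# Reference column order from dplyr on an in-memory tibble.
# suppressMessages() hides "Adding missing grouping variables".
dplyr_names <- suppressMessages(
  tibble::tibble(int = 1:4, chr = letters[1:4]) |>
    group_by(chr) |>
    select(int) |>
    collect() |>
    names()
)

# The same pipeline on an Arrow table.
arrow_names <- suppressMessages(
  arrow_table(int = 1:4, chr = letters[1:4]) |>
    group_by(chr) |>
    select(int) |>
    collect() |>
    names()
)

# Illustrative test name; the grouping column should come first in both.
test_that("missing grouping variables are added first, as in dplyr", {
  expect_identical(arrow_names, dplyr_names)
})
```

On a build without this change, the expectation should fail because the Arrow result lists `int` before `chr`.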

Are there any user-facing changes?

Yes, column ordering will be different. This could be a breaking change because existing code that refers to columns by index may now select a different column; however, referring to a column by name is much more common.
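To illustrate the difference between positional and name-based access, here is a small base R sketch. The data frames `before` and `after` are hypothetical stand-ins for a collected result with the old and new column order; they are not produced by arrow.

```r
# Hypothetical stand-ins for a collected result; not output from arrow itself.
# `before` mimics the old column order, `after` the order that matches dplyr.
before <- data.frame(int = 1:4, chr = letters[1:4])
after <- data.frame(chr = letters[1:4], int = 1:4)

# Positional access picks up a different column once the order changes.
class(before[[1]])
class(after[[1]])

# Name-based access returns the same column either way.
identical(before[["int"]], after[["int"]])
```

Code that selects by name, as in the last line, should be unaffected by the reordering.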

Closes: [R] Column order after group_by(foo) |> select(...) is different from dplyr #35534

github-actions · 2023-06-26T14:44:20Z

⚠️ GitHub issue #35534 has been automatically assigned in GitHub to PR creator.

thisisnic

Great, thanks for fixing this!

conbench-apache-arrow · 2023-06-30T00:45:08Z

Conbench analyzed the 6 benchmark runs on commit 7de273b4.

There was 1 benchmark result indicating a performance regression:

Commit Run on ursa-thinkcentre-m75q at 2023-06-28 18:14:26Z
- params=<STATIC_VECTOR(std::shared_ptr)>, source=cpp-micro, suite=arrow-small-vector-benchmark

The full Conbench report has more details.

paleolimbot added 2 commits June 26, 2023 11:23

swap order

a4a349c

test column order

e8313d9

paleolimbot requested a review from thisisnic as a code owner June 26, 2023 14:43

paleolimbot marked this pull request as draft June 26, 2023 14:43

github-actions bot added the Component: R and awaiting committer review labels Jun 26, 2023

paleolimbot marked this pull request as ready for review June 26, 2023 17:02

thisisnic approved these changes Jun 26, 2023

github-actions bot added the awaiting merge label and removed the awaiting committer review label Jun 26, 2023

paleolimbot merged commit 7de273b into apache:main Jun 27, 2023
13 checks passed

paleolimbot removed the awaiting merge label Jun 27, 2023
