[R] Column order after `group_by(foo) |> select(...)` is different from dplyr #35534

eitsupi · 2023-05-10T16:00:47Z

Describe the bug, including details regarding any error messages, version, and platform.

Related to #35473

I corrected the column order after mutate and transmute in #35473, but it seems that it needs to be corrected in select as well.

The columns used for groups should to be moved to the left.

mtcars |> dplyr::group_by(cyl) |> dplyr::select(mpg)
#> Adding missing grouping variables: `cyl`
#> # A tibble: 32 × 2
#> # Groups:   cyl [3]
#>      cyl   mpg
#>    <dbl> <dbl>
#>  1     6  21
#>  2     6  21
#>  3     4  22.8
#>  4     6  21.4
#>  5     8  18.7
#>  6     6  18.1
#>  7     8  14.3
#>  8     4  24.4
#>  9     4  22.8
#> 10     6  19.2
#> # … with 22 more rows
mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::select(mpg) |> dplyr::collect()
#> # A tibble: 32 × 2
#> # Groups:   cyl [3]
#>      mpg   cyl
#>    <dbl> <dbl>
#>  1  21       6
#>  2  21       6
#>  3  22.8     4
#>  4  21.4     6
#>  5  18.7     8
#>  6  18.1     6
#>  7  14.3     8
#>  8  24.4     4
#>  9  22.8     4
#> 10  19.2     6
#> # … with 22 more rows

^{Created on 2023-05-10 with reprex v2.0.2}

Component(s)

R

The text was updated successfully, but these errors were encountered:

nealrichardson · 2023-05-12T12:57:13Z

Thanks. This would be in ensure_group_vars(), and it looks like we might also want to mimic the Adding missing grouping variables message to be explicit when this happens.

…nning of the variable list (#36305) ### Rationale for this change As reported by @ eitsupi, dplyr adds missing grouping variables to the beginning of the variable list; however, we add them to the *end* of the variable list. Our general policy is to match dplyr's behaviour everywhere. Before this PR: ``` r library(arrow, warn.conflicts = FALSE) #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information. library(dplyr, warn.conflicts = FALSE) tibble::tibble(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> Adding missing grouping variables: `chr` #> # A tibble: 4 × 2 #> # Groups: chr [4] #> chr int #> <chr> <int> #> 1 a 1 #> 2 b 2 #> 3 c 3 #> 4 d 4 arrow_table(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> # A tibble: 4 × 2 #> # Groups: chr [4] #> int chr #> <int> <chr> #> 1 1 a #> 2 2 b #> 3 3 c #> 4 4 d ``` After this PR: ``` r library(arrow, warn.conflicts = FALSE) #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information. library(dplyr, warn.conflicts = FALSE) tibble::tibble(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> Adding missing grouping variables: `chr` #> # A tibble: 4 × 2 #> # Groups: chr [4] #> chr int #> <chr> <int> #> 1 a 1 #> 2 b 2 #> 3 c 3 #> 4 d 4 arrow_table(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> # A tibble: 4 × 2 #> # Groups: chr [4] #> chr int #> <chr> <int> #> 1 a 1 #> 2 b 2 #> 3 c 3 #> 4 d 4 ``` ### Are these changes tested? Yes, a test was added. ### Are there any user-facing changes? Yes, column ordering will be different. This could be a breaking change because existing code that refers to columns by index may change; however, referring to a column by name is much more common. * Closes: #35534 Authored-by: Dewey Dunnington <dewey@voltrondata.com> Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>

eitsupi added the Type: bug label May 10, 2023

github-actions bot added the Component: R label May 10, 2023

thisisnic added Priority: Critical Priority: Medium and removed Priority: Critical labels May 30, 2023

github-actions bot mentioned this issue Jun 26, 2023

GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list #36305

Merged

github-actions bot assigned paleolimbot Jun 26, 2023

paleolimbot closed this as completed in #36305 Jun 27, 2023

paleolimbot added this to the 13.0.0 milestone Jun 27, 2023

raulcd added the Breaking Change Includes a breaking change to the API label Sep 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[R] Column order after `group_by(foo) |> select(...)` is different from dplyr #35534

[R] Column order after `group_by(foo) |> select(...)` is different from dplyr #35534

eitsupi commented May 10, 2023

nealrichardson commented May 12, 2023

[R] Column order after group_by(foo) |> select(...) is different from dplyr #35534

[R] Column order after group_by(foo) |> select(...) is different from dplyr #35534

Comments

eitsupi commented May 10, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

nealrichardson commented May 12, 2023

[R] Column order after `group_by(foo) |> select(...)` is different from dplyr #35534

[R] Column order after `group_by(foo) |> select(...)` is different from dplyr #35534