[R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow Table #32973

Closed
asfimport opened this issue Sep 15, 2022 · 6 comments

Comments

@asfimport

asfimport commented Sep 15, 2022

dplyr::compute() is expected to evaluate an arrow_dplyr_query and return a Table. For a grouped arrow_dplyr_query, however, it returns an arrow_dplyr_query instead of a Table.

mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::compute() |> class()
#> [1] "arrow_dplyr_query"
mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::ungroup() |> dplyr::compute() |> class()
#> [1] "Table"        "ArrowTabular" "ArrowObject"  "R6"

as_arrow_table(), by contrast, does return a Table for a grouped query.

mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> class()
#> [1] "arrow_dplyr_query"
mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::compute() |> class()
#> [1] "arrow_dplyr_query"
mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::collect(FALSE) |> class()
#> [1] "arrow_dplyr_query"
mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> arrow::as_arrow_table() |> class()
#> [1] "Table"        "ArrowTabular" "ArrowObject"  "R6"

The result appears to be converted back to an arrow_dplyr_query by the following lines.

df <- as_adq(df)
df$group_by_vars <- query$group_by_vars
df$drop_empty_groups <- query$drop_empty_groups

 

Reporter: SHIMA Tatsuya / @eitsupi
Assignee: SHIMA Tatsuya / @eitsupi

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-17738. Please see the migration documentation for further details.


Neal Richardson / @nealrichardson:
Could you clarify what is not working?


SHIMA Tatsuya / @eitsupi:
I have updated the description.
Grouped arrow dplyr queries are not converted to tables by dplyr::compute.


SHIMA Tatsuya / @eitsupi:
Ah, is this the intended behavior?
I don't understand why this behavior is intended. I think compute should return a Table here, just as dbplyr and dtplyr do.


Neal Richardson / @nealrichardson:
They are evaluated and converted to Tables, but then if there are groups, group_by is called on the Table, which results in an arrow_dplyr_query object containing the Table. So, yes, this was intentional. Do you have a use case where this is a problem?
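Given that behavior, the two calls from the issue description that do yield a Table can serve as workarounds. The following is only a minimal sketch based on the examples in this thread; the variable names are illustrative, and the class of a grouped compute() result may differ between arrow versions, so it is not asserted here.

```r
# Build a grouped query, as in the issue description.
grouped <- mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl)

# Workaround 1: drop the grouping before materialising the query.
tbl_ungrouped <- grouped |> dplyr::ungroup() |> dplyr::compute()

# Workaround 2: convert explicitly with as_arrow_table().
tbl_converted <- grouped |> arrow::as_arrow_table()

# Both results inherit from "Table", matching the outputs shown in the issue.
print(inherits(tbl_ungrouped, "Table"))
print(inherits(tbl_converted, "Table"))
```

Both workarounds materialise the data as a Table; whether grouping information is kept in the result is not covered by this sketch.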


SHIMA Tatsuya / @eitsupi:
I think it is confusing for users when compute does not return a Table because a grouping remains after summarise or a similar verb has been executed.

mtcars |> arrow::arrow_table() |> dplyr::group_by(vs, am) |> dplyr::summarise(wt = mean(wt)) |> dplyr::compute()
#> Table (query)
#> vs: double
#> am: double
#> wt: double
#>
#> * Grouped by vs
#> See $.data for the source Arrow object


Dewey Dunnington / @paleolimbot:
Issue resolved by pull request 14160
#14160
