ARROW-17737: [R] Groups before conversion to a Table must not be restored after collect()
#14175
@@ -182,7 +182,7 @@ dim.arrow_dplyr_query <- function(x) {
 # Query on in-memory Table, so evaluate the filter
 # Don't need any columns
 x <- select.arrow_dplyr_query(x, NULL)
-rows <- nrow(compute.arrow_dplyr_query(x))
+rows <- nrow(as_arrow_table(x))
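One way to exercise the changed branch of dim.arrow_dplyr_query() is to call nrow() on a filtered query over an in-memory Table. This is a minimal sketch, assuming the dplyr and arrow packages are installed with an arrow release that includes this change; mtcars and the `mpg > 25` filter are arbitrary illustrative choices. The count from the Arrow query is expected to match the same filter applied to the plain data.frame, rather than collapsing to 0 because of the zero-column intermediate table.

```r
library(dplyr, warn.conflicts = FALSE)
library(arrow, warn.conflicts = FALSE)

# A filter on an in-memory Table sends dim() down the
# "Query on in-memory Table, so evaluate the filter" branch
query <- mtcars |>
  arrow_table() |>
  filter(mpg > 25)

# Row count computed by Arrow vs. the same filter on the data.frame
n_arrow <- nrow(query)
n_df <- nrow(filter(mtcars, mpg > 25))

n_arrow
n_df
```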
This is because manipulating the metadata of a table with no columns causes its size to be reset to 0 x 0.
mtcars |> arrow::arrow_table() |> dplyr::select(NULL) |> arrow::as_arrow_table()
#> Table
#> 32 rows x 0 columns
#>
#>
#> See $metadata for additional Schema metadata
mtcars |> arrow::arrow_table() |> dplyr::select(NULL) |> arrow::as_arrow_table() |> dplyr::ungroup()
#> Table
#> 0 rows x 0 columns
#>
#>
#> See $metadata for additional Schema metadata
Created on 2022-10-07 with reprex v2.0.2
I don't know whether this handling of tables with no columns is a problem.
A table with rows but 0 columns appears to be quite exceptional, since creating a Table from a data frame with no columns results in 0 x 0.
mtcars |> dplyr::select(NULL) |> arrow::arrow_table()
#> Table
#> 0 rows x 0 columns
#>
#>
#> See $metadata for additional Schema metadata
Created on 2022-10-07 with reprex v2.0.2
One suggestion for simplifying this change. Thanks for taking this on; it will be nice to get this into the upcoming release along with your other changes around here.
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
This reverts commit 83eafbe.
…ibutes$.group_vars should not character(0)
because of Table with 0 columns handling
Benchmark runs are scheduled for baseline = d1a8f4b and contender = d008c17. d008c17 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have a high level of regressions.
If a grouped data.frame is converted to an arrow_dplyr_query and then back to a data.frame, the groups from the original data.frame are restored, even if the query has been ungrouped.
This PR ensures that the arrow_dplyr_query's own groups are the ones applied on compute() or collect().
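The behavior described above can be checked with a short reprex-style sketch. It assumes the dplyr and arrow packages are installed, with an arrow release that includes this change; mtcars and the grouping columns `cyl` and `gear` are arbitrary illustrative choices. After the fix, the grouping defined in the query, not the grouping carried over from the original data.frame, is expected to survive collect().

```r
library(dplyr, warn.conflicts = FALSE)
library(arrow, warn.conflicts = FALSE)

# A data.frame grouped before conversion to a Table
grouped <- group_by(mtcars, cyl)

# Ungroup inside the Arrow query: the old groups should not come back
ungrouped_result <- grouped |>
  arrow_table() |>
  ungroup() |>
  collect()

# Regroup inside the Arrow query: the query's groups should win
regrouped_result <- grouped |>
  arrow_table() |>
  group_by(gear) |>
  collect()

group_vars(ungrouped_result)
group_vars(regrouped_result)
```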