[R] Use Arrow engine for summarize() by default #29259

Closed
asfimport opened this issue Aug 12, 2021 · 1 comment

asfimport commented Aug 12, 2021

ARROW-13344 enabled the dplyr verb summarise() to use the Arrow engine but kept this off by default, controlled by the arrow.debug option.

Before this can be turned on by default, we should ensure that the following are all implemented:

  • a sufficient set of hash aggregate kernels, with R aggregate function mappings to them, covering the vast majority of aggregate functions that dplyr users call in summarise() (add any additional required ones to ARROW-13339)
  • support for a sufficient set of data types in aggregates
  • support for a sufficient set of data types in grouping columns
  • handling of NA and NaN values in aggregates and the na.rm option consistent with base R and dplyr (ARROW-13497 and possibly other issues)
  • handling of NA and NaN values in grouping columns consistent with dplyr
  • handling of empty or invalid input to summarise() (ARROW-13543)
  • many new tests to confirm equivalent results from a variety of group_by() %>% summarise() queries on data frames and on Arrow data
  • resolution of various related bugs
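
As a rough illustration of the kind of equivalence test described in the list above, the sketch below runs the same group_by() %>% summarise() query on a data frame and on an Arrow Table and compares the results. The helper name `query` and the sample data are made up for this example, and it assumes an arrow release in which summarise() runs on Arrow data without setting any option. It is not taken from the arrow test suite.

```r
library(dplyr)
library(arrow)

# Sample data (hypothetical): an NA grouping key, plus NA and NaN values
df <- data.frame(
  g = c("a", "a", "b", NA),
  x = c(1, NA, NaN, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same query for either a data frame or Arrow data.
# collect() is a no-op on a data frame and pulls Arrow results into R.
# arrange() fixes the row order, since Arrow does not promise group order.
query <- function(data) {
  data %>%
    group_by(g) %>%
    summarise(total = sum(x, na.rm = TRUE)) %>%
    collect() %>%
    arrange(g)
}

expected <- query(df)               # dplyr on a data frame (reference result)
actual <- query(Table$create(df))   # same query on an Arrow Table

# Prints TRUE if both engines agree, otherwise a description of the differences
print(all.equal(as.data.frame(actual), as.data.frame(expected)))
```

On the data frame side, dplyr keeps NA as its own group, and na.rm = TRUE drops both NA and NaN before summing, so the reference result has three groups. Those are the behaviors the Arrow engine would need to reproduce.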

Reporter: Ian Cook / @ianmcook
Assignee: Ian Cook / @ianmcook

Related issues:

Note: This issue was originally created as ARROW-13618. Please see the migration documentation for further details.

Neal Richardson / @nealrichardson:
All linked tasks have been completed 🎉
