[C++] Implement ScalarAggregateOptions for count_distinct (grouped) #29392
Comments
Neal Richardson / @nealrichardson:
> data.frame(keys = c(0, 0, 1, 1, NA), values = c("a", NA, "b", "c", "d")) %>% group_by(keys) %>% summarize(n_distinct(values))
# A tibble: 3 × 2
   keys `n_distinct(values)`
  <dbl>                <int>
1     0                    2
2     1                    2
3    NA                    1
> data.frame(keys = c(0, 0, 1, 1, NA), values = c("a", NA, "b", "c", "d")) %>% group_by(keys) %>% summarize(n_distinct(values, na.rm = TRUE))
# A tibble: 3 × 2
   keys `n_distinct(values, na.rm = TRUE)`
  <dbl>                              <int>
1     0                                  1
2     1                                  2
3    NA                                  1
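For readers who do not have R to hand, the two behaviours in the output above can be sketched in plain Python. This is only an illustration of the semantics being requested, not Arrow's kernel or its options API. The function name `grouped_count_distinct` and the `skip_nulls` flag are hypothetical names chosen for this sketch, and `None` stands in for NA/null.

```python
from collections import defaultdict


def grouped_count_distinct(keys, values, skip_nulls=False):
    """Count distinct values per key. None stands in for NA/null.

    Hypothetical helper for illustration only; not an Arrow function.
    """
    seen = defaultdict(set)
    for key, value in zip(keys, values):
        group = seen[key]  # registers the group even if its value is skipped
        if value is None and skip_nulls:
            continue  # analogous to n_distinct(..., na.rm = TRUE)
        group.add(value)  # a null counts as one distinct value when kept
    return {key: len(vals) for key, vals in seen.items()}


keys = [0, 0, 1, 1, None]
values = ["a", None, "b", "c", "d"]

print(grouped_count_distinct(keys, values))
# {0: 2, 1: 2, None: 1}
print(grouped_count_distinct(keys, values, skip_nulls=True))
# {0: 1, 1: 2, None: 1}
```

The null key still forms its own group in both calls, as in the R output. Only nulls in the values column are affected by the flag.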
I'm writing the R bindings for the grouped count_distinct kernel, but the current implementation counts nulls as their own group. To match the R behaviour, I need to be able to specify whether or not to remove NA/NULL values. Please could we have ScalarAggregateOptions implemented for count_distinct?
Reporter: Nicola Crane / @thisisnic
Assignee: David Li / @lidavidm
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-13764. Please see the migration documentation for further details.