New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13574: [C++] Add 'count all' option to count kernels #10895
Conversation
Hmm... It's weird that Can we instead make |
Rebased, fixed conflicts, added CountOptions, added GLib/Python/R bindings for CountOptions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments, looks good in general.
/// Count only non-null values. | ||
NON_NULL = 0, | ||
/// Count only null values. | ||
NULLS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but the plural / singular is a bit inconsistent. Perhaps ONLY_NULL vs ONLY_VALID (or NULLS vs NON_NULLS, or NULL_VALUES vs VALID_VALUES...)?
*out = Datum(state.nulls); | ||
break; | ||
case CountOptions::ALL: | ||
*out = Datum(state.non_nulls + state.nulls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, it's probably not necessary to popcount the null bitmap?
@@ -562,7 +570,7 @@ const FunctionDoc count_doc{"Count the number of null / non-null values", | |||
("By default, only non-null values are counted.\n" | |||
"This can be changed through ScalarAggregateOptions."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"CountOptions"
Pandas calls this 'size'. I opted not to extend ScalarAggregateOptions as the option wouldn't apply to any other kernel.