Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13574: [C++] Add 'count all' option to count kernels #10895

Closed
wants to merge 2 commits into from

Conversation

lidavidm
Copy link
Member

@lidavidm lidavidm commented Aug 6, 2021

Pandas calls this 'size'. I opted not to extend ScalarAggregateOptions as the option wouldn't apply to any other kernel.

@github-actions
Copy link

github-actions bot commented Aug 6, 2021

@pitrou
Copy link
Member

pitrou commented Aug 9, 2021

Hmm... It's weird that Count takes ScalarAggregateOptions, yet 1) completely ignores min_count 2) has an unexpected interpretation of skip_nulls.

Can we instead make Count take a CountOptions that would have a enum describing whether one wants to 1) count non-nulls 2) count nulls 3) count everything?

@lidavidm
Copy link
Member Author

Rebased, fixed conflicts, added CountOptions, added GLib/Python/R bindings for CountOptions.

@lidavidm lidavidm changed the title ARROW-13574: [C++] Add count_all and hash_count_all kernels ARROW-13574: [C++] Add 'count all' option to count kernels Aug 10, 2021
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, looks good in general.

/// Count only non-null values.
NON_NULL = 0,
/// Count only null values.
NULLS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but the plural / singular is a bit inconsistent. Perhaps ONLY_NULL vs ONLY_VALID (or NULLS vs NON_NULLS, or NULL_VALUES vs VALID_VALUES...)?

*out = Datum(state.nulls);
break;
case CountOptions::ALL:
*out = Datum(state.non_nulls + state.nulls);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it's probably not necessary to popcount the null bitmap?

@@ -562,7 +570,7 @@ const FunctionDoc count_doc{"Count the number of null / non-null values",
("By default, only non-null values are counted.\n"
"This can be changed through ScalarAggregateOptions."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"CountOptions"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants