[Docs][C++] "value_counts" kernel doc incorrectly mentions to skip nulls #35817

Closed
jorisvandenbossche opened this issue May 30, 2023 · 1 comment · Fixed by #35818

@jorisvandenbossche

The documentation of the "value_counts" kernel states that nulls in the input are ignored:

const FunctionDoc value_counts_doc(
    "Compute counts of unique elements",
    ("For each distinct value, compute the number of times it occurs in the array.\n"
     "The result is returned as an array of `struct<input type, int64>`.\n"
     "Nulls in the input are ignored."),
    {"array"});

But that's not actually the case. Nulls are also counted and included in the output:

In [1]: import pyarrow.compute as pc

In [2]: pc.value_counts([1, 2, 2, None, None])
Out[2]: 
<pyarrow.lib.StructArray object at 0x7fb88e1c6f20>
-- is_valid: all not null
-- child 0 type: int64
  [
    1,
    2,
    null
  ]
-- child 1 type: int64
  [
    1,
    2,
    2
  ]

@jorisvandenbossche

The same is true for "unique":

In [3]: pc.unique([1, 2, 2, None, None])
Out[3]: 
<pyarrow.lib.Int64Array object at 0x7fb88e1c75e0>
[
  1,
  2,
  null
]

This is only a documentation error: we have various tests that already cover nulls being included in the output.

jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue May 30, 2023
pitrou pushed a commit that referenced this issue May 30, 2023
…#35818)

### Rationale for this change

The documentation of the "unique" and "value_counts" kernels states that nulls in the input are ignored, but this is not the case: null is treated as a distinct value and is counted.

* Closes: #35817

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 13.0.0 milestone May 30, 2023