[C++][Python] Sliced unions can report incorrect null counts in count #50113

@fenfeng9

Description

Describe the bug, including details regarding any error messages, version, and platform.

The count compute function can report incorrect logical null counts for sliced union arrays.

Expected behavior:

For sliced sparse and dense unions, count(..., mode="only_valid") and count(..., mode="only_null") should reflect the logical nullness of the slice.
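
To make "logical nullness of the slice" concrete, here is a minimal pure-Python sketch of the expected semantics. It models the union buffers as plain lists and is not Arrow's implementation. The helper name union_logical_null_count is hypothetical, and it assumes type ids 0..n-1 map directly to child positions, as in the reproduction below.

```python
def union_logical_null_count(type_ids, children, offsets=None, start=0, length=None):
    """Count logical nulls in a union slice modeled with plain Python lists.

    Hypothetical helper for illustration only. Assumes type id k selects
    children[k]. Pass offsets for a dense union, omit it for a sparse union.
    """
    if length is None:
        length = len(type_ids) - start
    nulls = 0
    for i in range(start, start + length):
        child = children[type_ids[i]]
        # Sparse unions index the child by slot position;
        # dense unions index it through the offsets buffer.
        j = i if offsets is None else offsets[i]
        if child[j] is None:
            nulls += 1
    return nulls


# Same data as the reproduction, sliced with start=1, length=4.
sparse_nulls = union_logical_null_count(
    [0, 1, 0, 0, 1],
    [[0.5, 99.0, None, 3.0, 88.0], [False, None, True, False, True]],
    start=1, length=4,
)
dense_nulls = union_logical_null_count(
    [0, 1, 0, 0, 1, 1],
    [[0.5, 1.5, None], [True, None, False]],
    offsets=[0, 0, 1, 2, 1, 2],
    start=1, length=4,
)
print(sparse_nulls, dense_nulls)  # 2 2
```

Under this model, both slices contain 2 nulls and 2 valid values, which matches the expected counts in the reproduction.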

Actual behavior:

For both sliced sparse and dense unions, count reports every element as valid: count(..., mode="only_valid") returns 4 and count(..., mode="only_null") returns 0, where 2 and 2 are expected.

Reproduce

import pyarrow as pa
import pyarrow.compute as pc


def main():
    # sparse_arr == [0.5, None, None, 3.0, True].
    sparse_arr = pa.UnionArray.from_sparse(
        pa.array([0, 1, 0, 0, 1], type=pa.int8()),
        [
            pa.array([0.5, 99.0, None, 3.0, 88.0]),
            pa.array([False, None, True, False, True]),
        ],
    )
    # sparse == [None, None, 3.0, True].
    sparse = sparse_arr.slice(1, 4)

    # dense_arr == [0.5, True, 1.5, None, None, False].
    dense_arr = pa.UnionArray.from_dense(
        pa.array([0, 1, 0, 0, 1, 1], type=pa.int8()),
        pa.array([0, 0, 1, 2, 1, 2], type=pa.int32()),
        [
            pa.array([0.5, 1.5, None]),
            pa.array([True, None, False]),
        ],
    )
    # dense == [True, 1.5, None, None].
    dense = dense_arr.slice(1, 4)

    print(f"pyarrow: {pa.__version__}")
    print()

    # Logical sparse slice: [None, None, 3.0, True].
    print("sparse count only_valid expected: 2")
    print(f"sparse count only_valid actual:   {pc.count(sparse, mode='only_valid').as_py()}")
    print("sparse count only_null expected:  2")
    print(f"sparse count only_null actual:    {pc.count(sparse, mode='only_null').as_py()}")
    print()

    # Logical dense slice: [True, 1.5, None, None].
    print("dense count only_valid expected:  2")
    print(f"dense count only_valid actual:    {pc.count(dense, mode='only_valid').as_py()}")
    print("dense count only_null expected:   2")
    print(f"dense count only_null actual:     {pc.count(dense, mode='only_null').as_py()}")


if __name__ == "__main__":
    main()

Result

pyarrow: 24.0.0

sparse count only_valid expected: 2
sparse count only_valid actual:   4
sparse count only_null expected:  2
sparse count only_null actual:    0

dense count only_valid expected:  2
dense count only_valid actual:    4
dense count only_null expected:   2
dense count only_null actual:     0

Component(s)

C++, Python
