Describe the bug, including details regarding any error messages, version, and platform.
Sliced union arrays can report incorrect logical null counts in pyarrow.compute.count.
Expected behavior:
For sliced sparse and dense unions, count(..., mode="only_valid") and count(..., mode="only_null") should reflect the logical nullness of the slice.
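To make "logical nullness of the slice" concrete, here is a minimal pure-Python model of the expected counts. It is only a sketch, not Arrow's implementation: the helper names are hypothetical, children are plain lists, and type ids are assumed to equal child indices (the default when no type codes are given). The data mirrors the reproduction in this report.

```python
def sparse_union_null_count(type_ids, children, offset, length):
    """Count logical nulls in a slice of a sparse union (illustrative model).

    In a sparse union, every child has the same length as the union, so
    slot i is read from children[type_ids[i]][i].
    """
    return sum(
        children[type_ids[i]][i] is None
        for i in range(offset, offset + length)
    )


def dense_union_null_count(type_ids, value_offsets, children, offset, length):
    """Count logical nulls in a slice of a dense union (illustrative model).

    In a dense union, slot i is read from
    children[type_ids[i]][value_offsets[i]].
    """
    return sum(
        children[type_ids[i]][value_offsets[i]] is None
        for i in range(offset, offset + length)
    )


# Same buffers as the reproduction, sliced with offset=1, length=4.
sparse_nulls = sparse_union_null_count(
    [0, 1, 0, 0, 1],
    [[0.5, 99.0, None, 3.0, 88.0], [False, None, True, False, True]],
    offset=1,
    length=4,
)
dense_nulls = dense_union_null_count(
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 2, 1, 2],
    [[0.5, 1.5, None], [True, None, False]],
    offset=1,
    length=4,
)
print(sparse_nulls, dense_nulls)
```

Under this model both slices contain 2 logical nulls and 2 valid values, which is what count(..., mode="only_null") and count(..., mode="only_valid") are expected to return.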
Actual behavior:
For both sliced sparse and dense unions, count(..., mode="only_valid") returns 4 and count(..., mode="only_null") returns 0 in the reproduction below, as if every element of the slice were valid.
Reproduce
import pyarrow as pa
import pyarrow.compute as pc


def main():
    # sparse_arr == [0.5, None, None, 3.0, True].
    sparse_arr = pa.UnionArray.from_sparse(
        pa.array([0, 1, 0, 0, 1], type=pa.int8()),
        [
            pa.array([0.5, 99.0, None, 3.0, 88.0]),
            pa.array([False, None, True, False, True]),
        ],
    )
    # sparse == [None, None, 3.0, True].
    sparse = sparse_arr.slice(1, 4)

    # dense_arr == [0.5, True, 1.5, None, None, False].
    dense_arr = pa.UnionArray.from_dense(
        pa.array([0, 1, 0, 0, 1, 1], type=pa.int8()),
        pa.array([0, 0, 1, 2, 1, 2], type=pa.int32()),
        [
            pa.array([0.5, 1.5, None]),
            pa.array([True, None, False]),
        ],
    )
    # dense == [True, 1.5, None, None].
    dense = dense_arr.slice(1, 4)

    print(f"pyarrow: {pa.__version__}")
    print()

    # Logical sparse slice: [None, None, 3.0, True].
    print("sparse count only_valid expected: 2")
    print(f"sparse count only_valid actual: {pc.count(sparse, mode='only_valid').as_py()}")
    print("sparse count only_null expected: 2")
    print(f"sparse count only_null actual: {pc.count(sparse, mode='only_null').as_py()}")
    print()

    # Logical dense slice: [True, 1.5, None, None].
    print("dense count only_valid expected: 2")
    print(f"dense count only_valid actual: {pc.count(dense, mode='only_valid').as_py()}")
    print("dense count only_null expected: 2")
    print(f"dense count only_null actual: {pc.count(dense, mode='only_null').as_py()}")


if __name__ == "__main__":
    main()
Result
pyarrow: 24.0.0
sparse count only_valid expected: 2
sparse count only_valid actual: 4
sparse count only_null expected: 2
sparse count only_null actual: 0
dense count only_valid expected: 2
dense count only_valid actual: 4
dense count only_null expected: 2
dense count only_null actual: 0
Component(s)
C++, Python