[Python] Table.group_by(...).aggregate([("flag", "any")]) returns incorrect True on a sliced Boolean array with nulls #50043

@Fly-a-Kite

Description

Describe the bug, including details regarding any error messages, version, and platform.

Environment

pyarrow: 24.0.0
Python: 3.12.3
Platform: Ubuntu 24.04

24.0.0 was the latest PyPI release when reproduced on 2026-05-26.

Reproducer

import pyarrow as pa
import pyarrow.compute as pc

base = pa.table({"g": [99, 10, 10], "flag": [True, False, None]})
sliced = base.slice(1)
rebuilt = pa.table(sliced.to_pydict())

def grouped_any(table):
    return (
        table.group_by("g", use_threads=False)
        .aggregate([("flag", "any")])
        .column("flag_any")
        .to_pylist()
    )

print("offset:", sliced["flag"].chunk(0).offset)
print("scalar any:", pc.any(pa.array([False, None])).as_py())
print("sliced:", grouped_any(sliced))
print("rebuilt:", grouped_any(rebuilt))

Actual Output

offset: 1
scalar any: False
sliced: [True]
rebuilt: [False]

Expected Output

sliced: [False]
rebuilt: [False]

After slice(1), the only remaining rows belong to group g=10 and have flag
values [False, None]. Nulls are skipped by default, so the grouped any result
should be False, in agreement with scalar compute.any and the zero-offset
rebuilt table.
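
To make the expected value explicit without depending on pyarrow, here is a minimal pure-Python sketch of a group-wise any that skips nulls. The helper name `reference_grouped_any` is hypothetical and only illustrates the semantics this report assumes (nulls ignored, a group with no non-null values yields None); it is not pyarrow's implementation.

```python
def reference_grouped_any(keys, flags):
    """Group-wise `any` that ignores None values (assumed skip-nulls semantics)."""
    non_null = {}
    for key, flag in zip(keys, flags):
        # Register the group even when its value is null.
        values = non_null.setdefault(key, [])
        if flag is not None:
            values.append(flag)
    # A group with no non-null values has no defined result, so report None.
    return {key: (any(values) if values else None)
            for key, values in non_null.items()}


# Same data as the reproducer, with the first row dropped as slice(1) does.
g = [99, 10, 10][1:]
flag = [True, False, None][1:]
print(reference_grouped_any(g, flag))  # prints {10: False}
```

Under these assumptions the only group, g=10, evaluates to False, which matches the rebuilt table and the scalar result but not the sliced table.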

Component(s)

Python
