Fix BinaryStats int overflow #3449

@Jiayi-Wang-db

Description

Describe the bug, including details regarding any error messages, version, and platform.

BinaryStatistics.isSmallerThan() computes min.length() + max.length() as an int + int addition. When a binary column contains values of 2^30 bytes (1 GiB) or more, so that the combined length of min and max exceeds Integer.MAX_VALUE (2^31 - 1), the sum overflows to a negative number, which is then incorrectly evaluated as smaller than MAX_STATS_SIZE (4096). As a result, the full multi-GB min/max statistics are serialized into the Thrift footer instead of being dropped. The resulting Parquet file is corrupted: its footer is larger than the 4-byte footer length field can represent, so the file cannot be read back.
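
The overflow can be reproduced without allocating any large buffers. The following is a minimal sketch, not the actual parquet-java code: the class name `BinaryStatsOverflowDemo`, both helper methods, and their signatures are hypothetical stand-ins, and only the `int + int` comparison pattern and the 4096 threshold come from the description above. Widening one operand to `long` before the addition is one possible fix; the change that actually lands may differ.

```java
public class BinaryStatsOverflowDemo {
    // Stand-in for the threshold named in the issue (assumed value: 4096).
    static final long MAX_STATS_SIZE = 4096;

    // Mirrors the reported pattern: the sum is computed in 32-bit int
    // arithmetic and can wrap around to a negative value before the comparison.
    static boolean isSmallerThanBuggy(int minLength, int maxLength, long size) {
        return (minLength + maxLength) < size;
    }

    // Hypothetical fix: widen to long first so the sum is computed in 64 bits.
    static boolean isSmallerThanFixed(int minLength, int maxLength, long size) {
        return ((long) minLength + maxLength) < size;
    }

    public static void main(String[] args) {
        int oneGiB = 1 << 30; // 2^30 bytes, the value length from the report

        // int addition wraps: 2^30 + 2^30 = 2^31 does not fit in an int.
        System.out.println("int sum:  " + (oneGiB + oneGiB));
        System.out.println("long sum: " + ((long) oneGiB + oneGiB));

        // The buggy check claims 2 GiB of statistics is "smaller than" 4096.
        System.out.println("buggy: " + isSmallerThanBuggy(oneGiB, oneGiB, MAX_STATS_SIZE));
        System.out.println("fixed: " + isSmallerThanFixed(oneGiB, oneGiB, MAX_STATS_SIZE));
    }
}
```

With two 1 GiB lengths, the `int` sum wraps to `Integer.MIN_VALUE`, so the buggy check returns `true` while the widened check returns `false`. `Math.addExact(int, int)` would be an alternative if throwing an `ArithmeticException` on overflow is preferred over silently widening.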

Component(s)

No response