Open
Description
Describe the bug, including details regarding any error messages, version, and platform.
BinaryStatistics.isSmallerThan() computes min.length() + max.length() as an int + int sum. When the min and max values of a binary column are each 2^30 bytes (1 GiB) or longer, the sum exceeds Integer.MAX_VALUE and wraps around to a negative number, which then compares as smaller than MAX_STATS_SIZE (4096). As a result, the full multi-GB min/max statistics are serialized into the Thrift footer instead of being dropped. The resulting Parquet file is corrupted: its footer is larger than the 4-byte footer length field can represent, so the file cannot be read back.
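The arithmetic can be reproduced without allocating any large buffers. The following is a minimal standalone sketch, not the actual parquet-java code: the class `StatsSizeOverflowDemo` and both helper methods are hypothetical names, and the lengths are passed in as plain `int` values to stand in for `min.length()` and `max.length()`. The widened variant shows one possible way to avoid the overflow (promoting to `long` before adding); it is an illustration, not necessarily the fix the project will adopt.

```java
public class StatsSizeOverflowDemo {
    // Same threshold value as quoted in the report.
    static final int MAX_STATS_SIZE = 4096;

    // Hypothetical stand-in for the reported check: the sum is computed
    // in int arithmetic and only then compared against the long size.
    static boolean isSmallerThanOverflowing(int minLength, int maxLength, long size) {
        return (minLength + maxLength) < size;
    }

    // Hypothetical alternative: widen to long before adding, so the sum
    // of two non-negative ints can never wrap around.
    static boolean isSmallerThanWidened(int minLength, int maxLength, long size) {
        return ((long) minLength + maxLength) < size;
    }

    public static void main(String[] args) {
        int oneGiB = 1 << 30; // 2^30 bytes

        // 2^30 + 2^30 = 2^31 does not fit in an int and wraps to Integer.MIN_VALUE.
        System.out.println(oneGiB + oneGiB); // prints -2147483648

        // The wrapped sum is negative, so the statistics look "small".
        System.out.println(isSmallerThanOverflowing(oneGiB, oneGiB, MAX_STATS_SIZE)); // prints true

        // With long arithmetic the sum is 2147483648, which is not below 4096.
        System.out.println(isSmallerThanWidened(oneGiB, oneGiB, MAX_STATS_SIZE)); // prints false
    }
}
```

`Math.addExact(int, int)` would be another option if throwing an `ArithmeticException` on overflow is preferable to silently widening.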
Component(s)
No response