Skip to content

Validate FIXED_LEN_BYTE_ARRAY length for DECIMAL and INTERVAL types#9985

Open
CynicDog wants to merge 1 commit into
apache:mainfrom
CynicDog:validate-type-length-decimal-interval
Open

Validate FIXED_LEN_BYTE_ARRAY length for DECIMAL and INTERVAL types#9985
CynicDog wants to merge 1 commit into
apache:mainfrom
CynicDog:validate-type-length-decimal-interval

Conversation

@CynicDog
Copy link
Copy Markdown

@CynicDog CynicDog commented May 16, 2026

Which issue does this PR close?

Rationale for this change

from_fixed_len_byte_array in parquet/src/arrow/schema/primitive.rs does not validate type_length. While PrimitiveTypeBuilder::build() enforces these constraints during schema construction (schema/types.rs:477 for INTERVAL, :565-580 for DECIMAL), schemas decoded directly from Thrift bypass that validation path entirely. As a result:

  • DECIMAL with a type_length outside 1..=32 was silently routed through Decimal128 / Decimal256 using invalid parameters.
  • INTERVAL with a type_length != 12 silently returned Interval(DayTime) regardless.

The same function already rejects FLOAT16 when type_length != 2. This PR mirrors that pattern for DECIMAL and INTERVAL, closing the TODO introduced in #1682.

What changes are included in this PR?

  • Added a check_decimal_length helper to reject type_length values outside 1..=32 for both LogicalType::Decimal and ConvertedType::DECIMAL.
  • Added an inline type_length == 12 check for ConvertedType::INTERVAL.

Are these changes tested?

Yes. Added five new tests in parquet/src/arrow/schema/primitive.rs::tests covering:

  • Invalid lengths ({-1, 0, 33} for DECIMAL, {0, 11, 13} for INTERVAL)
  • Valid lengths (16 → Decimal128, 32 → Decimal256, 12 → Interval(DayTime))

To exercise the reader-side check, the tests construct a valid Type::PrimitiveType via the builder and directly modify the type_length on the resulting enum, simulating a malformed schema decoded from Thrift.

Are there any user-facing changes?

No public API changes. The only behavior change is on the reader side: schemas with an out-of-range type_length for DECIMAL or INTERVAL will now return a ParquetError::General instead of silently producing a mismatched Arrow type.

The Rust `PrimitiveTypeBuilder` rejects bad `type_length` at construction,
but schemas decoded directly from Thrift bypass that path. Previously a
malformed DECIMAL length silently produced the wrong Arrow type, and an
INTERVAL of length != 12 was accepted unchecked.

Validate `type_length` in `from_fixed_len_byte_array`, mirroring the
existing FLOAT16 pattern: 1..=32 bytes for DECIMAL (Decimal128 below 16,
Decimal256 up to 32), exactly 12 bytes for INTERVAL.
@github-actions github-actions Bot added the parquet Changes to the parquet crate label May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validate FIXED_LEN_BYTE_ARRAY type_length for DECIMAL and INTERVAL in Parquet → Arrow schema conversion

1 participant