fix(parquet): avoid panic on ColumnIndex length mismatch #9833

Open
pchintar wants to merge 1 commit into apache:main from pchintar:columnindex-length-validation

@pchintar
Contributor

Which issue does this PR close?

Rationale for this change

In parquet/src/file/page_index/column_index.rs, ColumnIndex decoding assumes that the page-aligned arrays (null_pages, min_values, max_values, and the optional null_counts and histogram arrays) have matching lengths, but never validates this.

As a result, malformed metadata can trigger an out-of-bounds panic during decoding instead of returning a ParquetError. Since Parquet files are external, untrusted input, decoding should fail with an error rather than panic.

What changes are included in this PR?

  • Added validation in:

    • PrimitiveColumnIndex::try_new
    • ByteArrayColumnIndex::try_new
  • Ensures:

    • min_values.len() == null_pages.len()
    • max_values.len() == null_pages.len()
    • optional arrays (null_counts, histograms) are consistent with page count
  • Returns ParquetError on mismatch instead of panicking
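The checks listed above could look roughly like the following. This is a minimal, self-contained sketch and not the code in this PR: `IndexError` is a hypothetical stand-in for `ParquetError`, and `check_len` and `validate_page_aligned` are illustrative names that do not exist in the parquet crate. The histogram checks are left out because their exact length rule is not spelled out here.

```rust
use std::fmt;

/// Hypothetical error type standing in for `ParquetError` (assumption).
#[derive(Debug, PartialEq)]
pub struct IndexError(pub String);

impl fmt::Display for IndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for IndexError {}

/// Hypothetical helper: compares one array length against the page count.
fn check_len(name: &str, actual: usize, expected: usize) -> Result<(), IndexError> {
    if actual != expected {
        return Err(IndexError(format!(
            "ColumnIndex {name} has length {actual}, expected {expected}"
        )));
    }
    Ok(())
}

/// Validates that every page-aligned array has one entry per page.
/// The page count is taken from `null_pages`; on success it is returned.
pub fn validate_page_aligned(
    null_pages: &[bool],
    min_values: &[Vec<u8>],
    max_values: &[Vec<u8>],
    null_counts: Option<&[i64]>,
) -> Result<usize, IndexError> {
    let num_pages = null_pages.len();
    check_len("min_values", min_values.len(), num_pages)?;
    check_len("max_values", max_values.len(), num_pages)?;
    // Optional arrays are only checked when the writer provided them.
    if let Some(counts) = null_counts {
        check_len("null_counts", counts.len(), num_pages)?;
    }
    Ok(num_pages)
}
```

Running the checks once, before any per-page indexing, means the decoding loop that follows can index the arrays without further bounds concerns.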

Are these changes tested?

Yes.

Added a unit test:

  • test_column_index_rejects_mismatched_min_max_lengths

This constructs a ColumnIndex with mismatched lengths and verifies that decoding returns an error instead of panicking.
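For illustration, a test of this shape might be structured as below. This is a sketch under stated assumptions, not the test added in this PR: `SimpleColumnIndex` is a hypothetical, simplified stand-in with `i32` statistics and a `String` error, and it does not use the parquet crate's real types or constructors.

```rust
/// Hypothetical, simplified stand-in for a decoded column index (assumption).
/// Each entry is `None` for an all-null page, else `Some((min, max))`.
#[derive(Debug)]
pub struct SimpleColumnIndex {
    pages: Vec<Option<(i32, i32)>>,
}

impl SimpleColumnIndex {
    /// Builds the index, rejecting inputs whose lengths disagree
    /// with the page count implied by `null_pages`.
    pub fn try_new(
        null_pages: Vec<bool>,
        min_values: Vec<i32>,
        max_values: Vec<i32>,
    ) -> Result<Self, String> {
        let n = null_pages.len();
        if min_values.len() != n || max_values.len() != n {
            return Err(format!(
                "length mismatch: {} pages, {} min values, {} max values",
                n,
                min_values.len(),
                max_values.len()
            ));
        }
        let pages = null_pages
            .iter()
            .zip(min_values.iter().zip(&max_values))
            .map(|(&is_null, (&min, &max))| if is_null { None } else { Some((min, max)) })
            .collect();
        Ok(Self { pages })
    }

    /// Number of pages described by this index.
    pub fn num_pages(&self) -> usize {
        self.pages.len()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rejects_mismatched_min_max_lengths() {
        // Two pages but only one min value: must be an error, not a panic.
        let result = SimpleColumnIndex::try_new(vec![false, false], vec![1], vec![10, 20]);
        assert!(result.is_err());
    }
}
```

The key property is that the malformed input reaches an `Err` branch; a test that merely avoids crashing would not distinguish an error from silently truncated data.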

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Apr 26, 2026

Development

Successfully merging this pull request may close these issues.

ColumnIndex length mismatch can cause panic during decoding in Parquet