
ColumnIndex length mismatch can cause panic during decoding in Parquet #9832

@pchintar

Description

Decoding a ColumnIndex assumes that the page-aligned arrays (null_pages, min_values, max_values) all have the same length. This assumption is never validated, so a column index whose arrays disagree causes a panic instead of an error.


Root Cause

In parquet/src/file/page_index/column_index.rs, decoding performs unchecked indexing:

let len = null_pages.len();

for (i, is_null) in null_pages.iter().enumerate().take(len) {
    if !is_null {
        let min = min_bytes[i];
        let max = max_bytes[i];
        ...
    }
}
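The failure mode does not depend on the parquet crate itself. Any loop that is driven by `null_pages.len()` but indexes into shorter min/max arrays will panic. Below is a minimal sketch with plain slices. `decode_unchecked` and `panics_on_mismatch` are hypothetical names that only mirror the shape of the loop above. The `.take(len)` is dropped because `len` is already the iterator's length.

```rust
use std::panic;

/// Hypothetical stand-in for the unchecked decoding loop: it trusts that
/// `min_bytes` and `max_bytes` have one entry per page in `null_pages`.
fn decode_unchecked(null_pages: &[bool], min_bytes: &[&[u8]], max_bytes: &[&[u8]]) -> usize {
    let mut decoded = 0;
    for (i, is_null) in null_pages.iter().enumerate() {
        if !is_null {
            // Panics with "index out of bounds" once `i >= min_bytes.len()`.
            let _min = min_bytes[i];
            let _max = max_bytes[i];
            decoded += 1;
        }
    }
    decoded
}

/// Runs the unchecked loop on two declared pages but only one min/max entry,
/// and reports whether it panicked.
fn panics_on_mismatch() -> bool {
    let null_pages = [false, false]; // two pages declared
    let min: [&[u8]; 1] = [&[1, 0, 0, 0]]; // only one min entry
    let max: [&[u8]; 1] = [&[10, 0, 0, 0]]; // only one max entry
    panic::catch_unwind(|| decode_unchecked(&null_pages, &min, &max)).is_err()
}
```

With the default panic hook, the mismatched call prints the same `index out of bounds: the len is 1 but the index is 1` message shown in the reproduction. `catch_unwind` then turns the panic into an `Err`.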

Similarly for byte array indexes:

let min = min_values[i];
let max = max_values[i];

But there is no validation that:

min_values.len() == null_pages.len()
max_values.len() == null_pages.len()

Impact

  • Panic (index out of bounds) on malformed or corrupted metadata
  • Inconsistent with expected behavior: malformed metadata should surface as a ParquetError, not a panic
  • Affects robustness when handling external/untrusted parquet files

Reproduction

// Two pages are declared via null_pages
// But only ONE min/max entry is provided --> length mismatch
let column_index = ThriftColumnIndex {
    null_pages: vec![false, false],     // 2 pages
    min_values: vec![&[1, 0, 0, 0]],   // only 1 entry
    max_values: vec![&[10, 0, 0, 0]],  // only 1 entry
    null_counts: None,
    repetition_level_histograms: None,
    definition_level_histograms: None,
    boundary_order: BoundaryOrder::UNORDERED,
};

let _ = PrimitiveColumnIndex::<i32>::try_from_thrift(column_index);

Results in a panic:

index out of bounds: the len is 1 but the index is 1

Expected Behavior

Return a ParquetError when array lengths do not match the number of pages.


Proposed Fix

Validate that min_values and max_values have the same length as null_pages in:

  • PrimitiveColumnIndex::try_new
  • ByteArrayColumnIndex::try_new

before indexing into min_values / max_values.
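Below is a minimal sketch of the proposed check. It is written against plain slices rather than the real parquet types. `IndexError`, `check_len`, `read_i32_le` and `decode_i32_min_max` are hypothetical names. A real fix would instead return a ParquetError from the `try_new` constructors listed above. The sketch validates both lengths up front. It then decodes each non-null page's min/max as a 4-byte little-endian `i32`, which assumes the same encoding as the reproduction above.

```rust
use std::convert::TryInto;

/// Hypothetical error type standing in for ParquetError in this sketch.
#[derive(Debug, PartialEq)]
enum IndexError {
    /// A per-page array does not have one entry per page.
    LengthMismatch { field: &'static str, expected: usize, actual: usize },
    /// A min/max value is not exactly 4 bytes, so it cannot be an i32.
    BadValueWidth { page: usize, width: usize },
}

/// Returns an error unless a per-page array has exactly `expected` entries.
fn check_len(field: &'static str, expected: usize, actual: usize) -> Result<(), IndexError> {
    if actual == expected {
        Ok(())
    } else {
        Err(IndexError::LengthMismatch { field, expected, actual })
    }
}

/// Reads a 4-byte little-endian i32, rejecting values of any other width.
fn read_i32_le(page: usize, bytes: &[u8]) -> Result<i32, IndexError> {
    let arr: [u8; 4] = bytes
        .try_into()
        .map_err(|_| IndexError::BadValueWidth { page, width: bytes.len() })?;
    Ok(i32::from_le_bytes(arr))
}

/// Decodes per-page (min, max) pairs for an i32 column; null pages yield `None`.
fn decode_i32_min_max(
    null_pages: &[bool],
    min_values: &[&[u8]],
    max_values: &[&[u8]],
) -> Result<Vec<Option<(i32, i32)>>, IndexError> {
    let pages = null_pages.len();
    // Validate before touching min_values / max_values.
    check_len("min_values", pages, min_values.len())?;
    check_len("max_values", pages, max_values.len())?;

    // All three slices now have the same length, so zip visits every page
    // exactly once and no index can go out of bounds.
    let mut decoded = Vec::with_capacity(pages);
    for (page, (&is_null, (min, max))) in null_pages
        .iter()
        .zip(min_values.iter().zip(max_values.iter()))
        .enumerate()
    {
        decoded.push(if is_null {
            None
        } else {
            Some((read_i32_le(page, min)?, read_i32_le(page, max)?))
        });
    }
    Ok(decoded)
}
```

Checking both lengths before the loop keeps the error specific. It reports which array is short and by how much, instead of failing at whichever page first goes out of range.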
