Describe the bug
parquet::parquet_thrift::read_thrift_vec reads a Thrift compact-protocol list header and then calls Vec::with_capacity(list_ident.size as usize), where list_ident.size is a 32-bit varint pulled directly from attacker-controlled bytes. With a malformed input, the value can be close to i32::MAX; after the per-element-size multiplication this becomes a multi-GB to multi-hundred-GB allocation request, which panics in alloc::raw_vec::handle_error with capacity overflow (or OOM-kills the process on smaller-but-still-huge values) before any element is decoded.
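The failure mode can be sketched with a simplified varint decoder (hypothetical helper names; this is not the actual parquet-rs source): a 5-byte compact-protocol varint can declare close to u32::MAX list elements, and the allocation request is computed from that header alone, before a single element exists in the input.

```rust
use std::mem;

// Hypothetical sketch of a ULEB128 (Thrift compact-protocol) u32 varint
// decoder. Returns the decoded value and the number of bytes consumed.
fn read_varint_u32(buf: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in buf.iter().take(5).enumerate() {
        result |= ((byte & 0x7f) as u32) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None // continuation bit still set after 5 bytes: malformed for u32
}

fn main() {
    // Attacker-chosen header bytes declaring a list of u32::MAX elements.
    let payload = [0xffu8, 0xff, 0xff, 0xff, 0x0f];
    let (declared_len, _) = read_varint_u32(&payload).unwrap();
    // With 8-byte elements, Vec::with_capacity(declared_len as usize)
    // would request ~32 GiB up front -- a "capacity overflow" panic or
    // an OOM abort, depending on the platform and the declared size.
    let requested = declared_len as u64 * mem::size_of::<i64>() as u64;
    println!("declared_len={declared_len} requested_bytes={requested}");
}
```

Note the sketch only computes the requested byte count rather than performing the allocation; the real code path hands the untrusted count straight to Vec::with_capacity.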
Every public sync metadata entry point funnels through this function:
ParquetMetaDataReader::parse_and_finish
ParquetMetaDataReader::decode_metadata
SerializedFileReader::new
ParquetRecordBatchReaderBuilder::try_new
the async_reader family (re-exports decode_metadata after prefetching footer bytes)
So any downstream code that hands attacker-controlled bytes to a parquet reader gets a panic-on-decode DoS.
Per SECURITY.md this is a bug, not a vulnerability — there is no information disclosure or RCE path, only availability — but it is reachable from every metadata entry point, so it is worth tracking and closing properly.
To Reproduce
10 bytes:
0x28 0xfc 0xfc 0xfc 0xfc 0xfc 0xfc 0xfc 0xfc 0x51
```rust
let bytes: &[u8] = &[0x28, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0x51];
let _ = parquet::file::metadata::ParquetMetaDataReader::decode_metadata(bytes);
```
A 45-byte reproducer with valid PAR1 magic drives the same panic through ParquetRecordBatchReaderBuilder::try_new.
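For context, a file-framed variant can be constructed as in the sketch below. The byte layout (leading/trailing PAR1 magic plus a 4-byte little-endian footer length) follows the Parquet file format, but this is an illustrative construction, not the exact 45-byte reproducer.

```rust
// Illustrative sketch: wrap the malformed Thrift bytes in valid Parquet
// file framing so readers that check the magic still reach the decoder.
fn build_framed_repro() -> Vec<u8> {
    // Malformed Thrift bytes from the 10-byte reproducer above.
    let thrift = [0x28u8, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0x51];
    let mut buf = Vec::new();
    buf.extend_from_slice(b"PAR1"); // leading magic
    buf.extend_from_slice(&thrift); // stands in for the footer metadata
    buf.extend_from_slice(&(thrift.len() as u32).to_le_bytes()); // footer length
    buf.extend_from_slice(b"PAR1"); // trailing magic
    buf
}

fn main() {
    let buf = build_framed_repro();
    // Feeding a buffer like this to ParquetRecordBatchReaderBuilder::try_new
    // reaches the same read_thrift_vec allocation path.
    println!("{} bytes, ends with PAR1: {}", buf.len(), buf.ends_with(b"PAR1"));
}
```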
Expected behavior
Malformed Parquet input should return a ParquetError that callers can handle, never panic.
Found via
cargo-fuzz libFuzzer harness wrapping ParquetMetaDataReader::parse_and_finish and ParquetRecordBatchReaderBuilder::try_new over bytes::Bytes. ~95 unique crashing inputs (different VLQ sizes, different element types — SchemaElement, RowGroup, ColumnChunk — different offsets in the metadata graph) all converged on this single root cause within ~3 minutes of single-thread fuzzing. One bug, many surface symptoms.
Scope
This issue tracks two related strands:
1. Bound the Vec::with_capacity in read_thrift_vec by the remaining input bytes, since every Thrift element costs at least one wire byte, and reject negative declared sizes explicitly. Proposed in fix(parquet): Prevent negative list sizes in Thrift compact protocol parser #9868.
2. Broader thrift parser hardening — per @etseidl on #9868: "I think we could make a few more improvements to the thrift parser here." This issue is the umbrella for those follow-ups (e.g. reviewing every with_capacity/reserve in parquet_thrift against attacker-controlled sizes, auditing nested-list / map-key-or-value paths, and checking any other decoders that can allocate before reading).
Happy to do further fuzzing passes against specific suspected hotspots if maintainers point at where to look.
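A minimal sketch of the bounding mitigation proposed in #9868 (hypothetical helper name; the actual patch may differ): reject negative declared sizes explicitly, and cap the speculative allocation by the bytes actually remaining in the input, since a list of N elements needs at least N wire bytes.

```rust
use std::io::{Error, ErrorKind};

// Hypothetical guard: turn an attacker-declared list size into a safe
// capacity hint for Vec::with_capacity.
fn checked_list_capacity(declared: i32, remaining_bytes: usize) -> Result<usize, Error> {
    if declared < 0 {
        return Err(Error::new(ErrorKind::InvalidData, "negative Thrift list size"));
    }
    // Every element costs at least one input byte, so any declared size
    // beyond the remaining input is certainly bogus; clamp the hint and
    // let normal decoding fail on truncation instead of pre-allocating.
    Ok((declared as usize).min(remaining_bytes))
}

fn main() {
    // A header declaring i32::MAX elements in a 45-byte buffer gets clamped.
    println!("{:?}", checked_list_capacity(i32::MAX, 45));
}
```

With this guard, Vec::with_capacity is bounded by the input length, so a malformed header can at worst reserve as many bytes as the attacker actually sent, and decoding still returns a normal error on truncated input.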
Environment
parquet at commit fd86c75 (see fix(parquet): Prevent negative list sizes in Thrift compact protocol parser #9868 merge-base).
Related
fix(parquet): Prevent negative list sizes in Thrift compact protocol parser #9868