parquet: panic-on-decode DoS from unchecked Thrift list size in read_thrift_vec (and broader thrift parser hardening) #9874

@masumi-ryugo

Description

Describe the bug

parquet::parquet_thrift::read_thrift_vec reads a Thrift compact-protocol list header and then calls Vec::with_capacity(list_ident.size as usize), where list_ident.size is a 32-bit varint pulled directly from attacker-controlled bytes. With a malformed input, the value can be close to i32::MAX; after the per-element-size multiplication this becomes a multi-GB to multi-hundred-GB allocation request, which panics in alloc::raw_vec::handle_error with capacity overflow (or OOM-kills the process on smaller-but-still-huge values) before any element is decoded.
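The failure mode can be shown in isolation: Vec::with_capacity panics (rather than returning an error) once the requested byte count exceeds isize::MAX, and the panic fires before a single element is read. A standalone illustration of the pattern — an assumed simplification, not the parquet source:

```rust
use std::panic;

// Simplified stand-in for the vulnerable pattern: trust a size field
// from the wire and pre-allocate before decoding any element.
fn preallocate(declared_size: usize) -> Result<Vec<u8>, &'static str> {
    panic::catch_unwind(|| Vec::with_capacity(declared_size))
        .map_err(|_| "capacity overflow panic")
}

fn main() {
    // Silence the panic message; we only care that it unwinds.
    panic::set_hook(Box::new(|_| {}));
    // A hostile declared size: the byte count exceeds isize::MAX, so
    // with_capacity panics in alloc::raw_vec before anything is decoded.
    assert!(preallocate(usize::MAX).is_err());
    // A sane size allocates normally.
    assert!(preallocate(16).is_ok());
    println!("ok");
}
```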

Every public sync metadata entry point funnels through this function:

  • ParquetMetaDataReader::parse_and_finish
  • ParquetMetaDataReader::decode_metadata
  • SerializedFileReader::new
  • ParquetRecordBatchReaderBuilder::try_new
  • the async_reader family (re-exports decode_metadata after prefetching footer bytes)

So any downstream code that hands attacker-controlled bytes to a parquet reader gets a panic-on-decode DoS.

Per SECURITY.md this is a bug, not a vulnerability — there is no information-disclosure or RCE path, only an availability impact — but it is reachable from every metadata entry point, so it is worth tracking and closing properly.

To Reproduce

10 bytes:

0x28 0xfc 0xfc 0xfc 0xfc 0xfc 0xfc 0xfc 0xfc 0x51
let bytes: &[u8] = &[0x28, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0x51];
let _ = parquet::file::metadata::ParquetMetaDataReader::decode_metadata(bytes);

Output (with RUST_BACKTRACE=1):

panicked at library/alloc/src/raw_vec/mod.rs:28:5: capacity overflow
   3: alloc::raw_vec::handle_error
   4: alloc::raw_vec::with_capacity_in
   7: alloc::vec::Vec::with_capacity::<SchemaElement>
   8: parquet::parquet_thrift::read_thrift_vec       (parquet_thrift.rs:690)
   9: parquet::file::metadata::thrift::parquet_metadata_from_bytes
                                                     (thrift/mod.rs:790)
  10: parquet::file::metadata::parser::decode_metadata (parser.rs:233)
  13: ParquetMetaDataReader::decode_footer_metadata   (reader.rs:783)
  14: ParquetMetaDataReader::parse_metadata           (reader.rs:585)
  17: ParquetMetaDataReader::parse_and_finish         (reader.rs:230)

A 45-byte reproducer with valid PAR1 magic drives the same panic through ParquetRecordBatchReaderBuilder::try_new.
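For context on why so few bytes suffice: the Thrift compact-protocol list size is a ULEB128-style varint, so each continuation byte contributes 7 payload bits and five bytes already span the full u32 range. A minimal decoder sketch — illustrative only, not the parquet crate's implementation:

```rust
// Minimal ULEB128 varint decoder (illustrative; the parquet crate has
// its own). The high bit of each byte signals continuation, the low
// 7 bits carry payload, so 5 bytes cover the full u32 range.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in bytes.iter().take(10).enumerate() {
        value |= ((b & 0x7f) as u64) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1)); // terminal byte reached
        }
    }
    None // input ended (or varint too long) mid-value
}

fn main() {
    // Four max-payload continuation bytes plus one terminator already
    // encode u32::MAX, far beyond any plausible element count.
    let (v, n) = decode_varint(&[0xff, 0xff, 0xff, 0xff, 0x0f]).unwrap();
    assert_eq!((v, n), (u32::MAX as u64, 5));
    assert!(v > i32::MAX as u64);
    println!("ok");
}
```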

Expected behavior

Malformed Parquet input should return a ParquetError that callers can handle, never panic.

Found via

cargo-fuzz libFuzzer harness wrapping ParquetMetaDataReader::parse_and_finish and ParquetRecordBatchReaderBuilder::try_new over bytes::Bytes. ~95 unique crashing inputs (different VLQ sizes, different element types — SchemaElement, RowGroup, ColumnChunk — and different offsets in the metadata graph) all converged on this single root cause within ~3 minutes of single-threaded fuzzing. One bug, many surface symptoms.

Scope

This issue tracks two related strands:

  1. Immediate fix — bound the Vec::with_capacity in read_thrift_vec by remaining input bytes, since every Thrift element costs at least one wire byte. Also reject negative declared sizes explicitly. Proposed in fix(parquet): Prevent negative list sizes in Thrift compact protocol parser #9868.
  2. Broader thrift parser hardening — per @etseidl on fix(parquet): Prevent negative list sizes in Thrift compact protocol parser #9868: "I think we could make a few more improvements to the thrift parser here." This issue is the umbrella for those follow-ups (e.g. reviewing every with_capacity/reserve in parquet_thrift against attacker-controlled sizes, audit of nested-list / map-key-or-value paths, any other decoders that could allocate before reading).
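The bounding idea in item 1 can be sketched as follows (hypothetical helper name; the actual patch lives in #9868). Because every Thrift compact-protocol element occupies at least one wire byte, the remaining input length is a hard upper bound on the element count:

```rust
// Hypothetical sketch of the bounded pre-allocation: clamp the declared
// list size to the number of input bytes remaining, and reject negative
// sizes up front. If the declared size was a lie, Vec::push simply
// grows the buffer as real elements arrive.
fn bounded_capacity(declared_size: i32, remaining: usize) -> Result<usize, String> {
    if declared_size < 0 {
        return Err(format!("negative Thrift list size: {declared_size}"));
    }
    Ok((declared_size as usize).min(remaining))
}

fn main() {
    // Attacker declares ~i32::MAX elements while only 5 bytes remain:
    // we pre-allocate 5 slots instead of ~2 billion.
    assert_eq!(bounded_capacity(i32::MAX, 5), Ok(5));
    // Honest declarations are unaffected.
    assert_eq!(bounded_capacity(3, 100), Ok(3));
    // Negative sizes fail with an error instead of wrapping.
    assert!(bounded_capacity(-1, 100).is_err());
    println!("ok");
}
```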

Happy to do further fuzzing passes against specific suspected hotspots if maintainers point at where to look.
