Panic when reading corrupt parquet file with truncated data instead of ParquetError #9705

@swanandx

Describe the bug

Reading a Parquet file whose data has been truncated (corrupted) panics instead of returning an error. The panic comes from this assertion:

assert!(end <= remainder.len());

To Reproduce

I couldn't write a minimal PoC, but here is the stack trace:

thread 'xx' (53) panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/parquet-57.3.0/src/file/metadata/reader.rs:535:17:
assertion failed: end <= remainder.len()
stack backtrace:
   0: __rustc::rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: <datafusion_datasource_parquet::reader::CachedParquetFileReader as parquet::arrow::async_reader::AsyncFileReader>::get_metadata::{{closure}}
   4: <datafusion_datasource_parquet::opener::ParquetOpener as datafusion_datasource::file_stream::FileOpener>::open::{{closure}}
   5: <datafusion_datasource::file_stream::FileStream as futures_core::stream::Stream>::poll_next
   6: <datafusion_physical_plan::coop::CooperativeStream<T> as futures_core::stream::Stream>::poll_next
   7: <datafusion_physical_plan::stream::BatchSplitStream as futures_core::stream::Stream>::poll_next

We had a Delta Lake table, and I corrupted one of its Parquet files with:

python3 -c "
data = open('/tmp/original.parquet','rb').read()
total = len(data)

# Keep first 30% of data + last 1000 bytes (footer)
head = data[:int(total * 0.3)]
foot = data[-1000:]  # footer is small, 846 bytes per metadata output
open('/tmp/corrupt.parquet','wb').write(head + foot)
print(f'Original: {total} -> Corrupt: {len(head) + len(foot)} bytes')
"

Reading the resulting file led to the crash.
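
For a regression test, the same corruption could be produced in Rust rather than with the Python one-liner. The sketch below is only an illustration: `truncate_middle` is a hypothetical helper name, not part of the parquet crate, and it assumes the whole file fits in memory. Like the script, it keeps a leading fraction of the bytes plus a fixed-size tail.

```rust
/// Hypothetical test helper (not part of the parquet crate).
/// Keeps the first `head_fraction` of `data` plus its last `tail_len` bytes,
/// dropping everything in between, like the Python script in this report.
pub fn truncate_middle(data: &[u8], head_fraction: f64, tail_len: usize) -> Vec<u8> {
    // Same rounding as Python's int(total * 0.3): truncate toward zero.
    let head_len = ((data.len() as f64 * head_fraction) as usize).min(data.len());
    // If the file is shorter than `tail_len`, the tail is the whole file.
    let tail_start = data.len().saturating_sub(tail_len);

    let mut out = Vec::with_capacity(head_len + (data.len() - tail_start));
    out.extend_from_slice(&data[..head_len]);
    out.extend_from_slice(&data[tail_start..]);
    out
}
```

Assuming the paths from the script, a test could read `/tmp/original.parquet` with `std::fs::read`, call `truncate_middle(&data, 0.3, 1000)`, and write the result to `/tmp/corrupt.parquet` with `std::fs::write`.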

Expected behavior

The reader should return a ParquetError instead of panicking.
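
As a rough illustration of that behavior, a bounds check like the asserted one can be written so that an out-of-range slice produces an error value. This is a sketch, not the crate's actual code: `ReadError` is a hypothetical stand-in for `ParquetError`, and `checked_slice` is an invented helper name.

```rust
use std::fmt;

/// Hypothetical stand-in for `ParquetError`; the real type lives in the parquet crate.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The requested byte range is not contained in the available bytes.
    OutOfBounds { start: usize, end: usize, available: usize },
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::OutOfBounds { start, end, available } => write!(
                f,
                "byte range {start}..{end} is out of bounds for {available} available bytes"
            ),
        }
    }
}

impl std::error::Error for ReadError {}

/// Returns `buf[start..end]`, or an error when the range does not fit.
/// `slice::get` yields `None` both when `end > buf.len()` and when `start > end`,
/// so no `assert!` (and no panic) is needed.
pub fn checked_slice(buf: &[u8], start: usize, end: usize) -> Result<&[u8], ReadError> {
    buf.get(start..end).ok_or(ReadError::OutOfBounds {
        start,
        end,
        available: buf.len(),
    })
}
```

With a check of this shape, a caller can propagate the failure with `?`, so a truncated file surfaces as an ordinary error in the query instead of aborting the thread.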

Additional context
Version: parquet-57.3.0 (the latest version would fail as well)

Labels: bug, parquet
