
fix: Avoid panic decoding invalid parquet writer version from proto #22467

Open
fallintoplace wants to merge 2 commits into apache:main from fallintoplace:fix-parquet-proto-writer-version

fallintoplace commented May 22, 2026

Which issue does this PR close?

Rationale for this change

Parquet file format proto decoding is exposed through the fallible try_decode_file_format API, but an invalid writer_version value could still panic, because the Parquet options conversion was infallible and called expect when parsing the writer version.

As a result, malformed or manually produced proto bytes abort the decode path with a panic instead of returning a DataFusion error.
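To illustrate the difference, here is a minimal Rust sketch. WriterVersion, parse_infallible and parse_fallible are hypothetical names, not DataFusion APIs, and treating "1.0" and "2.0" as the only accepted strings is an assumption. The infallible shape has no way to report bad input except by panicking, while the fallible shape hands the error back to the caller.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the typed writer version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriterVersion {
    V1_0,
    V2_0,
}

impl FromStr for WriterVersion {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "1.0" => Ok(WriterVersion::V1_0),
            "2.0" => Ok(WriterVersion::V2_0),
            other => Err(format!("Invalid parquet writer version: {other:?}")),
        }
    }
}

/// Infallible shape: the signature leaves no way to report bad input,
/// so `expect` turns it into a panic.
fn parse_infallible(s: &str) -> WriterVersion {
    s.parse().expect("invalid parquet writer version")
}

/// Fallible shape: the error is handed back to the caller.
fn parse_fallible(s: &str) -> Result<WriterVersion, String> {
    s.parse()
}

fn main() {
    assert_eq!(parse_fallible("2.0"), Ok(WriterVersion::V2_0));
    assert!(parse_fallible("3.0").is_err());

    // The infallible version panics on the same input; catch_unwind
    // observes the panic (the panic message is still printed to stderr).
    let result = std::panic::catch_unwind(|| parse_infallible("3.0"));
    assert!(result.is_err());
}
```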

What changes are included in this PR?

  • Convert Parquet table options decoding in the file format codec to the fallible TryFromProto path.
  • Return the existing writer version validation error for invalid non-empty proto values.
  • Treat an empty proto writer_version as the default writer version for compatibility with proto default values.
  • Add regression coverage for invalid and empty writer version decoding.
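The decoding rules listed above can be sketched as follows. This is not the PR's code: ParquetOptionsProto, ParquetOptions and WriterVersion are simplified hypothetical stand-ins, std's TryFrom takes the place of DataFusion's TryFromProto (whose exact signature is not reproduced here), and "1.0"/"2.0" are assumed to be the only valid values, with "1.0" as the default.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the typed writer version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriterVersion {
    V1_0,
    V2_0,
}

impl Default for WriterVersion {
    // Assumed default, matching the "1.0" documented in the proto schema.
    fn default() -> Self {
        WriterVersion::V1_0
    }
}

/// Hypothetical stand-in for the generated proto message. In proto3 an
/// omitted string field decodes as "", which is what Default gives here.
#[derive(Debug, Default)]
struct ParquetOptionsProto {
    writer_version: String,
}

#[derive(Debug, PartialEq)]
struct ParquetOptions {
    writer_version: WriterVersion,
}

impl<'a> TryFrom<&'a ParquetOptionsProto> for ParquetOptions {
    type Error = String;

    fn try_from(proto: &'a ParquetOptionsProto) -> Result<Self, Self::Error> {
        let writer_version = match proto.writer_version.as_str() {
            // Empty means the field was omitted: use the default version.
            "" => WriterVersion::default(),
            "1.0" => WriterVersion::V1_0,
            "2.0" => WriterVersion::V2_0,
            // Any other non-empty value is returned as an error, not unwrapped.
            other => return Err(format!("Invalid parquet writer version: {other:?}")),
        };
        Ok(ParquetOptions { writer_version })
    }
}

fn main() {
    let omitted = ParquetOptionsProto::default();
    let explicit = ParquetOptionsProto { writer_version: "2.0".to_string() };
    let invalid = ParquetOptionsProto { writer_version: "3.0".to_string() };

    assert_eq!(
        ParquetOptions::try_from(&omitted),
        Ok(ParquetOptions { writer_version: WriterVersion::V1_0 })
    );
    assert_eq!(
        ParquetOptions::try_from(&explicit),
        Ok(ParquetOptions { writer_version: WriterVersion::V2_0 })
    );
    assert!(ParquetOptions::try_from(&invalid).is_err());
}
```

Matching on the raw string keeps the three cases (omitted, valid, invalid) explicit in one place.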

Are these changes tested?

  • cargo fmt --all
  • cargo test -p datafusion-proto --lib try_decode_file_format --features parquet
  • cargo test -p datafusion-proto --lib --features parquet
  • cargo clippy -p datafusion-proto --lib --features parquet -- -D warnings

Are there any user-facing changes?

Invalid Parquet writer versions in serialized file format protos now return an error instead of panicking during decode.

github-actions (bot) added the proto label (Related to proto crate) May 22, 2026
fallintoplace marked this pull request as ready for review May 23, 2026 20:37
Jefffrey changed the title from "fix: Avoid panic decoding invalid parquet writer version" to "fix: Avoid panic decoding invalid parquet writer version from proto" May 25, 2026
}
}

impl FromProto<&ParquetOptionsProto> for ParquetOptions {
Contributor

Dunno how much we care about maintaining compatibility here; I guess we could keep FromProto impl and just defer to the try version with an unwrap 🤔
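A minimal sketch of the suggested shim, using hypothetical local FromProto/TryFromProto traits (the real DataFusion traits may have different signatures): the infallible impl simply defers to the fallible one and calls expect. It keeps the old API available, but the same invalid input still panics on that path.

```rust
/// Hypothetical local versions of the two conversion traits; the real
/// DataFusion traits may have different signatures.
trait FromProto<T>: Sized {
    fn from_proto(proto: T) -> Self;
}

trait TryFromProto<T>: Sized {
    type Error;
    fn try_from_proto(proto: T) -> Result<Self, Self::Error>;
}

/// Hypothetical stand-ins for the proto message and the decoded options.
struct Proto {
    writer_version: String,
}

struct Options {
    writer_version: String,
}

impl<'a> TryFromProto<&'a Proto> for Options {
    type Error = String;

    fn try_from_proto(proto: &'a Proto) -> Result<Self, Self::Error> {
        match proto.writer_version.as_str() {
            "1.0" | "2.0" => Ok(Options {
                writer_version: proto.writer_version.clone(),
            }),
            other => Err(format!("Invalid parquet writer version: {other:?}")),
        }
    }
}

// The suggested compatibility shim: the old infallible API is kept, but it
// defers to the fallible one and unwraps, so bad input still panics here.
impl<'a> FromProto<&'a Proto> for Options {
    fn from_proto(proto: &'a Proto) -> Self {
        <Options as TryFromProto<&'a Proto>>::try_from_proto(proto)
            .expect("invalid parquet options")
    }
}

fn main() {
    let good = Proto { writer_version: "2.0".to_string() };
    let bad = Proto { writer_version: "3.0".to_string() };

    assert_eq!(Options::from_proto(&good).writer_version, "2.0");
    assert!(Options::try_from_proto(&bad).is_err());

    // Same input, but through the shim: the error becomes a panic.
    assert!(std::panic::catch_unwind(|| Options::from_proto(&bad)).is_err());
}
```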

Author

I kept this on TryFromProto only rather than reintroducing an infallible FromProto wrapper. I checked the local call sites and there aren’t any remaining users of the old parquet FromProto impl, and keeping an unwrap-based wrapper would preserve the panic path this PR is trying to remove.

For the empty writer_version case, I kept the defaulting because this is specifically the proto3 omitted-field case: absent strings decode as "", while the schema documents the logical default for writer_version as "1.0". So treating an empty (omitted) value as the default preserves compatibility, while non-empty invalid values still return an error.

) -> datafusion_common::Result<Self, Self::Error> {
let default_options = ParquetOptions::default();
let writer_version = if proto.writer_version.is_empty() {
default_options.writer_version
Contributor

Not sure about this default behaviour here, given for other options we don't really do this? Happy to hear more thoughts though
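The trade-off under discussion can be sketched by comparing two hypothetical policies, lenient and strict, for the raw writer_version string, assuming "1.0" is the documented default and "1.0"/"2.0" are the only valid values. Because an omitted proto3 string field decodes as "", the two policies differ only in whether an omitted field is accepted.

```rust
/// Lenient policy (hypothetical): "" (omitted field) maps to the default.
fn lenient(raw: &str) -> Result<&'static str, String> {
    match raw {
        "" => Ok("1.0"), // assumed documented default
        "1.0" => Ok("1.0"),
        "2.0" => Ok("2.0"),
        other => Err(format!("Invalid parquet writer version: {other:?}")),
    }
}

/// Strict policy (hypothetical): "" is treated like any other invalid value.
fn strict(raw: &str) -> Result<&'static str, String> {
    match raw {
        "1.0" => Ok("1.0"),
        "2.0" => Ok("2.0"),
        other => Err(format!("Invalid parquet writer version: {other:?}")),
    }
}

fn main() {
    // Both policies agree on explicit values...
    assert_eq!(lenient("2.0"), strict("2.0"));
    assert!(lenient("3.0").is_err() && strict("3.0").is_err());
    // ...and differ only on the empty string produced by an omitted field.
    assert_eq!(lenient(""), Ok("1.0"));
    assert!(strict("").is_err());
}
```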

fallintoplace force-pushed the fix-parquet-proto-writer-version branch from 60b6435 to 6ee499e May 25, 2026 11:23

Development

Successfully merging this pull request may close these issues.

Parquet file format proto decode panics on invalid writer_version

2 participants