fix: Avoid panic decoding invalid parquet writer version from proto#22467
fix: Avoid panic decoding invalid parquet writer version from proto#22467fallintoplace wants to merge 2 commits into
Conversation
| } | ||
| } | ||
|
|
||
| impl FromProto<&ParquetOptionsProto> for ParquetOptions { |
There was a problem hiding this comment.
Dunno how much we care about maintaining compatibility here; I guess we could keep FromProto impl and just defer to the try version with an unwrap 🤔
There was a problem hiding this comment.
I kept this on TryFromProto only rather than reintroducing an infallible FromProto wrapper. I checked the local call sites and there aren’t any remaining users of the old parquet FromProto impl, and keeping an unwrap-based wrapper would preserve the panic path this PR is trying to remove.
For the empty writer_version case, I kept the defaulting because this is specifically the proto3 omitted-field case: absent strings decode as "", while the schema documents the logical default for writer_version as "1.0". So empty/omitted preserves compatibility, while non-empty invalid values still return an error.
| ) -> datafusion_common::Result<Self, Self::Error> { | ||
| let default_options = ParquetOptions::default(); | ||
| let writer_version = if proto.writer_version.is_empty() { | ||
| default_options.writer_version |
There was a problem hiding this comment.
Not sure about this default behaviour here, given for other options we don't really do this? Happy to hear more thoughts though
60b6435 to
6ee499e
Compare
Which issue does this PR close?
Rationale for this change
Parquet file format proto decoding is exposed through a
try_decode_file_formatAPI, but invalidwriter_versionvalues could still panic because the Parquet options conversion used an infallibleexpectwhile parsing the writer version.This makes malformed or manually produced proto bytes abort the decode path instead of returning a DataFusion error.
What changes are included in this PR?
TryFromProtopath.writer_versionas the default writer version for compatibility with proto default values.Are these changes tested?
cargo fmt --allcargo test -p datafusion-proto --lib try_decode_file_format --features parquetcargo test -p datafusion-proto --lib --features parquetcargo clippy -p datafusion-proto --lib --features parquet -- -D warningsAre there any user-facing changes?
Invalid Parquet writer versions in serialized file format protos now return an error instead of panicking during decode.