I created a sample Parquet file with pyarrow that contained Uuid columns.
Here is a test script:
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

fn main() {
    let file = File::open("uuids.parquet").unwrap();
    let reader = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
    let schema = reader.schema();
    let field = schema.fields.get(0).unwrap();
    println!("{:?}", field);
    match field.try_canonical_extension_type() {
        Ok(extension_type) => println!("I am of extension type: {:?}", extension_type),
        _ => println!("I am NOT an extension type"),
    }
}
And here is the output of cargo run:
$ cargo run
Field { name: "uuids", data_type: FixedSizeBinary(16), metadata: {"ARROW:extension:metadata": "", "ARROW:extension:name": "arrow.uuid"} }
I am NOT an extension type
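This happens because, in the C++ definition of UuidType, the SerDe methods return the empty string when serializing the metadata and expect the empty string when deserializing it. The Rust SerDe methods for Uuid instead return and expect Option::None, so the empty "ARROW:extension:metadata" value written by pyarrow is rejected and the field is not recognized as an extension type.
I found this comment in the original commit for the extension types. The simplest fix is to accept the empty string as valid metadata, in addition to Option::None.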
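To make the proposed fix concrete, here is a minimal sketch of the relaxed check. It is a self-contained stand-in, not the arrow-rs code: the function name deserialize_uuid_metadata is hypothetical, and the error type is simplified to String so the example has no dependencies.

```rust
/// Hypothetical stand-in for the Uuid extension type's metadata check.
/// Assumption: the real method takes the optional metadata string and
/// returns an error for anything it does not accept; the error type is
/// simplified to `String` here.
fn deserialize_uuid_metadata(metadata: Option<&str>) -> Result<(), String> {
    match metadata {
        // No metadata key at all: what the Rust side returns and expects.
        None => Ok(()),
        // Empty string: what the C++ definition returns and expects.
        Some("") => Ok(()),
        // Any non-empty metadata is still rejected.
        Some(other) => Err(format!(
            "Uuid extension type expects no metadata, got {other:?}"
        )),
    }
}
```

With a check like this, the "uuids" field shown above (whose "ARROW:extension:metadata" value is "") would pass the metadata check, while a field carrying non-empty metadata would still fail.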