Labels
bug: Something isn't working
p1: Important to tackle soon, but preemptable by p0
Description
Describe the bug
Attached is an example Parquet file that was written by Databricks. Daft fails to read it.
Other Rust-based engines also fail to read it: Polars and pqrs.
However, I was able to successfully read it via duckdb and pyarrow.
File:
part-00000-68793c08-5b0e-480e-bd6b-0a7568d49906.c000.snappy.parquet.zip
To Reproduce
import daft
df = daft.read_parquet("file.parquet")
This fails with the error: File out of specification: Invalid thrift: bad data
By contrast, pyarrow reads the metadata of the same file without error:
import pyarrow.parquet as papq
f = papq.ParquetFile("file.parquet")
f.metadata
This prints:
<pyarrow._parquet.FileMetaData object at 0x145c60720>
created_by: parquet-mr compatible Photon version 0.2 (build 16.4)
num_columns: 10
num_rows: 76
num_row_groups: 1
format_version: 1.0
serialized_size: 4432
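To help narrow down where the Thrift decoding goes wrong, the footer framing can be checked without any Parquet library. This is only a minimal sketch, assuming the standard unencrypted Parquet layout, where the file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. The helper name read_footer_length is hypothetical and not part of Daft or pyarrow.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic of an unencrypted Parquet file


def read_footer_length(path):
    """Return the byte length of the Thrift-encoded footer metadata.

    Hypothetical helper for debugging; assumes the standard layout:
    ... <footer bytes> <4-byte little-endian length> <b"PAR1">
    """
    if os.path.getsize(path) < 12:
        # Too small to hold the leading magic, a length field and the trailing magic.
        raise ValueError("file is too small to be a Parquet file")
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)  # last 8 bytes: length field + magic
        tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("trailing magic bytes are not PAR1")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len


if __name__ == "__main__":
    import sys

    # Usage: python footer_check.py file.parquet
    print(read_footer_length(sys.argv[1]))
```

If the framing is intact, I would expect the printed length to be close to the serialized_size that pyarrow reports (4432 above), which would suggest the problem lies in how the footer contents are decoded rather than in locating the footer.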
Expected behavior
No response
Component(s)
Parquet
Additional context
No response