Description
Describe the bug
The query SELECT * EXCEPT (field) FROM 'test.parquet' WHERE field = 'field'; fails with an internal error when executed against an empty parquet file. The file is empty in data but not in schema: its schema has a single field named field. datafusion-cli reports the following error:

    push_down_projection
    caused by
    Internal error: Optimizer rule 'push_down_projection' failed, due to generate a different schema,
    original schema: DFSchema { fields: [], metadata: {} },
    new schema: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test.parquet" }), field: Field { name: "field", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }.
To Reproduce
- Use the following Rust code to create an empty parquet file. The data is empty, but the schema is NOT: it contains one field.

      // Imports assume the `arrow` and `parquet` crates (version 43).
      use std::fs::OpenOptions;
      use std::sync::Arc;

      use arrow::datatypes::{DataType, Field, Schema};
      use parquet::arrow::ArrowWriter;

      fn main() {
          let file = OpenOptions::new()
              .write(true)
              .create(true)
              .open("test.parquet")
              .unwrap();
          // The schema has one nullable Utf8 column named "field".
          let writer = ArrowWriter::try_new(
              file,
              Arc::new(Schema::new(vec![Field::new("field", DataType::Utf8, true)])),
              None,
          )
          .unwrap();
          // Close without writing any record batch: the file has a schema but no rows.
          writer.close().unwrap();
      }

- Run the following query in datafusion-cli, or in a Rust program using the datafusion library. Both give the same result:

      $ ls -l test.parquet
      .rw-r--r--@ 263 steve 9 Aug 13:40 test.parquet
      $ datafusion-cli
      ❯ SELECT * EXCEPT (field) FROM 'test.parquet' WHERE field = 'field';
      push_down_projection
      caused by
      Internal error: Optimizer rule 'push_down_projection' failed, due to generate a different schema,
      original schema: DFSchema { fields: [], metadata: {} },
      new schema: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test.parquet" }), field: Field { name: "field", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }.
      This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Expected behavior
The query should execute successfully instead of failing in the optimizer.
Additional context
- parquet library version: 43
- datafusion-cli version: 28.0.0