
Optimizer push_down_projection failed due to different schemas #7241

@SteveLauC

Describe the bug

The query SELECT * EXCEPT (field) FROM 'test.parquet' WHERE field = 'field'; fails with an internal error when executed against an empty parquet file. The file is empty only in its data, not its schema: it contains no rows but has one column named field. Below is the error message given by datafusion-cli:

    push_down_projection
    caused by
    Internal error: Optimizer rule 'push_down_projection' failed, due to generate a different schema,
    original schema: DFSchema { fields: [], metadata: {} },
    new schema: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test.parquet" }), field: Field { name: "field", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }.
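The original schema in this message is empty because of how the wildcard expands: * stands for every column of the table, which here is only field, and EXCEPT (field) removes that column, so the query's output has no columns even though the WHERE clause still references field. The small std-only Rust sketch below models just this expansion step; expand_wildcard_except is a hypothetical helper for illustration, not part of DataFusion's API.

```rust
/// Hypothetical helper: expand `SELECT * EXCEPT (...)` against a table's
/// column names. This models the SQL semantics only; it is not DataFusion code.
fn expand_wildcard_except(table_columns: &[&str], excluded: &[&str]) -> Vec<String> {
    table_columns
        .iter()
        // `*` yields every column in table order; EXCEPT drops the listed ones.
        .filter(|col| !excluded.contains(*col))
        .map(|col| col.to_string())
        .collect()
}

fn main() {
    // test.parquet has exactly one column, `field`.
    let columns = ["field"];
    let projected = expand_wildcard_except(&columns, &["field"]);
    // Nothing is left to project, matching `DFSchema { fields: [] }`.
    println!("projected columns: {:?}", projected);
}
```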

To Reproduce

  1. Use the following Rust code to create an empty parquet file: it contains no data, but its schema is NOT empty.

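    use std::fs::OpenOptions;
    use std::sync::Arc;

    use arrow::datatypes::{DataType, Field, Schema};
    use parquet::arrow::ArrowWriter;

    fn main() {
        // Create (or overwrite) test.parquet.
        let file = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open("test.parquet")
            .unwrap();

        // One nullable Utf8 column named "field".
        let schema = Arc::new(Schema::new(vec![Field::new("field", DataType::Utf8, true)]));
        let writer = ArrowWriter::try_new(file, schema, None).unwrap();

        // Close without writing any record batch: the file keeps the schema
        // but contains no rows.
        writer.close().unwrap();
    }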
    
  2. Run the following query in datafusion-cli:

    Running the same query from a Rust program that uses the datafusion library produces the same error.

    $ ls -l test.parquet
    .rw-r--r--@ 263 steve  9 Aug 13:40 test.parquet
    
    $ datafusion-cli
    ❯ SELECT * EXCEPT (field) FROM 'test.parquet' WHERE field = 'field';
    push_down_projection
    caused by
    Internal error: Optimizer rule 'push_down_projection' failed, due to generate a different schema, original schema: DFSchema { fields: [], metadata: {} }, new schema: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test.parquet" }), field: Field { name: "field", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
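Judging from the message itself, DataFusion's optimizer checks after each rule that the rewritten plan still has the same output schema, and reports a failing rule by name, with the details under "caused by". Here the plan before push_down_projection exposes no columns, while the plan after it exposes field, so the check fails. The sketch below models that check using only the standard library; SimpleField, SimpleSchema, RuleSchemaMismatch and check_rule_preserves_schema are hypothetical stand-ins, and DataFusion's real comparison of DFSchema values may differ in detail (this sketch uses plain equality of name, type and nullability).

```rust
use std::fmt;

/// Simplified stand-in for a schema field (hypothetical, not DataFusion's DFField).
#[derive(Debug, Clone, PartialEq)]
struct SimpleField {
    name: String,
    data_type: String,
    nullable: bool,
}

/// Simplified stand-in for a plan's output schema (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SimpleSchema {
    fields: Vec<SimpleField>,
}

/// Error carrying the rule name as context, mirroring the
/// "rule name / caused by / Internal error" shape of the report.
#[derive(Debug)]
struct RuleSchemaMismatch {
    rule: String,
    original: SimpleSchema,
    new: SimpleSchema,
}

impl fmt::Display for RuleSchemaMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{}\ncaused by\nInternal error: Optimizer rule '{}' failed, due to generate a different schema, original schema: {:?}, new schema: {:?}",
            self.rule, self.rule, self.original, self.new
        )
    }
}

/// Hypothetical check: a rule may rewrite a plan but must not change its output schema.
fn check_rule_preserves_schema(
    rule: &str,
    original: &SimpleSchema,
    new: &SimpleSchema,
) -> Result<(), RuleSchemaMismatch> {
    // Derived PartialEq compares the field lists: length, then name, type and nullability.
    if original == new {
        Ok(())
    } else {
        Err(RuleSchemaMismatch {
            rule: rule.to_string(),
            original: original.clone(),
            new: new.clone(),
        })
    }
}

fn main() {
    // Output schema of `SELECT * EXCEPT (field) ...`: no columns at all.
    let original = SimpleSchema { fields: vec![] };
    // Schema reported after push_down_projection: it still exposes `field`.
    let new = SimpleSchema {
        fields: vec![SimpleField {
            name: "field".to_string(),
            data_type: "Utf8".to_string(),
            nullable: true,
        }],
    };
    match check_rule_preserves_schema("push_down_projection", &original, &new) {
        Ok(()) => println!("schema preserved"),
        Err(e) => println!("{}", e),
    }
}
```

Running it prints a message with the same three-line shape as the datafusion-cli output, with the simplified schemas in place of DFSchema.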

Expected behavior

The query should execute successfully.

Additional context

  • parquet library version: 43
  • datafusion-cli version: 28.0.0
