
[Experimental scans] schema adapter does not apply required schema for structs within lists #1681

@andygrove

Describe the bug

The following test currently fails when the native scan implementation is set to SCAN_NATIVE_ICEBERG_COMPAT:

  test("nested data - array of struct") {
    val data = (1 to 10).map(i => Tuple1(Seq(i -> s"val_$i")))
    withParquetTable(data, "t") {
      withSQLConf(CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_ICEBERG_COMPAT) {
        checkSparkAnswerAndOperator(sql("SELECT _1[0]._2 FROM t"))
      }
    }
  }

The Parquet file has a single column, _1, containing an array of structs, where each struct has fields _1 and _2.

The scan's required schema prunes field _1 from the struct, keeping only _2:

required_schema = Field { name: "_1", data_type: List(Field { name: "item", data_type: Struct([Field { name: "_2", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }

Because the pruned struct contains only _2, the projection uses GetStructField with ordinal 0 to reference _2.

Comet does not apply the required schema to structs nested inside lists, so it reads both _1 and _2 from the struct. GetStructField with ordinal 0 therefore returns _1 instead of _2.
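To make the ordinal mismatch concrete, here is a minimal Scala sketch of what applying the required schema to a list of structs involves. It does not use Comet, Spark, or Arrow: the DataType, Field, StructType, and ListType types and the project and getStructField functions are hypothetical stand-ins. The sketch assumes struct fields are matched by name and that the projection must recurse through list elements to reach the nested struct.

```scala
// Hypothetical, simplified model of the issue; these are NOT Comet's, Spark's, or Arrow's types.
object NestedProjection {
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  final case class Field(name: String, dataType: DataType)
  final case class StructType(fields: List[Field]) extends DataType
  final case class ListType(element: DataType) extends DataType

  // Values are modeled positionally: a struct is a Vector of its field values,
  // a list is a Vector of elements, and primitives are plain Scala values.
  def project(value: Any, fileType: DataType, requiredType: DataType): Any =
    (fileType, requiredType, value) match {
      case (ListType(fileElem), ListType(reqElem), elems: Vector[_]) =>
        // Recurse into every list element so structs nested in the list are pruned too.
        elems.map(e => project(e, fileElem, reqElem))
      case (StructType(fileFields), StructType(reqFields), fields: Vector[_]) =>
        // Keep only the required fields, in required-schema order, matched by name.
        reqFields.toVector.map { req =>
          val idx = fileFields.indexWhere(_.name == req.name)
          require(idx >= 0, s"field ${req.name} not found in file schema")
          project(fields(idx), fileFields(idx).dataType, req.dataType)
        }
      case _ => value // primitives pass through unchanged
    }

  // Mimics GetStructField: looks up a struct field purely by ordinal.
  def getStructField(struct: Any, ordinal: Int): Any =
    struct.asInstanceOf[Vector[Any]](ordinal)

  def main(args: Array[String]): Unit = {
    // File schema: array<struct<_1: int, _2: string>>; required schema keeps only _2.
    val fileType = ListType(StructType(List(Field("_1", IntType), Field("_2", StringType))))
    val requiredType = ListType(StructType(List(Field("_2", StringType))))
    val column: Vector[Any] = Vector(Vector(1, "val_1"))

    // Without applying the required schema, ordinal 0 resolves to _1 (the reported bug).
    println(getStructField(column(0), 0))
    // After projecting to the required schema, ordinal 0 resolves to _2.
    val projected = project(column, fileType, requiredType).asInstanceOf[Vector[Any]]
    println(getStructField(projected(0), 0))
  }
}
```

Running main prints 1 and then val_1. Matching by name rather than by position is the key point: the file schema and the required schema disagree on ordinals, so only field names can map one onto the other.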

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Assignees

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests
