Closed
Description
Describe the bug
The following test currently fails:
test("nested data - array of struct") {
  val data = (1 to 10).map(i => Tuple1(Seq(i -> s"val_$i")))
  withParquetTable(data, "t") {
    withSQLConf(CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_ICEBERG_COMPAT) {
      checkSparkAnswerAndOperator(sql("SELECT _1[0]._2 FROM t"))
    }
  }
}
The Parquet file contains an array of structs, where each struct has fields _1 and _2.
The required schema in the scan prunes field _1 from the struct, leaving only _2:
required_schema = Field { name: "_1", data_type: List(Field { name: "item", data_type: Struct([Field { name: "_2", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
The projection contains GetStructField with ordinal 0 to represent field _2 in the struct.
Comet does not apply the required schema and therefore reads both _1 and _2 from the struct. As a result, GetStructField with ordinal=0 returns _1 instead of _2.
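The ordinal mismatch can be illustrated without Spark or Comet. The following is a minimal sketch in plain Scala, not Comet's actual code: the Struct type, getStructField, and applyRequiredSchema are hypothetical stand-ins that model a struct as an ordered list of named fields and a field lookup by position only.

```scala
// Hypothetical, simplified model (not Comet or Spark code).
object OrdinalMismatchSketch {
  // A struct is an ordered list of (fieldName, value) pairs.
  type Struct = Vector[(String, Any)]

  // Stand-in for GetStructField: resolves a field purely by position.
  def getStructField(struct: Struct, ordinal: Int): Any = struct(ordinal)._2

  // Keeps only the fields named in the required schema, in schema order.
  def applyRequiredSchema(struct: Struct, required: Seq[String]): Struct =
    required.flatMap(name => struct.find(_._1 == name)).toVector

  // One array element as stored in the file: struct<_1, _2>.
  val fileStruct: Struct = Vector("_1" -> 1, "_2" -> "val_1")

  // The required schema from the scan keeps only _2.
  val requiredFields: Seq[String] = Seq("_2")

  // Ordinal 0 was computed against the pruned struct, so reading the
  // unpruned struct at ordinal 0 picks up _1 instead of _2.
  val withoutPruning: Any = getStructField(fileStruct, 0)

  // Applying the required schema first makes ordinal 0 refer to _2.
  val withPruning: Any =
    getStructField(applyRequiredSchema(fileStruct, requiredFields), 0)
}
```

In this model, withoutPruning evaluates to the value of _1 and withPruning to the value of _2, which mirrors the wrong result described above.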
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response
Metadata
Labels
bug