Description
Apache Iceberg version
1.10.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
Spark 4.0 introduced the Variant type (https://www.databricks.com/blog/introducing-apache-spark-40), and Iceberg 1.10 adds Variant type support as well.
When migrating an existing Spark table that contains a variant column using the CALL catalog_name.system.snapshot procedure, we get an UnsupportedOperationException. I have only verified this for Parquet.
The root cause appears to be the format string returned by CatalogTable sourceTable = spark.sessionState().catalog().getTableMetadata(sourceTableIdent); in SparkTableUtil.java --> importUnpartitionedSparkTable. With a variant column the reported serde is org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, whereas without it we get org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, and TableMigrationUtil.listPartition fails to recognize the former.
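For context, here is a minimal sketch of the kind of serde-string dispatch that seems to be involved. The class and method names are hypothetical, not the actual TableMigrationUtil code; only the two serde strings are the ones observed above:

import java.util.Locale;

// Hypothetical sketch: dispatch on the serde/format string reported by the
// Spark catalog. Illustrates why a LazySimpleSerDe string falls through to
// the unsupported branch while ParquetHiveSerDe is recognized.
public class SerdeDispatchSketch {
  static String detectFormat(String serde) {
    String s = serde.toLowerCase(Locale.ROOT);
    if (s.contains("parquet")) {
      return "parquet";
    } else if (s.contains("avro")) {
      return "avro";
    } else if (s.contains("orc")) {
      return "orc";
    }
    throw new UnsupportedOperationException("Unknown partition format: " + serde);
  }

  public static void main(String[] args) {
    // Without a variant column the serde contains "parquet" and is recognized.
    System.out.println(
        detectFormat("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"));
    // With a variant column the serde matches none of the known formats and throws.
    System.out.println(
        detectFormat("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"));
  }
}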
If that is fixed, the next failure would likely come from the missing Variant type handling in Conversions.java --> fromPartitionString, sketched below.
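A simplified illustration of that second failure mode, assuming fromPartitionString switches over the Iceberg Type.TypeID (this is not the real Conversions.java; the actual method handles many more types):

import org.apache.iceberg.types.Type;

public class FromPartitionStringSketch {
  // Hypothetical sketch: a switch over the type ID with no VARIANT branch
  // falls into the default case and throws, so partition import would fail
  // again here even once the serde detection is fixed.
  static Object fromPartitionStringSketch(Type type, String asString) {
    switch (type.typeId()) {
      case INTEGER:
        return Integer.parseInt(asString);
      case LONG:
        return Long.parseLong(asString);
      case STRING:
        return asString;
      // ... other primitive types handled similarly ...
      default:
        throw new UnsupportedOperationException(
            "Unsupported type for fromPartitionString: " + type);
    }
  }
}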
Reproduction steps:
This can be verified with a unit test in the Spark 4.0 TestSnapshotTableProcedure.java:
@TestTemplate
public void testSnapshot() throws IOException {
  String location = Files.createTempDirectory(temp, "junit").toFile().toString();
  sql(
      "CREATE TABLE %s (id bigint NOT NULL, data variant) USING parquet LOCATION '%s'",
      SOURCE_NAME, location);
  sql(
      "INSERT INTO TABLE %s VALUES (1, parse_json('{\"key\": 123, \"data\": [4, 5, \"str\"]}'))",
      SOURCE_NAME);
  sql("select * from %s", SOURCE_NAME); // Works
  sql("select id, variant_get(data, '$.key', 'int') from %s", SOURCE_NAME); // Works
  // Fails with UnsupportedOperationException
  Object result =
      scalarSql(
          "CALL %s.system.snapshot('%s', '%s', properties => map('format-version','3'))",
          catalogName, SOURCE_NAME, tableName);
}

Not sure if this should be a feature request or a bug.
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time