
Migrating existing Spark tables with Variant type fails in Iceberg #14123

@hariuserx

Apache Iceberg version

1.10.0 (latest release)

Query engine

Spark

Please describe the bug 🐞

Spark 4.0 introduced the Variant type (https://www.databricks.com/blog/introducing-apache-spark-40), and Iceberg 1.10 adds Variant type support as well.

When migrating an existing Spark table that contains a Variant column with the CALL catalog_name.system.snapshot procedure, the call fails with an UnsupportedOperationException. I have only checked this for Parquet tables.

The root cause appears to be the storage format reported by CatalogTable sourceTable = spark.sessionState().catalog().getTableMetadata(sourceTableIdent); in SparkTableUtil.java (importUnpartitionedSparkTable). With a Variant column, the table's serde is org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe; without one, it is org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe. TableMigrationUtil.listPartition does not recognize LazySimpleSerDe as a supported format and fails.
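To illustrate why the reported serde matters, here is a minimal sketch that assumes the import path classifies the format string by looking for "avro", "parquet" or "orc" in it. FormatDetectionSketch, detect and detectWithFallback are hypothetical names, not Iceberg's code, and the provider fallback is only one possible direction for a fix. Under that assumption, ParquetHiveSerDe maps to Parquet while LazySimpleSerDe matches nothing, whereas the Spark provider from the table's USING parquet clause would still identify the data files.

```java
import java.util.Locale;
import java.util.Optional;

public class FormatDetectionSketch {

  // Hypothetical stand-in for the file formats a migration can import.
  enum FileFormat { AVRO, PARQUET, ORC }

  // Hypothetical: classify a serde/format string by substring matching.
  static Optional<FileFormat> detect(String format) {
    if (format == null) {
      return Optional.empty();
    }
    String lower = format.toLowerCase(Locale.ROOT);
    if (lower.contains("avro")) {
      return Optional.of(FileFormat.AVRO);
    } else if (lower.contains("parquet")) {
      return Optional.of(FileFormat.PARQUET);
    } else if (lower.contains("orc")) {
      return Optional.of(FileFormat.ORC);
    }
    return Optional.empty();
  }

  // Hypothetical fallback: when the serde says nothing about the file format
  // (e.g. LazySimpleSerDe), consult the Spark data source provider instead.
  static Optional<FileFormat> detectWithFallback(String serde, String provider) {
    Optional<FileFormat> fromSerde = detect(serde);
    return fromSerde.isPresent() ? fromSerde : detect(provider);
  }

  public static void main(String[] args) {
    String withVariant = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";
    String withoutVariant = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe";
    System.out.println(detect(withoutVariant)); // Optional[PARQUET]
    System.out.println(detect(withVariant)); // Optional.empty
    System.out.println(detectWithFallback(withVariant, "parquet")); // Optional[PARQUET]
  }
}
```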

If this is fixed, the next failure could come from Conversions.java (fromPartitionString), which does not handle the Variant type.
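A minimal sketch of that second failure mode, assuming fromPartitionString dispatches on the column type with one case per supported type. PartitionStringSketch and its TypeId enum are hypothetical stand-ins, not Iceberg's Conversions or Type.TypeID. A type without its own case, here VARIANT, ends up in the default branch and throws.

```java
public class PartitionStringSketch {

  // Hypothetical subset of type ids; this is NOT Iceberg's Type.TypeID.
  enum TypeId { BOOLEAN, INTEGER, LONG, STRING, VARIANT }

  // Hypothetical converter: each supported type has an explicit case, and any
  // type without one (here VARIANT) falls through to the default branch.
  static Object fromPartitionString(TypeId type, String asString) {
    if (asString == null) {
      return null;
    }
    switch (type) {
      case BOOLEAN:
        return Boolean.valueOf(asString);
      case INTEGER:
        return Integer.valueOf(asString);
      case LONG:
        return Long.valueOf(asString);
      case STRING:
        return asString;
      default:
        throw new UnsupportedOperationException(
            "Unsupported type for fromPartitionString: " + type);
    }
  }

  public static void main(String[] args) {
    System.out.println(fromPartitionString(TypeId.LONG, "1")); // 1
    try {
      fromPartitionString(TypeId.VARIANT, "{\"key\": 123}");
    } catch (UnsupportedOperationException e) {
      // Unsupported type for fromPartitionString: VARIANT
      System.out.println(e.getMessage());
    }
  }
}
```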

Reproduction steps:

The failure can be reproduced with the following unit test in TestSnapshotTableProcedure.java (Spark 4.0 module):

```java
// Imports needed by this test (add to TestSnapshotTableProcedure.java if missing)
import java.io.IOException;
import java.nio.file.Files;
import org.junit.jupiter.api.TestTemplate;

@TestTemplate
public void testSnapshotWithVariantColumn() throws IOException {
  String location = Files.createTempDirectory(temp, "junit").toFile().toString();
  // Create a plain Spark Parquet table with a Variant column
  sql(
      "CREATE TABLE %s (id bigint NOT NULL, data variant) USING parquet LOCATION '%s'",
      SOURCE_NAME, location);
  sql(
      "INSERT INTO TABLE %s VALUES (1, parse_json('{\"key\": 123, \"data\": [4, 5, \"str\"]}'))",
      SOURCE_NAME);

  // Reading the source table through Spark works
  sql("SELECT * FROM %s", SOURCE_NAME);
  sql("SELECT id, variant_get(data, '$.key', 'int') FROM %s", SOURCE_NAME);

  // Snapshotting into an Iceberg v3 table fails with UnsupportedOperationException
  Object result =
      scalarSql(
          "CALL %s.system.snapshot('%s', '%s', properties => map('format-version', '3'))",
          catalogName, SOURCE_NAME, tableName);
}
```

I'm not sure whether this should be treated as a bug or a feature request.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata

Metadata

Assignees

No one assigned

Labels

bug: Something isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

    Issue actions