Apache Iceberg version
1.11.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
Bug report
Apache Iceberg version
1.11.0 (org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.11.0)
Spark version
Spark 4.0.2-amzn-0 (AWS EMR), PySpark, writeTo(...).append()
Catalog
Iceberg REST catalog → Amazon S3 Tables (SparkCatalog, S3FileIO)
What happened
Append to an Iceberg v3 table with VARIANT columns fails during Parquet file close when variant shredding is enabled and default column metrics are collected.
- Read from S3 Parquet: OK
- CREATE TABLE: OK
- Append: fails in stage 3 at ParquetWriter.metrics() / DataWriter.close()
Dataset: ~669M rows (~216 GB), with VARIANT columns built via parse_json(PROPERTIES) and parse_json(V).
Error message
java.util.NoSuchElementException
at org.apache.iceberg.relocated.com.google.common.collect.Iterables.getOnlyElement(Iterables.java:262)
at org.apache.iceberg.parquet.ParquetMetrics$MetricsVisitor$MetricsVariantVisitor.value(ParquetMetrics.java:480)
at org.apache.iceberg.parquet.ParquetMetrics$MetricsVisitor$MetricsVariantVisitor.value(ParquetMetrics.java:410)
at org.apache.iceberg.parquet.ParquetVariantVisitor.visitValue(ParquetVariantVisitor.java:215)
at org.apache.iceberg.parquet.ParquetVariantVisitor.visitObjectFields(ParquetVariantVisitor.java:265)
at org.apache.iceberg.parquet.ParquetVariantVisitor.visitValue(ParquetVariantVisitor.java:226)
at org.apache.iceberg.parquet.ParquetVariantVisitor.visit(ParquetVariantVisitor.java:183)
at org.apache.iceberg.parquet.ParquetMetrics$MetricsVisitor.variant(ParquetMetrics.java:362)
at org.apache.iceberg.parquet.TypeWithSchemaVisitor.visitVariant(TypeWithSchemaVisitor.java:221)
at org.apache.iceberg.parquet.ParquetMetrics.metrics(ParquetMetrics.java:101)
at org.apache.iceberg.parquet.ParquetWriter.metrics(ParquetWriter.java:173)
at org.apache.iceberg.io.DataWriter.close(DataWriter.java:90)
at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:126)
at org.apache.iceberg.spark.source.SparkWrite$PartitionedDataWriter.commit(SparkWrite.java:838)
Spark config:
spark.sql.iceberg.shred-variants=true
spark.sql.catalog.<catalog>.table-default.write.parquet.shred-variants=true
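For anyone trying to reproduce this from PySpark, here is a minimal sketch of how the two settings above could be applied to a session builder. The helper names `shredding_conf` and `apply_conf` are hypothetical, and the catalog name is a placeholder for whatever `<catalog>` is in your setup; only the two config keys come from this report.

```python
def shredding_conf(catalog):
    """Return the two shredding settings from this report for a given catalog name.

    `catalog` stands in for the <catalog> placeholder; it is not a real name.
    """
    return {
        "spark.sql.iceberg.shred-variants": "true",
        f"spark.sql.catalog.{catalog}.table-default.write.parquet.shred-variants": "true",
    }


def apply_conf(builder, conf):
    """Apply each key/value pair to a builder exposing .config(key, value).

    Intended for pyspark.sql.SparkSession.builder, whose config() returns the builder.
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

Usage would look like `apply_conf(SparkSession.builder, shredding_conf("my_catalog")).getOrCreate()`, assuming the catalog itself is configured elsewhere.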
Expected behavior
Append completes; Iceberg can compute metrics for shredded VARIANT columns without error.
Actual behavior
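NoSuchElementException in MetricsVariantVisitor.value() when closing Parquet writers. The exception is thrown by Iterables.getOnlyElement, which raises NoSuchElementException only when the iterable is empty, so it looks like the metrics code expects exactly one sub-value when visiting variant object fields and receives none.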
Workaround
Disable metrics on VARIANT columns (shredding still works):
ALTER TABLE catalog.test.events SET TBLPROPERTIES (
'write.metadata.metrics.column.properties' = 'none',
'write.metadata.metrics.column.v' = 'none'
);
After this, the append of ~669M rows succeeds with shredding enabled.
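If a table has more VARIANT columns than the two in my case, the same statement can be generated per column. This is only a sketch: `disable_metrics_sql` is a hypothetical helper, and it assumes simple top-level column names that need no escaping.

```python
def disable_metrics_sql(table, columns):
    """Build an ALTER TABLE statement that sets metrics mode 'none' per column.

    Hypothetical helper; assumes plain column names without quoting needs.
    """
    if not columns:
        raise ValueError("at least one column name is required")
    props = ",\n".join(
        f"  'write.metadata.metrics.column.{col}' = 'none'" for col in columns
    )
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n{props}\n)"
```

The result can be passed to `spark.sql(...)` before the append, for example `spark.sql(disable_metrics_sql("catalog.test.events", ["properties", "v"]))`.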
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [x] I cannot contribute a fix for this bug at this time