Hello everyone,
I ran into an issue with an Iceberg table after adding some Parquet files using the Iceberg Java API. The files contain all the fields defined in the table schema, but in a different order than the original schema. When I attempt to read the table with Spark SQL or Trino, I get a ClassCastException.
Here's the error message:
org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 149.0 failed 4 times, most recent failure: Lost task 0.3 in stage 149.0 (TID 271) (10.104.94.54 executor 1): java.lang.ClassCastException: class java.lang.Long cannot be cast to class org.apache.spark.unsafe.types.UTF8String (java.lang.Long is in module java.base of loader 'bootstrap'; org.apache.spark.unsafe.types.UTF8String is in unnamed module of loader 'app')
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
This error suggests that Spark SQL is accessing the Iceberg table using column positions rather than column names. The mismatch in field order is causing Spark to attempt casting a Long to a UTF8String, resulting in the ClassCastException.
Spark's read path appears to rely on the table's original column order rather than column names, which leads to data misinterpretation when new files do not preserve that order.
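For what it's worth, my understanding is that Iceberg readers normally resolve Parquet columns by the field IDs embedded in the file metadata, and fall back to positional matching when those IDs are missing. The failure mode can be sketched in plain Java (no Iceberg code; the schemas and values here are made up for illustration):

```java
import java.util.List;

public class FieldOrderDemo {
    public static void main(String[] args) {
        // Table schema order: (id: long, name: string)
        List<String> tableSchema = List.of("id", "name");
        // A data file written with the fields reordered: (name, id)
        List<String> fileSchema = List.of("name", "id");
        Object[] fileRow = {"alice", 42L}; // values follow the file's order

        // Positional read: take the value at the table-schema index.
        // Column 1 is "name" in the table, but position 1 in the file
        // holds the Long id -- the same mismatch as the ClassCastException.
        Object positional = fileRow[tableSchema.indexOf("name")];
        System.out.println(positional.getClass().getSimpleName()); // Long

        // Name-based read: look the column up in the file's own schema,
        // which is what field-ID resolution effectively does.
        Object byName = fileRow[fileSchema.indexOf("name")];
        System.out.println(byName.getClass().getSimpleName()); // String
    }
}
```

If the Parquet files were written outside Iceberg's writers and lack field IDs, that positional fallback would explain the error regardless of engine.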
Has anyone else encountered a similar issue, or can anyone share best practices for managing schema evolution in Iceberg tables, particularly when adding new files with reordered fields?
Best regards,
Andrei