Spark ClassCastException when inserting into bucketed binary column #7682

### Apache Iceberg version

1.1.0, 1.2.1

### Query engine

Spark

### Please describe the bug 🐞

```
spark-sql-3.3> create table jzhuge.b1 (key binary, ts timestamp) partitioned by (bucket(4, key));
spark-sql-3.3> create table jzhuge.tb1 (key binary, ts timestamp);

spark-sql-3.3> insert into jzhuge.tb1 values (X'a1a2', timestamp '2023-05-01'), (X'b1b2', timestamp '2023-05-02');

spark-sql-3.3> table jzhuge.tb1;
��      2023-05-01 00:00:00
��      2023-05-02 00:00:00

spark-sql-3.3> insert into jzhuge.b1 table jzhuge.tb1;
23/05/22 19:04:29 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 5)
java.lang.ClassCastException: [B cannot be cast to java.nio.ByteBuffer
        at org.apache.iceberg.transforms.Bucket$BucketByteBuffer.apply(Bucket.java:230)
        at org.apache.spark.sql.catalyst.expressions.IcebergBucketTransform.$anonfun$bucketFunc$3(TransformExpressions.scala:116)
        at org.apache.spark.sql.catalyst.expressions.IcebergBucketTransform.$anonfun$bucketFunc$3$adapted(TransformExpressions.scala:116)
        at org.apache.spark.sql.catalyst.expressions.IcebergBucketTransform.nullSafeEval(TransformExpressions.scala:120)
        at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:512)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.execution.SortExec$$anon$1.computePrefix(SortExec.scala:90)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:137)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:226)
        at org.apache.spark.sql.execution.SortExec.$anonfun$doExecute$1(SortExec.scala:119)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:903)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:903)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:555)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1531)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:558)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
```
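`[B` is the JVM's name for `byte[]`. The trace shows Iceberg's binary bucket function (`Bucket$BucketByteBuffer`) receiving the raw `byte[]` that Spark uses for `BinaryType` values, although it expects a `java.nio.ByteBuffer`. The standalone Java sketch below reproduces that mismatch without Spark or Iceberg. It assumes the bucket formula from the Iceberg table spec, `(murmur3_x86_32(bytes) & Integer.MAX_VALUE) % N`. The class and method names (`BucketBinarySketch`, `applyUnchecked`, `applyConverted`) are hypothetical, and `applyConverted` only illustrates one possible conversion (wrapping with `ByteBuffer.wrap`), not the actual Iceberg fix.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Function;

// Hypothetical standalone sketch (not Iceberg code) of the byte[] vs ByteBuffer mismatch.
class BucketBinarySketch {

  // 32-bit Murmur3, x86 variant, seed 0 (the hash the Iceberg spec uses for bucketing).
  static int murmur3x86_32(ByteBuffer input) {
    // duplicate() so the caller's buffer position is left untouched
    ByteBuffer buf = input.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int len = buf.remaining();
    int h1 = 0;
    // Body: mix each full 4-byte little-endian block.
    while (buf.remaining() >= 4) {
      int k1 = buf.getInt();
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;
      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }
    // Tail: assemble the 1 to 3 leftover bytes little-endian.
    int k1 = 0;
    int rem = buf.remaining();
    for (int i = rem - 1; i >= 0; i--) {
      k1 = (k1 << 8) | (buf.get(buf.position() + i) & 0xff);
    }
    if (rem > 0) {
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;
      h1 ^= k1;
    }
    // Finalization mix.
    h1 ^= len;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  // bucket_N(x) = (murmur3_x86_32(x) & Integer.MAX_VALUE) % N, typed on ByteBuffer.
  static Function<ByteBuffer, Integer> bucketBinary(int numBuckets) {
    return buf -> (murmur3x86_32(buf) & Integer.MAX_VALUE) % numBuckets;
  }

  // Mimics the failing path: the Spark value is passed as Object through an
  // erased call, so the lambda's implicit cast to ByteBuffer throws for byte[].
  @SuppressWarnings("unchecked")
  static int applyUnchecked(Function<?, Integer> fn, Object sparkValue) {
    return ((Function<Object, Integer>) fn).apply(sparkValue);
  }

  // Hypothetical fix: wrap Spark's byte[] in a ByteBuffer before applying the function.
  static int applyConverted(Function<ByteBuffer, Integer> fn, Object sparkValue) {
    Object value = sparkValue instanceof byte[]
        ? ByteBuffer.wrap((byte[]) sparkValue)
        : sparkValue;
    return fn.apply((ByteBuffer) value);
  }
}
```

With `bucketBinary(4)` and the key `{0, 1, 2, 3}`, `applyUnchecked` throws the same kind of `ClassCastException` as the trace, while `applyConverted` returns bucket 1. These bytes hash to -188683207, the value listed for this input in the spec's hash test vectors.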
