[BUG] Failed to read data from iceberg #10831

Closed
wbo4958 opened this issue May 17, 2024 · 1 comment · Fixed by #10836
wbo4958 commented May 17, 2024

Bug Desc

I tried to use spark-rapids to read data from Iceberg, but it failed with the exception below regardless of whether spark.rapids.sql.format.iceberg.enabled is true or false.

24/05/17 10:50:40 ERROR GpuOverrideUtil: Encountered an exception applying GPU overrides java.lang.ClassCastException: org.apache.iceberg.BaseFileScanTask cannot be cast to org.apache.iceberg.CombinedScanTask
java.lang.ClassCastException: org.apache.iceberg.BaseFileScanTask cannot be cast to org.apache.iceberg.CombinedScanTask
	at com.nvidia.spark.rapids.iceberg.spark.source.GpuSparkBatchQueryScan.isMetadataScan(GpuSparkBatchQueryScan.java:92)
	at com.nvidia.spark.rapids.iceberg.IcebergProviderImpl$$anon$1.tagSelfForGpu(IcebergProviderImpl.scala:51)
	at com.nvidia.spark.rapids.RapidsMeta.tagForGpu(RapidsMeta.scala:318)
	at com.nvidia.spark.rapids.RapidsMeta.$anonfun$tagForGpu$1(RapidsMeta.scala:292)
	at com.nvidia.spark.rapids.RapidsMeta.$anonfun$tagForGpu$1$adapted(RapidsMeta.scala:292)

How to repro

  • spark local mode: spark 3.5.1
  • spark-rapids: 24.04
  • iceberg: iceberg-spark-runtime-3.5_2.12: 1.5.2

prepare data for iceberg

spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse

and execute the commands below

scala> spark.range(100).writeTo("local.db.demo").using("iceberg").create()
                                                                                
scala> spark.table("local.db.demo").show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
| 10|
| 11|
| 12|
| 13|
| 14|
| 15|
| 16|
| 17|
| 18|
| 19|
+---+
only showing top 20 rows

run with spark-rapids

$SPARK_HOME/bin/spark-shell \
     --master "local[1]" \
     --driver-memory 2G \
     --conf spark.plugins=com.nvidia.spark.SQLPlugin \
     --jars /home/bobwang/jars/rapids-4-spark_2.12-24.04.0.jar \
     --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \
     --conf spark.rapids.sql.enabled=true \
     --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
     --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
     --conf spark.sql.catalog.spark_catalog.type=hive \
     --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.local.type=hadoop \
     --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
     --conf spark.rapids.sql.format.iceberg.enabled=false

and execute

scala> spark.table("local.db.demo").show()
24/05/17 10:55:46 ERROR GpuOverrideUtil: Encountered an exception applying GPU overrides java.lang.ClassCastException: org.apache.iceberg.BaseFileScanTask cannot be cast to org.apache.iceberg.CombinedScanTask
java.lang.ClassCastException: org.apache.iceberg.BaseFileScanTask cannot be cast to org.apache.iceberg.CombinedScanTask
	at com.nvidia.spark.rapids.iceberg.spark.source.GpuSparkBatchQueryScan.isMetadataScan(GpuSparkBatchQueryScan.java:92)
	at com.nvidia.spark.rapids.iceberg.IcebergProviderImpl$$anon$1.tagSelfForGpu(IcebergProviderImpl.scala:51)
	at com.nvidia.spark.rapids.RapidsMeta.tagForGpu(RapidsMeta.scala:318)
	at com.nvidia.spark.rapids.RapidsMeta.$anonfun$tagForGpu$1(RapidsMeta.scala:292)
	at com.nvidia.spark.rapids.RapidsMeta.$anonfun$tagForGpu$1$adapted(RapidsMeta.scala:292)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at com.nvidia.spark.rapids.RapidsMeta.tagForGpu(RapidsMeta.scala:292)

wbo4958 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels May 17, 2024

firestarman commented

So far, we only support Iceberg v0.13.x. Can you try that version?
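
A quick way to spot this kind of mismatch before launching spark-shell is to look at the version at the end of the --packages coordinate. The snippet below is only a rough sketch: iceberg_support_status is a hypothetical helper, not part of spark-rapids or Iceberg, and the 0.13.x check simply encodes the statement above rather than any official compatibility matrix.

```shell
# Hypothetical helper (not shipped with spark-rapids or Iceberg): classify an
# Iceberg runtime Maven coordinate against the 0.13.x line mentioned above.
iceberg_support_status() {
  coord="$1"
  # Everything after the last ':' in group:artifact:version is the version.
  version="${coord##*:}"
  case "$version" in
    0.13|0.13.*) echo "supported" ;;
    *)           echo "unsupported" ;;
  esac
}

# The coordinate used in the repro above.
iceberg_support_status "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2"
```

For the 1.5.2 runtime used in the repro this prints "unsupported", which matches the failure reported here.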
