Spark 3.2.0 java.lang.IncompatibleClassChangeError when using IcebergSparkSessionExtensions #3585
If a fix for this is already on the roadmap, can you let us know which version of iceberg will address this? Thanks. |
No released version supports it yet, but it is supported on the master branch |
Try adding `.config("spark.sql.sources.partitionOverwriteMode", "dynamic")` |
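For context, the same flag can also be set at launch time rather than in the session builder. This is only a config fragment illustrating the suggestion above (the flag name and value are taken from this thread; whether it resolves the error is disputed below):

```
# Config fragment only: enable dynamic partition overwrite when launching spark-shell.
./bin/spark-shell \
  --conf spark.sql.sources.partitionOverwriteMode=dynamic
```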
just tried it, got the same error |
+1, I have the same problem |
scalaVersion := "2.12.10" |
+1 |
There is no released version that supports Spark 3.2 |
I tried using Spark version 3.1.2 and it worked, thanks a lot |
Using Spark 3.2.0 and iceberg-spark-runtime-3.2_2.12 (0.13.0-SNAPSHOT, built locally from master today), I still see this issue (I did try adding .config("spark.sql.sources.partitionOverwriteMode", "dynamic"), with no luck). Is this issue still present in current master, is there more configuration required to resolve it, or is it possibly an issue with my local build? |
@nreich, could you, please, provide the full stack trace on 3.2.0 and master? Like others said, Iceberg 0.12 extensions are not compatible with Spark 3.2 but the master and upcoming 0.13 should be. Are you using PySpark? |
@aokolnychyi I tested again today with spark 3.2.0 and iceberg 0.13.0 release candidate (downloaded jars from repository.apache.org/content/repositories/orgapacheiceberg-1079/org/apache/iceberg/iceberg-spark3-runtime/0.13.0/). Tried to run through the "getting started" guide for spark-sql.
Running any query (but for example,
|
@nreich, can you check the Jar you used? For Spark 3.2, you should be using the |
As of Iceberg 0.13.0, the … You can test the 0.13.0-rc1, fetching it from the staging Maven repository, with the following command-line flags for Spark 3.2. For Spark versions other than 3.2, use the artifactIds below (in place of the Iceberg 0.13.0 spark-runtime jar names): the complete artifact name now depends on your Spark version. |
@rdblue @kbendick I had actually tried that jar first and got the exact same result, so I looked at the docs on master for getting started and found they still referred to the old jar (I must have missed the correct location for the updated getting-started instructions?). I cleared dependency caches (just in case) and ran again with:
but still got the exact same stacktrace as before. Just as a sanity check, I switched over to spark 3.1.2 and the org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:0.13.0 jar and that was able to create the table successfully. |
Thank you for testing this @nreich 🙏 . I have tested with Spark 3.1.2 and Spark 3.2.0, both built for Hadoop 3.2, and I don't seem to get this problem. At least on the …

```bash
cd spark-3.2.0-bin-hadoop3.2 && rm -rf /tmp/iceberg && mkdir -p /tmp/iceberg/warehouse && \
./bin/spark-shell \
  --packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' \
  --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/ \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hadoop \
  --conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse
```

```scala
scala> spark.sql("use local")

scala> spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show tables in local.db").show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|       db|    table|      false|
+---------+---------+-----------+
```

I also tried with Spark 3.1.2, as well as with a partitioned table, and I did not encounter any exceptions. |
I also used a partitioned table to test dynamic partition overwrite in Spark 3.2.

Bash start-up script from spark-3.2.0-bin-hadoop3.2:

```bash
mkdir -p /tmp/iceberg/warehouse && ./bin/spark-shell \
  --packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' \
  --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/ \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hadoop \
  --conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse \
  --conf spark.sql.sources.partitionOverwriteMode=dynamic
```

Spark shell:

```scala
scala> spark.sql("CREATE TABLE local.db.table_partitioned (id bigint, data string) USING iceberg partitioned by (id)")
res6: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show tables in local.db").show
+---------+-----------------+-----------+
|namespace|        tableName|isTemporary|
+---------+-----------------+-----------+
|       db|table_partitioned|      false|
|       db|            table|      false|
+---------+-----------------+-----------+

scala> spark.sql("INSERT INTO local.db.table_partitioned(id, data) VALUES (1, 'Hank'), (2, 'Kyle'), (3, 'Jethro'), (4, 'Russell'), (5, 'Maggie')")

scala> spark.sql("select * from local.db.table_partitioned").show
+---+-------+
| id|   data|
+---+-------+
|  1|   Hank|
|  2|   Kyle|
|  3| Jethro|
|  4|Russell|
|  5| Maggie|
+---+-------+

scala> spark.createDataFrame(Seq((3, "Burt"))).toDF("id", "data").write.mode("overwrite").insertInto("local.db.table_partitioned")

scala> spark.sql("select * from local.db.table_partitioned order by id").show
+---+-------+
| id|   data|
+---+-------+
|  1|   Hank|
|  2|   Kyle|
|  3|   Burt|
|  4|Russell|
|  5| Maggie|
+---+-------+
```
|
Tried the spark-shell just in case (following exactly what you did) and got the same error. Looks like I have a different hadoop version: |
That's a bit of a red herring. I have the same output; that's what's used for hadoop3.2, surprisingly enough, and is expected.
|
Just as a sanity check, does your
|
Other possibly relevant info: |
@nreich, maybe additional Jars are sneaking into your classpath. Can you dump the classpath in the Spark UI and share it? Or check for more than one Iceberg Jar? |
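One quick way to check for the stray-jar situation described above, sketched here under the assumption that your Spark install reads every jar in `$SPARK_HOME/jars` (the default path `/opt/spark` below is only an illustration):

```shell
# Sketch: list Iceberg runtime jars visible to Spark; more than one
# matching jar usually indicates a conflicting or stale install.
check_iceberg_jars() {
  ls "$1" 2>/dev/null | grep -i 'iceberg.*runtime' || true
}

# SPARK_HOME defaulting to /opt/spark is an assumption for illustration.
check_iceberg_jars "${SPARK_HOME:-/opt/spark}/jars"
```

If this prints both an old `iceberg-spark3-runtime` jar and a new `iceberg-spark-runtime-3.2_2.12` jar, remove the stale one before retrying.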
Downloaded a fresh copy of spark 3.2.0 (https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz): everything now works as expected |
That was the issue @rdblue: I had an errant copy of the old |
Glad to hear it's working! Thanks for working with us to debug! |
Thanks for all your help! |
Thank you so much @nreich for working with us to debug this. This is a very important part of the release process, and it really helps to have community members testing things out. I'm going to close this issue in a bit if there are no more comments. Please feel free to open a new issue referencing this one if need be! TLDR - Be sure to have the correct Iceberg artifact for your Spark version, as well as to ensure there aren't extra |
If you're encountering this issue, please be sure that you're using the correct artifact for your Spark version, and that you don't have any additional … If you still have an issue, please open another one (feel free to reference this issue). Thank you! |
FYI, since the official release of 0.13, the artifact is available with these coordinates... Spark: 3.2.0 sbt:
maven
|
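The sbt and maven snippets above appear to have been elided in this copy. Based on the coordinates quoted elsewhere in this thread (`org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0`), they presumably looked like the following; both are assumed reconstructions, not the original comment:

```
// sbt (assumed form, reconstructed from the coordinates quoted in this thread)
libraryDependencies += "org.apache.iceberg" % "iceberg-spark-runtime-3.2_2.12" % "0.13.0"
```

```
<!-- maven (assumed form, reconstructed from the coordinates quoted in this thread) -->
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-spark-runtime-3.2_2.12</artifactId>
  <version>0.13.0</version>
</dependency>
```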
Thanks, Spark 3.1.3-bin-hadoop3.2 with iceberg-spark-runtime-3.1_2.12:0.13.0 jar also worked for me. |
Iceberg version: 0.12.0
Spark version: 3.2.0
...yields