
Spark 3.2.0 java.lang.IncompatibleClassChangeError when using IcebergSparkSessionExtensions #3585

Closed
cccs-br opened this issue Nov 19, 2021 · 31 comments

Comments

@cccs-br

cccs-br commented Nov 19, 2021

Iceberg version: 0.12.0
Spark version: 3.2.0

from pyspark.sql import SparkSession

spark = (SparkSession.builder
            .appName("spark.tests.iceberg")
            .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            .getOrCreate())


spark.sql("select 'hello'").show()

...yields

: java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DynamicFileFilterWithCardinalityCheck has interface org.apache.spark.sql.catalyst.plans.logical.BinaryNode as super class
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.$anonfun$apply$8(IcebergSparkSessionExtensions.scala:50)
	at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildOptimizerRules$1(SparkSessionExtensions.scala:201)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
@cccs-br
Author

cccs-br commented Nov 19, 2021

If a fix for this is already on the roadmap, can you let us know which version of iceberg will address this?

Thanks.

@KnightChess
Contributor

KnightChess commented Nov 20, 2021

No released version supports this yet, but it is already supported on the master branch:
#3335

@felixYyu
Contributor

Try adding .config("spark.sql.sources.partitionOverwriteMode", "dynamic").

@racevedoo
Contributor

Try adding .config("spark.sql.sources.partitionOverwriteMode", "dynamic").

just tried it, got the same error

@Narcasserun

+1, I have the same problem.

@choupijiang

scalaVersion := "2.12.10"
sparkVersion := "3.2.0"
"org.apache.iceberg" % "iceberg-spark3-runtime_2.12" % "0.12.1"
+1, the same problem

@lidroider

+1
Same problem too
Any idea?
Thanks

@KnightChess
Contributor

There is no released version that supports Spark 3.2 yet.
You can build from the master branch to get Spark 3.2 support. #3335

@lidroider

I tried using Spark 3.1.2 and it worked, thanks a lot.

@KnightChess
Contributor

https://github.com/apache/iceberg#compatibility

@nreich

nreich commented Jan 24, 2022

Using Spark 3.2.0 and iceberg-spark-runtime-3.2_2.12 (0.13.0-SNAPSHOT, built locally from master today), I still see this issue (I did try adding .config("spark.sql.sources.partitionOverwriteMode", "dynamic"), with no luck). Is this issue still present in current master, is there more configuration required to resolve it, or could this be a problem with my local build?

@aokolnychyi
Contributor

aokolnychyi commented Jan 25, 2022

@nreich, could you please provide the full stack trace on 3.2.0 and master? As others have said, the Iceberg 0.12 extensions are not compatible with Spark 3.2, but master and the upcoming 0.13 should be.

Are you using PySpark?

@nreich

nreich commented Jan 25, 2022

@aokolnychyi I tested again today with Spark 3.2.0 and the Iceberg 0.13.0 release candidate (downloaded jars from repository.apache.org/content/repositories/orgapacheiceberg-1079/org/apache/iceberg/iceberg-spark3-runtime/0.13.0/). I tried to run through the "getting started" guide for spark-sql.
My spark-sql startup command (also tried with spark.sql.sources.partitionOverwriteMode=dynamic):

bin/spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.0  \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog  \
    --conf spark.sql.catalog.spark_catalog.type=hive  \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop  \
    --conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse

Running any query (for example, CREATE TABLE local.db.table (id bigint, data string) USING iceberg) resulted in this stack trace:

ERROR SparkSQLDriver: Failed in [CREATE TABLE local.db.table (id bigint, data string) USING iceberg
]
java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DynamicFileFilterWithCardinalityCheck has interface org.apache.spark.sql.catalyst.plans.logical.BinaryNode as super class
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
	at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
	at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
	at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.$anonfun$apply$8(IcebergSparkSessionExtensions.scala:50)
	at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildOptimizerRules$1(SparkSessionExtensions.scala:201)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.SparkSessionExtensions.buildOptimizerRules(SparkSessionExtensions.scala:201)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.customOperatorOptimizationRules(BaseSessionStateBuilder.scala:259)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$2.extendedOperatorOptimizationRules(BaseSessionStateBuilder.scala:248)
	at org.apache.spark.sql.catalyst.optimizer.Optimizer.defaultBatches(Optimizer.scala:130)
	at org.apache.spark.sql.execution.SparkOptimizer.defaultBatches(SparkOptimizer.scala:42)
	at org.apache.spark.sql.catalyst.optimizer.Optimizer.batches(Optimizer.scala:382)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:138)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:134)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:214)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:259)
	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:228)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

@rdblue
Contributor

rdblue commented Jan 25, 2022

@nreich, can you check the Jar you used? For Spark 3.2, you should be using the iceberg-spark-runtime-3.2_2.12 Jar, but the link you posted was to the iceberg-spark3-runtime Jar. That one works with 3.0 and 3.1, but we're moving to separate Jars for each Spark version.

@kbendick
Contributor

As of Iceberg 0.13.0, the iceberg-spark-runtime jars have changed to reflect the Spark (and Scala) versions:

You can test the 0.13.0-rc1, fetching it from the staging maven repository, with the following command line flags for Spark 3.2: --packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/

For Spark versions other than 3.2, use the artifactIds below (in place of iceberg-spark-runtime-3.2_2.12 above).

Iceberg 0.13.0 spark-runtime jar names
Spark 3.0: iceberg-spark3-runtime:0.13.0
Spark 3.1: iceberg-spark-runtime-3.1_2.12
Spark 3.2: iceberg-spark-runtime-3.2_2.12

The complete package name now depends on your Spark version; iceberg-spark3-runtime should only be used for Spark 3.0.

@nreich

nreich commented Jan 26, 2022

@rdblue @kbendick I had actually tried that jar first and got the same exact result, so I looked at the getting-started docs on master and found they still referred to the old jar (I must have missed the correct location for the updated getting-started instructions?).

I cleared dependency caches (just in case), and ran again with:

bin/spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0 \
    --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/ \
    ...

but still got the exact same stack trace as before. Just as a sanity check, I switched over to Spark 3.1.2 and the org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:0.13.0 jar, and that was able to create the table successfully.

@kbendick
Contributor

Thank you for testing this @nreich 🙏 .

I have tested with Spark 3.1.2 and Spark 3.2.0, both built for Hadoop 3.2, and I don't get this problem, at least on the CREATE TABLE statement.

cd spark-3.2.0-bin-hadoop3.2 && rm -rf /tmp/iceberg && mkdir -p /tmp/iceberg/warehouse && ./bin/spark-shell \
    --packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' \
    --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/ \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse

scala> spark.sql("use local")

scala> spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show tables in local.db").show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|       db|    table|      false|
+---------+---------+-----------+

I also tried with Spark 3.1.2, as well as with a partitioned table and I did not encounter any exceptions.

@kbendick
Contributor

kbendick commented Jan 26, 2022

I also used a partitioned table to test dynamic partition overwrite in Spark 3.2.

Bash startup script from spark-3.2.0-bin-hadoop3.2:

mkdir -p /tmp/iceberg/warehouse && ./bin/spark-shell \
--packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' \
--repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/  \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog  \
--conf spark.sql.catalog.local.type=hadoop \
--conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse \
--conf spark.sql.sources.partitionOverwriteMode=dynamic

Spark Shell:

scala> spark.sql("CREATE TABLE local.db.table_partitioned (id bigint, data string) USING iceberg partitioned by (id)")
res6: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show tables in local.db").show
+---------+-----------------+-----------+
|namespace|        tableName|isTemporary|
+---------+-----------------+-----------+
|       db|table_partitioned|      false|
|       db|            table|      false|
+---------+-----------------+-----------+

scala> spark.sql("INSERT INTO local.db.table_partitioned(id, data) VALUES (1, 'Hank'), (2, 'Kyle'), (3, 'Jethro'), (4, 'Russell'), (5, 'Maggie')")

scala> spark.sql("select * from local.db.table_partitioned").show
+---+-------+
| id|   data|
+---+-------+
|  1|   Hank|
|  2|   Kyle|
|  3| Jethro|
|  4|Russell|
|  5| Maggie|
+---+-------+

scala> spark.createDataFrame(Seq((3, "Burt"))).toDF("id", "data").write.mode("overwrite").insertInto("local.db.table_partitioned")

scala> spark.sql("select * from local.db.table_partitioned order by id").show
+---+-------+
| id|   data|
+---+-------+
|  1|   Hank|
|  2|   Kyle|
|  3|   Burt|
|  4|Russell|
|  5| Maggie|
+---+-------+

@nreich

nreich commented Jan 26, 2022

Tried the spark-shell just in case (following exactly what you did) and got the same error. Looks like I have a different Hadoop version:
from the spark-shell I ran println("Hadoop version: " + org.apache.hadoop.util.VersionInfo.getVersion())
which printed Hadoop version: 3.3.1

@kbendick
Contributor

kbendick commented Jan 26, 2022

That's a bit of a red herring. I have the same output; surprisingly enough, that's what the hadoop3.2 build ships with, and it is expected.

scala>  println("Hadoop version: " + org.apache.hadoop.util.VersionInfo.getVersion())
Hadoop version: 3.3.1

@kbendick
Contributor

Just as a sanity check, does your spark-shell ASCII art print version 3.2.0?

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

@nreich

nreich commented Jan 26, 2022

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.12)

Other possibly relevant info:
running on macOS Catalina (10.15.7)
Spark 3.2.0 was installed through Homebrew

@rdblue
Contributor

rdblue commented Jan 26, 2022

@nreich, maybe additional Jars are sneaking into your classpath. Can you dump the classpath in the Spark UI and share it? Or check for more than one Iceberg Jar?
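
As a rough way to check that, here is a small hypothetical sketch in plain Python (it assumes a standard SPARK_HOME layout; adjust the path if your Spark install lives elsewhere) that lists any Iceberg jars sitting in Spark's jars directory:

# Hypothetical check for stray Iceberg jars in the Spark installation.
# Assumes SPARK_HOME points at the Spark distribution being used.
import glob
import os

spark_home = os.environ.get("SPARK_HOME", ".")
iceberg_jars = glob.glob(os.path.join(spark_home, "jars", "*iceberg*"))

for jar in iceberg_jars:
    print(jar)

# More than one Iceberg runtime jar here (for example an old iceberg-spark3-runtime
# next to iceberg-spark-runtime-3.2_2.12) is exactly the kind of classpath clash
# suspected in this thread.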

@nreich

nreich commented Jan 26, 2022

I downloaded a fresh copy of Spark 3.2.0 (https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz) and everything now works as expected.

@nreich

nreich commented Jan 26, 2022

That was the issue @rdblue: I had an errant copy of the old iceberg-spark3-runtime-0.12.1.jar on the classpath. After removing it, my original installation of spark 3.2.0 now functions as expected with iceberg 0.13.0.

@rdblue
Contributor

rdblue commented Jan 26, 2022

Glad to hear it's working! Thanks for working with us to debug!

@nreich

nreich commented Jan 26, 2022

Thanks for all your help!

@kbendick
Contributor

kbendick commented Jan 26, 2022

Thank you so much @nreich for working with us to debug this. This is a very important part of the release process and it really helps to have community members testing things out.

I'm going to close this issue in a bit if there are no more comments. Anybody please feel free to open a new issue referencing this one if need be!

TL;DR: be sure to use the correct Iceberg artifact for your Spark version, and make sure there aren't extra Iceberg jars in your Spark /jars folder (I've definitely done that more than once).

@kbendick
Contributor

If you're encountering this issue, please be sure that you're using the correct artifact for your Spark version, and that you don't have any additional iceberg jars on the classpath (likely in spark's /jars directory).

If you still have an issue, please open another issue (feel free to reference this one). Thank you!

@cccs-br
Author

cccs-br commented Feb 4, 2022

FYI, since the official release of 0.13, the artifact is available with these coordinates...

Spark: 3.2.0
Iceberg: 0.13

sbt:

// https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-runtime
libraryDependencies += "org.apache.iceberg" %% "iceberg-spark-runtime-3.2" % "0.13.0"

Maven:

<!-- https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-runtime-3.2 -->
<dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-spark-runtime-3.2_2.12</artifactId>
    <version>0.13.0</version>
</dependency>
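
For PySpark users (as in the original reproduction above), the same coordinates can be pulled at session startup via spark.jars.packages. A minimal sketch, with the catalog settings borrowed from the spark-sql examples earlier in this thread (the "local" catalog name and warehouse path are just examples):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
            .appName("spark.tests.iceberg")
            # Spark 3.2 / Scala 2.12 runtime artifact from Iceberg 0.13.0
            .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0")
            .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            # example Hadoop catalog, as in the spark-sql commands above
            .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.local.type", "hadoop")
            .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg/warehouse")
            .getOrCreate())

spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
spark.sql("select 'hello'").show()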

@Ashish1997

(Quoting @nreich's comment above: tried org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0 with Spark 3.2.0 and still hit the same stack trace; switching to Spark 3.1.2 with the org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:0.13.0 jar created the table successfully.)

Thanks, Spark 3.1.3-bin-hadoop3.2 with iceberg-spark-runtime-3.1_2.12:0.13.0 jar also worked for me.
