
Spark 3.2.0 java.lang.IncompatibleClassChangeError when using IcebergSparkSessionExtensions #3585

Closed
cccs-br opened this issue Nov 19, 2021 · 31 comments

Comments

@cccs-br

cccs-br commented Nov 19, 2021

Iceberg version: 0.12.0
Spark version: 3.2.0

from pyspark.sql import SparkSession

spark = (SparkSession.builder
            .appName("spark.tests.iceberg")
            .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            .getOrCreate())


spark.sql("select 'hello'").show()

...yields

: java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DynamicFileFilterWithCardinalityCheck has interface org.apache.spark.sql.catalyst.plans.logical.BinaryNode as super class
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.$anonfun$apply$8(IcebergSparkSessionExtensions.scala:50)
	at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildOptimizerRules$1(SparkSessionExtensions.scala:201)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
@cccs-br
Author

cccs-br commented Nov 19, 2021

If a fix for this is already on the roadmap, can you let us know which version of iceberg will address this?

Thanks.

@KnightChess
Contributor

KnightChess commented Nov 20, 2021

No released version supports this yet, but it is already supported on the master branch:
#3335

@felixYyu
Contributor

Try adding .config("spark.sql.sources.partitionOverwriteMode", "dynamic").

@racevedoo
Contributor

Try adding .config("spark.sql.sources.partitionOverwriteMode", "dynamic").

just tried it, got the same error

@Narcasserun

+1, I have the same problem.

@choupijiang

scalaVersion := "2.12.10"
sparkVersion := "3.2.0"
"org.apache.iceberg" % "iceberg-spark3-runtime_2.12" % "0.12.1"
+1, the same problem

@lidroider

+1
Same problem too
Any idea?
Thanks

@KnightChess
Contributor

There is no released version that supports Spark 3.2 yet.
You can build from the master branch to get Spark 3.2 support. #3335

@lidroider

I tried using Spark 3.1.2 and it worked, thanks a lot.

@KnightChess
Contributor

https://github.com/apache/iceberg#compatibility

@nreich

nreich commented Jan 24, 2022

Using Spark 3.2.0 and iceberg-spark-runtime-3.2_2.12 (0.13.0-SNAPSHOT, built locally from master today), I still see this issue (I did try adding .config("spark.sql.sources.partitionOverwriteMode", "dynamic"), with no luck). Is this issue still present in current master, is there more configuration required to resolve it, or could this be a problem with my local build?

@aokolnychyi
Contributor

aokolnychyi commented Jan 25, 2022

@nreich, could you please provide the full stack trace on 3.2.0 and master? As others have said, the Iceberg 0.12 extensions are not compatible with Spark 3.2, but master and the upcoming 0.13 should be.

Are you using PySpark?

@nreich

nreich commented Jan 25, 2022

@aokolnychyi I tested again today with Spark 3.2.0 and the Iceberg 0.13.0 release candidate (downloaded jars from repository.apache.org/content/repositories/orgapacheiceberg-1079/org/apache/iceberg/iceberg-spark3-runtime/0.13.0/). I tried to run through the "getting started" guide for spark-sql.
My spark-sql startup command (also tried with spark.sql.sources.partitionOverwriteMode=dynamic):

bin/spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.0  \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog  \
    --conf spark.sql.catalog.spark_catalog.type=hive  \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop  \
    --conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse

Running any query (for example, CREATE TABLE local.db.table (id bigint, data string) USING iceberg) resulted in this stack trace:

ERROR SparkSQLDriver: Failed in [CREATE TABLE local.db.table (id bigint, data string) USING iceberg
]
java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DynamicFileFilterWithCardinalityCheck has interface org.apache.spark.sql.catalyst.plans.logical.BinaryNode as super class
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
	at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
	at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
	at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.$anonfun$apply$8(IcebergSparkSessionExtensions.scala:50)
	at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildOptimizerRules$1(SparkSessionExtensions.scala:201)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.SparkSessionExtensions.buildOptimizerRules(SparkSessionExtensions.scala:201)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.customOperatorOptimizationRules(BaseSessionStateBuilder.scala:259)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$2.extendedOperatorOptimizationRules(BaseSessionStateBuilder.scala:248)
	at org.apache.spark.sql.catalyst.optimizer.Optimizer.defaultBatches(Optimizer.scala:130)
	at org.apache.spark.sql.execution.SparkOptimizer.defaultBatches(SparkOptimizer.scala:42)
	at org.apache.spark.sql.catalyst.optimizer.Optimizer.batches(Optimizer.scala:382)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:138)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:134)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:214)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:259)
	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:228)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

@rdblue
Contributor

rdblue commented Jan 25, 2022

@nreich, can you check the Jar you used? For Spark 3.2, you should be using the iceberg-spark-runtime-3.2_2.12 Jar, but the link you posted was to the iceberg-spark3-runtime Jar. That one works with 3.0 and 3.1, but we're moving to separate Jars for each Spark version.

@kbendick
Contributor

As of Iceberg 0.13.0, the iceberg-spark-runtime jars have changed to reflect the Spark (and Scala) versions:

You can test the 0.13.0-rc1, fetching it from the staging maven repository, with the following command line flags for Spark 3.2: --packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/

For Spark versions other than 3.2, use the artifactIds below (in place of iceberg-spark-runtime-3.2_2.12 above).

Iceberg 0.13.0 spark-runtime jar names
Spark 3.0: iceberg-spark3-runtime:0.13.0
Spark 3.1: iceberg-spark-runtime-3.1_2.12
Spark 3.2: iceberg-spark-runtime-3.2_2.12

The complete package name now depends on your Spark version; iceberg-spark3-runtime should only be used for Spark 3.0.

@nreich

nreich commented Jan 26, 2022

@rdblue @kbendick I had actually tried that jar first and got the same exact result, so I looked at the getting-started docs on master and found they still referred to the old jar (I must have missed the correct location for the updated getting-started instructions?).

I cleared dependency caches (just in case), and ran again with:

bin/spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0 \
    --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/ \
    ...

but still got the exact same stack trace as before. Just as a sanity check, I switched over to Spark 3.1.2 and the org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:0.13.0 jar, and that was able to create the table successfully.

@kbendick
Contributor

Thank you for testing this @nreich 🙏 .

I have tested with Spark 3.1.2 and Spark 3.2.0, both built for Hadoop 3.2, and I don't get this problem, at least on the CREATE TABLE statement.

cd spark-3.2.0-bin-hadoop3.2 && rm -rf /tmp/iceberg && mkdir -p /tmp/iceberg/warehouse && ./bin/spark-shell \
    --packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' \
    --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/ \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse

scala> spark.sql("use local")

scala> spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show tables in local.db").show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|       db|    table|      false|
+---------+---------+-----------+

I also tried with Spark 3.1.2, as well as with a partitioned table and I did not encounter any exceptions.

@kbendick
Contributor

kbendick commented Jan 26, 2022

I also used a partitioned table to test dynamic partition overwrite in Spark 3.2.

Bash startup script from spark-3.2.0-bin-hadoop3.2:

mkdir -p /tmp/iceberg/warehouse && ./bin/spark-shell \
--packages 'org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0' \
--repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1079/  \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog  \
--conf spark.sql.catalog.local.type=hadoop \
--conf spark.sql.catalog.local.warehouse=/tmp/iceberg/warehouse \
--conf spark.sql.sources.partitionOverwriteMode=dynamic

Spark Shell:

scala> spark.sql("CREATE TABLE local.db.table_partitioned (id bigint, data string) USING iceberg partitioned by (id)")
res6: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show tables in local.db").show
+---------+-----------------+-----------+
|namespace|        tableName|isTemporary|
+---------+-----------------+-----------+
|       db|table_partitioned|      false|
|       db|            table|      false|
+---------+-----------------+-----------+

scala> spark.sql("INSERT INTO local.db.table_partitioned(id, data) VALUES (1, 'Hank'), (2, 'Kyle'), (3, 'Jethro'), (4, 'Russell'), (5, 'Maggie')")

scala> spark.sql("select * from local.db.table_partitioned").show
+---+-------+
| id|   data|
+---+-------+
|  1|   Hank|
|  2|   Kyle|
|  3| Jethro|
|  4|Russell|
|  5| Maggie|
+---+-------+

scala> spark.createDataFrame(Seq((3, "Burt"))).toDF("id", "data").write.mode("overwrite").insertInto("local.db.table_partitioned")

scala> spark.sql("select * from local.db.table_partitioned order by id").show
+---+-------+
| id|   data|
+---+-------+
|  1|   Hank|
|  2|   Kyle|
|  3|   Burt|
|  4|Russell|
|  5| Maggie|
+---+-------+

@nreich

nreich commented Jan 26, 2022

Tried the spark-shell just in case (following exactly what you did) and got the same error. Looks like I have a different Hadoop version:
from the spark-shell I ran println("Hadoop version: " + org.apache.hadoop.util.VersionInfo.getVersion())
which printed Hadoop version: 3.3.1

@kbendick
Contributor

kbendick commented Jan 26, 2022

That's a bit of a red herring. I have the same output; surprisingly enough, that's what the hadoop3.2 build ships with, and it is expected.

scala>  println("Hadoop version: " + org.apache.hadoop.util.VersionInfo.getVersion())
Hadoop version: 3.3.1

@kbendick
Contributor

Just as a sanity check, does your spark-shell ASCII art print version 3.2.0?

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

@nreich

nreich commented Jan 26, 2022

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.12)

Other possibly relevant info:
running on macOS Catalina (10.15.7)
Spark 3.2.0 was installed through Homebrew

@rdblue
Contributor

rdblue commented Jan 26, 2022

@nreich, maybe additional Jars are sneaking into your classpath. Can you dump the classpath in the Spark UI and share it? Or check for more than one Iceberg Jar?
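
As a rough way to check that, here is a small hypothetical sketch in plain Python (it assumes a standard SPARK_HOME layout; adjust the path if your Spark install lives elsewhere) that lists any Iceberg jars sitting in Spark's jars directory:

# Hypothetical check for stray Iceberg jars in the Spark installation.
# Assumes SPARK_HOME points at the Spark distribution being used.
import glob
import os

spark_home = os.environ.get("SPARK_HOME", ".")
iceberg_jars = glob.glob(os.path.join(spark_home, "jars", "*iceberg*"))

for jar in iceberg_jars:
    print(jar)

# More than one Iceberg runtime jar here (for example an old iceberg-spark3-runtime
# next to iceberg-spark-runtime-3.2_2.12) is exactly the kind of classpath clash
# suspected in this thread.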

@nreich

nreich commented Jan 26, 2022

I downloaded a fresh copy of Spark 3.2.0 (https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz) and everything now works as expected.

@nreich

nreich commented Jan 26, 2022

That was the issue @rdblue: I had an errant copy of the old iceberg-spark3-runtime-0.12.1.jar on the classpath. After removing it, my original installation of spark 3.2.0 now functions as expected with iceberg 0.13.0.

@rdblue
Contributor

rdblue commented Jan 26, 2022

Glad to hear it's working! Thanks for working with us to debug!

@nreich

nreich commented Jan 26, 2022

Thanks for all your help!

@kbendick
Contributor

kbendick commented Jan 26, 2022

Thank you so much @nreich for working with us to debug this. This is a very important part of the release process and it really helps to have community members testing things out.

I'm going to close this issue in a bit if there are no more comments. Anybody please feel free to open a new issue referencing this one if need be!

TL;DR: be sure to use the correct Iceberg artifact for your Spark version, and make sure there aren't extra Iceberg jars in your Spark /jars folder (I've definitely done that more than once).

@kbendick
Contributor

If you're encountering this issue, please be sure that you're using the correct artifact for your Spark version, and that you don't have any additional iceberg jars on the classpath (likely in spark's /jars directory).

If you still have an issue, please open another issue (feel free to reference this one). Thank you!

@cccs-br
Author

cccs-br commented Feb 4, 2022

FYI, since the official release of 0.13, the artifact is available with these coordinates...

Spark: 3.2.0
Iceberg: 0.13

sbt:

// https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-runtime
libraryDependencies += "org.apache.iceberg" %% "iceberg-spark-runtime-3.2" % "0.13.0"

Maven:

<!-- https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-runtime-3.2 -->
<dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-spark-runtime-3.2_2.12</artifactId>
    <version>0.13.0</version>
</dependency>
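
For PySpark users (as in the original reproduction above), the same coordinates can be pulled at session startup via spark.jars.packages. A minimal sketch, with the catalog settings borrowed from the spark-sql examples earlier in this thread (the "local" catalog name and warehouse path are just examples):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
            .appName("spark.tests.iceberg")
            # Spark 3.2 / Scala 2.12 runtime artifact from Iceberg 0.13.0
            .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0")
            .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            # example Hadoop catalog, as in the spark-sql commands above
            .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.local.type", "hadoop")
            .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg/warehouse")
            .getOrCreate())

spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
spark.sql("select 'hello'").show()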

@Ashish1997

(Quoting @nreich's comment above: tried org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0 with Spark 3.2.0 and still hit the same stack trace; switching to Spark 3.1.2 with the org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:0.13.0 jar created the table successfully.)

Thanks, Spark 3.1.3-bin-hadoop3.2 with iceberg-spark-runtime-3.1_2.12:0.13.0 jar also worked for me.
