Expected behavior
When running Spark jobs from a fat jar built with Spark 3.0.0-rc2, Hadoop 3.2.0, and GeoSpark 1.4.0, the jobs should run normally.
Actual behavior
The job starts and fails with:
Exception in thread "main" java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3217)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3262)
at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.<init>(FsUrlStreamHandlerFactory.java:77)
at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.<init>(FsUrlStreamHandlerFactory.java:70)
at org.apache.spark.sql.internal.SharedState$.liftedTree1$1(SharedState.scala:203)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$setFsUrlStreamHandlerFactory(SharedState.scala:202)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:54)
at org.apache.spark.sql.SparkSession.$anonfun$sharedState$1(SparkSession.scala:131)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:131)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:130)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:309)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1051)
at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:156)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:154)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:151)
at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:804)
at org.apache.spark.sql.SparkSession.read(SparkSession.scala:620)
at ncpipeline.ImporterJob.main(ImporterJob.java:75)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The cause of the problem is a dependency on hadoop-hdfs that is missing the provided scope, so hadoop-hdfs ends up bundled in the fat jar.
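One way to check where the two classes named in the error come from is to ask the class loader for the location of their class files. The following is a minimal sketch, not part of Spark, Hadoop, or GeoSpark: the class name `ClassOrigin` and the method `locate` are hypothetical helpers made up for illustration. It looks the class up as a resource instead of loading it, so it should not trigger the same IllegalAccessError.

```java
import java.net.URL;

public class ClassOrigin {

    /**
     * Returns the URL of the .class file for the given class name as seen by
     * the current class loader, or null if no such class file is visible.
     * The class itself is not loaded or initialized.
     */
    static URL locate(String className) {
        // Binary class name -> resource path, e.g. a.b.C$D -> a/b/C$D.class
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resource);
    }

    public static void main(String[] args) {
        // The two class names are taken from the exception message above.
        String[] names = {
            "org.apache.hadoop.hdfs.web.HftpFileSystem",
            "org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator"
        };
        for (String name : names) {
            URL origin = locate(name);
            System.out.println(name + " -> " + (origin == null ? "not on classpath" : origin));
        }
    }
}
```

If the two classes resolve to different jars (for example, one inside the fat jar and one in the Hadoop jars shipped with the Spark distribution), that would be consistent with hadoop-hdfs being bundled when it should have been provided.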
Steps to reproduce the problem
- Create a Maven project using Spark 3.0.0-rc2 and Hadoop 3.2.0 dependencies.
- Add the Maven Shade Plugin.
- Build the jar and submit a Spark job.
Settings
GeoSpark version = 1.4.0
Apache Spark version = 3.0.0
JRE version = 1.8
API type = Java