1.4.0 causes exception because of wrong HDFS version. #445

Description

@matb

Expected behavior

When running Spark jobs from a fat jar with Spark 3.0.0-rc2, Hadoop 3.2.0, and GeoSpark 1.4.0, the jobs should run normally.

Actual behavior

The job starts and fails with:

Exception in thread "main" java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3217)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3262)
        at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.<init>(FsUrlStreamHandlerFactory.java:77)
        at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.<init>(FsUrlStreamHandlerFactory.java:70)
        at org.apache.spark.sql.internal.SharedState$.liftedTree1$1(SharedState.scala:203)
        at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$setFsUrlStreamHandlerFactory(SharedState.scala:202)
        at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:54)
        at org.apache.spark.sql.SparkSession.$anonfun$sharedState$1(SparkSession.scala:131)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:131)
        at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:130)
        at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:309)
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1051)
        at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:156)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:154)
        at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:151)
        at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:804)
        at org.apache.spark.sql.SparkSession.read(SparkSession.scala:620)
        at ncpipeline.ImporterJob.main(ImporterJob.java:75)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The cause is a dependency on hadoop-hdfs that is missing the provided scope, so an incompatible hadoop-hdfs version ends up in the fat jar.
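
Until the scope is fixed upstream, a possible workaround on the consuming side is to exclude the transitive hadoop-hdfs artifact in the project's own pom.xml. This is only a sketch: the GeoSpark coordinates below (org.datasyslab:geospark) are an assumption and should be replaced by whatever the project already declares. Only the org.apache.hadoop:hadoop-hdfs coordinates are taken as given.

```xml
<!-- Assumed GeoSpark coordinates; copy the real ones from your existing pom.xml -->
<dependency>
  <groupId>org.datasyslab</groupId>
  <artifactId>geospark</artifactId>
  <version>1.4.0</version>
  <exclusions>
    <!-- Keep the transitive hadoop-hdfs out of the shaded jar so the
         cluster's own Hadoop 3.2.0 classes are used at runtime -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree` before and after the change should show whether hadoop-hdfs is still pulled in through another path.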

Steps to reproduce the problem

  • Create a Maven project using the Spark 3.0.0-rc2 and Hadoop 3.2.0 dependencies.
  • Add the Maven Shade plugin.
  • Build the jar and submit a Spark job.
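
After building the jar, one way to confirm where the conflicting classes come from is to ask the class loader which locations provide the class files named in the stack trace. The following is a minimal sketch using only the JDK. The class name `ClasspathProbe` and the helper `locate` are hypothetical names introduced here for illustration, not part of Spark, Hadoop, or GeoSpark.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of any library mentioned in this issue.
public class ClasspathProbe {

    /** Returns every classpath location that provides the .class file for className. */
    static List<URL> locate(String className) throws IOException {
        // Class files are stored as resources: dots become slashes, '$' stays as-is.
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Defaults are the two classes from the IllegalAccessError above.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.hdfs.web.HftpFileSystem",
            "org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator"
        };
        for (String name : names) {
            List<URL> hits = locate(name);
            System.out.println(name + " found in " + hits.size() + " location(s)");
            for (URL url : hits) {
                System.out.println("  " + url);
            }
        }
    }
}
```

Run it with the fat jar and the Hadoop jars on the classpath. If the two classes resolve to different jars, or one of them resolves to the shaded jar rather than to the Hadoop 3.2.0 installation, that points to a bundled hadoop-hdfs as the source of the conflict.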

Settings

GeoSpark version = 1.4.0

Apache Spark version = 3.0.0

JRE version = 1.8

API type = Java
