
org.apache.spark.sql.delta.sources.DeltaDataSource could not be instantiated #6

Closed
saj1th opened this issue Apr 24, 2019 · 4 comments


@saj1th

commented Apr 24, 2019

Trying to run

./spark-shell --packages io.delta:delta-core_2.11:0.1.0

and then

val df = spark.read.format("delta").load(deltaPath)

This error gets thrown:

java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.delta.sources.DeltaDataSource could not be instantiated
  at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:581)
  at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:803)
  at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:721)
  at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1394)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
  at scala.collection.Iterator.foreach(Iterator.scala:941)
  at scala.collection.Iterator.foreach$(Iterator.scala:941)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
  at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
  at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
  at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
  at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
  at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  ... 49 elided
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
  at org.apache.spark.sql.delta.sources.DeltaDataSource.<init>(DeltaDataSource.scala:42)

Environment

Scala 2.11.12
Spark 2.4.2
@zsxwing

Member

commented Apr 24, 2019

This error is because your Spark is built with Scala 2.12 but the delta-core jar you are using is built with Scala 2.11. If you use the Scala 2.12 version of delta-core like this, it should work:

./spark-shell --packages io.delta:delta-core_2.12:0.1.0
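
If you want to double-check which Scala version your Spark build uses, here is a minimal check (run inside spark-shell; the shell runs on the same Scala version Spark was compiled against):

// Run inside spark-shell
println(scala.util.Properties.versionNumberString) // Scala version of this Spark build, e.g. "2.12.8" -> use delta-core_2.12
println(org.apache.spark.SPARK_VERSION)            // Spark version, e.g. "2.4.2"
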
@saj1th

Author

commented Apr 24, 2019

Cool! Thanks!

@saj1th saj1th closed this Apr 24, 2019
@tdas

Contributor

commented Apr 24, 2019

For more info, you can verify which version of Scala Spark is running with by looking at the startup banner of the Spark/PySpark shell.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/

Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)

Note the Scala version here ^^^, whether it is 2.11 or 2.12. You should use the matching version of delta-core: either --packages io.delta:delta-core_2.11:0.1.0 (for Scala 2.11) or --packages io.delta:delta-core_2.12:0.1.0 (for Scala 2.12).
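
If it helps, the matching artifact coordinate can also be derived from the running shell's Scala version; a small sketch for spark-shell (0.1.0 being the Delta release discussed in this thread):

// Derive the delta-core artifact matching this Spark build's Scala binary version
val scalaBinary = scala.util.Properties.versionNumberString.split("\\.").take(2).mkString(".") // e.g. "2.12"
println(s"io.delta:delta-core_${scalaBinary}:0.1.0") // pass this coordinate to --packages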

@xctom


commented May 31, 2019

@zsxwing Hi, I also ran into this error.

I ran pyspark --packages io.delta:delta-core_2.12:0.1.0

but got the same error:

Python 2.7.16 (default, Apr 12 2019, 15:32:40)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /Users/xuc/.ivy2/cache
The jars for the packages stored in: /Users/xuc/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/lib/python2.7/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fedfbd0b-ef8a-4239-b4d1-3a1af140aa07;1.0
	confs: [default]
	found io.delta#delta-core_2.12;0.1.0 in central
:: resolution report :: resolve 145ms :: artifacts dl 4ms
	:: modules in use:
	io.delta#delta-core_2.12;0.1.0 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-fedfbd0b-ef8a-4239-b4d1-3a1af140aa07
	confs: [default]
	0 artifacts copied, 1 already retrieved (0kB/5ms)
19/05/30 22:43:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 2.7.16 (default, Apr 12 2019 15:32:40)
SparkSession available as 'spark'.
>>> data = spark.range(0, 5)

>>>
>>> data.write.format("delta").save("/tmp/delta-table")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pyspark/sql/readwriter.py", line 734, in save
    self._jwrite.save(path)
  File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/lib/python2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o37.save.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.delta.sources.DeltaDataSource could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
	at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:245)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
	at org.apache.spark.sql.delta.sources.DeltaDataSource.<init>(DeltaDataSource.scala:42)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 24 more

The Scala and Java versions are below. My Scala version is already 2.12.8, so I'm not sure what happened here:

java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)
scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.