Closed
Labels: question (further information is requested)
Description
Setup:
- ArangoGraph (OneShard model, 3 x 4 GB)
- Google Dataproc version 2.1 (Spark version 3.3.2, Scala 2.12)
- PySpark (Python 3.10)
- ArangoDB Spark Connector version 1.6.0
Description:
I'm reading data from Google BigQuery and want to load it into ArangoDB using Apache Spark. I've confirmed that the data reads correctly from BigQuery, and I followed the instructions in this document, but I'm getting the following error while writing:
Traceback (most recent call last):
File "/home/test.py", line 60, in <module>
.save()
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 966, in save
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o95.save.
: java.lang.NoSuchMethodError: 'scala.collection.immutable.ArraySeq scala.runtime.ScalaRunTime$.wrapRefArray(java.lang.Object[])'
at org.apache.spark.sql.arangodb.datasource.ArangoTable.capabilities(ArangoTable.scala:27)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits$TableHelper.supports(DataSourceV2Implicits.scala:95)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:297)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
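From what I can tell, the return type of the missing method (scala.collection.immutable.ArraySeq) only exists in Scala 2.13, so this looks like a Scala binary-version mismatch between the connector jar and the cluster runtime. A quick sketch to double-check which Scala version the Spark JVM is actually running (assuming a SparkSession named spark, as in the sample code below):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Ask the driver JVM for its Scala version; it should report 2.12.x
# to match the _2.12 suffix of the connector artifact.
print(spark.sparkContext._jvm.scala.util.Properties.versionString())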
The DataFrame has the following schema and contains about 61K rows:
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
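In case it's useful for reproducing without BigQuery, an equivalent DataFrame with the same schema can be built directly (a minimal sketch, assuming the same SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Tiny DataFrame matching the schema above (id: long, name: string),
# useful for isolating the write path from the BigQuery read.
sample_df = spark.createDataFrame([(1, "alice"), (2, "bob")], schema="id long, name string")
sample_df.printSchema()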
The sample code I'm using:
df = spark.read.format("bigquery")...
logging.info(f"Table <table> contains {df.count():,} rows")
(
df.write.mode("overwrite")
.format("com.arangodb.spark")
.option("table", "<table_name>")
.option("endpoints", "<endpoint>")
.option("user", "root")
.option("password", "<password>")
.save() # fails here
)

And the way that I'm submitting my job is as follows:
spark-submit --master yarn --packages com.arangodb:arangodb-spark-datasource-3.3_2.12:1.6.0 /home/arangodb_test.py

Wondering if anyone has seen this error. I've tried using previous versions, but to no avail.
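In case dependency resolution matters, the same coordinate can also be pinned when building the session instead of on the spark-submit command line (a sketch, assuming spark.jars.packages is picked up at session creation rather than from an already-running context):

from pyspark.sql import SparkSession

# Equivalent to --packages: Spark resolves the artifact from Maven Central.
# The _2.12 suffix must match the cluster's Scala version.
spark = (
    SparkSession.builder
    .appName("arangodb_test")
    .config("spark.jars.packages", "com.arangodb:arangodb-spark-datasource-3.3_2.12:1.6.0")
    .getOrCreate()
)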
Happy to share more details if needed
Thanks,
Aldo