
[R] SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow' #33758

Closed
cyborne100 opened this issue Jan 18, 2023 · 4 comments


@cyborne100

Describe the bug, including details regarding any error messages, version, and platform.

Using Spark on Databricks Runtime 10.4 LTS | Spark 3.2.1 | Scala 2.12. I am attempting to follow the "hello world" instructions from the SparkR documentation. Both SparkR and arrow are installed at the cluster level. For some reason, SparkR is trying to call write_arrow(), which was deprecated in arrow 1.0.0.

Running:

library(SparkR)
library(arrow)
# Convert an R data.frame to a Spark DataFrame
spark_df <- createDataFrame(mtcars)

# Convert the Spark DataFrame back to an R data.frame
collect(spark_df)

# Apply an R native function to each partition.
collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))

# Apply an R native function to grouped data.
collect(gapply(spark_df,
               "gear",
               function(key, group) {
                 data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
               },
               structType("gear double, disp boolean")))

The notebook error from
collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))
is:

Error in readBin(con, raw(), as.integer(dataLen), endian = "big") :
invalid 'n' argument

Digging further into the Spark job stderr, I get:

Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 14) (10.1.8.43 executor 2): org.apache.spark.SparkException: R unexpectedly exited.
R worker produced errors: Error: 'write_arrow' is not an exported object from 'namespace:arrow'
Execution halted

at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:169)
at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:162)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:194)
at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:123)
at org.apache.spark.api.r.BaseRRunner$ReaderIterator.hasNext(BaseRRunner.scala:138)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.hasNext(ArrowConverters.scala:206)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToR$3(Dataset.scala:3841)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:832)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1681)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:835)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:690)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:154)

SessionInfo():

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] arrow_10.0.1 SparkR_3.2.0

loaded via a namespace (and not attached):
[1] digest_0.6.29 assertthat_0.2.1 R6_2.5.1 magrittr_2.0.2
[5] rlang_1.0.1 TeachingDemos_2.10 cli_3.2.0 hwriter_1.3.2
[9] vctrs_0.3.8 hwriterPlus_1.0-3 tools_4.1.2 bit64_4.0.5
[13] glue_1.6.1 purrr_0.3.4 bit_4.0.4 fastmap_1.1.0
[17] compiler_4.1.2 Rserve_1.8-10 htmltools_0.5.2 tidyselect_1.1.2

Searching online, I cannot find a resolution for this anywhere. Please let me know if I need to provide more info.

Component(s)

R

@nealrichardson
Member

write_arrow() was deprecated in arrow 1.0.0 (July 2020) and removed in arrow 9.0.0 (https://arrow.apache.org/docs/r/news/index.html#arrow-900). (For that matter, SparkR was archived from CRAN in 2021.)

I recommend either switching from SparkR to sparklyr, which seems to be more actively maintained, or downgrading arrow to 8.0.0.
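If downgrading, the arrow version can be pinned explicitly. A minimal sketch, assuming the remotes package is available (the repos URL and the use of remotes::install_version() are suggestions, not something stated in this thread):

```r
# Pin arrow to 8.0.0, the last release that still exported write_arrow(),
# which older SparkR versions call internally.
install.packages("remotes")
remotes::install_version("arrow", version = "8.0.0",
                         repos = "https://cloud.r-project.org")
```

On Databricks this would need to be installed at the cluster level (as the original report does) so that executor R workers pick up the same version.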

@cyborne100
Author

Arrow 8.0.0 works.

Are you stating that the issue lies within SparkR and that I should engage them?

Was SparkR archived or moved to Apache Spark proper here?

@nealrichardson
Member

Arrow 8.0.0 works.

Are you stating that the issue lies within SparkR and that I should engage them?

Looks like it has already been fixed, so perhaps you can just install the latest SparkR.
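To confirm whether a given arrow/SparkR pairing will hit this error before submitting a job, one could check the arrow namespace directly. A hypothetical sketch (the error message text is illustrative, not from SparkR itself):

```r
# Older SparkR releases call arrow::write_arrow() on executor R workers;
# arrow >= 9.0.0 no longer exports it. Fail fast on the driver instead of
# inside a Spark task.
if (!"write_arrow" %in% getNamespaceExports("arrow")) {
  stop("arrow ", packageVersion("arrow"),
       " no longer exports write_arrow(); ",
       "use arrow <= 8.0.0 or a SparkR release with the fix")
}
```

This only checks the driver's arrow installation; on a cluster, the executors' R libraries must match as well.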

Was SparkR archived or moved to Apache Spark proper here?

Archived from CRAN, the R package repository, so it's no longer distributed there, but that doesn't mean anything for the source on GitHub. One major benefit of relying on CRAN packages is that CRAN runs tests to ensure that the latest versions of all packages work together.

@kou kou changed the title SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow' [R] SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow' Jan 19, 2023
@cyborne100
Author

cyborne100 commented Jan 19, 2023

Ahh, I understand now. I researched this for the last three weeks before I asked. Thanks for helping a NOOB. :)
