Describe the bug, including details regarding any error messages, version, and platform.
Using Spark on Databricks Runtime 10.4 LTS (Spark 3.2.1, Scala 2.12), I am attempting to follow the "hello world" instructions from the SparkR pages. Both SparkR and arrow are installed at the cluster level. For some reason, SparkR is trying to call arrow's write_arrow(), which was deprecated in arrow 1.0.
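For reference, write_arrow() was deprecated in arrow 1.0.0 and its export was removed in a later release; write_ipc_stream() appears to be the documented replacement. A minimal sketch of the equivalent direct arrow calls, independent of SparkR:
library(arrow)
# write_ipc_stream()/read_ipc_stream() superseded the deprecated
# write_arrow()/read_arrow() for the Arrow IPC stream format
sink <- tempfile(fileext = ".arrow")
write_ipc_stream(mtcars, sink)  # formerly write_arrow(mtcars, sink)
read_ipc_stream(sink)           # formerly read_arrow(sink)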
Running:
library(SparkR)
library(arrow)
# Create a Spark DataFrame from an R data.frame
spark_df <- createDataFrame(mtcars)
# Collect the Spark DataFrame back into an R data.frame
collect(spark_df)
# Apply an R native function to each partition.
collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))
# Apply an R native function to grouped data.
collect(gapply(spark_df,
               "gear",
               function(key, group) {
                 data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
               },
               structType("gear double, disp boolean")))
The notebook error from collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))
is:
Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 14) (10.1.8.43 executor 2): org.apache.spark.SparkException: R unexpectedly exited. R worker produced errors: Error: 'write_arrow' is not an exported object from 'namespace:arrow'
Execution halted
Digging further into the Spark job stderr, I get:
at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:169)
at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:162)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:194)
at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:123)
at org.apache.spark.api.r.BaseRRunner$ReaderIterator.hasNext(BaseRRunner.scala:138)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.hasNext(ArrowConverters.scala:206)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToR$3(Dataset.scala:3841)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:832)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1681)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:835)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:690)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:154)
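The R-side failure can be reproduced outside Spark: on an arrow build that has dropped the export, a qualified call to the removed function raises the same message (sketch):
# Calling the removed export directly via :: fails the same way
arrow::write_arrow(mtcars, tempfile())
# Error: 'write_arrow' is not an exported object from 'namespace:arrow'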
sessionInfo():
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
By Googling, I cannot seem to find a resolution on this anywhere. Please let me know if I need to provide more info.
Are you stating the issue lies within SparkR, and that I should engage them?
Looks like it has already been fixed, so perhaps you can just install the latest SparkR.
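If upgrading SparkR isn't possible on a pinned Databricks runtime, one possible workaround is to pin the arrow package to a release that still exported write_arrow() (untested sketch; 0.17.1 is only an example version, check arrow's changelog for the last release that kept the export):
# Pin arrow to an older release so SparkR's write_arrow() call resolves
install.packages("remotes")
remotes::install_version("arrow", version = "0.17.1", repos = "https://cloud.r-project.org")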
Was SparkR archived or moved to Apache Spark proper here?
Archived from CRAN, the R package repository, so it's no longer distributed there, but that doesn't mean anything for the source on GitHub. One major benefit of relying on CRAN packages is that CRAN runs tests to ensure that the latest versions of all packages work together.
kou changed the title to [R] SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow' (Jan 19, 2023)
Component(s)
R