In the R example, I am failing to read a CSV file with the above error.
sc = sparkR.init()
sqlContext = sparkRSQL.init(sc)
sample = read.df(sqlContext, "sample.csv", "com.databricks.spark.csv", header="true")
I get a runtime exception on the lookup of the datasource.
I notice that when starting Spark from the command line I have to specify `--packages`, but there doesn't seem to be a way (or at least a documented way) to do so when initializing from sparkR.init.
Same for me. I tried setting the SPARKR_SUBMIT_ARGS env variable before sparkR.init:
> Sys.setenv('SPARKR_SUBMIT_ARGS'='--packages com.databricks:spark-csv_2.10:1.0.3')
> library(SparkR, lib='~/spark/R/lib/')
> sc <- sparkR.init(master="local", sparkHome='/Users/13k/spark')
Launching java with spark-submit command /Users/13k/spark/bin/spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 /var/folders/sd/2_0mf8p100v7dl3p_2cs5m_80000gn/T//RtmpXveJ0G/backend_portbbee386d2dc9
Error: Cannot load main class from JAR file:/var/folders/sd/2_0mf8p100v7dl3p_2cs5m_80000gn/T/RtmpXveJ0G/backend_portbbee386d2dc9
Run with --help for usage help or --verbose for debug output
Error in sparkR.init(master = "local", sparkHome = "/Users/13k/spark") :
Could this be a "Permission denied" issue? I have to run the sparkR shell as root to load packages. How can I fix it?
This is the right syntax (after hours of trying):
(Note - focus on the first line: each argument is wrapped in double quotes, and "sparkr-shell" must be appended as the last argument)
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-Flights-example")
sqlContext <- sparkRSQL.init(sc)
# The SparkSQL context should already be created for you as sqlContext
# Java ref type org.apache.spark.sql.SQLContext id 1
# Load the flights CSV file using `read.df`. Note that we use the CSV reader Spark package here.
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header="true")
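Putting the thread together, here is a minimal end-to-end sketch. It assumes the Spark 1.4-era SparkR API used above; the master URL, app name, and CSV file name are placeholders, and the spark-csv version is the one quoted earlier in the thread.

```r
# Must be set BEFORE sparkR.init: each token is individually double-quoted
# inside the single-quoted string, and "sparkr-shell" comes last so
# spark-submit launches the SparkR backend instead of treating the backend
# script as an application JAR (the error seen above).
Sys.setenv('SPARKR_SUBMIT_ARGS' =
  '"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')

library(SparkR)

# Placeholder master/appName; adjust for your cluster.
sc <- sparkR.init(master = "local", appName = "csv-example")
sqlContext <- sparkRSQL.init(sc)

# Read the CSV through the spark-csv data source; header="true" uses the
# first row of the file as column names.
flights <- read.df(sqlContext, "nycflights13.csv",
                   "com.databricks.spark.csv", header = "true")
head(flights)
```

The key point is the ordering: because SPARKR_SUBMIT_ARGS is read when the JVM backend is launched, setting it after sparkR.init has no effect, so restart the R session if a context was already created.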