Failed to load class for data source: com.databricks.spark.csv #79

nevi-me opened this Issue Jun 13, 2015 · 2 comments


@nevi-me
nevi-me commented Jun 13, 2015

In the R example, I am failing to read a CSV file with the above error.

library(SparkR)

sc = sparkR.init()

sqlContext = sparkRSQL.init(sc)

sample = read.df(sqlContext, "sample.csv", "com.databricks.spark.csv", header="true")

I get a runtime exception when the data source is looked up.

I notice that when starting Spark from the command line I have to specify packages, but there doesn't seem to be a way to do so when invoking it from sparkR.init, or at least it isn't documented.

@feng5
feng5 commented Jun 16, 2015

Same for me. I tried setting the SPARKR_SUBMIT_ARGS environment variable before sparkR.init:

> Sys.setenv('SPARKR_SUBMIT_ARGS'='--packages com.databricks:spark-csv_2.10:1.0.3')
> library(SparkR, lib='~/spark/R/lib/')
> sc <- sparkR.init(master="local[4]", sparkHome='/Users/13k/spark')
Launching java with spark-submit command /Users/13k/spark/bin/spark-submit  --packages com.databricks:spark-csv_2.10:1.0.3 /var/folders/sd/2_0mf8p100v7dl3p_2cs5m_80000gn/T//RtmpXveJ0G/backend_portbbee386d2dc9 
Error: Cannot load main class from JAR file:/var/folders/sd/2_0mf8p100v7dl3p_2cs5m_80000gn/T/RtmpXveJ0G/backend_portbbee386d2dc9
Run with --help for usage help or --verbose for debug output
Error in sparkR.init(master = "local[4]", sparkHome = "/Users/13k/spark") : 

Could this be a permission issue? I need root to load packages in the sparkR shell. How can I fix it?

@Pragith
Pragith commented Jun 26, 2015

This is the right syntax (after hours of trying):
(Note the first line: each argument is wrapped in its own double quotes, and "sparkr-shell" is appended at the end.)

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')

library(SparkR)
library(magrittr)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-Flights-example")
sqlContext <- sparkRSQL.init(sc)


# The SparkSQL context should already be created for you as sqlContext
sqlContext
# Java ref type org.apache.spark.sql.SQLContext id 1

# Load the flights CSV file using `read.df`. Note that we use the CSV reader Spark package here.
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header="true")
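Why the quoting and the trailing "sparkr-shell" token matter: sparkR.init launches spark-submit with the contents of SPARKR_SUBMIT_ARGS followed by a temporary backend-port file (visible in feng5's error output above). Here is a rough sketch of that assembly, in Python purely for illustration (`build_submit_command` is a hypothetical helper, not SparkR's actual code):

```python
import shlex

def build_submit_command(submit_args: str, backend_file: str) -> list:
    # Sketch: sparkR.init effectively runs
    #   spark-submit <SPARKR_SUBMIT_ARGS tokens> <backend-port temp file>
    return ["spark-submit"] + shlex.split(submit_args) + [backend_file]

# Without a trailing "sparkr-shell" token, the temp file is the last
# positional argument, so spark-submit treats it as an application JAR
# and fails with "Cannot load main class from JAR":
bad = build_submit_command(
    "--packages com.databricks:spark-csv_2.10:1.0.3",
    "/tmp/backend_port1234",
)
assert bad[-1] == "/tmp/backend_port1234"  # mistaken for the app JAR

# With the individually quoted form above, the final "sparkr-shell" token
# tells spark-submit it is launching the SparkR shell rather than a JAR:
good = build_submit_command(
    '"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"',
    "/tmp/backend_port1234",
)
assert "sparkr-shell" in good
```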
@falaki falaki closed this in #103 Jul 16, 2015