java.util.NoSuchElementException: key not found: path #230

Closed
onsDridi opened this issue Jul 11, 2016 · 10 comments

Comments

@onsDridi

onsDridi commented Jul 11, 2016

I'm trying to test this code:

from pyspark.sql import SQLContext
from pyspark import SparkContext
sc = SparkContext(appName="Connect Spark with Redshift")
sql_context = SQLContext(sc)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "ACCESSID")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "ACEESKEY")
df = sql_context.read \
    .option("url", "jdbc:redshift://example.coyf2i236wts.eu-central1.redshift.amazonaws.com:5439/agcdb?user=user&password=pwd") \
    .option("dbtable", "table_name") \
    .option("tempdir", "s3://bucket/path") \
    .load()

but I'm getting this error:

[Screenshot of the stack trace ending in: java.util.NoSuchElementException: key not found: path]

Any ideas?

@JoshRosen
Contributor

I think that you need to add .format("com.databricks.spark.redshift") to your sql_context.read call; my hunch is that Spark can't infer the format for this data source, so you need to explicitly specify that we should use the spark-redshift connector.

(This is an unhelpful error message in Spark; I'll see if there's a way to provide a more helpful one).
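
For example, keeping the placeholder connection URL, table name, and tempdir from your snippet, the read would look roughly like this:

df = sql_context.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://example.coyf2i236wts.eu-central1.redshift.amazonaws.com:5439/agcdb?user=user&password=pwd") \
    .option("dbtable", "table_name") \
    .option("tempdir", "s3://bucket/path") \
    .load()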

@onsDridi
Author

Thank you Josh, I tried adding it, but now I'm getting this error: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift

@JoshRosen
Contributor

Is the Redshift connector JAR on your Spark driver's classpath?

@onsDridi
Author

Yes, I used this command to run the script:
bin/spark-submit --driver-class-path path/RedshiftJDBC41-1.1.17.1017.jar script.py
I also tried
bin/spark-submit redshiftTestCode/sparkRedshift.py --jars path/RedshiftJDBC41-1.1.17.1017.jar
but I still get the same error.

@JoshRosen
Contributor

You also need to add the spark-redshift JAR; the Redshift JDBC driver is not sufficient by itself.
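
For example, something along these lines (the JAR names and paths below are just the files mentioned in this thread; yours will depend on the versions you downloaded):

bin/spark-submit --jars path/spark-redshift_2.10-1.0.0.jar,path/RedshiftJDBC41-1.1.17.1017.jar script.py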

@onsDridi
Author

I did that too, but I still have the same error.

@JoshRosen
Contributor

Can you post the exact command that you tried most recently and which didn't work?

@onsDridi
Author

I ran these commands:
bin/spark-submit redshiftTestCode/sparkRedshift.py --jars /Users/od/Documents/Work/spark-redshift_2.10-1.0.0.jar
bin/spark-submit redshiftTestCode/sparkRedshift.py --jars /Users/od/Documents/Work/RedsphiftJDBC41-1.1.17.1017.jar

Both commands give me the same error: java.util.NoSuchElementException: key not found: path

@JoshRosen
Contributor

Okay, and you also added .format("com.databricks.spark.redshift") to your code?

@tokland

tokland commented Sep 29, 2016

Note that you put --jars after the Python script, but it is an option of spark-submit and must come before the script path. For the record, this worked for me:

$ spark-submit --jars spark-redshift_2.10-1.1.0.jar,RedshiftJDBC.jar,minimal-json-0.9.4.jar test-redshift.py
