
Pydoop - HDFS IOExeption #218

Closed
lott3 opened this issue Jun 24, 2016 · 4 comments

lott3 commented Jun 24, 2016

I got this issue using HDP 2.4 and Python 2.7. I posted it on Stack Overflow: http://stackoverflow.com/questions/37925300/pydoop-hdfs-ioexeption. Does anyone have an idea how to solve it?

"You should open a new issue rather than reviving old, closed issues.

You might have a different python interpreter being called in PySpark, or perhaps the same interpreter running with different environment variables? Open a new issue and tell us what's the relevant code you're running in PySpark.

Luca" (#158)

  • The interpreter is set to Python 2.7 (Anaconda), as I set the Spark user's environment variables to it
  • I am executing Pydoop in Jupyter: import pydoop.hdfs as hdfs followed by file_X_train = hdfs.open("/path../.csv")
ilveroluca (Member)
Is the file you're trying to open on HDFS or on your local file system?

lott3 (Author) commented Jul 4, 2016

On HDFS

@simleo simleo added the HDP label Nov 22, 2017
pboynton commented Dec 9, 2017

I had this issue too (running HDP 2.6 and Python 2.7 on RHEL 7) and then discovered that when I ran import pydoop followed by pydoop.hadoop_home(), I got different answers depending on whether I was in a Python REPL or in a PySpark REPL. The Python REPL returned the real path "/usr/hdp/2.6.1.0-129/hadoop" while the PySpark REPL returned the symlink "/usr/hdp/current/hadoop-client". This ended up causing a different set of loaded Java classes, as seen by running for p in pydoop.hadoop_classpath().split(":"): print p.
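The symlink-versus-real-path discrepancy can be reproduced in isolation; here is a minimal sketch using made-up temporary paths that mimic HDP's layout (these are not the actual cluster directories):

```shell
# Mimic HDP's layout: a versioned install dir plus a symlink to it.
base=$(realpath "$(mktemp -d)")
mkdir -p "$base/2.6.1.0-129/hadoop"
ln -s "$base/2.6.1.0-129/hadoop" "$base/hadoop-client"

# A process that resolves the link sees the versioned path...
realpath "$base/hadoop-client"

# ...while one that takes the symlink at face value sees a different
# string, which is enough for Pydoop to assemble a different classpath.
echo "$base/hadoop-client"
```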

The solution is simply to set HADOOP_HOME explicitly to the real path, or to set it as $(realpath /usr/hdp/current/hadoop-client) if you have realpath installed. Of course, JAVA_HOME and HADOOP_CONF_DIR still need to be set, as does adding ${JAVA_HOME}/bin to your PATH.
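Put together, the environment setup described above might look like the following config fragment; the HADOOP_HOME path is the one from this thread, while the JAVA_HOME and HADOOP_CONF_DIR values are placeholders to adjust for your system:

```shell
# Resolve the HDP symlink so Python and PySpark agree on HADOOP_HOME.
export HADOOP_HOME="$(realpath /usr/hdp/current/hadoop-client)"

# Still required; example values -- substitute your own.
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PATH="${JAVA_HOME}/bin:${PATH}"
```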

simleo (Member) commented Dec 11, 2017

Thanks for the feedback @pboynton! @lott3 can you please check if this works for you?

@simleo simleo closed this as completed May 27, 2019