
Pydoop - HDFS IOExeption #218

Closed
lott3 opened this issue Jun 24, 2016 · 4 comments

lott3 commented Jun 24, 2016

I got this issue using HDP 2.4 and Python 2.7. I posted it on Stack Overflow: http://stackoverflow.com/questions/37925300/pydoop-hdfs-ioexeption. Does anyone have an idea how to solve it?

"You should open a new issue rather than reviving old, closed issues.

You might have a different python interpreter being called in PySpark, or perhaps the same interpreter running with different environment variables? Open a new issue and tell us what's the relevant code you're running in PySpark.

Luca" (#158)

  • The interpreter is set to Python 2.7 (Anaconda), as I set the Spark user's environment variables to it
  • I am executing Pydoop in Jupyter: import pydoop.hdfs as hdfs followed by file_X_train = hdfs.open("/path../.csv")
ilveroluca (Member)
Is the file you're trying to open on HDFS or on your local file system?

lott3 (Author) commented Jul 4, 2016

On HDFS

@simleo simleo added the HDP label Nov 22, 2017
pboynton commented Dec 9, 2017

I had this issue too (running HDP 2.6 and Python 2.7 on RHEL 7) and then discovered that when I ran import pydoop followed by pydoop.hadoop_home(), I got different answers depending on whether I was in a Python REPL or in a PySpark REPL. The Python REPL returned the real path "/usr/hdp/2.6.1.0-129/hadoop" while the PySpark REPL returned the symlink "/usr/hdp/current/hadoop-client". This ended up causing a different set of loaded Java classes, as seen by running for p in pydoop.hadoop_classpath().split(":"): print p.
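The symlink-versus-real-path discrepancy can be reproduced in isolation; here is a minimal sketch using made-up temporary paths that mimic HDP's layout (these are not the actual cluster directories):

```shell
# Mimic HDP's layout: a versioned install dir plus a symlink to it.
base=$(realpath "$(mktemp -d)")
mkdir -p "$base/2.6.1.0-129/hadoop"
ln -s "$base/2.6.1.0-129/hadoop" "$base/hadoop-client"

# A process that resolves the link sees the versioned path...
realpath "$base/hadoop-client"

# ...while one that takes the symlink at face value sees a different
# string, which is enough for Pydoop to assemble a different classpath.
echo "$base/hadoop-client"
```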

The solution is simply to set HADOOP_HOME explicitly to the real path, or to set it as $(realpath /usr/hdp/current/hadoop-client) if you have realpath installed. Of course, JAVA_HOME and HADOOP_CONF_DIR still need to be set, as does adding ${JAVA_HOME}/bin to your PATH.
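Put together, the environment setup described above might look like the following config fragment; the HADOOP_HOME path is the one from this thread, while the JAVA_HOME and HADOOP_CONF_DIR values are placeholders to adjust for your system:

```shell
# Resolve the HDP symlink so Python and PySpark agree on HADOOP_HOME.
export HADOOP_HOME="$(realpath /usr/hdp/current/hadoop-client)"

# Still required; example values -- substitute your own.
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PATH="${JAVA_HOME}/bin:${PATH}"
```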

simleo (Member) commented Dec 11, 2017

Thanks for the feedback @pboynton! @lott3 can you please check if this works for you?

@simleo simleo closed this as completed May 27, 2019