Unable to load libhdfs #18621

Open
asfimport opened this issue Apr 15, 2021 · 8 comments

@asfimport

I am using pyarrow 3.0.0 with Python 3.7 and Hadoop 2.10.1 on 64-bit Windows 10, and I am getting the error below.

I am using pyspark 3.1.1 and I am not able to save a dataframe to HDFS. When I used pyspark 3.0.0, I was able to save a dataframe to HDFS.

Please help:

import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9001)
__main__:1: DeprecationWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 219, in connect
    extra_conf=extra_conf
  File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 229, in _connect
    extra_conf=extra_conf)
  File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 45, in __init__
    self._connect(host, port, user, kerb_ticket, extra_conf)
  File "pyarrow\io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: The specified module could not be found.

Reporter: Sukesh Pabolu

Original Issue Attachments:

Note: This issue was originally created as ARROW-12399. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Could you try with pyarrow.fs.HadoopFileSystem(host='localhost', port=9001) instead? (the hdfs.connect() method is deprecated in favor of pyarrow.fs.HadoopFileSystem, which is also backed by a somewhat different implementation)

@asfimport

Sukesh Pabolu:

from pyarrow import fs
fs.HadoopFileSystem(host='localhost', port=9001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow\_hdfs.pyx", line 83, in pyarrow._hdfs.HadoopFileSystem.__init__
  File "pyarrow\error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: The specified module could not be found.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
A further question: did you check the docs at https://arrow.apache.org/docs/python/filesystems.html#hadoop-file-system-hdfs ? They specify some environment variables that need to be set. Are those set?
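
One way to answer that question from inside the Python session is sketched below. The variable names are the ones listed on the linked documentation page (HADOOP_HOME, JAVA_HOME, CLASSPATH, and the optional ARROW_LIBHDFS_DIR). The helper names `report_hdfs_env` and `missing_required` are hypothetical and are not part of pyarrow.

```python
import os
from typing import Dict, List, Mapping, Optional

# Names taken from the Arrow HDFS documentation linked above.
REQUIRED = ("HADOOP_HOME", "JAVA_HOME", "CLASSPATH")
OPTIONAL = ("ARROW_LIBHDFS_DIR",)  # documented as optional


def report_hdfs_env(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Map each HDFS-related variable to its value, or None if it is unset."""
    return {name: env.get(name) for name in REQUIRED + OPTIONAL}


def missing_required(env: Mapping[str, str] = os.environ) -> List[str]:
    """List the required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    for name, value in report_hdfs_env().items():
        # repr() makes stray newlines or tabs from bad escaping visible
        print(f"{name} = {value!r}")
    print("missing:", missing_required())
```

Printing with `repr` is a deliberate choice here: a path that was damaged by backslash escaping shows up as `\n` or `\t` in the output instead of looking like a normal path. The variables must be visible to the Python process itself, so this is best run in the same session that calls `HadoopFileSystem`.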

@asfimport

Sukesh Pabolu:
image-2021-04-15-20-04-50-069.png

@asfimport

Sukesh Pabolu:
I have set all of the environment variables.

@asfimport

Sukesh Pabolu:
I am waiting for a further reply.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
The assignment of "ARROW_LIBHDFS_DIR" might not be fully correct (it should be "C:\\hadoop\\lib\\native", with double backslashes), but it might also not be necessary to set this variable at all if it does not point to something other than $HADOOP_HOME/lib/native.

(I don't have HDFS myself, so I can't help further beyond asking those initial questions.)
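
To illustrate the escaping point, here is a minimal Python sketch. It assumes the variable is being set from Python code and that Hadoop lives under C:\hadoop, as in the comment above. The helper `find_libhdfs` is hypothetical, and the file names it looks for (`hdfs.dll`, `libhdfs.so`, `libhdfs.dylib`) are an assumption about how the library is usually named on each platform, not something confirmed in this thread.

```python
import os
from typing import List

# Two equivalent ways to write the Windows path in Python source.
escaped = "C:\\hadoop\\lib\\native"  # doubled backslashes
raw = r"C:\hadoop\lib\native"        # raw string, same value
assert escaped == raw

# With a single backslash, "\n" is read as a newline, so the path is corrupted.
broken = "C:\\hadoop\\lib\native"
assert "\n" in broken

# Assumed library file names per platform (Windows, Linux, macOS).
LIBHDFS_NAMES = ("hdfs.dll", "libhdfs.so", "libhdfs.dylib")


def find_libhdfs(directory: str) -> List[str]:
    """Return the libhdfs-like file names present in `directory`."""
    if not os.path.isdir(directory):
        return []
    return sorted(n for n in os.listdir(directory) if n.lower() in LIBHDFS_NAMES)


# Set the variable before creating the filesystem object, then inspect the folder.
os.environ["ARROW_LIBHDFS_DIR"] = raw
print(find_libhdfs(os.environ["ARROW_LIBHDFS_DIR"]))
```

An empty list from `find_libhdfs` would suggest the directory does not exist or does not contain the native library. The escaping rule shown here applies to Python string literals; a value entered through the Windows environment variable dialog is not processed by Python's escape rules.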

@asfimport

Antoine Pitrou / @pitrou:
[~sukeshpabolu] Have you checked the above suggestion (use double backslashes to circumvent escaping issues)?

asfimport added this to the 3.0.0 milestone Jan 11, 2023