[Python] pyarrow.hdfs.connect() failing #20975

Closed
asfimport opened this issue Jan 29, 2019 · 11 comments

@asfimport

asfimport commented Jan 29, 2019

I'm trying to connect to HDFS using the snippet below, with hadoop-libhdfs.
The error appears in v0.12.0 but not in v0.11.1 (I tested both versions in the same environment).

In [1]: import pyarrow as pa

In [2]: fs = pa.hdfs.connect()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e0007ad7fa95> in <module>()
----> 1 fs = pa.hdfs.connect()

/usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in connect(host, port, user, kerb_ticket, driver, extra_conf)
    205     fs = HadoopFileSystem(host=host, port=port, user=user,
    206                           kerb_ticket=kerb_ticket, driver=driver,
--> 207                           extra_conf=extra_conf)
    208     return fs

/usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in __init__(self, host, port, user, kerb_ticket, driver, extra_conf)
     36             _maybe_set_hadoop_classpath()
     37 
---> 38         self._connect(host, port, user, kerb_ticket, driver, extra_conf)
     39 
     40     def __reduce__(self):

/usr/local/lib64/python2.7/site-packages/pyarrow/io-hdfs.pxi in pyarrow.lib.HadoopFileSystem._connect()
     72         if host is not None:
     73             conf.host = tobytes(host)
---> 74         self.host = host
     75 
     76         conf.port = port

TypeError: Expected unicode, got str
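
The traceback shows a text-typed attribute (`self.host`) receiving a Python 2 byte string. As a minimal sketch of the kind of normalization that avoids this class of error, the helper below decodes byte strings to text before they are stored. `ensure_text` is a hypothetical name used for illustration only; it is not part of pyarrow and is not necessarily what the eventual fix does.

```python
def ensure_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings; pass None through."""
    if value is None:
        return None
    if isinstance(value, bytes):  # on Python 2, `bytes` is an alias of `str`
        return value.decode(encoding)
    return value


# Hypothetical connection arguments, normalized before being stored.
host = ensure_text(b"default")
user = ensure_text(None)
```

Normalizing at the boundary keeps the rest of the code free to assume text on both Python 2 and Python 3.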

Environment: Python 2.7
Hadoop distribution: Amazon 2.7.3
Hive 2.1.1
Spark 2.1.1
Tez 0.8.4
Linux 4.4.35-33.55.amzn1.x86_64
Reporter: Bradley Grantham
Assignee: Antoine Pitrou / @pitrou

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-4413. Please see the migration documentation for further details.

@asfimport

Antoine Pitrou / @pitrou:
The following patch would probably work, but I don't know how to test it:
https://gist.github.com/pitrou/1ee2e1b04543cddead11a146938d9e80

@asfimport

Wes McKinney / @wesm:
[~bradleygrantham] are you able to test this out and submit a PR?

@asfimport

Bradley Grantham:
@wesm Yeah, I'd love to!

@asfimport

Bradley Grantham:
@wesm Unfortunately I can't build the package. I've spent all day trying to get it working (I also spent some time last weekend). I can build it on my Mac, but I can't get HDFS working there, so that doesn't help with this problem. And on Amazon EMR, which is where I originally encountered the problem, I can't build the package at all. I keep getting an error when running

make -j4

...

/usr/bin/ld: /home/hadoop/miniconda3/envs/pyarrow-dev/lib/libglog.a(libglog_la-signalhandler.o): unrecognized relocation (0x29) in section `.text'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [release/arrow-file-to-stream] Error 1
make[1]: *** [src/arrow/ipc/CMakeFiles/arrow-file-to-stream.dir/all] Error 2
/usr/bin/ld: /home/hadoop/miniconda3/envs/pyarrow-dev/lib/libglog.a(libglog_la-signalhandler.o): unrecognized relocation (0x29) in section `.text'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [release/arrow-stream-to-file] Error 1
make[1]: *** [src/arrow/ipc/CMakeFiles/arrow-stream-to-file.dir/all] Error 2
make: *** [all] Error 2

Sorry!

@asfimport

Wes McKinney / @wesm:
Sorry, unfortunately the build instructions that use conda / conda-forge are out of date since the compiler migration that occurred on January 15.

There is at least https://issues.apache.org/jira/browse/ARROW-3096 about fixing them.

@asfimport

Wes McKinney / @wesm:
If you install the gcc_linux-64 and gxx_linux-64 packages before activating the conda environment, it should work. We'll hopefully get those instructions updated in the coming weeks.
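
As a rough way to check whether the compiler packages took effect, the sketch below inspects the `CC` and `CXX` environment variables after the environment is activated. It assumes that the conda compiler packages export these variables on activation. The function names are hypothetical and not part of any Arrow or conda tooling.

```python
import os


def compiler_env_report(environ=None):
    """Map CC and CXX to their values in `environ`, or None if unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in ("CC", "CXX")}


def missing_compilers(environ=None):
    """Return the names among CC/CXX that are unset or empty."""
    report = compiler_env_report(environ)
    return [name for name, value in sorted(report.items()) if not value]


if __name__ == "__main__":
    # Run inside the activated environment; an empty list means both are set.
    print(missing_compilers())
```

If either name is reported missing, the environment was probably activated before the packages were installed, and reactivating it may be enough.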

@asfimport

Bradley Grantham:
Ok cool. I would really like to help so I will give this another stab tomorrow and let you know how it goes!

@asfimport

Wes McKinney / @wesm:
Any luck on this?

@asfimport

Wes McKinney / @wesm:
We should apply Antoine's patch in the meantime

@asfimport

Bradley Grantham:
Sorry again, I somehow missed the email about the comment from 4 days ago. I made sure that gcc_linux-64 and gxx_linux-64 were both installed, but I still can't get to the point where I can install a dev version of the package.

@asfimport

Antoine Pitrou / @pitrou:
Issue resolved by pull request 3975
#3975
