Segfault When Initiating HDFS Connection #163
libhdfs3-2.3.0-2 was just released; can you try with libhdfs3-2.3.0-1 and see if this is a new problem?
@sk1p, ideas?
@martindurant that worked! Ran this in a Jupyter Notebook and things are working again:
Thanks for testing, @lucashu1 |
Thanks for tagging me. That's weird:
@sk1p Nope, not seeing any instances of
If I remember correctly, effective_user arises in situations where, for example, you have logged in with a kerberos principal other than your local username. Are you using kerberos?
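A quick way to check whether that situation applies is to compare the local username with the kerberos principal in the ticket cache. A minimal sketch (`klist` requires the krb5 client tools, e.g. the krb5-workstation package):

```shell
# Compare the local username with the kerberos principal (if any).
whoami
klist 2>/dev/null || echo "no kerberos ticket cache found"
```

If `klist` shows a default principal whose user part differs from `whoami`, the effective_user code path is in play.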
@martindurant did anything change wrt. kerberos library dependency versions in -2?
Since the last release was 9 months ago, a whole load of versions have been updated; but the latest build release used the older code with the newer versions, so that is likely not the problem. I think that there must still be bad code from the big merge ContinuumIO/libhdfs3-downstream#3, which is beyond my c++ ability. You'll notice that there were conversations around how to make that right, but it was never used in a conda build until now (the recipe still pointed to the
Ok, that may be it. Sadly I couldn't reproduce it yet; I don't have any kerberos stuff set up. Maybe @bdrosen96 knows more?
Presumably, yes, but getting this right for a variety of systems is not easy.
I got the same issue with a manual install, on master. Here are the complete steps to build latest libhdfs3 master (commit 41fd50d from May 7, PR#12) on fedora 28 and reproduce this (I don't believe an actual kerberos connection is needed, just the standard hdfs-client with auth set to kerberos):

```shell
dnf install -y krb5-workstation protobuf-devel gtest-devel gmock-devel libcurl-devel libgsasl-devel libgasl
git clone https://github.com/ContinuumIO/libhdfs3-downstream
cd libhdfs3-downstream/libhdfs3
mkdir build
cd build
../bootstrap --prefix=/usr/local
make -j5
sudo make install
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
pip3 install --user hdfs3  # 0.3.0
gdb python3
```

then, at the python prompt inside gdb:

```python
import hdfs3; f = hdfs3.HDFileSystem('some.net', port=8020)
```
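For reference, "auth set to kerberos" refers to a client config along these lines. This is a sketch, not the reporter's actual file: libhdfs3 reads `hdfs-client.xml`, and switching the authentication property to `kerberos` should be enough to drive the client into the code path in question, even without a working KDC:

```xml
<?xml version="1.0"?>
<!-- Minimal illustrative hdfs-client.xml fragment -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```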
Thanks to @martindurant I was able to work around it by building the following libhdfs3 commit:

```shell
git checkout 7842951    # august
git cherry-pick 069d774  # gcc-7
```

(then same steps as above; it should at least get you to the point where it tries to connect and fails with
I could not reproduce this when I used a debug build. That may mean there is an uninitialized variable
It is probably important to be sure the python hdfs3 code is using:

```python
hdfsBuilderConnect.argtypes = [ct.POINTER(hdfsBuilder), ct.c_char_p]
```

I also could not reproduce this with a non-debug build. I noticed the connect call looks like:

```python
fs = _lib.hdfsBuilderConnect(o)
```

instead of:

```python
fs = _lib.hdfsBuilderConnect(o, ensure_bytes(self.effective_user))
```

so there might be an incompatibility between the python lib and the C++ lib
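To illustrate why a mismatch like that can segfault: ctypes only marshals what `argtypes` declares, so a declaration that disagrees with the library's real C signature silently passes the wrong number of arguments and lets the callee read garbage. A minimal sketch of the mechanism using libc's `strlen` (not the hdfs3 API):

```python
import ctypes as ct

# Load the C runtime; on Linux/macOS, CDLL(None) exposes libc symbols.
libc = ct.CDLL(None)

# argtypes/restype tell ctypes how to marshal the call. If this declaration
# disagreed with strlen's real signature (the way a one-argument
# hdfsBuilderConnect declaration disagrees with a two-argument C++ build),
# the callee would see uninitialized arguments and could crash.
libc.strlen.argtypes = [ct.c_char_p]
libc.strlen.restype = ct.c_size_t

print(libc.strlen(b"hdfs3"))  # → 5
```

This is why keeping the python-side declaration and the installed C++ library in lockstep matters: ctypes has no way to detect the mismatch at call time.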
Here is the line on the python side with the extra argument: https://github.com/dask/hdfs3/blob/master/hdfs3/core.py#L147, but this is not in the released conda/pip package.
This might be why there are issues if people are using an older pip package without those changes.
I was definitely using pypi latest, v0.3.0, as stated above.
I don't think so; the libhdfs3 conda package pulls from my repo, not from ContinuumIO's, for this reason: I could not get a build of libhdfs3 that worked reliably on the various systems, and that's why hdfs3 has remained unreleased since. It would be great, but apparently beyond my ability without considerable investment of time.
Do you get the same error if you use the unreleased version from github instead of pypi? By older, I meant the version that was out of sync with the C++ changes.
```shell
conda install -c conda-forge libhdfs3=2.3.0=1 hdfs3 --yes
```

What a mess!! I think it's very commendable that continuum tries to tackle these integration issues. Lack of progress in them is sad.
Sorry, @petacube: the build difficulties here are indeed what led to a stop in our efforts to get everything smooth. Arrow's hdfs does not suffer from them, as long as you have the java-native libraries present (which is normally true on any hdfs edge-node). For connection from outside the cluster, I would recommend my implementation of webhdfs or one of the other python implementations out there.
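For context on the webhdfs route: WebHDFS is a plain REST API served by the NameNode, so even a stdlib-only client can talk to it. A hedged sketch (the hostname is hypothetical, and the default NameNode HTTP port is 9870 on Hadoop 3, 50070 on Hadoop 2):

```python
def webhdfs_url(host, path, op, port=9870, user=None):
    """Build a WebHDFS v1 REST URL (e.g. op=LISTSTATUS, OPEN, GETFILESTATUS)."""
    url = f"http://{host}:{port}/webhdfs/v1{path}?op={op}"
    if user:
        url += f"&user.name={user}"
    return url

# Against a real cluster (hypothetical host; webhdfs must be enabled):
#   import json, urllib.request
#   with urllib.request.urlopen(
#           webhdfs_url("namenode.example.com", "/tmp", "LISTSTATUS")) as r:
#       print(json.load(r)["FileStatuses"]["FileStatus"])

print(webhdfs_url("namenode.example.com", "/tmp", "LISTSTATUS", user="alice"))
# → http://namenode.example.com:9870/webhdfs/v1/tmp?op=LISTSTATUS&user.name=alice
```

Because it is plain HTTP, none of the native libhdfs3 build problems apply; the trade-off is that data flows through the NameNode/proxy rather than directly from the data nodes.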
@martindurant I see...
Right, webhdfs needs to be enabled, and may need its own kerberos principal, etc. That's fairly common, though. You probably do not need direct access to the data nodes if a proxy has been suitably set up, but there are many complicated options there.
Can it be installed on ubuntu 18.04? If so, how? ty
@HarshaRagyari, have you tried pip? Generally I would recommend basing your whole python environment on
Hi,

I'm getting a segfault when trying to create a connection using the HDFileSystem constructor. The code that I'm running is: (with HOSTNAME and PORT-NUM filled in, of course.)

When I run the script through GDB (for debugging purposes), I get:

I installed hdfs3 via conda (`conda install -c conda-forge hdfs3`). Here are the package versions installed by conda-forge:

I don't think there's anything wrong with the Hadoop setup, since I'm still able to access it using `hadoop fs` commands. Any thoughts on what could be going wrong?