This repository has been archived by the owner on Feb 10, 2021. It is now read-only.

Segfault When Initiating HDFS Connection #163

Open
lucashu1 opened this issue May 16, 2018 · 26 comments

@lucashu1

Hi,

I'm getting a segfault when trying to create a connection using the HDFileSystem constructor. The code that I'm running is:

from hdfs3 import HDFileSystem
hdfs = HDFileSystem([HOSTNAME], port=[PORT-NUM])

(with HOSTNAME and PORT-NUM filled in, of course.)

When I run the script through GDB (for debugging purposes), I get:

Starting program: /home/jovyan/.conda/bin/python hdfs-test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
strlen () at ../sysdeps/x86_64/strlen.S:106
106	../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) backtrace
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007ffff3ff83ee in std::char_traits<char>::length (__s=0x1 <error: Cannot access memory at address 0x1>)
   from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libhdfs3.so
#2  std::string::assign (__s=0x1 <error: Cannot access memory at address 0x1>, this=0x555555c27fa0)
    at /feedstock_root/build_artefacts/libhdfs3_1526441785297/work/libhdfs3/src/client/FileSystem.cpp:1131
#3  std::string::operator= (__s=0x1 <error: Cannot access memory at address 0x1>, this=0x555555c27fa0)
    at /feedstock_root/build_artefacts/libhdfs3_1526441785297/work/libhdfs3/src/client/FileSystem.cpp:555
#4  Hdfs::FileSystem::FileSystem (this=0x555555c27fa0, conf=..., euser=0x1 <error: Cannot access memory at address 0x1>)
    at /feedstock_root/build_artefacts/libhdfs3_1526441785297/work/libhdfs3/src/client/FileSystem.cpp:148
#5  0x00007ffff400f117 in hdfsBuilderConnect (bld=0x555555bf8c40, 
    effective_user=0x1 <error: Cannot access memory at address 0x1>)
   from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libhdfs3.so
#6  0x00007ffff6607ec0 in ffi_call_unix64 () from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libffi.so.6
#7  0x00007ffff660787d in ffi_call () from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libffi.so.6
#8  0x00007ffff681cdee in _ctypes_callproc ()
   from /home/jovyan/.conda/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#9  0x00007ffff681d825 in PyCFuncPtr_call ()
   from /home/jovyan/.conda/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#10 0x00005555556631bb in _PyObject_FastCallDict ()
#11 0x00005555556f0d3e in call_function ()
#12 0x000055555571519a in _PyEval_EvalFrameDefault ()
#13 0x00005555556ea7db in fast_function ()
#14 0x00005555556f0cc5 in call_function ()
#15 0x000055555571519a in _PyEval_EvalFrameDefault ()
#16 0x00005555556e99a6 in _PyEval_EvalCodeWithName ()
#17 0x00005555556eb108 in _PyFunction_FastCallDict ()
#18 0x000055555566339f in _PyObject_FastCallDict ()
#19 0x0000555555667ff3 in _PyObject_Call_Prepend ()
#20 0x0000555555662dde in PyObject_Call ()
#21 0x00005555556bdf6b in slot_tp_init ()
#22 0x00005555556f0f27 in type_call ()
#23 0x00005555556631bb in _PyObject_FastCallDict ()
#24 0x00005555556eacfa in _PyObject_FastCallKeywords ()
#25 0x00005555556f0d3e in call_function ()
#26 0x0000555555715eb1 in _PyEval_EvalFrameDefault ()
#27 0x00005555556eb529 in PyEval_EvalCodeEx ()
#28 0x00005555556ec2cc in PyEval_EvalCode ()
#29 0x0000555555768af4 in run_mod ()
#30 0x0000555555768ef1 in PyRun_FileExFlags ()
#31 0x00005555557690f4 in PyRun_SimpleFileExFlags ()
#32 0x000055555576cc28 in Py_Main ()
#33 0x000055555563471e in main ()

I installed hdfs3 via conda (conda install -c conda-forge hdfs3).

Here are the package versions installed by conda-forge:

    boost-cpp:    1.66.0-1         conda-forge
    bzip2:        1.0.6-1          conda-forge
    curl:         7.59.0-1         conda-forge
    hdfs3:        0.3.0-py36_0     conda-forge
    icu:          58.2-0           conda-forge
    krb5:         1.14.6-0         conda-forge
    libgcrypt:    1.8.2-hfc679d8_1 conda-forge
    libgpg-error: 1.31-hf484d3e_0  conda-forge
    libgsasl:     1.8.0-2          conda-forge
    libhdfs3:     2.3.0-2          conda-forge
    libiconv:     1.15-0           conda-forge
    libntlm:      1.4-1            conda-forge
    libprotobuf:  3.5.2-0          conda-forge
    libssh2:      1.8.0-2          conda-forge
    libuuid:      1.0.3-1          conda-forge
    libxml2:      2.9.8-0          conda-forge

I don't think there's anything wrong with the Hadoop setup, since I'm still able to access it using hadoop fs commands.

Any thoughts on what could be going wrong?

@martindurant
Member

libhdfs3-2.3.0-2 was just released; can you try with libhdfs3-2.3.0-1 and see if this is a new problem?
conda install -c conda-forge libhdfs3=2.3.0=1

@martindurant
Member

@sk1p , ideas?

@lucashu1
Author

@martindurant that worked! I ran this in a Jupyter notebook and things are working again: ! conda install -c conda-forge libhdfs3=2.3.0=1 hdfs3 --yes. Hopefully they're able to get that fixed before the issue pops up for more people.

@martindurant
Member

Thanks for testing, @lucashu1

@sk1p
Contributor

sk1p commented May 16, 2018

Thanks for tagging me. That's weird: effective_user=0x1 - do you have anything related to effective_user in your configuration file?

@lucashu1
Author

lucashu1 commented May 16, 2018

@sk1p Nope, not seeing any instances of effective_user in /etc/hadoop, which to my understanding should contain the Hadoop config files.

@martindurant
Member

If I remember correctly, effective_user arises in situations where you have logged in with a Kerberos principal other than your local username. Are you using Kerberos?
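
A purely illustrative sketch of stating the user explicitly (the effective_user keyword here is hypothetical, inferred from the self.effective_user attribute mentioned later in this thread; hostname and port are placeholders):

    from hdfs3 import HDFileSystem
    # effective_user is a hypothetical keyword: the HDFS-side username to
    # act as when it differs from your local login / Kerberos principal.
    hdfs = HDFileSystem('namenode-host', port=8020, effective_user='alice')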

@sk1p
Contributor

sk1p commented May 17, 2018

@martindurant did anything change w.r.t. the Kerberos library dependency versions in -2?

@martindurant
Member

Since the last release was 9 months ago, a whole load of versions have been updated, but the latest build was made with the older code and the newer dependency versions, so that is likely not the problem. I think there must still be bad code from the big merge ContinuumIO/libhdfs3-downstream#3, which is beyond my C++ ability. You'll notice that there were conversations around how to make that right, but it was never used in a conda build until now (the recipe still pointed to the concat branch).

@sk1p
Contributor

sk1p commented May 17, 2018

OK, that may be it. Sadly I couldn't reproduce it yet; I don't have any Kerberos stuff set up.

Maybe @bdrosen96 knows more?

@martindurant
Member

Presumably, yes, but getting this right for a variety of systems is not easy.

@eddy-geek

eddy-geek commented Jun 14, 2018

I got what looks like the same issue with a manual install, on master.

Here are the complete steps to build the latest libhdfs3 master (commit 41fd50d from May 7, PR #12) on Fedora 28 and reproduce this (I don't believe an actual Kerberos connection is needed, just a standard hdfs-client config with auth set to kerberos):

dnf install -y krb5-workstation protobuf-devel gtest-devel gmock-devel libcurl-devel libgsasl-devel libgsasl
git clone https://github.com/ContinuumIO/libhdfs3-downstream
cd libhdfs3-downstream/libhdfs3
mkdir build
cd build
../bootstrap --prefix=/usr/local
make -j5
sudo make install
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
pip3 install --user hdfs3  # 0.3.0
gdb python3   # at the (gdb) prompt, type: run
import hdfs3 ; f = hdfs3.HDFileSystem('some.net', port=8020)   # at the Python prompt

@eddy-geek

Thanks to @martindurant I was able to work around it by building the following libhdfs3 commit:

git checkout 7842951    # august
git cherry-pick 069d774 # gcc-7

(then the same steps as above; it should at least get you to the point where it tries to connect and fails with ERROR Failed to setup RPC connection to "some.net:8020")

@bdrosen96

I could not reproduce this when I used a debug build. That may mean there is an uninitialized variable.

@bdrosen96

It is probably important to be sure the Python hdfs3 code is using:

hdfsBuilderConnect.argtypes = [ct.POINTER(hdfsBuilder), ct.c_char_p]

I also could not reproduce this with a non-debug build. I noticed the connect call looks like:

fs = _lib.hdfsBuilderConnect(o)

instead of:

fs = _lib.hdfsBuilderConnect(o, ensure_bytes(self.effective_user))

so there might be an incompatibility between the Python lib and the C++ lib.
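
For context, a minimal sketch of the mismatch being described, using the names from the snippets above (treat it as an illustration under those assumptions, not the actual hdfs3 source):

    import ctypes as ct

    class hdfsBuilder(ct.Structure):
        pass  # opaque handle; the real fields live on the C++ side

    _lib = ct.CDLL('libhdfs3.so')  # assumes libhdfs3 is on the loader path
    _lib.hdfsNewBuilder.restype = ct.POINTER(hdfsBuilder)
    bld = _lib.hdfsNewBuilder()

    # Binding that matches the merged C++ code: two arguments, where None
    # becomes a NULL char* that the C++ side can check for.
    _lib.hdfsBuilderConnect.argtypes = [ct.POINTER(hdfsBuilder), ct.c_char_p]
    _lib.hdfsBuilderConnect.restype = ct.c_void_p
    fs = _lib.hdfsBuilderConnect(bld, None)

    # The old one-argument call, _lib.hdfsBuilderConnect(bld), leaves
    # effective_user as whatever sits in the second argument register
    # (the 0x1 in the backtrace), and strlen() on it then segfaults.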

@martindurant
Member

Here is the line on the Python side that has the extra argument: https://github.com/dask/hdfs3/blob/master/hdfs3/core.py#L147 , but this is not in the released conda/pip package.

@bdrosen96

That might be why there are issues if people are using an older pip package without those changes.

@eddy-geek

eddy-geek commented Jun 15, 2018 via email

@martindurant
Member

I don't think so; the libhdfs3 conda package pulls from my repo, not from ContinuumIO's, for this reason: I could not get a build of libhdfs3 that worked reliably on the various systems, and that's why hdfs3 has remained unreleased since. It would be great, but apparently it is beyond my ability without considerable investment of time.
Since Arrow's libhdfs implementation now does most of the job, there is less incentive to push on here.

@bdrosen96

Do you get the same error if you use the unreleased version from GitHub instead of PyPI? By older, I meant the version that was out of sync with the C++ changes.

@petacube

petacube commented Feb 6, 2019

conda install -c conda-forge libhdfs3=2.3.0=1 hdfs3 --yes

fails to install the proper version of libcrypto.so.1.0.0; this shows up in ldd $miniconda_home/lib/libhdfs3.so. I tried using libcrypto 1.1, but that leads to a segmentation fault.

What a mess! I had to find the libcrypto 1.0.0 library online and install it (conda install openssl=1.0.2). hdfs3 and libhdfs3 I had installed normally (conda install libhdfs3 hdfs3 -c conda-forge).
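
(For anyone wanting to script the same check, a small Python stand-in for that ldd inspection; the conda prefix lookup and fallback path are assumptions:)

    import os
    import subprocess

    # Resolve the active conda env; fall back to a common install path.
    prefix = os.environ.get('CONDA_PREFIX', os.path.expanduser('~/miniconda3'))
    out = subprocess.check_output(
        ['ldd', os.path.join(prefix, 'lib', 'libhdfs3.so')],
        universal_newlines=True)
    # Print only the libcrypto line(s) to see which version was resolved.
    print([line.strip() for line in out.splitlines() if 'libcrypto' in line])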

I think it's very commendable that Continuum tries to tackle these integration issues; the lack of progress on them is sad.

@martindurant
Member

Sorry, @petacube - the build difficulties here are indeed what led us to stop trying to get everything smooth. Arrow's hdfs does not suffer from them, so long as you have the Java-native libraries present (which is normally true on any HDFS edge node). For connecting from outside the cluster, I would recommend my implementation of webhdfs or one of the other Python implementations out there.
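
(A sketch of the Arrow route, assuming a pyarrow of that era with the Java-native libhdfs discoverable; host and port are placeholders:)

    import pyarrow as pa

    # Uses the JNI-based libhdfs under the hood, so JAVA_HOME and the
    # Hadoop native libraries must be available on this node.
    fs = pa.hdfs.connect('namenode-host', port=8020)
    print(fs.ls('/'))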

@petacube

petacube commented Feb 8, 2019

@martindurant I see...
So webhdfs requires an HDFS admin to enable/install the webhdfs interface on the HDFS machines, right? Or can it be used against a plain HDFS cluster without any changes?

@martindurant
Member

Right, webhdfs needs to be enabled, and may need its own Kerberos principal, etc. That's fairly common, though. You probably do not need direct access to the data nodes if a proxy has been suitably set up - but there are many complicated options there.
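
(For reference, a sketch using one of those other Python implementations, the hdfs package from PyPI; the URL, port, and user are placeholders, and the webhdfs port varies by Hadoop version:)

    from hdfs import InsecureClient

    # Talks to the namenode's HTTP interface, so no native libraries are
    # needed; on a kerberized cluster you would use a Kerberos-aware
    # client class instead of InsecureClient.
    client = InsecureClient('http://namenode-host:50070', user='alice')
    print(client.list('/'))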

@HarshaRagyari

Can it be installed on Ubuntu 18.04? If so, how? Thanks!

@martindurant
Member

@HarshaRagyari, have you tried pip? Generally I would recommend basing your whole Python environment on conda, but I understand there is a range of use cases. A Debian package called python-fsspec exists; that might be available on Ubuntu and might be recent enough for you - I don't know the details on that.
