
pyarrow: Segmentation fault (core dumped) #1450

Closed
vgoklani opened this issue Dec 26, 2017 · 12 comments

vgoklani commented Dec 26, 2017

Getting a segmentation fault when importing Keras after pyarrow (see below).

Keras version = 2.1.2
tensorflow version = 1.4.1
pyarrow version = 0.8.0

edit #1: strangely enough, it runs fine if I switch the order of the imports (i.e. import keras before pyarrow)

edit #2: it also works fine if I drop pyarrow back to 0.7.1

Please advise

Python 3.6.3 |Anaconda, Inc.| (default, Nov 20 2017, 20:41:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pyarrow.parquet as pq

In [2]: import keras
Using TensorFlow backend.
Segmentation fault (core dumped)
root@336ad63c9866:~/src#
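
The import-order workaround from edit #1 can be sketched with a small helper. This is purely illustrative (the helper is not part of keras or pyarrow); the point is simply that whichever module is imported first wins the libstdc++ loading race:

```python
import importlib

def import_in_order(*names):
    """Import modules left to right, skipping any that are not installed.

    Importing keras (and thus tensorflow's newer libstdc++) before
    pyarrow avoids the crash described in this issue, e.g.:
        import_in_order("keras", "pyarrow.parquet")
    """
    loaded = []
    for name in names:
        try:
            loaded.append(importlib.import_module(name))
        except ImportError:
            pass  # module not installed; skip it
    return loaded
```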
wesm (Member) commented Dec 26, 2017

Could we have more information about your platform, and how you installed keras?

vgoklani (Author) commented Dec 26, 2017

Sure, it's running inside a Docker container. I've pasted the Dockerfile below:

Nothing special in the requirements.txt file, just the usual stuff. I don't pin the versions; I just install the latest via pip :)

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04

RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list

# Configure the build for our CUDA configuration.
ENV CI_BUILD_PYTHON python
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV TF_NEED_CUDA 1
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,5.2,6.0,6.1
ENV TF_CUDA_VERSION=9.0
ENV TF_CUDNN_VERSION=7

RUN apt-get update && apt-get install -y --no-install-recommends apt-utils
RUN apt-get install -y wget curl bzip2 unzip libcupti-dev vim gcc make supervisor git-core tmux tar python-software-properties tzdata

RUN apt-get clean

RUN ln -sf /usr/share/zoneinfo/US/Eastern /etc/localtime

RUN cd /tmp && \
    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
    /opt/conda/bin/conda update -y conda && \
    rm -f /tmp/Miniconda3-latest-Linux-x86_64.sh

ENV PATH /opt/conda/bin:$PATH
ENV KERAS_BACKEND tensorflow

COPY requirements.txt /root/requirements.txt
COPY conda_requirements.txt /root/conda_requirements.txt

RUN pip install -r /root/requirements.txt
RUN conda install --file /root/conda_requirements.txt -y

# fix Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL
RUN conda install -y mkl==11.3.3

RUN conda install pytorch torchvision cuda90 -c pytorch

EXPOSE 6006 8888
VOLUME ["/data"]
VOLUME ["/log"]
VOLUME ["/src"]
COPY jupyter_notebook_config.py /root/.jupyter/
WORKDIR /root/src
RUN jt -t oceans16 -f roboto -fs 12 -cellw 100%
RUN conda clean -tp -y

vgoklani (Author) commented:

Here is the requirements.txt file, if it's useful:

tensorflow-gpu
keras
h5py
gpustat
pip
hickle
pydot-ng
tqdm
jupyterthemes
nltk
gensim
jupyternotify
joblib
babel
keras-tqdm
pymongo
gym
networkx
pyarrow

vgoklani (Author) commented:

and the conda_requirements.txt:

jupyter
scikit-learn
seaborn
numpy
scipy
cython
pandas
ipython

xhochy (Member) commented Dec 28, 2017

I suspect this issue comes from the tensorflow pip wheel, which is compiled against a newer libstdc++ and libc. Importing tensorflow first loads the newer libstdc++ into memory; importing it last means Arrow loads the older one into memory first, and tensorflow then uses the older library's symbols.

You should be able to verify this by running the following. Please do; if it isn't the issue mentioned above, we may be able to fix it in a simpler fashion.

# ulimit -c unlimited
# python example.py
…
Using TensorFlow backend.
Segmentation fault (core dumped)
# gdb python core
gdb> thread apply all bt full
… (please attach this output to the issue)

(Note that instead of core the coredump might also be called core.XXXX on your system where XXXX is the PID of the faulty process.)

For now, ensuring the right import order is the easiest way around this. We probably use a C++ symbol in 0.8.0 that we were not using in 0.7.1.

There are two optimal solutions, though both are long-term approaches:

  • Introduce manylinux2 and manylinux3 standards for Python wheels on Linux. Tensorflow sadly needs features that are not available under the manylinux1 specification. With these new specs, Tensorflow and Arrow would each build packages for them; installing both built against the same spec should prevent the problem.
  • Better tensorflow packages for conda. Currently only CPU builds are available on conda-forge. If more builds were available, you could use one of them, as conda should be able to resolve the libstdc++ conflicts.
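
One way to check the mismatch described above is to compare the GLIBCXX symbol versions each extension module references. A minimal sketch, assuming the `strings` tool from binutils is installed and you know the paths to the relevant `.so` files (any path you pass is yours, not part of either package's documented layout):

```python
import re
import subprocess

def glibcxx_versions_in(text):
    """Extract the sorted set of GLIBCXX_x.y.z version tags from a string."""
    return sorted(set(re.findall(r"GLIBCXX_[0-9][0-9.]*", text)))

def glibcxx_versions(so_path):
    """Return the GLIBCXX versions a shared library references.

    Runs `strings` on the library and scans its output; if the tensorflow
    extension lists newer tags than the pyarrow one, the wheels were built
    against different libstdc++ versions.
    """
    out = subprocess.check_output(["strings", so_path]).decode()
    return glibcxx_versions_in(out)
```

For example, `glibcxx_versions(".../site-packages/pyarrow/lib.so")` versus the tensorflow extension module would show whether the two require different libstdc++ versions.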

wesm (Member) commented Dec 28, 2017

Should this issue be raised with TF? They claim to be producing manylinux1 wheels but I guess they are not using the standard manylinux1 image setup?

https://pypi.python.org/pypi/tensorflow-gpu

xhochy (Member) commented Dec 29, 2017

@wesm Yes, but I think this issue has already been raised somewhere.

wesm (Member) commented Jan 2, 2018

I'm looking at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/ci_build

It looks like (from what I can see, anyway) TensorFlow is not using the same compiler as the manylinux1 spec (devtoolset-2 gcc 4.8.5, I think), which would explain the quagmire we are in. So one possible workaround is to try to import tensorflow pre-emptively, if it's available, when importing pyarrow (so that its symbols get loaded first).
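
The pre-emptive import described above could look roughly like the following. This is a sketch of the idea only, not the actual pyarrow code; the function name is made up for illustration:

```python
def preload_tensorflow():
    """Try to load tensorflow before pyarrow's C++ extension.

    If tensorflow is installed, its (newer) libstdc++ symbols get loaded
    into the process first, so pyarrow's extension then resolves against
    them instead of the system's older ones. Returns True if tensorflow
    was loaded, False if it is not installed.
    """
    try:
        import tensorflow  # noqa: F401
        return True
    except ImportError:
        return False

# A package __init__ could call preload_tensorflow() before importing
# its own C++ extension module.
```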

xhochy (Member) commented Jan 2, 2018

👍 for doing this for Tensorflow, as we may get a lot of users who run into this issue. Otherwise, the long-term fix is to have manylinux2 and manylinux3 standards based on newer devtoolsets.

wesm (Member) commented Jan 2, 2018

wesm (Member) commented Jan 17, 2018

Since TensorFlow is using a non-standard compiler to make manylinux1 wheels, I'm not sure there's a good solution here except "import tensorflow before pyarrow".

wesm (Member) commented Jan 24, 2018

I resolved ARROW-1960 as "won't fix" for now.
