
pyarrow: Segmentation fault (core dumped) #1450

Closed
vgoklani opened this issue Dec 26, 2017 · 12 comments

vgoklani commented Dec 26, 2017

Getting a segmentation fault when importing Keras after pyarrow (see below).

Keras version = 2.1.2
tensorflow version = 1.4.1
pyarrow version = 0.8.0

edit #1: strangely enough, it runs fine if I switch the order of the imports (i.e. import keras before pyarrow)

edit #2: it also works fine if I drop pyarrow back to 0.7.1

Please advise

Python 3.6.3 |Anaconda, Inc.| (default, Nov 20 2017, 20:41:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pyarrow.parquet as pq

In [2]: import keras
Using TensorFlow backend.
Segmentation fault (core dumped)
root@336ad63c9866:~/src#
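
The import-order workaround from edit #1 can be sketched with a small helper. This is purely illustrative (the helper is not part of keras or pyarrow); the point is simply that whichever module is imported first wins the libstdc++ loading race:

```python
import importlib

def import_in_order(*names):
    """Import modules left to right, skipping any that are not installed.

    Importing keras (and thus tensorflow's newer libstdc++) before
    pyarrow avoids the crash described in this issue, e.g.:
        import_in_order("keras", "pyarrow.parquet")
    """
    loaded = []
    for name in names:
        try:
            loaded.append(importlib.import_module(name))
        except ImportError:
            pass  # module not installed; skip it
    return loaded
```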
wesm (Member) commented Dec 26, 2017

Could we have more information about your platform, and how you installed keras?

vgoklani (Author) commented Dec 26, 2017

Sure, it's running inside a Docker container. I've pasted the Dockerfile below:

Nothing special in the requirements.txt file, just the usual stuff. I don't pin the versions; I just install the latest via pip :)

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04

RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list

# Configure the build for our CUDA configuration.
ENV CI_BUILD_PYTHON python
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV TF_NEED_CUDA 1
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,5.2,6.0,6.1
ENV TF_CUDA_VERSION=9.0
ENV TF_CUDNN_VERSION=7

RUN apt-get update && apt-get install -y --no-install-recommends apt-utils
RUN apt-get install -y wget curl bzip2 unzip libcupti-dev vim gcc make supervisor git-core tmux tar python-software-properties tzdata

RUN apt-get clean

RUN ln -sf /usr/share/zoneinfo/US/Eastern /etc/localtime

RUN cd /tmp && \
    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
    /opt/conda/bin/conda update -y conda && \
    rm -f /tmp/Miniconda3-latest-Linux-x86_64.sh

ENV PATH /opt/conda/bin:$PATH
ENV KERAS_BACKEND tensorflow

COPY requirements.txt /root/requirements.txt
COPY conda_requirements.txt /root/conda_requirements.txt

RUN pip install -r /root/requirements.txt
RUN conda install --file /root/conda_requirements.txt -y

# fix Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL
RUN conda install -y mkl==11.3.3

RUN conda install pytorch torchvision cuda90 -c pytorch

EXPOSE 6006 8888
VOLUME ["/data"]
VOLUME ["/log"]
VOLUME ["/src"]
COPY jupyter_notebook_config.py /root/.jupyter/
WORKDIR /root/src
RUN jt -t oceans16 -f roboto -fs 12 -cellw 100%
RUN conda clean -tp -y

vgoklani (Author) commented:

Here is the requirements.txt file, if it's useful:

tensorflow-gpu
keras
h5py
gpustat
pip
hickle
pydot-ng
tqdm
jupyterthemes
nltk
gensim
jupyternotify
joblib
babel
keras-tqdm
pymongo
gym
networkx
pyarrow

vgoklani (Author) commented:

and the conda_requirements.txt:

jupyter
scikit-learn
seaborn
numpy
scipy
cython
pandas
ipython

xhochy (Member) commented Dec 28, 2017

I suspect this issue comes from the tensorflow pip wheel, which is compiled against a newer libstdc++ and libc. Importing tensorflow first loads the newer libstdc++ into memory; importing it last means Arrow loads the older one into memory first, and tensorflow then uses the older library's symbols.

You should be able to verify this by running the following. Please do; if it isn't the issue mentioned above, we may be able to fix it in a simpler fashion.

# ulimit -c unlimited
# python example.py
…
Using TensorFlow backend.
Segmentation fault (core dumped)
# gdb python core
gdb> thread apply all bt full
… (please attach this output to the issue)

(Note that instead of core the coredump might also be called core.XXXX on your system where XXXX is the PID of the faulty process.)

For now, ensuring the right import order is the easiest way around this. We probably use a C++ symbol in 0.8.0 that we were not using in 0.7.1.

There are two optimal solutions, though both are long-term approaches:

  • Introduce manylinux2 and manylinux3 standards for Python wheels on Linux. Tensorflow sadly needs features that are not available under the manylinux1 specification. With these new specs, Tensorflow and Arrow would each build packages for them; installing both built against the same spec should prevent the problem.
  • Better tensorflow packages for conda. Currently only CPU builds are available on conda-forge. If more builds were available, you could use one of them, as conda should be able to resolve the libstdc++ conflicts.
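
One way to check the mismatch described above is to compare the GLIBCXX symbol versions each extension module references. A minimal sketch, assuming the `strings` tool from binutils is installed and you know the paths to the relevant `.so` files (any path you pass is yours, not part of either package's documented layout):

```python
import re
import subprocess

def glibcxx_versions_in(text):
    """Extract the sorted set of GLIBCXX_x.y.z version tags from a string."""
    return sorted(set(re.findall(r"GLIBCXX_[0-9][0-9.]*", text)))

def glibcxx_versions(so_path):
    """Return the GLIBCXX versions a shared library references.

    Runs `strings` on the library and scans its output; if the tensorflow
    extension lists newer tags than the pyarrow one, the wheels were built
    against different libstdc++ versions.
    """
    out = subprocess.check_output(["strings", so_path]).decode()
    return glibcxx_versions_in(out)
```

For example, `glibcxx_versions(".../site-packages/pyarrow/lib.so")` versus the tensorflow extension module would show whether the two require different libstdc++ versions.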

wesm (Member) commented Dec 28, 2017

Should this issue be raised with TF? They claim to be producing manylinux1 wheels but I guess they are not using the standard manylinux1 image setup?

https://pypi.python.org/pypi/tensorflow-gpu

xhochy (Member) commented Dec 29, 2017

@wesm Yes, but I think this issue has already been raised somewhere.

wesm (Member) commented Jan 2, 2018

I'm looking at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/ci_build

It looks like (from what I can see, anyway) TensorFlow is not using the same compiler as the manylinux1 spec (devtoolset-2 gcc 4.8.5, I think), which would explain the quagmire we are in. So one possible workaround is to try to import tensorflow pre-emptively, if it's available, when importing pyarrow (so that its symbols get loaded first).
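
The pre-emptive import described above could look roughly like the following. This is a sketch of the idea only, not the actual pyarrow code; the function name is made up for illustration:

```python
def preload_tensorflow():
    """Try to load tensorflow before pyarrow's C++ extension.

    If tensorflow is installed, its (newer) libstdc++ symbols get loaded
    into the process first, so pyarrow's extension then resolves against
    them instead of the system's older ones. Returns True if tensorflow
    was loaded, False if it is not installed.
    """
    try:
        import tensorflow  # noqa: F401
        return True
    except ImportError:
        return False

# A package __init__ could call preload_tensorflow() before importing
# its own C++ extension module.
```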

xhochy (Member) commented Jan 2, 2018

👍 for doing this for Tensorflow, as we may get a lot of users who run into this issue. Otherwise, the long-term fix is to have manylinux2 and manylinux3 standards based on newer devtoolsets.

wesm (Member) commented Jan 2, 2018

wesm (Member) commented Jan 17, 2018

Since TensorFlow is using a non-standard compiler to make manylinux1 wheels, I'm not sure there's a good solution here except "import tensorflow before pyarrow".

wesm (Member) commented Jan 24, 2018

I resolved ARROW-1960 as "won't fix" for now.
