Loading GDAL (import fiona) triggers Arrow to use the wrong zstd decompression and crash #592

Closed
jorisvandenbossche opened this issue Dec 31, 2021 · 6 comments

jorisvandenbossche commented Dec 31, 2021

Copying over the notes from geopandas/geopandas#2278 (comment) about how to reproduce this. I can reproduce the issue with the Python packages fiona and pyarrow, but I am opening it here because I assume it is related to their underlying C++ packages instead (not fully sure though; maybe I should have opened it in the zstd or arrow-cpp feedstock instead?).

Reproducer

Create an environment with mamba create -n test python=3.9 pyarrow fiona. Currently this gives the following conda list:

$ conda list
# packages in environment at /home/joris/miniconda3/envs/test:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
arrow-cpp                 6.0.1           py39h01fd06f_7_cpu    conda-forge
attrs                     21.4.0             pyhd8ed1ab_0    conda-forge
aws-c-cal                 0.5.11               h95a6274_0    conda-forge
aws-c-common              0.6.2                h7f98852_0    conda-forge
aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
aws-c-io                  0.10.5               hfb6a706_0    conda-forge
aws-checksums             0.1.11               ha31a3da_7    conda-forge
aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
blosc                     1.21.0               h9c3ff4c_0    conda-forge
boost-cpp                 1.74.0               h359cf19_5    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
cairo                     1.16.0            ha00ac49_1009    conda-forge
cfitsio                   4.0.0                h9a35b8e_0    conda-forge
click                     8.0.3            py39hf3d152e_1    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.7.2              pyhd8ed1ab_1    conda-forge
curl                      7.80.0               h2574ce0_0    conda-forge
expat                     2.4.2                h9c3ff4c_0    conda-forge
fiona                     1.8.20           py39hc5a795b_4    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
freexl                    1.0.6                h7f98852_0    conda-forge
gdal                      3.4.0           py39h0494519_12    conda-forge
geos                      3.10.1               h9c3ff4c_1    conda-forge
geotiff                   1.7.0                h90a4e78_5    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
glog                      0.5.0                h48cff8f_0    conda-forge
grpc-cpp                  1.42.0               ha1441d3_1    conda-forge
hdf4                      4.2.15               h10796ff_3    conda-forge
hdf5                      1.12.1          nompi_h2750804_103    conda-forge
icu                       69.1                 h9c3ff4c_0    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
json-c                    0.15                 h98cffda_0    conda-forge
kealib                    1.4.14               h87e4c3c_3    conda-forge
krb5                      1.19.2               hcc1bbae_3    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libblas                   3.9.0           12_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h7f98852_6    conda-forge
libbrotlidec              1.0.9                h7f98852_6    conda-forge
libbrotlienc              1.0.9                h7f98852_6    conda-forge
libcblas                  3.9.0           12_linux64_openblas    conda-forge
libcurl                   7.80.0               h2574ce0_0    conda-forge
libdap4                   3.20.6               hd7c4107_2    conda-forge
libdeflate                1.8                  h7f98852_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h9b69904_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgdal                   3.4.0               hbe510e8_12    conda-forge
libgfortran-ng            11.2.0              h69a702a_11    conda-forge
libgfortran5              11.2.0              h5c6108e_11    conda-forge
libglib                   2.70.2               h174f98d_1    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libkml                    1.3.0             h238a007_1014    conda-forge
liblapack                 3.9.0           12_linux64_openblas    conda-forge
libnetcdf                 4.8.1           nompi_hb3fd0d9_101    conda-forge
libnghttp2                1.43.0               h812cca2_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.18          pthreads_h8fe5266_0    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     14.1                 hd57d9b9_1    conda-forge
libprotobuf               3.19.1               h780b84a_0    conda-forge
librttopo                 1.1.0                h30b5eef_8    conda-forge
libspatialite             5.0.1               hf3ee066_12    conda-forge
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge
libthrift                 0.15.0               he6d91bd_1    conda-forge
libtiff                   4.3.0                h6f004c6_2    conda-forge
libutf8proc               2.7.0                h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.1                h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxml2                   2.9.12               h885dcf4_1    conda-forge
libzip                    1.8.0                h4de3113_1    conda-forge
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
munch                     2.5.0                      py_0    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
nspr                      4.32                 h9c3ff4c_1    conda-forge
nss                       3.73                 hb5efdd6_0    conda-forge
numpy                     1.21.5           py39haac66dc_0    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1l               h7f98852_0    conda-forge
orc                       1.7.2                h1be678f_0    conda-forge
parquet-cpp               1.5.1                         1    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
poppler                   21.11.0              ha39eefc_0    conda-forge
poppler-data              0.4.11               hd8ed1ab_0    conda-forge
postgresql                14.1                 h2510834_1    conda-forge
proj                      8.2.0                h277dcde_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyarrow                   6.0.1           py39hff6fa39_7_cpu    conda-forge
python                    3.9.9           h62f1059_0_cpython    conda-forge
python_abi                3.9                      2_cp39    conda-forge
re2                       2021.11.01           h9c3ff4c_0    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
s2n                       1.0.10               h9b69904_0    conda-forge
setuptools                60.2.0           py39hf3d152e_0    conda-forge
shapely                   1.8.0            py39h98e2149_4    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.8                he1b5a44_3    conda-forge
sqlite                    3.37.0               h9cd32fc_0    conda-forge
tiledb                    2.5.3                h2038895_0    conda-forge
tk                        8.6.11               h27826a3_1    conda-forge
tzcode                    2021e                h7f98852_0    conda-forge
tzdata                    2021e                he74cb21_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xerces-c                  3.2.3                h8ce2273_4    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h36c2ea0_1013    conda-forge
zstd                      1.5.1                ha95c52a_0    conda-forge

And then the following works:

>>> import pyarrow as pa
>>> from pyarrow import feather
>>> import numpy as np
>>> table = pa.table({'a': np.random.randn(100000)})
>>> feather.write_feather(table, "test.feather", compression="zstd")
>>> feather.read_table("test.feather")
pyarrow.Table
a: double
----
a: [[-0.46684274443294194,-0.3868599077722866,1.9091371836014668,...]

but adding an import fiona in front results in a segfault:

>>> import fiona
>>> import pyarrow as pa
>>> from pyarrow import feather
>>> import numpy as np
>>> table = pa.table({'a': np.random.randn(100000)})
>>> feather.write_feather(table, "test.feather", compression="zstd")
>>> feather.read_table("test.feather")
Segmentation fault (core dumped)
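
Because the crash kills the interpreter, one way to compare the two variants side by side is to run each in a child process and look at the exit status. The following is a minimal sketch and not part of the original report: the helper names run_snippet and describe are hypothetical, and it assumes a POSIX system, where subprocess reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys

# The reproducer from above, without the optional fiona import.
REPRO = """
import pyarrow as pa
from pyarrow import feather
import numpy as np
table = pa.table({'a': np.random.randn(100000)})
feather.write_feather(table, "test.feather", compression="zstd")
feather.read_table("test.feather")
"""


def run_snippet(code):
    """Run code in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def describe(returncode):
    """Turn a subprocess return code into a short readable string."""
    if returncode < 0:
        # On POSIX, death by signal is reported as -signum.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


if __name__ == "__main__":
    variants = [("without fiona", ""), ("fiona imported first", "import fiona\n")]
    for label, prefix in variants:
        print(label, "->", describe(run_snippet(prefix + REPRO)))
```

In an affected environment the second variant would be expected to report a signal rather than a normal exit; what it prints elsewhere depends on the installed packages.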

Observations

Getting a backtrace with gdb (putting the above code snippet in a file and running it with gdb) gives:

(gdb) bt
#0  0x00007ffff3bc4994 in HUF_decompress4X1_usingDTable_internal_default () from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/fiona/../../.././libblosc.so.1
#1  0x00007ffff3bd4bda in ZSTD_decompressBlock_internal () from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/fiona/../../.././libblosc.so.1
#2  0x00007ffff3bce4c0 in ZSTD_decompressMultiFrame () from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/fiona/../../.././libblosc.so.1
#3  0x00007ffff3bce7a0 in ZSTD_decompress_usingDDict () from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/fiona/../../.././libblosc.so.1
#4  0x00007ffff3adc829 in ZSTD_decompress () from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/fiona/../../.././libzstd.so.1
#5  0x00007fffee144ea3 in arrow::util::internal::(anonymous namespace)::ZSTDCodec::Decompress(long, unsigned char const*, long, unsigned char*) ()
   from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/pyarrow/../../../libarrow.so.600
#6  0x00007fffeebbac0e in arrow::ipc::DecompressBuffer(std::shared_ptr<arrow::Buffer> const&, arrow::ipc::IpcReadOptions const&, arrow::util::Codec*) ()
   from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/pyarrow/../../../libarrow.so.600
#7  0x00007fffeebc3737 in arrow::internal::FnOnce<void ()>::FnImpl<std::_Bind<arrow::detail::ContinueFuture (arrow::Future<arrow::internal::Empty>, arrow::ipc::DecompressBuffers(arrow::Compression::type, arrow::ipc::IpcReadOptions const&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >*)::{lambda(int)#1}, int)> >::invoke() ()
   from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/pyarrow/../../../libarrow.so.600
#8  0x00007fffee10ba17 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::{lambda()#1}> > >::_M_run() ()
   from /home/joris/miniconda3/envs/test2/lib/python3.9/site-packages/pyarrow/../../../libarrow.so.600
#9  0x00007ffff00be9d4 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#10 0x00007ffff7f95609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007ffff7d57293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

This indicates that Arrow is somehow using a libzstd located relative to the location of fiona, although I suppose that should point to the same libzstd that Arrow would otherwise use.

But additional notes:

  • If I run the version without importing fiona under gdb and set a breakpoint in the ZSTD_decompress code, Arrow appears to use the ZSTD decompression functions entirely from libzstd. When fiona is imported first, it instead uses the lower-level decompression functions from libblosc (as shown in the backtrace above).
  • This only started failing relatively recently: a build from a month ago on the PR where we discovered this was still working. One difference I spotted between the working and failing environments is that blosc is now included in the failing one, and it seems that blosc was only recently added as a dependency of GDAL: Add (lib)blosc as a dependency #568
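
A quick way to confirm whether libblosc ends up in the process at all is to list the shared objects mapped after the imports. This is a rough sketch under stated assumptions, not something from the original investigation: it is Linux-only (it reads /proc/self/maps), the function names are hypothetical, and it only shows which libraries are loaded, not which one a given ZSTD symbol resolves to.

```python
import os


def libraries_in_maps(lines, substrings):
    """Return sorted paths of mapped files whose basename contains a substring.

    `lines` is an iterable of lines in /proc/<pid>/maps format.
    """
    found = set()
    for line in lines:
        fields = line.split(None, 5)
        # The sixth field, when present, is the path of the mapped file.
        if len(fields) == 6:
            path = fields[5].strip()
            name = os.path.basename(path)
            if any(s in name for s in substrings):
                found.add(path)
    return sorted(found)


def loaded_libraries(substrings=("zstd", "blosc")):
    """List matching libraries mapped into the current process (Linux only)."""
    try:
        with open("/proc/self/maps") as maps:
            return libraries_in_maps(maps, substrings)
    except OSError:
        # No /proc/self/maps on this platform; report nothing.
        return []
```

Calling loaded_libraries() once after import pyarrow alone and once after import fiona followed by import pyarrow should make the difference between the two sessions visible.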

gillins commented Jan 1, 2022

So both libblosc and libzstd export the same function names? Is this a known problem? I couldn't find anything about it. This is bad, as it will be somewhat random which ones get picked up (as you've demonstrated). It seems this will be a problem for any software that links to both libraries...
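
One way to check for overlapping exports is to compare the dynamic symbol tables of the two libraries. The sketch below is only an illustration: it assumes GNU binutils nm is on PATH, the function names are hypothetical, and the ZSTD_ and HUF_ prefixes are simply taken from the function names in the backtrace above.

```python
import subprocess


def defined_symbols(nm_output, prefixes=("ZSTD_", "HUF_")):
    """Parse `nm -D --defined-only` output into a set of matching symbol names.

    `prefixes` must be a tuple, since it is passed to str.startswith.
    """
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # The last field is the symbol name; drop any "@VERSION" suffix.
        name = fields[-1].split("@", 1)[0]
        if name.startswith(prefixes):
            names.add(name)
    return names


def exported_by(library_path):
    """Return the matching symbols that a shared library defines and exports."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return defined_symbols(result.stdout)


def shared_exports(lib_a, lib_b):
    """Return the sorted symbol names exported by both libraries."""
    return sorted(exported_by(lib_a) & exported_by(lib_b))
```

For the environment in this issue, the two arguments would presumably be the libblosc.so.1 and libzstd.so.1 files in the environment's lib directory. A non-empty result would mean the dynamic linker can bind a caller to either copy, depending on load order.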

gillins commented Jan 1, 2022

Maybe libblosc should be built with -DPREFER_EXTERNAL_ZSTD=ON (see https://github.com/Blosc/c-blosc#codec-support-with-cmake) so there is only one copy of these functions in use? libblosc would need to depend on zstd (and maybe libz?) for this to work.

gillins commented Jan 4, 2022

@jorisvandenbossche are you going to submit a PR to https://github.com/conda-forge/blosc-feedstock for this? I see they have had an open issue about this since 2018 (conda-forge/blosc-feedstock#18).

@jorisvandenbossche
Member Author

@gillins sorry for not following up here.
It seems that conda-forge/blosc-feedstock#18 (conda-forge/blosc-feedstock#53) has been tackled now, and blosc no longer includes its own zstd but instead depends on the zstd conda-forge package.
So I assume that this issue should therefore also be solved.

@jorisvandenbossche
Member Author

In any case my reproducible example above is now working, so closing this.

gillins commented May 10, 2022

Thanks @jorisvandenbossche !
