Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-5868: [Python] Correctly remove liblz4 shared libraries from manylinux2010 image so lz4 is statically linked #4828

Closed
wants to merge 1 commit into from

Conversation

@wesm
Copy link
Member

wesm commented Jul 9, 2019

I pushed the image to Docker Hub (it was on quay.io before). I'm not sure what's the best way to test for this -- I checked the produced wheels locally to verify that liblz4.so is no longer required

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 9, 2019

Codecov Report

Merging #4828 into master will decrease coverage by 22.25%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #4828       +/-   ##
===========================================
- Coverage   87.43%   65.18%   -22.26%     
===========================================
  Files         997      487      -510     
  Lines      139804    64296    -75508     
  Branches     1418        0     -1418     
===========================================
- Hits       122241    41910    -80331     
- Misses      17201    22386     +5185     
+ Partials      362        0      -362
Impacted Files Coverage Δ
cpp/src/arrow/util/memory.h 0% <0%> (-100%) ⬇️
cpp/src/gandiva/date_utils.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/util/memory.cc 0% <0%> (-100%) ⬇️
cpp/src/arrow/filesystem/util-internal.cc 0% <0%> (-100%) ⬇️
cpp/src/arrow/util/sse-util.h 0% <0%> (-100%) ⬇️
cpp/src/gandiva/decimal_type_util.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/compute/logical_type.h 0% <0%> (-100%) ⬇️
cpp/src/parquet/hasher.h 0% <0%> (-100%) ⬇️
cpp/src/gandiva/basic_decimal_scalar.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/compute/kernels/boolean.cc 0% <0%> (-100%) ⬇️
... and 748 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90affbd...1fb1acb. Read the comment docs.

@pitrou

This comment has been minimized.

Copy link
Contributor

pitrou commented Jul 9, 2019

Hmm... auditwheel didn't complain? Perhaps worth reporting as a bug?

@kszucs

This comment has been minimized.

Copy link
Member

kszucs commented Jul 9, 2019

We can add more docker images to test the produced wheels. Have you reproduced the issue?

@kszucs

This comment has been minimized.

Copy link
Member

kszucs commented Jul 9, 2019

I cannot really find a linux distribution without lz4 preinstalled.

@kszucs

This comment has been minimized.

Copy link
Member

kszucs commented Jul 9, 2019

So we statically link liblz4 in the manylinux1 wheels

# ldd pyarrow-manylinux1/libarrow.so.14 | grep z
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc28cef4000)

but dynamically in the manylinux2010 wheels

# ldd pyarrow-manylinux2010/libarrow.so.14 | grep z
        liblz4.so.1 => not found  (already deleted to reproduce the issue)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f56f7440000)

this what this PR resolves.

What I'm finding strange, that auditwheel seems to bundle libz for manylinux1:

# ls -lah pyarrow-manylinux1/*z*so.*
-rwxr-xr-x 1 root root 115K Jun 29 00:14 pyarrow-manylinux1/libz-7f57503f.so.1.2.11

while ldd still uses the system libz:

# ldd pyarrow-manylinux1/libarrow.so.14 | grep z
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f91fcf3f000)

For manylinux2010 we also have liblz4:

#  ls -lah pyarrow-manylinux2010/*z*so.*
-rwxr-xr-x 1 root root 191K Jun 28 23:38 pyarrow-manylinux2010/liblz4-8cb8bdde.so.1.8.3
-rwxr-xr-x 1 root root 115K Jun 28 23:38 pyarrow-manylinux2010/libz-c69b9943.so.1.2.11

and ldd similarly tries to load the system libs:

# ldd pyarrow-manylinux2010/libarrow.so.14 | grep z
        liblz4.so.1 => not found
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd72764e000)

Inspecting manylinux1 with LD_DEBUG=files,libs ldd libarrow.so.14 it seems like to search the right path, but cannot find the hashed version of libz libz-7f57503f.so.1.2.11

       463:     file=libz.so.1 [0];  needed by ./libarrow.so.14 [0]
       463:     find library=libz.so.1 [0]; searching
       463:      search path=/tmp/pyarrow-manylinux1/.          (RPATH from file ./libarrow.so.14)
       463:       trying file=/tmp/pyarrow-manylinux1/./libz.so.1
       463:      search cache=/etc/ld.so.cache
       463:       trying file=/lib/x86_64-linux-gnu/libz.so.1

There is no libz.so.1 just libz-7f57503f.so.1.2.11.

Similarly for manylinux2010 and libz:

       470:     file=libz.so.1 [0];  needed by ./libarrow.so.14 [0]
       470:     find library=libz.so.1 [0]; searching
       470:      search path=/tmp/pyarrow-manylinux2010/.               (RPATH from file ./libarrow.so.14)
       470:       trying file=/tmp/pyarrow-manylinux2010/./libz.so.1
       470:      search cache=/etc/ld.so.cache
       470:       trying file=/lib/x86_64-linux-gnu/libz.so.1

for liblz4 (again, I've deleted the system one):

       470:     file=liblz4.so.1 [0];  needed by ./libarrow.so.14 [0]
       470:     find library=liblz4.so.1 [0]; searching
       470:      search path=/tmp/pyarrow-manylinux2010/.               (RPATH from file ./libarrow.so.14)
       470:       trying file=/tmp/pyarrow-manylinux2010/./liblz4.so.1
       470:      search cache=/etc/ld.so.cache
       470:      search path=/lib/x86_64-linux-gnu/tls/x86_64:/lib/x86_64-linux-gnu/tls:/lib/x86_64-linux-gnu/x86_64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/tls/x86_64:/usr/lib/x86_64-linux-gnu/tls:/usr/lib/x86_64-linux-gnu/x86_6$
:/usr/lib/x86_64-linux-gnu:/lib/tls/x86_64:/lib/tls:/lib/x86_64:/lib:/usr/lib/tls/x86_64:/usr/lib/tls:/usr/lib/x86_64:/usr/lib          (system search path)

There are no libz.so.1 nor liblz4.so.1, just libz-c69b9943.so.1.2.11 and liblz4-8cb8bdde.so.1.8.3

According to https://www.python.org/dev/peps/pep-0571/ liblz4 nor libz are part of the whitelist, and while these are bundled with the wheel, seemingly cannot be found - perhaps because of the hash in the library name?

I've tried to inspect the wheels with auditwheel show with version 2 and 1.10, both says the following:

# auditwheel show pyarrow-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl

pyarrow-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl is consistent with
the following platform tag: "linux_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libgcc_s.so.1 with versions {'GCC_3.3',
'GCC_3.4', 'GCC_3.0'}, libpthread.so.0 with versions {'GLIBC_2.3.3',
'GLIBC_2.12', 'GLIBC_2.2.5', 'GLIBC_2.3.2'}, libc.so.6 with versions
{'GLIBC_2.4', 'GLIBC_2.6', 'GLIBC_2.2.5', 'GLIBC_2.7', 'GLIBC_2.3.4',
'GLIBC_2.3.2', 'GLIBC_2.3'}, libstdc++.so.6 with versions
{'CXXABI_1.3', 'GLIBCXX_3.4.10', 'GLIBCXX_3.4.9', 'GLIBCXX_3.4.11',
'GLIBCXX_3.4.5', 'GLIBCXX_3.4', 'CXXABI_1.3.2', 'CXXABI_1.3.3'},
librt.so.1 with versions {'GLIBC_2.2.5'}, libm.so.6 with versions
{'GLIBC_2.2.5'}, libdl.so.2 with versions {'GLIBC_2.2.5'}, libz.so.1
with versions {'ZLIB_1.2.0'}

This constrains the platform tag to "manylinux2010_x86_64". In order
to achieve a more compatible tag, you would need to recompile a new
wheel from source on a system with earlier versions of these
libraries, such as a recent manylinux image.
# auditwheel show pyarrow-0.14.0-cp37-cp37m-manylinux1_x86_64.whl

pyarrow-0.14.0-cp37-cp37m-manylinux1_x86_64.whl is consistent with the
following platform tag: "linux_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libgcc_s.so.1 with versions {'GCC_3.4',
'GCC_3.0', 'GCC_3.3'}, libc.so.6 with versions {'GLIBC_2.3',
'GLIBC_2.2.5', 'GLIBC_2.3.4', 'GLIBC_2.4', 'GLIBC_2.3.2'},
libstdc++.so.6 with versions {'CXXABI_1.3', 'GLIBCXX_3.4.5',
'GLIBCXX_3.4'}, librt.so.1 with versions {'GLIBC_2.2.5'}, libm.so.6
with versions {'GLIBC_2.2.5'}, libpthread.so.0 with versions
{'GLIBC_2.3.3', 'GLIBC_2.3.2', 'GLIBC_2.2.5'}, libdl.so.2 with
versions {'GLIBC_2.2.5'}, libz.so.1 with versions {'ZLIB_1.2.0'}

The following external shared libraries are required by the wheel:
{
    "libc.so.6": "/lib/x86_64-linux-gnu/libc-2.24.so",
    "libcrypt.so.1": "/lib/x86_64-linux-gnu/libcrypt-2.24.so",
    "libdl.so.2": "/lib/x86_64-linux-gnu/libdl-2.24.so",
    "libgcc_s.so.1": "/lib/x86_64-linux-gnu/libgcc_s.so.1",
    "libm.so.6": "/lib/x86_64-linux-gnu/libm-2.24.so",
    "libnsl.so.1": "/lib/x86_64-linux-gnu/libnsl-2.24.so",
    "libpthread.so.0": "/lib/x86_64-linux-gnu/libpthread-2.24.so",
    "librt.so.1": "/lib/x86_64-linux-gnu/librt-2.24.so",
    "libstdc++.so.6": "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22",
    "libutil.so.1": "/lib/x86_64-linux-gnu/libutil-2.24.so",
    "libz.so.1": "/lib/x86_64-linux-gnu/libz.so.1.2.8"
}

In order to achieve the tag platform tag "manylinux2010_x86_64" the
following shared library dependencies will need to be eliminated:

libz.so.1

In order to achieve the tag platform tag "manylinux1_x86_64" the
following shared library dependencies will need to be eliminated:

libz.so.1

I think there are more todo left with the wheels. IMO the manylinux1 wheels are not compliant because of libz and the manylinux2010 wheels are not compliant because of both libz and liblz4 (but incorrectly reported by auditwheel?).

@kszucs

This comment has been minimized.

Copy link
Member

kszucs commented Jul 9, 2019

Perhaps we should use setup.py::move_shared_libs for libz on linux too? cc @xhochy

@wesm

This comment has been minimized.

Copy link
Member Author

wesm commented Jul 9, 2019

Can you open a new JIRA about the libz issue with the information from this issue and we can investigate separately? I'm going to merge this for now.

+1

@wesm

This comment has been minimized.

Copy link
Member Author

wesm commented Jul 9, 2019

Do we know why auditwheel did not raise either the liblz4 or libz issue before?

@kszucs

This comment has been minimized.

Copy link
Member

kszucs commented Jul 9, 2019

Sure, opening it.

@wesm wesm closed this in 7838886 Jul 9, 2019
@kszucs

This comment has been minimized.

Copy link
Member

kszucs commented Jul 9, 2019

No idea, we only run auditwheel repair though.

@wesm

This comment has been minimized.

Copy link
Member Author

wesm commented Jul 9, 2019

I see. We should do something about that then

@wesm wesm deleted the wesm:ARROW-5868 branch Jul 9, 2019
@kszucs

This comment has been minimized.

Copy link
Member

kszucs commented Jul 9, 2019

In case of manylinux2010 the report seems faulty, but adding it to the issue.

@xhochy

This comment has been minimized.

Copy link
Member

xhochy commented Jul 9, 2019

auditwheel repair should package libz.so into the wheel. Only after the repair it should be consistent.

wesm added a commit that referenced this pull request Jul 13, 2019
…nylinux2010 image so lz4 is statically linked

I pushed the image to Docker Hub (it was on quay.io before). I'm not sure what's the best way to test for this -- I checked the produced wheels locally to verify that liblz4.so is no longer required

Author: Wes McKinney <wesm+git@apache.org>

Closes #4828 from wesm/ARROW-5868 and squashes the following commits:

1fb1acb <Wes McKinney> Remove liblz4 shared libraries from /usr/local so static linking occurs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.