[Packaging][Wheel] Symlink libraries in wheels #21637

Closed
asfimport opened this issue Apr 10, 2019 · 4 comments

@asfimport

asfimport commented Apr 10, 2019

Libraries are copied instead of symlinked in the Linux and OSX wheels, which results in quite large binaries:

 
This is what the wheel contains before running auditwheel:
 

-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
-rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so
-rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
-rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so
-rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
-rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
-rwxr-xr-x  1 root root 2.4M Apr  3 09:02 lib.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so
-rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
-rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so
-rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
-rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so
-rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14
 

After running auditwheel, the repaired wheel contains:
 

-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
-rwxr-xr-x  1 root root 1.6M Apr  3 09:55 libarrow_python.so
-rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
-rwxr-xr-x  1 root root  12M Apr  3 09:55 libarrow.so
-rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
-rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
-rwxr-xr-x  1 root root 2.5M Apr  3 09:55 lib.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x  1 root root  59M Apr  3 09:55 libgandiva.so
-rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
-rwxr-xr-x  1 root root 3.5M Apr  3 09:55 libparquet.so
-rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
-rwxr-xr-x  1 root root 345K Apr  3 09:55 libplasma.so
-rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14
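
One way to check a build directory for this kind of duplication is to group files by content hash. The following is only a rough sketch: the function name `find_duplicate_files` is hypothetical and not part of the Arrow build. It only catches byte-identical copies, which is presumably the situation in the first listing; once auditwheel has rewritten some of the files, as the changed sizes and timestamps in the second listing suggest, the pairs no longer match exactly.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_files(directory):
    """Return groups of regular (non-symlink) files in `directory` with identical bytes.

    Hypothetical helper for illustration; not part of pyarrow's setup.py.
    """
    by_digest = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Symlinks are what we want to end up with, so they are not reported.
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        by_digest[digest].append(name)
    # Keep only the digests shared by more than one file.
    return [names for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    import sys

    for group in find_duplicate_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("identical copies:", ", ".join(group))
```

Each reported group is a set of files that could be one real file plus symlinks.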

 
Here is the output of auditwheel https://travis-ci.org/kszucs/crossbow/builds/514605723#L3340

They should be symlinks; we have special code for this:

arrow/python/setup.py

Lines 489 to 499 in 4495305

# Longest suffix library should be copied, all others symlinked
libs.sort(key=lambda s: -len(s))
print(libs, libs[0])
lib_filename = os.path.basename(libs[0])
shutil.move(pjoin(build_prefix, lib_filename),
            pjoin(build_lib, 'pyarrow', lib_filename))
for lib in libs[1:]:
    filename = os.path.basename(lib)
    link_name = pjoin(build_lib, 'pyarrow', filename)
    if not os.path.exists(link_name):
        os.symlink(lib_filename, link_name)

The symlinks are probably not making it into the wheel, because wheels are zip files and zip files don't support symlinks by default. So we probably need to pass the --symlinks parameter to the wheel code.
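
A minimal sketch of the suspected cause, assuming the archive is written with Python's standard `zipfile` module (this thread does not show how the wheel tooling actually writes its archive). `ZipFile.write` follows a symlink and stores the target's contents, so the link becomes a second full copy. The file names and the helper `zip_sizes_with_symlink` are made up for the illustration, and the script needs a platform where `os.symlink` works.

```python
import os
import tempfile
import zipfile


def zip_sizes_with_symlink(payload_size=1024):
    """Zip a file plus a symlink to it; return {archive name: stored size in bytes}."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical library names, mirroring the versioned/unversioned pairs above.
        real = os.path.join(tmp, "libexample.so.14")
        link = os.path.join(tmp, "libexample.so")
        with open(real, "wb") as f:
            f.write(b"\0" * payload_size)
        os.symlink("libexample.so.14", link)

        archive = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.write(real, arcname="libexample.so.14")
            # The link is followed here: the target's bytes are stored again.
            zf.write(link, arcname="libexample.so")

        with zipfile.ZipFile(archive) as zf:
            return {info.filename: info.file_size for info in zf.infolist()}


if __name__ == "__main__":
    for name, size in zip_sizes_with_symlink().items():
        print(name, size)
```

Both entries come out with the same uncompressed size, which matches the pattern in the listings above, where each unversioned library is as large as its versioned counterpart.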

Reporter: Krisztian Szucs / @kszucs
Assignee: Wes McKinney / @wesm

Related issues:

Note: This issue was originally created as ARROW-5158. Please see the migration documentation for further details.

@asfimport

Michael Marino:
Note that this is critical for applications using pyarrow in environments where space is limited. This is the case with e.g. AWS Lambda (https://docs.aws.amazon.com/lambda/latest/dg/limits.html), where the package size of all modules needs to be below 250 MB. With this bug, it is impossible to deploy pyarrow together with e.g. pandas/numpy, making it impossible to load parquet files in lambda functions.

@asfimport

Wes McKinney / @wesm:
You are welcome to submit pull requests to help.

My organization (Ursa Labs), for example, has no sponsors who use Lambda, and we don't use Lambda ourselves, so it's hard for us to volunteer our time to do packaging engineering on something we don't directly need. So we may work on this eventually, but it's hard to say when.

@asfimport

Michael Marino:
Hi Wes, thanks for the response.  Indeed, I understand the issue and that this isn't a critical part of the immediate timeline.  We currently work around this, and so it is not yet critical for us, but, especially with AWS pushing serverless for handling data workflows, I do expect this to become an issue for us and for others sometime soon. 

 

I have started looking at some possible solutions and will try to submit a PR here, but I would need some guidance on the external requirements of the package. Given the conversation about this here, it sounds like the libraries are packaged so as to be usable by other tools (e.g. pyspark?). If this is not the case, then I would focus on updating how the library is loaded from within pyarrow itself to handle the case where the library comes from within the wheel.

 

 

@asfimport

Wes McKinney / @wesm:
This was resolved by ARROW-5082
