```python
# Longest suffix library should be copied, all others symlinked
libs.sort(key=lambda s: -len(s))
print(libs, libs[0])
lib_filename = os.path.basename(libs[0])
shutil.move(pjoin(build_prefix, lib_filename),
            pjoin(build_lib, 'pyarrow', lib_filename))
for lib in libs[1:]:
    filename = os.path.basename(lib)
    link_name = pjoin(build_lib, 'pyarrow', filename)
    if not os.path.exists(link_name):
        os.symlink(lib_filename, link_name)
```
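The pattern in the snippet above (keep the most specific versioned name as the real file, symlink the shorter names to it) can be exercised in isolation. A minimal sketch with dummy files in a temporary directory; the library names and the `build_prefix`/`build_lib` layout are made up for illustration, and on Windows creating symlinks may require elevated privileges:

```python
import os
import shutil
import tempfile
from os.path import join as pjoin

# Hypothetical versioned shared-library names, as a Linux wheel might ship.
libs = ['libarrow.so', 'libarrow.so.13', 'libarrow.so.13.0.0']

build_prefix = tempfile.mkdtemp()
build_lib = tempfile.mkdtemp()
os.makedirs(pjoin(build_lib, 'pyarrow'))
for lib in libs:
    with open(pjoin(build_prefix, lib), 'w') as f:
        f.write('fake shared library')

# Longest (most specific) name becomes the real file; the rest are symlinks.
libs.sort(key=lambda s: -len(s))
lib_filename = os.path.basename(libs[0])
shutil.move(pjoin(build_prefix, lib_filename),
            pjoin(build_lib, 'pyarrow', lib_filename))
for lib in libs[1:]:
    filename = os.path.basename(lib)
    link_name = pjoin(build_lib, 'pyarrow', filename)
    if not os.path.exists(link_name):
        os.symlink(lib_filename, link_name)
```

After this runs, only `libarrow.so.13.0.0` holds the payload; `libarrow.so` and `libarrow.so.13` are relative symlinks pointing at it.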
This is probably not going into the wheel, as wheels are zip files and zip archives don't support symlinks by default. So we probably need to pass the `--symlinks` parameter to the wheel code.
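For context on why plain zips lose symlinks: the zip format can only record a symlink via the Unix mode bits in an entry's external attributes, and Python's `zipfile` (like many extraction tools) ignores those bits on extraction, writing the entry out as a regular file whose content is the link target. A small demonstration; the file names are made up:

```python
import os
import stat
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, 'demo.zip')

with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('libdemo.so.1.0.0', 'fake shared library')
    # Encode a symlink the way Unix zip tools do: S_IFLNK in the
    # high 16 bits of external_attr, with the body holding the target.
    info = zipfile.ZipInfo('libdemo.so')
    info.create_system = 3  # Unix
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, 'libdemo.so.1.0.0')

out = os.path.join(tmp, 'out')
with zipfile.ZipFile(archive) as zf:
    zf.extractall(out)

extracted = os.path.join(out, 'libdemo.so')
print(os.path.islink(extracted))  # False: extracted as a plain file
```

The round trip silently turns the symlink into a regular file containing the string `libdemo.so.1.0.0`, which is why wheel/installer support would be needed, not just writing the link entry.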
Michael Marino:
Note that this is critical for applications using pyarrow in environments where space is limited. This is the case with e.g. [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/limits.html), where the package size of all modules needs to be below 250 MB. With this bug, it is impossible to deploy pyarrow together with e.g. pandas/numpy, making it impossible to load Parquet files in Lambda functions.
Wes McKinney / @wesm:
You are welcome to submit pull requests to help.
My organization (Ursa Labs), for example, has no sponsors who use Lambda, and we don't use Lambda ourselves, so it's hard for us to volunteer our time to do packaging engineering on something we don't directly need. We may work on this eventually, but it's hard to say when.
Michael Marino:
Hi Wes, thanks for the response. Indeed, I understand the issue and that this isn't a critical part of the immediate timeline. We currently work around this, and so it is not yet critical for us, but, especially with AWS pushing serverless for handling data workflows, I do expect this to become an issue for us and for others sometime soon.
I personally have started looking at some possible solutions and will try to submit a PR here, but I would need some guidance as to the external requirements of the package. Given the conversation about this here, it sounds like the libraries are packaged in such a way so as to be usable by other tools (e.g. pyspark?). If this is not the case, then I would focus on trying to update how the library is loaded from within pyarrow itself to handle the case when the library is coming from within the wheel.
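One part of the approach sketched in the comment above, handling the case where the library comes from within the wheel, amounts to resolving the bundled shared library relative to the package directory. A hedged sketch of that resolution step; `find_bundled_lib` and the `libarrow` naming are hypothetical and this is not pyarrow's actual loader:

```python
import os
import tempfile

def find_bundled_lib(package_dir, stem='libarrow'):
    """Return the path of the shared library matching *stem* inside
    *package_dir*, preferring the most specific (versioned) name.
    Hypothetical helper for illustration only."""
    candidates = sorted(
        (name for name in os.listdir(package_dir)
         if name.startswith(stem + '.so')),
        key=len, reverse=True)
    for name in candidates:
        path = os.path.join(package_dir, name)
        if os.path.isfile(path):  # follows symlinks, so links work too
            return path
    return None

# Demo with a dummy package directory holding one versioned library.
pkg = tempfile.mkdtemp()
open(os.path.join(pkg, 'libarrow.so.13.0.0'), 'w').close()
print(find_bundled_lib(pkg))
```

Because `os.path.isfile` follows symlinks, the same lookup works whether the wheel ships real files or (eventually) symlinks.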
Libraries are copied instead of symlinked in Linux and macOS wheels, which results in quite big binaries:
This is what the wheel contains before running auditwheel:
After running auditwheel, the repaired wheel contains:
Here is the auditwheel output: https://travis-ci.org/kszucs/crossbow/builds/514605723#L3340
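One way to confirm that a repaired wheel contains full byte-identical copies rather than links is to hash every member and group identical payloads. A sketch, assuming nothing about pyarrow's real file names; the demo builds a small in-memory archive:

```python
import hashlib
import io
import zipfile
from collections import defaultdict

def duplicate_members(wheel_path):
    """Group members of a zip/wheel by content hash and return the
    groups with more than one member, i.e. byte-identical copies."""
    groups = defaultdict(list)
    with zipfile.ZipFile(wheel_path) as zf:
        for name in zf.namelist():
            digest = hashlib.sha256(zf.read(name)).hexdigest()
            groups[digest].append(name)
    return [names for names in groups.values() if len(names) > 1]

# Demo archive with two identical "libraries" (names are illustrative).
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('pyarrow/libarrow.so', b'same bytes')
    zf.writestr('pyarrow/libarrow.so.13', b'same bytes')
    zf.writestr('pyarrow/__init__.py', b'different')
print(duplicate_members(buf))
# → [['pyarrow/libarrow.so', 'pyarrow/libarrow.so.13']]
```

Run against an actual repaired wheel, any group returned represents space that symlinking would reclaim.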
They should be symlinks; we have special code for this in `arrow/python/setup.py` (lines 489 to 499 at commit 4495305):
Reporter: Krisztian Szucs / @kszucs
Assignee: Wes McKinney / @wesm
Related issues:
Note: This issue was originally created as ARROW-5158. Please see the migration documentation for further details.