Manylinux wheels #398

Closed
perrygeo opened this Issue Oct 15, 2016 · 11 comments

perrygeo commented Oct 15, 2016

With manylinux1 wheels now becoming commonplace for modules with compiled dependencies, we should be able to use them directly rather than duplicate the packaging effort to get them into lambda-packages. Since manylinux packages are built on an older version of CentOS, the ...cp27-cp27mu-manylinux1_x86_64.whl wheels are fully compatible with Lambda.

What would it entail?

Seems like the same technique of wiping the local version from the tarball and unzipping the wheel into the tarball would work.
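A minimal sketch of that technique, assuming the package is staged in a directory before it is archived. The `overlay_wheel` name and the staging-directory layout are illustrative assumptions, not Zappa's actual implementation:

```python
import os
import shutil
import zipfile


def overlay_wheel(wheel_path, package_dir):
    """Replace locally built copies with the contents of a wheel.

    Hypothetical helper: package_dir is the staging directory that will
    later be archived for upload. Wheels that ship a `.data` directory
    need extra handling that this sketch does not attempt.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # Top-level entries in the wheel (packages, modules, .dist-info).
        top_level = {name.split('/')[0] for name in wheel.namelist()}
        for entry in top_level:
            target = os.path.join(package_dir, entry)
            # Wipe the locally built version so no stale files remain.
            if os.path.isdir(target):
                shutil.rmtree(target)
            elif os.path.isfile(target):
                os.remove(target)
        # A wheel is a zip archive, so extracting it puts the files in place.
        wheel.extractall(package_dir)
    return sorted(top_level)
```

Note that a local `.dist-info` directory for a different version has a different name and would not be wiped by this sketch.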

The challenging part would be discovery; I'm not sure how to determine whether a compatible manylinux wheel is available. Maybe we could start with a wheels directory populated with wheels that the user must download manually?

Thoughts?

Owner

Miserlou commented Oct 15, 2016

This is an interesting idea, I didn't know about this effort.

There are some other caveats:

  • If building from OSX, linux packages would have to be downloaded during the package creation process. Is there an interface for that?
  • Many of the lambda-packages are specifically stripped to keep their file size down, as this is a real limitation on lambda.

Is there a list of manylinux wheels?

perrygeo commented Oct 15, 2016

I just posted a gist, venv_bundler.py as a proof of concept. It puts the responsibility for discovery and download of the wheels onto the user, allowing you to hand-curate the manylinux wheels you want to override your local packages. Of course, the discovery and download part is the most challenging...

> If building from OSX, linux packages would have to be downloaded during the package creation process. Is there an interface for that?

Not that I know of. pip downloads them but only if you're on the appropriate platform; there's nothing like a --pretend-to-be-64bit-linux option.

> lambda-packages are specifically stripped to keep their file size down

Wheels should be only the python module with compiled objects bundled with any non-standard shared libs. Just the bare minimum to run the module without system dependencies. That being said, some of those shared libs can be hefty.

> Is there a list of manylinux wheels?

Most of the major scipy stack players (numpy, scipy, scikit-learn, scikit-image, matplotlib, etc) are publishing manylinux wheels but I haven't seen a comprehensive list anywhere.

perrygeo commented Oct 16, 2016

I did a bit of pypi scraping and out of ~90k packages, about 106 publish manylinux wheels and 79 publish manylinux wheels that will work on AWS Lambda (Python 2.7 64 bit).

The full list with links to the appropriate .whl file is here: manylinux_py27mu_64bit_wheels.json
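For reference, the compatibility check can be sketched by reading the tags out of the wheel filename, which PEP 427 lays out as `name-version[-build]-python-abi-platform.whl`. The function names are illustrative, and accepting ABI-independent (`none`) wheels tagged for Python 2 is an assumption that goes beyond a plain `cp27mu` match:

```python
def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tags."""
    if not filename.endswith('.whl'):
        raise ValueError('not a wheel: {}'.format(filename))
    parts = filename[:-len('.whl')].split('-')
    # name-version[-build]-python-abi-platform: the tags are the last three.
    if len(parts) < 5:
        raise ValueError('unexpected wheel filename: {}'.format(filename))
    return tuple(parts[-3:])


def works_on_lambda_py27(filename):
    """Guess whether a wheel suits a 64-bit, wide-unicode CPython 2.7."""
    python_tag, abi_tag, platform_tag = wheel_tags(filename)
    # Each tag may be a dot-separated set of alternatives.
    if 'manylinux1_x86_64' not in platform_tag.split('.'):
        return False
    pythons = python_tag.split('.')
    abis = abi_tag.split('.')
    # Compiled against CPython 2.7 with the wide-unicode ABI.
    if 'cp27' in pythons and 'cp27mu' in abis:
        return True
    # Assumption: ABI-independent wheels tagged for Python 2 also qualify.
    return 'none' in abis and any(p in ('py2', 'py27', 'cp27') for p in pythons)
```

For example, `works_on_lambda_py27('numpy-1.11.2-cp27-cp27mu-manylinux1_x86_64.whl')` is accepted, while the same filename with a `cp27m` or `cp35m` ABI tag is rejected (the filename is illustrative).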

Owner

Miserlou commented Oct 16, 2016

Did you actually have to scrape all of PyPi to discover that though?

I do like this idea, but I can't see it being practical if it requires
scraping all of PyPi every time, and building a custom PyPi client just to
download manylinux packages on OSX. Maybe having a custom wheels directory
for package installation could be built as a hook into the package creation
process, then provided as a snippet in the documentation?

On Sun, Oct 16, 2016 at 9:05 AM, Matthew Perry notifications@github.com
wrote:

I did a bit of pypi scraping and out of ~90k packages, about 106 publish
manylinux wheels and 79 publish manylinux wheels that will work on AWS
Lambda (Python 2.7 64 bit).

The full list with links to the appropriate .whl file is here:
manylinux_py27mu_64bit_wheels.json
https://gist.github.com/perrygeo/9545f94eaddec18a65fd7b56880adbae



perrygeo commented Oct 17, 2016

> Did you actually have to scrape all of PyPi to discover that though?

Not technically scraping but in order to get a list of manylinux packages, the only way I found was to get a full package list and iterate over each package's json endpoint to see if manylinux wheels were available. So to check an individual package you could do something like:

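import requests

def manylinux_wheel(package):
    """Return the URL of the package's cp27mu manylinux1 wheel, or None."""
    url = 'https://pypi.python.org/pypi/{}/json'.format(package)
    res = requests.get(url)
    res.raise_for_status()  # fail loudly on unknown packages or HTTP errors
    data = res.json()
    # Only the files of the latest release are checked.
    version = data['info']['version']
    for f in data['releases'][version]:
        if f['filename'].endswith('cp27mu-manylinux1_x86_64.whl'):
            return f['url']
    return None  # no compatible wheel published for this release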

So option 1 is to check pypi in real time using this json endpoint. For each package? Not sure.

Option 2 might be to just keep a curated list of manylinux wheels for popular packages that the user could add/modify as needed.

I haven't fully thought this one through but Option 3 might be to use the manylinux wheels to create lambda-packages - they are both just archives of python packages so it should be possible to convert a manylinux wheel to a lambda package. This would at least ease the maintenance burden on lambda packages without modifying the way Zappa works.
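A rough sketch of what Option 3 could look like, assuming a lambda-package is simply a gzipped tarball of the importable files (that layout is an assumption here, and `wheel_to_tarball` is a made-up name):

```python
import io
import tarfile
import zipfile


def wheel_to_tarball(wheel_path, tarball_path):
    """Repack the files of a wheel (a zip archive) into a .tar.gz.

    Returns the number of files written. No stripping or size
    reduction is attempted.
    """
    count = 0
    with zipfile.ZipFile(wheel_path) as wheel, \
            tarfile.open(tarball_path, 'w:gz') as tarball:
        for info in wheel.infolist():
            if info.filename.endswith('/'):
                continue  # directory entries carry no data
            data = wheel.read(info)
            member = tarfile.TarInfo(name=info.filename)
            member.size = len(data)
            # Keep the recorded permission bits, or fall back to 0644.
            member.mode = (info.external_attr >> 16) & 0o777 or 0o644
            tarball.addfile(member, io.BytesIO(data))
            count += 1
    return count
```

This only changes the container format; whether the result matches what lambda-packages expects would still need checking.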

Owner

Miserlou commented Oct 18, 2016

Checking the API for each package isn't so bad IMO. It'd be nicer if the API gave us a way to do this all in one call, or we could do it in parallel on the client. It could also be a configurable setting in zappa_settings, in case it takes too long for packages that don't have wheels.
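Doing it in parallel on the client could look roughly like this. It is only a sketch: `find_wheels` is a hypothetical name, the `lookup` argument stands in for a per-package check such as the `manylinux_wheel` function posted above, and on Python 2.7 `concurrent.futures` comes from the `futures` backport:

```python
from concurrent.futures import ThreadPoolExecutor


def find_wheels(packages, lookup, max_workers=8):
    """Run lookup(package) for every package on a pool of threads.

    Returns {package: url} for the packages where lookup found a wheel.
    """
    packages = list(packages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with packages.
        urls = list(pool.map(lookup, packages))
    return {pkg: url for pkg, url in zip(packages, urls) if url}
```

Usage would be something like `find_wheels(['numpy', 'pymongo'], manylinux_wheel)`. Threads fit here because each lookup spends its time waiting on the network.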

Keeping a curated list seems like the wrong approach to me. Too much work.

Option 3 is interesting but I'd table that one for right now. I have some other ideas for how lambda-packages should be built reproducibly and signed in a way that works beyond Zappa.

I'd say we still prioritize lambda-packages over wheels, but the general process for building the package would be:

  1. Package all the local env
  2. Scan installed packages for manylinux wheels
  3. Install them into the package if available
  4. Scan installed packages for lambda-package replacements
  5. Install them into the package
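The ordering is what gives lambda-packages priority: each later step overwrites the earlier one. A toy sketch of that precedence, where `resolve_sources` and the two predicate arguments are made up for illustration and are not the actual implementation:

```python
def resolve_sources(installed, has_wheel, has_lambda_package):
    """Decide where each installed package's files should come from.

    Later steps override earlier ones, so lambda-packages win over
    manylinux wheels, and wheels win over the local build.
    """
    # Package all the local env.
    sources = {name: 'local' for name in installed}
    # Overlay manylinux wheels where one is available.
    for name in installed:
        if has_wheel(name):
            sources[name] = 'manylinux'
    # Overlay lambda-packages last so they take priority.
    for name in installed:
        if has_lambda_package(name):
            sources[name] = 'lambda-packages'
    return sources
```

With wheels available for `numpy` and `psycopg2` and a lambda-package only for `psycopg2`, `numpy` would come from the wheel and `psycopg2` from lambda-packages (package names picked only as examples).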

Make sense?

I wish there were more wheels, that list is pretty sparse, all things considered. Still, worth it for the sci packages and pymongo alone.

Owner

Miserlou commented Oct 27, 2016

Just noticed that pandas is on the list here, which has been a target for lambda-packages for a while now. Met a guy in Stockholm who said that was a big blocker for his company using Zappa, so I think this will be a big get if we can get it built properly. It'll probably be a good incentive for wheel advocates as well.

Owner

Miserlou commented Nov 15, 2016

Just pushed an initial attempt at adding this feature. I think it works but needs some confirmation.

Owner

Miserlou commented Nov 17, 2016

Available in 0.29.0! Blog post soon

@Miserlou Miserlou closed this Nov 17, 2016

Owner

Miserlou commented Nov 17, 2016

amolnaik commented May 30, 2018

Pandas still throws the error with version 0.45! Looking for a fix because Zappa is simply great!
