[Question] C dependencies #42

Closed
cristianocca opened this issue Feb 26, 2016 · 14 comments

@cristianocca

Hello, first of all sorry if this is in the wrong place.

Something caught my attention: you need to use python-mysql-connector, which is implemented in pure Python (no C dependencies).
This may not be good if you need high performance on database queries (even more so considering the 1 minute maximum execution time), since this driver is quite slow compared to the ones with C extensions.

There's the mysqlclient connector, which does not depend on the MySQL binaries being installed (but uses C code, and is one of the fastest connectors). Of course, it cannot be included easily because of the C dependencies. However, I once deployed Python code with C code already compiled on an EC2 machine (the AWS Lambda docs list the exact EC2 image they use), and it worked fine on Lambda, at the cost of a bigger zip. Have you ever considered this as an option if you are looking to include C code?

@collingreen
Contributor

That sounds like a good place to start looking into faster connectors - was there anything 'special' you had to do to get it working?

@mathom
Collaborator

mathom commented Feb 26, 2016

The main problem is that you have to build the C extensions for the version of Linux that is running in Lambda, I believe. If you're developing on Windows or Mac (or maybe an incompatible Linux), you're probably going to have a heck of a time cross-compiling the modules.

@cristianocca
Author

This is what I did, nothing special at all.

Create a new EC2 instance with the exact image AWS Lambda uses (sorry, I can't provide it right now, but you can easily find it by searching Google).

Once you are there, all I did was pip install whatever I needed -t output_folder, and that's it. Then, in the zip I upload to Lambda, I just include it like any other library, and Lambda was able to run the C library that was included.
Of course, this means you cannot use this library on other platforms. If you want to use it locally, you will probably need to install it on your machine with a regular pip install and hope that when you import it, it is loaded from there and not from the project. I believe that to do this, the external libraries that go into the final zip need to be copied in right when building the zip, so that they are not accessible to the code during local development.
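
For anyone who wants to script the "copy the libraries in only when building the zip" idea, here is a minimal sketch. It assumes the Linux-built packages live in a separate folder (the pip -t output folder) that is not on the local import path. The function name build_lambda_zip and the folder layout are made up for illustration; this is not something Zappa provides.

```python
import os
import zipfile


def build_lambda_zip(app_dir, vendor_dir, zip_path):
    """Write app_dir and vendor_dir into one zip, both rooted at the archive top level.

    vendor_dir is assumed to hold packages built on a Lambda-compatible Linux box.
    zip_path should live outside both folders so the archive does not include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source_root in (app_dir, vendor_dir):
            for folder, _subdirs, files in os.walk(source_root):
                for name in files:
                    full_path = os.path.join(folder, name)
                    # Store paths relative to the source root so that
                    # vendored packages are importable from the zip root.
                    arcname = os.path.relpath(full_path, source_root)
                    archive.write(full_path, arcname)
    return zip_path
```

Because the vendored folder is only merged at packaging time, a local import still picks up whatever a regular pip install put in your own environment.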

@collingreen
Contributor

Sounds easy enough - thanks for the walkthrough! Another point for multiple environments #7

@Miserlou
Owner

There are some obvious pros and cons with this approach.

At this point, I'd say, if you need to run C-extensions (for high performance MySQL or to support Postgres) - simply develop on Linux using Vagrant or similar.

Open to other suggestions though, but depending on a running EC2 instance isn't a very elegant solution. :[

@mathom
Collaborator

mathom commented Mar 1, 2016

Just another note about this in case someone comes searching:

I've implemented it for some C extensions we're using internally by just building wheels on our ec2 boxes and keeping them around for deployment. When I'm ready to deploy or update, a simple wheel install --force wheels/* is all I need to do to get the right binaries in my venv.

@Miserlou
Owner

Miserlou commented Mar 2, 2016

Ah! Okay, this is really interesting. Me and @Doerge talked about this the other day.

This may be the right way to do it. I don't know enough about wheels, but if we could install the Linux wheel to a .zappa-venv on an OS X system, that'd be perfect.

A caveat: Python-MySQL isn't a wheel. :(

@mathom
Collaborator

mathom commented Mar 2, 2016

We usually just build custom wheels for our AWS deployments like this in an ec2 box:

venv /tmp/build
/tmp/build/bin/pip wheel -w /tmp/wheels SomeLib==1.0.9

I might add a little snippet to the README so people know that this workflow works. What do you guys think about adding a wiki on GitHub for this project?

@cristianocca
Author

Are the wheels built into their respective binary libraries on Lambda, or before uploading? What's the difference between having a wheel file and having the actual compiled library (which is to be used only when all the code is uploaded to AWS Lambda)? While developing, you would still need to install the library normally through pip install.

@Miserlou have you tried mysqlclient rather than Python-MySQL? mysqlclient can be used in exactly the same way with no changes, and it is slightly easier to install than Python-MySQL (you don't need any MySQL dependency) while having the same performance. It is a fork of Python-MySQL, if I'm not wrong.

@mathom
Collaborator

mathom commented Mar 2, 2016

A wheel contains the compiled distribution for whatever you build - in this case it's the EC2-compatible bdist of the pip package I grab in the example.
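
To make the "compiled for a specific platform" part concrete: the target platform is encoded in the wheel's filename (name-version-python tag-ABI tag-platform tag, per PEP 427). Below is a rough sketch of reading those tags. The helper names are made up for illustration, and the Linux check is deliberately coarse: it says nothing about glibc versions or CPU architecture.

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Five fields, or six when the optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def may_run_on_linux(filename):
    """Coarse check: pure-Python ("any") or a Linux platform tag."""
    platform_tag = parse_wheel_filename(filename).platform
    # A platform field may hold several dot-separated tags.
    return any(
        tag == "any" or tag.startswith(("linux", "manylinux"))
        for tag in platform_tag.split(".")
    )
```

A wheel built on the EC2 box would carry a linux_x86_64-style platform tag, while one built on a Mac carries a macosx one, so a quick scan of the wheels/ folder shows which files are safe to ship.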

@mathom
Collaborator

mathom commented Mar 2, 2016

There's no mechanism for this built into Zappa. I'm just taking advantage of how it zips up the virtualenv and the fact that I can install other platforms' wheels in there.
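
As an illustration of why this trick works at all: a wheel is a plain zip archive, so for simple packages "installing" a foreign-platform wheel amounts to unpacking it into the virtualenv's site-packages. The sketch below assumes exactly that simple case. unpack_wheel is a hypothetical helper, not part of Zappa, pip, or wheel, and it refuses wheels that ship a .data directory because those need real installer logic.

```python
import os
import zipfile


def unpack_wheel(wheel_path, site_packages):
    """Naively extract a wheel's contents into site_packages.

    Returns the list of archive member names that were extracted.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
        for name in names:
            top_level = name.split("/")[0]
            # Files under <name>-<version>.data/ must be relocated by a
            # real installer, so this sketch does not handle them.
            if top_level.endswith(".data"):
                raise NotImplementedError("wheel needs a real installer: " + name)
        if not os.path.isdir(site_packages):
            os.makedirs(site_packages)
        wheel.extractall(site_packages)
    return names
```

No platform check happens here, which is the point: the Linux binaries end up in the local virtualenv untouched, ready to be zipped up for deployment, even though they cannot be imported on the development machine.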

@cristianocca
Author

Interesting. If it works, that's really good, because the performance difference between pure-Python connectors and C connectors is huge.
It is a shame that AWS Lambda includes boto3 preinstalled in the environment but doesn't include other "must-have" libraries like DB connectors, requests, numpy, or other widely used libraries with C dependencies.

@Miserlou
Owner

Miserlou commented Mar 3, 2016

So, I was able to do a few kind of interesting things - specifically, get GCC and 'pip' working on AWS Lambda. Unfortunately, that doesn't actually get us all of the way there, because most of the packages that you'd actually want to use have further dependencies on other libraries (mysql-dev, etc.). And, as far as I can tell, there is no standard way of describing these dependencies inside the pip package, so that's kind of the end of that road.

The other alternative I can think of is that we simply maintain and host Zappa-compatible versions of the top 50 or so dependencies. Or - barring the top 50 - just MySQL and Postgres (see: #3). We then hot-swap these in as part of the package compilation.

@jasonrhaas

Just went through the whole C dependencies issue and thought I would add some information to the comments above. The problem I was running into was that I was deploying from a Mac OS X virtualenv, which installed the library fine because of the pre-built wheels. But when deploying to Lambda, it attempted to build the binary on the Amazon Linux server and failed.

I was able to work through this by creating a similar EC2 instance to what Lambda is running, installing the tools necessary for compiling the code, and then pip installing the packages in my virtualenv.

I used the "Amazon Linux 2" EC2, which is linked to in the docs here:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html

Here are the steps I took to resolve this:

  • Create your virtualenv
  • pip install the package that was causing the problem. It should fail. For me, it failed with error: command 'gcc' failed with exit status 1.
  • sudo yum groupinstall "Development Tools". This will install common build tools like gcc and make. More information here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/compile-software.html
  • sudo yum install python3-devel. This is also required for some Python C libraries.
  • Try to pip install your package in the virtualenv again.

If this all worked, you can now run zappa deploy from that virtualenv, and it should work on Lambda with the pre-compiled binaries.
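
One way to sanity-check a virtualenv before deploying is to confirm that its compiled extensions are actually Linux binaries. The sketch below is only a rough check under a few assumptions: extension modules end in .so (true on both Linux and macOS), and Linux builds start with the ELF magic bytes while macOS builds (Mach-O) do not. The function name is made up, and the check does not verify the CPU architecture or glibc version.

```python
import os

ELF_MAGIC = b"\x7fELF"


def find_non_elf_extensions(site_packages):
    """Return sorted paths of .so files that do not look like ELF binaries."""
    suspicious = []
    for folder, _subdirs, files in os.walk(site_packages):
        for name in files:
            if not name.endswith(".so"):
                continue
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                # Every ELF file begins with these four bytes.
                if handle.read(4) != ELF_MAGIC:
                    suspicious.append(path)
    return sorted(suspicious)
```

An empty result from a virtualenv built on the EC2 box is a good sign; any path returned from a Mac-built virtualenv points at a binary that Lambda will not be able to load.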
