Unable to use this library in AWS Lambda due to package size exceeded max limit #1200

Open
nemalipuri opened this issue Dec 26, 2019 · 30 comments · May be fixed by #4222

Comments

@nemalipuri

Please fill out the form below.

System Information

  • AWS Lambda:
  • Python v3.6:
  • Sagemaker Python SDK 1.49.0:

Describe the problem

I'm trying to use the SageMaker Python SDK in Lambda to trigger train and deploy steps. I packaged the dependencies along with the function code, but creating the Lambda function fails with the error 'Unzipped size must be smaller than 262144000 bytes'.

Although this issue relates to a Lambda service limit, I'd like to check whether there is any way to reduce the size of the dependencies.

I tried removing boto3 and botocore from the function zip file, since Lambda provides these libraries, but that led to a different issue: 'expecting python-dateutil<2.8.1,>=2.1'.

Minimal repro / logs

AWS Lambda error 'Unzipped size must be smaller than 262144000 bytes'

  • Exact command to reproduce:
    mkdir python
    cd python
    pip install sagemaker --target .
    cd ..
    chmod -R 777 python
    zip -r python.zip python
    # Upload the zip file to S3, then create a Lambda layer from it
    # Error when creating the layer: 'Failed to create layer version: Unzipped size must be smaller than 262144000 bytes'

Similarly, when I packaged the code with its dependencies and uploaded the zip file to a Lambda function instead of a layer, I received the same error: 'Unzipped size must be smaller than 262144000 bytes'.

Appreciate your help.
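
A minimal sketch for checking a package directory against the limit quoted in the error before uploading. The function names are hypothetical and only the standard library is used; the limit constant is the figure from the error message above.

```python
import os

# Limit quoted in the Lambda error message above, in bytes.
LAMBDA_UNZIPPED_LIMIT = 262_144_000


def unzipped_size(root: str) -> int:
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks so targets are not counted twice
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root: str, limit: int = LAMBDA_UNZIPPED_LIMIT) -> bool:
    """True if the directory's unzipped contents are strictly under the limit."""
    return unzipped_size(root) < limit
```

For example, `fits_in_lambda("python")` could be run in the directory that contains the `python` folder before zipping it.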

@knakad
Contributor

knakad commented Dec 27, 2019

Hi @nemalipuri !

Unfortunately, running sagemaker-python-sdk in AWS Lambda is not currently supported. This is a pain point that we're aware of and for which we are working on prioritizing a solution.

I would normally recommend pinning python-dateutil to 2.8.0 to resolve the conflict, but I actually experimented locally and found that, even without boto3, the zip (55MB) is still over the 50MB zipped limit for Lambda.

An alternative is to remove numpy and scipy dependencies entirely for specific sagemaker installations, as they account for ~73% of the installation size.
In order for me to gauge the solution's viability, can you tell me if you will need numpy/scipy functionality when running sagemaker-python-sdk in AWS Lambda?
Similarly, what are your sagemaker-python-sdk AWS Lambda use-cases?

Thanks!

@nemalipuri
Author

@knakad Thanks for looking into this.

Almost a year ago I used the SageMaker Python SDK in Lambda without any issues. The version was 1.18.0, and the package was smaller.

Another use case has come up now, and when I tried to pull the latest package, its size exceeded the unzipped limit (262144000 bytes, i.e. 250 MiB). The use case is to build an ML model with a custom container and implement Lambda functions that create the training job and the endpoint. Step Functions will invoke these Lambda functions at scheduled times to automate the workflow.

I am not using scipy in my client code.
Even in the sagemaker-python-sdk library, I see scipy used in only one place (src/sagemaker/amazon/common.py).

I did try without boto3, botocore and scipy, but Lambda failed with the error 'No module named 'numpy.core._multiarray_umath''.
Steps I executed:
Cloned sagemaker-python-sdk repo v1.49.0
Removed "scipy>=0.19.0" in setup.py
pip install into a directory (ex. pip install . -t ./python -c ../requirements.txt)
Zipped and uploaded into S3
Created a Layer and attached this layer to Lambda
'import sagemaker' failed with 'No module named 'numpy.core._multiarray_umath''.

If you could provide a workaround, that would be great; otherwise the plain SDK (boto3) is the only option I have for calling the SageMaker APIs from Lambda.

Thank you.
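
A rough sketch of the plain boto3 fallback mentioned above, assuming a custom training container. The function names, the default instance type, the volume size and the runtime cap are illustrative assumptions. The client is passed in so the request can be inspected without AWS access; in Lambda it would typically be `boto3.client("sagemaker")`.

```python
from typing import Any, Dict


def build_training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    output_s3_uri: str,
    instance_type: str = "ml.m5.large",  # assumed default, adjust as needed
    max_runtime_seconds: int = 3600,     # assumed default
) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # custom container image
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,         # assumed value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def start_training_job(sagemaker_client: Any, **kwargs: Any) -> str:
    """Call create_training_job on the given client and return the job ARN."""
    request = build_training_job_request(**kwargs)
    response = sagemaker_client.create_training_job(**request)
    return response["TrainingJobArn"]
```

Because boto3 ships with the Lambda Python runtime, this route adds nothing to the deployment package, at the cost of losing the SDK's higher-level abstractions.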

@knakad
Contributor

knakad commented Dec 30, 2019

Until sagemaker-python-sdk is officially supported in AWS Lambda, here's a workaround that removes a bit of bloat from the installation, allowing it to fit in lambda without sacrificing any functionality:

pip install sagemaker --target sagemaker-installation
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +
zip -r sagemaker_lambda_light.zip .

I was able to upload the resulting zip along with a simple handler that called import sagemaker and did some very basic validation.

This solution also doesn't require you to fork any of the code, so you can more easily run the latest sagemaker-python-sdk with the latest features/bug fixes.

Please try it out and let me know if you run into any issues =)
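
For machines without `find` (for example Windows), the two pruning commands above could be approximated in Python. This is a sketch with a hypothetical function name, not part of the SDK; it removes every directory named `tests` or `__pycache__` under the install folder.

```python
import os
import shutil
from typing import Iterable, List

PRUNE_DIR_NAMES = ("tests", "__pycache__")


def prune_dirs(root: str, names: Iterable[str] = PRUNE_DIR_NAMES) -> List[str]:
    """Delete every directory under root whose name is in names.

    Returns the sorted list of removed paths.
    """
    wanted = set(names)
    removed = []
    for dirpath, dirnames, _filenames in os.walk(root, topdown=True):
        for name in list(dirnames):
            if name in wanted:
                target = os.path.join(dirpath, name)
                shutil.rmtree(target)
                dirnames.remove(name)  # do not descend into the deleted directory
                removed.append(target)
    return sorted(removed)
```

Calling `prune_dirs("sagemaker-installation")` before zipping would mirror the shell workaround.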

@nemalipuri
Author

Perfect, it worked after executing the above steps. Thank you so much!

@knakad
Contributor

knakad commented Dec 31, 2019

Anytime! Leaving this issue open to track the workaround and the feature request.

@ikopas3

ikopas3 commented Mar 30, 2020

@knakad This looks like a great solution and I'd like to implement it. I followed the steps you listed, created a layer and attached it to my Lambda function, but I still get this error when I try to import the sagemaker package in my Lambda function:

"errorMessage": "Unable to import module 'lambda_function': No module named 'sagemaker'"

Any idea what could be causing the issue? I don't get any hints in the logs in CloudWatch and it just looks like the function is not able to find the sagemaker package from the attached layer.

Thanks for your help on this, in advance.

@sudeshgit

Has a date been set for sagemaker-python-sdk support in AWS Lambda?

@ikopas3

ikopas3 commented Apr 3, 2020

After facing the issue myself, I read through the documentation and found that the contents of the zip file must follow a required path. There are two options: python or python/lib/python3.8/site-packages. I installed the sagemaker package into a python folder, deleted the tests and __pycache__ folders, then zipped it up, uploaded it to S3 and created a layer. After that, import sagemaker worked from the Lambda function with the layer attached.

Documentation for ease of reference: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
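
The path requirement described above can be checked before uploading. The sketch below uses a hypothetical helper name and only the standard library; it reports whether a package sits under one of the two layer prefixes mentioned in the comment.

```python
import re
import zipfile

# The two layouts named above: python/ or python/lib/python3.x/site-packages/
_LAYER_PREFIX = re.compile(r"^python/(lib/python3\.\d+/site-packages/)?")


def package_in_layer(zip_file, package: str) -> bool:
    """Return True if the zip holds `package` under a layer prefix.

    zip_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            match = _LAYER_PREFIX.match(name)
            if match and name[match.end():].startswith(package + "/"):
                return True
    return False
```

A zip built from inside the install directory (so that `sagemaker/` sits at the archive root) would return False here, which matches the "No module named 'sagemaker'" symptom reported above.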

@joelachance

+1 to all of this! Looking forward to using SageMaker in Lambda once this is resolved.

@ksachdeva11

ksachdeva11 commented Sep 9, 2020

@knakad Regarding the workaround above: do we need to manually zip the sagemaker installation along with handler.py and upload it to S3? Also, how will the Lambda function pick up the new zip file? It would be helpful if you could list the steps to do this. Thanks!

@arne-munch-ellingsen

arne-munch-ellingsen commented Oct 23, 2020

In order to create a valid sagemaker SDK layer it is important to create the layer using an AWS compatible numpy version (since some numpy packages are binary). Here is a slightly updated version of the above that has proved to work for me:

mkdir sagemaker-layer
cd sagemaker-layer
mkdir python
# Install the sagemaker modules in the python folder
pip install sagemaker --target ./python
# Remove tests and cache stuff (to reduce size)
find ./python -type d -name "tests" -exec rm -rfv {} +
find ./python -type d -name "__pycache__" -exec rm -rfv {} +

# Remove the python/numpy* folders, since they contain a numpy build for your host machine
rm -rf python/numpy*

# Download an AWS Linux compatible numpy package
# Navigate to https://pypi.org/project/numpy/#files.
# Search for and download the newest *manylinux*_x86_64.whl package for your Python version (I have Python 3.7)
# Save the wheel under its own file name so that the unzip step below finds it
curl "https://files.pythonhosted.org/packages/9b/04/c3846024ddc7514cde17087f62f0502abf85c53e8f69f6312c70db6d144e/numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl" -o "numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl"
unzip numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl -d python

# Zip only the python folder so the downloaded .whl file is not packed into the layer
zip -r sagemaker_lambda.zip python

# When zip file is ready, upload it to S3
aws s3 cp sagemaker_lambda.zip s3://ai4iot-lambda/sagemaker_lambda_light.zip

# When upload is complete, goto Lambda layers to create a layer from the uploaded zip file.
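
When picking a wheel by hand as described above, the tags in the file name say which interpreter and platform it targets. The sketch below splits a wheel file name following the standard wheel naming convention and applies a rough heuristic for "Linux x86_64 build for my Python version". The helper names are hypothetical, and the heuristic does not check glibc versions.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel file name: " + filename)
    # The last three fields are always python tag, abi tag, platform tag.
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def looks_lambda_compatible(filename: str, python_tag: str, arch: str = "x86_64") -> bool:
    """Heuristic: pure-Python wheel, or manylinux wheel for the given tag and arch."""
    tags = parse_wheel_filename(filename)
    if tags.platform_tag == "any":
        return True
    platforms = tags.platform_tag.split(".")  # compressed tag sets are dot-separated
    is_linux = any(p.startswith("manylinux") and p.endswith(arch) for p in platforms)
    return is_linux and tags.python_tag == python_tag
```

For instance, `looks_lambda_compatible("numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl", "cp37")` accepts the wheel used above, while a `macosx` wheel for the same version is rejected.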

@shlomi-schwartz

@arne-munch-ellingsen Thank you for the lead. I tried your code but when testing the lambda function I got the following error:
Response: { "errorMessage": "Unable to import module 'lambda_function': cannot import name '_ccallback_c' from 'scipy._lib' (/opt/python/scipy/_lib/__init__.py)", "errorType": "Runtime.ImportModuleError" }

My local machine (where I ran your code) runs macOS. Any idea what I am missing?

@arne-munch-ellingsen

@shlomi-schwartz Are you trying to import scipy in your Lambda function? If that is the case you will have to add scipy to your layer as well using the same "trick" that I used to add the AWS Lambda Python 3.7 specific numpy library. The Sagemaker SDK does not include scipy.

@shlomi-schwartz

@arne-munch-ellingsen Thanks for the tip, I was not calling scipy, it was one of the dependencies for sagemaker==1.71.1, but I used your trick and downloaded the .whl file, it works now!

Thanks again 👍

@calvinfeng

This worked for me, but I am looking forward to the actual support for SageMaker SDK in Lambda.

mkdir lambda_deployment
cd lambda_deployment
touch lambda_function.py

Write the logic in the lambda_function.py file.

pip install sagemaker --target sagemaker-installation
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +
zip -r ../lambda-deployment.zip .
cd ..
zip -g lambda-deployment.zip lambda_function.py

Then upload lambda-deployment.zip to Lambda

@mhobby

mhobby commented Feb 27, 2021

Further to @arne-munch-ellingsen's post, you can skip downloading the numpy whl and instead use the AWSLambda-Python37-SciPy1x layer provided by AWS (arn:aws:lambda:eu-west-2:142628438157:layer:AWSLambda-Python37-SciPy1x:35; this ARN is for the eu-west-2 region).

@AlvaroCavalcante

I'm trying to follow this tutorial about scheduling data wrangler processing jobs. I created my lambda function uploading the zip file that was created following these commands:

pip install sagemaker --target sagemaker-installation
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +
zip -r ../lambda-deployment.zip .
cd ..
zip -g lambda-deployment.zip lambda_function.py export.flow

The zip file is around 35 MB. Then, when I try to add the SciPy layer to the Lambda function, I get the following error:

"Function code combined with layers exceeds the maximum allowed size of 262144000 bytes. The actual size is 263682135 bytes."

Does anyone know how to deal with this? Could I somehow reduce the size of the sagemaker package even further?
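
The error above reports an overage of 263682135 - 262144000 = 1538135 bytes, so trimming one mid-sized dependency may be enough. A sketch for finding candidates, with hypothetical function names and standard library only, ranks the top-level entries of the install directory by size.

```python
import os
from typing import List, Tuple

LAMBDA_UNZIPPED_LIMIT = 262_144_000  # figure quoted in the error message


def overage(actual_bytes: int, limit: int = LAMBDA_UNZIPPED_LIMIT) -> int:
    """Bytes that must be trimmed to reach the limit (0 if already within it)."""
    return max(0, actual_bytes - limit)


def top_level_sizes(root: str) -> List[Tuple[str, int]]:
    """Return (entry name, size in bytes) for each top-level entry, largest first."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
            sizes[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat().st_size
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Running `top_level_sizes("sagemaker-installation")[:10]` would list the ten largest packages. Whether any of them can be dropped safely depends on which parts of the SDK the function actually imports.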

@camilaagw

camilaagw commented Jan 10, 2023

Tried many approaches and nothing worked for me. It turns out it had to do with my local machine not using a Linux operating system (I have a macOS Catalina). Followed the instructions here for the installation of numpy and it worked like a charm 😄 (credits to Shandy Roque) :

cd <directory-containing-lambda_function.py>

# Install sagemaker
pip install sagemaker --target sagemaker-installation

# Remove numpy since it will contain incompatible binary files (when installed on an OS other than Linux)
rm -rf sagemaker-installation/numpy*

# Download an AWS Linux compatible numpy package
pip install numpy \
    --platform manylinux2014_x86_64 \
    --target=sagemaker-installation \
    --implementation cp \
    --python 3.8 \
    --only-binary=:all: --upgrade 
 
# Remove unnecessary files
cd sagemaker-installation
find . -type d -name "tests" -exec rm -rfv {} +
find . -type d -name "__pycache__" -exec rm -rfv {} +

# Zip everything together 
zip -r ../lambda-deployment.zip .
cd ..
zip -g lambda-deployment.zip lambda_function.py 
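
To see whether other packages besides numpy carry host-specific binaries, the install directory can be scanned for compiled files that do not look like Linux builds. This is a rough heuristic based on common file-name patterns (an assumption, not an exhaustive list), and the function name is hypothetical.

```python
import os
from typing import List

# File-name hints for macOS and Windows builds (heuristic, not exhaustive).
FOREIGN_MARKERS = ("darwin", "win32", "win_amd64")
FOREIGN_SUFFIXES = (".pyd", ".dylib", ".dll")


def find_foreign_binaries(root: str) -> List[str]:
    """Return relative paths of compiled files that look non-Linux."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()
            tagged_so = lower.endswith(".so") and any(m in lower for m in FOREIGN_MARKERS)
            if tagged_so or lower.endswith(FOREIGN_SUFFIXES):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)
```

Any package listed by `find_foreign_binaries("sagemaker-installation")` is a candidate for reinstalling with the `--platform` approach shown above.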

@j-adamczyk

@knakad Any news on this? Lambda functions are great for orchestrating more complicated flows that end with a SageMaker prediction. Not being able to use the SDK is irritating. Heavy dependencies could be moved to a package extra, like sagemaker[numpy].

@KaramRazooq

Any updates on this? New package updates have broken the way the sagemaker module is imported.

@PreethiJC

As mentioned by @KaramRazooq, the above instructions needed to be updated. In a nutshell, I had to downgrade jsonschema to 4.17.3 and install a Linux-specific pandas package. I built upon the solution given by @arne-munch-ellingsen. Here is the version that worked for me:

mkdir sagemaker-layer
cd sagemaker-layer
mkdir python

# Install the sagemaker modules in the python folder
pip install sagemaker --target ./python

# Remove the python/numpy* folders since it will contain a numpy version for your host machine
rm -rf python/numpy*

# Remove the python/pandas* folders since it will contain a pandas version for your host machine
rm -rf python/pandas*

# Downgrade jsonschema from 4.19 to 4.17.3 to avoid the rpds.rpds import error.
rm -rf python/jsonschema*
pip install jsonschema==4.17.3 --target ./python

# Download an AWS Linux compatible numpy package.
# Navigate to https://pypi.org/project/numpy/#files.
# Search for and download newest *manylinux_2_x86_64.whl package for your Python version (I have Python 3.9).
# I had two .whl packages that matched this requirement. The version mentioned below worked for me.
curl "https://files.pythonhosted.org/packages/69/1f/c95b1108a9972a52d7b1b63ed8ca70466b59b8c1811bd121f1e667cc45d8/numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" -o "numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
unzip numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python

# Download an AWS Linux compatible pandas package.
# Navigate to https://pypi.org/project/pandas/#files.
# Search for and download newest *manylinux_2_x86_64.whl package for your Python version
curl "https://files.pythonhosted.org/packages/83/f0/2765daac3c58165460b127df5c0ef7b3a039f3bfe7ea7a51f3d20b01371b/pandas-2.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" -o "pandas-2.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
unzip pandas-2.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python

# Remove tests and cache stuff (to reduce size)
find ./python -type d -name "tests" -exec rm -rfv {} +
find ./python -type d -name "__pycache__" -exec rm -rfv {} +

zip -r sagemaker_lambda.zip python

@KaramRazooq

@PreethiJC's method works, as long as you include the Compatible runtimes and architectures!
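
The compatible runtimes and architectures can also be set when publishing the layer programmatically. The sketch below wraps the Lambda `publish_layer_version` call; the wrapper name and the default runtime and architecture values are assumptions, and the client is passed in (typically `boto3.client("lambda")`) so the request can be exercised without AWS access.

```python
from typing import Any, Sequence


def publish_layer(
    lambda_client: Any,
    layer_name: str,
    bucket: str,
    key: str,
    runtimes: Sequence[str] = ("python3.9",),    # assumed default
    architectures: Sequence[str] = ("x86_64",),  # assumed default
) -> str:
    """Publish a layer version from a zip in S3 and return its ARN."""
    response = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Content={"S3Bucket": bucket, "S3Key": key},
        CompatibleRuntimes=list(runtimes),
        CompatibleArchitectures=list(architectures),
    )
    return response["LayerVersionArn"]
```

The runtime should match the Python version of the downloaded wheels (cp39 wheels pair with `python3.9`), and the architecture should match their platform tag.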

@trungleduc linked a pull request on Oct 23, 2023 that will close this issue (9 tasks)
@trungleduc self-assigned this on Oct 23, 2023

@guimorg

guimorg commented Jan 2, 2024

Hey, y'all! Any news on the MR? I feel that this is an important update for sagemaker-python-sdk, especially for the usage on Lambdas.

@gorj-tessella

The sagemaker library really should be made available through https://aws.amazon.com/serverless/serverlessrepo/. All I want to do is kick off a processor.run job from my Lambda. As it is, I need to set up a pipeline and then call that.

@liujiaorr
Contributor

liujiaorr commented Apr 21, 2024

Seems a fix has been merged, does the issue still exist?

@liujiaorr
Contributor

Closing this issue now. Feel free to reopen it if there is still a problem.

@liujiaorr reopened this on Apr 29, 2024

@guimorg

guimorg commented May 2, 2024

Hmm. I am executing the following script:

mkdir python
cd python
pip install sagemaker --target .
cd ..
du -sh
# 207M	.

Maybe the issue is resolved, but I feel that pandas and numpy could still be made optional for those who only want to use SageMaker's API. In my case I use it for assessing SageMaker Pipelines, for example, but I still have to bring in lots of dependencies, and sometimes this still makes me reach the max size of a layer.

Could we still try to optimize the size of this package? The abstractions maintained here are really good for handling SageMaker Pipelines.

@liujiaorr
Contributor

liujiaorr commented May 3, 2024

Thanks for the information, @guimorg. I raised the priority to the highest level so that this gets attention sooner.

@trungleduc
Collaborator

@guimorg this PR #4222 could help make pandas optional

@rberger

rberger commented May 22, 2024

Breaking out sagemaker.workflow (and maybe others) into their own libraries would address many use cases.
Our use case is simple: I just want a Lambda that orchestrates workflow pipelines. But even if I include just

from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

I still have to install all of sagemaker and all its huge dependencies, and end up with a Lambda package that is too big to deploy as a zip file.
