Library is too large to deploy as a Lambda layer #2761

Closed
nabelekt opened this issue Apr 6, 2024 · 8 comments
Labels
question Further information is requested

Comments

@nabelekt

nabelekt commented Apr 6, 2024

Do people not use awswrangler in AWS Lambda functions?
Lambda limits deployment packages, including functions and layers, to 250 MB unzipped (https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html). For me, this library takes up 315 MB once installed. It is also listed as an AWS-provided layer option, but even then I get an error:

Does anyone here know: Is there any way I can use awswrangler in my Lambda functions?

nabelekt added the question label Apr 6, 2024
@nabelekt
Author

nabelekt commented Apr 7, 2024

FWIW, it looks like the biggest culprits are the pyarrow and pandas libraries:

root@ThinkPad-3:~/repos/cloud-data-processing/aws/lambda_dev/temp_layer_creation/python/lib/python3.12/site-packages# du * -sh | sort -hr
120M    pyarrow
95M     pandas
35M     numpy
27M     numpy.libs
25M     botocore
3.9M    awswrangler
2.8M    tzdata
2.8M    pytz
1.6M    boto3
1000K   urllib3
868K    dateutil
684K    s3transfer
416K    packaging
324K    pandas-2.2.1.dist-info
236K    botocore-1.34.79.dist-info
228K    numpy-1.26.4.dist-info
208K    pyarrow-15.0.2.dist-info
188K    jmespath
188K    __pycache__
120K    typing_extensions.py
100K    pytz-2024.1.dist-info
92K     tzdata-2024.1.dist-info
76K     awswrangler-3.7.2.dist-info
48K     boto3-1.34.79.dist-info
40K     s3transfer-0.10.1.dist-info
40K     python_dateutil-2.9.0.post0.dist-info
40K     packaging-24.0.dist-info
36K     urllib3-2.2.1.dist-info
36K     typing_extensions-4.11.0.dist-info
36K     six.py
32K     jmespath-1.0.1.dist-info
28K     six-1.16.0.dist-info
20K     bin
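
For anyone who wants to reproduce this ranking without `du`, here is a minimal Python sketch. It is not part of awswrangler; the function names are hypothetical, and it sums apparent file sizes, whereas `du` counts disk blocks, so the totals can differ slightly. Reading the 250 MB quota as 250 * 1024 * 1024 bytes is also an assumption.

```python
import os
from pathlib import Path

# Assumption: the 250 MB quota is interpreted as 250 * 1024 * 1024 bytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def tree_size(path: Path) -> int:
    """Return the summed size in bytes of all regular files under path."""
    if path.is_symlink():
        return 0  # do not follow or count symlinks
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def rank_entries(site_packages: Path) -> list[tuple[str, int]]:
    """List each top-level entry with its size, largest first."""
    entries = [(p.name, tree_size(p)) for p in site_packages.iterdir()]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def exceeds_limit(site_packages: Path) -> bool:
    """True if the directory alone is already over the assumed quota."""
    return tree_size(site_packages) > LAMBDA_UNZIPPED_LIMIT_BYTES
```

Calling `rank_entries(Path("lib/python3.12/site-packages"))` from the layer directory should give an ordering comparable to the listing above.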

@jaidisido
Contributor

jaidisido commented Apr 7, 2024

Are you adding more than one layer as that is likely the issue there?

I assume you are in us-east-1? I have added a layer to a new python 3.12 Arm64 Lambda without any issues:
[image]

@nabelekt
Author

nabelekt commented Apr 8, 2024

Ah. I misunderstood the quota limit. I thought we could have up to 5 layers with each being up to 250 MB. Now I understand that the quota is 250 MB across all layers (and other packages, etc.).

But now I am wondering why the package is larger for me. If I run:

mkdir temp_layer_creation/python -p
cd temp_layer_creation/python

pip3 install \
    --platform manylinux2014_aarch64 \
    --target=./lib/python3.12/site-packages \
    --implementation cp \
    --python-version 3.12 \
    --only-binary=:all: --upgrade \
    awswrangler

rm -rf lib/python3.12/site-packages/*.dist-info

du . -sh

I get:

Collecting awswrangler
  Using cached awswrangler-3.7.2-py3-none-any.whl.metadata (23 kB)
...
Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<2.0.0,>=1.23.32->awswrangler)
  Using cached six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Using cached awswrangler-3.7.2-py3-none-any.whl (380 kB)
...
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, urllib3, tzdata, typing-extensions, six, packaging, numpy, jmespath, python-dateutil, pyarrow, pandas, botocore, s3transfer, boto3, awswrangler
Successfully installed awswrangler-3.7.2 boto3-1.34.79 botocore-1.34.79 jmespath-1.0.1 numpy-1.26.4 packaging-24.0 pandas-2.2.1 pyarrow-15.0.2 python-dateutil-2.9.0.post0 pytz-2024.1 s3transfer-0.10.1 six-1.16.0 typing-extensions-4.11.0 tzdata-2024.1 urllib3-2.2.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
314M    .

This shows an unzipped package size of 314 MB. Any ideas here?

Also, can you tell me which awswrangler release version corresponds with Version '6' of the AWS-supplied Lambda layer?
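
The quota reading from the top of this comment (250 MB shared by the function and all of its layers, not 250 MB per layer) can be sketched as a small check. This is only an illustration; the function name and the example sizes other than 314 MB are made up.

```python
# Quota discussed in this thread: 250 MB unzipped, shared by the function
# code and every attached layer together.
LAMBDA_UNZIPPED_LIMIT_MB = 250


def fits_unzipped_quota(function_mb: float, layer_mbs: list[float]) -> bool:
    """Return True if the function plus all of its layers fit within the quota."""
    return function_mb + sum(layer_mbs) <= LAMBDA_UNZIPPED_LIMIT_MB


# A single 314 MB layer does not fit, even with a tiny function.
print(fits_unzipped_quota(1, [314]))

# Two layers that are each under 250 MB can still exceed the quota together.
print(fits_unzipped_quota(1, [200, 100]))
```

Both calls report that the configuration does not fit, which matches the error seen when stacking several large layers.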

@malachi-constant
Contributor

That layer is version 3.7.1 of the library per the docs

@nabelekt
Author

nabelekt commented Apr 8, 2024

> That layer is version 3.7.1 of the library per the docs

Thanks.

If I am reading this right, at least releases 3.0.0 through 3.7.1 correspond to layer version 6? I'm not sure what that means.

It looks like 3.7.2 should be version 7?

But I only see version 6 as an option.

But then, after adding the layer, it seems I am able to edit the version to be 7.

This is my first time using layers, so I appreciate your patience.

@jaidisido
Contributor

Every time the library is released, the associated Lambda layers are regenerated (one for each Python version, architecture, and AWS region) and their layer versions are incremented. Take the layer for Python 3.12/Arm64/us-east-1 as an example: it was version 1 for release 3.5.0 and then version 2 for release 3.5.1. There was no support for Python 3.12 before that.

The AWS Lambda console only surfaces the latest layer version, but as you discovered you can manually override that value to any of the existing older layer versions.
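
To make the manual override concrete: a layer version is addressed by an ARN whose last segment is the version number, so picking an older or newer version comes down to changing that number. The sketch below only builds the string. The account ID and layer name in the example are placeholders, not the real values for the AWS-provided layer; take those from the library's documentation.

```python
def layer_version_arn(region: str, account_id: str, layer_name: str, version: int) -> str:
    """Build a Lambda layer version ARN of the form
    arn:aws:lambda:<region>:<account-id>:layer:<layer-name>:<version>.
    """
    if version < 1:
        raise ValueError("layer versions start at 1")
    return f"arn:aws:lambda:{region}:{account_id}:layer:{layer_name}:{version}"


# Hypothetical values for illustration only.
arn_v6 = layer_version_arn("us-east-1", "123456789012", "example-layer", 6)
arn_v7 = layer_version_arn("us-east-1", "123456789012", "example-layer", 7)
print(arn_v6)
print(arn_v7)
```

An ARN built this way can be pasted into the console's "Specify an ARN" option, or passed in the `Layers` list of boto3's `update_function_configuration` call.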

@nabelekt
Author

nabelekt commented Apr 9, 2024

Thanks, @jaidisido. But I am still trying to understand:

  1. Why is it that I measure the library size at 314 MB while the size of the layer seems to be <= 250 MB?
  2. What I was pointing out about the layer version in the console is that it does not offer the latest version as an option. It offers version 6, but the latest is version 7.

@malachi-constant
Contributor

@nabelekt

  1. We don't include all of the dependencies called out above in our Lambda layer build; here's the script that does that.
  2. It takes some time between the GitHub release of the package and an update to the drop-down in the Lambda console. This can take days or over a week, depending on the downstream team's pipeline.
