Shrink lambda layer with strip command #884

Closed
mdavis-xyz opened this issue Sep 1, 2021 · 6 comments

@mdavis-xyz
Contributor

mdavis-xyz commented Sep 1, 2021

The lambda layer can be reduced by another 17MB by removing debugging symbols from compiled binaries.

i.e. just add something like this to the dockerfile:

find ./ -name '*.so*' -type f -exec strip "{}" \;
@mdavis-xyz mdavis-xyz added the enhancement New feature or request label Sep 1, 2021
@mdavis-xyz
Contributor Author

You can also save another 15MB by deleting *.pyc files and __pycache__.

@jaidisido jaidisido self-assigned this Sep 1, 2021
@jaidisido
Contributor

Thanks for the suggestion @mdavis-xyz, did you test the resulting layer in a Lambda function after running the command?

In my test, running the below:

find python -name '*.so*' -type f -exec strip "{}" \;

throws a Numpy import error when testing the output layer in Lambda:

Unable to import required dependencies:
numpy: 
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.7 from "/var/lang/bin/python3.7"
* The NumPy version is: "1.21.2"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: libgfortran-2e0d59d6.so.5.0.0: ELF load command address/offset not properly aligned

@mdavis-xyz
Contributor Author

Hmm. No I didn't test it. I just downloaded the layer zip, unzipped it on my local machine, and ran strip.

I have seen that some libraries don't like having .so.* files stripped, but stripping just .so files seems fine. I don't know why. (I don't know the difference between .so and .so.5.0.0.)

Maybe we need to play around with strip --strip-debug and strip --strip-unneeded?
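One way to explore that question, as a rough sketch only: strip copies of a single library in each mode and compare the resulting sizes. `--strip-debug` and `--strip-unneeded` are GNU binutils `strip` options; the helper name `compare_strip_modes` is made up for illustration. A smaller file says nothing about whether the library still loads, so this does not replace an import test in Lambda.

```shell
# Hypothetical helper: strip copies of one shared library in different modes
# and print the resulting sizes. The original file is left untouched.
compare_strip_modes() (
    lib="$1"
    tmp="$(mktemp -d)" || exit 1
    printf '%-18s %s\n' "original" "$(wc -c < "$lib" | tr -d ' ')"
    for mode in --strip-debug --strip-unneeded; do
        copy="$tmp/$(basename "$lib")"
        cp "$lib" "$copy"
        # strip rewrites the file in place, so work on the copy.
        if strip "$mode" "$copy"; then
            printf '%-18s %s\n' "$mode" "$(wc -c < "$copy" | tr -d ' ')"
        else
            printf '%-18s %s\n' "$mode" "failed"
        fi
    done
    rm -rf "$tmp"
)
```

For example, `compare_strip_modes path/to/some_library.so` prints one line per mode with the size in bytes (the path here is a placeholder).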

How can I test this? What's the best way to build this docker image?
Will these steps test the layer build?

I tried cd building; ./build-lambda-layer.sh. That builds something. But when I add a check like python -c 'import awswrangler' to the bottom of building/lambda/build-lambda-layer.sh I get an import error. (Even before I add strip)

@jaidisido
Contributor

Scripts in the building directory are indeed used to build Lambda layers. However, they assume you have a local Docker AWS Lambda Python image. It's also a compute-heavy and time-consuming process, so those scripts are mostly meant to be used by our dev team when releasing a new Wrangler version and its associated Lambda layers.

In your case, I believe the test could be as simple as:

  1. Downloading the current Wrangler layer zip file and unzipping it locally
  2. Running the strip command
  3. Zipping it again and uploading it to an S3 bucket
  4. Creating a Lambda layer from the S3 path
  5. Attaching the layer to a Lambda function and checking that you can import Wrangler without errors
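Steps 3 to 5 above could be scripted with the AWS CLI roughly as follows. This is a sketch under assumptions, not the project's own tooling: the helper name `test_layer_in_lambda` and all bucket, key, layer, and function names are placeholders, the CLI is assumed to be configured, and the target function's handler is assumed to import awswrangler. The `python3.7` runtime is taken from the error output earlier in the thread.

```shell
# Hypothetical helper covering steps 3-5. Steps 1-2 (unzip, strip, zip again)
# are assumed to have produced the file passed as $1.
test_layer_in_lambda() (
    zip_path="$1"; bucket="$2"; key="$3"; layer_name="$4"; function_name="$5"

    # Step 3: upload the re-zipped layer.
    aws s3 cp "$zip_path" "s3://$bucket/$key" || exit 1

    # Step 4: publish a layer version from the S3 object and capture its ARN.
    layer_arn="$(aws lambda publish-layer-version \
        --layer-name "$layer_name" \
        --content "S3Bucket=$bucket,S3Key=$key" \
        --compatible-runtimes python3.7 \
        --query LayerVersionArn --output text)" || exit 1

    # Step 5: attach the layer (this replaces the function's existing layer
    # list), wait for the update to finish, then invoke the function.
    aws lambda update-function-configuration \
        --function-name "$function_name" --layers "$layer_arn" >/dev/null || exit 1
    aws lambda wait function-updated --function-name "$function_name" || exit 1

    out="$(mktemp)" || exit 1
    aws lambda invoke --function-name "$function_name" "$out" || exit 1
    cat "$out"   # response payload; an import failure would show up here
)
```

A call might look like `test_layer_in_lambda layer.zip my-bucket layers/wrangler.zip wrangler-stripped my-test-function`, with every argument replaced by your own values.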

@mdavis-xyz
Contributor Author

I just tested, and yes, I reproduced your error. We can fix that by stripping *.so instead of *.so*.
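To see what the narrower pattern leaves alone, a small sketch like the one below can list the files involved before anything is stripped. The helper names are hypothetical. Versioned names such as the `libgfortran-2e0d59d6.so.5.0.0` from the error above match `*.so*` but not `*.so`.

```shell
# Hypothetical helpers for inspecting which files a pattern would hand to strip.

# Count regular files under directory $1 whose name matches pattern $2.
count_matches() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

# List versioned shared objects (name.so.X...), which '*.so' does not match.
list_versioned_libs() {
    find "$1" -type f -name '*.so.*'
}
```

Comparing `count_matches python '*.so'` with `count_matches python '*.so*'` shows how many extra files the broader pattern touches.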

Removing __pycache__ and .pyc doesn't break things either. (Although it probably slows down the first run a little bit. My view is that in this scenario we should prioritise disk space over first run speed. Especially since .pyc files aren't supposed to be moved between machines.)

find ./ -name '*.so' -type f -exec strip "{}" \;
find ./ -name '__pycache__' -type d -exec rm -rf "{}" \; 2>/dev/null
find ./ -name '*.pyc' -type f -delete

(The 2>/dev/null is because find gets confused when it can't look inside the folder it just deleted.)
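As an alternative to discarding stderr, `-prune` tells find not to descend into a directory it has matched, so the message never appears. A minimal sketch, with a made-up helper name and assuming a find that supports `-delete` (as the commands above already do):

```shell
# Hypothetical helper: remove Python bytecode caches under directory $1.
remove_bytecode() {
    # -prune stops find from descending into each __pycache__ directory,
    # so deleting it does not trigger a "No such file or directory" message.
    find "$1" -type d -name '__pycache__' -prune -exec rm -rf {} +
    # Remove any stray .pyc files that live outside __pycache__ directories.
    find "$1" -type f -name '*.pyc' -delete
}
```

Running `du -sh` on the directory before and after `remove_bytecode ./` gives a quick view of the space saved.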

Where do we add this?

@jaidisido jaidisido added this to the 2.12.0 milestone Sep 28, 2021
@jaidisido jaidisido added minor release Will be addressed in the next minor release ready to release labels Sep 28, 2021
@jaidisido
Contributor

This is now merged and will be available in the next release. Thank you for the contribution!
