Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce size of installed dependencies by slimming #191

Merged
merged 27 commits into from Jun 15, 2018

Conversation

dee-me-tree-or-love
Copy link
Contributor

@dee-me-tree-or-love dee-me-tree-or-love commented May 14, 2018

Hello everyone ✌️

What I did
Added optional stripping of the non crucial package files to reduce package size.

Why
In relation to issue #66 and personal troubles when deploying a function with weighty dependencies:

Installing (numpy + scikit-image + Pillow + scipy + ...) resulted in a package of 250++ MB.
Before I found out about zip option I searched the web for techniques to make the package slimmer.
So that's what I added to the code and seen my package lose weight to 79 MB.

Changes
If in custom options in the serverless config the slim parameter is set to true:

custom:
  pythonRequirements:
    slim: true

Then a script removes the unnecessary files : removes tests in dependencies, *.dist-info folders, __pycache__ folders, removes the *.py[o|c] files and strips the *.so files off the inessential information.

That happens by appending extra commands during the execution of the pip.js:
https://github.com/dee-me-tree-or-love/serverless-python-requirements/blob/436215355f75a5fdfd463b9c1399b0a55267f21f/lib/pip.js#L167

Tests
Added tests for the slim option in test.bats.


Would appreciate any feedback and and be grateful for suggestions for alternative solutions to the problem I attempt to solve with this change. If the proposal is actually is somehow useful, would be glad to know

@dschep
Copy link
Collaborator

dschep commented May 14, 2018

Ooh. i like it. I'll check it out in more detail tomorrow, but 👍 already for adding it behind a flag

lib/pip.js Outdated
@@ -118,6 +118,12 @@ function installRequirements(
}
cmdOptions.push(dockerImage);
cmdOptions.push(...pipCmd);

// If enabled slimming, strip out the caches, tests and dist-infos
if(options.slim == true){
Copy link
Collaborator

@dwolfand dwolfand May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a triple equal please? Javascript is weird, see here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Equality_comparisons_and_sameness

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah! Thanks, changed it!

README.md Outdated
@@ -95,6 +95,15 @@ try:
except ImportError:
pass
```
### Slimmen Package
To remove the tests, information and caches from the installed packages,
enable the `slim` option. This will: 1. `strip` the `.so` files, remove `__pycache__`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the 1. if you're not actually gonna have a numbered list 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, thanks a lot! will fix it up 😄

lib/pip.js Outdated
const folderPath = dockerPathForWin(options, targetRequirementsFolder);
const stripCmd = [
`&& find ${folderPath} -name "*.so" -exec strip {} ;`,
`&& find ${folderPath} -name "*.py[c|o]" -exec rm -rf {} +`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use -exec rm instead of find's -delete option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I have never used t go for -delete - it felt natural to go with rm {} + since it's always there and as far as I know + makes it not span the redundant processes 🤔

Thanks for pointing it out - I have to consider opting for -delete!

It is available at the alpine docker image sooo, maybe I should run some experiments to check which performs better?
Do you personally prefer -delete 🙂 ?

npm i $(npm pack ../..)
! uname -sm|grep Linux || groups|grep docker || id -u|egrep '^0$' || skip "can't dockerize on linux if not root & not in docker group"
sls --dockerizePip=true --slim=true package
unzip .serverless/sls-py-req-test.zip -d puck
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌 Thanks for including tests too!!

Could you add this to also check that the unzipped services doesn't contain any pyc files?

test $(find . -name *.pyc | wc -l) -eq 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's an awesome one, thanks, didn't know of this, will do now! ✌️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I put test $(find puck -name *.pyc | wc -l) -eq 0 maybe? so it runs inside of the puck folder?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. My mistake.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dee-me-tree-or-love, @dschep Wouldn't you want to dump the .py and not .pyc?

lib/pip.js Outdated
@@ -167,7 +167,7 @@ function dockerPathForWin(options, path) {
function getSlimPackageCommands(options, targetRequirementsFolder) {
const folderPath = dockerPathForWin(options, targetRequirementsFolder);
const stripCmd = [
`&& find ${folderPath} -name "*.so" -exec strip {} ;`,
`&& find ${folderPath} -name "*.so" -exec strip {} \\;`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here noticed a bug - at some point removed the \\, which broke the whole script... Now it works

lib/pip.js Outdated
@@ -120,7 +120,7 @@ function installRequirements(
cmdOptions.push(...pipCmd);

// If enabled slimming, strip out the caches, tests and dist-infos
if (options.slim === true) {
if (options.slim) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running tests noticed that setting the slim parameter via cli as --slim=true results in it treated as string.
From this (options.slim === true) was evaluated as false every time and the commands were not executed 😅

Copy link
Collaborator

@dwolfand dwolfand May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is what you want either then because Boolean("false") will evaluate to true since strings are truthy in javascript. I think it should be options.slim === true || options.slim === "true"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the same thing as you for package individually, but David is right, I'd prefer that you use options.slim === true || options.slim === "true" and I'll update my use of the same as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah! thanks, that's a really good one, commit on the way 👍

test.bats Outdated
@@ -83,6 +83,7 @@ teardown() {
sls --dockerizePip=true --slim=true package
unzip .serverless/sls-py-req-test.zip -d puck
ls puck/flask
test $(find puck -name *.pyc | wc -l) -eq 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this advice 👍 really helped to catch a few things!

@dschep
Copy link
Collaborator

dschep commented May 16, 2018

one more thing: add yourself to the contributors section at the bottom of the read me 😄

@dee-me-tree-or-love
Copy link
Contributor Author

Made the change on options.slim and updated the readme
Wow, that's super exciting! 😄

Copy link
Collaborator

@dschep dschep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more change and we should be good to go, could you add a slim: false default value in this default options object: https://github.com/UnitedIncome/serverless-python-requirements/blob/master/index.js#L28-L52

@berlinguyinca
Copy link

when will this be in? I really like this

@berlinguyinca
Copy link

also does this work in non-docker mode? I'm sadly not able to use docker mode due to Ta-Lib being a huge pain...

@berlinguyinca
Copy link

berlinguyinca commented May 17, 2018

ok couple of hours later, how would you like to have a simple patch submitted to support the slim down support for non docker builds?

patch is more or less:

if(docker....){
....
}
 else {
    cmd = pipCmd[0];
    cmdOptions = pipCmd.slice(1);

    if (options.slim === true || options.slim === 'true') {
      serverless.cli.log("slimming it down - no docker");

      const slimCmd = getSlimPackageCommands(options, targetRequirementsFolder);
      cmdOptions.push(...slimCmd);
    }
    else{
        serverless.cli.log('no slimming specified');
    }
  }

@dee-me-tree-or-love
Copy link
Contributor Author

dee-me-tree-or-love commented May 17, 2018

For the non-docker builds I didn't plan to include the support for this originally 🤔
Indeed that would be a super cool patch I think 👍

Mainly because the commands for reducing the package size are known to work in the POSIX environments so by asking docker to run them we can be sure for it to work.
I was not sure they would work for the cmd | powershell etc users (I suppose most likely not 🙃)...

I could work to include it here, but that would increase the pull request scope -- should I or maybe submitting a separate one would be better?

@berlinguyinca
Copy link

berlinguyinca commented May 17, 2018

It would and is also highly required since otherwise, it's near impossible to deploy any large pandas libs. Can we maybe add a check to support slim mode on Linux or Docker system's only?

also, another request would be, can we specify which patterns to exclude? For example, removing all tests, breaks some python packages, which have a submodule named tests. Hyperopt would come to mind.

@dschep
Copy link
Collaborator

dschep commented May 17, 2018

Doh. I didn't notice that this was docker only. I definitely want this to work in both scenarios.

@dee-me-tree-or-love
Copy link
Contributor Author

dee-me-tree-or-love commented May 17, 2018

True, that's a very good point, thanks - I will work on this in the upcoming days then!

  • Specifying patterns to remove
  • Non Docker build support

@berlinguyinca
Copy link

berlinguyinca commented May 17, 2018 via email

// If enabled slimming, strip out the caches, tests and dist-infos
if (options.slim === true || options.slim === 'true') {
const preparedPath = dockerPathForWin(options, targetRequirementsFolder);
const slimCmd = getSlimPackageCommands(options, preparedPath);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the preparing of the strip command happens inside of this method - in case when the environment does not meet expectations (win32 systems) returns an empty array.

To be honest, I am slightly worried about this - theoretically running these commands inside of CYGWIN or Git Bash (or other bash emulators) should be just fine - but I couldn't find a way yet to check for this... would appreciate any advice! :)

`&& find ${folderPath} -name "*.so" -exec strip {} \\;`,
`&& find ${folderPath} -name "*.py[c|o]" -exec rm -rf {} +`,
`&& find ${folderPath} -type d -name "__pycache__*" -exec rm -rf {} +`,
`&& find ${folderPath} -type d -name "*.dist-info*" -exec rm -rf {} +`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe removing egg-info would also be fine within default slim options? 🤔

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dee-me-tree-or-love test folders?

 11M	./pandas/tests/io/sas/data
 11M	./pandas/tests/io/sas
2.0M	./pandas/tests/io/data/legacy_pickle
4.8M	./pandas/tests/io/data
 17M	./pandas/tests/io
1.2M	./pandas/tests/indexes
 24M	./pandas/tests

@dee-me-tree-or-love
Copy link
Contributor Author

dee-me-tree-or-love commented Jun 1, 2018

Friendly bump @dschep 😄
@berlinguyinca @dwolfand
Hey-hey, guys, I was curious if there is something I could do to improve and make the PR progress?
What do you think of it so far?

@dschep
Copy link
Collaborator

dschep commented Jun 1, 2018

Hey @dee-me-tree-or-love! Sorry, we've been busy. I'll take a look at this soon 😃

@sweepy84
Copy link

sweepy84 commented Jun 4, 2018

Great stuff @dee-me-tree-or-love ! This is an absolute must feature for me!

Looking forward to getting this released soon @dschep :)

@braco
Copy link

braco commented Jun 12, 2018

Thank you @dee-me-tree-or-love!

Copy link
Collaborator

@dschep dschep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i haven't looked at this in so long.. looks awesome!

@dschep dschep merged commit 5614ab0 into serverless:master Jun 15, 2018
@dee-me-tree-or-love
Copy link
Contributor Author

Woooow, amazing, thanks! 😄
If there will be any points to improve or follow up on - please let me know! ✌️

@bweigel
Copy link
Contributor

bweigel commented Jun 16, 2018

Kudos to you! Very nice work. 👍
I was just about to start something similar, but now I don't have to. 😄

@sweepy84
Copy link

sweepy84 commented Jun 22, 2018

Hi, tried using this but doesn't seem to be working for me, still getting tests included. (windows machine)

e.g. for pystache still seeing all the test files.

Am i missing something?

This is my serverles.yml

custom:
    pythonRequirements:
      slim: true
      slimPatterns:
        - "*test*"
        - "*.exe"
      pythonBin: python

@dee-me-tree-or-love
Copy link
Contributor Author

dee-me-tree-or-love commented Jun 22, 2018

@sweepy84 oh, indeed, the thing is - the slimPatterns remove directories that match the pattern

To specify additional directories to remove...
-- from README#custom-removal-patterns --

+ also, it works (supposed to) on windows if run with dockerizePip or if run from wsl environment :)

maybe was a bad idea to be limit to directories, but was reasoning for preventing the deletion of possible required files - with this #191 (comment) - maybe not necessarily so 🤔

let me know if this helps, otherwise also possible make a patch on it ;)

@dee-me-tree-or-love dee-me-tree-or-love changed the title Reduce size of installed dependencies by stripping Reduce size of installed dependencies by slimming Jun 26, 2018
@sweepy84
Copy link

@dee-me-tree-or-love thanks for the reply!

Even using dockerizePip it still has test folders (I thought "slim" would remove them without slimPattern),

Still getting test folders >> https://snag.gy/g5SelK.jpg

Using:

custom:
    pythonRequirements:
      slim: true
      slimPatterns:
        - "*test*"
        - "*.exe"
      pythonBin: python
      dockerizePip: true

@dee-me-tree-or-love
Copy link
Contributor Author

@sweepy84 hey - I have looked into it and replicated the problem - oupsie, it's the 'win32' check that prevents the slim commands even when dockerizePip is enabled 😅

I have tried disabling the windows check and running the commands from git bash, but that results in error: find: missing argument to '-exec' - don't have the time yet to investigate, but if I will have a moment to find a solution, I will get back :)

what I am referring to is this line here:
https://github.com/UnitedIncome/serverless-python-requirements/blob/71f36e7e938966a6758522c8f73e3994b4e5d11e/lib/slim.js#L15-L17

The easiest and safest workaround for now could be using WSL if it's an option.
The screenshot is from mine + some debug console logs

image

and the resulting pystache in the zip:

image
Not sure, but maybe it is a decent reason to raise an issue? @dschep

@dschep
Copy link
Collaborator

dschep commented Jun 27, 2018

Oh. yeah i see no reason to keep it disabled when using docker on win32. Also.. it's a bit bigger of a change... but I'd love it if the find usage was replaced by native JS code using the already included glob-all library. This would allow the slim option's test deletion to work, and only the strip usage would be limited to *nix-like systems.

@sweepy84
Copy link

Raised an issue > #212

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants