Reduce size of installed dependencies by slimming #191

Merged
merged 27 commits into from Jun 15, 2018

7 participants
@dee-me-tree-or-love
Contributor

dee-me-tree-or-love commented May 14, 2018

Hello everyone ✌️

What I did
Added optional stripping of non-crucial package files to reduce package size.

Why
In relation to issue #66 and personal troubles when deploying a function with weighty dependencies:

Installing (numpy + scikit-image + Pillow + scipy + ...) resulted in a package of 250+ MB.
Before I found out about the zip option, I searched the web for techniques to make the package slimmer.
That's what I added to the code, and I saw my package shrink to 79 MB.

Changes
If in custom options in the serverless config the slim parameter is set to true:

custom:
  pythonRequirements:
    slim: true

Then a script removes the unnecessary files: it removes tests in dependencies, *.dist-info folders, and __pycache__ folders, deletes the *.py[o|c] files, and strips inessential information from the *.so files.

This happens by appending extra commands during the execution of pip.js:
https://github.com/dee-me-tree-or-love/serverless-python-requirements/blob/436215355f75a5fdfd463b9c1399b0a55267f21f/lib/pip.js#L167
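
For illustration, the appended commands can be pictured as an array of shell fragments built from the requirements folder path. This is a minimal sketch based on the commands quoted later in this review, not the exact code at the linked commit; `buildSlimCommands` and the `/var/task` path are hypothetical names chosen for the example.

```javascript
// Hypothetical helper (name is illustrative): builds the shell fragments
// that slim an installed-requirements folder.
function buildSlimCommands(folderPath) {
  return [
    // Strip symbols from shared objects; `\\;` emits a literal `\;` for find.
    `&& find ${folderPath} -name "*.so" -exec strip {} \\;`,
    // Remove compiled bytecode files.
    `&& find ${folderPath} -name "*.py[c|o]" -exec rm -rf {} +`,
    // Remove bytecode cache directories and package metadata directories.
    `&& find ${folderPath} -type d -name "__pycache__*" -exec rm -rf {} +`,
    `&& find ${folderPath} -type d -name "*.dist-info*" -exec rm -rf {} +`
  ];
}

// Example path only; the real folder depends on the build setup.
console.log(buildSlimCommands('/var/task').join(' '));
```

Each fragment starts with `&&` so it only runs if the preceding command (the pip install) succeeded.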

Tests
Added tests for the slim option in test.bats.


I would appreciate any feedback and would be grateful for suggestions of alternative solutions to the problem I am trying to solve with this change. If the proposal is useful, I would be glad to know.

@dschep

Collaborator

dschep commented May 14, 2018

Ooh. i like it. I'll check it out in more detail tomorrow, but 👍 already for adding it behind a flag

lib/pip.js Outdated
@@ -118,6 +118,12 @@ function installRequirements(
}
cmdOptions.push(dockerImage);
cmdOptions.push(...pipCmd);
// If enabled slimming, strip out the caches, tests and dist-infos
if(options.slim == true){

@dwolfand

dwolfand May 15, 2018

Member

Can you make this a triple equal please? Javascript is weird, see here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Equality_comparisons_and_sameness

@dee-me-tree-or-love

dee-me-tree-or-love May 15, 2018

Contributor

Oh, yeah! Thanks, changed it!

README.md Outdated
@@ -95,6 +95,15 @@ try:
except ImportError:
pass
```
### Slimmen Package
To remove the tests, information and caches from the installed packages,
enable the `slim` option. This will: 1. `strip` the `.so` files, remove `__pycache__`

@dschep

dschep May 15, 2018

Collaborator

Remove the 1. if you're not actually gonna have a numbered list 😉

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

Oh, yeah, thanks a lot! will fix it up 😄

lib/pip.js Outdated
const folderPath = dockerPathForWin(options, targetRequirementsFolder);
const stripCmd = [
`&& find ${folderPath} -name "*.so" -exec strip {} ;`,
`&& find ${folderPath} -name "*.py[c|o]" -exec rm -rf {} +`,

@dschep

dschep May 15, 2018

Collaborator

Any reason to use -exec rm instead of find's -delete option?

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

To be honest, I never used to go for -delete - it felt natural to go with rm {} + since it's always there and, as far as I know, + keeps it from spawning redundant processes 🤔

Thanks for pointing it out - I'll consider opting for -delete!

It is available in the alpine docker image, sooo maybe I should run some experiments to check which performs better?
Do you personally prefer -delete 🙂 ?

npm i $(npm pack ../..)
! uname -sm|grep Linux || groups|grep docker || id -u|egrep '^0$' || skip "can't dockerize on linux if not root & not in docker group"
sls --dockerizePip=true --slim=true package
unzip .serverless/sls-py-req-test.zip -d puck

@dschep

dschep May 15, 2018

Collaborator

🙌 Thanks for including tests too!!

Could you add this to also check that the unzipped service doesn't contain any pyc files?

test $(find . -name *.pyc | wc -l) -eq 0

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

Sure, that's an awesome one, thanks, didn't know of this, will do now! ✌️

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

should I put test $(find puck -name *.pyc | wc -l) -eq 0 maybe? so it runs inside of the puck folder?

@dschep

dschep May 16, 2018

Collaborator

Yup. My mistake.

@braco

braco Jul 2, 2018

@dee-me-tree-or-love, @dschep Wouldn't you want to dump the .py and not .pyc?

lib/pip.js Outdated
@@ -167,7 +167,7 @@ function dockerPathForWin(options, path) {
function getSlimPackageCommands(options, targetRequirementsFolder) {
const folderPath = dockerPathForWin(options, targetRequirementsFolder);
const stripCmd = [
- `&& find ${folderPath} -name "*.so" -exec strip {} ;`,
+ `&& find ${folderPath} -name "*.so" -exec strip {} \\;`,

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

Here noticed a bug - at some point removed the \\, which broke the whole script... Now it works
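
One way to see why the doubled backslash matters: `find` needs a literal `;` argument to terminate `-exec`, and a shell treats an unescaped `;` as a command separator. Inside a JavaScript template literal, a single backslash before `;` is consumed as an escape, so it has to be doubled to survive. A minimal sketch (the `find` command here is only an example string, nothing is executed):

```javascript
// In a template literal, `\;` is an escape sequence that yields just `;`,
// so the backslash never reaches the shell.
const broken = `find . -name "*.so" -exec strip {} \;`;

// Doubling the backslash emits a literal `\;` in the resulting string.
const fixed = `find . -name "*.so" -exec strip {} \\;`;

console.log(broken.endsWith('{} ;'));   // true
console.log(fixed.endsWith('{} \\;'));  // true
```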

lib/pip.js Outdated
@@ -120,7 +120,7 @@ function installRequirements(
cmdOptions.push(...pipCmd);
// If enabled slimming, strip out the caches, tests and dist-infos
- if (options.slim === true) {
+ if (options.slim) {

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

When running the tests, I noticed that setting the slim parameter via the CLI as --slim=true results in it being treated as a string.
Because of this, (options.slim === true) evaluated to false every time and the commands were not executed 😅

@dwolfand

dwolfand May 16, 2018

Member

I'm not sure this is what you want either then because Boolean("false") will evaluate to true since strings are truthy in javascript. I think it should be options.slim === true || options.slim === "true"

@dschep

dschep May 16, 2018

Collaborator

I did the same thing as you for package individually, but David is right, I'd prefer that you use options.slim === true || options.slim === "true" and I'll update my use of the same as well.
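
The difference between the two checks can be sketched as follows. `isEnabled` is a hypothetical helper name used only for this illustration; the comparison itself is the one suggested above.

```javascript
// Hypothetical helper: treat only boolean true or the string "true" as enabled,
// so that a CLI value such as --slim=false (the string "false") stays disabled.
function isEnabled(value) {
  return value === true || value === 'true';
}

console.log(Boolean('false'));    // true  (any non-empty string is truthy)
console.log(isEnabled('false'));  // false
console.log(isEnabled('true'));   // true
console.log(isEnabled(true));     // true
```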

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

Oh yeah! thanks, that's a really good one, commit on the way 👍

test.bats Outdated
@@ -83,6 +83,7 @@ teardown() {
sls --dockerizePip=true --slim=true package
unzip .serverless/sls-py-req-test.zip -d puck
ls puck/flask
test $(find puck -name *.pyc | wc -l) -eq 0

@dee-me-tree-or-love

dee-me-tree-or-love May 16, 2018

Contributor

thanks for this advice 👍 really helped to catch a few things!

@dschep

Collaborator

dschep commented May 16, 2018

one more thing: add yourself to the contributors section at the bottom of the README 😄

@dee-me-tree-or-love

Contributor

dee-me-tree-or-love commented May 16, 2018

Made the change on options.slim and updated the readme
Wow, that's super exciting! 😄

@dschep

One more change and we should be good to go, could you add a slim: false default value in this default options object: https://github.com/UnitedIncome/serverless-python-requirements/blob/master/index.js#L28-L52

@berlinguyinca

berlinguyinca commented May 17, 2018

when will this be in? I really like this

@berlinguyinca

berlinguyinca commented May 17, 2018

also does this work in non-docker mode? I'm sadly not able to use docker mode due to Ta-Lib being a huge pain...

@berlinguyinca

berlinguyinca commented May 17, 2018

ok, a couple of hours later: how would you like a simple patch that adds slim support for non-docker builds?

patch is more or less:

if(docker....){
....
}
 else {
    cmd = pipCmd[0];
    cmdOptions = pipCmd.slice(1);

    if (options.slim === true || options.slim === 'true') {
      serverless.cli.log("slimming it down - no docker");

      const slimCmd = getSlimPackageCommands(options, targetRequirementsFolder);
      cmdOptions.push(...slimCmd);
    }
    else{
        serverless.cli.log('no slimming specified');
    }
  }
@dee-me-tree-or-love

Contributor

dee-me-tree-or-love commented May 17, 2018

For the non-docker builds I didn't plan to include the support for this originally 🤔
Indeed that would be a super cool patch I think 👍

Mainly because the commands for reducing the package size are known to work in POSIX environments, so by asking docker to run them we can be sure they will work.
I was not sure they would work for cmd | powershell etc. users (I suppose most likely not 🙃)...

I could work on including it here, but that would increase the pull request scope -- should I, or would submitting a separate one be better?

@berlinguyinca

berlinguyinca commented May 17, 2018

It would, and it is also badly needed, since otherwise it's nearly impossible to deploy any large pandas libs. Can we maybe add a check to support slim mode on Linux or Docker systems only?

Also, another request: can we specify which patterns to exclude? For example, removing all tests breaks some Python packages that have a submodule named tests. Hyperopt comes to mind.

@dschep

Collaborator

dschep commented May 17, 2018

Doh. I didn't notice that this was docker only. I definitely want this to work in both scenarios.

@dee-me-tree-or-love

Contributor

dee-me-tree-or-love commented May 17, 2018

True, that's a very good point, thanks - I will work on this in the upcoming days then!

  • Specifying patterns to remove
  • Non Docker build support
@berlinguyinca

berlinguyinca commented May 17, 2018

// If enabled slimming, strip out the caches, tests and dist-infos
if (options.slim === true || options.slim === 'true') {
const preparedPath = dockerPathForWin(options, targetRequirementsFolder);
const slimCmd = getSlimPackageCommands(options, preparedPath);

@dee-me-tree-or-love

dee-me-tree-or-love May 22, 2018

Contributor

The strip command is now prepared inside this method - when the environment does not meet expectations (win32 systems), it returns an empty array.

To be honest, I am slightly worried about this - theoretically running these commands inside of CYGWIN or Git Bash (or other bash emulators) should be just fine - but I couldn't find a way yet to check for this... would appreciate any advice! :)
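
The guard described here might look roughly like the sketch below. It is an illustration only: `slimCommandsForPlatform` is a hypothetical name, and the `isWsl` flag is assumed to be supplied by the caller rather than detected here.

```javascript
// Hypothetical sketch: hand back the slim commands only where a POSIX-like
// shell is expected; on plain win32 (no WSL) return an empty array instead.
function slimCommandsForPlatform(commands, platform = process.platform, isWsl = false) {
  if (platform === 'win32' && !isWsl) {
    return [];
  }
  return commands;
}

const example = ['&& find /tmp/reqs -name "*.so" -exec strip {} \\;'];
console.log(slimCommandsForPlatform(example, 'win32').length);        // 0
console.log(slimCommandsForPlatform(example, 'win32', true).length);  // 1
console.log(slimCommandsForPlatform(example, 'linux').length);        // 1
```

Note that a check based only on the platform string cannot tell Cygwin or Git Bash apart from a plain Windows shell, which is the concern raised in this comment.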

`&& find ${folderPath} -name "*.so" -exec strip {} \\;`,
`&& find ${folderPath} -name "*.py[c|o]" -exec rm -rf {} +`,
`&& find ${folderPath} -type d -name "__pycache__*" -exec rm -rf {} +`,
`&& find ${folderPath} -type d -name "*.dist-info*" -exec rm -rf {} +`

@dee-me-tree-or-love

dee-me-tree-or-love May 22, 2018

Contributor

Maybe removing egg-info would also be fine within default slim options? 🤔

@braco

braco Jun 13, 2018

@dee-me-tree-or-love test folders?

 11M	./pandas/tests/io/sas/data
 11M	./pandas/tests/io/sas
2.0M	./pandas/tests/io/data/legacy_pickle
4.8M	./pandas/tests/io/data
 17M	./pandas/tests/io
1.2M	./pandas/tests/indexes
 24M	./pandas/tests
@dee-me-tree-or-love

Contributor

dee-me-tree-or-love commented Jun 1, 2018

Friendly bump @dschep 😄
@berlinguyinca @dwolfand
Hey-hey, guys, I was curious if there is something I could do to improve and make the PR progress?
What do you think of it so far?

@dschep

Collaborator

dschep commented Jun 1, 2018

Hey @dee-me-tree-or-love! Sorry, we've been busy. I'll take a look at this soon 😃

@sweepy84

sweepy84 commented Jun 4, 2018

Great stuff @dee-me-tree-or-love ! This is an absolute must feature for me!

Looking forward to getting this released soon @dschep :)

@braco

braco commented Jun 12, 2018

@dschep

dschep approved these changes Jun 15, 2018

sorry i haven't looked at this in so long.. looks awesome!

@dschep dschep merged commit 5614ab0 into UnitedIncome:master Jun 15, 2018

1 check passed

ci/circleci Your tests passed on CircleCI!
Details
@dee-me-tree-or-love

Contributor

dee-me-tree-or-love commented Jun 15, 2018

Woooow, amazing, thanks! 😄
If there will be any points to improve or follow up on - please let me know! ✌️

@bweigel

Contributor

bweigel commented Jun 16, 2018

Kudos to you! Very nice work. 👍
I was just about to start something similar, but now I don't have to. 😄

@sweepy84

sweepy84 commented Jun 22, 2018

Hi, tried using this but doesn't seem to be working for me, still getting tests included. (windows machine)

e.g. for pystache still seeing all the test files.

Am i missing something?

This is my serverless.yml

custom:
    pythonRequirements:
      slim: true
      slimPatterns:
        - "*test*"
        - "*.exe"
      pythonBin: python
@dee-me-tree-or-love

Contributor

dee-me-tree-or-love commented Jun 22, 2018

@sweepy84 oh, indeed, the thing is - the slimPatterns remove directories that match the pattern

To specify additional directories to remove...
-- from README#custom-removal-patterns --

+ also, it works (or is supposed to) on windows if run with dockerizePip or from a wsl environment :)

maybe it was a bad idea to limit it to directories, but my reasoning was to prevent the deletion of files that might be required - with this #191 (comment) - maybe not necessarily so 🤔

let me know if this helps; otherwise it's also possible to make a patch for it ;)

@dee-me-tree-or-love dee-me-tree-or-love changed the title from Reduce size of installed dependencies by stripping to Reduce size of installed dependencies by slimming Jun 26, 2018

@sweepy84

sweepy84 commented Jun 27, 2018

@dee-me-tree-or-love thanks for the reply!

Even using dockerizePip it still has test folders (I thought "slim" would remove them without slimPattern),

Still getting test folders >> https://snag.gy/g5SelK.jpg

Using:

custom:
    pythonRequirements:
      slim: true
      slimPatterns:
        - "*test*"
        - "*.exe"
      pythonBin: python
      dockerizePip: true
@dee-me-tree-or-love

Contributor

dee-me-tree-or-love commented Jun 27, 2018

@sweepy84 hey - I have looked into it and replicated the problem - oupsie, it's the 'win32' check that prevents the slim commands even when dockerizePip is enabled 😅

I have tried disabling the windows check and running the commands from git bash, but that results in an error: find: missing argument to '-exec' - I don't have the time to investigate yet, but if I find a moment to look for a solution, I will get back :)

what I am referring to is this line here:

if (process.platform !== 'win32' || isWsl) {
stripCmd = getDefaultSLimOptions(folderPath);

The easiest and safest workaround for now could be using WSL if it's an option.
The screenshot is from my setup, with some debug console logs

image

and the resulting pystache in the zip:

image
Not sure, but maybe it is a decent reason to raise an issue? @dschep

@dschep

Collaborator

dschep commented Jun 27, 2018

Oh. yeah i see no reason to keep it disabled when using docker on win32. Also.. it's a bit bigger of a change... but I'd love it if the find usage was replaced by native JS code using the already included glob-all library. This would allow the slim option's test deletion to work, and only the strip usage would be limited to *nix-like systems.
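
As a rough idea of what a `find`-free cleanup could look like, here is a sketch that walks the directory tree with Node's built-in `fs` and `path` modules. It deliberately does not use glob-all (the plain standard library is swapped in to keep the example self-contained), `slimDirectory` is a hypothetical name, and the matching rules are an approximation of the `find` patterns quoted earlier. Stripping `.so` files is left out since it needs the external `strip` binary.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: recursively delete compiled bytecode files plus
// __pycache__ and *.dist-info directories under `root`.
// Returns the list of removed paths.
function slimDirectory(root) {
  const removed = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      const isCache = entry.name.startsWith('__pycache__');
      const isDistInfo = entry.name.includes('.dist-info');
      if (isCache || isDistInfo) {
        // Remove the whole directory, contents included.
        fs.rmSync(fullPath, { recursive: true, force: true });
        removed.push(fullPath);
      } else {
        // Descend into ordinary package directories.
        removed.push(...slimDirectory(fullPath));
      }
    } else if (/\.py[co]$/.test(entry.name)) {
      // Remove .pyc / .pyo files; plain .py sources are kept.
      fs.rmSync(fullPath, { force: true });
      removed.push(fullPath);
    }
  }
  return removed;
}
```

Because this relies only on Node's file APIs, the same deletion logic would run on win32 as well as on *nix-like systems. `fs.rmSync` requires a reasonably recent Node.js release (14.14 or later).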

@sweepy84

sweepy84 commented Jun 28, 2018

Raised an issue > #212

thanks
