-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Initial proof-of-concept to have a static cache #162
Conversation
I'll take a look at this next week. |
You can probably speed it up even more by having |
I'll take a stab later this week at putting this in the OS temp directory (I had some other feedback recommend me to do this, that way the caching can be shared between stacks that use the same requirements) unless there's some reason why I shouldn't do that and then also try to modify injectAllRequirements to pull from this folder to speed it up even more. |
I'm also very interested in that feature, because reinstalling packages isn't really necessary. Keep in mind that temp directory is cleared after a system restart, at least on Linux. Programs usually use user home directory, e.g. If I could help you somehow, just say, I have experience in both JS and Python. |
@Valian Glad to see some shared interest... that user home directory works on Mac/Linux but not Windows (or does it?). Also, I personally dislike storing things on a user's home folder without their knowledge, I think system temp is more appropriate since it is removed upon reboot. Anyways, I emailed you, lets have a chat and maybe you can help me nail this down. Cheers! |
@dschep Could I get your thoughts on the points I've written up below? I'm going to take up making this merge request actually possible and happen because I'm seeing a lot of interest from the people I'm sharing this with, but there are going to be a few bigger changes, and I just want your thoughts on them so I don't waste my time on something that will never be merged incase something seems fundamentally flawed to you
|
For me, all these points are valid and welcomed, maybe except |
Your writeup sounds good for the most part @AndrewFarley. I agree with @Valian's point re |
Thanks @dschep, keep you guys posted... :) |
FYI: RE-opening a new PR to re-review the proposed code |
Fixes #157 (filed by me) ## What this does * Makes the download caching of pip a "first-class citizen" as an option directly in this plugin's options. This will "fix" a few (attempts) at using the pip cache, specifically in Docker, and will simplify this feature (as the user simply has to enable it, not specify a folder). In a future MR, I'd highly suggest enabling this by default. * Second, it adds a new type of caching called "static caching" which allows you to cache the outputs of this plugin. This greatly speeds up every single build as long as you have the feature enabled and do not change your requirements.txt file. In a future MR, I'd highly suggest enabling this by default also. * The pip download and static cache are shared between any projects of the same user through an [appdir](https://www.npmjs.com/package/appdirectory) cache folder when packaging your service. This _especially_ helps on projects that heavily use Docker (Win/Mac) for deployments or development, or for pip modules that need to compile every time, and _especially_ for projects with long requirements.txt files. This will also greatly help the longer and more complex your requirements.txt is, and/or if you use the same requirements.txt on multiple projects (common in team environments). ## Implementation details * When either cache is enabled, this plugin now caches those requirements (download or static) to an "appdir" cache folder (per the [appdirectory](https://www.npmjs.com/package/appdirectory) node module). * When this feature is NOT enabled, nothing changes * Injection happens directly from the new cached requirements directory via a symlink created in the right place in `.serverless` or `.serverless/functionname` if deploying individually. * As mentioned above, there is a symlink into the .serverless folder when the static cache is enabled pointing to it, so you still "know" where your cache is (for both individually and non-individually packaged functions). * The requirements.txt "generator" was improved to remove comments, empty lines, and sort the list of items before trying to use it (or check its md5 sum). This allows for more actual md5 matches between projects, in-case of comments and such in the requirements file. * A new command was added to the command-line to flush the download/static cache, called cleanCache invokable with: `serverless requirements cleanCache`. This clears all items including the download and static cache. * A handful of new tests were created for various edge conditions I've found while doing this refactoring, some were based on bugs other people found while using this plugin with some combination of options, some are not directly related to this merge's intent, but it's just part of my stream of work/consciousness. Sorry tests take a lot longer to run now since there are lots more now. * A UID bug fix related to docker + pip was implemented (seen on a few other bugs) from @cgrimal * The following new configurable custom options were added to this plugin... Variable Name | Value | Description --- | --- | --- useStaticCache | `false/true` | Default: false. This will enable or disable the static cache. After some testing I would like to make this default: true, as this will greatly help everyone, and there's no reason to not enable this. Possibly making this default: true will help weed out issues faster. I'll gladly step-up to quickly fix any bugs people have with it since I'm now well accustomed with the code. useDownloadCache | `false/true` | Default: false. This will enable or disable the pip download cache. This was previously the "example" code using a pipEnvExtraCmd to specify a local folder to cache downloads to. This does not require a cache location to be set, if not specified it will use an appdirs.usercache() location. cacheLocation | `<path>` | Default: [appdirectory](https://www.npmjs.com/package/appdirectory).userCache(appName: serverless-python-requirements) This will allow the user to specify where the caches (both static and download) should be stored. This will be useful for people who want to do advanced things like storing cache globally shared between users, or for CI/CD build servers on shared-storage to allow multiple build machines to leverage a cache to speed builds up. An example would be to mount a shared NFS store on all your CI/CD runners to `/mnt/shared` and set this value to `/mnt/shared/sls-py-cache`. staticCacheMaxVersions | `<integer>` | Default: 0. This will restrict the a maximum number of caches in the cache folder. Setting to 0 makes no maximum number of versions. This will be useful for build/CI/CD machines that have limited disk space and don't want to (potentially) infinitely cache hundreds/thousands of versions of items in cache. Although, I would be disturbed if a project had hundreds of changes to their requirements.txt file. ## TODO - [X] Feature Implementation - [X] BUG: Deploying single-functions fails (Packaging works, but fails because of #161 ) - [X] Code Styling / Linting - [X] Test to be sure Pipfile / generated requirements.txt still works - [X] Tested a bunch on Mac / Linux with and without Docker - [X] Adding Tests for Download Cache - [X] Make sure zip feature still works - [X] Ensure all existing tests pass - [X] Adding Tests for static cache - [X] Updating README.md to inform users how to use it - [X] Make sure dockerSsh works - [X] Implement error when trying to use --cache-dir with dockerizePip (won't work) - [X] Implement suggestion when trying to use --cache-dir without dockerizePip - [x] Test on Windows - [x] Iterate through any feedback - [x] Rebase with master constantly, awaiting merge... :) Replaces #162
WARNING: Please do not merge as-is
Fixes #157 (filed by me)
This is a proof of concept that will probably need a bit of love. What this does, is it creates a cache of the statically compiled pip outputs, and based on the MD5 hash of the generated requirements.txt file it will re-use the last-statically compiled outputs automatically, and never enter docker. Our build times are down from 2-3 minutes to a few seconds.
Implementation Details
Items from
.serverless/requirements
are placed into the.serverless-python-requirements-static-cache
folder at the project root, and a file in that folder is created to store the md5 sum of therequirements.txt
that was used to create that static cache. Since the.serverless
folder is completely destroyed before passing context to the serverless-python-requirements plugin, I have to store the requirements somewhere else.Problems / Suggestions / Ideas / Input Needed
staticCache: true
.serverless/requirements
folder from in there and re-use it for the next run..serverless/requirements
folder (above) or use a system temp path for the cache. But, as this PR works now, I can't seem to get serverless to ignore the new cache folder when packaging the zip file. If I add.serverless-python-requirements-static-cache
to the exclude path inserverless.yml
it still includes it. This is out of my knowledge of Serverless or this plugin. Hoping someone can help with this, as this is the only major bug I can tell in this plugin. Consider using another location for storing the static cache would fix this problem (like os-dependent temp storage)?Sample Output
In some brief tests, I've noticed the larger the project and/or the more compiling having to be done, the more the savings (makes sense). The time it takes now is mostly just zipping everything up.
Questions / Thoughts / Input / Feedback welcome. Feel free to take this idea and run with it, re-implementing it as you see fit, or give me adequate input and I can try to re-align. I'm (clearly) not really a javascript programmer though. I need this for a few of my development teams to support faster development cycles. :)