Caching not working (as well as it could?) when using docker... #157
Comments
#145 combined with using pipCmdExtraArgs might address the issues. I don't currently use caching myself (my deps aren't big enough to worry about it), but after the v4 release with all these changes I'll probably try it out too.
@dschep Thanks for the feedback. I don't know if that will fix it exactly... I don't see that problem, since I can see that Docker is using the cache, but it appears the cache only holds the downloaded modules, not the properly initialized (installed) form. I've tried using pipCmdExtraArgs to set a custom cache dir, but maybe there's some other pip command I don't know about that does "more" caching? I'll leave this post up and do some research on pip and such. I was also hoping to avoid dropping into Docker at all when it isn't necessary... Maybe I'll come up with something.
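For reference, a minimal sketch of the kind of configuration being discussed here, assuming the plugin's `pipCmdExtraArgs` option and pip's `--cache-dir` flag (the cache path is only illustrative):

```yaml
# serverless.yml -- illustrative only: pass a download-cache directory through to pip
custom:
  pythonRequirements:
    dockerizePip: true
    pipCmdExtraArgs:
      - --cache-dir
      - .requirements-cache   # example folder; this only caches downloaded wheels/sdists, not the installed output
```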
I don't think #145 will fix this.
Hmm, OK. I'll have to dig a bit deeper.
I haven't gotten around to trying it yet... but perhaps this is what we're looking for?
^ Attempted to implement this in the above pull request. pip-accel seems like it would do the job, but it hasn't been kept up to date in the last 2 years and breaks on a few of the Python modules I tried, so it's not a great solution. I could use some input/feedback on that PR, and/or someone else could take the idea and run with it, since I'm not a great JS dev.
Fixes #157 (filed by me)

## What this does

* Makes pip's download caching a "first-class citizen" as an option directly in this plugin's options. This "fixes" a few attempts at using the pip cache, specifically in Docker, and simplifies the feature: the user simply enables it rather than specifying a folder. In a future MR, I'd highly suggest enabling this by default.
* Adds a new type of caching called "static caching", which caches the outputs of this plugin. This greatly speeds up every single build as long as the feature is enabled and your requirements.txt file does not change. In a future MR, I'd highly suggest enabling this by default as well.
* The pip download and static caches are shared between all projects of the same user through an [appdir](https://www.npmjs.com/package/appdirectory) cache folder when packaging your service. This _especially_ helps projects that heavily use Docker (Win/Mac) for deployments or development, pip modules that need to compile every time, and projects with long requirements.txt files. The longer and more complex your requirements.txt is, and the more projects that share the same requirements.txt (common in team environments), the more this helps.

## Implementation details

* When either cache is enabled, this plugin now caches those requirements (download or static) to an "appdir" cache folder (per the [appdirectory](https://www.npmjs.com/package/appdirectory) node module).
* When neither feature is enabled, nothing changes.
* Injection happens directly from the new cached requirements directory via a symlink created in the right place in `.serverless` (or `.serverless/functionname` when deploying individually).
* As mentioned above, when the static cache is enabled a symlink pointing to it is placed in the `.serverless` folder, so you still "know" where your cache is (for both individually and non-individually packaged functions).
* The requirements.txt "generator" was improved to remove comments and empty lines and to sort the list of items before using it (or checking its md5 sum). This allows more actual md5 matches between projects when the requirements files differ only in comments and the like.
* A new command was added to flush the download/static cache, called cleanCache, invokable with `serverless requirements cleanCache`. This clears all items, including both the download and static caches.
* A handful of new tests were created for various edge conditions found while doing this refactoring. Some are based on bugs other people hit while using this plugin with particular combinations of options; some are not directly related to this merge's intent, but they are part of the same stream of work. Sorry, the tests take a lot longer to run now since there are many more of them.
* A UID bug fix related to Docker + pip (seen in a few other bug reports) was implemented, from @cgrimal.
* The following new configurable custom options were added to this plugin (see the serverless.yml sketch at the end of this description):

Variable Name | Value | Description
--- | --- | ---
useStaticCache | `false/true` | Default: false. Enables or disables the static cache. After some testing I would like to make this default to true, as it will greatly help everyone and there's no reason not to enable it. Making it the default may also help weed out issues faster; I'll gladly step up to quickly fix any bugs people hit, since I'm now well accustomed to the code.
useDownloadCache | `false/true` | Default: false. Enables or disables the pip download cache. This was previously the "example" code using pipCmdExtraArgs to specify a local folder to cache downloads to. It does not require a cache location to be set; if not specified, it uses an appdirs user-cache location.
cacheLocation | `<path>` | Default: [appdirectory](https://www.npmjs.com/package/appdirectory).userCache(appName: serverless-python-requirements). Lets the user specify where the caches (both static and download) are stored. Useful for advanced setups such as a cache shared globally between users, or CI/CD build servers on shared storage where multiple build machines leverage one cache to speed up builds. For example, mount a shared NFS store on all your CI/CD runners at `/mnt/shared` and set this value to `/mnt/shared/sls-py-cache`.
staticCacheMaxVersions | `<integer>` | Default: 0. Restricts the maximum number of caches kept in the cache folder; 0 means no maximum. Useful for build/CI/CD machines with limited disk space that don't want to (potentially) cache hundreds or thousands of versions, although I would be disturbed if a project had hundreds of changes to its requirements.txt file.

## TODO

- [X] Feature implementation
- [X] BUG: Deploying individual functions fails (packaging works, but fails because of #161)
- [X] Code styling / linting
- [X] Test to be sure Pipfile / generated requirements.txt still works
- [X] Tested a bunch on Mac / Linux with and without Docker
- [X] Adding tests for download cache
- [X] Make sure zip feature still works
- [X] Ensure all existing tests pass
- [X] Adding tests for static cache
- [X] Updating README.md to inform users how to use it
- [X] Make sure dockerSsh works
- [X] Implement error when trying to use --cache-dir with dockerizePip (won't work)
- [X] Implement suggestion when trying to use --cache-dir without dockerizePip
- [x] Test on Windows
- [x] Iterate through any feedback
- [x] Rebase with master constantly, awaiting merge... :)

Replaces #162
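A minimal serverless.yml sketch of how the new options might be enabled once this lands (the values shown are purely illustrative):

```yaml
# serverless.yml -- illustrative values only; option names as proposed in this PR
custom:
  pythonRequirements:
    useDownloadCache: true        # cache pip downloads (wheels/sdists) between builds
    useStaticCache: true          # cache the fully built requirements output
    cacheLocation: /mnt/shared/sls-py-cache   # optional; defaults to an appdirectory user-cache path
    staticCacheMaxVersions: 10    # optional; 0 (the default) means keep every version
```

Both caches can then be flushed with `serverless requirements cleanCache`.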
I don't know if this problem is Docker-specific or not, but the subject says it all. For local development we use Macs, but our CI/CD servers deploy from Linux. To make local deploys work properly for our devs I've got numerous deployments using this plugin, and we've noticed the time spent in the requirements step keeps increasing. The Docker log shows that a cache is being used, but it appears this only covers caching the modules downloaded from the web, before they are set up.
What I'd like: if the requirements.txt file has not changed (by md5 sum or last-modified time), don't attempt to download and re-set-up the requirements, ideally without even launching Docker, since Docker will just do the same thing over and over. In that case, all the plugin should do is re-symlink/copy the contents into place from a requirements cache folder, zip up the Lambda package, and then clean up after itself (as it does now).
I don't know if this is a bug or a feature request, but when I hear that this library is "caching" requirements, I assume it's caching more than just the package downloads: it should cache the "output" of the requirements step, not just the "prefetch".
I'd love feedback from other contributors on whether this is a solid approach. I wouldn't mind implementing this feature myself, but I'd like feedback first so I don't waste time on it if it was already supposed to work differently.