Caching layers during build #143
@j00bar any updated specs since our conversation yesterday?
Trying to document. #thestruggleisreal

What I think we've arrived at, as an optional build approach: we're going to implement an Ansible execution strategy. This build approach will not orchestrate builds using Docker Compose. For each task, it will calculate a hash based on the previous task's hash, the task itself, the host, and an enumerator of operation order. It will then look for an image with a label matching that hash. If one is found, it will consider that image to be a cache of the result of the task and move on. If not, it will stand up a container for that host from the parent's cached image, execute the one task, stop the container, and commit the container as a new image for the cache. Thus, besides the builder container, at most one other container will be running at a time. We will still be able to copy/fetch files to/from the running container and the builder container, but the containers being built will not be able to talk directly to one another over the network. At the end of the build, the resultant image will have one layer per task executed.

I'm also going to suggest that as part of this strategy, we allow playbook writers to define a special variable per task, say,

Again, this will be an optional build method for speed - the existing build method that results in a single additional layer per build will remain.
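To make the chained-hash idea above concrete, here is a minimal sketch of how such per-task fingerprints could be computed. All names here (`task_fingerprint`, `fingerprint_chain`) are hypothetical illustrations, not the actual implementation:

```python
import hashlib
import json

def task_fingerprint(parent_hash, task, host, position):
    """Fingerprint one task from the parent layer's hash, the task
    definition, the target host, and its position in the run order.
    Any upstream change alters every downstream fingerprint, which is
    exactly the invalidation behavior a layer cache needs."""
    payload = json.dumps(
        {"parent": parent_hash, "task": task, "host": host, "n": position},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def fingerprint_chain(base_image_id, tasks, host):
    """Chain fingerprints across an ordered list of tasks for one host."""
    h = base_image_id
    chain = []
    for n, task in enumerate(tasks):
        h = task_fingerprint(h, task, host, n)
        chain.append(h)
    return chain
```

Each fingerprint would then be stored as an image label on the committed layer, so a later build can look up whether a cached image for that exact (parent, task, host, position) tuple already exists.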
Added feature/build_cache branch for this. Added as an upstream branch to maybe make collaboration easier.
Has any thought gone into the benefits of using image layers outside of build speed? One of the great things about Docker is that it is fast and lightweight, not only in building but also in shipping and running on different environments. Image layers are a big part of that. For example:
Packing everything into one layer is wasteful if the common use case is to change only the last layer of the image 95% of the time (i.e. code). My point here is that I vote for integrating support for image layers out of the box rather than behind an optional build parameter. (Thanks for entertaining the outsider comments; I'll be happy to contribute after discussions.)
@jzaccone Thanks for bringing up those points. Out of the box, Ansible Container adds one layer to the base image during the playbook run. So if you're using CentOS (4 layers in the stock image), your built image has 5 layers total. You still get all of the benefits you enumerated.
Well, most of the layers are just metadata:
In a typical application, the real "meat" is going to come in that last layer created by the playbook run. There is much more to gain here by breaking that apart into multiple layers.
@jzaccone "gain" in terms of what?
Gain example: dependencies One application I deploy ATM (to AWS ECR, from a Dockerfile) has:
On a typical build only the last layer changes, i.e. I only have to upload ~20 MB to AWS most of the time. If the recipe for the three custom layers was an

I.e. a way, within the vocabulary of
@tomsun @j00bar This is the use case that I am referring to. Thank you, tomsun, for providing the example. To combine my list from above with tomsun's example: the ~370 MB of dependencies would be duplicated across otherwise-unique images, when only a unique image layer containing the ~20 MB code change is necessary. This increases the container footprint on every server the image is moved to.
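The transfer savings described in the two comments above can be sketched with rough, hypothetical numbers (loosely based on tomsun's ~370 MB / ~20 MB split; the layer names and sizes here are illustrative, not measured):

```python
def bytes_to_push(image_layers_mb, already_on_remote):
    """Sum the sizes of only the layers the remote registry is missing,
    mirroring how a layered push skips layers the registry already has."""
    return sum(size for digest, size in image_layers_mb
               if digest not in already_on_remote)

# Hypothetical layout: three stable dependency layers (~370 MB total)
# plus a ~20 MB code layer, versus one squashed layer of everything.
layered = [("os", 200), ("deps1", 100), ("deps2", 70), ("code", 20)]
flat = [("everything", 390)]  # a squashed layer gets a new digest every build

remote = {"os", "deps1", "deps2"}        # registry already holds the stable layers
print(bytes_to_push(layered, remote))    # only the changed code layer moves
print(bytes_to_push(flat, remote))       # the whole image moves every time
```

The same arithmetic applies on the pull side of every host the image is shipped to, which is where the footprint concern above comes from.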
@tomsun @jzaccone Thanks, y'all - that's tremendously helpful. Your input on my design/UX quandary about how to implement this would be welcome. See here: #217 (comment)
@tomsun's point is the main item holding me back from using ansible-container at the moment (and I would love to use it). The potential proliferation of duplicated layers due to the current ineffective cache use is problematic for me as well. I'm curious when you think it might be ready. Dockerfiles just don't have the same functionality offered by Ansible.
Hey @j00bar, has there been any progress made on this? We're using containers in our Jenkins infrastructure, and without layer caching, building / pushing / pulling flat images would slow our builds down quite a bit.
@shanemcd There's a WIP branch that implements an execution strategy that identifies and fingerprints layers. I still need to write the code that will actually do the reuse and cleanup of stale layers.
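The remaining reuse-and-cleanup step could look something like the following sketch: walk the current build's fingerprint chain against the set of fingerprints that cached images are labeled with, resume from the deepest hit, and flag cached images whose fingerprints no longer appear anywhere in the chain as stale. The function names and the plain-set cache model are assumptions for illustration only:

```python
def plan_build(chain, cached):
    """Given the ordered fingerprint chain for a build and the set of
    fingerprints found on existing cached images, return the deepest
    reusable fingerprint and the suffix of tasks that still must run.
    A cache miss anywhere invalidates everything after it."""
    reusable = None
    for i, fp in enumerate(chain):
        if fp in cached:
            reusable = fp
        else:
            return reusable, chain[i:]
    return reusable, []

def stale_fingerprints(cached, chain):
    """Cached images whose fingerprints no longer appear in the current
    chain are candidates for cleanup."""
    return cached - set(chain)
```

In a real implementation the `cached` set would come from querying the Docker daemon for images carrying the fingerprint label, and `plan_build`'s returned suffix is the list of tasks the strategy would actually execute and commit.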
One of the niceties of the Dockerfile is that the Docker engine re-uses cached layers to accelerate rebuilds. This is not nearly so straightforward in Ansible Container.
This issue is to track the ongoing discussions among the core team as to how we might implement a similar facility into our builds.