Add docker squash command #4232
Excellent! Hope this makes it into Docker |
+1 this would save hacking around this in Dockerfile with a custom build script which installs dependencies, builds and uninstalls dependencies again. |
heya @alexlarsson I like it, but IANTM. Can you please add the command to cli.rst? And the big question: to be consistent with |
@SvenDowideit using squash as an option to another action (e.g. commit), would that still leave room to review the results of the squash before actually committing? |
@thaJeztah when it's used as an option, you're saying 'i want this all to happen NOW', so no, I don't think so - if you want to review each step, you'd do them separately. (not much different to using |
Whoa, back up. How is +1 for having some way to CLI squash arbitrary layers after the fact though, as this PR adds in a sane way IMO (which lays down solid primitives we can play with and improve on so that we can use them for building those other features later) |
@tianon I think that describes my concerns in my previous comment: adding it as an _option_ makes a build and squash happen in one go, without being able to see the consequences. Wasn't sure if that would have consequences. |
I don't mean similar in function, I mean can have a similar usage - when I know the result I want is a squashed image, then ) can do it all in one - just like --rm. as to what it would do - I'd expect the result to be the same as the fully specified - with the implied perhaps I'm too Perlish ? |
@SvenDowideit @alexlarsson This doesn't seem to work for me:
I'm using the btrfs driver. |
@tianon having In dev I would just use a standard dockerfile, taking advantage of caching nicely. then when ready to deploy a clean image with no deps I would do --squash in some cases. This would still keep the image for I don't see this negating docker, its about improving it, there is only a point in distributing intermediate images if they are to be reused. |
@SamSaffron There's absolutely no need for |
@unclejack New version actually works, i was applying the changes in reverse... |
If we add this we should probably make the Container and ContainerConfig fields of image.Image into an array. Then we could save all the intermediate operations in the squashed image. |
It's working properly now, but doc changes for usage and API are needed as well. ping @vieux |
Now has API and cli docs. I put this in version 1.10, but i'm not sure if that is right? When do we mark the version as stable and move to a new version? |
I guess this should go in the |
Now moved to 1.11 api version |
@alexlarsson this overlaps with the changes in image format we started discussing with @vbatts. I would much prefer that "squashing" be an optimization hidden from the user, either as part of build, or push or some new command. But there's no reason why docker can't figure out on its own what's the best thing to do for a given image at a given time. The pre-requisite for any of this is to separate the image metadata from the layer topology. In other words, the Image struct should contain all the information needed by, say, docker history, independently of layers. My suggestion would be to start with that (more modest) change: to start storing the full history of an image in each layer, and change docker history to ignore layers. |
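To make the proposal above concrete, here is a minimal sketch of an image record whose history is stored independently of its layer list. The names `Image`, `HistoryEntry`, `show_history` and `squash_layers` are hypothetical and exist only for illustration; this is not Docker's actual Image struct or image format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistoryEntry:
    # Hypothetical record of one build step (e.g. a RUN command).
    created_by: str


@dataclass
class Image:
    # Hypothetical layer IDs; how many there are is a storage detail.
    layers: List[str]
    # The full build history lives in the metadata, not in the layers.
    history: List[HistoryEntry] = field(default_factory=list)


def show_history(image: Image) -> List[str]:
    """List the build steps using only metadata, ignoring the layers."""
    return [entry.created_by for entry in image.history]


def squash_layers(image: Image, new_layer_id: str) -> Image:
    """Collapse all layers into one while keeping the history untouched."""
    return Image(layers=[new_layer_id], history=list(image.history))


original = Image(
    layers=["layer-a", "layer-b", "layer-c"],
    history=[
        HistoryEntry("apt-get install -y build-essential"),
        HistoryEntry("make install"),
        HistoryEntry("apt-get remove -y build-essential"),
    ],
)
squashed = squash_layers(original, "layer-squashed")
```

In this model the squashed image reports the same three build steps as the original even though it has a single layer, which is the property that would let squashing become an optimization hidden from the user.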
The history should reflect whether a particular point in history could be tagged, or if it has already been collapsed. Perhaps the squash should only be done during a I still feel like having the ability to squash independently of publishing would be valuable. Also, while this workflow does have value for producing minimal output images, it would increase the number of non-overlapped images on any given registry (if you had two images built 'FROM fedora', now they cannot escrow the common parent) /cc @shykes @alexlarsson |
I'm not really aware of the details of your plans wrt the image format. However, i do think it is important that at some level we allow real sharing of data for base images. I.e. on a very dense deployment (e.g. openshift) we do reuse the same bits for base images at some level of granularity at least. |
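The sharing concern can be illustrated with a rough sketch that compares registry storage when two images keep a common base as a shared parent versus when each image is flattened into one layer that includes the base content. The layer names and sizes (in MB) are made-up assumptions, and `registry_storage` is a hypothetical helper, not a Docker or registry API.

```python
def registry_storage(images, sizes):
    """Total size of the distinct layers a registry would have to store."""
    stored = set()
    for image in images:
        stored.update(image)
    return sum(sizes[layer] for layer in stored)


# Hypothetical layer sizes in MB.
sizes = {
    "fedora": 200,        # common base image
    "app-a": 30,          # image A's own content, squashed above the base
    "app-b": 20,          # image B's own content, squashed above the base
    "app-a-flat": 230,    # image A flattened together with the base
    "app-b-flat": 220,    # image B flattened together with the base
}

# Squashing only the layers above the base keeps the parent shared.
shared = registry_storage([["fedora", "app-a"], ["fedora", "app-b"]], sizes)

# Flattening everything duplicates the base content in each image.
flattened = registry_storage([["app-a-flat"], ["app-b-flat"]], sizes)
```

Under these assumed numbers the shared layout stores 250 MB and the flattened layout stores 450 MB, because the base content is counted once per image instead of once overall.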
@alexlarsson How about also a new buildfile command for squash, something like:
FROM ubuntu
SQUASH RUN apt-get update && apt-get install -y build-essential
SQUASH RUN # compile and cleanup some stuff
SQUASH RUN apt-get remove -y build-essential && apt-get clean && apt-get autoremove -y
This way all that build-essential stuff doesn't show up in any of the layers. |
@cpuguy83 That is a weird syntax, you can't squash a single layer. In general having the ability to define squashes in the dockerfile seems like a good idea. However, i'm trying to keep this small for now to make it easier to discuss and merge. |
@alexlarsson It would get squashed into the last commit. |
@cpuguy83 Seems cleaner to squash a whole range at the end rather than having to modify each row of the Dockerfile. |
But not to squash it with the image specified in "FROM". I.e. a single Dockerfile might become a single layer on top of the base image. |
@tailhook There would be a new commit for the FROM line |
Rebased to latest master and converted docs to md |
This adds a new cli command like:
docker squash baseimage leafimage
This command creates a new image that is a child of baseimage and has the same content as leafimage. In other words, it combines all the layers between baseimage and leafimage into a single image.
There are several reasons why this is useful, for instance it is common for intermediate layers to add extra files during execution which are removed at the end (for instance build dependencies, or e.g. yum/apt-get metadata). Removing these makes for a smaller final image.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
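As a rough illustration of what combining the layers between baseimage and leafimage means, the sketch below models a filesystem as a dict of path to content and each layer as a diff, with `None` standing in for a deletion marker. This is a toy model under those assumptions; the function names are hypothetical and it does not reflect how the PR or any storage driver actually implements squashing.

```python
DELETED = None  # assumed marker for "this path was removed in this layer"


def apply_layers(base_fs, layers):
    """Apply layer diffs (oldest first) on top of a base filesystem."""
    fs = dict(base_fs)
    for layer in layers:
        for path, content in layer.items():
            if content is DELETED:
                fs.pop(path, None)
            else:
                fs[path] = content
    return fs


def squash(base_fs, layers):
    """Build one diff that takes base_fs straight to the leaf's content."""
    leaf_fs = apply_layers(base_fs, layers)
    diff = {}
    # Keep files that are new or changed relative to the base.
    for path, content in leaf_fs.items():
        if path not in base_fs or base_fs[path] != content:
            diff[path] = content
    # Record deletions only for files the base actually had.
    for path in base_fs:
        if path not in leaf_fs:
            diff[path] = DELETED
    return diff


base = {"/bin/sh": "shell"}
layers = [
    {"/usr/bin/gcc": "compiler", "/var/cache/apt/pkgcache": "metadata"},
    {"/app/server": "binary"},
    {"/usr/bin/gcc": DELETED, "/var/cache/apt/pkgcache": DELETED},
]
squashed = squash(base, layers)
```

Here the compiler and the package cache are added in the first layer and removed in the last, so the single squashed diff contains only `/app/server`, while applying it to the base yields the same filesystem as applying all three layers.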
@blacktop The &&s are kind of unnecessary if you're able to GROUP the calls into one layer. :) |
@jaredm4 true, but I like to still logically group types of RUN actions. The Dockerfile style that seems to have arisen in the absence of this feature is:
FROM ubuntu
RUN \
apt-get update && \
apt-get install -y build-essential && \
<compile and cleanup some stuff> && \
apt-get remove -y build-essential && \
apt-get clean && \
apt-get autoremove -y
Which while not as visually pleasing and 'yaml-ish' still works in my opinion. |
The downside of the current I actually like the approach with
Basically, each group could either start a layer and have all subsequent commands run in a single layer, or create layers the normal way and squash afterwards. This enables people to create logical groups of commands that need to be squashed together. |
+1 for the plain (non-yaml) GROUP syntax |
@vbatts and the other maintainers are currently working on a new image format that will support these advanced operations without destroying history of how the image was built. We can close this as it will be addressed in the new format. |
The Both of the above pull requests were closed without being merged. The two open issues that have referenced this one don't provide any potential solutions. |
I'd like to point out that an external tool that was capable of this (https://github.com/jwilder/docker-squash) seems to have been broken by the content addressable changes. It seems like doing this independent of the image format (i.e. outside the docker daemon) will always make such tools susceptible to breakage whenever the docker image format changes. Baking this into the Dockerfile syntax and/or exposing this as a docker feature should prevent that. Is there any plan to resuscitate this effort after all the content addressable image layers changes have landed in 1.10? I think I heard quite a few voices that sounded in favor of introducing this. |