Add docker squash command #4232

Closed
@alexlarsson
Contributor

This adds a new cli command like:
docker squash baseimage leafimage

This command creates a new image that is a child of baseimage
and has the same content as leafimage. In other words, it combines
all the layers between baseimage and leafimage into a single
image.

There are several reasons why this is useful, for instance it is common
for intermediate layers to add extra files during execution which are
removed at the end (for instance build dependencies, or e.g. yum/apt-get
metadata). Removing these makes for a smaller final image.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
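
To make the idea in the description concrete, here is a minimal Python sketch of what combining layers amounts to. It assumes a toy model in which each layer is a dict mapping a path to file content and `None` marks a deletion. The `squash` and `drop_noop_deletes` helpers and the file paths are hypothetical and are not how this PR or Docker's storage drivers represent layers.

```python
# Illustrative model only: a layer is a dict of path -> content,
# and a value of None means "this layer deletes the path".
DELETED = None


def squash(layers):
    """Merge layer diffs, oldest first, into a single diff."""
    merged = {}
    for layer in layers:  # order matters: later layers win
        for path, content in layer.items():
            merged[path] = content
    return merged


def drop_noop_deletes(diff, base_paths):
    """Drop deletions of files that never existed in the base image."""
    return {
        path: content
        for path, content in diff.items()
        if content is not DELETED or path in base_paths
    }


# Hypothetical layers between a base image and a leaf image.
layers = [
    {"/usr/bin/gcc": "compiler", "/var/cache/apt/pkgcache.bin": "metadata"},
    {"/app/server": "binary"},
    {"/usr/bin/gcc": DELETED, "/var/cache/apt/pkgcache.bin": DELETED},
]
single_layer = drop_noop_deletes(squash(layers), base_paths=set())
print(single_layer)
```

In this model the build dependency and the package metadata never reach the squashed layer, which is the size saving the description refers to.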

@thaJeztah
Member

Excellent! Hope this makes it into Docker

@deeky666

+1 this would save hacking around this in Dockerfile with a custom build script which installs dependencies, builds and uninstalls dependencies again.

@SvenDowideit
Collaborator

heya @alexlarsson I like it, but IANTM

can you please add the command to cli.rst

And - the big question: to be consistent with docker rm and docker build --rm and docker run --rm, would it be possible to add docker build --squash and docker commit --squash (perhaps there are others..)

@thaJeztah
Member

@SvenDowideit using squash as an option to another action (e.g. commit), would that still leave room to review the results of the squash before actually committing?

@SvenDowideit
Collaborator

@thaJeztah when it's used as an option, you're saying 'I want this all to happen NOW', so no, I don't think so - if you want to review each step, you'd do them separately. (not much different from using --rm)

@tianon
Member
tianon commented Feb 23, 2014

Whoa, back up. How is --squash similar to --rm? --rm removes worthless intermediate containers that serve effectively no useful purpose. --squash fundamentally and irreversibly changes images (creates new images, but in the case of build it'd have to be changing the images), and currently does so in a way that makes the cache no longer work if you then delete the layers you "squashed" (and would take quite a bit of finagling to make work otherwise in any useful kind of way ATM). Also, the argument to --squash on a build would be very complex in order to specify which layers should be squashed together, and then people will want to be able to squash ranges. I think squashing builds is something that needs much more thought, and should come later.

+1 for having some way to CLI squash arbitrary layers after the fact though, as this PR adds in a sane way IMO (which lays down solid primitives we can play with and improve on so that we can use them for building those other features later)
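
The cache concern raised in this comment can be illustrated with a small Python sketch. It assumes a simplified model in which the build cache is a lookup from (parent image ID, instruction) to the resulting image ID. The `cached_steps` helper, the image IDs, and the instructions are hypothetical, and the real cache logic may differ.

```python
# Simplified model (an assumption): the build cache maps
# (parent image ID, instruction) -> resulting image ID.
def cached_steps(cache, base_id, instructions):
    """Return how many leading instructions can be served from cache."""
    parent = base_id
    hits = 0
    for instruction in instructions:
        image_id = cache.get((parent, instruction))
        if image_id is None:
            break  # the first miss ends cache reuse
        parent = image_id
        hits += 1
    return hits


instructions = ["RUN apt-get update", "RUN make", "RUN make clean"]
cache = {
    ("base", "RUN apt-get update"): "layer1",
    ("layer1", "RUN make"): "layer2",
    ("layer2", "RUN make clean"): "layer3",
}
hits_before = cached_steps(cache, "base", instructions)

# After squashing base..layer3 into one image and deleting the
# intermediate layers, their cache entries are gone as well.
cache_after_squash = {}
hits_after = cached_steps(cache_after_squash, "base", instructions)
print(hits_before, hits_after)
```

Under this model a rebuild after squash-and-delete starts from the base image with no reusable steps.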

@thaJeztah
Member

@tianon I think that describes my concerns in my previous comment: adding it as an _option_ makes a build and squash happen in one go, without being able to see the consequences. Wasn't sure if that would have consequences.

@SvenDowideit
Collaborator

I don't mean similar in function, I mean it can have a similar usage - when I know the result I want is a squashed image, then I can do it all in one - just like --rm.

as to what it would do - I'd expect the result to be the same as the fully specified - with the implied baseimage being the FROM - but you're right, other people might expect scratch to be the unspecified baseimage.

perhaps I'm too Perlish ?

@unclejack
Contributor

@SvenDowideit --squash wouldn't be a good idea to have in docker build because some docker users would just add it to their infrastructure and negate many of Docker's benefits.

@alexlarsson This doesn't seem to work for me:

$ docker history docker:latest
IMAGE               CREATED             CREATED BY                                      SIZE
daeacd16bcc7        57 minutes ago      /bin/sh -c #(nop) ADD dir:d00c2c1b7641d2095c6   71.63 MB
e761930cd436        57 minutes ago      /bin/sh -c #(nop) ENTRYPOINT [hack/dind]        0 B
2296f38f9665        57 minutes ago      /bin/sh -c #(nop) WORKDIR /go/src/github.com/   0 B
4d4dc7ffff06        57 minutes ago      /bin/sh -c #(nop) VOLUME /var/lib/docker        0 B
27e9f46b0f0d        57 minutes ago      /bin/sh -c git config --global user.email 'do   48 B
800bbfd1bede        57 minutes ago      /bin/sh -c /bin/echo -e '[default]\naccess_ke   71 B
d395a93dc73c        57 minutes ago      /bin/sh -c gem install --no-rdoc --no-ri fpm    21.01 MB
af91c7121bab        58 minutes ago      /bin/sh -c go get code.google.com/p/go.tools/   13.01 MB
deb81897f742        58 minutes ago      /bin/sh -c cd /usr/local/go/src && bash -xc '   375.7 MB
8a990a46b0fa        About an hour ago   /bin/sh -c #(nop) ENV GOARM=5                   0 B
a361c598f351        About an hour ago   /bin/sh -c #(nop) ENV DOCKER_CROSSPLATFORMS=l   0 B
c157ed8f8825        About an hour ago   /bin/sh -c cd /usr/local/go/src && ./make.bas   84.28 MB
8b093fa0c8d6        About an hour ago   /bin/sh -c #(nop) ENV GOPATH=/go:/go/src/gith   0 B
cabc2a26937b        About an hour ago   /bin/sh -c #(nop) ENV PATH=/usr/local/go/bin:   0 B
6172353d3bb6        About an hour ago   /bin/sh -c curl -s https://go.googlecode.com/   35.32 MB
7eaf6851d8f3        About an hour ago   /bin/sh -c cd /usr/local/lvm2 && ./configure    5.046 MB
7b9bcfef96ed        About an hour ago   /bin/sh -c git clone --no-checkout https://gi   17.92 MB
8e6aca6bc30f        About an hour ago   /bin/sh -c cd /usr/local/lxc && ./autogen.sh    6.127 MB
a7de8e5868ee        About an hour ago   /bin/sh -c git clone --no-checkout https://gi   10.04 MB
b8b5916ea389        About an hour ago   /bin/sh -c apt-get update && DEBIAN_FRONTEND=   224.8 MB
ab8e29119ea6        About an hour ago   /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
9f676bd305a4        2 weeks ago         /bin/sh -c #(nop) ADD saucy.tar.xz in /         182.1 MB
1c7f181e78b9        2 weeks ago         /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
511136ea3c5a        8 months ago                                                        0 B
$ docker squash docker:latest ubuntu:13.10 docker-shrunk
$ docker history docker-shrunk
IMAGE               CREATED             CREATED BY                                      SIZE
120f010398af        5 seconds ago                                                       8.381 MB
9f676bd305a4        2 weeks ago         /bin/sh -c #(nop) ADD saucy.tar.xz in /         182.1 MB
1c7f181e78b9        2 weeks ago         /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
511136ea3c5a        8 months ago                                                        0 B

I'm using the btrfs driver.

@justincampbell justincampbell referenced this pull request in promptworks/docker-ruby-2.0.0 Feb 26, 2014
Closed

Convert the long RUN into many small, caching RUNs #2

@SamSaffron

@tianon having docker build take --squash or --flat would heavily improve my workflow.

In dev I would just use a standard Dockerfile, taking advantage of caching nicely. Then, when ready to deploy a clean image with no deps, I would use --squash in some cases.

This would still keep the image for FROM: and effectively squash all the intermediate images I created, leading to smaller image sizes.

I don't see this negating Docker; it's about improving it. There is only a point in distributing intermediate images if they are to be reused.

@unclejack
Contributor

@SamSaffron There's absolutely no need for --squash. It's not a feature meant to be used all the time without any effort. Squashing images down like that has the side effect that you'll have to push the entire image to your production environment again. Instead of pushing just the layers which have changed, you'll be pushing pretty much everything again. This is wasteful for a couple of reasons.
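
The push cost described here can be sketched in a few lines of Python. The layer IDs, the sizes in MB, and the `megabytes_to_push` helper are made up for illustration. The only assumption is that a push transfers exactly the layers the remote side does not already hold.

```python
def megabytes_to_push(image_layers, remote_layer_ids):
    """Sum the sizes of the layers the remote side does not already hold."""
    return sum(
        size_mb
        for layer_id, size_mb in image_layers
        if layer_id not in remote_layer_ids
    )


# Hypothetical layer IDs and sizes in MB.
remote = {"base", "deps", "app-v1", "squashed-v1"}

# Layered image: only the application layer changed in the new build.
layered_v2 = [("base", 180), ("deps", 220), ("app-v2", 70)]
layered_cost = megabytes_to_push(layered_v2, remote)

# Squashed image: everything above the base is one new layer per build.
squashed_v2 = [("base", 180), ("squashed-v2", 290)]
squashed_cost = megabytes_to_push(squashed_v2, remote)
print(layered_cost, squashed_cost)
```

With these made-up numbers the layered build pushes only the changed layer, while the squashed build pushes the whole squashed layer every time.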

@alexlarsson
Contributor

@unclejack New version actually works, I was applying the changes in reverse...

@alexlarsson
Contributor

If we add this we should probably make the Container and ContainerConfig fields of image.Image into an array. Then we could save all the intermediate operations in the squashed image.

@unclejack
Contributor

It's working properly now, but doc changes for usage and API are needed as well.

ping @vieux

@alexlarsson
Contributor

Now has API and CLI docs. I put this in version 1.10, but I'm not sure if that is right. When do we mark the version as stable and move to a new version?

@vieux
Member
vieux commented Mar 31, 2014

I guess this should go in v1.11 of the API. We should merge this one and #4821 at roughly the same time.

@alexlarsson
Contributor

Now moved to the 1.11 API version.

@shykes
Contributor
shykes commented Apr 19, 2014

@alexlarsson this overlaps with the changes in image format we started discussing with @vbatts.

I would much prefer that "squashing" be an optimization hidden from the user, either as part of build, or push or some new command. But there's no reason why docker can't figure out on its own what's the best thing to do for a given image at a given time.

The pre-requisite for any of this is to separate the image metadata from the layer topology. In other words, the Image struct should contain all the information needed by, say, docker history, independently of layers.

My suggestion would be to start with that (more modest) change: to start storing the full history of an image in each layer, and change docker history to ignore layers.
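
One way to picture this separation is the Python sketch below. The `Image` and `HistoryEntry` classes and their fields are hypothetical and do not mirror Docker's actual Image struct. The point is only that history lives in the image metadata, so changing the layer topology leaves it intact.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistoryEntry:
    created_by: str


@dataclass
class Image:
    layer_ids: List[str]
    # Full build history stored with the image, independent of layers.
    history: List[HistoryEntry] = field(default_factory=list)


def history_lines(image):
    """Newest first, reading image metadata only and ignoring layers."""
    return [entry.created_by for entry in reversed(image.history)]


def squash_layers(image, new_layer_id):
    """Replace all layers with one, leaving the recorded history untouched."""
    return Image(layer_ids=[new_layer_id], history=list(image.history))


built = Image(
    layer_ids=["l1", "l2", "l3"],
    history=[
        HistoryEntry("ADD saucy.tar.xz in /"),
        HistoryEntry("RUN apt-get update"),
        HistoryEntry("ENTRYPOINT [hack/dind]"),
    ],
)
squashed = squash_layers(built, "s1")
print(history_lines(squashed) == history_lines(built))
```

In this model a history listing for the squashed image would still show every build step, even though only one layer remains.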

@vbatts
Contributor
vbatts commented Apr 21, 2014

The history should reflect whether a particular point in history could be tagged, or if it has already been collapsed. Perhaps the squash should only be done during a docker publish <NAME> or docker prepare <NAME> (neither of these commands exists). This way it is all the same build and testing iteration, but once an image is built and published, then it is squashed, and the history can be attested to.

I still feel like having the ability to squash independently of publishing would be valuable.

Also, while this workflow does have value for producing minimal output images, it would increase the number of non-overlapping images on any given registry (if you had two images built 'FROM fedora', now they cannot escrow the common parent).

/cc @shykes @alexlarsson
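
The registry-side effect mentioned above can be estimated with a short Python sketch. It assumes the squash folds the base image into the result, which is the case this comment worries about; with `docker squash baseimage leafimage` as described in the PR, the base stays a shared parent. The image names, the sizes in MB, and the `registry_storage_mb` helper are hypothetical.

```python
def registry_storage_mb(images):
    """Total size of the distinct layers across all images."""
    unique = {}
    for layers in images:
        for layer_id, size_mb in layers:
            unique[layer_id] = size_mb  # a shared layer is stored once
    return sum(unique.values())


# Two images built FROM fedora, keeping the base as a shared layer.
layered = [
    [("fedora", 200), ("app-a", 50)],
    [("fedora", 200), ("app-b", 60)],
]

# The same two images flattened all the way down, base included.
flattened = [
    [("app-a-flat", 250)],
    [("app-b-flat", 260)],
]
layered_total = registry_storage_mb(layered)
flattened_total = registry_storage_mb(flattened)
print(layered_total, flattened_total)
```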

@alexlarsson
Contributor

I'm not really aware of the details of your plans wrt the image format. However, I do think it is important that at some level we allow real sharing of data for base images. I.e. on a very dense deployment (e.g. OpenShift) we do reuse the same bits for base images at some level of granularity at least.

@cpuguy83
Contributor

@alexlarsson How about also a new buildfile command for squash, something like:

FROM ubuntu
SQUASH RUN apt-get update && apt-get install -y build-essential
SQUASH RUN # compile and cleanup some stuff
SQUASH RUN apt-get remove -y build-essential && apt-get clean && apt-get autoremove -y

This way all that build-essential stuff doesn't show up in any of the layers.

@alexlarsson
Contributor

@cpuguy83 That is a weird syntax, you can't squash a single layer.

In general, having the ability to define squashes in the Dockerfile seems like a good idea. However, I'm trying to keep this small for now to make it easier to discuss and merge.

@cpuguy83
Contributor
cpuguy83 commented May 5, 2014

@alexlarsson It would get squashed into the last commit.

@alexlarsson
Contributor

@cpuguy83 Seems cleaner to squash a whole range at the end rather than having to modify each row of the Dockerfile.

@tailhook
tailhook commented May 5, 2014

> Seems cleaner to squash a whole range at the end rather than having to modify each row of the Dockerfile.

But not to squash it with the image specified in "FROM". I.e. a single Dockerfile might be a single layer on top of the base image.

@cpuguy83
Contributor
cpuguy83 commented May 5, 2014

@tailhook There would be a new commit for the FROM line

@alexlarsson
Contributor

Rebased to latest master and converted docs to md

@blacktop
blacktop commented Jun 8, 2014

@cpuguy83 @shykes what about this syntax for the Dockerfile?

FROM ubuntu
GROUP
  - RUN apt-get update && apt-get install -y build-essential
  - RUN # compile and cleanup some stuff
  - RUN apt-get remove -y build-essential && apt-get clean && apt-get autoremove -y
@alexlarsson alexlarsson Add docker squash command
72472fd
@vbatts vbatts added Runtime Distribution and removed Runtime labels Jul 1, 2014
@jaredm4
jaredm4 commented Aug 28, 2014

@blacktop The &&s are kind of unnecessary if you're able to GROUP the calls into one layer. :)

@blacktop

@jaredm4 true, but I like to still logically group types of RUN actions.

The Dockerfile style that seems to have arisen in the absence of this feature is:

FROM ubuntu
RUN \
  apt-get update && \
  apt-get install -y build-essential && \
  <compile and cleanup some stuff> && \
  apt-get remove -y build-essential && \
  apt-get clean && \
  apt-get autoremove -y

Which, while not as visually pleasing and 'yaml-ish', still works in my opinion.

@thaJeztah
Member

The downside of the current && \ approach is that it is quite error-prone, e.g. removing the last line without also removing the previous line's trailing && \

I actually like the approach with GROUP, but without the yaml syntax, more like:

GROUP BEGIN
RUN ....
COPY ....
RUN
GROUP END

Basically, each group could either start a layer and have all subsequent commands run in a single layer, or create layers the normal way and squash afterwards.

This enables people to create logical groups of commands that need to be squashed together.
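
As a rough sketch of how such grouping could be planned, the Python below turns a list of instructions into per-layer groups. GROUP BEGIN / GROUP END is only the syntax proposed in this thread and is not a real Dockerfile instruction. The `plan_layers` helper and the sample instructions are hypothetical.

```python
def plan_layers(lines):
    """Group Dockerfile-like instructions into per-layer lists.

    Instructions between GROUP BEGIN and GROUP END share one layer;
    every other instruction gets a layer of its own.
    """
    layers = []
    group = None  # None means "not inside a GROUP"
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "GROUP BEGIN":
            if group is not None:
                raise ValueError("nested GROUP is not supported in this sketch")
            group = []
        elif line == "GROUP END":
            if group is None:
                raise ValueError("GROUP END without GROUP BEGIN")
            if group:
                layers.append(group)
            group = None
        elif group is not None:
            group.append(line)
        else:
            layers.append([line])
    if group is not None:
        raise ValueError("unterminated GROUP")
    return layers


plan = plan_layers([
    "FROM ubuntu",
    "GROUP BEGIN",
    "RUN apt-get update && apt-get install -y build-essential",
    "COPY . /src",
    "RUN apt-get remove -y build-essential",
    "GROUP END",
    "CMD /src/app",
])
print(len(plan))
```

Either strategy mentioned above (running the grouped commands in a single layer, or building normally and squashing afterwards) could consume a plan of this shape.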

@gesellix
Contributor

+1 for the plain (non-yaml) GROUP syntax
A variant is discussed in #332, where the automated commits could be disabled. See #332 (comment)

@crosbymichael
Member

@vbatts and the other maintainers are currently working on a new image format that will support these advanced operations without destroying history of how the image was built. We can close this as it will be addressed in the new format.

@xanderdunn

The RUN limit has been the most inconvenient aspect of our Docker use. Long term Docker image evolution is completely impossible. @crosbymichael mentioned a new image format a year and 4 months ago. Is there any progress on that?

Both of the above pull requests were closed without being merged. The two open issues that have referenced this one don't provide any potential solutions.

@amoghe
amoghe commented Mar 30, 2016

I'd like to point out that an external tool that was capable of this (https://github.com/jwilder/docker-squash) seems to have been broken by the content addressable changes. It seems like doing this independent of the image format (i.e. outside the docker daemon) will always make such tools susceptible to breakage whenever the docker image format changes. Baking this into the Dockerfile syntax and/or exposing this as a docker feature should prevent that.

Is there any plan to resuscitate this effort after all the content addressable image layers changes have landed in 1.10?

I think I heard quite a few voices that sounded in favor of introducing this.
