Add docker squash command #4232

alexlarsson · 2014-02-19T11:26:26Z

This adds a new cli command like:
docker squash baseimage leafimage

This command creates a new image that is a child of baseimage
and has the same content as leafimage. In other words, it combines
all the layers between baseimage and leafimage into a single
image.

This is useful for several reasons. For instance, it is common for
intermediate layers to add extra files during the build which are
removed at the end (build dependencies, or e.g. yum/apt-get
metadata). Squashing drops these, which makes for a smaller final image.
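
As a rough illustration of the idea (a toy model, not the code in this PR), a layer can be pictured as a mapping from path to content, with a marker for paths deleted in that layer. The names `WHITEOUT`, `squash` and `drop_redundant_whiteouts` are made up for this sketch, and directory handling is ignored entirely:

```python
# Toy model (assumption): a layer is a dict of path -> content, and the
# WHITEOUT sentinel marks a path that the layer deletes.
WHITEOUT = object()


def squash(layers):
    """Merge layer diffs, given oldest first, into a single diff."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            # A later layer always wins over an earlier one for the same path.
            merged[path] = content
    return merged


def drop_redundant_whiteouts(diff, base_paths):
    """Drop deletions of paths that the base image never contained."""
    return {
        path: content
        for path, content in diff.items()
        if content is not WHITEOUT or path in base_paths
    }


base_paths = {"/etc/os-release"}
layers = [
    {"/usr/bin/gcc": "compiler"},   # build dependency added
    {"/app/server": "binary"},      # the actual build output
    {"/usr/bin/gcc": WHITEOUT},     # build dependency removed again
]
squashed = drop_redundant_whiteouts(squash(layers), base_paths)
print(sorted(squashed))  # ['/app/server']
```

In this model the compiler never appears in the squashed diff, whereas the unsquashed chain still carries it in the first layer.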

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)

thaJeztah · 2014-02-19T17:30:15Z

Excellent! Hope this makes it into Docker

deeky666 · 2014-02-21T11:39:43Z

+1 this would save hacking around this in Dockerfile with a custom build script which installs dependencies, builds and uninstalls dependencies again.

SvenDowideit · 2014-02-22T00:18:41Z

heya @alexlarsson I like it, but IANTM

can you please add the command to cli.rst

And - the big question: to be consistent with docker rm and docker build --rm and docker run --rm, would it be possible to add docker build --squash and docker commit --squash (perhaps there are others..)

thaJeztah · 2014-02-22T00:46:52Z

@SvenDowideit using squash as an option to another action (e.g commit), would that still leave room to review the results of the squash before actually committing?

SvenDowideit · 2014-02-22T09:53:14Z

@thaJeztah when it's used as an option, you're saying 'I want this all to happen NOW', so no, I don't think so - if you want to review each step, you'd do them separately. (not much different to using --rm)

tianon · 2014-02-23T07:33:59Z

Whoa, back up. How is --squash similar to --rm? --rm removes worthless intermediate containers that serve effectively no useful purpose. --squash fundamentally and irreversibly changes images (creates new images, but in the case of build it'd have to be changing the images), and currently does so in a way that makes the cache no longer work if you then delete the layers you "squashed" (and would take quite a bit of finagling to make work otherwise in any useful kind of way ATM). Also, the argument to --squash on a build would be very complex in order to specify which layers should be squashed together, and then people will want to be able to squash ranges. I think squashing builds is something that needs much more thought, and should come later.

+1 for having some way to CLI squash arbitrary layers after the fact though, as this PR adds in a sane way IMO (which lays down solid primitives we can play with and improve on so that we can use them for building those other features later)

thaJeztah · 2014-02-23T09:44:13Z

@tianon I think that describes my concerns in my previous comment: adding it as an option makes a build and squash happen in one go, without being able to see the consequences. I wasn't sure if that would cause problems.

SvenDowideit · 2014-02-24T01:34:42Z

I don't mean similar in function, I mean it can have a similar usage - when I know the result I want is a squashed image, then I can do it all in one - just like --rm.

as to what it would do - I'd expect the result to be the same as the fully specified - with the implied baseimage being the FROM - but you're right, other people might expect scratch to be the unspecified baseimage.

perhaps I'm too Perlish ?

unclejack · 2014-02-24T10:37:56Z

@SvenDowideit --squash wouldn't be a good idea to have in docker build because some docker users would just add it to their infrastructure and negate many of Docker's benefits.

@alexlarsson This doesn't seem to work for me:

$ docker history docker:latest
IMAGE               CREATED             CREATED BY                                      SIZE
daeacd16bcc7        57 minutes ago      /bin/sh -c #(nop) ADD dir:d00c2c1b7641d2095c6   71.63 MB
e761930cd436        57 minutes ago      /bin/sh -c #(nop) ENTRYPOINT [hack/dind]        0 B
2296f38f9665        57 minutes ago      /bin/sh -c #(nop) WORKDIR /go/src/github.com/   0 B
4d4dc7ffff06        57 minutes ago      /bin/sh -c #(nop) VOLUME /var/lib/docker        0 B
27e9f46b0f0d        57 minutes ago      /bin/sh -c git config --global user.email 'do   48 B
800bbfd1bede        57 minutes ago      /bin/sh -c /bin/echo -e '[default]\naccess_ke   71 B
d395a93dc73c        57 minutes ago      /bin/sh -c gem install --no-rdoc --no-ri fpm    21.01 MB
af91c7121bab        58 minutes ago      /bin/sh -c go get code.google.com/p/go.tools/   13.01 MB
deb81897f742        58 minutes ago      /bin/sh -c cd /usr/local/go/src && bash -xc '   375.7 MB
8a990a46b0fa        About an hour ago   /bin/sh -c #(nop) ENV GOARM=5                   0 B
a361c598f351        About an hour ago   /bin/sh -c #(nop) ENV DOCKER_CROSSPLATFORMS=l   0 B
c157ed8f8825        About an hour ago   /bin/sh -c cd /usr/local/go/src && ./make.bas   84.28 MB
8b093fa0c8d6        About an hour ago   /bin/sh -c #(nop) ENV GOPATH=/go:/go/src/gith   0 B
cabc2a26937b        About an hour ago   /bin/sh -c #(nop) ENV PATH=/usr/local/go/bin:   0 B
6172353d3bb6        About an hour ago   /bin/sh -c curl -s https://go.googlecode.com/   35.32 MB
7eaf6851d8f3        About an hour ago   /bin/sh -c cd /usr/local/lvm2 && ./configure    5.046 MB
7b9bcfef96ed        About an hour ago   /bin/sh -c git clone --no-checkout https://gi   17.92 MB
8e6aca6bc30f        About an hour ago   /bin/sh -c cd /usr/local/lxc && ./autogen.sh    6.127 MB
a7de8e5868ee        About an hour ago   /bin/sh -c git clone --no-checkout https://gi   10.04 MB
b8b5916ea389        About an hour ago   /bin/sh -c apt-get update && DEBIAN_FRONTEND=   224.8 MB
ab8e29119ea6        About an hour ago   /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
9f676bd305a4        2 weeks ago         /bin/sh -c #(nop) ADD saucy.tar.xz in /         182.1 MB
1c7f181e78b9        2 weeks ago         /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
511136ea3c5a        8 months ago                                                        0 B
$ docker squash docker:latest ubuntu:13.10 docker-shrunk
$ docker history docker-shrunk
IMAGE               CREATED             CREATED BY                                      SIZE
120f010398af        5 seconds ago                                                       8.381 MB
9f676bd305a4        2 weeks ago         /bin/sh -c #(nop) ADD saucy.tar.xz in /         182.1 MB
1c7f181e78b9        2 weeks ago         /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
511136ea3c5a        8 months ago                                                        0 B

I'm using the btrfs driver.

SamSaffron · 2014-02-26T22:34:40Z

@tianon having docker build take --squash or --flat would heavily improve my workflow.

In dev I would just use a standard dockerfile, taking advantage of caching nicely. then when ready to deploy a clean image with no deps I would do --squash in some cases.

This would still keep the image for FROM: and effectively squash all the intermediate images I created. Leading to smaller image sizes.

I don't see this negating docker; it's about improving it. There is only a point in distributing intermediate images if they are to be reused.

unclejack · 2014-03-03T12:59:39Z

@SamSaffron There's absolutely no need for --squash. It's not a feature meant to be used all the time without any effort. Squashing images down like that has the side effect that you'll have to push the entire image to your production environment again. Instead of pushing just the layers which have changed, you'll be pushing pretty much everything again. This is wasteful for a couple of reasons.
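
To make the push cost concrete, here is a small sketch. The layer ids and sizes are invented, and `push_size` is a hypothetical helper rather than anything in Docker; it only assumes that a push transfers the layers the registry does not already have:

```python
def push_size(image, registry):
    """Sum the sizes (in MB) of layers the registry does not already hold.

    `image` maps layer id -> size in MB; `registry` is a set of layer ids.
    """
    return sum(size for layer_id, size in image.items() if layer_id not in registry)


# Layers already pushed with the previous release (made-up ids and sizes).
registry = {"base", "deps", "app-v1"}

# New release, unsquashed: only the top layer changed.
layered_v2 = {"base": 180, "deps": 300, "app-v2": 5}

# New release, squashed onto the base: dependencies and app share one new layer.
squashed_v2 = {"base": 180, "squash-v2": 305}

print(push_size(layered_v2, registry))   # 5
print(push_size(squashed_v2, registry))  # 305
```

With these numbers the unsquashed release re-uses the cached dependency layer, while the squashed one has to upload it again as part of the new combined layer.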

alexlarsson · 2014-03-17T14:39:39Z

@unclejack New version actually works, i was applying the changes in reverse...

alexlarsson · 2014-03-17T14:45:42Z

If we add this we should probably make the Container and ContainerConfig fields of image.Image into an array. Then we could save all the intermediate operations in the squashed image.

unclejack · 2014-03-17T15:19:23Z

It's working properly now, but doc changes for usage and API are needed as well.

ping @vieux

alexlarsson · 2014-03-18T10:22:13Z

Now has API and cli docs. I put this in version 1.10, but i'm not sure if that is right? When do we mark the version as stable and move to a new version?

vieux · 2014-03-31T17:39:41Z

I guess this should go in v1.11 of the API. We should merge this one and #4821 roughly at the same time.

alexlarsson · 2014-04-18T04:04:09Z

Now moved to 1.11 api version

shykes · 2014-04-19T02:40:38Z

@alexlarsson this overlaps with the changes in image format we started discussing with @vbatts.

I would much prefer that "squashing" be an optimization hidden from the user, either as part of build, or push or some new command. But there's no reason why docker can't figure out on its own what's the best thing to do for a given image at a given time.

The pre-requisite for any of this is to separate the image metadata from the layer topology. In other words, the Image struct should contain all the information needed by, say, docker history, independently of layers.

My suggestion would be to start with that (more modest) change: to start storing the full history of an image in each layer, and change docker history to ignore layers.
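
A minimal sketch of what that separation could look like, purely as an assumption: the `Image` and `HistoryEntry` types, the `has_layer` flag and `show_history` are invented for illustration and are not Docker's actual structs.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryEntry:
    created_by: str
    # Hypothetical flag: False once the step's own layer has been collapsed.
    has_layer: bool = True


@dataclass
class Image:
    layers: list                                   # layer ids, oldest first
    history: list = field(default_factory=list)    # HistoryEntry, oldest first


def show_history(image):
    """List build steps newest first, using image metadata only."""
    return [entry.created_by for entry in reversed(image.history)]


# One squashed layer, but the full record of how it was built is kept.
img = Image(
    layers=["squashed-layer"],
    history=[
        HistoryEntry("apt-get install -y build-essential", has_layer=False),
        HistoryEntry("make", has_layer=False),
        HistoryEntry("apt-get remove -y build-essential", has_layer=False),
    ],
)
for step in show_history(img):
    print(step)
```

The point of the sketch is that `show_history` never looks at `layers`, so the number of layers can change without losing the build record.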

vbatts · 2014-04-21T14:21:07Z

The history should reflect whether a particular point in history could be tagged, or if it has already been collapsed. Perhaps the squash should only be done during a docker publish <NAME> or docker prepare <NAME> (neither of these commands exists). This way it is all the same build and testing iteration, but once an image is built and published, then it is squashed, and the history can be attested to.

I still feel like having the ability to squash independently of publishing would be valuable.

Also, while this workflow does have value for producing minimal output images, it would increase the number of non-overlapping images on any given registry (if you had two images built 'FROM fedora', they could no longer share the common parent).

/cc @shykes @alexlarsson

alexlarsson · 2014-04-22T13:30:13Z

I'm not really aware of the details of your plans wrt the image format. However, I do think it is important that at some level we allow real sharing of data for base images. I.e. on a very dense deployment (e.g. OpenShift) we do reuse the same bits for base images at some level of granularity at least.

cpuguy83 · 2014-04-30T13:32:34Z

@alexlarsson How about also a new buildfile command for squash, something like:

FROM ubuntu
SQUASH RUN apt-get update && apt-get install -y build-essential
SQUASH RUN # compile and cleanup some stuff
SQUASH RUN apt-get remove -y build-essential && apt-get clean && apt-get autoremove -y

This way all that build-essential stuff doesn't show up in any of the layers.

alexlarsson · 2014-05-05T18:54:35Z

@cpuguy83 That is a weird syntax, you can't squash a single layer.

In general having the ability to define squashes in the dockerfile seems like a good idea. However, i'm trying to keep this small for now to make it easier to discuss and merge.

cpuguy83 · 2014-05-05T18:56:35Z

@alexlarsson It would get squashed into the last commit.

alexlarsson · 2014-05-05T18:58:12Z

@cpuguy83 Seems cleaner to squash a whole range at the end rather than having to modify each row of the Dockerfile.

tailhook · 2014-05-05T19:12:20Z

> Seems cleaner to squash a whole range at the end rather than having to modify each row of the Dockerfile.

But not to squash it with the image specified in "FROM". I.e. a single Dockerfile might be a single layer on top of the base image.

cpuguy83 · 2014-05-05T19:14:20Z

@tailhook There would be a new commit for the FROM line

alexlarsson · 2014-05-09T13:28:27Z

Rebased to latest master and converted docs to md

blacktop · 2014-06-08T18:42:33Z

@cpuguy83 @shykes what about this syntax for the Dockerfile

FROM ubuntu
GROUP
  - RUN apt-get update && apt-get install -y build-essential
  - RUN # compile and cleanup some stuff
  - RUN apt-get remove -y build-essential && apt-get clean && apt-get autoremove -y


jaredm4 · 2014-08-28T22:23:33Z

@blacktop The &&s are kind of unnecessary if you're able to GROUP the calls into one layer. :)

blacktop · 2014-08-29T03:38:32Z

@jaredm4 true, but I like to still logically group types of RUN actions.

The Dockerfile style that seems to have arisen in the absence of this feature is:

FROM ubuntu
RUN \
  apt-get update && \
  apt-get install -y build-essential && \
  <compile and cleanup some stuff> && \
  apt-get remove -y build-essential && \
  apt-get clean && \
  apt-get autoremove -y

Which while not as visually pleasing and 'yaml-ish' still works in my opinion.

thaJeztah · 2014-08-29T06:27:46Z

The downside of the current && \ approach is that it is quite error prone, e.g. removing the last line without removing the previous line's && \

I actually like the approach with GROUP, but without the yaml syntax, more like;

GROUP BEGIN
RUN ....
COPY ....
RUN
GROUP END

Basically, each group could either start a layer and have all subsequent commands run in a single layer, or create layers the normal way and squash afterwards.

This enables people to create logical groups of commands that need to be squashed together.
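
A sketch of how a builder might interpret that, assuming the first reading (one commit per group). `GROUP BEGIN` / `GROUP END` are the proposed, non-existent keywords, and `plan_commits` is a hypothetical helper:

```python
def plan_commits(lines):
    """Split Dockerfile-like lines into commit units.

    Instructions between the hypothetical GROUP BEGIN / GROUP END markers
    end up in one unit; every other instruction is its own unit.
    """
    commits, group = [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "GROUP BEGIN":
            if group is not None:
                raise ValueError("nested GROUP is not supported in this sketch")
            group = []
        elif line == "GROUP END":
            if group is None:
                raise ValueError("GROUP END without GROUP BEGIN")
            if group:
                commits.append(group)
            group = None
        elif group is not None:
            group.append(line)
        else:
            commits.append([line])
    if group is not None:
        raise ValueError("GROUP BEGIN without GROUP END")
    return commits


dockerfile = [
    "FROM ubuntu",
    "GROUP BEGIN",
    "RUN apt-get update && apt-get install -y build-essential",
    "RUN make",
    "RUN apt-get remove -y build-essential",
    "GROUP END",
    "CMD ./app",
]
plan = plan_commits(dockerfile)
print(len(plan))  # 3
```

Each RUN stays on its own line, so dropping the last one cannot leave a dangling && \ behind.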

gesellix · 2014-08-29T11:36:32Z

+1 for the plain (non-yaml) GROUP syntax
A variant is discussed in #332, where the automated commits could be disabled. See #332 (comment)

crosbymichael · 2014-10-17T21:50:51Z

@vbatts and the other maintainers are currently working on a new image format that will support these advanced operations without destroying history of how the image was built. We can close this as it will be addressed in the new format.

xanderdunn · 2016-03-02T19:06:45Z

The RUN limit has been the most inconvenient aspect of our Docker use. Long term Docker image evolution is completely impossible. @crosbymichael mentioned a new image format a year and 4 months ago. Is there any progress on that?

Both of the above pull requests were closed without being merged. The two open issues that have referenced this one don't provide any potential solutions.

amoghe · 2016-03-30T06:59:10Z

I'd like to point out that an external tool that was capable of this (https://github.com/jwilder/docker-squash) seems to have been broken by the content addressable changes. It seems like doing this independent of the image format (i.e. outside the docker daemon) will always make such tools susceptible to breakage whenever the docker image format changes. Baking this into the Dockerfile syntax and/or exposing this as a docker feature should prevent that.

Is there any plan to resuscitate this effort after all the content addressable image layers changes have landed in 1.10?

I think I heard quite a few voices that sounded in favor of introducing this.

justincampbell mentioned this pull request Feb 26, 2014

Convert the long RUN into many small, caching RUNs tweag/docker-ruby-2.0.0#2

Closed

alexlarsson added the /project/doc label Mar 18, 2014

unclejack mentioned this pull request Apr 21, 2014

The --base-image for docker import #5174

Closed

cpuguy83 mentioned this pull request Jun 8, 2014

Dockerfile keyword RM similar to build --rm=true #6274

Closed

vbatts added Runtime and removed Runtime labels Jul 1, 2014

cpuguy83 mentioned this pull request Jul 2, 2014

Support for environment variables for building containers #6822

Closed

timthelion mentioned this pull request Jul 19, 2014

Dockerfiles should have a way to perform multiple build actions in one commit #2439

Closed

phemmer mentioned this pull request Jul 23, 2014

Squash build dependencies #6906

Closed

ncdc mentioned this pull request Oct 16, 2014

Attempt to implement #332 - flattening layers #8600

Closed

crosbymichael closed this Oct 17, 2014

This was referenced May 8, 2015

Proposal: remove support for multiple FROMs in Dockerfile #13026

Closed

Add MARK and SQUASH builder instructions #12198

Closed

thaJeztah mentioned this pull request May 26, 2015

Secrets: write-up best practices, do's and don'ts, roadmap #13490

Open

vbatts mentioned this pull request Jun 16, 2015

docker squash: Consolidate image layers #13929

Closed

thaJeztah mentioned this pull request May 10, 2016

Adds ability to flatten image after build #22641

Merged

BBBernsteyn mentioned this pull request Oct 13, 2016

Have an internal representation of the relations between different images SUSE/Portus#795

Open
