Skip to content

Latest commit

 

History

History
62 lines (41 loc) · 5.96 KB

3.1-Optimising-images.md

File metadata and controls

62 lines (41 loc) · 5.96 KB

Optimising Docker Images

Docker images can get really big. Many are over 1G in size. How do they get so big? Do they really need to be this big? Can we make them smaller without sacrificing functionality? Over the past few weeks we have spend a considerable amount of time a lot of time building different docker images. As we began experimenting with image creation one of the things we discovered was that our custom images were ballooning in size pretty quickly (it wasn't uncommon to end up with images that weighed-in at 200MB or more). Now, it's not too big a deal to have a couple gigs worth of images sitting on your local system, but it becomes a bit of pain as soon as you start pushing/pulling these images across the network on a regular basis. I think it was worthwhile to really dig into the docker image creation process so that we could understand how it works and whether there was anything we could do differently to minimize the size of our images.

Image Layers

Before we can talk about how to trim down the size of your images, we need to discuss layers. The concept of image layers involves all sorts of low-level technical details about things like root filesystems, copy-on-write and union mounts -- luckily those topics have been covered pretty well here so no need rehash those details here. For our purposes, the important thing to understand is that each instruction in your Dockerfile results in a new image layer being created. Let's look at an example Dockerfile to see this in action:

RUN mkdir /tmp/foo
RUN fallocate -l 1G /tmp/foo/bar

This is a pretty useless image, but will help illustrate the point about image layers. We're using the alpine:linux image as our base, creating the /tmp/foo directory and then allocating a 1 GB file named bar in that directory. Let's build this image: $ docker build -t sample, sending build context to Docker daemon 2.56 kB:

Step 0 : FROM alpine:linux ---> e8d37d9e3476

Step 1 : RUN mkdir /tmp/foo ---> Running in 3d5d8b288cc2 ---> 9876aa270471 Removing intermediate container 3d5d8b288cc2

Step 2 : RUN fallocate -l 1G /tmp/foo/bar ---> Running in 6c797329ee43 ---> 3ebe08b36733 Removing intermediate container 6c797329ee43 Successfully built 3ebe08b36733

If you read the output of the docker build command you can see exactly what Docker is doing to construct our sample image:

    Using the value specified in our FROM instruction, Docker will docker run a new container from the alpine:wheezy image (the container's ID is 3d5d8b288cc2).
    Within that running container Docker executes the mkdir /tmp/foo instruction.
    The container is stopped, committed (resulting in a new image with ID 9876aa270471) and then removed.
    Docker spins-up another container, this time from the image that was saved in the previous step (this container's ID is 6c797329ee43).
    Within that running container Docker executes the fallocate -l 1G /tmp/foo/bar instruction.
    The container is stopped, committed (resulting in a new image with ID 3ebe08b36733) and then removed.

We can see the end result by looking at the output of the docker images --tree command (unfortunately, the --tree flag is deprecated and will likely be removed in a future release):

$ docker history image_name

In the output you can see the image that's tagged as alpine:linux followed by the two layers we described above (one for each instruction in our Dockerfile). We often talk about "images" and "layers" as if they are different things -- in reality each layer is itself an image. An image's layers are nothing more than a collection of other images. In the same way that we could say: docker run -it sample:latest /bin/bash, we could just as easily execute one of the untagged layers: docker run -it 9876aa270471 /bin/bash. Both are images that can be turned into running containers -- the only difference is that the former has a tag associated with it ("sample:latest") while the later does not. In fact, the ability to create a container from any image layer can be really helpful when trying to debug problems with your Dockerfile.

Image Sizes

Knowing that an image is really a collection of other images, it should come as no surprise that the size of an image is the sum of the sizes of its constituent images. Let's look at the output of the docker history command:

$ docker history sample IMAGE CREATED CREATED BY SIZE 3ebe08b36733 3 minutes ago /bin/sh -c fallocate -l 1G /tmp/foo/bar 1.074 GB 9876aa270471 3 minutes ago /bin/sh -c mkdir /tmp/foo 0 B e8d37d9e3476 4 days ago /bin/sh -c #(nop) CMD [/bin/bash] 0 B 59e359cb35ef 4 days ago /bin/sh -c #(nop) ADD file:1e2ba3d9379f 85.18 MB 511136ea3c5a 13 months ago 0 B

This shows all of the sample image's layers along with the instruction that was used to generate that layer and the resulting size (note that the ordering of layers in the docker history output is reversed from that of the docker images --tree output). There are only two instructions that contribute anything of substance to our image: the ADD instruction (which comes from the alpine:wheezy image) and our fallocate command. Together, these two layers total 1.16 GB which is roughly the size of our image. Let's actually save our image to a tar file and see what the resulting size is:

$ docker save sample > sample.tar $ ls -lh sample.tar -rw-r--r-- 1 core core 1.1G Jul 26 02:35 sample.tar

When an image is saved to a tar file in this way it also includes a bunch of metadata about each of the layers so the total size will be slightly bigger than the sum of the various layers. Let's add one more instruction to our Dockerfile:

Flatten your Image docker-squash tool

We will use the description here and this blog post

Next steps

Managing the size of Docker images is a challenge. It's easy to make them smaller without sacrificing functionality.