Enable Dockerfiles to build, and tag, multiple images #5726

Closed
tsutsu opened this Issue May 11, 2014 · 2 comments

tsutsu commented May 11, 2014

Problem

Dockerfiles are almost ideal for building code from source into binaries in a deterministic containerized environment. (In this sense, they serve as a more flexible version of a buildpack.)

However, the result of this build is currently always a docker image which descends linearly from the source image, and which thus contains the toolchain used to compile it. In a few cases, this is acceptable--when there is no difference between the toolchain required to operate on the source and the runtime required to execute the binaries, the toolchain image can "specialize" into the runtime image without any heavy removals.

But this case is rare. More often, your runtime will be a tiny VM or library, but the toolchain will depend on the ability to compile dependencies written in other languages (e.g. C), which themselves require -devel versions of libraries, and so forth. None of this is necessary at runtime, but for compilation to succeed it must be part of the toolchain, and thus the toolchain must be a multi-gigabyte image that would be ridiculous to pull down to your production cloud instances and the like.

People currently sidestep this problem in a number of ways:

  1. They may write a Dockerfile that starts with the toolchain image, compiles the code, and then does extensive RUNs of removal commands to reduce the toolchain image into a runtime image. (These are the people demanding a docker squash command--to allow the runtime image to actually end up smaller than the toolchain image, even though it descends from the toolchain image.)
  2. They may write a Makefile which first builds in one container, then copies files out of the resulting image and into a second Dockerfile-containing subdirectory, which then builds a final image. (This throws away the whole determinism aspect of Dockerfile builds, relying on the behavior of other tools on the host.)
  3. They may, within their Dockerfile, generate a build artifact (e.g. a .deb), and push it out to an S3 bucket. The resulting docker image is then immediately discarded, having served its purpose. (This is also nondeterministic, relying on the vagaries of the Internet.)

None of these workarounds obey the spirit of Dockerfile builds: deterministically turning a source image, plus a context, into a destination image.

Proposed solution

I propose an alternative, which would look something like the following:

FROM toolchain
BINDCONTEXT /app
WORKDIR /app
RUN make # populates /app/bin

FROM runtime
ADD bin /usr/local/bin

This presumes one additional Dockerfile stanza, and one change-in-behavior of a current Dockerfile stanza:

  • The additional stanza, BINDCONTEXT, would be as discussed in #3056, but specifically giving us read-write access to the ephemeral, uploaded context. The point of this is not optimization, but rather to give intermediate layers a "scratch volume" to work with, whose contents won't end up in the container, but which can be acted on by, and referred to from, other commands.
  • The change in behavior of the FROM stanza, permitting it to appear more than once in a Dockerfile, would be as follows: when a FROM statement is encountered, the layer pointer (the layer on which the next-created layer will be parented) is reset from the last-created layer to the newly specified image. This is a generalization of the previous behavior of FROM; all current FROM stanzas could be considered to be resetting the layer pointer from a null layer. Importantly, FROM unmounts any previously specified BINDCONTEXT mount, but the contents of the context persist from their previous state, and will be in that state if they are mounted again.

Together, these two alterations allow you to have a Dockerfile which creates multiple images, keeping state from the creation of one to the next. If you ran this Dockerfile using docker build -t foo ., it would be the final image--the terminal position of the layer pointer--that would end up being tagged as "foo." The other one would remain a stack of untagged layers, which could be reused in builds, or flushed away at need.
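
To make the layer-pointer rule concrete, here is a toy Python model of the semantics described above. It is only a sketch under assumptions: the function name simulate and its data layout are made up for illustration, BINDCONTEXT is a proposed stanza that Docker does not implement, and the parser handles nothing beyond the stanzas used in this proposal. Nothing in it talks to Docker.

```python
def simulate(dockerfile):
    """Return one layer stack per FROM; the last stack is what -t would tag."""
    stacks = []     # each entry: {"base": image, "layers": [(stanza, mount), ...]}
    mounted = None  # current BINDCONTEXT mount point, or None if unmounted
    for raw in dockerfile.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive: drop trailing comments
        if not line:
            continue
        op, _, arg = line.partition(" ")
        if op == "FROM":
            # Reset the layer pointer to the named image and unmount the
            # context. The context's contents are assumed to persist.
            stacks.append({"base": arg, "layers": []})
            mounted = None
        elif op == "BINDCONTEXT":
            mounted = arg  # scratch mount only; it creates no layer
        else:
            # Any other stanza parents a new layer on the current pointer.
            stacks[-1]["layers"].append((line, mounted))
    return stacks


EXAMPLE = """
FROM toolchain
BINDCONTEXT /app
WORKDIR /app
RUN make # populates /app/bin

FROM runtime
ADD bin /usr/local/bin
"""

stacks = simulate(EXAMPLE)
final = stacks[-1]  # the stack that `docker build -t foo .` would tag
print(final["base"], len(final["layers"]))
```

In this model the toolchain stack ends up with two layers created while the context was mounted at /app, and the runtime stack has a single ADD layer with no mount, which is the image that would be tagged "foo."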

Going further

Besides the potential workflow presented in the Dockerfile above, a few more possibilities open up by allowing for a third stanza:

  • TAG: like in docker tag, this gives the layer resulting from the previous stanza a name that can be used to refer to it. Unlike in docker tag, the name will only persist for the duration of the build. All such "local" tags look like nearestglobalparenttag+localname, or just +localname when the nearest global parent tag is unambiguous.

Then you could do something like this:

FROM toolchain
BINDCONTEXT /app
WORKDIR /app
RUN make

FROM runtime
RUN apt-get update && apt-get install -y my-deps
TAG +with-deps

FROM runtime+with-deps
BINDCONTEXT /app
RUN dpkg -i /app/build/foo.deb
TAG +foo

FROM runtime+with-deps
BINDCONTEXT /app
RUN dpkg -i /app/build/bar.deb
TAG +bar

FROM runtime+with-deps
BINDCONTEXT /app
RUN dpkg -i /app/build/baz.deb
TAG +baz

This stanza would require a slight change in the behavior of docker build -t, adding a docker build -t global:local switch to create global tags from local tags. For example, given the above Dockerfile:

docker build -t "mycorp/foo:runtime+foo" -t "mycorp/bazbaz:runtime+baz" .

For backwards compatibility, -t global would be short for -t global:@END, where @END would explicitly refer to the last layer created in the Dockerfile.
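
The lookup rules for -t could be sketched roughly as follows. This is a hypothetical Python model, not anything Docker implements: resolve_tags, the layer IDs, and the choice to split on the first colon are all assumptions made for illustration (a real implementation would have to cope with registry ports and ordinary repo:tag names).

```python
def resolve_tags(local_tags, end_layer, t_args):
    """Map each global name given via -t to a layer ID under the proposed rules."""
    result = {}
    for arg in t_args:
        name, sep, local = arg.partition(":")
        if not sep:
            local = "@END"  # -t global is short for -t global:@END
        if local == "@END":
            result[name] = end_layer
        elif local.startswith("+"):
            # A bare +localname is allowed only when exactly one parent matches.
            matches = [tag for tag in local_tags if tag.endswith(local)]
            if len(matches) != 1:
                raise ValueError(f"ambiguous or unknown local tag {local!r}")
            result[name] = local_tags[matches[0]]
        else:
            result[name] = local_tags[local]
    return result


# Hypothetical layer IDs for the local tags created by the Dockerfile above.
local_tags = {
    "runtime+with-deps": "layer-a",
    "runtime+foo": "layer-b",
    "runtime+bar": "layer-c",
    "runtime+baz": "layer-d",
}

tagged = resolve_tags(
    local_tags,
    "layer-d",  # the last layer created, i.e. @END
    ["mycorp/foo:runtime+foo", "mycorp/bazbaz:runtime+baz"],
)
print(tagged)
```

With these inputs, mycorp/foo would point at the layer tagged +foo and mycorp/bazbaz at the layer tagged +baz, while a plain -t mycorp/app would fall back to @END.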

Postscript

Given multiple FROM, BINDCONTEXT, and TAG stanzas, an alternate syntax for Dockerfiles might be considered:

...
+with-deps = FROM runtime WITH context: /app {
  RUN apt-get update
  RUN apt-get install -y my-deps
}

+foo = FROM +with-deps WITH context: /app {
  RUN dpkg -i /app/build/foo.deb
}
...
shin- added the Distribution label Jul 1, 2014
@seanchann

Regarding the multiple-FROM behavior described in the Proposed solution section: does this work now?

jessfraz added dist and removed dist Distribution labels Jul 10, 2015
@jessfraz
Contributor

Hello!
We are no longer accepting patches to the Dockerfile syntax as you can read about here: https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax

Mainly:

Allowing the Builder to be implemented as a separate utility consuming the Engine's API will open the door for many possibilities, such as offering alternate syntaxes or DSL for existing languages without cluttering the Engine's codebase

Then from there, patches/features like this can be re-thought. Hope you can understand.

jessfraz closed this Jul 10, 2015