Proposal: Nested builds #7115

Open
shykes opened this Issue Jul 19, 2014 · 54 comments

@shykes
Contributor
shykes commented Jul 19, 2014

Some images require not just one base image, but the contents of multiple base images to be combined as part of the build process. A common example is an image with an elaborate build environment (base image #1), but a minimal runtime environment (base image #2), on top of which is added the binary output of the build (typically a very small set of binaries and libraries, or even a single static binary). See, for example, "create lightweight containers with buildroot" and "create the smallest possible container".

1. New Dockerfile keywords: IN and PUBLISH

IN defines a scope in which a subset of a Dockerfile can be executed. The scope is like a new build, nested within the primary build, and anchored in a directory of the primary build.

PUBLISH changes the path of the filesystem tree to use as the root of the image at the end of the build. The default value is / (i.e. "publish the entire filesystem tree"). If it is set to, e.g., /foo/bar, then the contents of /foo/bar are published as the root filesystem of the image. All filesystem contents outside of that directory are discarded at the end of the build.

For example:

FROM ubuntu

RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make

IN /var/build {
    FROM busybox
    EXPOSE 80
    ENTRYPOINT /usr/local/bin/app
}

RUN cp /src/build/app /var/build/usr/local/bin/app

PUBLISH /var/build

Behavior of RUN

When executing a RUN command in an inner build, the runtime uses the inner build directory as the sandbox to execute the command. So, for example, IN /foo { RUN touch /hello.txt } will create /foo/hello.txt.

Behavior of ADD

When executing ADD in an inner build, the original source context does not change. In other words, ADD . /dest will always result in the same content being copied, regardless of where in the Dockerfile it is invoked. Note: the destination of the ADD will change in a nested build, since the destination path is scoped to the current inner build.
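
A minimal sketch of that scoping, reusing the proposed IN syntax (the paths here are illustrative only):

FROM ubuntu
ADD . /src          # context copied to /src of the outer build

IN /var/build {
    FROM busybox
    ADD . /src      # same source context; lands at /var/build/src on the outer filesystem
}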

The outer build can access the inner build

Note that filesystem changes caused by the inner build are visible from the outer build. For example, /var/build/usr/local/bin was created by the inner FROM busybox and is therefore accessible to the final RUN command in the outer build.

Behavior of PUBLISH

Also note that PUBLISH /var/build causes the result of the inner build (the busybox image) to be published. Everything else (including the outer Ubuntu-based build environment) is discarded and not included in the image.

@shykes shykes added the Distribution label Jul 19, 2014
@SvenDowideit
Collaborator

I asked if we could invert the syntax and achieve the same function - and after lots of IRC discussion I think the answer is not really.

This Proposal has some interesting possible effects that we should list:

  • you can use IN and PUBLISH entirely independently.
  • there may be a third parameter to PUBLISH to give it a subname (perhaps registry/image/subname:tag when you docker build -t registry/image:tag)
  • you could PUBLISH more than once
  • you could overlay more than one IN / {FROM app} to do image mixins - and PUBLISH any dir you like, including leaving it as default

some of these may be bad, some may just need more info in the proposal :)

@timthelion
Contributor

Hm, @shykes's version makes more technical sense, whereas @SvenDowideit's version seems more logical. I'm +1 for @SvenDowideit's version.

@erikh erikh added the Proposal label Jul 21, 2014
@srlochen

+1 Having the ability to inject build/test dependencies and discard them at publishing time would simplify a lot for our docker build/release pipelines.

@erikh erikh removed the Proposal label Jul 21, 2014
@vmarmol
Contributor
vmarmol commented Jul 21, 2014

It would also potentially make the final images much smaller :)

@wyaeld
wyaeld commented Jul 21, 2014

Can someone elaborate on where/how layer caching would work in either use case? Given the stated goal of minimizing overall size, is the inner build cached completely as a separate image, with only the result added to the parent layer?

The build process is typically the most time consuming, and benefits the most from caching.

@proppy
Contributor
proppy commented Jul 22, 2014

I'm not sure if the context needs to be implicitly added/bound in the inner image fs (this could maybe be introduced later and separately from this proposal).

I deleted my earlier syntax change suggestion and created a separate proposal to discuss a more explicit way to bind the context, as per IRC discussion, see #7149.

@shykes
Contributor
shykes commented Jul 22, 2014

Guys I ask that you focus on criticizing the proposal instead of pushing completely different proposals in the comments. By all means create a separate issue if you have a proposal of your own!

Thanks.

@proppy
Contributor
proppy commented Jul 22, 2014

@shykes, agreed, switching to constructive criticism mode.

IN defines a scope in which a subset of a Dockerfile can be executed

Please specify which subset (are ADD and COPY available?).
Also, specify what the context of an inner build (inside IN {}) is.

@shykes
Contributor
shykes commented Jul 22, 2014

@proppy

Please specify which subset (are ADD and COPY available?)

I didn't mean a subset of available instructions (all instructions should be available), but a subset of the Dockerfile content - in other words, whatever is enclosed in the curly braces. Happy to change the wording to something clearer.

Also specify what is the context of an inner build (inside IN{}).

The source context would be the same in all images. In other words, ADD . /dest will always result in the same content being copied, regardless of where in the Dockerfile it is invoked. Note: the destination of the ADD will change in a nested build, since the destination path is scoped to the current inner build.

@proppy
Contributor
proppy commented Jul 22, 2014

@shykes, thanks. I suggest adding this to your original proposal description, as those were the first questions I had while reading it.

@proppy
Contributor
proppy commented Jul 22, 2014

It is anchored in a directory of the primary build

What happens if a file exists in both the anchored directory and the filesystem of the base image used in the FROM of the inner build? Does an anchored directory have to be empty, with IN failing otherwise? Are multiple INs with the same anchored directory forbidden?

@fiadliel

I have a possible use case for nested builds which doesn't seem to be covered (yet) by this proposal.

In some cases, the information written into a Dockerfile duplicates information from an existing build system and could have been auto-generated instead.

It would be nice if the nested build would (optionally) look for a Dockerfile at the root of its filesystem at that point in the build process. This means that previous steps could generate the Dockerfile and build context used to create the image.

More concretely, http://www.scala-sbt.org/sbt-native-packager/DetailedTopics/docker.html#tasks shows an example where a build system can create a Dockerfile and context, ready to use with Docker.

One example implementation here could be to look for a second Dockerfile if IN /var/build included no commands to execute.
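
For illustration, a sketch of what that could look like under this proposal (the base image, the sbt paths, and the empty-block fallback are all assumptions, not part of the proposal):

FROM java
ADD . /src
RUN cd /src && sbt docker:stage   # generates a Dockerfile plus its context under /src/target/docker

# hypothetical fallback: an empty IN block picks up the Dockerfile generated at the anchor
IN /src/target/docker { }

PUBLISH /src/target/docker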

@vbatts
Contributor
vbatts commented Jul 22, 2014

@shykes after looking over this proposal, it satisfies the use-case that #4933 was targeting.

Also, to further this functionality, the path argument to IN ought to expand ENV variables declared in the parent Dockerfile. This way something like $DESTDIR would be a natural flow from build image to runtime image.
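
A sketch of how that expansion could read (the $DESTDIR value and the make targets are illustrative):

FROM ubuntu
ENV DESTDIR /var/build
ADD . /src
RUN cd /src && make

IN $DESTDIR {
    FROM busybox
    ENTRYPOINT /usr/local/bin/app
}

# install the build output into the inner root via the same variable
RUN cd /src && make install DESTDIR=$DESTDIR

PUBLISH $DESTDIR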

Another topic: how will this relationship be tracked in the stored image metadata? Will the IN image track the outer image or its FROM as the parent? Or will there need to be an additional field for that? Or perhaps a no-op record layer that indicates where the image came from, or which image copied bits into it?

@shykes
Contributor
shykes commented Jul 22, 2014

@proppy updated

@SvenDowideit
Collaborator

@shykes on IRC, you mentioned the possibility of having more than one PUBLISH instruction in a single Dockerfile. Until the subname functionality lands, can you please define what happens when there are multiple PUBLISH instructions, possibly with different paths, and possibly in different places in the Dockerfile?

Similarly, can you define what happens with multiple INs?

Oh - and nesting: can I have an IN inside an IN, and how deep? And can I have a PUBLISH inside an IN - what does that do?

I'm curious how IN will work - will the outer build create a context, upload that to the daemon to build fresh, then download it and insert the result? Or will it happen in the same build, thus possibly having access to the original context?

Can we define what happens when the IN /dir is not empty? ((1) error, (2) contents discarded before we enter, (3) the new image starts from there and magically mixes its FROM fs in.)

I'm thinking I could use this as a build pipeline for boot2docker, with the final PUBLISHed image containing the docker and boot2docker binaries and the installers - each of which is built IN a separate inner section, and all the working files discarded (or better, each is PUBLISHed separately). Is that a useful use case?

@ibuildthecloud
Contributor

I very much like (and need) this functionality. My main comment is that when I first read the Dockerfile, I didn't understand what was going on. It took me a bit to get it. So, a couple of comments:

  1. I think IN is a bit too abstract of a keyword. What about BUILDIN, to indicate you are doing a build in that directory?

  2. If we go with this feature, I think people will immediately want to externalize the Dockerfile of the inner build. So a syntax like BUILDIN /var/build Dockerfile, where Dockerfile is interpreted the same as the SRC in an ADD command.

  3. PUBLISH directory seems a bit problematic. It seems you should only be able to publish a directory that was first specified by IN. You wouldn't want to allow PUBLISHing any random folder, because then the resulting image would have to be a full cp/tar of the directory. We would lose the image layering (unless there's a clever approach I don't know about). I wonder if we can invent a syntax in which the IN context is named, like IN /var/build BINARIES { ... }, and then PUBLISH BINARIES. The name should be optional, because people may not always want to publish the inner context.

A final general comment: how are we going to layer the inner context? It seems that with each ADD or RUN command in the outer Dockerfile context you could be modifying the contents of /var/build. So (assuming we're bind mounting /var/build) you would need to create a new layer for the parent context and then for all inner contexts, for every Dockerfile directive. It seems the implementation of this could be messy.

It would be cleaner to implement if we explicitly knew, for each Dockerfile instruction, whether it was going to modify one of the contexts. For example, the below syntax would be easier to implement IMO, but it is uglier.

FROM ubuntu

RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make

BUILD BINARIES {
    FROM busybox
    EXPOSE 80
    ENTRYPOINT /usr/local/bin/app
}

WITH ["BINARIES:/var/build"] RUN cp /src/build/app /var/build/usr/local/bin/app

PUBLISH BINARIES

@icecrime
Member

I can see how the proposal elegantly solves the issue of complex build workflows, but don't you fear it'll be misused as a means to "sum" images? For example in:

FROM busybox
IN /redis/ { FROM redis }
IN /python/ { FROM python }

Perhaps IN and PUBLISH should be merged into a single keyword which does both (run a nested build and publish its result as the output of the outer build), which would in effect restrict the feature to a way of defining "build steps" rather than a way of combining images.
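
For instance, the original example might then collapse into a single block (the keyword name here is purely illustrative):

FROM ubuntu
RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make

# hypothetical merged keyword: anchors the nested build AND marks it as the build's output
BUILDSTEP /var/build {
    FROM busybox
    EXPOSE 80
    ENTRYPOINT /usr/local/bin/app
}

RUN cp /src/build/app /var/build/usr/local/bin/app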

@tianon
Member
tianon commented Jul 25, 2014

Honestly, I see that use as a cool bonus feature, especially since the two images are placed neatly in separate directories. The image size will likely balloon in that case, but I don't think that's really avoidable with this feature unless it's implemented very, very cleverly (which is obviously possible :P).

@ibuildthecloud
Contributor

@tianon I don't think this needs to be implemented by actually copying the contents of the inner build to the outer layer. Instead, set up two rootfs directories for the outer and inner contexts and mount the inner one into /var/build. This means that if you don't publish the inner context, the resulting image will have none of its contents, because it was only bind mounted.

This approach also means this feature would not be able to "sum" up a bunch of images (which is not something we want to allow).

@SvenDowideit
Collaborator

just to note - I would like to be able to sum up a bunch of images.

Doing so makes Docker interesting from a 'replacement for packages' perspective.

It's basically a way to turn off (or make a shared space in) the FS namespace.

so @ibuildthecloud @icecrime could you perhaps expand on your opinion - as it doesn't sound like we all have the same fear of doing it :)

@ibuildthecloud
Contributor

@SvenDowideit I can't say I'm totally opposed to it in general, but it is a separate topic. This proposal is to address the very real issue of separating your build and runtime environments in an elegant way. Anytime a new feature is proposed, you must consider how it might be used in some unexpected way, and what the impact of that would be.

Allowing one to sum up a bunch of images will fundamentally change the nature of images. As you indicated, you move from an image essentially being a "full OS image" to an image being a "package." If we were to go in this direction we will need to invent new concepts and technology to describe, manage, and create images. At this point in time I don't think it would be helpful to bifurcate the nascent image ecosystem. Instead we should focus on the specific issue at hand and not focus on changing the nature of images.

@icecrime
Member

@SvenDowideit Don't give my opinion too much credit, I'm a beginner with Docker ;-) TBH I'm not sure I understand how the 'replacement for packages' perspective relates to images combination.

I just have the impression that "how can I get both X and Y in my Docker image" is a recurring beginner question (that I've been asking myself): there's no easy way to do this today, which is probably a good thing as it encourages the "one process for one container" approach.

To sum up: using IN without PUBLISH, as in my previous comment, seems to me like providing an accessible way to do a discouraged thing (technically, by resulting in a bloated image, and functionally, by facilitating multiple-responsibility containers). Thus my question: should we be able to use them independently?

@proppy
Contributor
proppy commented Jul 28, 2014

What makes me uncomfortable with the proposal in its current form is the tight coupling between the inner instructions and the outer ones.

In the example of the description, the outer RUN cp has to know about .../usr/local/bin to match the inner ENTRYPOINT /usr/local/bin/....

And the inner instructions don't need to ADD the binary, unlike a regular Dockerfile used with a binary context.

This creates a model where the set of inner Dockerfile instructions and the outer ones are unlikely to be composable across images, even more so if this is later combined with something like INCLUDE: /me imagines Dockerfiles only suitable for usage in an IN block.

With the existing build model this is nicely abstracted by the context notion, and some Docker users already compose builds today by chaining multiple docker build invocations with external scripts, where the output of the previous build is passed as the context of the next one.

Maybe the description could expand a little more on the methods used today, and which tradeoffs (if any) the proposal has to make to simplify and improve them.

@aigarius

This has quite a heavy syntax in the file. I would prefer the combination of #7277 with #6906 (comment) to solve such issues.

It does miss a few nice things from this issue, namely the ability to build in one environment (such as full Debian) and then run in something completely different (like busybox), or to combine multiple outputs, but the syntax is much simpler. And #7277 could actually be merged with this ticket in a way that allows the use of separate images and separate Dockerfiles to define the sub-images.

So the example here could be reformulated as:
INCLUDE Dockerfile.runtime IN /var/build
or even possible as:
INCLUDE Dockerfile.runtime IN /var/build AS runtime
thus removing the need for the PUBLISH directive altogether.
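
Applied to the original example, that might read as follows (assuming Dockerfile.runtime carries the busybox/EXPOSE/ENTRYPOINT lines; the combined syntax is hypothetical):

FROM ubuntu
RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make

# pull in the runtime sub-image definition, anchored at /var/build
INCLUDE Dockerfile.runtime IN /var/build AS runtime

RUN cp /src/build/app /var/build/usr/local/bin/app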

@rhatdan
Contributor
rhatdan commented Sep 10, 2014

Has anyone ever attempted implementing one of these?

@chancez
Contributor
chancez commented Sep 10, 2014

This would be an amazing thing to have. +1

@tonistiigi
Contributor

To me, the syntax/behavior proposed by @proppy in #7149 makes much more sense.

I (like many others here) have trouble understanding how the layering/caching would work according to this proposal. I assume the inner build gets its own layers, because otherwise the downloaded image size would still be huge. Are the contents of the inner layers then also copied to the outer layers? Or is it possible for the same layers to be used by multiple images at different mount points?

Even if only a subdirectory is published, the outer layers still have to be kept around for the caching to work. I don't see a requirement that the published directory has to be used in an IN block beforehand, but then how does the builder know to ignore the contents outside of this directory in the parent layers preceding the PUBLISH step?

I think that the ability to combine different Docker images into one, as suggested by @SvenDowideit, is not related to the original smaller-image build problem and would be better solved with an update to the ADD/COPY commands.

@ndeloof
Contributor
ndeloof commented Oct 7, 2014

Like @proppy, my main concern here is that the nested Docker build can access the outer build without any explicit ADD/COPY. I'm considering this feature as a nice replacement for my current workflow of piping docker build/run, generating a build context in the first docker build to be executed by a second docker build:

# compile, then run the image to emit the binary + production Dockerfile as tar.gz,
# and feed that stream to a second build ("builder" is an illustrative tag)
docker build -t builder . \
  && docker run builder \
  | docker build -

@mindscratch

This could be on par with the composability that Rocket is shooting for.

@TomasTomecek
Contributor

What's the status of this?

@s3ththompson

Are you still driving this @shykes?

@ibuildthecloud
Contributor

The next time I have some free hacking time I'm going to implement this. I'm really sick of my really fat docker images and trying to do hacks to make small ones.

@jessfraz
Contributor

Talk to @tibor because he's hacking on this too :)


@ibuildthecloud
Contributor

@tibor any branch I can look at?

@ibuildthecloud
Contributor

@jfrazelle wait, is that @tibor or @tiborvass?

@jessfraz
Contributor

@tiborvass my bad


@tiborvass
Contributor

@ibuildthecloud there's this branch (not mine) which is pretty cool; it should solve your immediate problems: #8021

What I'm currently doing is thinking about all the design implications of various proposals and trying to sketch something out that has an answer to many of the different use-cases. I will share a proposal in the coming weeks after 1.6 :)

Feel free to talk to me about it on IRC (preferably next week).

@sirupsen
Contributor

@tiborvass really interested in this work as well. Looking forward to the proposal.

@jessfraz jessfraz added dist and removed Distribution labels Jul 10, 2015
@muayyad-alsadi
Contributor

I find IN ... PUBLISH ... redundant compared to my proposal #15271,
which looks like this: SWITCH_ROOT <other_image> <new_root> ...,
where other_image can be scratch or busybox,
and new_root is the directory to be copied from the previous stage before the reset (typically the target/destination of the preceding build steps).

Since #13171 is merged (we can cp between containers), the process would be: cp <new_root> from the build container to some local/host tmp, then dump everything and start over with other_image (scratch or busybox in the example) as if it were a new Dockerfile with a new FROM; the only thing inherited is the maintainer.

In the Zen of Python they say "flat is better than nested". There is no need for two instructions that mark two positions.
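
Applied to the original example, the proposal might read like this sketch (semantics as described above: everything before SWITCH_ROOT is discarded except new_root):

FROM ubuntu
RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make && make install DESTDIR=/var/build

# copy /var/build onto a fresh busybox root and continue from there
SWITCH_ROOT busybox /var/build

EXPOSE 80
ENTRYPOINT /usr/local/bin/app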

@alunduil

+1 for @muayyad-alsadi's idea. It's far better to have a single transition point than a checkpoint/release, unless there are other use cases that I missed while skimming this issue.

@muayyad-alsadi
Contributor

@alunduil @jlhawn the only use case (I can think of) that my proposal won't cover is having a common code base that takes forever to build, from which we want to extract more than one image. Think of it like the LibreOffice package: you build one package that takes hours, and you get multiple sub-packages like Writer and Impress, etc.

A real-world example would be having both a server and a client coming from the same source package, where you want to build an image for the server and another for the client.

I do have an idea. My vision for Dockerfile is like a spec file in the RPM world: just like a Dockerfile, but instead of building container images it builds a binary package. RPM has the feature of subpackages; its syntax is like this:

Name: foobar
%files server
# foobar-server goes here
%files client
# foobar-client goes here
%files -n python-foobar
# python-foobar goes here (not foobar-python-foobar)

So we need to build sub-images that are auto-prefixed with the tag, like those; we have only one build root and multiple published images:

# the same tag passed to docker build
SWITCH_ROOT <other_image> <new_root> ...
# add -<tag-suffix> to the tag up to :
SWITCH_ROOT_N_PUBLISH <tag-suffix> <other_image> <new_root> ...

For example, docker build -t foobar:2.5 would result in foobar:2.5, foobar-monitor:2.5, and foobar-client:2.5. I have concerns about allowing a syntax for a full tag, because someone building an image for eggs would end up with an image for spam.

@alunduil

I'm very biased (my opinion is that spec is a horrendous format and shouldn't be emulated). I personally don't need or want to publish multiple images from one build, but I do see the utility of it. I really like the idea of two commands (one for just doing a shear, and one for a shear with multiple images). This lets me keep my Dockerfile nice and simple while accomplishing the goal (keeping build artifacts out of published images), while having the flexibility to do more if need be.

@jessfraz jessfraz removed the kind/proposal label Sep 8, 2015
@jimmycuadra
Contributor

Any updates on this? Not being able to easily separate the build environment from the final image is one of the biggest pain points in Docker for me. (And yes, I know about https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax.)

@jakirkham

Also, I would be really interested in seeing something like this or some variant. From my understanding, it would be very helpful for testing a layer without including testing artifacts in the final tagged commit.

@netroby
netroby commented Nov 6, 2015

+1 , would like this feature. really useful.

@bfirsh bfirsh added the roadmap label Dec 11, 2015
@sleaze
sleaze commented Mar 7, 2016

+1 for docker multiple inheritance functionality

@kolis
kolis commented Apr 1, 2016

👍

@ionelmc
ionelmc commented Apr 14, 2016

Does this https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax mean this proposal won't be implemented any time soon? (if ever)

@srikanthNutigattu

The proposal needs to be split so that each part can be discussed and closed independently.

@mercuriete

👍
Another use case: a Maven image to build Java artifacts (jars), then put those artifacts inside a smaller runtime image (a Java JRE).
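
Under this proposal, that could look something like the following sketch (image names and paths are illustrative):

FROM maven
ADD . /src
RUN cd /src && mvn package

IN /var/build {
    FROM java:jre
    ENTRYPOINT ["java", "-jar", "/opt/app/app.jar"]
}

# copy the built jar from the outer build into the inner root
RUN mkdir -p /var/build/opt/app \
    && cp /src/target/app.jar /var/build/opt/app/app.jar

PUBLISH /var/build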

@hiroshi
hiroshi commented Oct 16, 2016

Hi, I'm working on a small tool. It can build small Docker images in multiple steps. Some may find it useful.

@fletcher91

Since we're posting utilities: I've built one to build minimal Golang images in two steps, based on the scratch image.

@xenoterracide

I think multiple inheritance is a bad idea (see the diamond problem), but composable traits are a good one. I wrote on the multiple inheritance ticket how I think it could be accomplished safely, syntactically.

That said, glancing at this, the thing I'm interested in is syntactic sugar around temporary build layers for multiple &&-chained commands.

For example, this nasty piece of code:

# Oracle hackery that lies to its bad installer
RUN mv /usr/bin/free /usr/bin/free.bak \
    && printf "#!/bin/sh\necho Swap - - 2048" > /usr/bin/free \
    && chmod +x /usr/bin/free \
    && mv /sbin/sysctl /sbin/sysctl.bak \
    && printf "#!/bin/sh" > /sbin/sysctl \
    && chmod +x /sbin/sysctl \
    && rpm --install /tmp/oracle-xe-$VERSION-1.0.x86_64.rpm \
    && rm /tmp/oracle-xe-$VERSION-1.0.x86_64.rpm* \
    && mv /usr/bin/free.bak /usr/bin/free \
    && mv /sbin/sysctl.bak /sbin/sysctl

The rpm command is actually expensive and takes a while during the build, so if something fails after it (while developing the image) I have to do the whole thing again. What'd be nice is a way to denote layers that are to be flattened in the final build.

RUN mv ... 
FLT curl
FLT tar 
FLT rm tar

or something like that, where if, say, the tar failed (because I typoed the path), I wouldn't necessarily have to run the curl again while developing the file. In the final image these would just look like one layer.
