Add multi-stage build support #3383

Closed
emilevauge opened this issue Aug 29, 2017 · 14 comments

@emilevauge
Contributor

Hi folks,
I'm opening this issue to track multi-stage build support in official-images.
Any ETA on this?

@StefanScherer

StefanScherer commented Sep 8, 2017

There is the same demand for building an optimal Node.js image in nodejs/docker-node#362, as Windows isn't as good at cleaning up temp files, packages, and cached MSIs.

  • Windows builds still use Docker 1.12.2-cs2-ws-beta (17.06.1-ee-2 is available)
  • ARM builds already use Docker 17.06.1-ce
  • AMD builds use Docker 17.03.2-ce

Is there anything the community can help with?

@yosifkit
Member

Here is the list we have come up with so far; feel free to comment with more.

Positive points:

  • can generate nanoserver images where the installer exe would not normally run
  • upcoming nanoserver (version 1709) based images are mostly useless without multi-stage since they have no PowerShell
  • can download or generate a full rootfs (no more huge tarballs in git with force pushed branches)
  • large build container with small end result (hello-world, traefik)

Counter points:

  • can mostly be accomplished via long RUN lines (see the sketch after this list for the contrast)
  • increases space required on build servers
  • can greatly increase build time (busybox building binaries: 37 min for just amd64, vs a Dockerfile with ADD busybox.tar.xz /)
  • updates to either base image will require a rebuild and push
  • is a single binary image useful?
    • hard to debug, no shell or other utilities
    • saves about 4 MB to be from scratch vs alpine
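
For context, the pattern being weighed here looks roughly like this (a hypothetical sketch, not an actual official-images Dockerfile); the single-stage equivalent has to install, build, and clean up inside one long RUN so the toolchain never lands in a layer, and it cannot start FROM scratch:

```Dockerfile
# Hypothetical sketch of a multi-stage build: large build stage, tiny result.
FROM alpine:3.6 AS build
RUN apk add --no-cache build-base \
    && echo 'int main(void){ return 0; }' > hello.c \
    && gcc -static -o /hello hello.c

FROM scratch
# Only the static binary is carried over; the compiler stays behind.
COPY --from=build /hello /hello
CMD ["/hello"]
```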

This is more complex than just the Docker version on the systems building the images.

  • how do we clean dangling images without invalidating cache of every multi-stage image?
  • hardcoded assumptions of a single FROM in many places
    • bashbrew itself
      • one example of many: in order to provide more robust build caching, bashbrew does a docker tag on an image with a hash of unique bits that includes the parent image ID
    • many jenkins jobs, shell scripts, tooling, etc used in building, tagging, pushing
      • finding instances of this assumption is difficult, but it is still only half the battle; for example, we use the Architectures of the parent image of a given Dockerfile to determine which Architectures that image supports -- in a multi-stage world, that instead needs to be the set intersection of all the parent images' Architectures (see the sketch after this list)
  • opinion: multi-stage build promotes messy Dockerfile since the first image isn't pushed
    • can be slightly mitigated by requiring same good practices throughout Dockerfile
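
To illustrate the Architectures point above: the supported architectures of a multi-stage image would have to become the set intersection of every parent's list, roughly like this (hypothetical values, not actual bashbrew output):

```console
$ comm -12 <(printf '%s\n' amd64 arm32v7 arm64v8 | sort) \
           <(printf '%s\n' amd64 arm64v8 windows-amd64 | sort)
amd64
arm64v8
```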

@StefanScherer

Thanks @yosifkit for the comprehensive list of pros/cons to get a better understanding of what has to be considered.
For Windows images we still need an updated Docker engine, as I doubt that the currently installed Docker version is able to run the upcoming 1709 images properly. Once that is in place we could use multi-stage builds for reasonable nanoserver-based images.
I'm used to disposable build agents like Travis/Circle/AppVeyor, where caching first stages or cleaning up dangling images happens within one build agent's life cycle, so everything normally gets cleaned up after a build.
I've seen some approaches with Jenkins, but it seems that this is not a common practice.

@friism

friism commented Sep 20, 2017

can greatly increase build time (busybox building binaries: 37 min for just amd64, vs a Dockerfile with ADD busybox.tar.xz /)

If you have an expedient way to build busybox, can't you just still do that? You don't have to use multi-stage builds where it doesn't make sense.

updates to either base image will require a rebuild and push

Only to the second base image, no? If the first one is the same, you could presumably re-use the first artifact. If your build boxes are ephemeral, you can cache intermediate build artifacts in a registry: moby/moby#26839
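
For ephemeral build boxes, that would look roughly like this (a hypothetical sketch; the image names and stage name are made up, and cache reuse across stages has its own caveats in the classic builder):

```bash
# Build and push just the first stage so later builds can reuse it:
docker build --target build -t registry.example.com/foo:build-stage .
docker push registry.example.com/foo:build-stage

# On a fresh build box, seed the cache from the registry:
docker pull registry.example.com/foo:build-stage
docker build --cache-from registry.example.com/foo:build-stage -t foo:latest .
```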

is a single binary image useful?

I think deciding whether to further slim down the slim images is a separate discussion. Even if we don't use this as an opportunity to make more images single-binary, multi-stage builds are useful.

@mickare

mickare commented Feb 27, 2018

tl;dr: multi-stage builds help to clear out removed artifacts and reduce the image size

@yosifkit
increases space required on build servers

I think it is better to use more space on the build servers than to waste pull traffic on removed artifacts left behind in layers.

A prime example is the Ubuntu image, where apt artifacts are deleted but still remain in the first ADD layer. And I fully agree that "repacking the tarballs is out of the question". Unfortunately the --squash option is still experimental. Multi-stage builds do offer a nice solution and more.

So a simple multi-stage build in the Ubuntu example can decrease the image size by 26 MB, and the compressed image on Docker Hub would shrink by 5 MB. That is 5 MB less traffic for each pull; you can imagine what that means for a base image that already has 10M+ pulls. 😄
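
The proposal is roughly the following (a sketch of the idea, not the actual PR; the cleanup step is abbreviated):

```Dockerfile
# Unpack the rootfs and clean up in a throwaway stage...
FROM scratch AS base
ADD ubuntu-artful-core-cloudimg-amd64-root.tar.gz /
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# ...then flatten the cleaned filesystem into a single layer.
FROM scratch
COPY --from=base / /
CMD ["/bin/bash"]
```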

@mickare

mickare commented Mar 20, 2018

A new point was made by @jouve on my multi-stage copy proposal in the Ubuntu image repo, so I'll drop it here as an update.

😞 It has serious issues... and is therefore not usable.

@jouve tianon/docker-brew-ubuntu-core#119 (comment)
Hi,

this will not work for binaries with the setuid bit, because COPY does not preserve it:

docker build --no-cache . -t jouve/test
Sending build context to Docker daemon  39.31MB
Step 1/6 : FROM scratch as base
 ---> 
Step 2/6 : ADD ubuntu-artful-core-cloudimg-amd64-root.tar.gz /
 ---> 3cbb692107ee
Step 3/6 : RUN ls -l /usr/bin/passwd
 ---> Running in c309e026a953
-rwsr-xr-x 1 root root 54224 Aug 20  2017 /usr/bin/passwd
Removing intermediate container c309e026a953
 ---> 6975740163a1
Step 4/6 : FROM scratch
 ---> 
Step 5/6 : COPY --from=base / /
 ---> fb9f22eb3121
Step 6/6 : RUN ls -l /usr/bin/passwd
 ---> Running in 9ce611e2a914
-rwxr-xr-x 1 root root 54224 Aug 20  2017 /usr/bin/passwd
Removing intermediate container 9ce611e2a914
 ---> 60931d907a8e
Successfully built 60931d907a8e
Successfully tagged jouve/test:latest

@LaurentGoderre
Member

Here is my opinion on multi-stage builds based on my experience with the node image. At the moment, node needs to be compiled for Alpine Linux and this process is LOOOONNG, around 40 minutes per version. In order to reduce this build time we are implementing ccache: basically our Travis build remembers parts of the build and can reuse them, which brings the build time down from 40-50 minutes to less than 5-10.

The catch, however, is twofold: first we need to copy the cache files in, and then the build adds new cache files. For caching to work we need to extract the new cache from the image, which is not possible at build time. This is where multi-stage comes to the rescue.

The first stage builds node; the second one copies the build result into the final image. We can then create a container from the first-stage image (which is entirely independent of the final image), extract the cache files (and any other build-related files for debugging), and then delete that image.
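
In shell terms, the flow is roughly this (a sketch; the stage name and cache path are assumptions, not the actual docker-node tooling):

```bash
# Build only the compile stage, then the full image (the final stage copies the result):
docker build --target builder -t node-build-stage .
docker build -t node:alpine-local .

# Extract the updated ccache from the independent build-stage image:
docker create --name ccache-extract node-build-stage
docker cp ccache-extract:/root/.ccache ./ccache
docker rm ccache-extract
docker rmi node-build-stage
```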

@LaurentGoderre
Member

@tianon @yosifkit is there anything I can do to move this along?

@arthurdm
Contributor

arthurdm commented May 7, 2018

Hi all. The use of multi-stage builds allows for a clean WebSphere Liberty image that has a different Linux OS while still using the official Ubuntu-based WebSphere Liberty images to build the IBM JRE and WebSphere Liberty - this way I am not duplicating the code that builds the IBM JRE / Liberty content.

The Dockerfile is here: https://github.com/WASdev/ci.docker/blob/master/ga/developer/centos/Dockerfile

The PR to get it integrated is #4326.
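
In essence it boils down to something like this (an illustrative sketch, not the linked Dockerfile; the paths are simplified):

```Dockerfile
FROM centos:7
# Reuse the IBM JRE / Liberty content from the official Ubuntu-based image
# instead of rebuilding it here.
COPY --from=websphere-liberty:kernel /opt/ibm /opt/ibm
ENV PATH=/opt/ibm/wlp/bin:$PATH
CMD ["server", "run", "defaultServer"]
```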

Are there issues with this approach?

@tianon
Member

tianon commented May 7, 2018

Are there issues with this approach?

Yes, similar to what @yosifkit has outlined above regarding multi-stage builds in general -- this creates an implicit dependency between two otherwise disparate images that's difficult for our tooling to extract.

@arthurdm
Contributor

arthurdm commented May 8, 2018

hi @tianon - In this case the --from references an official image instead of a named build stage, so shouldn't it be easier to determine the dependency?

Would this be a better fit for the Docker Store instead of Docker Hub?

tianon mentioned this issue Jul 26, 2018
tianon mentioned this issue Jan 4, 2019
@LaurentGoderre
Member

Here is a scenario where multi-stage build support would be very helpful. I'm building a Docker image for the Pachyderm CLI client, which hopefully can become an official image. The client is written in Go. Using multi-stage builds, I was able to create really small images that don't need the entire golang image. However, if I couldn't use multi-stage builds I would have to either create an image that copies the golang image and deletes the content at the end, or use the golang image itself and end up with a huge image.
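
A hypothetical sketch of the kind of Dockerfile this enables (image versions and paths are illustrative, not the actual Pachyderm build):

```Dockerfile
# Build stage: full Go toolchain.
FROM golang:1.12 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /pachctl ./cmd/pachctl

# Final stage: just the static binary on a small base.
FROM alpine:3.9
COPY --from=build /pachctl /usr/local/bin/pachctl
ENTRYPOINT ["pachctl"]
```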

@tianon
Member

tianon commented May 17, 2019

First step towards being able to actually support this is up at #5929.

@tianon
Member

tianon commented Aug 14, 2019

#5929 and https://github.com/docker-library/faq#multi-stage-builds implement the basics of this -- there are still a few dangling scripts that have edge cases (as noted above) but we're finding/fixing them as we go
