Laundry list of smells from this resource #190

Open
vito opened this issue Apr 2, 2018 · 18 comments

@vito vito commented Apr 2, 2018

This resource hasn't aged well. We've long suspected a new approach may be necessary, potentially including a complete rewrite and/or breaking it up into separate resource types, each focused on a specific workflow.

I'm going to use this issue as a dumping ground of sorts for now, in hopes that it eventually leads to a more focused direction that can skewer a few of these piranhas at once.

  1. There are too many knobs, and way too many ways to configure a put.
  2. The testing situation is improving, but still pretty bad.
  3. Important parts are written in bash and are subject to the docker CLI output format changing. Some things are already broken because of this.
  4. There are too many ways to configure credentials.
  5. There's no caching by default, and it's unclear if everyone would want that. cache: true even pulls down the image willy-nilly, preventing Concourse caching of the image (ironically). (see #121, #148, #188)
    • Caching all the time is great and all until it's not. Lines like apt-get update && apt-get -y install foo will need the cache busted in order to get a new foo.
  6. There isn't a great way to integrate Dockerfiles with Concourse-fetched dependencies while also having the image easy to build locally during development (you'd be missing all the dependencies).
  7. There's no way to build without pushing to a registry. To be honest, as a resource that makes sense (there must be an external source of truth), but a reusable task would be great for this.
  8. It embeds a Docker daemon. Would it be easier to point at an external one? What impact might this have on caching? Would this be an acceptable assumption for most people? How would the resource be pointed at the daemon?
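
To make the first smell concrete, here is a hypothetical (but representative) pipeline snippet piling up several of the overlapping knobs this resource exposes. The param names come from this resource; the repository and credential names are made up:

```yaml
resources:
- name: my-image
  type: docker-image
  source:
    repository: example/my-image        # made-up repository
    username: ((registry-user))
    password: ((registry-password))

jobs:
- name: build
  plan:
  - get: source
  - put: my-image
    params:
      build: source
      cache: true          # pulls the whole image down just to warm the cache
      cache_tag: latest
      load_base: base      # one of several overlapping ways to pre-load layers
```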

@jchesterpivotal jchesterpivotal commented Apr 18, 2018

There may be value in adopting one of the alternatives that are emerging -- buildah, Img, kaniko etc. There is also a nascent "Container Builder Interface" (CBI) being spruiked which aims to unify these under a common API.

We can definitely seek advice and input from folks in the Google container tooling teams (tagging @mattmoor, @dlorenc and @ImJasonH), who are attacking this stuff from a lot of angles. @sclevine is also exploring this problem in the context of buildpacks, including looking closely at the various options I linked.

Edit: I should also tag @bparees from Red Hat. They've had to wrestle with Dockerfiles-vs-sanity for OpenShift a bit.

@vito vito commented Apr 19, 2018

@jchesterpivotal Yeah, I think that's the future. I'm playing with kaniko locally at the moment and it looks very promising for Concourse's use case. There are obviously limitations right now that would prevent us from switching to it (multi-stage Dockerfiles for example don't work atm), but I'm sure it'll get there.

So the goal with Kaniko would be to run in a regular unprivileged container and not require spinning up a daemon. The downside I suppose would be that there's no caching, which a lot of people seem to care about. There's a somewhat nebulous open issue for it: GoogleContainerTools/kaniko#102

I think we'd at least need the ability to support the load_bases param since that's a very Concoursey way of controlling the version of the base image it's built from. I don't see a way to do that at the moment.
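
For context, a sketch of how load_bases is typically used today to pin the base image via a Concourse get (resource names are made up; see this resource's README for the exact params):

```yaml
jobs:
- name: build
  plan:
  - get: base-image            # a docker-image resource tracking the base
    params: {save: true}       # fetch it as a saved image, not just metadata
  - get: source
  - put: my-image
    params:
      build: source
      load_bases: [base-image] # docker-load the pinned base before building
```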

@jchesterpivotal jchesterpivotal commented Apr 19, 2018

In terms of load_bases, could I interest you in image-rebase?

@jchesterpivotal jchesterpivotal commented Apr 19, 2018

Also, there's a lot to be said for breaking docker-image-resource apart into a reusable task and an image-registry-resource.

First, we don't really have reusable tasks as a native concept yet, and it would be a useful one to ship. Second, they can use different bits of software for different tasks. A registry resource could use go-containerregistry to handle the bit shipping.

@sfrique sfrique commented Jul 2, 2018

Hello, great to see that this resource may get a big update/refactor.

My two cents:

Caching is important when there are complex Dockerfiles that result in big images.
I think an external Docker daemon is a good option if there is a way to start/stop it or otherwise control it. (If I remember correctly, CircleCI uses an external Docker daemon.) Also, with an external long-running daemon, wouldn't caching just work?

Having different resources makes sense in my view: build, run, and push/pull could be separate resources. A CI run would then build (pulling any images needed), run (for tests and integration tests), and, if everything works, push the new version to the registry.

We are starting to use Concourse for some pipelines, so I'm not that experienced with it, but our pipelines seem hacky when it comes to running tests and the like. We have some pretty big images whose build process hasn't moved to Concourse yet; we'll test it and see whether caching works as expected for those. The small ones don't really need caching.

@chrishiestand chrishiestand commented Jul 9, 2018

Thanks a lot for organizing this issue @vito. As @vito rightly points out, one of the big issues with this resource is the way that cache is abstracted. A standalone Docker daemon has pretty good cache defaults, and people are used to them. I think we should emulate "workstation docker" behavior as best we can, whether from a central cache or a per-worker cache. A per-worker cache seems like the most natural fit for Concourse, and I think it would still yield fairly good performance (e.g. frequently built images would likely be in cache).

I do not know if concourse supports my ideal cache for docker, or for any other immutable data (because the docs are sparse https://concourse-ci.org/caching-and-retention.html; https://concourse-ci.org/volume-internals.html ). But I would love to see persistent cache that is retained across builds and possibly pipelines for immutable data and only for immutable data. For immutable data such as docker layers or precisely versioned libraries, there are only benefits to caching data between pipelines from an end-user perspective. I think CI/CD is an area that warrants extra complexity in order to achieve higher performance because humans wait on CI/CD. Reducing feedback latency is a big win for development and operations velocity.

Here is my wishlist for this resource, but similar things can be said for any immutable resources:

  1. Any layers downloaded by a docker pull should be automatically accessible to, and used by, any docker pull done on the same worker, regardless of whether it was part of a cache-warming step, a docker pull, or a docker build. Definitely for builds in the same pipeline, ideally between pipelines. This alone would significantly simplify/supersede several cache options in this resource: cache, cache_tag, load, load_base, and load_bases.

  2. Build steps should be automatically re-used. I think this is possible though I'm not completely sure.

  3. The build pipeline should not need to know what Docker images to cache (as in load_bases or cache_from), but it should make optimal use of the cache anyway. Hardcoding these images into the pipeline creates a tight coupling between your higher-level pipeline and your lower-level application build file (the Dockerfile), requiring coordination between the two. Ideally these images would only be referenced once, in the app repo's Dockerfile. You should not have to update your build pipeline when you update your Dockerfile, unless there is a reason not to use the defaults. 1 and 2 above should make this possible.

To re-use build steps within a dedicated resource, such as via docker-image-resource's cache_from: would it make sense to add a special tag to the build, e.g. __resource-cache, to later be consumed by a subsequent build, e.g. --cache-from=this-image:__resource-cache? The special tag could, by default, be created on all builds. That should yield fairly decent cacheability for builds. We could similarly, by default, automatically add tags to intermediate stages in a multi-stage build, as in this-image:__resource-cache-stage-2-4 (read: stage 2/4). Just spitballing here.
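
A rough sketch of what that might look like in a pipeline, with the special tag handled by the resource (the tag name and automatic behavior are hypothetical, per the spitballing above):

```yaml
- put: my-image
  params:
    build: source
    # hypothetical: the resource would push my-image:__resource-cache after a
    # successful build, and pass --cache-from=my-image:__resource-cache to
    # docker build on the next one, without the pipeline hardcoding any images
    cache_from: [my-image:__resource-cache]
```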

@vito vito commented Jul 19, 2018

@chrishiestand Yeah, I agree with just about all of that. I'd be really interested in seeing a fork of this resource that is pointed at a centrally deployed Docker daemon, instead of spinning up its own. That would at least get you a lot of the desired caching behavior for free.

The downside of this approach is that it suddenly adds a hard dependency on a Docker daemon, which isn't commonly deployed standalone, at least in my experience. A lot of people are going to want to forward their existing docker.sock file, which isn't currently possible, as resource containers are entirely isolated from the host (by design). That approach also won't be relevant at all to folks deploying without Docker.

So you could deploy one on its own and configure it to listen on TCP (maybe with TLS?), and then point your pipelines at it. You'd get much better caching behavior, at the expense of your builds now having to upload their context to a remote Docker daemon as part of the build. Probably a net benefit, but unfortunately it takes a bit more work to set up.

I think it's a worthwhile experiment, and shouldn't be too hard to implement as a fork of this resource. It should just be a matter of ripping out all the Docker daemon setup and the params related to it, and then targeting the configured remote daemon with the CLI.
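
A sketch of what such a fork's configuration might look like (the resource type name and source fields here are entirely hypothetical):

```yaml
resources:
- name: my-image
  type: docker-image-remote              # hypothetical fork of this resource
  source:
    repository: example/my-image
    host: tcp://docker.internal:2376     # hypothetical field: the shared daemon
    client_cert: ((docker-client-cert))  # hypothetical TLS material for the
    client_key: ((docker-client-key))    # daemon connection, if enabled
```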

@jchesterpivotal jchesterpivotal commented Jul 19, 2018

In terms of alternative implementations, I would think about splitting it into two parts.

The first would be an image-build-task, which under the hood could use Kaniko to avoid a dependency on a Docker daemon. The second would be an image-repository resource which operates on images, not Dockerfiles. The latter could use go-containerregistry to avoid another Docker dependency.

Having the split means that I can also come up with further permutations without needing a single resource to become increasingly busy:

  • I can substitute an image-build-task for a jib-build-image-task, a ko-build-image-task, or even a docker-daemon-build-image-task if I am prepared to run the daemon on an isolated machine. One can also imagine build tasks based on Buildah, Orca, CBI, Img, etc.
  • I can substitute gcs-bucket-resource for image-registry, because I know GCR simply acts as a clever facade over GCS. Or I can use artifactory-resource. Or I can do something else entirely.

Finally the split opens up seams to allow the introduction of layer-by-layer logic. Right now images are treated as an all-or-nothing affair, but in the near future it will become more common for efficiency and security purposes to think about layers instead of images. Breaking apart the build task from the registry resource will help.
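
Sketched as a pipeline, the split might look something like this (the task file and resource type names are placeholders, not real artifacts):

```yaml
jobs:
- name: build-and-push
  plan:
  - get: source
  - task: build-image                 # hypothetical reusable image-build-task,
    file: source/ci/build-image.yml   # e.g. wrapping Kaniko, emitting image/image.tar
  - put: my-image                     # hypothetical image-registry resource,
    params: {image: image/image.tar}  # shipping bits via go-containerregistry
```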

I'm making the assumption that redistributable tasks exist and are based on more than shared conventions. I think splitting up docker logic would be a decent way to explore what would be necessary for a higher-level Task concept.

Edit: I also just noticed I already said all of this previously, but more concisely.

@vito vito commented Jul 19, 2018

@jchesterpivotal

Kaniko won't help with caching yet, but I still think it's worth trying, at least to remove the Docker-in-Docker aspect while preserving portability. For what it's worth, there has since been more discussion in GoogleContainerTools/kaniko#102 which shows that their goals there are pretty well aligned with Concourse's (caching FROM but not individual RUN commands, for example, which makes a lot of sense for CI).

I'm making the assumption that redistributable tasks exist and are based on more than shared conventions. I think splitting up docker logic would be a decent way to explore what would be necessary for a higher-level Task concept.

Fully agreed. See recent discussion in #185 - there's currently way too much friction when it comes to redistributable tasks, which forces people to use resources for things they shouldn't be used for.

So I like the idea of splitting this up, because there's already clearly a need for build-without-push. (I'm also happy to see any proposal which cuts this resource up into smaller pieces, so that alone is great.)

@vito vito commented Jul 19, 2018

@jchesterpivotal Here's a proof-of-concept using Kaniko to build, as a Concourse task:

---
platform: linux

image_resource:
  type: docker-image
  source: {repository: gcr.io/kaniko-project/executor}

inputs:
- name: source

outputs:
- name: image

run:
  path: /kaniko/executor
  args:
  - --dockerfile=/tmp/build/e55deab7/source/Dockerfile
  - "--destination=doesnt/matter:test"
  - --context=/tmp/build/e55deab7/source
  - --tarPath=/tmp/build/e55deab7/image/image.tar

Full usage/output here: https://gist.github.com/vito/ab90980adbeaf000739fff6a64e0407d

One gotcha is I had to use absolute paths for the various flags, because the first thing Kaniko does is cd /. But that's a minor nit. It so happens that fly-executed builds have a predictable build path, so I worked around it by hardcoding it. That's all I can really do, since the image is built FROM scratch and literally only contains the Kaniko executor. We'd have to find a better solution if we want to do this for real.

I also ran into this issue when trying to build a multi-stage image: GoogleContainerTools/kaniko#245

@vito vito commented Aug 7, 2018

I've started to implement a new resource for fetching images from a registry: https://github.com/concourse/registry-image-resource

See the README for comparison/goals/etc.

So far so good. Testflight and TopGun are both passing with it as a drop-in replacement for docker-image when used for task and resource type image fetching.
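
For anyone wanting to try it, usage looks roughly like the following (field names per the README at the time of writing; double-check there for current details):

```yaml
resource_types:
- name: registry-image
  type: docker-image
  source: {repository: concourse/registry-image-resource}

resources:
- name: golang-image
  type: registry-image
  source: {repository: golang, tag: latest}

jobs:
- name: example
  plan:
  - get: golang-image
  - task: run
    image: golang-image   # the fetched image is used directly as the task image
    config:
      platform: linux
      run: {path: go, args: [version]}
```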

@neumayer neumayer commented Sep 4, 2018

It's super nice that this is being discussed. If it helps, the workaround I have for now is:

  • have a task to build and test docker images (docker build && inspec && docker save)
  • use the current docker resource to push to a registry only (to have a way of handling images between jobs)

This is for our external images, the testing step is basically why I am using a task instead of the old docker image resource.

The main pain point for me is caching when it comes to batch-building images: I have one folder for all the images I use inside Concourse, and the whole folder is rebuilt on changes (internal stuff, so image testing is not such a priority). I've just now started looking at improving this a bit, and that's where caching comes up :-)

So far I think all the proposals here could improve things for me. I will try to use the new resource to do some experiments, and possibly look at kaniko too. I will continue looking at multi-arch builds, so this might be a good chance for me to spend some more time on this. I hope this can be useful input to the design process here, although admittedly I'm not sure that aarch64/amd64 dual builds are much of a priority for others.

@neumayer neumayer commented Sep 6, 2018

I redid @vito's kaniko experiments; for me there are a few disadvantages:

  • kaniko still seems a bit early (as mentioned above, with the caching and multi-stage builds). I agree that some of this will be possible to fix, but I'm unsure about the caching. I'm not sure what's best; I'm just saying that in some ways it feels tempting to reuse the dockerd logic for this (i.e. still use dind and "somehow" get dockerd to handle the caching)...
  • it's less multi-arch capable (currently it depends on an amd64 build of a credential helper, for example). That could be solved; I'm just comparing with the existing dind solution.
  • the image I tried to build required a privileged container for kaniko anyway (something about permissions of the "chage" binary); that might be more of an upstream RHEL bug, but there might be more like it (I admit I haven't investigated this properly)

In many ways I think that a resource for building based on dind and the go bindings could be a good compromise. I don't feel that depending on dind is a huge issue in itself when it comes to building.

The current resource is multi-arch capable by using the alpine repo/architecture ecosystem.

Caching, to my mind, is somewhat independent, and it doesn't seem like there are easy solutions for it. I discovered that in my scenario the biggest issue is that I have a slow connection to Docker Hub. I'll experiment a bit, but maybe this can be better solved by always using a local registry (to which there is a good connection) and having it proxy Docker Hub, so the caching functionality is dealt with there instead (at least in my mind that keeps the architecture clean). In that case the split into registry and build resources makes a lot of sense: we keep the components the way they were envisioned, we avoid implementing caching logic ourselves, and we can get rid of the command-line output parsing (if there still is any).

I'll try to take a look at @vito's registry resource and experiment with registry proxying/mirroring (it turns out it's a thing: https://docs.docker.com/registry/recipes/mirror/)
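
For reference, the pull-through-cache setup from that recipe boils down to a registry config along these lines (see the linked docs for the authoritative version):

```yaml
# config.yml for a Docker Registry acting as a pull-through cache of Docker Hub
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io   # upstream registry to mirror
```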

@ImJasonH ImJasonH commented Sep 6, 2018

Re: caching, GoogleContainerTools/kaniko#300 is the current best idea for enabling fast incremental builds in kaniko.

To me, relying on dind means you simply cannot build on a Kubernetes cluster unless you trust every user on the cluster. By removing the need to trust each user, you can get closer to providing a build service on a single large multitenant cluster instead of having to isolate each user on their own cluster, which unnecessarily wastes resources.

/cc @priyawadhwa @dlorenc @mattmoor

@dlorenc dlorenc commented Sep 6, 2018

@neumayer - what kaniko image did you use? Nothing we ship as officially supported from the kaniko repo should require the privileged flag.

@neumayer neumayer commented Sep 7, 2018

@dlorenc I think you are right; I must have used some old image at some point in my experiments. I can't seem to reproduce the privilege issue.

@vito vito commented Oct 1, 2018

I whipped up a task that builds a Docker image using img.

What's neat is that caching was pretty easy to support via Concourse task caches, and they're easy to invalidate if needed with fly clear-task-cache.

We're using this in our new pipeline for the unit-image and dev-image jobs.

Note that this doesn't yet support running without privileged: true.
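
The caching hinges on Concourse task caches; a trimmed sketch of such a task (the image and paths are illustrative, not the exact task in our pipeline):

```yaml
platform: linux

image_resource:
  type: docker-image
  source: {repository: vito/img}   # illustrative: any image with img installed

inputs:
- name: source

caches:
- path: cache      # img's state directory, persisted per-worker across builds

run:
  path: sh
  args:
  - -c
  - img build -s "$PWD/cache" -t my-image source
```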

@priyawadhwa priyawadhwa commented Oct 1, 2018

@vito just FYI, kaniko currently supports multistage builds and opt-in remote layer caching for RUN commands (we're working on adding more optimizations to caching as well!).
