Cache backend API example #406

Merged: crazy-max merged 1 commit into docker:master from cache-exporter-doc on Jul 29, 2021

@crazy-max (Member) commented Jul 13, 2021

@crazy-max (Member, Author) commented Jul 13, 2021

Only merge when buildx 0.6 and BuildKit 0.9 are released. In the meantime you can test this feature with the following workflow:

name: ci

on:
  push:
    branches:
      - 'master'

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: v0.6.0
          buildkitd-flags: --debug
      -
        name: Login to DockerHub
        uses: docker/login-action@v1 
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Build and push
        uses: docker/build-push-action@v2
        with:
          context: .
          push: true
          tags: user/app:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

@crazy-max crazy-max mentioned this pull request Jul 13, 2021
jojomatik added a commit to jojomatik/blockcluster that referenced this issue Jul 17, 2021
Use the new GitHub Actions cache backend to cache Docker layers, configured as specified in this pull request comment: docker/build-push-action#406 (comment). The backend is still experimental but should already work. This solution is more elegant and could be faster than the previous approach.
@mmeendez8 commented Jul 19, 2021

I think this can be merged now 🎉 Super cool work, I have been looking forward to this!

@crazy-max (Member, Author) commented Jul 19, 2021

@mmeendez8 GitHub virtual environments don't have buildx 0.6.0 yet (0.5.1 at the moment), so we will wait until then.

@mmeendez8 commented Jul 19, 2021

Oh I see, any idea when this might happen? We could create a new topic in the Discussions section.

@crazy-max (Member, Author) commented Jul 19, 2021

> Oh I see, any idea when this might happen? We could create a new topic in the Discussions section.

I think it's Microsoft internal. cc @cpuguy83

@cpuguy83 commented Jul 19, 2021

Having some pipeline issues at the moment, but hoping to get this out today or tomorrow... Then it'll need to be picked up in the virtual-env image that GHA uses.

@crazy-max (Member, Author) commented Jul 19, 2021

@cpuguy83 Thanks a bunch!

@cpuguy83 commented Jul 20, 2021

0.6.0 is available now, but I'm not sure when the GHA virtual-env images will be updated. You should be able to manually update (apt-get update && apt-get install -y moby-buildx).
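As a rough sketch of that manual update inside a workflow, assuming a GitHub-hosted Ubuntu runner (steps run as a non-root user there, hence sudo) whose apt sources already provide the moby-buildx package mentioned above, a step placed before docker/setup-buildx-action could look like this. The step names are illustrative only:

```yaml
- name: Update buildx from apt
  run: |
    # Assumption: the runner's apt sources already include moby-buildx.
    # GitHub-hosted Ubuntu runners run steps as a non-root user, so use sudo.
    sudo apt-get update
    sudo apt-get install -y moby-buildx
    # Print the installed version to confirm the update took effect.
    docker buildx version
- name: Set up Docker Buildx
  # With no "version" input, this action uses the buildx already on the runner.
  uses: docker/setup-buildx-action@v1
```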

@ianschmitz commented Jul 20, 2021

This is awesome guys - great work!

We've been trying this out over the last couple days and have noticed that the first build for a PR typically doesn't seem to have any cached layers. However subsequent commits to the PR seem to have cached layers and are much faster. Is this expected?

For example my hope would have been that a PR merging into main would use the cache from main if one doesn't already exist for the PR.

@catthehacker commented Jul 20, 2021

> 0.6.0 is available now, but I'm not sure when the GHA virtual-env images will be updated.

Images are generated over the weekend and deployed throughout the week, so 0.6.0 might be widely available next week, or in two weeks.

@robpc commented Jul 21, 2021

> We've been trying this out over the last couple days and have noticed that the first build for a PR typically doesn't seem to have any cached layers. However subsequent commits to the PR seem to have cached layers and are much faster. Is this expected?
>
> For example my hope would have been that a PR merging into main would use the cache from main if one doesn't already exist for the PR.

@ianschmitz Are you using multiple restore keys? We use a setup like this to fall back to an "any" image if the branch one isn't available.

      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ env.app_env }}-${{ github.job }}-${{ env.branch }}-${{ github.sha }}
          restore-keys: |
            ${{ env.app_env }}-${{ github.job }}-${{ env.branch }}-
            ${{ env.app_env }}-${{ github.job }}-

So in this example you could make the second one match your main branch key.
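For instance, assuming the default branch is main and env.branch holds the current branch name (both are assumptions about the workflow above), a main-branch fallback could sit between the branch-specific prefix and the catch-all one. restore-keys are tried top to bottom as prefixes, and for each prefix the most recently created matching cache is restored:

```yaml
- name: Cache Docker layers
  uses: actions/cache@v2
  with:
    path: /tmp/.buildx-cache
    # Exact key for this commit on this branch.
    key: ${{ env.app_env }}-${{ github.job }}-${{ env.branch }}-${{ github.sha }}
    # Tried in order: same branch, then main (assumed default branch), then any branch.
    restore-keys: |
      ${{ env.app_env }}-${{ github.job }}-${{ env.branch }}-
      ${{ env.app_env }}-${{ github.job }}-main-
      ${{ env.app_env }}-${{ github.job }}-
```

The main fallback only helps because caches created on the default branch are readable from other branches; the GitHub cache access rules still apply.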

@ianschmitz commented Jul 21, 2021

@robpc Negative. We're following the example laid out in this PR (I think it's correct?). It looks like this:

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: v0.6.0-rc1
          driver-opts: image=moby/buildkit:v0.9.0-rc2
          buildkitd-flags: --debug

      - name: Login to ECR
        uses: docker/login-action@v1
        with:
          registry: ${{ env.DOCKER_REG }}
          username: ${{ secrets.AWS_ACCESS_KEY_ID }}
          password: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v2
        with:
          cache-from: type=gha
          cache-to: type=gha
          file: docker/Dockerfile
          platforms: linux/amd64, linux/arm64
          push: true
          tags: ${{ env.DOCKER_REG }}/foo:${{ github.sha }}

I was hoping that cache-from: type=gha would do roughly what I described above.

@robpc commented Jul 21, 2021

> @robpc Negative. We're following the example laid out in this PR (I think it's correct?). It looks like this:
>
> I was hoping that cache-from: type=gha would do roughly what I described above.

@ianschmitz Sorry, I got my wires crossed a bit. The example I showed was from the workaround that uses the local cache type in buildx and manages the GitHub cache manually. I am wondering if there is an equivalent way to specify multiple keys in this buildx option. I realize now that this was probably your question, so apologies for that.

From the other comments, it sounds like scope is the way to specify that, but I didn't see in the documentation whether one can specify multiple scopes, and whether that behaves like actions/cache@v2.
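For reference, the local-cache workaround mentioned above generally pairs actions/cache with buildx's type=local exporter. The sketch below follows that pattern; the image tag and cache paths are placeholders. The last step swaps directories because exporting repeatedly into the same local cache directory keeps accumulating stale entries:

```yaml
- name: Cache Docker layers
  uses: actions/cache@v2
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-
- name: Build and push
  uses: docker/build-push-action@v2
  with:
    context: .
    push: true
    tags: user/app:latest
    # Read the restored cache, write a fresh copy to a separate directory.
    cache-from: type=local,src=/tmp/.buildx-cache
    cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max
- name: Move cache
  # Replace the old directory so actions/cache saves only the fresh copy.
  run: |
    rm -rf /tmp/.buildx-cache
    mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```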

@ianschmitz commented Jul 21, 2021

> Sorry, I got my wires crossed a bit. The example I showed was from the workaround that uses the local cache type in buildx and manages the GitHub cache manually. I am wondering if there is an equivalent way to specify multiple keys in this buildx option. I realize now that this was probably your question, so apologies for that.

Np! I guess that's what my question is? I was hoping the default behavior when using type=gha would be fairly intuitive, pulling from the target branch's or the default branch's cache if one exists. I think that's what most folks would want/expect out of the box? From what I've experienced so far, the first job run in a new PR doesn't benefit from this caching.

Effectively, the pseudo logic I was thinking of would be something like:

  1. Does a cache exist for this PR? If so, use it
  2. Does a cache exist for the target branch? If so, use it
  3. Does a cache exist for the default branch? If so, use it

Something along those lines 😃
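Since the gha cache accepts a scope attribute, one way to approximate this fallback list is to export to a per-branch scope and import from several scopes in order. This is only a sketch under assumptions: main as the default branch, branch names reused as scope names by convention, and multiple cache-from entries (which collapse to duplicates on a push to main) being handled gracefully by your buildx version, which is worth verifying. github.head_ref and github.base_ref are only set for pull request events, hence the || fallbacks:

```yaml
- name: Build and push Docker image
  uses: docker/build-push-action@v2
  with:
    file: docker/Dockerfile
    push: true
    tags: ${{ env.DOCKER_REG }}/foo:${{ github.sha }}
    # Import candidates mirroring the list above:
    # 1. this PR/branch, 2. the PR's target branch, 3. the default branch (assumed main).
    cache-from: |
      type=gha,scope=${{ github.head_ref || github.ref_name }}
      type=gha,scope=${{ github.base_ref || 'main' }}
      type=gha,scope=main
    # Export only to the current branch's scope.
    cache-to: type=gha,mode=max,scope=${{ github.head_ref || github.ref_name }}
```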

@jojomatik commented Jul 22, 2021

> Effectively, the pseudo logic I was thinking of would be something like:
>
>   1. Does a cache exist for this PR? If so, use it
>   2. Does a cache exist for the target branch? If so, use it
>   3. Does a cache exist for the default branch? If so, use it
>
> Something along those lines 😃

@ianschmitz This is the logic that the "usual" GitHub Actions cache (i.e. actions/cache) follows (link to documentation):

> A workflow can access and restore a cache created in the current branch, the base branch (including base branches of forked repositories), or the default branch (usually main). For example, a cache created on the default branch would be accessible from any pull request. Also, if the branch feature-b has the base branch feature-a, a workflow triggered on feature-b would have access to caches created in the default branch (main), feature-a, and feature-b.

As far as I can tell from my workflows (e.g. jojomatik/blockcluster#147 (workflow)) the same logic is true for this new cache backend. Not sure why it's not working for you.

@robpc commented Jul 22, 2021

> As far as I can tell from my workflows (e.g. jojomatik/blockcluster#147 (workflow)) the same logic is true for this new cache backend. Not sure why it's not working for you.

As far as I know, the GitHub cache action doesn't have a default. From the documentation for the cache action:

> The cache action will attempt to restore a cache based on the key you provide. When the action finds a cache, the action restores the cached files to the path you configure.
>
> If there is no exact match, the action creates a new cache entry if the job completes successfully. The new cache will use the key you provided and contains the files in the path directory.
>
> You can optionally provide a list of restore-keys to use when the key doesn't match an existing cache.

The implication there is that the user supplies both the key and the restore-keys, which makes sense because, depending on what you are doing, they could be wildly different. Given that the action doesn't impose a default, I am wondering if this change implements one. I could see a reasonable default being job-branch-sha as the key and job-branch-sha, job-branch, job as the restore keys, but that's probably only because that's how I use it, and everyone's build setup is a little different.

I am curious what fallback logic this feature implements in buildx, and whether we can change it. If this new feature in buildx is more or less mimicking this functionality, what key and restore-keys are being used? From the buildx documentation, and a comment from @crazy-max (I think) in an issue I can't find (EDIT: found it), it sounds like scope may be related to what buildx is doing with the keys.

@crazy-max (Member, Author) commented Jul 25, 2021

> From the other comments, it sounds like scope is the way to specify that, but I didn't see in the documentation whether one can specify multiple scopes, and whether that behaves like actions/cache@v2.

scope is documented in the --export-cache and --import-cache sections. I think we will improve this doc and list all attributes for each cache type for better visibility.
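As an illustration of the scope attribute, assuming two hypothetical images user/app and user/worker built in the same workflow from hypothetical ./app and ./worker contexts, giving each build its own scope keeps their cache entries apart instead of both writing to the same default scope:

```yaml
- name: Build and push app image
  uses: docker/build-push-action@v2
  with:
    context: ./app
    push: true
    tags: user/app:latest
    # Dedicated scope so this image's cache doesn't overwrite the worker's.
    cache-from: type=gha,scope=app
    cache-to: type=gha,mode=max,scope=app
- name: Build and push worker image
  uses: docker/build-push-action@v2
  with:
    context: ./worker
    push: true
    tags: user/worker:latest
    cache-from: type=gha,scope=worker
    cache-to: type=gha,mode=max,scope=worker
```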

@catthehacker commented Jul 26, 2021

>> 0.6.0 is available now, but I'm not sure when the GHA virtual-env images will be updated.
>
> Images are generated over the weekend and deployed throughout the week, so 0.6.0 might be widely available next week, or in two weeks.

Runner image update 20210726 includes Buildx 0.6.0 (see the Tools > Docker-Buildx section of the release PRs).

The PRs should be merged once all GitHub runners are deployed with this version (when the badges above reach 100% for version 20210726).
Mind you, during the deployment phase you can land on a runner with either 20210718 or 20210726, and it's not possible to pin the version.

jrheard added a commit to jrheard/rask that referenced this issue Jul 27, 2021
chrisekelley added a commit to Tangerine-Community/Tangerine that referenced this issue Jul 28, 2021
@bekriebel commented Jul 28, 2021

I'm not sure if this is related to the scope discussion or not, but I'm finding that cache is not used if actions/checkout@v2 is used with fetch-depth: 0. Is there a way to have the cache work properly when the full git history is pulled?

@cpuguy83 commented Jul 28, 2021

Have you tried adding .git to your .dockerignore?

@bekriebel commented Jul 28, 2021

> Have you tried adding .git to your .dockerignore?

I actually need .git in my docker workspace, but in a later step where I COPY it in. It isn't copied in any of the earlier steps, so I would expect the cache to be valid up until that COPY step.

@bekriebel commented Jul 28, 2021

🤦 Never mind, I think I see what is happening. I'm using a multi-stage Docker build. Any change that causes the first stage to be rebuilt invalidates the cache. It has nothing to do with the fetch-depth, scope, or the .git folder specifically.

I had assumed this was caching the whole Docker build, including steps for non-final stages. It looks like the only thing cached is the final stage image, so anything that requires rebuilding any step of an earlier stage means the cache provides no savings.

Edit: setting the cache mode to max does what I need:

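          cache-from: type=gha
          # mode=max also exports layers of intermediate (non-final) stages;
          # the default, mode=min, only exports layers of the final image.
          cache-to: type=gha,mode=max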

@crazy-max crazy-max force-pushed the cache-exporter-doc branch 2 times, most recently from 6ffd7c3 to 7ee180e Jul 29, 2021
@crazy-max crazy-max marked this pull request as ready for review Jul 29, 2021
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max crazy-max merged commit 1a60e0d into docker:master Jul 29, 2021
23 checks passed
@crazy-max crazy-max deleted the cache-exporter-doc branch Jul 29, 2021
shgtkshruch added a commit to shgtkshruch/chronos that referenced this issue Jul 31, 2021
lopopolo added a commit to artichoke/docker-artichoke-nightly that referenced this issue Aug 7, 2021
@@ -600,7 +579,7 @@ jobs:
         id: cache
         with:
           path: /tmp/.buildx-cache
-          key: ${{ runner.os }}-buildx-ghcache-${{ github.sha }}
+          key: ${{ runner.os }}-buildx-local-${{ github.sha }}
           restore-keys: |
             ${{ runner.os }}-buildx-ghcache-

@pjonsson commented May 9, 2022

Should the restore-keys have been updated from "ghcache" to "local" as well?
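If the intent was to rename the cache prefix consistently, the step would presumably look like the sketch below, with the restore-keys prefix matching the new key. The uses: actions/cache@v2 line is an assumption, since the diff context above does not show it:

```yaml
- uses: actions/cache@v2  # assumed; not visible in the diff context
  id: cache
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-local-${{ github.sha }}
    # Prefix now matches the key above, so earlier "local" caches can be restored.
    restore-keys: |
      ${{ runner.os }}-buildx-local-
```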

9 participants