How to use matrix for multi-platform builds? #846

Closed
felipecrs opened this issue Mar 23, 2023 · 30 comments

@felipecrs

felipecrs commented Mar 23, 2023

I know I can build all the platforms at once on a single runner, but then my builds take 2 hours, versus 20 minutes when I split them across different runners.

I was able to achieve something similar with:

name: ci

on:
  push:
    branches:
      - "main"
  pull_request:
    branches:
      - "main"

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        platform:
          - linux/amd64
          - linux/386
          - linux/arm/v6
          - linux/arm/v7
          - linux/arm64
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Set cache name
        id: cache-name
        run: |
          echo 'cache-name=asterisk-cache-${{ matrix.platform }}' | sed 's:/:-:g' >> $GITHUB_OUTPUT

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: asterisk
          platforms: ${{ matrix.platform }}
          tags: asterisk
          cache-from: type=gha
          cache-to: type=local,dest=/tmp/asterisk-cache,mode=max

      - name: Upload cache
        uses: actions/upload-artifact@v3
        with:
          name: asterisk-cache-${{ steps.cache-name.outputs.cache-name }}
          path: /tmp/asterisk-cache
          if-no-files-found: error
          retention-days: 1

  push:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Download cache
        uses: actions/download-artifact@v3
        with:
          path: /tmp/asterisk-cache

      - name: Get lowercase GitHub username
        id: repository_owner
        uses: ASzc/change-string-case-action@v5
        with:
          string: ${{ github.repository_owner }}

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: |
            ghcr.io/${{ steps.repository_owner.outputs.lowercase }}/asterisk-hass-addon
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}

      - name: Login to GHCR
        if: github.event_name == 'push' || github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && github.actor != 'dependabot[bot]'
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: asterisk
          platforms: |
            linux/amd64
            linux/386
            linux/arm/v6
            linux/arm/v7
            linux/arm64
          push: ${{ github.event_name == 'push' || github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository && github.actor != 'dependabot[bot]' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
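          # download-artifact@v3 without `name` unpacks each artifact into a subdirectory named after it
          cache-from: |
            type=local,src=/tmp/asterisk-cache/asterisk-cache-asterisk-cache-linux-amd64
            type=local,src=/tmp/asterisk-cache/asterisk-cache-asterisk-cache-linux-386
            type=local,src=/tmp/asterisk-cache/asterisk-cache-asterisk-cache-linux-arm-v6
            type=local,src=/tmp/asterisk-cache/asterisk-cache-asterisk-cache-linux-arm-v7
            type=local,src=/tmp/asterisk-cache/asterisk-cache-asterisk-cache-linux-arm64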
          cache-to: type=gha,mode=max

The problem is that it takes 10 minutes to upload the cache and then 5 more minutes to download it again.

Is there any suggestion to circumvent this?

@K-shir0

K-shir0 commented Mar 28, 2023

@felipecrs

I was able to handle individual jobs by using the method posted here.

Reference: #671 (comment)

@felipecrs
Author

That's very interesting!

I wonder if buildx could push different platforms from separate calls, instead of requiring them all in a single call.

Currently if I push a single --platform, it seems to override the previously pushed one.
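
The override happens because a tag can point at only one manifest: each single-platform push repoints the tag at the newest image. A multi-platform tag is instead a manifest list that references one image per platform. Below is a rough sketch, assuming each platform was first pushed under its own hypothetical tag, of how `docker buildx imagetools create` (the command used further down this thread) could stitch them together. The helper `manifest_cmd` is made up for illustration and only prints the command instead of running it:

```shell
# Print (without running) the command that would merge per-platform
# images into a single multi-platform tag.
# Usage: manifest_cmd <target-ref> <source-ref>...
manifest_cmd() {
  mc_target=$1
  shift
  printf 'docker buildx imagetools create -t %s' "$mc_target"
  for mc_src in "$@"; do
    printf ' %s' "$mc_src"
  done
  printf '\n'
}

# Hypothetical per-platform tags, for illustration only.
manifest_cmd user/app:latest user/app:linux-amd64 user/app:linux-arm64
```

Printing the command first makes it easy to check the source list before anything is pushed.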

@crazy-max
Member

crazy-max commented May 9, 2023

@felipecrs
Author

@crazy-max that's really cool but... last time I tried to use upload-artifact for the job, it took 20-30 minutes only to upload and download it.

It was definitely a showstopper for me.

Do you believe it has improved? Maybe I should try it again.

@crazy-max
Member

crazy-max commented May 9, 2023

> last time I tried to use upload-artifact for the job, it took 20-30 minutes only to upload and download it.

You were uploading the entire cache export, which is quite expensive to compress, upload, download, and decompress. In https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners we upload only the resulting image tarball.

If you want to use cache in the first job, you should consider the gha cache backend with a separate scope assigned to each platform. Let me know if you need some help with this.
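
A minimal sketch of what a per-platform scope could look like, assuming the scope is simply a branch name joined with the platform string after replacing `/` with `-`. The helper names `platform_slug` and `cache_scope` and the naming scheme are assumptions for illustration; only the `type=gha,scope=...` syntax comes from buildx:

```shell
# Replace "/" with "-" so a platform such as linux/arm/v7 can be embedded in a name.
platform_slug() {
  printf '%s\n' "$1" | tr '/' '-'
}

# Build a cache scope of the form <ref>-<platform-slug> (naming scheme is an assumption).
cache_scope() {
  printf '%s-%s\n' "$1" "$(platform_slug "$2")"
}

# Example values; in a workflow these would come from the ref name and the matrix.
scope=$(cache_scope main linux/arm/v7)
echo "cache-from: type=gha,scope=${scope}"
echo "cache-to: type=gha,scope=${scope},mode=max"
```

Giving each platform its own scope keeps one matrix job from overwriting the cache written by another.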

@felipecrs
Author

I will do some testing. Thanks a lot!

@felipecrs
Author

This is how I'm adding GHA caching on top of the provided example:

- name: Prepare
  run: |
    mkdir -p /tmp/images
    platform=${{ matrix.platform }}
    platform=${platform//\//-}
    echo "TARFILE=${platform}.tar" >> $GITHUB_ENV
    echo "TAG=${{ env.TMP_LOCAL_IMAGE }}:${platform}" >> $GITHUB_ENV
    echo "SCOPE=${GITHUB_REF_NAME}-${platform}" >> $GITHUB_ENV
- name: Build
  uses: docker/build-push-action@v4
  with:
    context: .
    platforms: ${{ matrix.platform }}
    tags: ${{ env.TAG }}
    outputs: type=docker,dest=/tmp/images/${{ env.TARFILE }}
    cache-from: type=gha,scope=${{ env.SCOPE }}
    cache-to: type=gha,scope=${{ env.SCOPE }},mode=max

The test is running now.

However, I wonder how I can integrate the push phase of the example with docker/metadata-action. I suppose I can map the tags to -t flags in docker buildx imagetools create with a shell script, but I wonder what I should do about the labels.
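
One way the tag mapping could look: metadata-action exposes its `tags` output as a newline-separated list, so a small shell helper can turn each line into a `-t` flag. This is only a sketch. `tag_flags` and the sample tags are hypothetical, and it does not address the labels:

```shell
# Turn a newline-separated tag list into repeated "-t <tag>" flags.
# Empty lines are skipped.
tag_flags() {
  flags=""
  while IFS= read -r tag; do
    [ -n "$tag" ] || continue
    flags="${flags:+$flags }-t $tag"
  done <<EOF
$1
EOF
  printf '%s\n' "$flags"
}

# Hypothetical tags standing in for the metadata-action "tags" output.
tags="user/app:main
user/app:pr-12"
echo "docker buildx imagetools create $(tag_flags "$tags") <sources...>"
```

The result is left unquoted when passed to the real command so that each flag and tag becomes its own argument.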

@sando38

sando38 commented May 11, 2023

Hi, great to see this approach. I have been using it as well for a couple of months now, and I can even adopt some of these commands in my own workflow.

However, I have not yet found a way to attach annotations like "annotations": { "org.opencontainers.image.description": "DESCRIPTION" } to the resulting multi-arch image. Is there a way to achieve this?

@crazy-max
Member

crazy-max commented May 11, 2023

> However, I wonder how I can integrate the push phase of the example with docker/metadata-action. I suppose I can map the tags to -t flags in docker buildx imagetools create with a shell script, but I wonder what I should do about the labels.

Following https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners, I wonder if we could instead reuse build-push-action with a temporary Dockerfile.

Not tested but here is the idea:

  push:
    runs-on: ubuntu-latest
    needs:
      - build
    services:
      registry:
        image: registry:2
        ports:
          - 5000:5000
    steps:
      -
        name: Download images
        uses: actions/download-artifact@v3
        with:
          name: images
          path: /tmp/images
      -
        name: Load images
        run: |
          for image in /tmp/images/*.tar; do
            docker load -i "$image"
          done
      -
        name: Push images to local registry
        run: |
          docker push -a ${{ env.TMP_LOCAL_IMAGE }}
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: network=host
      -
        name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Temp Dockerfile
        run: |
          mkdir -p /tmp/dkfilectx
          echo "FROM ${{ env.TMP_LOCAL_IMAGE }}" > /tmp/dkfilectx/Dockerfile
      -
        name: Docker meta
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY_IMAGE }}
      -
        name: Push
        uses: docker/build-push-action@v4
        with:
          context: /tmp/dkfilectx
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      -
        name: Inspect image
        run: |
          docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }}

@felipecrs
Author

It's failing with:

https://github.com/TECH7Fox/asterisk-hass-addons/actions/runs/4949753398/jobs/8852483607?pr=251#step:11:157

Error: buildx failed with: ERROR: failed to solve: localhost:5000/asterisk-hass-addon: localhost:5000/asterisk-hass-addon:latest: not found

I think it's because the "Push" stage is missing the platforms. I'm trying with it now.

@felipecrs
Author

Oh no. That's not it. It's because we build the images with the platforms as tags, then later we try to access with :latest.

I think I know how to fix it. Trying now.

@felipecrs
Author

After several attempts this is where I stopped:

Dockerfile:2
--------------------
   1 |     ARG TARGETPLATFORM
   2 | >>> FROM localhost:5000/asterisk-hass-addon/${TARGETPLATFORM}
   3 |     
--------------------
ERROR: failed to solve: failed to parse stage name "localhost:5000/asterisk-hass-addon/": invalid reference format
Error: buildx failed with: ERROR: failed to solve: failed to parse stage name "localhost:5000/asterisk-hass-addon/": invalid reference format

This does not make sense; it looks like TARGETPLATFORM is not being injected the way the buildx docs say it should be.

My PR is TECH7Fox/asterisk-hass-addons#251 in case you want to have a look.

@felipecrs
Author

Anyway, this is a LOT of complication for such a simple task.

I wonder if buildx could push different platforms from separate calls, instead of requiring them all in a single call.

Currently if I push a single --platform, it seems to override the previously pushed one.

@crazy-max do you think it would be possible for buildx to support such a thing? I can open an issue there if you say so.

@crazy-max
Member

crazy-max commented May 11, 2023

> This does not make sense; it looks like TARGETPLATFORM is not being injected the way the buildx docs say it should be.

Can you try with?:

  push:
    runs-on: ubuntu-latest
    needs:
      - build
    services:
      registry:
        image: registry:2
        ports:
          - 5000:5000
    steps:
      -
        name: Download images
        uses: actions/download-artifact@v3
        with:
          name: images
          path: /tmp/images
      -
        name: Load images
        run: |
          for image in /tmp/images/*.tar; do
            docker load -i "$image"
          done
      -
        name: Push images to local registry
        run: |
          docker push -a ${{ env.TMP_LOCAL_IMAGE }}
      -
        name: Create manifest list and push to local registry
        run: |
          docker buildx imagetools create -t ${{ env.TMP_LOCAL_IMAGE }}:latest \
            $(docker image ls --format '{{.Repository}}:{{.Tag}}' '${{ env.TMP_LOCAL_IMAGE }}' | tr '\n' ' ')
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: network=host
      -
        name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Temp Dockerfile
        run: |
          mkdir -p /tmp/dkfilectx
          echo "FROM ${{ env.TMP_LOCAL_IMAGE }}:latest" > /tmp/dkfilectx/Dockerfile
      -
        name: Docker meta
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY_IMAGE }}
      -
        name: Push
        uses: docker/build-push-action@v4
        with:
          context: /tmp/dkfilectx
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          platforms: linux/amd64,linux/arm/v6,linux/arm/v7,linux/arm64
          labels: ${{ steps.meta.outputs.labels }}
      -
        name: Inspect image
        run: |
          docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }}

> Anyway, this is a LOT of complication for such a simple task.

Yes, as said in docker/docs#17180 (comment), we could provide a composite action to ease integration into your workflow.

@felipecrs
Author

@crazy-max you have some references to LOCAL_IMAGE and TMP_LOCAL_IMAGE. Were they all supposed to be TMP_LOCAL_IMAGE like in the document (https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners)?

@felipecrs
Author

Never mind, I think the answer is no. I'm testing here.

@crazy-max
Member

> @crazy-max you have some references to LOCAL_IMAGE and TMP_LOCAL_IMAGE. Were they all supposed to be TMP_LOCAL_IMAGE like in the document (https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners)?

That's a typo my bad, should be TMP_LOCAL_IMAGE.

@felipecrs
Author

@crazy-max it worked! Thanks a lot!

Just for information:

  1. Uploading the images takes 3+ minutes
  2. Downloading them again takes 1 more minute
  3. Loading them takes <2 minutes
  4. Pushing them to the local registry takes ~3 minutes

Steps 1 and 2 can be cut to under 1 minute by switching from upload-artifact to the actions cache. Here is one example:

TECH7Fox/asterisk-hass-addons#236

Steps 3 and 4 could perhaps save 1 minute in total by running in parallel with GNU parallel.
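
A possible shape for the parallel load, using `xargs -P` as a stand-in for GNU parallel (a swap made here so the sketch needs no extra package; `-P`, `-0`, and `-maxdepth` are GNU/BSD extensions rather than POSIX). The helper `load_parallel` is hypothetical, and the command to run is passed in so it can be tried with `echo` before switching to `docker load -i`:

```shell
# Run a command once per *.tar file in a directory, several at a time.
# Assumes the directory contains at least one tarball.
# Usage: load_parallel <dir> <max-jobs> <command...>
load_parallel() {
  lp_dir=$1
  lp_jobs=$2
  shift 2
  # -print0 / -0 keep file names intact; -n 1 passes one tarball per invocation.
  find "$lp_dir" -maxdepth 1 -name '*.tar' -print0 |
    xargs -0 -P "$lp_jobs" -n 1 "$@"
}

# Dry run:          load_parallel /tmp/images 4 echo
# In the workflow:  load_parallel /tmp/images 4 docker load -i
```

Whether this actually saves time depends on how much of the load step is bound by disk I/O on the runner.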


However, another approach that could potentially save time: instead of using a local registry within the job, we could push the temporary images to GHCR itself under a unique tag and have a cleanup job delete them afterwards.


But still, nothing would beat both the speed and simplicity of:

name: ci

on:
  push:
    branches:
      - "main"

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        platform:
          - linux/amd64
          - linux/386
          - linux/arm/v6
          - linux/arm/v7
          - linux/arm64
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      -
        name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          platforms: ${{ matrix.platform }}
          push: true
          tags: user/app:latest

If buildx supported it.

@felipecrs
Author

> That's a typo my bad, should be TMP_LOCAL_IMAGE.

Yeah, I realized. No worries!

@crazy-max
Member

> However, another approach that could potentially save time: instead of using a local registry within the job, we could push the temporary images to GHCR itself under a unique tag and have a cleanup job delete them afterwards.

Made some changes to our example if you want to try: docker/docs#17305
See https://deploy-preview-17305--docsdocker.netlify.app/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners

@felipecrs
Author

I'll try for sure! I'll let you know soon.

@felipecrs
Author

I think it can be sharpened a little bit by moving the metadata-action to its own job, like in here:

Another thing: since I push images even for pull requests (with the pr-<number> tag format), I wonder whether using inline cache instead of GHA would consume fewer resources.

I just don't know how it would work with this push-by-digest approach. For example, if I enable inline cache and push by digest, how do I consume it back? Will it be retained when running docker buildx imagetools create, meaning I could set cache-from to something like ${{ env.REGISTRY_IMAGE }}:pr-<number>?

@felipecrs
Author

@crazy-max the result is amazing. A full build (with cache) now takes less than a minute:

https://github.com/TECH7Fox/asterisk-hass-addons/actions/runs/4960678447/jobs/8876490603

Thanks a lot!

@sando38

sando38 commented May 12, 2023

> @crazy-max the result is amazing. A full build (with cache) now takes less than a minute:
>
> https://github.com/TECH7Fox/asterisk-hass-addons/actions/runs/4960678447/jobs/8876490603
>
> Thanks a lot!

I agree, this approach is great. Once I implemented it, the test suites that run during the build phase passed reliably. When all images are built on one runner, they tend to fail due to timeouts.

@crazy-max I still have a problem with getting labels into the final manifest:
https://github.com/sando38/eturnal/pkgs/container/eturnal/92866123?tag=edge

I configured the workflow pretty much like you have posted in the last link. My workflow file is here, the relevant part is from line 446 to the end.
https://github.com/sando38/eturnal/blob/18a056930c7a44ec008186f5576a897e8bc63e9f/.github/workflows/container-build-publish.yml#L446

Not sure if I miss something. Thanks in advance already!

@felipecrs
Author

felipecrs commented May 12, 2023

> Not sure if I miss something.

You are missing metadata-action in your build job. Double-check the example: metadata-action is run twice, once in build and again in push. In build, you also need to supply the labels input.

@sando38

sando38 commented May 12, 2023

> > Not sure if I miss something.
>
> You are missing metadata-action in your build job. Double-check the example: metadata-action is run twice, once in build and again in push. In build, you also need to supply the labels input.

Thanks for the quick reply. It is there:

@felipecrs
Author

It's missing the labels input.

@sando38

sando38 commented May 12, 2023

Oh, I thought they were detected automatically. I will double-check. Thanks for the hint.

@sando38

sando38 commented May 12, 2023

Thanks again. The individual digests now have labels, but the "merged" manifest still does not:

No description provided

https://github.com/sando38/eturnal/pkgs/container/eturnal/92872612?tag=edge

I included the labels. In the push job, I can also see that the labels are present in DOCKER_METADATA_OUTPUT_JSON:
https://github.com/sando38/eturnal/actions/runs/4961796747/jobs/8879204058#step:10:23
Any further ideas? :)
