Error: buildx call failed with: failed to solve: rpc error: code = Unknown desc = unexpected status: 403 Forbidden #238

Closed
mskyttner opened this issue Nov 23, 2020 · 10 comments

Comments

@mskyttner

Behaviour

A GitHub Actions workflow (gha) using buildx with "push: true" reports "unexpected status: 403 Forbidden" after a successful login to ghcr.io.

Final lines in log before error were:

------
failed to solve: rpc error: code = Unknown desc = unexpected status: 403 Forbidden
Error: buildx call failed with: failed to solve: rpc error: code = Unknown desc = unexpected status: 403 Forbidden

Steps to reproduce this issue

  1. Follow this tutorial to set up the gha (using build-push-action and buildx). Use the "push: true" setting.
  2. Use a repository with a Dockerfile which generates a fairly large image (perhaps image size comes into play?)
  3. Run a gha like this one: https://github.com/KTH-Library/kontarion/blob/master/.github/workflows/push-kontarion.yml#L63-L86 (but use "push:true" instead of that run command which uses "docker push" after exporting to a tarball and loading into the local registry).

Expected behaviour

No 403 Forbidden error? I'm not sure what causes that, I don't think it is the CR_PAT token since the login to ghcr.io works.

Actual behaviour

I'm getting that 403. Perhaps some time out kicks in? Or maybe the docker image is too large for the runner? Not sure where to begin to investigate.

Configuration

name: kontarion push

on:
  workflow_dispatch:
  push:
    paths:
    - '1.7.0/**'
    - '.github/workflows/push-kontarion.yml'

jobs:
  kontarion-push:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    -
      name: Set Environment Variables
      run: |
        IMG=kontarion
        OWNER="$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]')"
        echo "BUILD_VER=1.0.$GITHUB_RUN_NUMBER" >> $GITHUB_ENV
        echo "IMG=${IMG}" >> $GITHUB_ENV
        echo "IMAGE=ghcr.io/${OWNER}/${IMG}" >> $GITHUB_ENV
        echo "BUILD_DATE=$(date +'%Y-%m-%d %H:%M:%S')" >> $GITHUB_ENV
        echo "GIT_SHA=$(echo ${{ github.sha }} | cut -c1-7)" >> $GITHUB_ENV
        echo "GIT_REF=$(git symbolic-ref -q --short HEAD || git describe --tags --exact-match)" >> $GITHUB_ENV
 
    -
      name: Set up QEMU
      id: qemu
      uses: docker/setup-qemu-action@v1
      with:
        image: tonistiigi/binfmt:latest
        platforms: all

    -
      name: Set up Docker Buildx
      id: buildx
      uses: docker/setup-buildx-action@master
      with:
        version: latest
        install: true

    -
      name: Cache Docker layers
      uses: actions/cache@v2
      with:
        path: /tmp/.buildx-cache
        key: ${{ runner.os }}-buildx-${{ github.sha }}
        restore-keys: |
          ${{ runner.os }}-buildx-

    -
      name: Login to Container Registry
      uses: docker/login-action@v1
      with:
        registry: ghcr.io
        username: ${{ github.repository_owner }}
        password: ${{ secrets.CR_PAT }}

    -
      name: Docker build and push
      uses: docker/build-push-action@v2
      with:
        context: ./1.7.0/
        file: ./1.7.0/Dockerfile
        builder: ${{ steps.buildx.outputs.name }}
        labels: |
          org.opencontainers.image.authors=${{ github.repository_owner }}
          org.opencontainers.image.created=${{ env.BUILD_DATE }}
          org.opencontainers.image.description=Created from commit ${{ env.GIT_SHA }} and ref ${{ env.GIT_REF }}
          org.opencontainers.image.ref.name=${{ env.GIT_REF }}
          org.opencontainers.image.revision=${{ github.sha }}
          org.opencontainers.image.source=https://github.com/${{ github.repository }}
          org.opencontainers.image.version=${{ env.BUILD_VER }}
        tags: |
          ${{ env.IMAGE }}:latest
          ${{ env.IMAGE }}:${{ env.GIT_REF }}
          ${{ env.IMAGE }}:${{ env.GIT_SHA }}
          ${{ env.IMAGE }}:${{ env.BUILD_VER }}
        push: true
        platforms: linux/amd64
        cache-from: type=local,src=/tmp/.buildx-cache
        cache-to: type=local,dest=/tmp/.buildx-cache

Workaround

I'm attempting to use a workflow file which instead sets "load: true" and then runs a docker push command, here is a section from the end of that workflow file:

    -
      name: Docker build and push
      id: docker_build
      uses: docker/build-push-action@v2
      with:
        context: ./1.7.0/
        file: ./1.7.0/Dockerfile
        builder: ${{ steps.buildx.outputs.name }}
        labels: |
          org.opencontainers.image.authors=${{ github.repository_owner }}
          org.opencontainers.image.created=${{ env.BUILD_DATE }}
          org.opencontainers.image.description=Created from commit ${{ env.GIT_SHA }} and ref ${{ env.GIT_REF }}
          org.opencontainers.image.ref.name=${{ env.GIT_REF }}
          org.opencontainers.image.revision=${{ github.sha }}
          org.opencontainers.image.source=https://github.com/${{ github.repository }}
          org.opencontainers.image.version=${{ env.BUILD_VER }}
        tags: |
          ${{ env.IMAGE }}:latest
          ${{ env.IMAGE }}:${{ env.GIT_REF }}
          ${{ env.IMAGE }}:${{ env.GIT_SHA }}
          ${{ env.IMAGE }}:${{ env.BUILD_VER }}
        load: true
        platforms: linux/amd64
        cache-from: type=local,src=/tmp/.buildx-cache
        cache-to: type=local,dest=/tmp/.buildx-cache
    - run: docker push ${{ env.IMAGE }}:latest

    -
      name: Image digest
      run: echo ${{ steps.docker_build.outputs.digest }}

This progresses a bit further with these messages in the log:

#22 exporting layers 539.5s done
#22 exporting manifest sha256:42674974463151fb0e3148dc84ed9af0bd2e792dd2aec3353da6767f70f242aa done
#22 exporting config sha256:9c3a0bc0b6bc49f64ccd6f8f38d3b91a5e004d66977d62dfcf1193c8fbbdb2f7 done
#22 sending tarball
#22 ...

#23 importing to docker
#23 DONE 0.0s

#22 exporting to oci image format
#22 sending tarball 136.4s done
#22 DONE 675.9s

#24 exporting cache
#24 preparing build cache for export 0.0s done
#24 writing layer sha256:08462c4da0eadbace59b3f09dc207e2ecf4e5f70c2c9b820082201ef98710eec
#24 writing layer sha256:08462c4da0eadbace59b3f09dc207e2ecf4e5f70c2c9b820082201ef98710eec 0.2s done
#24 writing layer sha256:0ff0f048790af67aa460f80f3232314f83f9c86ed19c8599cd53d9bbb1eb1103
#24 writing layer sha256:0ff0f048790af67aa460f80f3232314f83f9c86ed19c8599cd53d9bbb1eb1103 5.7s done
#24 writing layer sha256:127c9761dcbaa288abc58fc56437c2f2ffbe611b9f7f30e0b5b43cd348bb2094 done
#24 writing layer sha256:14409b438e8b0f4cd5b1110245de7d35a8c061b8c8f223884d99c6bd65f26e1f 0.0s done
#24 writing layer sha256:1f2fe70d116b95b311b68e88fd11aeccc245a4c69fb36479b52b1c031f0db62d
#24 writing layer sha256:1f2fe70d116b95b311b68e88fd11aeccc245a4c69fb36479b52b1c031f0db62d 11.8s done
#24 writing layer sha256:207bc9dc5200f94a1d8ef2a5b5a725f928be54d44f1ecdf51dbc82113ccaa598
#24 writing layer sha256:207bc9dc5200f94a1d8ef2a5b5a725f928be54d44f1ecdf51dbc82113ccaa598 0.0s done
#24 writing layer sha256:283f88d94097c44b33b48a39c27803d9f952f9087460fc0270b0991d5d8ad867
#24 writing layer sha256:283f88d94097c44b33b48a39c27803d9f952f9087460fc0270b0991d5d8ad867 0.6s done
#24 writing layer sha256:2f54329a6711f1e325c7d17e843ca546b01c1f504c546f23678c0a71e5147f91
#24 writing layer sha256:2f54329a6711f1e325c7d17e843ca546b01c1f504c546f23678c0a71e5147f91 17.3s done
#24 writing layer sha256:39ad6d9967d356670f59b4c1397613b5b1840bd915be68dab372844e5d00cb94
#24 writing layer sha256:39ad6d9967d356670f59b4c1397613b5b1840bd915be68dab372844e5d00cb94 36.4s done
#24 writing layer sha256:4039240d2e0b4bcb42ccbce75bc54570e471ad81457478de35fbeef63536e9c0
#24 writing layer sha256:4039240d2e0b4bcb42ccbce75bc54570e471ad81457478de35fbeef63536e9c0 done
#24 writing layer sha256:4a41b77c8e3fee3711347ffe7a4872ed41b6d91a2655b048a3c5ddb81597ddf9
#24 writing layer sha256:4a41b77c8e3fee3711347ffe7a4872ed41b6d91a2655b048a3c5ddb81597ddf9 2.2s done
#24 writing layer sha256:5025ef4ebbe82547d13bae1ce5db63aea6b496b1a44dfb443012fe49b52ae003
#24 writing layer sha256:5025ef4ebbe82547d13bae1ce5db63aea6b496b1a44dfb443012fe49b52ae003 done
#24 writing layer sha256:57346c02ff626ad57ac7c89f0087ae813c35777012a069e63e9598b6bfa25802 done
#24 writing layer sha256:722348be19cc015bddc73213b78bffb242533b76a41f8c1fa3908db2c90daf8e

But either it is stuck there or takes a long time to complete.

Logs

https://github.com/KTH-Library/kontarion/runs/1441299541?check_suite_focus=true

@crazy-max
Member

@mskyttner

But either it is stuck there or takes a long time to complete.

As you can see in your logs: System.IO.IOException: No space left on device.

I'm getting that 403. Perhaps some time out kicks in? Or maybe the docker image is too large for the runner? Not sure where to begin to investigate.

Looks similar to #178 #200. Maybe your personal access token (PAT) does not have the required scopes?
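
One way to check the scopes, sketched as a workflow step. It assumes the token is a classic PAT stored in the `CR_PAT` secret, as in the configuration above; the step name is made up for illustration. Pushing to ghcr.io needs the `write:packages` scope.

```yaml
    -
      name: Show PAT scopes
      env:
        CR_PAT: ${{ secrets.CR_PAT }}
      run: |
        # Classic personal access tokens report their scopes in the
        # X-OAuth-Scopes response header of authenticated API calls
        curl -sS -I -H "Authorization: token ${CR_PAT}" https://api.github.com/user \
          | grep -i '^x-oauth-scopes' || echo "no X-OAuth-Scopes header returned"
```

Only the scope names are printed, not the token itself. If `write:packages` is missing from the list, the 403 on push is more likely a permissions problem than a resource problem.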

@mskyttner
Author

Aha! Feels like a resource constraint, then. It appears as though the 403 Forbidden might actually be caused by "out of disk space" which eventually became more clear when not using the "push:true" setting for the buildx action and instead doing a "load:true" followed by a separate plain "docker push" step.

Since writing the layers works for a while when using "push:true", I think it looks more like the runner actually has permission to write layers to disk but runs out of disk space at some point. Would that be possible? The plain "docker push" equivalent, which does the same thing, reports exactly that.

I did set up the scopes for repo write and read/write for packages and added the CR_PAT to the repository, but will double check again. The CR_PAT was generated with a service account that is used for automated builds.

A variant of this job runs on Travis CI without disk space issues. I wonder what the limits are for image size before running into this with GitHub Action runners.

@crazy-max
Member

crazy-max commented Nov 25, 2020

@mskyttner

I wonder what the limits are for image size before running into this with GitHub Action runners.

See https://docs.github.com/en/free-pro-team@latest/actions/reference/specifications-for-github-hosted-runners#supported-runners-and-hardware-resources

Each virtual machine has the same hardware resources available.

  • 2-core CPU
  • 7 GB of RAM memory
  • 14 GB of SSD disk space

As far as I can see, no logs are displayed in the action window, so maybe you reached the standard output limit. Can you build locally, redirect stdout/stderr to a file, and let me know the size?

A variant of this job runs on Travis CI without disk space issues.

Do you have a link to the Travis build?

@mskyttner
Author

@crazy-max I believe running out of disk space is causing this. I opened an issue with GitHub to clarify and just got confirmation that there's ~14 GB of disk space and 7 GB of RAM available on the hosted runners at GitHub. I think Travis CI.org offers about 40-50 GB on their runners (but with lower quotas for build time and simultaneous builds).

With regards to the awesome actions for cache, qemu, buildx etc, would it be possible to consider an enhancement with regards to the error message, which is a bit vague.

Perhaps in case of OOM or OOD, the message "rpc error: code = Unknown desc = unexpected status: 403 Forbidden" could be "Out of disk space or RAM", if it is possible to capture this error condition?

I'm still unclear on what I can do about this "out of disk" issue, for example a) how to reset the runner when it gets filled up (I'm using the cache action), b) whether the caching "doubles the required space", i.e. the tarball being written back to disk while the base image layers are also stored on disk, and c) whether there are workarounds.

  • resetting when a runner's cache has filled up the disk - not sure how to do that
  • unsure whether usage of the cache action + buildx/qemu "doubles" the required disk space or whether the stdout tarball output "swaps" or if the RAM (7GB) runs out. Not sure either if the qemu/buildx uses a "shared volume" with the host runner or if it "uses disk in isolation from the host", which would increase disk space requirements for a build.
  • there are some workarounds for getting more disk space by removing some unneeded stuff from the runner. As for reporting available disk space on the runner, there is a workflow step here but I'm not sure this would work when using qemu + buildx.
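
The last point can be sketched as a single workflow step placed before the build. This is a minimal sketch, not a verified fix: the directories removed are assumptions about what is preinstalled on the `ubuntu-latest` runner image and may differ between image versions, and the step name is made up for illustration.

```yaml
    -
      name: Report and free disk space
      run: |
        # Show free space on the root filesystem before cleanup
        df -h /
        # Assumed locations of large preinstalled toolchains; adjust or drop as needed
        # (rm -rf does not fail if a path is missing)
        sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android
        # Remove unused images preloaded into the runner's Docker daemon
        docker image prune --all --force
        # Show free space again to see what was reclaimed
        df -h /
```

Comparing the two `df` outputs shows how much was reclaimed. Whether that is enough headroom for an image of this size is a separate question.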

A local build generates an image with this footprint:

$ docker images | grep kontarion
kthb/kontarion                      1.7.0                 b00b1011b1db        5 weeks ago         14.3GB

This build is based on some layers from https://rocker-project.org, which seems to run into a similar issue looking at this action window

It is a kind of "big" build in the sense that it bundles a lot of things and is the opposite of a minimalistic Alpine Linux image build. I would expect these kinds of builds to need more disk space, especially in the future for supporting workflows that, for example, prepare and generate "data images" for bundling datasets into OCI-compliant "storage" or "data only containers", which seems to be on the roadmap: github/roadmap#119. I guess I'll continue to watch the ghcr roadmap and can always build locally or elsewhere where there is more disk space.

@fugkco

fugkco commented Dec 9, 2020

I'm getting the same issue. I'm using a multi-stage build, and I checked locally that it uses less than 1 GB in total:

$ docker images
REPOSITORY                            TAG           IMAGE ID      CREATED         SIZE
myimg                                 latest        9318b36030be  7 seconds ago   114 MB
<none>                                <none>        3b0cd040ac18  51 seconds ago  365 MB

Login stage shows:

Run docker/login-action@v1
🔑 Logging into ghcr.io...
🎉 Login Succeeded!

Lastly, the push:

#8 exporting config sha256:d0aafd50f126e33f0323e3f55e0f61f326a20eeb5e0dd4a60c9f245e0049f174 done
#8 pushing layers
#8 pushing layers 0.2s done
#8 ERROR: unexpected status: 403 Forbidden
------
 > exporting to image:
------
failed to solve: rpc error: code = Unknown desc = unexpected status: 403 Forbidden
Error: buildx call failed with: failed to solve: rpc error: code = Unknown desc = unexpected status: 403 Forbidden

Full build - note this was the third attempt.

@fugkco

fugkco commented Dec 10, 2020

Seems to work fine for Docker Hub. My case is probably an issue with ghcr itself.

@atorosyan

@mskyttner have you enabled "improved container support" for your account and organisation?
https://docs.github.com/en/free-pro-team@latest/packages/guides/enabling-improved-container-support
I had the same issue but it started to work after I followed the steps from the link.

@crazy-max
Member

crazy-max commented Dec 10, 2020

@mskyttner

I'm still unclear on what I can do about this "out of disk" issue, for example a) how to reset the runner when it gets filled up (I'm using the cache action), b) whether the caching "doubles the required space", i.e. the tarball being written back to disk while the base image layers are also stored on disk, and c) whether there are workarounds.

You could use registry cache instead for that.
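
A sketch of what that could look like, based on the build step from the original configuration (labels and extra tags left out for brevity). The `buildcache` tag is a hypothetical name chosen for illustration. `type=registry` with `ref=` and `mode=max` are buildx cache options, but whether a given registry accepts the exported cache manifest is an assumption to verify.

```yaml
    -
      name: Docker build and push
      uses: docker/build-push-action@v2
      with:
        context: ./1.7.0/
        file: ./1.7.0/Dockerfile
        builder: ${{ steps.buildx.outputs.name }}
        tags: |
          ${{ env.IMAGE }}:latest
        push: true
        platforms: linux/amd64
        # Read and write the layer cache in the registry instead of
        # /tmp/.buildx-cache, which avoids the extra local copy on the runner
        cache-from: type=registry,ref=${{ env.IMAGE }}:buildcache
        cache-to: type=registry,ref=${{ env.IMAGE }}:buildcache,mode=max
```

With this in place, the `actions/cache` step and the local cache directory should no longer be needed.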

With regards to the awesome actions for cache, qemu, buildx etc, would it be possible to consider an enhancement with regards to the error message, which is a bit vague.

Will be available with buildx 0.5.0.

@mskyttner @fugkco

have you enabled "improved container support" for your account and organisation?
https://docs.github.com/en/free-pro-team@latest/packages/guides/enabling-improved-container-support
I had the same issue but it started to work after I followed the steps from the link.

As @atorosyan said your issue is linked to this. See also #205 (comment)

@mskyttner
Author

Thanks for info on the registry cache and for helping out with all this.

I think I already had that setting done for the organization I was using for this ... but I will double check my settings again:

image

@mskyttner
Author

@crazy-max thanks so much for the help on this, my "big build" now passes after I made two changes a) made it smaller by making a part into an optional install (conda) which effectively reduced the image size to 10GB uncompressed and b) did the settings you recommended with regards to registry cache. No other changes and I'm not sure which one kicked in, but now it pulled through. Travis build failed though, but that is another story. Merry xmas!
