load sometimes doesn't load #321

Closed
champo opened this issue Mar 26, 2021 · 14 comments · Fixed by docker/buildx#1927
Labels
kind/upstream Changes need to be made on upstream project

Comments

@champo

champo commented Mar 26, 2021

Behaviour

Trying to run a command with a just-built image sometimes fails to find the image:

 $ docker run --rm -t -v "${GITHUB_WORKSPACE}:/src/android/apolloui/build/outputs/" muun_android:latest
Unable to find image 'muun_android:latest' locally
docker: Error response from daemon: pull access denied for muun_android, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

The build step runs ok and has no notable differences in output between correct and failed runs.

Expected behaviour

The muun_android image should be found and run. In https://github.com/muun/apollo/runs/2203961523?check_suite_focus=true it succeeded (see the Inspect step; the build itself failed due to something unrelated).

Configuration

name: pr
on: pull_request
jobs:
  pr:
    runs-on: ubuntu-20.04
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@154c24e1f33dbb5865a021c99f1318cfebf27b32
        with:
          buildkitd-flags: --debug

      - name: Checkout
        uses: actions/checkout@5a4ac9002d0be2fb38bd78e4b4dbde5606d7042f

      - name: Build
        uses: docker/build-push-action@9379083e426e2e84abb80c8c091f5cdeb7d3fd7a
        with:
          load: true
          tags: muun_android:latest
          file: android/Dockerfile
          context: .

      - name: Inspect
        run: |
            docker images 
      - name: Build apollo
        run: |
          docker run --rm -t -v "${GITHUB_WORKSPACE}:/src/android/apolloui/build/outputs/" muun_android:latest
      - name: Upload APK
        uses: actions/upload-artifact@e448a9b857ee2131e752b06002bf0e093c65e571
        with:
          name: apk
          path: apk/prod/release/apolloui-prod-release-unsigned.apk
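
For what it's worth, a minimal fail-fast check (not part of the original workflow; the tag is the one used in the config above) that would surface the missing image right after the build step instead of at the later docker run:

      - name: Verify image was loaded
        run: |
          # fails the job immediately if the image never made it into the local Docker daemon
          docker image inspect muun_android:latest > /dev/null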

Logs

logs_8.zip

@bensalilijames

This is happening to us too. It's super weird because we have three identical workflows set up (with different image names) - two of them succeed but one of them is constantly failing with the above error.

The workflow file:
name: Docker

on:
  push:
    # Publish `staging` as Docker `latest` image.
    branches:
      - staging

    # Publish `v1.2.3` tags as releases.
    tags:
      - v*

env:
  IMAGE_NAME: ml-intents

jobs:
  # Push image to GitHub Packages.
  # See also https://docs.docker.com/docker-hub/builds/
  push:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      # This is a separate action that sets up the buildx runner
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      # So now we can use GitHub actions' own caching for Docker layers!
      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ env.IMAGE_NAME }}-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-${{ env.IMAGE_NAME }}-

      - name: Build image
        uses: docker/build-push-action@v2
        with:
          builder: ${{ steps.buildx.outputs.name }}
          context: .
          file: intents/Dockerfile
          load: true
          tags: ${{ env.IMAGE_NAME }}:latest
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache-new

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Push image to GitHub Container Registry
        run: |
          IMAGE_ID=ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME

          # Change all uppercase to lowercase
          IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')

          # Strip git ref prefix from version
          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')

          # Strip "v" prefix from tag name
          [[ "${{ github.ref }}" == "refs/tags/"* ]] && VERSION=$(echo $VERSION | sed -e 's/^v//')

          # Use Docker `latest` tag convention
          [ "$VERSION" == "staging" ] && VERSION=latest

          echo IMAGE_ID=$IMAGE_ID
          echo VERSION=$VERSION

          echo Listing docker images...
          docker image ls

          echo Tagging image...
          docker tag $IMAGE_NAME:latest $IMAGE_ID:$VERSION
          echo Tagged image successfully!

          echo Pushing image...
          docker push $IMAGE_ID:$VERSION
          echo Pushed image successfully!

      - # Temp fix
        # https://github.com/docker/build-push-action/issues/252
        # https://github.com/moby/buildkit/issues/1896
        name: Move cache
        run: |
          rm -rf /tmp/.buildx-cache
          mv /tmp/.buildx-cache-new /tmp/.buildx-cache

The runner gets to the docker tag $IMAGE_NAME:latest $IMAGE_ID:$VERSION line and errors out with Error response from daemon: No such image: ml-intents:latest as above. docker image ls does not list the built image either.

The two successful workflows have much smaller images (500MB and 2GB) whereas the failing image is a lot bigger (5GB). Could that be an influencing factor here?

@crazy-max
Member

@champo @benhjames Cannot repro locally or with GHA. Maybe it fails silently because of insufficient disk space:

Each virtual machine has the same hardware resources available.

  • 2-core CPU
  • 7 GB of RAM memory
  • 14 GB of SSD disk space

You have 14GB at your disposal on the runner (realistically more like 9GB once the pre-installed software is taken into account):

/dev/sdb1        14G  4.1G  9.0G  32% /mnt

Can you add this step at the end of your workflow (before Move cache for you @benhjames) and give me the output:

- name: Disk
  if: always()
  run: |
    df -h
    docker buildx du

@bensalilijames

Thanks for investigating @crazy-max! I first added that step and a separate step to list the Docker images, but it still didn't appear to be exported into Docker. The disk space on that run seemed to match yours:

/dev/sdb1        14G  4.1G  9.0G  32% /mnt

I then modified the workflow file to exactly match yours, and the same issue occurred.

Then I re-ran the same job, but this time it exported correctly. This was the first run where Docker had cache available (because earlier builds never got a chance to save the cache, as they errored upon push to GHCR).

I then went back to look at your first run (i.e. without build cache) and noticed that in that particular run it doesn't list the Docker images. So I have a feeling that if there is no build cache, then the export to Docker fails, but if there is build cache, like in your subsequent builds and my last build linked above, then it succeeds. Really weird. Hope that helps...?

@crazy-max
Member

crazy-max commented Apr 14, 2021

@benhjames Thanks for your feedback. Yes, actually /var/lib/docker uses the /dev/root filesystem, which is 99% full on your runner, so I presume that's the issue here:

/dev/root        84G   82G  1.2G  99% /

Can you add docker buildx du in the Disk step and give me the output please?

@bensalilijames

Thanks @crazy-max, I added that command to both Disk steps (and removed the cache action) and the results can be viewed here. It does indeed look like it runs out of disk space and then silently fails to load into Docker.

Is there anything that you think could be done about this to shrink the disk usage after the build step? I notice that docker buildx du without cache lists Reclaimable: 17.71GB which seems like a lot? How come building with the cache takes up much less space?

Sorry for the questions; it would be great to find a solution to this somehow (without reverting to the plain docker build without cache, like I was doing before!)

@crazy-max
Member

@benhjames

I notice that docker buildx du without cache lists Reclaimable: 17.71GB which seems like a lot? How come building with the cache takes up much less space?

These are the subsequent instructions cached by buildx for the current builder. You can get more info by using docker buildx du --verbose. If you use an external cache, only the last stage will be cached, so it takes less space and the image can be loaded.
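
For reference, a sketch of a post-build step that inspects and trims the builder cache (the flags are standard Buildx CLI options; the 2GB threshold is only illustrative, and note this frees space after the build rather than preventing a failed --load):

- name: Inspect and prune build cache
  if: always()
  run: |
    # show what is using the builder cache, entry by entry
    docker buildx du --verbose
    # trim the cache down to roughly 2GB, keeping the most recently used entries
    docker buildx prune --force --keep-storage 2gb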

Is there anything that you think could be done about this to shrink the disk usage after the build step?

You could use a self-hosted runner, but in the near future you will be able to configure CPU cores, RAM, and disk space for the runner (see github/roadmap#161).

Or, more drastically, remove some components pre-installed on the runner in your workflow, like dotnet (~23GB):

  - name: Remove dotnet
    run: sudo rm -rf /usr/share/dotnet

@bensalilijames

Thanks a lot @crazy-max for the help, that's really useful, much appreciated. 🙌

@cep21

cep21 commented Apr 22, 2021

Hi,

Thank you for this thread! I was running into the same issue. I would expect an error log of some kind when disk issues happen and the images cannot correctly --load. I couldn't find a buildx issue for this. Is the issue tracked somewhere else, or is the error log there and I'm just not finding it?

Thanks!

@bensalilijames

Hey @cep21, the issue to track in buildx is docker/buildx#593!

@crazy-max crazy-max added kind/upstream Changes need to be made on upstream project and removed status/needs-investigation labels Apr 22, 2021
@champo
Author

champo commented Apr 22, 2021

❤️ Thanks for the deep look into this! I ended up changing the build approach for other reasons, which I guess accidentally reduced the image size, making the issue disappear.

@Nickersoft

Hey folks! I believe I'm also hitting this issue – is there currently any workaround other than trying to shrink your image size? I tried sudo rm -rf /usr/share/dotnet, but to no avail. I only have 86% of memory used, but load still isn't loading my image into Docker.

@crazy-max
Member

crazy-max commented Mar 20, 2023

@master-bob As discussed in #841, I ran some tests using the docker driver and the docker-container driver:

FROM alpine
RUN dd if=/dev/zero of=/tmp/output.dat bs=2048M count=1
RUN dd if=/dev/zero of=/tmp/output2.dat bs=2048M count=1
RUN dd if=/dev/zero of=/tmp/output3.dat bs=2048M count=1
RUN uname -a

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        driver:
          - docker
          - docker-container
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: ${{ matrix.driver }}
          buildkitd-flags: --debug
      -
        name: Disk
        run: |
          df -h
      -
        name: Build and push
        uses: docker/build-push-action@master
        with:
          context: .
          file: ./fat.Dockerfile
          load: true
          tags: |
            foo
      -
        name: List images
        run: |
          docker image ls
      -
        name: Disk
        if: always()
        run: |
          df -h
          docker buildx du

docker driver

fs before build:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   55G   29G  66% /

docker image ls:

Run docker image ls
REPOSITORY       TAG         IMAGE ID       CREATED          SIZE
foo              latest      1636a6843a99   20 seconds ago   6.45GB
node             18          37b4077cbd8a   11 days ago      997MB
...

fs after build:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   61G   23G  73% /

docker-container driver

fs before build:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   55G   29G  66% /

docker image ls:

Run docker image ls
REPOSITORY       TAG               IMAGE ID       CREATED              SIZE
foo              latest            50f49c8d6cd9   About a minute ago   6.45GB
node             18                37b4077cbd8a   11 days ago          997MB
...

fs after build:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   67G   17G  80% /

As you can see, when building with a container builder, Buildx first creates an intermediate tarball and then loads the image into Docker, so that would explain the issue: it requires twice the space (~30GB) in your case.

I suggest using the docker driver in your workflow:

      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: docker

@tonistiigi @jedevc, I wonder if we could remove the intermediate tarball when the image is loaded to Docker. WDYT?

@master-bob

master-bob commented Mar 20, 2023

As you can see, when building with a container builder, Buildx first creates an intermediate tarball and then loads the image into Docker, so that would explain the issue: it requires twice the space (~30GB) in your case.

I suggest using the docker driver in your workflow:

Thank you for the in-depth analysis.

I do have a question. Without changing the driver, my understanding is that subsequent build-push-action runs will use the cached version if it is available. By changing the driver, would this functionality remain the same? Edit: yes, it appears the functionality remains the same.

Edit: I think the dotnet location changed on ubuntu-22, as I didn't see any significant change in space usage when attempting to remove it. So I opted to remove /usr/local/lib/android/sdk (~14GB) and /opt/hostedtoolcache (~9GB).

Abbreviated listing of /opt/hostedtoolcache on ubuntu:latest (22):

489M	/opt/hostedtoolcache/PyPy
1.6G	/opt/hostedtoolcache/go
5.4G	/opt/hostedtoolcache/CodeQL
16K	/opt/hostedtoolcache/Java_Temurin-Hotspot_jdk
378M	/opt/hostedtoolcache/node
62M	/opt/hostedtoolcache/Ruby
1.2G	/opt/hostedtoolcache/Python
9.1G	/opt/hostedtoolcache

Before removing android and the hostedtoolcache:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   54G   30G  65% /

and after

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   31G   53G  37% /
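
For reference, the cleanup described above as a workflow step (a sketch; the paths are the ones listed in this comment and may move between runner image releases):

      - name: Free runner disk space
        run: |
          # paths observed on the ubuntu-22.04 runner image
          sudo rm -rf /usr/local/lib/android/sdk   # ~14GB
          sudo rm -rf /opt/hostedtoolcache         # ~9GB
          df -h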

master-bob added a commit to master-bob/docker-android-build-box that referenced this issue Apr 1, 2023
Change to build the image using the docker action; this should then allow
the docker action's cache to be used, reducing build time by ~50%, as the
image will only need to be built once. Currently the image is built twice.

The default driver uses double the disk space, see
docker/build-push-action/issues/321 (in brief, the image is built in the
build-push-action local cache and then transferred to the local docker).
This is a problem as this image is so large. Using the `docker` driver
will work around this.
mingchen added a commit to mingchen/docker-android-build-box that referenced this issue Apr 1, 2023
Change to build the image using the docker action, reducing build time
by ~50%, as the image will only need to be built once. Currently the
image is built twice.

The default driver uses double the disk space, see
docker/build-push-action#321 (in brief, the image is built in the
build-push-action local cache, tarred, and then transferred to the local docker).
This is a problem as this image is so large. Using the docker driver
will work around this.
@saumets

saumets commented May 24, 2023

Just wanted to drop a note that I began experiencing this exact same issue today.

In my workflow I build 3 separate Docker images, all using the load: true parameter. I was also using caching for all of the built images, like so:

with:
  context: ./nginx
  load: true
  tags: ibp_nginx:latest
  cache-from: type=gha
  cache-to: type=gha, mode=max

Today, randomly, one of the images was building successfully, but adding a step to run docker images -a showed that the image was never added to the local Docker images. I stumbled upon this thread today while looking for a solution. We're also using a custom GHA runner and had plenty of disk space available, but I tried some of the disk space proposals in this thread to no avail. I also tried deleting my entire GHA repository cache and starting the cache from scratch. No dice.

In the end I noticed this from @crazy-max up above:

As you can see, when building with a container builder, Buildx first creates an intermediate tarball and then loads the image into Docker, so that would explain the issue: it requires twice the space (~30GB) in your case.

I suggest using the docker driver in your workflow:

name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
with:
  driver: docker

Using setup-buildx-action@v2 with driver: docker resolved my issue, and finally all the images are being built and available again via load: true. The downside, of course, is that this driver does not support caching, from what I can tell.
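
If the gha cache is still needed, an alternative sketch (assuming disk space is the root cause, per the analysis above; the context, tag, and action versions are taken from snippets earlier in the thread) is to keep the default docker-container driver and free space before building:

      - name: Free runner disk space
        run: |
          # remove large pre-installed toolchains to make room for the intermediate tarball
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android/sdk /opt/hostedtoolcache
          df -h
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Build
        uses: docker/build-push-action@v2
        with:
          context: ./nginx
          load: true
          tags: ibp_nginx:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max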

jessebot added a commit to 20treeAI/github-workflows that referenced this issue Jun 7, 2023
jessebot added a commit to 20treeAI/github-workflows that referenced this issue Jun 7, 2023
nvuillam added a commit to nvuillam/build-push-action that referenced this issue Jul 3, 2023
Fixes docker#892

Related to docker#321

Signed-off-by: Nicolas Vuillamy <nicolas.vuillamy@gmail.com>
abkfenris added a commit to abkfenris/jupyter-image that referenced this issue Jul 31, 2023
Additionally updates Pangeo-notebook, adds mamba, and removes tensorflow, as it was likely responsible for exploding the build such that the Docker image would not actually load.

Closes oceanhackweek#71 oceanhackweek#72

Xref docker/build-push-action#321
gaborcsardi added a commit to r-hub/containers that referenced this issue Sep 10, 2023
So there is enough space for the container.
See docker/build-push-action#321