Unused unpacked snapshot left in content store after nerdctl system prune #2372

Closed
ginglis13 opened this issue Jul 13, 2023 · 2 comments · Fixed by #2374
Labels
kind/unconfirmed-bug-claim Unconfirmed bug claim

Comments

Contributor

ginglis13 commented Jul 13, 2023

Description

With the changes introduced in finch#461, we will default to building images with the type=image format rather than the type=docker format. This change has exposed common test failures in finch image save and finch image load in both the finch and finch-core projects. The underlying issue also reproduces with nerdctl image save and nerdctl image load. It occurs when an image (A) is built with no tags and with type=image, the image and builder cache are pruned, and we then attempt to pull and save an image (B) that was (A)'s base image. While the content for (A) has effectively been removed, the unpacked snapshot of (A)'s base layer remains. When we pull (B), only its manifest, config, and index are pulled. The layer content is not, so save fails with:

$ finch save ...
FATA[0000] failed to get reader: content digest sha256:XXX: not found
FATA[0000] exit status 1

Steps to reproduce the issue

Note: these images are arm64. Digests will differ on other platforms, but the same steps should reproduce the issue.

  1. Build an image without tags

nerdctl build --no-cache -f Dockerfile.with-build-arg --progress=plain --build-arg VERSION=3.13 .

# Dockerfile.with-build-arg
ARG VERSION=latest
FROM public.ecr.aws/docker/library/alpine:${VERSION}

In the output, see the following sha that represents the base layer of the image:

#5 sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc 2.72MB / 2.72MB 0.3s done

We also see the output:

#4 exporting to image
#4 exporting layers done
#4 exporting manifest sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 done
#4 exporting config sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801 done
#4 naming to <none>@sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 done
#4 unpacking to <none>@sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23

We can see our image looks... weird, since we didn't tag it:

$ finch images
REPOSITORY    TAG       IMAGE ID        CREATED          PLATFORM       SIZE       BLOB SIZE
<none>        <none>    0f5d034dfcca    3 minutes ago    linux/arm64    5.7 MiB    2.6 MiB

  2. Use ctr to inspect content

$ sudo ctr content ls | grep sha256:0f5d034d # <- manifest of image we just built
sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 502B    3 hours         containerd.io/gc.ref.content.0=sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801
$ sudo ctr content ls | grep sha256:e2730a75
sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 502B    3 hours         containerd.io/gc.ref.content.0=sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801
sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801 907B    3 hours         containerd.io/gc.ref.snapshot.overlayfs=sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5
$ sudo ctr content ls | grep sha256:469b
sha256:469b6e04ee185740477efa44ed5bdd64a07bbdd6c7e5f5d169e540889597b911 1.638kB 7 minutes       containerd.io/distribution.source.public.ecr.aws=docker/library/alpine
$ sudo ctr content ls | grep sha256:25f523f0e # <- THE LAYER CONTENT
sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc 2.722MB 7 minutes       buildkit.io/blob/annotation.containerd.io/uncompressed=sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5,buildkit.io/blob/mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip,containerd.io/gc.ref.content.blob-sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc=sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc

  3. Remove the image you just built and inspect content again

nerdctl rmi 0f5d034dfcca # <- image id from step 1

Now repeat the content inspection from step 2: all of the content is still there. This is because buildkit cached the content and the unmounted snapshot remains.

  4. Prune “everything”

$ nerdctl system prune --all -f

This results in

Deleted build cache objects:
x764neru5picibs7cb5yc4f3x
shizatx98cbbpy4ar4gojbpfc
kz6wnfl0pssu7dbwghx11owe6

Untagged: public.ecr.aws/docker/library/alpine:3.13
deleted: sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc

Then inspect the content again, as in step 2:

$ sudo ctr content ls | grep sha256:25f523f0e # <- the actual base layer: gone
$ sudo ctr content ls | grep sha256:469b # <- the ref sha: gone
$ sudo ctr content ls | grep sha256:e2730a75 # <- the built config sha...
sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 502B    3 hours         containerd.io/gc.ref.content.0=sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801
sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801 907B    3 hours         containerd.io/gc.ref.snapshot.overlayfs=sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5
$ sudo ctr content ls | grep sha256:0f5d034d # <- the built manifest sha...
sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 502B    3 hours         containerd.io/gc.ref.content.0=sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801

Check out the remaining content:

$ cat /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1
{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801",
    "size": 907
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc",
      "size": 2722126
    }
  ]
}
$ cat /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801
{"architecture":"arm64","config":{"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"Cmd":["/bin/sh"],"OnBuild":null},"created":"2022-11-10T20:39:56.601468255Z","history":[{"created":"2022-11-10T20:39:56.523308612Z","created_by":"/bin/sh -c #(nop) ADD file:f23c059b4312458fbf0fc018d4695f36157a3eb6e5a83167912a39f9a738f4eb in / "},{"created":"2022-11-10T20:39:56.601468255Z","created_by":"/bin/sh -c #(nop)  CMD [\"/bin/sh\"]","empty_layer":true}],"moby.buildkit.buildinfo.v1":"eyJmcm9udGVuZCI6ImRvY2tlcmZpbGUudjAiLCJzb3VyY2VzIjpbeyJ0eXBlIjoiZG9ja2VyLWltYWdlIiwicmVmIjoicHVibGljLmVjci5hd3MvZG9ja2VyL2xpYnJhcnkvYWxwaW5lOjMuMTMiLCJwaW4iOiJzaGEyNTY6NDY5YjZlMDRlZTE4NTc0MDQ3N2VmYTQ0ZWQ1YmRkNjRhMDdiYmRkNmM3ZTVmNWQxNjllNTQwODg5NTk3YjkxMSJ9XX0=","os":"linux","rootfs":{"type":"layers","diff_ids":["sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5"]},"variant":"v8"}

Note sha256:de51348d43 in the config: this is the diff ID (the sha of the unpacked layer) for the alpine:3.13 image:

...
"rootfs":{"type":"layers","diff_ids":["sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5"]},"variant":"v8"}
...

We can still find it in the output of sudo ctr snapshot ls:

...
de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5
...
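
The snapshot key matches the diff ID here because the image has a single layer. containerd names unpacked layer snapshots by chain ID, and the OCI image spec defines the chain ID of the first layer as its diff ID, with each later chain ID being the digest of the previous chain ID, a space, and the next diff ID. The following is a minimal sketch of that calculation, assuming Python with only the standard library. The names chain_ids and is_unpacked are hypothetical helpers for illustration, not part of nerdctl or containerd.

```python
import hashlib

# diff_id taken from the image config shown above.
DIFF_ID = "sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5"


def chain_ids(diff_ids):
    """Return the chain ID for each layer, following the OCI image spec rule."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            # First layer: the chain ID is the diff ID itself.
            chain.append(diff_id)
        else:
            # Later layers: sha256 over "<previous chain ID> <diff ID>".
            data = f"{chain[-1]} {diff_id}".encode()
            chain.append("sha256:" + hashlib.sha256(data).hexdigest())
    return chain


def is_unpacked(diff_ids, snapshot_keys):
    """Check whether the top layer's chain ID appears among snapshot keys.

    Keys are compared without the "sha256:" prefix, since a listing may
    show them either way (an assumption made for this sketch).
    """
    top = chain_ids(diff_ids)[-1].split(":")[-1]
    return top in {key.split(":")[-1] for key in snapshot_keys}


# Key as it appears in the snapshot listing above.
print(is_unpacked([DIFF_ID], ["de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5"]))
```

For a multi-layer image, the keys in the snapshot listing would not match the diff_ids directly, so comparing against the computed chain IDs is the safer check.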

  5. Pull public.ecr.aws/docker/library/alpine:3.13 and try to save it

$ nerdctl pull public.ecr.aws/docker/library/alpine:3.13
public.ecr.aws/docker/library/alpine:3.13:                                        resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:469b6e04ee185740477efa44ed5bdd64a07bbdd6c7e5f5d169e540889597b911:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:448028a5480dcea5eae6ed1442fb85a44921c41972a405987883aee3abdaf410: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:1384a14f8577009b729eb1ef6aabe2729ae5a35e07d4d6b84a6dd9b841a818e3:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 5.2 s                                                                    total:  2.1 Ki (416.0 B/s)
$ nerdctl images
REPOSITORY                              TAG     IMAGE ID        CREATED           PLATFORM          SIZE       BLOB SIZE
public.ecr.aws/docker/library/alpine    3.13    469b6e04ee18    10 minutes ago    linux/arm64/v8    5.7 MiB    2.6 MiB
$ nerdctl save -o myfake.tar public.ecr.aws/docker/library/alpine:3.13
FATA[0000] failed to get reader: content digest sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc: not found
FATA[0000] exit status 1

You can see that even though the base layer blob no longer exists in the content store (it exists only as an unpacked snapshot), pulling the image does not fetch that layer back into the containerd content store. Only the index, manifest, and config are pulled. Why? Because the snapshot for that layer, sha256:de51348d43, has already been unpacked and committed. For some reason, nerdctl/containerd treats the layer as if it still exists.
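
The failure in save can be illustrated by comparing the blobs a manifest references with the digests that are still listed in the content store. The following is a minimal sketch, assuming Python with only the standard library. It uses the manifest of the built image shown earlier (the pulled image's manifest is not shown in this issue, but it needs the same layer digest according to the error), and the set of digests that ctr content ls still listed after the prune. The name missing_blobs is a hypothetical helper, not a nerdctl or containerd API.

```python
import json

# Manifest of the built image, copied from the content store output above.
MANIFEST = """
{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801",
    "size": 907
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:25f523f0e93b2b5fa676c15d91b90f08ee4de7a160874e6c52ea452929d5a7cc",
      "size": 2722126
    }
  ]
}
"""

# Digests still listed by `ctr content ls` after the prune in step 4.
PRESENT_AFTER_PRUNE = {
    "sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1",
    "sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801",
}


def missing_blobs(manifest_json, present):
    """Return digests referenced by the manifest that are not in `present`."""
    manifest = json.loads(manifest_json)
    referenced = [manifest["config"]["digest"]]
    referenced += [layer["digest"] for layer in manifest["layers"]]
    return [digest for digest in referenced if digest not in present]


print(missing_blobs(MANIFEST, PRESENT_AFTER_PRUNE))
```

The only digest reported as missing is the layer blob sha256:25f523f0e9..., which is the same digest named in the "failed to get reader" error.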

Describe the results you received and expected

I expected either

  1. nerdctl system prune --all -f to remove the snapshot sha256:de51348d43 that was unpacked during build

Or

  2. pulling an image with a missing layer would still either pull that layer or create it in the content store, even if the contents of the layer have already been unpacked as a committed containerd snapshot

What version of nerdctl are you using?

v1.4.0

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

Finch/Lima, buildkit

Host information

Finch VM https://github.com/runfinch/finch

ginglis13 added the kind/unconfirmed-bug-claim label Jul 13, 2023
Contributor Author

ginglis13 commented Jul 13, 2023

It looks to me like the issue is this: when we run nerdctl system prune --all -f and then check the content of the image we built in this manner (untagged, with type=image), for some reason the manifest and config content for that image are not removed. From above:

and again follow inspect content:

$ sudo ctr content ls | grep sha256:25f523f0e # <- the actual base layer: gone
$ sudo ctr content ls | grep sha256:469b # <- the ref sha: gone
$ sudo ctr content ls | grep sha256:e2730a75 # <- the built config sha...
sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 502B    3 hours         containerd.io/gc.ref.content.0=sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801
sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801 907B    3 hours         containerd.io/gc.ref.snapshot.overlayfs=sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5
$ sudo ctr content ls | grep sha256:0f5d034d # <- the built manifest sha...
sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 502B    3 hours         containerd.io/gc.ref.content.0=sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801

The image config carries the garbage collection label that references the snapshot, containerd.io/gc.ref.snapshot.overlayfs=sha256:de5134, but this image config content does not get removed by nerdctl system prune --all -f, so the snapshot it references is kept as well.

Additionally, the manifest for the image we just created is missing the containerd.io/gc.ref.content.config label.
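
To make the label relationship concrete, here is a minimal sketch, assuming Python with only the standard library, that parses the two ctr content ls lines quoted above and follows the containerd.io/gc.ref.content and containerd.io/gc.ref.snapshot labels from the manifest. It is a simplified model for illustration only, not containerd's garbage collector, and parse_line and reachable are hypothetical helper names.

```python
CONTENT_REF = "containerd.io/gc.ref.content"
SNAPSHOT_REF = "containerd.io/gc.ref.snapshot."

# The two lines still listed after the prune, copied from the output above.
LINES = [
    "sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1 502B    3 hours         containerd.io/gc.ref.content.0=sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801",
    "sha256:e2730a754813a28b0f90c47d888aafc6c53ec1bb87da60881ee7fc4e4a99e801 907B    3 hours         containerd.io/gc.ref.snapshot.overlayfs=sha256:de51348d431b23f0be552f83fe8efd4504db8a384d5d6efc9e01550958e09fd5",
]


def parse_line(line):
    """Split one listing line into its digest and its gc references."""
    fields = line.split()
    digest, labels = fields[0], fields[-1]  # labels are the last column
    refs = {"content": [], "snapshot": []}
    for pair in labels.split(","):
        key, _, value = pair.partition("=")
        if key.startswith(SNAPSHOT_REF):
            refs["snapshot"].append(value)
        elif key == CONTENT_REF or key.startswith(CONTENT_REF + "."):
            refs["content"].append(value)
    return digest, refs


def reachable(root, store):
    """Follow gc.ref labels from `root`; return (content, snapshots) reached."""
    content, snapshots = set(), set()
    stack = [root]
    while stack:
        digest = stack.pop()
        if digest in content or digest not in store:
            continue
        content.add(digest)
        snapshots.update(store[digest]["snapshot"])
        stack.extend(store[digest]["content"])
    return content, snapshots


store = dict(parse_line(line) for line in LINES)
manifest = "sha256:0f5d034dfccaf2b8cf5d1901a356f836945526bec6b5a33c055567896dc23ee1"
content, snapshots = reachable(manifest, store)
print(sorted(content))
print(sorted(snapshots))
```

In this model, the manifest references the config and the config references the snapshot sha256:de51348d43..., which is consistent with the snapshot surviving for as long as these two blobs are retained.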

Contributor Author

ginglis13 commented Jul 13, 2023

This is a bug in buildkit v0.11.x. It has been patched in moby/buildkit#3972, which is included in buildkit v0.12.0, released just yesterday: https://github.com/moby/buildkit/releases/tag/v0.12.0
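
As a rough aid for checking whether a given buildkit version falls in the range reported here, the following is a minimal sketch assuming Python with only the standard library. It only encodes what this comment states (v0.11.x affected, v0.12.0 patched) and says nothing about earlier releases. The names parse_version and in_reported_range are hypothetical, and the version string is assumed to be supplied by the caller.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'v0.11.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def in_reported_range(text):
    """True only for the v0.11.x series named in this issue."""
    return parse_version(text)[:2] == (0, 11)


for version in ("v0.11.0", "v0.12.0"):
    print(version, in_reported_range(version))
```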
