devmapper snapshotter image pulls result in error unpacking image: failed to extract layer #8674

Open
skaegi opened this issue Jun 10, 2023 · 10 comments

@skaegi
Contributor

skaegi commented Jun 10, 2023

Description

When running a pod with a runtimeclass that uses a devmapper snapshotter we get a CreateContainerError when creating a pod. This previously worked with containerd 1.7.0 but fails with 1.7.1. We are using a managed K8s platform (IBM Cloud IKS) so it's possible something else changed too.

Using kubectl describe on the pod we see events similar to...

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  50s   default-scheduler  Successfully assigned default/untrusted-alpine to 10.242.0.7
  Normal   Pulling    45s   kubelet            Pulling image "alpine"
  Normal   Pulled     43s   kubelet            Successfully pulled image "alpine" in 1.716503221s (1.716518841s including waiting)
  Warning  Failed     13s   kubelet            Error: failed to create containerd container: error unpacking image: failed to extract layer sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c: failed to get reader from content store: content digest sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6: not found

No matter what image we use we always see the same error with the same sha256 values, even on different nodes. If we nsenter on to the node and do a ctr -n k8s.io images check we see that the image is incomplete...

docker.io/library/alpine:latest                                                                  application/vnd.docker.distribution.manifest.list.v2+json sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 incomplete (1/2) 1.4 KiB/3.2 MiB     true
docker.io/library/alpine@sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 application/vnd.docker.distribution.manifest.list.v2+json sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 incomplete (1/2) 1.4 KiB/3.2 MiB     true
sha256:5e2b554c1c45d22c9d1aa836828828e320a26011b76c08631ac896cbc3625e3e                          application/vnd.docker.distribution.manifest.list.v2+json sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 incomplete (1/2) 1.4 KiB/3.2 MiB     true

If we then manually run ctr -n k8s.io images pull --snapshotter devmapper docker.io/library/alpine:latest, the image downloads successfully and becomes complete.

After this first manual pull, everything works again, e.g. we can run pods normally without problems.

Steps to reproduce the issue

  1. Configure a devmapper snapshotter as per https://github.com/containerd/containerd/blob/main/docs/snapshotters/devmapper.md
  2. Create a RuntimeClass that uses the devmapper snapshotter
  3. Create a simple Kubernetes pod that uses that RuntimeClass

Describe the results you received and expected

We now see... CreateContainerError
previously this was... Running.

Suspect this is a change in 1.7.1 but not certain as we are still narrowing down the problem

What version of containerd are you using?

containerd github.com/containerd/containerd v1.7.1 1677a17

Any other relevant information

crictl 1.26.0

Show configuration if it is related to CRI plugin.

[plugins."io.containerd.snapshotter.v1.devmapper"]
pool_name = "devpool"
root_path = "/var/data/containerd/devmapper"
base_image_size = "100GB"
fs_type = "ext4"
discard_blocks = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
snapshotter = "devmapper"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"

@skaegi
Contributor Author

skaegi commented Jun 12, 2023

hmm.. I've noticed if I am on the node and do a crictl pull alpine:latest I get the same incomplete behavior when I then do a ctr -n k8s.io images check. Is there some sort of problem when multiple snapshotters are available?

# crictl pull alpine     
Image is up to date for sha256:5e2b554c1c45d22c9d1aa836828828e320a26011b76c08631ac896cbc3625e3e
# ctr -n k8s.io images check | grep alpine:latest
docker.io/library/alpine:latest                                                                  application/vnd.docker.distribution.manifest.list.v2+json sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 incomplete (1/2)   1.4 KiB/3.2 MiB     true
# ctr -n k8s.io images export /tmp/test docker.io/library/alpine:latest 
ctr: failed to get reader: content digest sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6: not found
# ctr -n k8s.io content ls | grep 8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6
sha256:c0669ef34cdc14332c0f1ab0c2c01acb91d96014b172f1a76f3a39e63d1f0bda	528B	24 seconds	containerd.io/distribution.source.docker.io=library/alpine,containerd.io/gc.ref.content.config=sha256:5e2b554c1c45d22c9d1aa836828828e320a26011b76c08631ac896cbc3625e3e,containerd.io/gc.ref.content.l.0=sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6
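
The listing above shows a manifest whose gc.ref.content labels point at a layer blob that is not in the store. As a rough diagnostic sketch (not a containerd tool), the following Python scans the text output of ctr content ls and reports referenced digests that have no entry of their own. The function name missing_references is hypothetical, and the label prefix is simply the one visible in the output above. It only gives a meaningful answer on the unfiltered listing: on grep-filtered output like the line above, the config blob also shows up as missing because its own row was filtered out.

```python
import re

# Label values of the form containerd.io/gc.ref.content...=sha256:<hex>,
# as seen in the `ctr content ls` output quoted in this thread.
REF = re.compile(r"containerd\.io/gc\.ref\.content[^=,\s]*=(sha256:[0-9a-f]{64})")


def missing_references(content_ls_output: str) -> set:
    """Return digests referenced by gc.ref.content labels but not listed as blobs."""
    present, referenced = set(), set()
    for line in content_ls_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a blob entry.
        if not fields or not fields[0].startswith("sha256:"):
            continue
        present.add(fields[0])
        referenced.update(REF.findall(line))
    return referenced - present


# The single manifest row from the grep output above.
sample = (
    "sha256:c0669ef34cdc14332c0f1ab0c2c01acb91d96014b172f1a76f3a39e63d1f0bda\t528B\t24 seconds\t"
    "containerd.io/distribution.source.docker.io=library/alpine,"
    "containerd.io/gc.ref.content.config=sha256:5e2b554c1c45d22c9d1aa836828828e320a26011b76c08631ac896cbc3625e3e,"
    "containerd.io/gc.ref.content.l.0=sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6"
)

missing = missing_references(sample)
for digest in sorted(missing):
    print("referenced but not listed:", digest)
```

On a node this would be fed the full output of ctr -n k8s.io content ls, for example through a pipe into a small wrapper that reads standard input.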

@skaegi
Contributor Author

skaegi commented Jun 12, 2023

I tried with the 1.7.0 and 1.7.2 binaries and got exactly the same problem so there must be something else going on.

@dmcgowan
Member

@skaegi how reproducible is this on the other nodes? Have you seen with a different snapshotter?

@skaegi
Contributor Author

skaegi commented Jun 13, 2023

Thanks @dmcgowan -- I'm still working the problem and trying to create as small a reproducer as I can. Even without the devmapper snapshotter enabled, if I crictl pull alpine:latest I end up with an incomplete image when I do a ctr image check. I'm beginning to suspect the underlying problem is the node's initial setup, which pre-downloads an image that conflicts.

@skaegi
Contributor Author

skaegi commented Jun 16, 2023

Ok. I think I understand what's happening. We're running into a case where the same diff id maps to two different digests based on where it's pulled from. This can happen based on differences in gzip versions and also on how gzip is used to compress content (e.g. fast vs. best) -- see google/go-containerregistry#895 (comment) and then scroll to "The Issue" for details.

[sorry for the long comment here]

For example...

$ skopeo --override-arch amd64 --override-os linux inspect docker://uk.icr.io/armada-master/ingress-alpine:3.18_249 | jq .LayersData
[
  {
    "MIMEType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "Digest": "sha256:bc01fbd705408cb3b447eb0ad4059bf08f686442be80e65dd951ce884b64a595",
    "Size": 3510776,
    "Annotations": null
  },
  {
    "MIMEType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "Digest": "sha256:2eff7c3de033bc293b7ed57da746981c13722015e41adbcd90745c9cb054ad14",
    "Size": 2290360,
    "Annotations": null
  }
]
$ docker pull --platform amd64 uk.icr.io/armada-master/ingress-alpine:3.18_249
3.18_249: Pulling from armada-master/ingress-alpine
bc01fbd70540: Pull complete 
2eff7c3de033: Pull complete 
Digest: sha256:2f1ad34e0a7030a8f9a030a1ec99d8ebaa0a9ee4e9cb59ee04e244337d94726c
Status: Downloaded newer image for uk.icr.io/armada-master/ingress-alpine:3.18_249
uk.icr.io/armada-master/ingress-alpine:3.18_249
$ docker inspect uk.icr.io/armada-master/ingress-alpine:3.18_249 | jq .[].RootFS
{
  "Type": "layers",
  "Layers": [
    "sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c",
    "sha256:56827d5f32b1823fa029c5ff87745a88adfb013763fd1c61396bdb303a1597cd"
  ]
}


$ skopeo --override-arch amd64 --override-os linux inspect docker://alpine:3.18.0 | jq .LayersData
[
  {
    "MIMEType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "Digest": "sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6",
    "Size": 3397490,
    "Annotations": null
  }
]
$ docker pull --platform amd64 alpine:3.18.0
3.18.0: Pulling from library/alpine
8a49fdb3b6a5: Already exists 
Digest: sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11
Status: Downloaded newer image for alpine:3.18.0
docker.io/library/alpine:3.18.0
$ docker inspect alpine:3.18.0 | jq .[].RootFS
{
  "Type": "layers",
  "Layers": [
    "sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c"
  ]
}

In this case both the following digests...

sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6
sha256:bc01fbd705408cb3b447eb0ad4059bf08f686442be80e65dd951ce884b64a595

map to the same diff id

sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c

Also notice that in the second docker pull we get ... 8a49fdb3b6a5: Already exists to avoid the pull of the same diff content... great!
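
To illustrate the point about gzip settings, here is a minimal Python sketch. It compresses the same uncompressed payload at two gzip levels and hashes both the compressed bytes (the layer digest) and the uncompressed bytes (the diff id). The payload is a made-up stand-in rather than a real layer tarball, and the names layer_tar and sha256_hex are only for illustration.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Format a SHA-256 hash the way image digests are written."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for an uncompressed layer tarball.
layer_tar = b"".join(b"file-%d: some repetitive layer content\n" % i for i in range(2000))

# mtime=0 keeps the gzip header timestamp fixed, so the two outputs differ
# only because of the compression level ("fast" vs. "best").
fast = gzip.compress(layer_tar, compresslevel=1, mtime=0)
best = gzip.compress(layer_tar, compresslevel=9, mtime=0)

diff_id = sha256_hex(layer_tar)   # hash of the uncompressed content
digest_fast = sha256_hex(fast)    # hash of the compressed blob, level 1
digest_best = sha256_hex(best)    # hash of the compressed blob, level 9

print("diff id      :", diff_id)
print("digest (fast):", digest_fast)
print("digest (best):", digest_best)
print("digests differ:", digest_fast != digest_best)
print("both decompress to the same diff id:",
      sha256_hex(gzip.decompress(fast)) == sha256_hex(gzip.decompress(best)) == diff_id)
```

A registry that recompresses a base layer (or a build tool with a different gzip implementation) can produce the same effect, which is presumably how the two digests above ended up sharing one diff id.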

If I do the equivalent with crictl and then inspect with ctr -n k8s.io image check I get the "incomplete" alpine images...

crictl pull uk.icr.io/armada-master/ingress-alpine:3.18_249
crictl pull alpine:3.18.0
ctr -n k8s.io i check | grep incomplete

docker.io/library/alpine:3.18.0                                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 incomplete (1/2) 1.4 KiB/3.2 MiB true
docker.io/library/alpine@sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11               application/vnd.docker.distribution.manifest.list.v2+json sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 incomplete (1/2) 1.4 KiB/3.2 MiB true
sha256:5e2b554c1c45d22c9d1aa836828828e320a26011b76c08631ac896cbc3625e3e                                        application/vnd.docker.distribution.manifest.list.v2+json sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11 incomplete (1/2) 1.4 KiB/3.2 MiB true

@skaegi
Contributor Author

skaegi commented Jun 16, 2023

The alpine image is incomplete because we don't download the sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6 layer and put it in the "content" store. This is because of an optimization in the image puller: the alpine image's config references a diff id that is already unpacked as a snapshot in the default snapshotter, so we skip downloading the layer (possibly because we didn't realize the digests were or could be different). e.g. we don't create...

sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6	3.397MB	20 seconds	containerd.io/distribution.source.docker.io=library/alpine,containerd.io/uncompressed=sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c

Even though the alpine image is incomplete, this works fine when we later create a container using the default snapshotter, because the diff id is already there and we don't have to unpack the content. It fails, however, if we try to ctr image export the image or use any other snapshotter: the diff id is not present in that snapshotter, and the layer is not available from the "content" store either.

# ctr -n k8s.io image export /tmp/img docker.io/library/alpine:3.18.0
ctr: failed to get reader: content digest sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6: not found
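
The sequence described in this comment can be modelled with a small Python sketch. This is a toy model of the behaviour as I understand it, not containerd's code: Node and unpack are hypothetical names, and snapshots are keyed by diff id for simplicity (containerd keys them by chain id, which equals the diff id for a first layer). The digests and diff ids are the ones from the skopeo and docker output earlier in the thread.

```python
from dataclasses import dataclass, field

# (layer digest, diff id) pairs copied from the output earlier in this thread.
INGRESS_ALPINE = [
    ("sha256:bc01fbd705408cb3b447eb0ad4059bf08f686442be80e65dd951ce884b64a595",
     "sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c"),
    ("sha256:2eff7c3de033bc293b7ed57da746981c13722015e41adbcd90745c9cb054ad14",
     "sha256:56827d5f32b1823fa029c5ff87745a88adfb013763fd1c61396bdb303a1597cd"),
]
ALPINE = [
    ("sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6",
     "sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c"),
]


@dataclass
class Node:
    """Hypothetical stand-in for one node's content store and snapshotters."""
    content: set = field(default_factory=set)
    snapshots: dict = field(
        default_factory=lambda: {"overlayfs": set(), "devmapper": set()})


def unpack(node, layers, snapshotter, fetch):
    """Unpack layers into a snapshotter, fetching blobs only when fetch=True."""
    for digest, diff_id in layers:
        if diff_id in node.snapshots[snapshotter]:
            continue  # the optimization: snapshot exists, blob is never fetched
        if fetch:
            node.content.add(digest)
        if digest not in node.content:
            raise LookupError(f"content digest {digest}: not found")
        node.snapshots[snapshotter].add(diff_id)


node = Node()
unpack(node, INGRESS_ALPINE, "overlayfs", fetch=True)  # like: crictl pull ingress-alpine
unpack(node, ALPINE, "overlayfs", fetch=True)          # like: crictl pull alpine (layer skipped)
alpine_blob_present = ALPINE[0][0] in node.content

try:
    # Creating the container with the devmapper runtime class: no pull happens here.
    unpack(node, ALPINE, "devmapper", fetch=False)
    error = None
except LookupError as exc:
    error = str(exc)

print("alpine layer in content store:", alpine_blob_present)
print("devmapper unpack error:", error)
```

In this model the second pull leaves the 8a49... blob out of the content store, and the later unpack into devmapper fails with a "not found" error shaped like the one ctr reports above.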

@skaegi
Contributor Author

skaegi commented Jun 16, 2023

@dmcgowan this might seem naive but what if the image puller "always" also checked that the layer digest was present in the content store and, if not, downloaded the layer even though the resulting diff id is present in the default snapshotter?
This would cost some disk space to store layers with different digests but identical diff ids, but it would prevent incomplete images and the inability to export the image, as well as this issue's case where we want to create snapshots in a non-default snapshotter.
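
A minimal Python sketch of the proposed check, to make the trade-off concrete. This is not containerd's unpacker logic; layers_to_fetch and its parameters are hypothetical, and snapshots are keyed by diff id purely for illustration.

```python
def layers_to_fetch(layers, content_store, snapshots, always_check_content):
    """Return the layer digests a pull would download.

    layers: list of (digest, diff_id) pairs from the image manifest/config.
    content_store: set of digests already stored as blobs.
    snapshots: set of diff ids already unpacked in the default snapshotter.
    """
    to_fetch = []
    for digest, diff_id in layers:
        if digest in content_store:
            continue  # blob already present, nothing to download
        if not always_check_content and diff_id in snapshots:
            continue  # current behaviour: existing snapshot means the blob is skipped
        to_fetch.append(digest)
    return to_fetch


# State after pulling ingress-alpine, using the digests from this thread.
content_store = {
    "sha256:bc01fbd705408cb3b447eb0ad4059bf08f686442be80e65dd951ce884b64a595",
    "sha256:2eff7c3de033bc293b7ed57da746981c13722015e41adbcd90745c9cb054ad14",
}
snapshots = {
    "sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c",
    "sha256:56827d5f32b1823fa029c5ff87745a88adfb013763fd1c61396bdb303a1597cd",
}
alpine = [
    ("sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6",
     "sha256:bb01bd7e32b58b6694c8c3622c230171f1cec24001a82068a8d30d338f420d6c"),
]

current = layers_to_fetch(alpine, content_store, snapshots, always_check_content=False)
proposed = layers_to_fetch(alpine, content_store, snapshots, always_check_content=True)
print("current behaviour fetches :", current)
print("proposed behaviour fetches:", proposed)
```

With the check enabled, the alpine blob is downloaded once even though its diff id is already unpacked, which is the extra disk cost mentioned above.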

@skaegi
Contributor Author

skaegi commented Jun 16, 2023

Hmm... after the fact I noticed that this is what ctr -n k8s.io image pull docker.io/library/alpine:3.18.0 will do: it always pulls the layer and puts it in the content store. However, I noticed it did not add the containerd.io/uncompressed label.

# ctr -n k8s.io content ls | grep 8a49
sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6	3.397MB	41 seconds	containerd.io/distribution.source.docker.io=library/alpine
sha256:c0669ef34cdc14332c0f1ab0c2c01acb91d96014b172f1a76f3a39e63d1f0bda	528B	41 seconds	containerd.io/distribution.source.docker.io=library/alpine,containerd.io/gc.ref.content.config=sha256:5e2b554c1c45d22c9d1aa836828828e320a26011b76c08631ac896cbc3625e3e,containerd.io/gc.ref.content.l.0=sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6

Anyway... it would be good if crictl pull had the same behaviour of downloading and storing the content even if the layer didn't need to be unpacked.

@skaegi
Contributor Author

skaegi commented Jun 20, 2023

Related or in some sense the real issue -- #8580

@dmcgowan @mikebrow I need a fix here as we really need devmapper to work in IKS. I can do the work, but I am trying to figure out how to do this without creating a PR that would never be accepted.

I can see tweaking the logic around https://github.com/containerd/containerd/blob/release/1.7/pkg/unpack/unpacker.go#L321 or perhaps adding a config parameter to force content store fetching. Because the devmapper snapshotter is native perhaps it could be made smarter to understand the layer mapping in the overlayfs snapshotter and get the content that way?

Do you have any suggestions on an approach I might try here?

skaegi changed the title from "First time devmapper snapshotter k8s image pulls result in error unpacking image: failed to extract layer" to "devmapper snapshotter image pulls result in error unpacking image: failed to extract layer" on Jul 26, 2023
mikebrow added this to the 2.0 milestone on Aug 24, 2023
@mikebrow
Member

Tagged to the 2.0 milestone, as Runtime Specific Snapshotters is currently listed to come out of experimental in 2.0: https://github.com/containerd/containerd/blob/main/RELEASES.md#experimental-features
