This repository has been archived by the owner on May 24, 2024. It is now read-only.

Tracker for support for nested containers #282

Open
cgwalters opened this issue Feb 9, 2024 · 11 comments

Comments

@cgwalters
Member

This relates to containers/bootc#128 - but isn't quite the same thing. Let's use this as a tracker for supporting "nesting" container images.

We should ideally support something like this:

FROM quay.io/centos-bootc/centos-bootc:stream9
RUN podman --storage-driver=vfs --root=/usr/share/containers/storage pull <someimage>
COPY somecontainer.container /usr/share/containers/systemd

Where somecontainer.container is a podman systemd unit that also uses:

PodmanArgs=--root=/usr/share/containers/storage
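
As a rough sketch of what such a unit could look like, the script below writes a somecontainer.container file into a scratch directory. The image name, the Description text, and the OUTDIR variable are assumptions made for illustration; the thread only specifies the PodmanArgs line.

```shell
#!/bin/sh
set -eu

# Write into a scratch directory so nothing under /usr/share is touched.
# OUTDIR is a hypothetical override, not something the thread defines.
outdir="${OUTDIR:-$(mktemp -d)}"
unit="$outdir/somecontainer.container"

cat > "$unit" <<'EOF'
[Unit]
Description=Run an image embedded in the bootc image (illustrative)

[Container]
# Hypothetical image name; the thread leaves it as <someimage>.
Image=quay.io/example/someimage:latest
# Point podman at the storage root that was populated at build time.
PodmanArgs=--root=/usr/share/containers/storage

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit"
```

In the Containerfile above, a file like this would be the one copied to /usr/share/containers/systemd.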

The reason I mentioned --storage-driver=vfs is to avoid overlayfs and nested whiteouts. I think recent overlayfs supports this at runtime, but I can't make a whiteout in a default podman run invocation; I think the device cgroup may be coming into play?

$ cat Containerfile
FROM quay.io/centos/centos:stream9
RUN mknod somewh c 0 0
$ podman build -t localhost/test .
STEP 1/2: FROM quay.io/centos/centos:stream9
STEP 2/2: RUN mknod somewh c 0 0
mknod: somewh: Operation not permitted
Error: building at STEP "RUN mknod somewh c 0 0": while running runtime: exit status 1
$

Even if we could make the whiteout, I think we'd run into problems because there's no standard for nesting them at the OCI level. Also xref https://www.spinics.net/lists/linux-unionfs/msg11253.html
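
For context, overlayfs represents a whiteout as a character device with device number 0/0, which is what the mknod above tries to create. A minimal sketch for spotting such files in a tree, assuming GNU coreutils stat (the is_whiteout helper is a name made up here):

```shell
#!/bin/sh

# is_whiteout PATH: succeed if PATH is a character device with major 0 and
# minor 0, which is how overlayfs marks a deleted file in an upper layer.
# Relies on GNU stat: %t and %T print the device major/minor in hex.
is_whiteout() {
    [ -c "$1" ] && [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# Classify every path given on the command line.
for path in "$@"; do
    if is_whiteout "$path"; then
        echo "whiteout: $path"
    else
        echo "not a whiteout: $path"
    fi
done
```

Ordinary character devices such as /dev/null are not whiteouts under this check, because their device numbers are non-zero.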

@DanielFroehlich

DanielFroehlich commented Feb 12, 2024

What is also important is that the reference to <someimage> works unmodified with the runtime, e.g. when used in a systemd file, in scripts using podman, in MicroShift, etc., no matter what kind of reference it is (tag, SHA digest, etc.).
Digests in particular used to be a problem in the past because they could change when moving or embedding the OCI container image. MicroShift/OpenShift release images rely on digest references.
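
A small sketch of how a CI step might check that a reference is pinned by digest before relying on it. The is_digest_ref helper and the example references in the usage note are hypothetical, and the check only covers the name@sha256:<64 hex characters> form:

```shell
#!/bin/sh

# is_digest_ref REF: succeed if REF ends in @sha256: followed by exactly
# 64 lowercase hex characters. Other digest algorithms are not handled.
is_digest_ref() {
    case "$1" in
        *@sha256:*) ;;
        *) return 1 ;;
    esac
    digest="${1##*@sha256:}"
    [ "${#digest}" -eq 64 ] || return 1
    case "$digest" in
        *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}

# Report on every reference given on the command line.
for ref in "$@"; do
    if is_digest_ref "$ref"; then
        echo "pinned by digest: $ref"
    else
        echo "not pinned by digest: $ref"
    fi
done
```

A tag such as quay.io/example/app:latest would be reported as not pinned. This only validates the form of the reference; whether the digest survives embedding still has to be verified against the booted image.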

@rhatdan
Collaborator

rhatdan commented Feb 13, 2024

Why not use additional image stores for this? The latest containers-common sets up Podman and Buildah to automatically look for an additional store in /usr/lib/containers/storage. If images are pulled into this store, Podman will use it as a read-only store and /var/lib/containers/storage as the read/write store.
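
On setups where this is not preconfigured, the store can be listed under additionalimagestores in storage.conf. A minimal sketch, run against a temporary copy rather than the real /etc/containers/storage.conf; the sample file contents are an assumption, trimmed down for illustration:

```shell
#!/bin/sh
set -eu

# Hypothetical sample: a trimmed-down storage.conf written to a temp file,
# so the real configuration is never modified.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[storage]
driver = "overlay"

[storage.options]
additionalimagestores = [
]
EOF

# Append the read-only store path right after the line that opens the list.
# The one-line form of the "a" command is a GNU sed extension.
sed -e '/^additionalimagestores/a "/usr/lib/containers/storage",' -i "$conf"

cat "$conf"
```

The result is a list with one entry, so podman would treat /usr/lib/containers/storage as an extra read-only image store next to its normal read/write one.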

@alexlarsson

alexlarsson commented Feb 20, 2024

I think using vfs backend is a bad idea btw, at least if you run non-readonly containers, because the vfs driver cannot use overlayfs for the container upper layer. The ideal approach would be to use the overlayfs backend with composefs enabled, because then there will be no whiteout files in the container storage (they are all inside the composefs blob in the storage).

@rhatdan
Collaborator

rhatdan commented Mar 25, 2024

Adding in CAP_SYS_ADMIN seems to allow this to work?

$ podman build --cap-add SYS_ADMIN /tmp
STEP 1/2: FROM quay.io/centos-bootc/centos-bootc:stream9
STEP 2/2: RUN podman --root=/usr/share/containers/storage pull alpine
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
Copying config sha256:05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
Writing manifest to image destination
05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
COMMIT
--> c8edcbce04cd
c8edcbce04cda8c52eb2043f9bcd23c74cb6a1e90948bb08dde27f2bfd31b7bd

@rhatdan
Collaborator

rhatdan commented Mar 25, 2024

Here is a little test I did to make this work.

$ cat /tmp/Containerfile
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN sed -e '/additionalimage.*/a "/usr/lib/containers/storage",' -i /etc/containers/storage.conf
RUN podman --root=/usr/lib/containers/storage pull alpine

$ podman build -t bootc --cap-add SYS_ADMIN /tmp
STEP 1/3: FROM quay.io/centos-bootc/centos-bootc:stream9
STEP 2/3: RUN podman --root=/usr/lib/containers/storage pull alpine
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
Copying config sha256:05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
Writing manifest to image destination
05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
--> b4e3d3d3506b
STEP 3/3: RUN sed -e '/additionalimage.*/a "/usr/lib/containers/storage",' -i /etc/containers/storage.conf
COMMIT bootc
--> a94b77143258
Successfully tagged localhost/bootc:latest

$ podman run -ti --cap-add SYS_ADMIN bootc podman images
REPOSITORY TAG IMAGE ID CREATED SIZE R/O
docker.io/library/alpine latest 05455a08881e 8 weeks ago 7.67 MB true

In order to use Overlay within a container you need to run the container with CAP_SYS_ADMIN or play with rootless containers.
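
To see whether a given build or run environment actually has that capability, one option is to inspect the effective capability mask of the current process. A minimal sketch, assuming a Linux /proc filesystem; has_cap is a helper name made up here:

```shell
#!/bin/sh

# has_cap MASK BIT: succeed if bit number BIT is set in the hex string MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# CapEff in /proc/self/status is the effective capability set as a hex mask.
cap_eff="$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)"
# Fall back to an empty mask if /proc is unavailable.
cap_eff="${cap_eff:-0}"

# CAP_SYS_ADMIN is capability number 21 in linux/capability.h.
if has_cap "$cap_eff" 21; then
    echo "CAP_SYS_ADMIN present"
else
    echo "CAP_SYS_ADMIN missing"
fi
```

Running this inside a RUN step with and without --cap-add SYS_ADMIN should show the difference between the two build environments.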

@cgwalters
Member Author

> In order to use Overlay within a container you need to run the container with CAP_SYS_ADMIN or play with rootless containers.

We're having a realtime conversation about this, and I think there's general agreement that if the problem is that podman pull is trying to do an overlayfs mount, then the fix would be for podman to stop doing that.

I still have an open uncertainty about whiteouts. I agree with Alex that this would be much better fixed by composefs, which avoids the need for metadata written directly into the container image filesystem in general.

@sallyom
Collaborator

sallyom commented Mar 25, 2024

Cross-building from an Arm M2 machine for x86_64 (after adding --cap-add SYS_ADMIN) hits an issue:

$ cat Containerfile
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN podman pull alpine && podman pull busybox

This builds fine on my Arm M2 machine:

podman build --arch aarch64 -t myimage:arm --cap-add SYS_ADMIN .

This fails on my Arm M2 machine:

podman build --arch x86_64 -t myimage:amd64 --cap-add SYS_ADMIN .

and here's the weird error:

STEP 2/2: RUN podman pull alpine && podman pull busybox
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
Error: copying system image from manifest list: writing blob: adding layer with blob "sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8": processing tar file(Error: unrecognized command `podman /`

Did you mean this?
	cp
	ps
	rm

Try 'podman --help' for more information
): exit status 125
Error: building at STEP "RUN podman pull alpine && podman pull busybox": while running runtime: exit status 125

@DanielFroehlich

Thx for progressing on this!

I would feel better with some automated CI test cases that mimic the actual use case as a smoke test: a container image with whiteouts (!!!) referenced by SHA digest in the Containerfile. Then boot the resulting image with bootc and ensure that the image referenced with the same digest as in the Containerfile comes up and works correctly.
The reason: we had the same situation with Blueprints and Image Builder, where it initially looked like it was working but actually was not. And this is a must-have feature for MicroShift / edge deployments in air-gapped / disconnected use cases.

And to add an additional requirement: building these images has to work on OpenShift in a CI/CD pipeline without cluster-admin privileges.

@rhatdan
Collaborator

rhatdan commented Mar 25, 2024

The issue seems to be that podman without CAP_SYS_ADMIN falls back to setting up a user namespace with a single mapping. I am talking to @giuseppe about whether or not this is required, or how we could work around it. For now this will work fine with CAP_SYS_ADMIN added to the build. I don't see any issues with the whiteouts being stored in the images, as they normally are on a host. Running containers inside containers is what blocks overlay on overlay, but I don't think this is an issue we would see here.

@giuseppe

When we configure the user namespace we don't know what command is going to be executed by Podman, so we don't check for that combination (and we possibly also need CAP_SETFCAP); we check only for CAP_SYS_ADMIN.

I think it is correct this way because even if you pull the images in that environment, you won't be able to use them until you gain CAP_SYS_ADMIN, and setting the user namespace will probably use different mappings.

@cgwalters
Member Author

Also relevant is ostreedev/ostree#2722
