Issue Description
Unable to start a container nested three layers deep due to a /sys/fs/cgroup access error. This is a Podman-in-Podman-in-Podman use case.
Steps to reproduce the issue
On an OL8 host, the opc user starts an application "ciagent" as a systemd service. /etc/subuid and /etc/subgid contain:
opc:100000:900000
In response to a client request, the ciagent, written in Go and using github.com/containers/podman/v5 v5.1.2, creates a container (the "base" container) through the Podman Go bindings API with the following ContainerSecurityConfig, along with other configs:
securityConfig := specgen.ContainerSecurityConfig{
Privileged: &boolTrue,
User: rootUser,
ReadOnlyFilesystem: &boolFalse,
}
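For reference, a minimal sketch of what that bindings call might look like (the socket path, the "base-image:latest" image name, and the error handling are assumptions for illustration, not the actual ciagent code):
package main

import (
	"context"
	"os"

	"github.com/containers/podman/v5/pkg/bindings"
	"github.com/containers/podman/v5/pkg/bindings/containers"
	"github.com/containers/podman/v5/pkg/specgen"
)

func main() {
	boolTrue, boolFalse := true, false

	// Connect to the rootless Podman API socket of the opc user
	// (the socket path here is an assumption).
	conn, err := bindings.NewConnection(context.Background(),
		"unix://"+os.Getenv("XDG_RUNTIME_DIR")+"/podman/podman.sock")
	if err != nil {
		panic(err)
	}

	// "base-image:latest" is a placeholder for the real base image.
	s := specgen.NewSpecGenerator("base-image:latest", false)
	s.Name = "base"
	s.ContainerSecurityConfig = specgen.ContainerSecurityConfig{
		Privileged:         &boolTrue,
		User:               "root",
		ReadOnlyFilesystem: &boolFalse,
	}

	// Create and start the "base" container.
	created, err := containers.CreateWithSpec(conn, s, nil)
	if err != nil {
		panic(err)
	}
	if err := containers.Start(conn, created.ID, nil); err != nil {
		panic(err)
	}
}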
The base container then creates a user 'user1' (uid 1101) and a group 'user1' for the client user via groupadd and useradd, and sets up user1's home directory for the Podman config:
NEW_USER=user1
USERDIR=/home/user1
mkdir -p ${USERDIR}/.config/containers
chown ${NEW_USER}:oci ${USERDIR}/.config
cp /etc/podman-containers.conf ${USERDIR}/.config/containers/containers.conf
chown ${NEW_USER}:oci ${USERDIR}/.config/containers
chown ${NEW_USER}:oci ${USERDIR}/.config/containers/containers.conf
mkdir -p /run/user/1101 && chown ${NEW_USER}:oci /run/user/1101
export XDG_RUNTIME_DIR=/run/user/1101
/etc/subuid and /etc/subgid inside the base container contain:
user1:165536:65536
The ciagent then uses the Podman bindings API to exec into the container as user1 and attach, so that user1 can run shell commands inside it.
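A rough sketch of that exec-and-attach step with the v5 bindings might look like the following (the command, working directory, and attach wiring are illustrative assumptions; the real ciagent code may differ):
package main

import (
	"context"
	"os"

	"github.com/containers/podman/v5/pkg/api/handlers"
	"github.com/containers/podman/v5/pkg/bindings"
	"github.com/containers/podman/v5/pkg/bindings/containers"
)

func main() {
	conn, err := bindings.NewConnection(context.Background(),
		"unix://"+os.Getenv("XDG_RUNTIME_DIR")+"/podman/podman.sock")
	if err != nil {
		panic(err)
	}

	// Create an exec session in the "base" container running as user1.
	execCfg := new(handlers.ExecCreateConfig)
	execCfg.User = "user1"
	execCfg.WorkingDir = "/home/user1"                         // assumed working directory
	execCfg.Cmd = []string{"/bin/bash", "-lc", "podman info"}  // placeholder command
	execCfg.AttachStdout = true
	execCfg.AttachStderr = true

	sessionID, err := containers.ExecCreate(conn, "base", execCfg)
	if err != nil {
		panic(err)
	}

	// Start the exec session and attach its output to our stdout/stderr.
	opts := new(containers.ExecStartAndAttachOptions).
		WithOutputStream(os.Stdout).
		WithErrorStream(os.Stderr).
		WithAttachOutput(true).
		WithAttachError(true)
	if err := containers.ExecStartAndAttach(conn, sessionID, opts); err != nil {
		panic(err)
	}
}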
Inside the base container, user1 sets up its storage.conf and starts a container "build-agent":
/home/user1/.config/containers/storage.conf contains:
[storage]
driver = "overlay"
runroot = "/run/user/1101/containers"
graphroot = BUILD_AGENT_IMAGE_STORAGE_DIR
podman run --privileged --network=host --cgroupns=private --cgroup-manager=cgroupfs -v step-image-volume:/var/lib/containers/storage:Z --name build-agent BUILD_AGENT_IMAGE BUILD_STEP_IMAGE
Inside the build-agent container, root starts the "build-step" container:
podman run --detach --privileged --network=host --cgroup-manager=cgroupfs --cgroupns=private --name build-step BUILD_STEP_IMAGE
Describe the results you received
The build-step container fails to start with the following error:
time="2025-05-24T23:27:16Z" level=warning msg="Failed to add conmon to cgroupfs sandbox cgroup: creating cgroup for cpu: mkdir /sys/fs/cgroup/cpu: read-only file system"
Error: OCI runtime error: runc: runc create failed: no cgroup mount found in mountinfo
Describe the results you expected
The third container (build-step) is expected to start successfully.
podman info output
In the base container:
$ podman info
host:
arch: arm64
buildahVersion: 1.33.12
cgroupControllers:
- cpuset
- cpu
- io
- memory
- pids
- rdma
cgroupManager: cgroupfs
cgroupVersion: v2
conmon:
package: conmon-2.1.10-1.module+el8.10.0+90541+332b2aa7.aarch64
path: /usr/bin/conmon
version: 'conmon version 2.1.10, commit: 6176697378ce26e2229ce82838e17694b054d86f'
cpuUtilization:
idlePercent: 74.08
systemPercent: 6.68
userPercent: 19.24
cpus: 2
databaseBackend: sqlite
distribution:
distribution: ol
variant: server
version: "8.10"
eventLogger: file
freeLocks: 2048
hostname: e17529799dd0
idMappings:
gidmap:
- container_id: 0
host_id: 1101
size: 1
- container_id: 1
host_id: 165536
size: 65536
uidmap:
- container_id: 0
host_id: 1101
size: 1
- container_id: 1
host_id: 165536
size: 65536
kernel: 5.4.17-2136.343.5.1.el8uek.aarch64
linkmode: dynamic
logDriver: k8s-file
memFree: 2354905088
memTotal: 12461932544
networkBackend: cni
networkBackendInfo:
backend: cni
dns:
package: podman-plugins-4.9.4-20.0.1.module+el8.10.0+90541+332b2aa7.aarch64
path: /usr/libexec/cni/dnsname
version: |-
CNI dnsname plugin
version: 1.4.0-dev
commit: unknown
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
package: containernetworking-plugins-1.4.0-5.module+el8.10.0+90541+332b2aa7.aarch64
path: /usr/libexec/cni
ociRuntime:
name: runc
package: runc-1.1.12-6.module+el8.10.0+90541+332b2aa7.aarch64
path: /usr/bin/runc
version: |-
runc version 1.1.12
spec: 1.2.0+dev
go: go1.22.9 (Red Hat 1.22.9-1.module+el8.10.0+90476+bb48cc15)
libseccomp: 2.5.2
os: linux
pasta:
executable: ""
package: ""
version: ""
remoteSocket:
exists: false
path: /run/user/1101/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.2.3-1.module+el8.10.0+90541+332b2aa7.aarch64
version: |-
slirp4netns version 1.2.3
commit: c22fde291bb35b354e6ca44d13be181c76a0a432
libslirp: 4.4.0
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.2
swapFree: 2654863360
swapTotal: 2654863360
uptime: 0h 8m 18.00s
variant: v8
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
search:
- container-registry.oracle.com
- docker.io
store:
configFile: /home/akang/.config/containers/storage.conf
containerStore:
number: 0
paused: 0
running: 0
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /tmp/step2/storage
graphRootAllocated: 5196382208
graphRootUsed: 176128
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Supports shifting: "true"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 0
runRoot: /run/user/1101/containers
transientStore: false
volumePath: /tmp/step2/storage/volumes
version:
APIVersion: 4.9.4-rhel
Built: 1742989845
BuiltTime: Wed Mar 26 11:50:45 2025
GitCommit: ""
GoVersion: go1.22.9 (Red Hat 1.22.9-1.module+el8.10.0+90476+bb48cc15)
Os: linux
OsArch: linux/arm64
Version: 4.9.4-rhel
Podman in a container
Yes
Privileged Or Rootless
Rootless with --privileged
Upstream Latest Release
OL8 podman release
Additional environment details
SELinux: The OL8 host has SELinux in enforcing mode. The base, build-agent, and build-step containers have SELinux disabled.
VM Host:
Linux fe7874bcc246 5.4.17-2136.335.4.el8uek.x86_64 #3 SMP Thu Aug 22 12:18:30 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
cgroup v2 was enabled by running the following commands and then rebooting:
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1 user_namespace.enable=1"
echo "user.max_user_namespaces=28633" | sudo tee -a /etc/sysctl.conf
Dockerfiles of build-agent and build-step containers:
Both build-agent and build-step containers use the base image container-registry.oracle.com/os/oraclelinux:8. The OS architecture is x86.
Additional Information
The following is debug output from the build-agent container's entrypoint script, captured right before it starts the build-step container:
+ uname -a
Linux fe7874bcc246 5.4.17-2136.335.4.el8uek.x86_64 #3 SMP Thu Aug 22 12:18:30 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
The step image is localhost/step:latest
The Agent entrypoint is running by:
+ echo 'The step image is localhost/step:latest'
+ STEP_IMAGE=localhost/step:latest
+ echo 'The Agent entrypoint is running by:'
+ id
uid=0(root) gid=0(root) groups=0(root)
Environment variables available in Agent container:
+ echo 'Environment variables available in Agent container: '
+ printenv
HOSTNAME=fe7874bcc246
container=podman
PWD=/
HOME=/root
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/printenv
Create /etc/containers/containers.conf
+ echo 'Create /etc/containers/containers.conf'
+ cat
Setup /etc/subuid /etc/subgid for Agent container
+ echo 'Setup /etc/subuid /etc/subgid for Agent container'
+ echo root:300000:65536
+ echo root:300000:65536
Take care /usr/bin/newuidmap /usr/bin/newgidmap
+ echo 'Take care /usr/bin/newuidmap /usr/bin/newgidmap'
+ chown root:root /usr/bin/newuidmap /usr/bin/newgidmap
+ chmod u+s /usr/bin/newuidmap /usr/bin/newgidmap
+ ls -l /usr/bin/newuidmap
-rwsr-xr-x. 1 root root 48904 Apr 9 2024 /usr/bin/newuidmap
+ ls -l /usr/bin/newgidmap
-rwsr-xr-x. 1 root root 48944 Apr 9 2024 /usr/bin/newgidmap
+ setcap cap_setuid+ep /usr/bin/newuidmap
+ setcap cap_setgid+ep /usr/bin/newgidmap
+ getcap /usr/bin/newuidmap /usr/bin/newgidmap
/usr/bin/newuidmap cap_setuid=ep
/usr/bin/newgidmap cap_setgid=ep
+ echo 'Check /etc/subuid /etc/subgid'
Check /etc/subuid /etc/subgid
+ cat /etc/subuid
root:300000:65536
+ cat /etc/subgid
root:300000:65536
+ echo 'Check cgroup'
Check cgroup
+ ls -ld /sys/fs/cgroup
drwxrwxrwt. 2 root root 40 May 26 00:48 /sys/fs/cgroup
+ ls -ld '/sys/fs/cgroup/*'
+ head -5
ls: cannot access '/sys/fs/cgroup/*': No such file or directory
+ true
+ mount
+ grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
tmpfs on /sys/fs/cgroup type tmpfs (ro,relatime,seclabel,uid=1101,gid=1101)
+ lsns -t cgroup
NS TYPE NPROCS PID USER COMMAND
4026532705 cgroup 4 1399 root catatonit -P
4026532712 cgroup 2 2572 root /bin/bash /usr/local/bin/dlcbld_agent_entrypoint.sh localhost/step:latest
+ grep cgroup2 /proc/filesystems
nodev cgroup2
+ cat /sys/fs/cgroup/cgroup.subtree_control
+ head -5
cat: /sys/fs/cgroup/cgroup.subtree_control: No such file or directory
+ true
+ cat /proc/self/cgroup
0::/
+ grep CapEff
+ cat /proc/self/status
CapEff: 0000003fffffffff
+ stat -fc %T /sys/fs/cgroup/
tmpfs
+ cat /proc/self/uid_map
0 1101 1
1 165536 65536
+ cat /proc/filesystems
+ grep cgroup2
nodev cgroup2
+ echo 'Check current init system'
Check current init system
+ ps -p 1 -o comm=
bash
Check SELinux enabled
+ echo 'Check SELinux enabled'
+ getenforce
Disabled