
/sys/fs/cgroup access failure on starting a 3-layer deep nested podman container #26182

@amykang2020


Issue Description

Unable to start a 3-layer-deep nested container due to a /sys/fs/cgroup access error. This is a Podman-in-Podman-in-Podman use case.

Steps to reproduce the issue

On the OL8 host, the opc user starts an application, "ciagent", as a systemd service. /etc/subuid and /etc/subgid contain:
opc:100000:900000

In response to a client request, the ciagent, written in Go and using github.com/containers/podman/v5 v5.1.2, creates a container (the "base" container) through the Podman Go bindings API with the following ContainerSecurityConfig, along with other settings:

securityConfig := specgen.ContainerSecurityConfig{
    Privileged:         &boolTrue,
    User:               rootUser,
    ReadOnlyFilesystem: &boolFalse,
}
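
For context, a minimal sketch of how such a spec might be created and started through the v5 bindings is shown below. This is not the actual ciagent code: the socket URI, image name, and container name are placeholders/assumptions.

// Minimal sketch (assumptions, not the actual ciagent code): create and start
// the "base" container through the podman v5 Go bindings, applying the
// security config shown above.
package main

import (
    "context"
    "log"

    "github.com/containers/podman/v5/pkg/bindings"
    "github.com/containers/podman/v5/pkg/bindings/containers"
    "github.com/containers/podman/v5/pkg/specgen"
)

func main() {
    boolTrue, boolFalse := true, false

    // Connect to the rootless podman socket of the service user (path assumed).
    conn, err := bindings.NewConnection(context.Background(),
        "unix:///run/user/1000/podman/podman.sock")
    if err != nil {
        log.Fatal(err)
    }

    // Build the container spec; "base-image:latest" is a placeholder image.
    s := specgen.NewSpecGenerator("base-image:latest", false)
    s.Name = "base"
    s.ContainerSecurityConfig = specgen.ContainerSecurityConfig{
        Privileged:         &boolTrue,
        User:               "root",
        ReadOnlyFilesystem: &boolFalse,
    }

    // Create the container from the spec, then start it.
    created, err := containers.CreateWithSpec(conn, s, nil)
    if err != nil {
        log.Fatal(err)
    }
    if err := containers.Start(conn, created.ID, nil); err != nil {
        log.Fatal(err)
    }
}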

The base container then creates a user 'user1' (uid 1101) and group 'user1' for the client user via groupadd and useradd, and sets up user1's home directory for the Podman config:
NEW_USER=user1
USERDIR=/home/user1
mkdir -p ${USERDIR}/.config/containers
chown ${NEW_USER}:oci ${USERDIR}/.config
cp /etc/podman-containers.conf ${USERDIR}/.config/containers/containers.conf
chown ${NEW_USER}:oci ${USERDIR}/.config/containers
chown ${NEW_USER}:oci ${USERDIR}/.config/containers/containers.conf
mkdir -p /run/user/1101 && chown ${NEW_USER}:oci /run/user/1101
export XDG_RUNTIME_DIR=/run/user/1101

/etc/subuid and /etc/subgid inside the base container contain:
user1:165536:65536

The ciagent then uses the Podman bindings API to exec and attach as user1 inside the base container, so that user1 can run shell commands there.

Inside the base container, user1 sets up its storage.conf and starts a "build-agent" container:
/home/user1/.config/containers/storage.conf contains:
[storage]
driver = "overlay"
runroot = "/run/user/1101/containers"
graphroot = BUILD_AGENT_IMAGE_STORAGE_DIR

podman run --privileged --network=host --cgroupns=private --cgroup-manager=cgroupfs -v step-image-volume:/var/lib/containers/storage:Z --name build-agent BUILD_AGENT_IMAGE BUILD_STEP_IMAGE

Inside the build-agent container, root starts the "build-step" container:
podman run --detach --privileged --network=host --cgroup-manager=cgroupfs --cgroupns=private --name build-step BUILD_STEP_IMAGE

Describe the results you received

The build-step container fails to start with the following error:
time="2025-05-24T23:27:16Z" level=warning msg="Failed to add conmon to cgroupfs sandbox cgroup: creating cgroup for cpu: mkdir /sys/fs/cgroup/cpu: read-only file system"
Error: OCI runtime error: runc: runc create failed: no cgroup mount found in mountinfo

Describe the results you expected

The third container (build-step) is expected to start successfully.

podman info output

In base container:

$ podman info
host:
  arch: arm64
  buildahVersion: 1.33.12
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.module+el8.10.0+90541+332b2aa7.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 6176697378ce26e2229ce82838e17694b054d86f'
  cpuUtilization:
    idlePercent: 74.08
    systemPercent: 6.68
    userPercent: 19.24
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: ol
    variant: server
    version: "8.10"
  eventLogger: file
  freeLocks: 2048
  hostname: e17529799dd0
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1101
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1101
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.4.17-2136.343.5.1.el8uek.aarch64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 2354905088
  memTotal: 12461932544
  networkBackend: cni
  networkBackendInfo:
    backend: cni
    dns:
      package: podman-plugins-4.9.4-20.0.1.module+el8.10.0+90541+332b2aa7.aarch64
      path: /usr/libexec/cni/dnsname
      version: |-
        CNI dnsname plugin
        version: 1.4.0-dev
        commit: unknown
        CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
    package: containernetworking-plugins-1.4.0-5.module+el8.10.0+90541+332b2aa7.aarch64
    path: /usr/libexec/cni
  ociRuntime:
    name: runc
    package: runc-1.1.12-6.module+el8.10.0+90541+332b2aa7.aarch64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.12
      spec: 1.2.0+dev
      go: go1.22.9 (Red Hat 1.22.9-1.module+el8.10.0+90476+bb48cc15)
      libseccomp: 2.5.2
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: /run/user/1101/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.module+el8.10.0+90541+332b2aa7.aarch64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 2654863360
  swapTotal: 2654863360
  uptime: 0h 8m 18.00s
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - container-registry.oracle.com
  - docker.io
store:
  configFile: /home/akang/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /tmp/step2/storage
  graphRootAllocated: 5196382208
  graphRootUsed: 176128
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1101/containers
  transientStore: false
  volumePath: /tmp/step2/storage/volumes
version:
  APIVersion: 4.9.4-rhel
  Built: 1742989845
  BuiltTime: Wed Mar 26 11:50:45 2025
  GitCommit: ""
  GoVersion: go1.22.9 (Red Hat 1.22.9-1.module+el8.10.0+90476+bb48cc15)
  Os: linux
  OsArch: linux/arm64
  Version: 4.9.4-rhel

Podman in a container

Yes

Privileged Or Rootless

Rootless with --privileged

Upstream Latest Release

OL8 podman release

Additional environment details

SELinux: the OL8 host has SELinux in enforcing mode. The base, build-agent, and build-step containers have SELinux disabled.

VM Host:
Linux fe7874bcc246 5.4.17-2136.335.4.el8uek.x86_64 #3 SMP Thu Aug 22 12:18:30 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
cgroup v2 was enabled by running the following commands and then rebooting:
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1 user_namespace.enable=1"
echo "user.max_user_namespaces=28633" | sudo tee -a /etc/sysctl.conf

Dockerfiles of the build-agent and build-step containers:
Both the build-agent and build-step containers use the base image container-registry.oracle.com/os/oraclelinux:8. The OS architecture is x86.

Additional Information

The following is some debug output from the build-agent container, captured while its entrypoint script runs right before starting the build-step container:

+ uname -a
Linux fe7874bcc246 5.4.17-2136.335.4.el8uek.x86_64 #3 SMP Thu Aug 22 12:18:30 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
The step image is localhost/step:latest
The Agent entrypoint is running by:
+ echo 'The step image is localhost/step:latest'
+ STEP_IMAGE=localhost/step:latest
+ echo 'The Agent entrypoint is running by:'
+ id
uid=0(root) gid=0(root) groups=0(root)
Environment variables available in Agent container: 
+ echo 'Environment variables available in Agent container: '
+ printenv
HOSTNAME=fe7874bcc246
container=podman
PWD=/
HOME=/root
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/printenv
Create /etc/containers/containers.conf
+ echo 'Create /etc/containers/containers.conf'
+ cat
Setup /etc/subuid /etc/subgid for Agent container
+ echo 'Setup /etc/subuid /etc/subgid for Agent container'
+ echo root:300000:65536
+ echo root:300000:65536
Take care /usr/bin/newuidmap /usr/bin/newgidmap
+ echo 'Take care /usr/bin/newuidmap /usr/bin/newgidmap'
+ chown root:root /usr/bin/newuidmap /usr/bin/newgidmap
+ chmod u+s /usr/bin/newuidmap /usr/bin/newgidmap
+ ls -l /usr/bin/newuidmap
-rwsr-xr-x. 1 root root 48904 Apr  9  2024 /usr/bin/newuidmap
+ ls -l /usr/bin/newgidmap
-rwsr-xr-x. 1 root root 48944 Apr  9  2024 /usr/bin/newgidmap
+ setcap cap_setuid+ep /usr/bin/newuidmap
+ setcap cap_setgid+ep /usr/bin/newgidmap
+ getcap /usr/bin/newuidmap /usr/bin/newgidmap
/usr/bin/newuidmap cap_setuid=ep
/usr/bin/newgidmap cap_setgid=ep
+ echo 'Check /etc/subuid /etc/subgid'
Check /etc/subuid /etc/subgid
+ cat /etc/subuid
root:300000:65536
+ cat /etc/subgid
root:300000:65536
+ echo 'Check cgroup'
Check cgroup
+ ls -ld /sys/fs/cgroup
drwxrwxrwt. 2 root root 40 May 26 00:48 /sys/fs/cgroup
+ ls -ld '/sys/fs/cgroup/*'
+ head -5
ls: cannot access '/sys/fs/cgroup/*': No such file or directory
+ true
+ mount
+ grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
tmpfs on /sys/fs/cgroup type tmpfs (ro,relatime,seclabel,uid=1101,gid=1101)
+ lsns -t cgroup
        NS TYPE   NPROCS   PID USER COMMAND
4026532705 cgroup      4  1399 root catatonit -P
4026532712 cgroup      2  2572 root /bin/bash /usr/local/bin/dlcbld_agent_entrypoint.sh localhost/step:latest
+ grep cgroup2 /proc/filesystems
nodev   cgroup2
+ cat /sys/fs/cgroup/cgroup.subtree_control
+ head -5
cat: /sys/fs/cgroup/cgroup.subtree_control: No such file or directory
+ true
+ cat /proc/self/cgroup
0::/
+ grep CapEff
+ cat /proc/self/status
CapEff: 0000003fffffffff
+ stat -fc %T /sys/fs/cgroup/
tmpfs
+ cat /proc/self/uid_map
         0       1101          1
         1     165536      65536
+ cat /proc/filesystems
+ grep cgroup2
nodev   cgroup2
+ echo 'Check current init system'
Check current init system
+ ps -p 1 -o comm=
bash
Check SELinux enabled
+ echo 'Check SELinux enabled'
+ getenforce
Disabled
