setting cgroup config for procHooks process caused: open pids.max: no such file or directory #11632

Closed
juspence opened this issue Sep 17, 2021 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@juspence

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

After adding systemd.unified_cgroup_hierarchy=1 to the kernel command line, containers have trouble starting due to a missing "pids.max" file.
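Before digging further, it can help to confirm that the flag is actually active. The following is a minimal sketch, not part of the original report: the helper names `has_unified_flag` and `cgroup_fs_type` are hypothetical, and it assumes a `stat` that supports `-f -c %T` (GNU coreutils), which reports `cgroup2fs` for a cgroup v2 mount.

```shell
# Hypothetical helper: succeed if a kernel command line string contains
# the unified-hierarchy flag as a separate word.
# Usage: has_unified_flag "<contents of /proc/cmdline>"
has_unified_flag() {
    case " $1 " in
        *" systemd.unified_cgroup_hierarchy=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the filesystem type mounted at a path.
# A cgroup v2 mount is reported as "cgroup2fs".
cgroup_fs_type() {
    stat -fc %T "${1:-/sys/fs/cgroup}"
}

# Report on the running system, guarding against missing files.
if [ -r /proc/cmdline ] && has_unified_flag "$(cat /proc/cmdline)"; then
    echo "kernel command line requests the unified cgroup hierarchy"
else
    echo "unified cgroup hierarchy flag not found on the kernel command line"
fi
if [ -d /sys/fs/cgroup ]; then
    echo "/sys/fs/cgroup filesystem type: $(cgroup_fs_type /sys/fs/cgroup)"
fi
```

The flag check only shows what was requested at boot; the filesystem type shows what is actually mounted.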

This is a different error than #26 in troubleshooting.md, which is about a failure to open a cpu.max file that exists with the wrong permissions.

This issue does look similar to #10824 which was fixed in 3.3.0, but that issue is for podman machine / VM startup failures and not containers.

This issue actually looks most similar to #10800. I am running RHEL 8.4 not Alpine, but I did enable cgroups v2 / unified cgroup hierarchy, and the error message that user saw when removing a pod was also related to a missing pids.max file.

I had this issue with an internal application, which was built from a Dockerfile and started using podman-compose. I was able to start the pod / containers without rebooting after I removed and rebuilt the image a few times.

I also ran some other podman commands, but I don't remember exactly what they were. It boiled down to "remove all volumes, networks, pods, and images", then rebuild from scratch.

Once I built the application and started it successfully, I could stop and restart it normally with no issues. But the error always reappeared after a reboot.

I was not able to work around the issue for my minimal testcase below (using the redis image). The error happens even for a clean image pull / container start.

Steps to reproduce the issue:

  1. podman pull 'docker.io/library/redis:latest'

  2. podman create redis

  3. podman start <container ID printed by step 2>

Describe the results you received:
Container failed to start with an error message about a missing "pids.max" file.

Describe the results you expected:
Container starts with no error message.

Additional information you deem important (e.g. issue happens only occasionally):
Issue happens consistently.

Output of podman version:

Version:      3.2.3
API Version:  3.2.3
Go Version:   go1.15.7
Built:        Tue Jul 27 03:29:39 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.21.3
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.29-1.module+el8.4.0+11822+6cc1e7d7.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: ae467a0c8001179d4d0adf4ada381108a893d7ec'
  cpus: 6
  distribution:
    distribution: '"rhel"'
    version: "8.4"
  eventLogger: file
  hostname: localhost
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
  kernel: 4.18.0-305.17.1.el8_4.x86_64
  linkmode: dynamic
  memFree: 26769522688
  memTotal: 33257762816
  ociRuntime:
    name: runc
    package: runc-1.0.0-74.rc95.module+el8.4.0+11822+6cc1e7d7.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.13
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/user/1002/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.module+el8.4.0+11822+6cc1e7d7.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 20m 31.05s
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.6-1.module+el8.4.0+11822+6cc1e7d7.x86_64
      Version: |-
        fusermount3 version: 3.2.1
        fuse-overlayfs: version 1.6
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 1
  runRoot: /run/user/1002/containers
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 3.2.3
  Built: 1627370979
  BuiltTime: Tue Jul 27 03:29:39 2021
  GitCommit: ""
  GoVersion: go1.15.7
  OsArch: linux/amd64
  Version: 3.2.3

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.2.3-0.10.module+el8.4.0+11989+6676f7ad.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Tested with 3.2.3, the latest version available for my RHEL8 system. If newer CentOS RPMs are available and installing these on RHEL will not cause issues, I can test with these.

Additional environment details (AWS, VirtualBox, physical, etc.):

cat "/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"
Shows an empty file. I have not created any delegate.conf or customized any Podman / systemd container service unit files.
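
The check above can be scripted so that it names the controllers that are absent. This is a minimal sketch under stated assumptions: `missing_controllers` is a hypothetical helper, and the list of controllers passed to it (`cpu memory pids`) is only illustrative, chosen because the error concerns `pids.max`.

```shell
# Hypothetical helper: print the wanted controllers that are NOT listed in a
# cgroup.controllers file (a single line of space-separated names).
# $1: path to a cgroup.controllers file; remaining args: controller names.
missing_controllers() {
    local file="$1"; shift
    local available wanted missing=""
    # Pad with spaces so each name can be matched as a whole word.
    available=" $(cat "$file" 2>/dev/null) "
    for wanted in "$@"; do
        case "$available" in
            *" $wanted "*) ;;                  # controller is delegated
            *) missing="$missing $wanted" ;;   # controller is absent
        esac
    done
    echo "${missing# }"
}

# Same path as in the cat command above, for the current user.
controllers_file="/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"
if [ -r "$controllers_file" ]; then
    echo "missing controllers: $(missing_controllers "$controllers_file" cpu memory pids)"
else
    echo "cannot read $controllers_file"
fi
```

With an empty file, as on this system, every requested controller is reported as missing, which matches `cgroupControllers: []` in the `podman info` output.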

(venv) [user@localhost flawdb]$ podman pull 'docker.io/library/redis:latest'
Trying to pull docker.io/library/redis:latest...
Getting image source signatures
Copying blob 5da5e1b21a2f done
Copying blob 6af3a5ca4596 done
Copying blob a330b6cecb98 done
Copying blob 4f9efe5b47a5 done
Copying blob 8b3e2d14a955 done
Copying blob 14bfbab96d75 done
Copying config 02c7f20544 done
Writing manifest to image destination
Storing signatures
02c7f2054405dadaf295fac7281034e998646996e9768e65a78f90af62218be3

(venv) [user@localhost flawdb]$ podman create redis
9fc09e52c64dba93ea80ad27a441aa5ecb87492d7ada9bbd6df62fb58ba3452e

(venv) [user@localhost flawdb]$ podman start 9fc09e52c64d
Error: unable to start container "9fc09e52c64dba93ea80ad27a441aa5ecb87492d7ada9bbd6df62fb58ba3452e": container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/user.slice/user-1002.slice/user@1002.service/user.slice/libpod-9fc09e52c64dba93ea80ad27a441aa5ecb87492d7ada9bbd6df62fb58ba3452e.scope/pids.max: no such file or directory: OCI runtime attempted to invoke a command that was not found
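
The error message names the exact cgroup interface file that could not be opened, and the file name prefix identifies the controller involved. A minimal sketch of pulling both out of such a message follows; `missing_cgroup_file` and `controller_of` are hypothetical helpers, and the pattern assumes the message has the same "open <path>: no such file or directory" shape as the one above.

```shell
# Hypothetical helper: extract the /sys/fs/cgroup path that could not be
# opened from a message of the form
# "... open <path>: no such file or directory ...".
# Prints nothing if the message does not match.
missing_cgroup_file() {
    printf '%s\n' "$1" |
        sed -n 's|.*open \(/sys/fs/cgroup/[^:]*\): no such file or directory.*|\1|p'
}

# Hypothetical helper: derive the controller name from an interface file
# path, e.g. ".../pids.max" gives "pids".
controller_of() {
    local base="${1##*/}"   # strip the directory part
    echo "${base%%.*}"      # keep the text before the first dot
}
```

For the message above, the extracted file is the `pids.max` path under the `libpod-….scope` directory and the derived controller is `pids`, which is consistent with the empty `cgroup.controllers` file.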

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 17, 2021
@mheon
Member

mheon commented Sep 17, 2021

@giuseppe PTAL - missing cgroup controller, probably?

@giuseppe
Member

I am closing this issue since the problem seems to be in the systemd version shipped in RHEL 8 and the bug is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1897579
