
Containers created with cgroup_manager=cgroupfs fail obscurely with new default settings #7830

Closed
owtaylor opened this issue Sep 29, 2020 · 3 comments · Fixed by #7970

@owtaylor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

At some point in the past, libpod apparently wrote a libpod.conf that included cgroup_manager=cgroupfs (there is a small chance that I added it manually as a workaround for some past issue).

When I upgraded to podman-2.1.1, I got the new warning:

 Ignoring libpod.conf EventsLogger setting %q. Use containers.conf if you want to change this setting and remove libpod.conf files.

I inspected libpod.conf, determined that there was nothing in there that looked like local configuration, and deleted it. At this point, my old containers no longer started, with the error message:

$ podman start fedora-toolbox-33
Error: unable to start container "72f11d1d13db67827c20861e1c3017a9e4fce4a75ffdb0c067dad6b34bdc1f19": sd-bus call: Invalid argument: OCI runtime error

After an hour or so of investigation (noticing that new containers did start, and diffing debug output between new and old containers), I determined that my old containers could be started by manually passing --cgroup-manager=cgroupfs.

It was pointed out to me that there is a note in the podman(1) man page: "Note: Setting this flag can cause certain commands to break when called on containers previously created by the other CGroup manager type." But even if I had found this note, it would not necessarily have been clear to me that "certain commands" includes starting the container; the wording sounds like it is referring to edge cases.

Steps to reproduce the issue:

$ podman --cgroup-manager=cgroupfs run --name=test1 fedora:32 echo hi
hi
$ podman start test1
Error: unable to start container "ac41f25244ff62da8d3702d389ca11e41fc74fc464d7cfa31914580817225978": sd-bus call: Invalid argument: OCI runtime error

Describe the results you expected:

One of the following:

  1. The container runs properly with the systemd cgroup manager; from a user perspective, it's hard to see why this setting can't be used with a previously created container.
  2. libpod detects that the container was created with the cgroupfs cgroup manager and automatically uses that manager when starting it.
  3. libpod gives a clear error message saying what is going wrong and how to fix it.

From a toolbox perspective, 1 or 2 would be better, since the user doesn't invoke podman directly, and just expects their previously created toolbox to keep working. Having toolbox detect the situation and add --cgroup-manager seems less than ideal.
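
(For illustration, option 2 could plausibly be approximated even without storing new state, since the two managers produce differently shaped cgroup parents: the systemd manager assigns slice units ending in ".slice", while the cgroupfs manager assigns plain paths such as libpod's default /libpod_parent. A minimal standalone Go sketch of such a heuristic follows; guessCgroupManager is a hypothetical name, not libpod API.)

package main

import (
    "fmt"
    "strings"
)

// guessCgroupManager is a hypothetical heuristic, not libpod API:
// the systemd manager assigns slice units (e.g. "user.slice") as
// cgroup parents, while the cgroupfs manager assigns plain paths
// (libpod's default is "/libpod_parent").
func guessCgroupManager(cgroupParent string) string {
    if cgroupParent == "" {
        return "unknown"
    }
    if strings.HasSuffix(cgroupParent, ".slice") {
        return "systemd"
    }
    return "cgroupfs"
}

func main() {
    for _, parent := range []string{"user.slice", "/libpod_parent", ""} {
        fmt.Printf("%q -> %s\n", parent, guessCgroupManager(parent))
    }
}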

Output of podman version:

Version:      2.1.1
API Version:  2.0.0
Go Version:   go1.15.1
Built:        Sun Sep 27 09:43:05 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.16.1
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.21-0.3.dev.git5a6b2ac.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21-dev, commit: 5c03a23398b94cf869f071128631ef7fd9153b3b'
  cpus: 8
  distribution:
    distribution: fedora
    version: "33"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.8.11-300.fc33.x86_64
  linkmode: dynamic
  memFree: 22865244160
  memTotal: 33439457280
  ociRuntime:
    name: crun
    package: crun-0.15-3.fc33.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.15
      commit: 56ca95e61639510c7dbd39ff512f80f626404969
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.4-4.dev.giteecccdb.fc33.x86_64
    version: |-
      slirp4netns version 1.1.4+dev
      commit: eecccdb96f587b11d7764556ffacfeaffe4b6e11
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 21082660864
  swapTotal: 21082660864
  uptime: 1h 42m 20.93s (Approximately 0.04 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /var/home/otaylor/.config/containers/storage.conf
  containerStore:
    number: 20
    paused: 0
    running: 1
    stopped: 19
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.2-1.fc33.x86_64
      Version: |-
        fusermount3 version: 3.9.3
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.3
        using FUSE kernel interface version 7.31
  graphRoot: /var/home/otaylor/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 216
  runRoot: /run/user/1000/containers
  volumePath: /var/home/otaylor/.local/share/containers/storage/volumes
version:
  APIVersion: 2.0.0
  Built: 1601214185
  BuiltTime: Sun Sep 27 09:43:05 2020
  GitCommit: ""
  GoVersion: go1.15.1
  OsArch: linux/amd64
  Version: 2.1.1

Package info (e.g. output of rpm -q podman or apt list podman):

$ rpm -q podman
warning: Found bdb Packages database while attempting sqlite backend: using bdb backend.
podman-2.1.1-2.fc33.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 29, 2020
@vrothberg
Member

Thanks for opening the issue, @owtaylor!

Podman does not store the cgroup manager a container was created with; it is a global setting. If a container is created with --cgroup-manager=X, it must also be started with --cgroup-manager=X.

It recently bit me as well, and I'd love for Podman to store the cgroup manager in the database and reuse it for subsequent commands.

@baude @giuseppe @mheon @rhatdan WDYT?

@owtaylor
Author

I guess parsing the cgroup-manager out of the ExitCommand would be bad form?

Presumably there is something in the container configuration that causes this particular failure (wild guess: CgroupParent=""?). If there is no general solution that covers all cases, shouldn't it be possible to catch this particular case and either do something reasonable or present a good error message?
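
(Something along those lines might look like the following start-time check, which would turn the opaque "sd-bus call: Invalid argument" into an actionable message. This is a rough standalone Go sketch built on the same slice-suffix heuristic as above; validateCgroupManager is a hypothetical name, not real libpod code.)

package main

import (
    "fmt"
    "strings"
)

// validateCgroupManager is a hypothetical start-time check: if the
// container's stored cgroup parent does not match the cgroup manager
// Podman is currently configured to use, fail with an actionable
// message instead of an opaque OCI runtime error.
func validateCgroupManager(ctrName, cgroupParent, activeManager string) error {
    createdWith := "cgroupfs"
    if strings.HasSuffix(cgroupParent, ".slice") {
        createdWith = "systemd"
    }
    if createdWith != activeManager {
        return fmt.Errorf(
            "container %s was created with the %s cgroup manager but %s is currently in use; retry with --cgroup-manager=%s",
            ctrName, createdWith, activeManager, createdWith)
    }
    return nil
}

func main() {
    if err := validateCgroupManager("fedora-toolbox-33", "/libpod_parent", "systemd"); err != nil {
        fmt.Println("Error:", err)
    }
}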

@mheon
Member

mheon commented Oct 8, 2020

I have a PR that will fix this for new containers, but I don't think we can fix existing containers easily.

mheon added a commit to mheon/libpod that referenced this issue Oct 8, 2020
When we create a container, we assign a cgroup parent based on
the current cgroup manager in use. This parent is only usable
with the cgroup manager the container is created with, so if the
default cgroup manager is later changed or overridden, the
container will not be able to start.

To solve this, store the cgroup manager that created the
container in container configuration, so we can guarantee a
container with a systemd cgroup parent will always be started
with systemd cgroups.

Unfortunately, this is very difficult to test in CI, due to the
fact that we hard-code cgroup manager on all invocations of
Podman in CI.

Fixes containers#7830

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
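
(For illustration, the shape of the fix described in this commit message might look roughly like the following standalone Go sketch. The field and function names are hypothetical stand-ins for libpod's actual container configuration types: record the manager at creation, prefer it at start, and fall back to the current default for pre-fix containers, which is why existing containers cannot easily be fixed.)

package main

import "fmt"

// ContainerConfig is a hypothetical stand-in for libpod's persisted
// per-container configuration.
type ContainerConfig struct {
    Name          string
    CgroupParent  string
    CgroupManager string // recorded at creation time by the fix
}

// newContainer records the cgroup manager in effect at creation, so
// later commands can reuse it regardless of the current default.
func newContainer(name, activeManager string) *ContainerConfig {
    parent := "/libpod_parent" // cgroupfs-style default parent
    if activeManager == "systemd" {
        parent = "user.slice" // systemd-style default for rootless
    }
    return &ContainerConfig{Name: name, CgroupParent: parent, CgroupManager: activeManager}
}

// managerForStart prefers the recorded manager; containers created
// before the fix have an empty field and fall back to the current
// default, which is why they cannot easily be fixed.
func managerForStart(c *ContainerConfig, currentDefault string) string {
    if c.CgroupManager != "" {
        return c.CgroupManager
    }
    return currentDefault
}

func main() {
    preFix := &ContainerConfig{Name: "old", CgroupParent: "/libpod_parent"}
    postFix := newContainer("new", "cgroupfs")
    fmt.Println(managerForStart(preFix, "systemd"))  // "systemd": still mismatched
    fmt.Println(managerForStart(postFix, "systemd")) // "cgroupfs": recorded at creation
}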
mheon added a commit to mheon/libpod that referenced this issue Oct 14, 2020, with the same commit message as above.
edsantiago pushed a commit to edsantiago/libpod that referenced this issue Nov 4, 2020, with the same commit message as above.
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023