Cannot start podman systemd service #13731

Closed
bkaczynski opened this issue Mar 31, 2022 · 14 comments · Fixed by #13765
Labels
kind/bug, locked - please file new issue/PR

Comments

@bkaczynski

bkaczynski commented Mar 31, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I have a homelab NUC server running openSUSE MicroOS, and while creating a pod I ran into errors enabling containers after a reboot. The more complex scenario I tested is pasted below. I've tried loginctl enable-linger <user>, and I've added these lines to my ~/.bashrc file:

export XDG_RUNTIME_DIR=/run/user/$(id -u)
export DBUS_SESSION_BUS_ADDRESS=unix:path=${XDG_RUNTIME_DIR}/bus

but it doesn't change the behavior.
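As a sanity check, lingering can be verified with loginctl (Linger=yes means it took effect):

loginctl show-user $(whoami) --property=Linger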

Steps to reproduce the issue:

  1. podman run -d --name nextcloud -p 8080:80 docker.io/library/nextcloud

  2. cd ~/.config/systemd/user/

  3. podman generate systemd --files --name nextcloud

  4. systemctl enable --user container-nextcloud.service

  5. podman stop nextcloud

  6. systemctl --user start container-nextcloud.service
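For reference (step 3 above), the unit that podman generate systemd emits for this container looks roughly like the sketch below; exact contents (including a PIDFile= line omitted here) vary by Podman version:

[Unit]
Description=Podman container-nextcloud.service
Wants=network-online.target
After=network-online.target

[Service]
Type=forking
Restart=on-failure
ExecStart=/usr/bin/podman start nextcloud
ExecStop=/usr/bin/podman stop -t 10 nextcloud

[Install]
WantedBy=default.target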

Describe the results you received:

From the last command I receive this output:

Job for container-nextcloud.service failed because the control process exited with error code.
See "systemctl --user status container-nextcloud.service" and "journalctl --user -xeu container-nextcloud.service" for details.

When I check journalctl --user -xeu container-nextcloud.service, I find this entry:

Mar 31 08:19:23 srv podman[7950]: Error: OCI runtime error: unable to start container "e151773c7ecc997a82fcb5b722fc5604abb76bc2e3e3bb1589c42fae1038e519": runc create failed: unable to start container process: can't get final child's PID from pipe: EOF
Mar 31 08:19:23 srv systemd[5223]: container-nextcloud.service: Control process exited, code=exited, status=125/n/a

Describe the results you expected:

Nextcloud systemd service up and running

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.4.4
API Version:  3.4.4
Go Version:   go1.13.15
Built:        Thu Dec  9 01:00:00 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-1.3.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: unknown'
  cpus: 8
  distribution:
    distribution: '"opensuse-microos"'
    version: "20220329"
  eventLogger: journald
  hostname: srv
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.16.15-1-default
  linkmode: dynamic
  logDriver: journald
  memFree: 30281973760
  memTotal: 33198034944
  ociRuntime:
    name: runc
    package: runc-1.1.0-1.2.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.0
      commit: v1.1.0-0-g605c1cb1cc0c
      spec: 1.0.2-dev
      go: go1.17.7
      libseccomp: 2.5.3
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.11-1.4.x86_64
    version: |-
      slirp4netns version 1.1.11
      commit: unknown
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 11m 49.84s
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - docker.io
store:
  configFile: /home/bkaczynski/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /home/bkaczynski/.local/share/containers/storage
  graphStatus:
    Build Version: 'Btrfs v5.16.1 '
    Library Version: "102"
  imageStore:
    number: 3
  runRoot: /run/user/1000/containers
  volumePath: /home/bkaczynski/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 1639008000
  BuiltTime: Thu Dec  9 01:00:00 2021
  GitCommit: ""
  GoVersion: go1.13.15
  OsArch: linux/amd64
  Version: 3.4.4

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.4.4-2.2.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes, tested on

Client:       Podman Engine
Version:      4.0.2
API Version:  4.0.2
Go Version:   go1.16.15

Built:      Wed Mar 16 01:00:00 2022
OS/Arch:    linux/amd64

and also switched to the overlay storage backend, with the same results.

Additional environment details (AWS, VirtualBox, physical, etc.):

Tried on various physical machines with the same podman version.

openshift-ci bot added the kind/bug label Mar 31, 2022
@Vogtinator

Vogtinator commented Mar 31, 2022

Probably caused by the systemd 250 update and its oom_score_adj change: systemd/systemd#22437

You can try starting the container with --oom-score-adj=200 or undoing the systemd change to user@.service.
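For the first option, that means recreating the container with the flag and regenerating the unit file, roughly:

podman rm -f nextcloud
podman run -d --name nextcloud --oom-score-adj=200 -p 8080:80 docker.io/library/nextcloud
podman generate systemd --files --name nextcloud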

@AsterOps

OK, the issue is related to the systemd 250.4 update. As a workaround I reverted /usr/lib/systemd/system/user@.service to its content from before the update; the podman systemd service then works again.
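A drop-in override should achieve the same revert without touching the packaged file (assuming the change in question is the new OOMScoreAdjust=100 default for user managers; the drop-in file name here is arbitrary, and it only affects user managers started after the reload, i.e. after re-login):

mkdir -p /etc/systemd/system/user@.service.d
printf '[Service]\nOOMScoreAdjust=0\n' > /etc/systemd/system/user@.service.d/revert-oom.conf
systemctl daemon-reload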

@Luap99
Member

Luap99 commented Apr 1, 2022

@giuseppe @vrothberg PTAL
Looks like we have issues with newer systemd versions.

@heipa0
Copy link

heipa0 commented Apr 3, 2022

I can confirm this bug. Until yesterday everything worked fine on my openSUSE MicroOS.
Since the last update, none of my podman containers start via systemd anymore.

Before the update:
systemd 249 (249.7+suse.57.g523f32df57)

After the update:
systemd 250 (250.4+suse.35.g8ef8dfd540)

@mig4

mig4 commented Apr 3, 2022

I can confirm that starting the containers with the --oom-score-adj=200 option and regenerating the unit files makes those services start up OK.

I'm guessing the connection to systemd/systemd#22437 is that systemd now sets oom_score_adj to 100 for user services, so podman hits the same problem as the dbus-daemon case referenced in that issue, and podman will need to stop forcing oom_score_adj to 0 for user services, right?
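For illustration, the kernel side of this: once a privileged process (here systemd, running as root) has set oom_score_adj, descendants without CAP_SYS_RESOURCE may raise it but not lower it below that value, so from inside the user session a write of 0 fails, roughly:

cat /proc/self/oom_score_adj
100
echo 0 > /proc/self/oom_score_adj
sh: write error: Permission denied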

@Vogtinator

I'm guessing the connection to systemd/systemd#22437 is that systemd now sets oom_score_adj to 100 for user services, so podman hits the same problem as the dbus-daemon case referenced in that issue, and podman will need to stop forcing oom_score_adj to 0 for user services, right?

Yup. IMO nsexec could (optionally) just warn if it fails to change the OOM score and continue.

@vrothberg
Member

Why did systemd perform the change? I'd expect a number of units to start failing after the update.

@giuseppe WDYT?

giuseppe added a commit to giuseppe/libpod that referenced this issue Apr 4, 2022
do not force a value of OOMScoreAdj=0 if it wasn't specified by the
user.

It is a breaking change in the API, but it makes it clearer that
OOMScoreAdj is an optional attribute and is treated as such in the
backend.

Closes: containers#13731

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member

giuseppe commented Apr 4, 2022

opened a PR: #13765

@Vogtinator

opened a PR: #13765

Confirmed to work, thanks!

@mig4

mig4 commented Apr 10, 2022

Why did systemd perform the change? I'd expect a number of units to start failing after the update.

The change in systemd was introduced so that user services would be more likely to be killed by the OOM killer than system services or the service managers themselves. It did cause at least one other issue, in dbus-daemon, but the argument was that unprivileged processes shouldn't be trying to lower their score anyway.

@mwilck

mwilck commented Apr 12, 2022

I have a similar issue that is not solved by the current fix: starting a stopped container that had the OOMScoreAdj property set (it was created before the systemd change was applied) still fails.

...
          "HostConfig": {
...
              "OomScoreAdj": 0,

Any chance of fixing this case too (besides removing and re-creating the container)?

@Luap99
Member

Luap99 commented Apr 12, 2022

You have to recreate the container. There is no other way.

@Vogtinator

IMO the failure to adjust the OOM score could be downgraded to a warning instead of an error.

@rhatdan
Member

rhatdan commented Apr 12, 2022

You would need to get the OCI runtimes to agree to this. Podman asked for the OOM score to be set, and the OCI runtime could not set it.
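For context, podman passes the request through the OCI configuration: a container created without the fix carries something like the fragment below in its config.json, which the runtime is then obliged to apply:

"process": {
    "oomScoreAdj": 0,
    ...
}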

gbraad pushed a commit to gbraad-redhat/podman that referenced this issue Jul 13, 2022
do not force a value of OOMScoreAdj=0 if it wasn't specified by the
user.

Closes: containers#13731

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
github-actions bot added the locked - please file new issue/PR label Sep 20, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023