podman-restart.service causes shutdown to hang #14434

Closed
andrin55 opened this issue May 31, 2022 · 6 comments · Fixed by #14446
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@andrin55
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Enabling podman-restart.service causes shutdown to hang until the containers are killed after the timeout.

Steps to reproduce the issue:

  1. Create container with restart-policy=always

  2. Enable podman-restart.service

  3. Restart and observe log

Describe the results you received:
systemd waits 1m 30s on "libcrun container" before killing it.

Describe the results you expected:
Graceful shutdown of containers on shutdown

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

4.0.2

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: 3a898eb433ae426e729088ccdc2bdae44a3164da'
  cpus: 2
  distribution:
    distribution: '"rhel"'
    version: "9.0"
  eventLogger: journald
  hostname: localhost
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-70.13.1.el9_0.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 3225972736
  memTotal: 3865698304
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.4-2.el9_0.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-4.el9.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 4202688512
  swapTotal: 4202688512
  uptime: 3m 52.77s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 4
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 7
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1652984291
  BuiltTime: Thu May 19 20:18:11 2022
  GitCommit: ""
  GoVersion: go1.17.7
  OsArch: linux/amd64
  Version: 4.0.2

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.0.2-7.el9_0.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes (RHEL Podman)

Additional environment details (AWS, VirtualBox, physical, etc.):

Running latest RHEL9 Podman.

Because podman-restart.service has no ExecStop procedure, it times out when stopping:

systemd[1]: Stopping Podman Start All Containers With Restart Policy Set To Always...
systemd[1]: podman-restart.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: podman-restart.service: Killing process 970 (conmon) with signal SIGKILL.
systemd[1]: podman-restart.service: Killing process 972 (gmain) with signal SIGKILL.
systemd[1]: podman-restart.service: Failed with result 'timeout'.
systemd[1]: Stopped Podman Start All Containers With Restart Policy Set To Always.

However, this does not seem to stop the containers, which are then killed by systemd after the timeout.
Adding ExecStop to podman-restart.service solves the issue (since podman stop does not support the "--filter" flag, I had to use this workaround):
ExecStop=/bin/sh -c '/usr/bin/podman $LOGGING stop $(/usr/bin/podman container ls --filter restart-policy=always -q)'
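
For anyone who wants to try this workaround without editing the packaged unit file, here is a minimal sketch of the same line applied as a systemd drop-in. The drop-in file name is an assumption chosen for illustration, and the sketch assumes that $LOGGING is already defined by the packaged unit, as the ExecStop line above implies.

```ini
# Hypothetical drop-in: /etc/systemd/system/podman-restart.service.d/execstop.conf
# (the file name is an assumption; systemd reads any *.conf file in this directory)
[Service]
# Stop the same set of containers the unit starts: those with restart-policy=always.
# $LOGGING is assumed to be set via Environment= in the packaged unit.
ExecStop=/bin/sh -c '/usr/bin/podman $LOGGING stop $(/usr/bin/podman container ls --filter restart-policy=always -q)'
```

After creating the file, running systemctl daemon-reload lets systemd pick up the change. Note that if no running container matches the filter, podman stop ends up being called without arguments; this sketch does not handle that case.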

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label May 31, 2022
@mheon
Member

mheon commented May 31, 2022

Restart-policy doesn't actually use or require podman-restart.service - are you sure the two are related?

@andrin55
Contributor Author

Since I use podman-restart.service to start all containers with restart-policy=always, I added the ExecStop with the same filter so that it stops the same containers. The problem does not come from the restart policy. It comes from the fact that, without an ExecStop in podman-restart.service, systemd has to kill the running containers forcefully in order to shut down (since nothing else seems to stop them gracefully).

@rhatdan
Member

rhatdan commented May 31, 2022

Makes sense to me. @vrothberg WDYT?

@vrothberg
Member

> Makes sense to me. @vrothberg WDYT?

Sounds good to me. Adding a new --filter flag to podman stop would be nice to make the ExecStop more elegant.

@vrothberg
Member

@andrin55, interested in opening a PR to fix the issue? We can add the --filter flag at some later point.

@andrin55
Contributor Author

andrin55 commented Jun 1, 2022

@vrothberg I opened a new pull request. I messed up the sign-off on the previous one.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023