Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rootless podman container is not stopped properly on host shutdown (hangs) #15284

Closed
topas-rec opened this issue Aug 11, 2022 · 14 comments · Fixed by #15766
Closed

rootless podman container is not stopped properly on host shutdown (hangs) #15284

topas-rec opened this issue Aug 11, 2022 · 14 comments · Fixed by #15766
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@topas-rec
Copy link
Contributor

topas-rec commented Aug 11, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

At shutdown the system hangs for some time and does not shutdown as fast as it normally does.
I have added the container to systemd by using the generate systemd command to create a unit file.

When I stop the podman container that is running before shutdown, the shutdown does not hang.
Stopping the container manually via systemctl --user [...] stops it without problems.

Steps to reproduce the issue:

  1. Create a rootless container (nextcloud in my case)

  2. Create a systemd unit file by using the generate systemd command

  3. Shutdown the system and observe a delay at shutdown

Describe the results you received:
The system hangs for some time and does not shutdown as fast as it normally does because stopping the container doesn't work.

Describe the results you expected:
Properly shutdown container (stopped as fast as when the stop command in systemd is invoked manually) and a fast shutdown wihtout systemd to forcefully stop the container.

Additional information you deem important (e.g. issue happens only occasionally):
The issue happens on every shutdown. (It might not have happen after initial container creation - but I sadly do not remember)

Output of podman version:

Client:       Podman Engine
Version:      4.1.1
API Version:  4.1.1
Go Version:   go1.19
Git Commit:   f73d8f8875c2be7cd2049094c29aff90b1150241-dirty
Built:        Wed Aug  3 22:52:48 2022
OS/Arch:      linux/amd64

Output of podman info:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.3-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.3, commit: ab52a597278b20173440140cd810dc9fa8785c93'
  cpuUtilization:
    idlePercent: 92.78
    systemPercent: 1.78
    userPercent: 5.44
  cpus: 4
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  hostname: tobias-pc
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.18.16-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 3396468736
  memTotal: 8216518656
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.5-1
    path: /usr/bin/crun
    version: |-
      crun version 1.5
      commit: 54ebb8ca8bf7e6ddae2eb919f5b82d1d96863dea
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 4107792384
  swapTotal: 4107792384
  uptime: 34m 35.29s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/tobiasb/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/tobiasb/.local/share/containers/storage
  graphRootAllocated: 49381437440
  graphRootUsed: 27635216384
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  volumePath: /home/tobiasb/.local/share/containers/storage/volumes
version:
  APIVersion: 4.1.1
  Built: 1659559968
  BuiltTime: Wed Aug  3 22:52:48 2022
  GitCommit: f73d8f8875c2be7cd2049094c29aff90b1150241-dirty
  GoVersion: go1.19
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.1

Package info (e.g. output of rpm -q podman or apt list podman):

pacman --query --info podman
Name            : podman
Version         : 4.1.1-4
Description     : Tool and library for running OCI-based containers in pods
Architecture    : x86_64
URL             : https://github.com/containers/podman
Licenses        : Apache
Groups          : None
Provides        : None
Depends On      : catatonit  conmon  containers-common  crun  iptables  libdevmapper.so=1.02-64  libgpgme.so=11-64
                  libseccomp.so=2-64  slirp4netns
Optional Deps   : apparmor: for AppArmor support
                  btrfs-progs: support btrfs backend devices [installed]
                  netavark: for a new container-network-stack implementation [installed]
                  podman-compose: for docker-compose compatibility
                  podman-docker: for Docker-compatible CLI
Required By     : None
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 64.11 MiB
Packager        : David Runge <dvzrv@archlinux.org>
Build Date      : Wed 03 Aug 2022 22:52:48 CEST
Install Date    : Fri 05 Aug 2022 08:39:46 CEST
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes (troubleshooting guide, version is mostly up to date)

Additional environment details (AWS, VirtualBox, physical, etc.):

This is the systemd unit file:

# container-myNextcloudContainer.service
# autogenerated by Podman 4.1.1
# Mon Aug  8 09:41:15 CEST 2022

[Unit]
Description=Podman container-myNextcloudContainer.service
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=/run/user/1000/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
TimeoutStopSec=70
ExecStart=/usr/bin/podman start myNextcloudContainer
ExecStop=/usr/bin/podman stop -t 10 myNextcloudContainer
ExecStopPost=/usr/bin/podman stop -t 10 myNextcloudContainer
PIDFile=/run/user/1000/containers/overlay-containers/30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0/userdata/conmon.pid
Type=forking

[Install]
WantedBy=default.target

I have a filtered and a full journalctl log from an affected boot:

filtered: (created by journalctl --boot -1 --grep=container\|libpod\|podman\|myNextcloudContainer)

Aug 11 12:23:06 tobias-pc systemd[333]: Starting Podman container-myNextcloudContainer.service...
Aug 11 12:23:07 tobias-pc systemd[333]: libpod-30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0.scope: unit configures an IP firewall, but not running as root.
Aug 11 12:23:07 tobias-pc systemd[333]: Started libcrun container.
Aug 11 12:23:07 tobias-pc podman[441]: 2022-08-11 12:23:07.593536749 +0200 CEST m=+0.335178142 container init 30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0 (image=docker.io/library/nextcloud:latest, name=myNextcloudContainer)
Aug 11 12:23:07 tobias-pc podman[441]: 2022-08-11 12:23:07.600739971 +0200 CEST m=+0.342381357 container start 30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0 (image=docker.io/library/nextcloud:latest, name=myNextcloudContainer)
Aug 11 12:23:07 tobias-pc podman[441]: myNextcloudContainer
Aug 11 12:23:07 tobias-pc systemd[333]: Started podman-pause-b9433de2.scope.
Aug 11 12:23:07 tobias-pc systemd[333]: Started Podman container-myNextcloudContainer.service.
Aug 11 12:23:47 tobias-pc systemd[333]: Stopping libcrun container...
Aug 11 12:23:47 tobias-pc systemd[333]: Stopping podman-pause-b9433de2.scope...
Aug 11 12:23:47 tobias-pc systemd[333]: Stopping Podman container-myNextcloudContainer.service...
Aug 11 12:23:47 tobias-pc systemd[333]: Stopped podman-pause-b9433de2.scope.
Aug 11 12:23:47 tobias-pc systemd[333]: Stopped libcrun container.
Aug 11 12:23:48 tobias-pc podman[1836]: 2022-08-11 12:23:48.058918141 +0200 CEST m=+0.399080940 container died 30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0 (image=docker.io/library/nextcloud:latest, name=myNextcloudContainer)
Aug 11 12:23:48 tobias-pc podman[1836]: 2022-08-11 12:23:48.062772871 +0200 CEST m=+0.402935700 container restart 30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0 (image=docker.io/library/nextcloud:latest, name=myNextcloudContainer)
Aug 11 12:24:57 tobias-pc systemd[333]: container-myNextcloudContainer.service: Stopping timed out. Terminating.
Aug 11 12:24:57 tobias-pc systemd[333]: container-myNextcloudContainer.service: Main process exited, code=killed, status=10/USR1
Aug 11 12:24:57 tobias-pc podman[1883]: time="2022-08-11T12:24:57+02:00" level=error msg="Joining network namespace for container 30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0: error retrieving network namespace at /run/user/1000/netns/netns-f3a5790f-2f43-a4e5-5389-bd1fb024df41: unknown FS magic on \"/run/user/1000/netns/netns-f3a5790f-2f43-a4e5-5389-bd1fb024df41\": 1021994"
Aug 11 12:24:57 tobias-pc podman[1883]: time="2022-08-11T12:24:57+02:00" level=error msg="Joining network namespace for container 30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0: error retrieving network namespace at /run/user/1000/netns/netns-f3a5790f-2f43-a4e5-5389-bd1fb024df41: unknown FS magic on \"/run/user/1000/netns/netns-f3a5790f-2f43-a4e5-5389-bd1fb024df41\": 1021994"
Aug 11 12:24:57 tobias-pc podman[1883]: 2022-08-11 12:24:57.890991966 +0200 CEST m=+0.037434651 container restart 30f94855b53ac624877026608b13344235ad4321720aeaec000df352f3f325b0 (image=docker.io/library/nextcloud:latest, name=myNextcloudContainer)

Full journalctl of the same boot (created with journalctl --boot -1):
journalctl_podman_shutdown.log

Possible duplicate: #14531

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 11, 2022
@vrothberg
Copy link
Member

Thanks for reaching out, @topas-rec.

It looks similar to #14531. Did you configure a restart policy for the container?

Cc: @mheon

@topas-rec
Copy link
Contributor Author

Thanks for your quick help!

Yes, the container has --restart=always
Full run command is:

podman run --name=myNextcloudContainer --uidmap=0:1:33 --uidmap=33:0:1 --uidmap=34:34:65502 -dt --volume=nextcloudHtml:/var/www/html/ --volume=/home/tobiasb/DATA/:/media/DATA/ --restart=always -p 8084:80/tcp -e NEXTCLOUD_ADMIN_USER="[removed]" -e NEXTCLOUD_ADMIN_password="[removed]" nextcloud

@vrothberg
Copy link
Member

@topas-rec, thanks! Is the container getting restarted after a manual systemctl --user stop $.service as well?

@topas-rec
Copy link
Contributor Author

topas-rec commented Aug 11, 2022

No, it isn't. Stops fast through a systemd manual stop command and stays stopped.

After stopping it manually the shutdown is just fast, too. So when the container is stopped the symptom of a hangin shutdown is gone.

@vrothberg
Copy link
Member

Judging from the logs above it looks like the container gets restarted all the time until the 70 seconds timeout of the units kicks in.

@topas-rec
Copy link
Contributor Author

I find it hard to evaluate, but yeah that might be.
So should I use another restart policy?

@vrothberg
Copy link
Member

@topas-rec it would be worth trying if that is the cause - just to eliminate potential sources. Since the container is already running in a systemd unit, I suggest creating the container without a restart policy and use podman generate systemd --restart-policy instead. systemd can take care of that.

@topas-rec
Copy link
Contributor Author

Sure.
I'll love to see the machine shutdown in seconds again.

If that is the cause this mechanism ahould be noted in the docs for the generate systemd command.

@vrothberg
Copy link
Member

Sure. I'll love to see the machine shutdown in seconds again.

Absolutely :)

If that is the cause this mechanism ahould be noted in the docs for the generate systemd command.

Yes, definitely. Thanks for helping resolving the issue.

@topas-rec
Copy link
Contributor Author

Starting the container without --restart=always solves the problem.
I left the inital systemd unit file unchanged because I think the default restart policy on-failure is ok.

So there should be a hint in the generate systemd docs that says:
When using systemd to start and stop the container, a container restart policy of always might lead to a race condition at shutdown.

Thanks for helping me!

@vrothberg vrothberg reopened this Aug 11, 2022
@vrothberg
Copy link
Member

Thanks a lot, @topas-rec!

I reopen to track that we either document that or elaborate on code improvements.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Sep 12, 2022

@vrothberg Any progress on this?

vrothberg added a commit to vrothberg/libpod that referenced this issue Sep 13, 2022
Emit a warning to the user when generating a unit with --new on a
container that was created with a custom --restart policy. As shown
in containers#15284, a custom --restart policy in that case can lead to issues
on system shutdown where systemd attempts to nuke the unit but Podman
keeps on restarting the container.

Fixes: containers#15284
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@vrothberg
Copy link
Member

@vrothberg Any progress on this?

Thanks for the ping. I opened #15766.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 16, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants