Active podman process blocks system reboot/shutdown #14531

1player · 2022-06-08T12:50:44Z

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

An active podman process cannot be cleanly stopped by systemd during reboot/shutdown, so it has to be killed after the 2-minute grace period expires.

Steps to reproduce the issue:

1. podman run -it docker.io/library/busybox
2. Inside the container: sleep infinity
3. Reboot the system

Describe the results you received:

Shutdown procedure hangs for ~2 minutes because podman can't be stopped. Then podman is killed and shutdown is complete.

Describe the results you expected:

The podman container to be cleanly terminated as the system shuts down.

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.1.0-1.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Experienced this issue on Fedora Workstation 36 and Fedora Silverblue 36.

Downstream bug reports:

podman: https://bugzilla.redhat.com/show_bug.cgi?id=2084498
toolbox: https://bugzilla.redhat.com/show_bug.cgi?id=2081664

vrothberg · 2022-06-08T12:55:38Z

Thanks for reaching out, @1player.

I don't think there is much Podman can do. sleep in busybox does not seem to respond to SIGTERM, so systemd has to wait for the grace period to end before it can kill the process.

Luap99 · 2022-06-08T12:59:52Z

I agree, it is best to call podman stop before shutdown. It uses a 10-second timeout before sending SIGKILL, which can be changed with -t.

rhatdan · 2022-06-08T13:54:09Z

I wrote this in bugzilla too:
https://bugzilla.redhat.com/show_bug.cgi?id=2084498

I believe that podman run/start should catch SIGTERM and then execute podman stop on its pods/containers. This would cause the containers to exit properly or exit after 10 seconds.
This might be a slight deviation from Docker in some corner cases, but I believe this is the right behaviour, especially if a container is running with a STOP_SIGNAL that is different from SIGTERM. In the common case where the container's stop signal is SIGTERM, there is no change except that the container gets killed after 10 seconds. In the case where STOP_SIGNAL is set, the container has a chance to close cleanly (systemd-based containers, for example).
The only case that really changes is a corner case where the user expects a SIGTERM sent to Podman to be passed on as SIGTERM to the container, when the container is not using STOP_SIGNAL==SIGTERM. In this case, users could just call `podman kill --signal SIGTERM $CTR`

From a user point of view, I think this is the most user friendly way to handle this.

vrothberg · 2022-06-08T14:25:24Z

@rhatdan, I don't think that would help in this scenario.

If there's a container running that does not react to SIGTERM/stop etc., then systemd is blocked on the process.

We could think of a podman-shutdown.service that is called on shutdown, though.

rhatdan · 2022-06-08T14:29:42Z

Well, it would exit with a SIGKILL after 10 seconds. Having a podman-shutdown.service might make some sense; it could run:
podman pod stop --all
podman stop --all
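
A minimal sketch of what such a shutdown hook might run, following the order suggested above (pods first, then any remaining containers). The function name stop_everything, the PODMAN override variable, and the 10-second default are assumptions for illustration; the service itself is only a proposal in this thread.

```sh
#!/bin/sh
# Hypothetical helper for a shutdown hook; not part of Podman.
# PODMAN can be overridden (e.g. PODMAN=echo) to preview the commands.

# stop_everything [TIMEOUT_SECONDS]
# Stop all pods, then all remaining containers. Each container gets
# TIMEOUT_SECONDS to exit before Podman sends SIGKILL.
stop_everything() {
    timeout=${1:-10}
    "${PODMAN:-podman}" pod stop --all --time "$timeout"
    "${PODMAN:-podman}" stop --all --time "$timeout"
}
```

With PODMAN=echo the function only prints the two invocations, which is a cheap way to check them before wiring the script into a unit.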

mheon · 2022-06-08T14:40:23Z

Stop timeout is also user-configurable, so someone could theoretically have a container with a stop timeout of 90 seconds to ensure their container always has time to perform its safe shutdown routine, but that would still stall the system for 90 seconds on shutdown, potentially.

1player · 2022-06-08T14:49:49Z

I don't think there is much Podman can do. sleep in busybox does not seem to respond to SIGTERM, so systemd has to wait for the grace period to end before it can kill the process.

I think the sleep in my example is a red herring. I notice this problem every time I use toolbox on a Fedora machine. Whenever I reboot, systemd complains it's not able to stop podman.

I have switched to distrobox since, and I have the same problem. Should these two utilities pass some special option to podman to avoid this?

1player · 2022-06-08T14:53:45Z

Sorry for double posting, but please also note this comment of mine from https://bugzilla.redhat.com/show_bug.cgi?id=2081664#c2

As additional details, journalctl suggests that the hung shutdown is caused by /usr/bin/conmon not responding to signals. It only seems to be stuck when running an interactive process.
Example:
If I run toolbox run sleep infinity, the toolbox container can be stopped immediately with podman stop or sending SIGINT to the conmon process.
If I run toolbox run /bin/sh, the toolbox container CANNOT be stopped by podman stop (fails with: "container has active exec sessions, refusing to clean up: container state improper"), and conmon doesn't respond to SIGINT.

Is this happening because of podman "refusing to clean up"?

vrothberg · 2022-06-08T14:58:37Z

I have switched to distrobox since, and I have the same problem. Should these two utilities pass some special option to podman to avoid this?

I would expect the tools to manage the containers and call podman stop.

Is this happening because of podman "refusing to clean up"?

That may explain why the containers are still running: podman stop failed.

Meister1593 · 2022-06-28T13:32:57Z

Is this issue still being tracked? This is quite an annoying bug. Is there a workaround, at least forcefully killing containers through scripts?

vrothberg · 2022-06-28T13:44:33Z

Is this issue still being tracked? This is quite an annoying bug. Is there a workaround, at least forcefully killing containers through scripts?

The issue in 89luca89/distrobox#340 looks different than the one discussed here:

container has active exec sessions, refusing to clean up: container state improper

I do not know what distrobox does, but it needs to exit from all exec sessions before stopping the container. At the moment, I don't see how this relates to the initial bug here, where the container ignores a signal and gets killed after a grace period.

1player · 2022-06-28T15:08:57Z

This is not limited to distrobox. podman exhibits the exact same behaviour. It seems that running some applications inside the container puts it in a state in which podman/conmon refuses to stop it gracefully upon system shutdown.

I run emacs and pretty much all my dev tools inside a distrobox container, and most times it hangs on shutdown, but sometimes it doesn't.

I do not understand, as explained in #14531 (comment), why running toolbox run /bin/sh is reason enough for podman stop to quit working. I imagine the sh process would respond to a SIGTERM, and thus terminating a container should be possible.

Maybe it is caused by subshells spawned inside the container, which cause podman to refuse to terminate it, hence the delay until SIGKILL is sent.

Luap99 · 2022-06-28T15:17:29Z

It is your container process that is not responding to the signal. AFAIK shells do not shut down on SIGTERM.

1player · 2022-06-28T15:23:15Z

It is your container process that is not responding to the signal. AFAIK shells do not shut down on SIGTERM.

Are you saying that this is a toolbox and distrobox bug, and not podman?

Luap99 · 2022-06-28T15:37:46Z

Yes. What is podman/systemd supposed to do when your container process does not shut down on a normal stop signal, i.e. SIGTERM? The only thing to do is to wait and send SIGKILL after the timeout. You can change the stop signal and timeout with --stop-signal and --stop-timeout, but I guess this only works when the container is stopped via podman and not if systemd tries to kill it.
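
As a rough sketch of the two flags mentioned above, the helper below starts a detached container with a caller-chosen stop signal and stop timeout. The function name run_with_stop_policy and the PODMAN override variable are assumptions for illustration. Which signal a given program actually honors depends on that program, and, as noted, the settings should only matter when the container is stopped through podman stop.

```sh
#!/bin/sh
# Hypothetical wrapper; PODMAN can be overridden (e.g. PODMAN=echo)
# to preview the command instead of running it.

# run_with_stop_policy SIGNAL TIMEOUT_SECONDS IMAGE [COMMAND...]
# Start a detached container that `podman stop` will signal with SIGNAL
# and kill with SIGKILL if it is still running after TIMEOUT_SECONDS.
run_with_stop_policy() {
    signal=$1
    timeout=$2
    shift 2
    "${PODMAN:-podman}" run --detach \
        --stop-signal "$signal" --stop-timeout "$timeout" "$@"
}
```

For example, run_with_stop_policy SIGINT 5 docker.io/library/busybox top would ask Podman to send SIGINT on podman stop and escalate to SIGKILL after five seconds.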

1player · 2022-06-28T16:24:54Z

Yes. What is podman/systemd supposed to do when your container process does not shut down on a normal stop signal, i.e. SIGTERM? The only thing to do is to wait and send SIGKILL after the timeout. You can change the stop signal and timeout with --stop-signal and --stop-timeout, but I guess this only works when the container is stopped via podman and not if systemd tries to kill it.

Sorry for being obtuse, but then why does podman just throw its hands in the air and say "container has active exec sessions, refusing to clean up: container state improper" when running podman stop? It looks like it's refusing to do anything, not that it has sent a signal and nothing has responded.

Luap99 · 2022-06-28T16:33:28Z

I think you have to stop all exec sessions first; not sure whether podman stop should do that. @mheon might know that better?

mheon · 2022-06-28T17:18:19Z

Podman stop should do it. This is probably a distinct issue. Open a new bug with the full template filled out, please.

1player · 2022-06-28T20:13:39Z

Is it really a distinct issue? As I described above, this seems to be the cause of this problem. Podman refusing to stop a container because "it has active exec session", thus causing issues with toolbox, thus causing shutdown issues.

There are no particular logs to see, except that upon shutdown, journalctl points out that conmon had to be SIGKILL'd, as I mentioned above. I've provided a simple reproduction example; I'm not sure what more I can do.

Here's the gist of it: a podman container should always be able to be stopped, except in the case of an unresponsive process, in which case I would expect podman stop to send a SIGKILL. Otherwise, podman stop should stop a container and not complain about "active exec sessions", which I'm not sure I understand concretely.

mheon · 2022-06-29T18:12:56Z

Are you certain Podman is refusing to stop the container? That error message doesn't read as a stop error to me, but a cleanup error. The container should have exited at this point, Podman is just having trouble cleaning up after it.

mheon · 2022-06-29T18:14:22Z

Given this, it definitely smells like a different issue. Podman is seemingly having trouble handling cleanup on containers as the system shuts down, which is distinct from this issue where Podman takes a long time to kill containers that refuse to gracefully exit, causing shutdown to hang.

github-actions · 2022-07-30T00:06:38Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2022-07-30T15:11:11Z

@vrothberg @giuseppe @mheon Do any of the fixups made recently to deadlocks address this issue?

mheon · 2022-07-30T18:15:43Z

I suspect not.

1player · 2022-08-01T09:23:30Z

BTW, Fedora is supposed to shorten the timeout before unresponsive processes are SIGKILLed from 2 minutes down to 15 seconds, so if this is still open when that change ships, users won't notice anything during shutdown but containers will still be killed forcefully.

As a big toolbox/distrobox user, I get this issue 4 out of every 5 times I reboot my workstation, and I don't keep any long running services inside the container.

github-actions · 2022-09-01T00:08:15Z

A friendly reminder that this issue had no activity for 30 days.

1player · 2022-12-05T15:23:05Z

This is still an issue and is making life on Fedora Silverblue more painful than it needs to be.

vrothberg · 2022-12-06T12:47:00Z

@1player can you share the exact systemd unit that you run Podman in?

queeup · 2022-12-06T13:51:22Z

@vrothberg, you can test it with this container service on Silverblue. It takes 2 min to reboot/shutdown.

syncthing-test.service
- systemctl --user start syncthing-test.service
- Then reboot the system.

PS: This is the official syncthing container; I didn't add any volumes or published ports.

Dockerfile: https://github.com/syncthing/syncthing/blob/main/Dockerfile

The only way to reboot without waiting while this systemd container service is running is to pass --no-healthcheck in the podman args.

# autogenerated by Podman 4.3.1
# Tue Dec  6 16:27:12 +03 2022

[Unit]
Description=Podman syncthing-test.service
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=no
TimeoutStopSec=70
ExecStartPre=/bin/rm \
    -f %t/%n.ctr-id
ExecStart=/usr/bin/podman run \
    --cidfile=%t/%n.ctr-id \
    --cgroups=no-conmon \
    --rm \
    --sdnotify=conmon \
    --replace \
    --detach \
    --name syncthing-test docker.io/syncthing/syncthing
ExecStop=/usr/bin/podman stop \
    --ignore -t 10 \
    --cidfile=%t/%n.ctr-id
ExecStopPost=/usr/bin/podman rm \
    -f \
    --ignore -t 10 \
    --cidfile=%t/%n.ctr-id
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target

vrothberg · 2022-12-06T13:54:40Z

Thanks for sharing, @queeup! I will take a look tomorrow. It's surprising to me as the stop-timeout is set to 10. So the container should - in theory - be killed after 10 seconds.

vrothberg · 2022-12-07T09:35:00Z

I can reproduce

vrothberg · 2022-12-07T09:57:43Z

The image ships a health check (see below), so Podman will run it on container start. But even a simple alpine top container with --health-cmd /bin/ls causes the reboot to time out/hang.

"Healthcheck": {                      
  "Test": [                           
    "CMD-SHELL",                      
    "nc -z 127.0.0.1 8384 || exit 1"  
  ],                                  
  "Interval": 60000000000,            
  "Timeout": 10000000000              
},

vrothberg · 2022-12-08T12:27:29Z

I wish I had found more time to work on this bug. One thing I noticed while debugging is that we're stuck on stopping the transient health-check timer.

I hope to find some time tomorrow.

When stopping the transient systemd timer/unit which powers running health checks, make sure to ignore its dependencies. It turns out that we're otherwise running into a timeout when running a container in a systemd unit and reboot. An alternative may be to further tweak some attributes/options when creating the timer/unit via systemd-run but it seems safe to just ignore the dependencies and stop.

[NO NEW TESTS NEEDED] - we don't yet have means to test reboots.

Fixes: containers#14531
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
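
For readers who want to try the same idea by hand, a rough command-line counterpart of this change is systemctl's --job-mode=ignore-dependencies option. The sketch below is not the code from the pull request. The function name, the SYSTEMCTL override variable, and the assumption that the transient units are named after the container ID (<id>.timer and <id>.service) are illustrative and may not match every Podman version. Drop --user for rootful containers.

```sh
#!/bin/sh
# Hypothetical helper; SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo)
# to preview the command instead of running it.

# stop_healthcheck_units CONTAINER_ID
# Stop the transient health-check timer and service for one container
# without waiting on, or pulling in, the units they depend on.
# ASSUMPTION: the units are named CONTAINER_ID.timer / CONTAINER_ID.service.
stop_healthcheck_units() {
    cid=$1
    "${SYSTEMCTL:-systemctl}" --user stop --job-mode=ignore-dependencies \
        "$cid.timer" "$cid.service"
}
```

The full container ID can be read from the cidfile written by the generated unit, or from podman inspect.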

vrothberg · 2022-12-09T07:38:17Z

#16785 fixes the issue and will make it into Podman 4.4.

openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 8, 2022

1player mentioned this issue Jun 27, 2022

[BUG] Launched toolbox causes boot delays fedora-silverblue/issue-tracker#302

Closed

89luca89 mentioned this issue Jun 27, 2022

[Error] Containers do not gracefully stop on pc shutdown 89luca89/distrobox#340

Closed

github-actions bot added the stale-issue label Jul 30, 2022

rhatdan removed the stale-issue label Jul 30, 2022

topas-rec mentioned this issue Aug 11, 2022

rootless podman container is not stopped properly on host shutdown (hangs) #15284

Closed

github-actions bot added the stale-issue label Sep 1, 2022

rhatdan removed the stale-issue label Sep 4, 2022

vrothberg removed the kind/bug Categorizes issue or PR as related to a bug. label Dec 5, 2022

vrothberg mentioned this issue Dec 8, 2022

health check: ignore dependencies of transient systemd units/timers #16785

Merged

openshift-merge-robot closed this as completed in #16785 Dec 9, 2022

jackwilsdon mentioned this issue Dec 10, 2022

Generated systemd user units do not stop on shutdown before timeout expires #16683

Closed

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 8, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2023
