libpod: simplify WaitForExit() #23601
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Luap99. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
We were not able to find or create Copr project. Unless the HTTP status code above is >= 500, please check your configuration.
libpod/container_api.go
Outdated
func waitForConmonExit(conmonPID, conmonPidFd int, pollInterval time.Duration) error {
	if conmonPidFd > -1 {
		fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
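For context, a minimal sketch of what the complete function might look like, assuming the pidfd comes from pidfd_open() and that -1 means pidfd support is unavailable; the fallback branch and error handling here are assumptions, not the exact PR code:

package libpod

import (
	"time"

	"golang.org/x/sys/unix"
)

// Sketch: block until the conmon process exits. A pidfd becomes readable
// (POLLIN) once the process has exited; without one, fall back to probing
// the pid with signal 0 at pollInterval.
func waitForConmonExit(conmonPID, conmonPidFd int, pollInterval time.Duration) error {
	if conmonPidFd > -1 {
		fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
		for {
			if _, err := unix.Poll(fds, -1); err != nil {
				if err == unix.EINTR {
					continue // interrupted by a signal, retry
				}
				return err
			}
			return nil
		}
	}
	for {
		// kill(pid, 0) delivers no signal; it only checks for existence.
		if err := unix.Kill(conmonPID, 0); err != nil {
			if err == unix.ESRCH {
				return nil // conmon is gone
			}
			return err
		}
		time.Sleep(pollInterval)
	}
}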
Does this work on FreeBSD? Otherwise will need to move over to one of the _linux files
yes, because getConmonPidFd() is already platform specific and the BSD stub returns -1, just like in the case where it is not supported on Linux.
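To illustrate the pattern being described, a hedged sketch of such a platform split (the file layout and method shape are assumptions, not verbatim podman code):

// container_internal_linux.go
//go:build linux

package libpod

import "golang.org/x/sys/unix"

// Linux: obtain a pidfd for conmon so its exit can be polled reliably.
func (c *Container) getConmonPidFd() int {
	fd, err := unix.PidfdOpen(c.state.ConmonPID, 0)
	if err != nil {
		return -1 // old kernel, or pidfd otherwise unavailable
	}
	return fd
}

// container_internal_freebsd.go
//go:build freebsd

package libpod

// FreeBSD: no pidfd support; the stub returns -1 so callers take the
// same fallback path as a Linux kernel without pidfd support.
func (c *Container) getConmonPidFd() int {
	return -1
}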
libpod/container_api.go
Outdated
	return -1, err
}

if err := c.checkExitFile(); err != nil {
This can return nil if the exit file doesn't exist - as it might if Conmon was killed by a SIGKILL. I think we get into a bad state in that case - the container will still be marked as running.
we should not get into a bad state, as this is just the wait call; cleanup/stop and other podman processes would still be able to clean up correctly, I think.
However, there is the issue that we call GetContainerExitCode(), which would not have the current exit code, so we might return an old exit code from there instead of an error. I guess I need to handle that somehow.
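A hedged sketch of the kind of handling being discussed, assuming the checkExitFile() helper from the diff and the usual exit-code fields on the container state; the error path is an assumption, not the PR's final code:

package libpod

import "fmt"

// Sketch: once conmon is gone, refresh state from the exit file, but do
// not trust an exit code that predates this run. If conmon was SIGKILLed
// the exit file is never written, so return an error instead of a stale
// code from GetContainerExitCode().
func (c *Container) exitCodeAfterConmonExit() (int32, error) {
	if err := c.checkExitFile(); err != nil {
		return -1, err
	}
	if !c.state.Exited {
		return -1, fmt.Errorf("container %s: conmon exited but no exit file was written", c.ID())
	}
	return c.state.ExitCode, nil
}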
Well, the comment about a deadlock in kube play is true apparently... although I really do not follow the logic there.
/packit copr-build
if err := v.update(); err != nil {
	return err
}
// DANGEROUS: Do not lock here yet because we might need to remove containers first.
I'm thinking we should have a "Removing" bool in volumes that prevents further use once it's set. Would solve the container-addition-during-removal problem. But that seems like a separate PR.
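A minimal sketch of what that guard could look like; the Removing field, addContainer helper, and error text are hypothetical, not part of this PR:

// Hypothetical: a one-way Removing flag set before teardown starts, so
// concurrent attempts to attach a container fail instead of racing the
// removal.
func (v *Volume) markRemoving() {
	v.lock.Lock()
	defer v.lock.Unlock()
	v.state.Removing = true
}

func (v *Volume) addContainer(ctrID string) error {
	v.lock.Lock()
	defer v.lock.Unlock()
	if v.state.Removing {
		return fmt.Errorf("volume %s is being removed, refusing to add container %s", v.Name(), ctrID)
	}
	// ... record ctrID as a user of the volume ...
	return nil
}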
yes, and the longer I think about it the more I think we might have similar issues around networks, secrets, etc.
The code is flaky somehow; the question is, should podman wait even guarantee that the ctr is removed in this case? We just wait for exit, not removal...
So we have a race of the cleanup process against podman wait by the look of things. I think the expectation would be that podman wait on a --rm container would wait until it's gone, but... I don't think the docs say it will do more than wait for exit?
I don't understand how this even works here: we wait on the conmon pid, and conmon waits for podman container cleanup to finish, and cleanup should have removed the ctr. I wonder if it is related to the super short-lived true process, because podman wait might never see the running conmon pid.
Do we actually wait for the cleanup process as rootless? We double fork to
join the right namespace and that ought to detach us from Conmon?
Which we? Conmon waits for the result of the podman container cleanup unless it was killed, so if we wait for the conmon pid to exit (which my rewrite does), that cannot fail unless the pid was dead already; then this wait code exits early, as we are still locked at that point, so cleanup had no chance to run yet.
We meaning Conmon. I thought that, as part of rootless init/joining the rootless userns, Podman would do some forks, which could cause Conmon to think Podman had exited. On looking more, I do not believe that's the case, though.
Sure, we do that, but podman run doesn't matter here: conmon exec's podman container cleanup once the ctr process exits and then waits on the cleanup process to finish; the parent process doesn't interact with that process at all and is not relevant.
Ok so the problem was basically that ...
LGTM
/lgtm
Force-pushed from a60f611 to 9c6f5dd (Compare)
The current code did several complicated state checks that simply do not work properly on a fast-restarting container. It used a special case for --restart=always but forgot to take care of --restart=on-failure, which always hung for 20s until it ran into the timeout. The old logic also used to call CheckConmonRunning(), but it synced the state first, which means it may have checked a new conmon every time and thus missed exits. The new code is much simpler: check the conmon pid, and once it is no longer running, check the exit file and read the exit code. This is related to containers#23473, but I am not sure if this fixes it because we cannot reproduce. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Init containers are meant to exit early, before other containers are started. Thus stopping the infra container in such a case is wrong. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Now that on-failure exits right away, the test is racy, as the RestartCount is not at the value we expect while the container is still restarting in the background. As such, add a timer-based approach. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
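A sketch of that timer-based approach in the Ginkgo/Gomega style of podman's e2e tests; the container name and expected count are placeholders, not the actual test values:

// Poll RestartCount until it settles instead of asserting once, since
// the container may still be restarting in the background.
Eventually(func() string {
	inspect := podmanTest.Podman([]string{"inspect", "--format", "{{.RestartCount}}", "testctr"})
	inspect.WaitWithDefaultTimeout()
	return inspect.OutputToString()
}, "20s", "250ms").Should(Equal("2"))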
We cannot take the volume lock first and then the container locks. Other code paths always have to lock the container first and then lock the volumes, i.e. to mount/umount them. As such, locking the volume first can always result in ABBA deadlocks. To fix this, move the volume lock down after the container removal. The removal code is racy regardless of the lock, as the volume lock on create is no longer taken since commit 3cc9db8 due to another deadlock there. Fixes containers#23613 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
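A hedged sketch of the ordering the commit describes: remove dependent containers first, taking only container locks, and take the volume lock afterwards. The helper names are illustrative, not podman's actual API:

// Illustrative only: respect the global container -> volume lock order.
func (r *Runtime) removeVolumeAndContainers(v *Volume) error {
	// Phase 1: remove containers using the volume; each removal takes
	// only the container lock (and volume locks in the correct order).
	ctrs, err := r.state.VolumeContainers(v) // hypothetical lookup
	if err != nil {
		return err
	}
	for _, ctr := range ctrs {
		if err := r.removeDependentContainer(ctr); err != nil { // hypothetical
			return err
		}
	}
	// Phase 2: only now lock the volume. No container lock is held, so
	// paths that lock container-then-volume cannot deadlock against us.
	v.lock.Lock()
	defer v.lock.Unlock()
	return r.removeVolume(v) // hypothetical low-level removal
}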
removeVolume() already does the same check so we do not need it twice. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Waiting now actually makes sure to exit on the first container exit. Also note that it does not wait for --rm to have the container removed at this point. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Ok this should be good now, so much for a "simple" change.
/lgtm
Merged b282902 into containers:main
The current code did several complicated state checks that simply do not work properly on a fast-restarting container. It used a special case for --restart=always but forgot to take care of --restart=on-failure, which always hung for 20s until it ran into the timeout.
The old logic also used to call CheckConmonRunning(), but it synced the state first, which means it may have checked a new conmon every time and thus missed exits.
The new code is much simpler: check the conmon pid, and once it is no longer running, check the exit file and read the exit code.
This is related to #23473, but I am not sure if this fixes it because we cannot reproduce.
Does this PR introduce a user-facing change?