
libpod: simplify WaitForExit() #23601

Merged 6 commits into containers:main on Aug 15, 2024

Conversation

@Luap99 (Member) commented Aug 13, 2024

The current code did several complicated state checks that simply do not work properly on a fast restarting container. It used a special case for --restart=always but forgot to take care of --restart=on-failure, which always hangs for 20s until it runs into the timeout.

The old logic also used to call CheckConmonRunning() but synced the state beforehand, which means it may have checked a new conmon every time and thus missed exits.

The new code is much simpler: check the conmon pid; if it is no longer running, check the exit file and get the exit code.

This is related to #23473 but I am not sure if this fixes it because we cannot reproduce it.

Does this PR introduce a user-facing change?

Fixes a bug where `podman wait` would exit only after 20s on a fast restarting container with a restart policy of on-failure.  

openshift-ci bot (Contributor) commented Aug 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 13, 2024

We were not able to find or create Copr project packit/containers-podman-23601 specified in the config with the following error:

Packit received HTTP 500 Internal Server Error from Copr Service. Check the Copr status page: https://copr.fedorainfracloud.org/status/stats/, or ask for help in Fedora Build System matrix channel https://matrix.to/#/#buildsys:fedoraproject.org.

Unless the HTTP status code above is >= 500, please check your configuration for:

  1. typos in the owner and project name (groups need to be prefixed with @)
  2. whether the project name contains disallowed characters (only letters, digits, underscores, dashes and dots are allowed)
  3. whether the project itself exists (Packit creates projects only in its own namespace)
  4. whether Packit is allowed to build in your Copr project
  5. whether your Copr project/group is private


func waitForConmonExit(conmonPID, conmonPidFd int, pollInterval time.Duration) error {
	if conmonPidFd > -1 {
		fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
Review comment (Member):

Does this work on FreeBSD? Otherwise it will need to move over to one of the _linux files.

Reply (Member Author):

Yes, because getConmonPidFd() is already platform specific and the BSD stub returns -1, just like in the case where it is not supported on Linux.
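
For illustration, here is a minimal, self-contained sketch of such a pidfd-aware wait. Everything below is hypothetical (the package and function names are not libpod's); the only assumed behaviour is that a Linux pidfd polls as readable once its process exits, and that the -1 stub forces the signal-0 fallback.

package conmonwait // illustrative only, not part of libpod

import (
	"errors"
	"time"

	"golang.org/x/sys/unix"
)

// waitForProcessExit blocks until the process identified by pid is gone.
// When pidFd is valid (>= 0), poll(2) on the pidfd reports it readable once
// the process terminates; otherwise fall back to probing the pid with
// signal 0 at pollInterval.
func waitForProcessExit(pid, pidFd int, pollInterval time.Duration) error {
	if pidFd > -1 {
		fds := []unix.PollFd{{Fd: int32(pidFd), Events: unix.POLLIN}}
		for {
			n, err := unix.Poll(fds, -1) // block until the pidfd becomes readable
			if err != nil && !errors.Is(err, unix.EINTR) {
				return err
			}
			if n > 0 {
				return nil
			}
		}
	}
	for {
		// Signal 0 performs no action; it only checks whether the pid exists.
		if err := unix.Kill(pid, 0); err != nil {
			if errors.Is(err, unix.ESRCH) {
				return nil // the process is gone
			}
			return err
		}
		time.Sleep(pollInterval)
	}
}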

		return -1, err
	}

	if err := c.checkExitFile(); err != nil {
Review comment (Member):

This can return nil if the exit file doesn't exist - as it might if Conmon was killed by a SIGKILL. I think we get into a bad state in that case - the container will still be marked as running.

Reply (Member Author):

We should not get into a bad state, as this is just the wait call; cleanup/stop and other podman processes would still be able to clean up correctly, I think.
However, there is the issue that we call GetContainerExitCode(), which would not have the current exit code, so we might return an old exit code from there instead of an error. I guess I need to handle that somehow.
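
To illustrate the concern about stale exit codes, here is a hedged sketch (a hypothetical helper, not podman's checkExitFile() or GetContainerExitCode(); it only assumes conmon's one-file-per-container-ID exit directory layout): when the exit file is missing, for example because conmon was SIGKILLed, return an explicit error instead of letting the caller fall back to an old exit code from the database.

package conmonwait

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// exitCodeFromFile reads the per-container exit file written by conmon.
// A missing file is reported as an error rather than silently ignored.
func exitCodeFromFile(exitDir, ctrID string) (int32, error) {
	data, err := os.ReadFile(filepath.Join(exitDir, ctrID))
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return -1, fmt.Errorf("no exit file for container %s: %w", ctrID, err)
		}
		return -1, err
	}
	code, err := strconv.ParseInt(strings.TrimSpace(string(data)), 10, 32)
	if err != nil {
		return -1, fmt.Errorf("parsing exit file for container %s: %w", ctrID, err)
	}
	return int32(code), nil
}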

@Luap99 (Member Author) commented Aug 13, 2024

Well the comment about a deadlock in kube play is true apparently... although I really do not follow the logic there.

@lsm5 (Member) commented Aug 13, 2024

/packit copr-build

	if err := v.update(); err != nil {
		return err
	}
	// DANGEROUS: Do not lock here yet because we might need to remove containers first.
Review comment (Member):

I'm thinking we should have a "Removing" bool in volumes that prevents further use once it's set. Would solve the container-addition-during-removal problem. But that seems like a separate PR.

Reply (Member Author):

Yes, and the longer I think about it, the more I think we might have similar issues around networks, secrets, etc.
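
A minimal sketch of the "Removing" guard suggested above, purely hypothetical (this is not podman's actual Volume type): once the flag is set, attempts to attach further containers fail instead of racing the removal.

package volguard

import (
	"fmt"
	"sync"
)

// Volume is a stand-in for the real libpod volume; only the removal guard
// is sketched here.
type Volume struct {
	mu       sync.Mutex
	removing bool
	users    map[string]struct{} // container IDs currently using the volume
}

// MarkRemoving flags the volume so no new containers can attach to it.
func (v *Volume) MarkRemoving() {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.removing = true
}

// AddContainer records a container as a user of the volume unless the
// volume is already being removed.
func (v *Volume) AddContainer(ctrID string) error {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.removing {
		return fmt.Errorf("volume is being removed, cannot add container %s", ctrID)
	}
	if v.users == nil {
		v.users = make(map[string]struct{})
	}
	v.users[ctrID] = struct{}{}
	return nil
}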

@Luap99 (Member Author) commented Aug 14, 2024

[+1593s] # [11:42:37.160486147] # /var/tmp/go/src/github.com/containers/podman/bin/podman run -d --rm quay.io/libpod/testimage:20240123 true
[+1593s] # [11:42:37.387005051] de71300697120ae4842fe0729ede959abecee121d9babb5b936788d1b5798cdf
[+1593s] #
[+1593s] # [11:42:37.398484042] # /var/tmp/go/src/github.com/containers/podman/bin/podman wait de71300697120ae4842fe0729ede959abecee121d9babb5b936788d1b5798cdf
[+1593s] # [11:42:37.473532413] 0
[+1593s] #
[+1593s] # [11:42:37.488737037] # /var/tmp/go/src/github.com/containers/podman/bin/podman rm de71300697120ae4842fe0729ede959abecee121d9babb5b936788d1b5798cdf
[+1593s] # [11:42:37.546053991] de71300697120ae4842fe0729ede959abecee121d9babb5b936788d1b5798cdf
[+1593s] # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
[+1593s] # #| FAIL: exit code is 0; expected 1
[+1593s] # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[+1593s] # # [teardown]

The code is flaky somehow; the question is, should podman wait even guarantee that the ctr is removed in this case? We just wait for exit, not removal...

@mheon (Member) commented Aug 14, 2024

So we have a race of cleanup process against podman wait by the look of things.

I think the expectation would be that podman wait on a --rm container would wait until it's gone, but... I don't think the docs say it will do more than wait for exit?

@Luap99 (Member Author) commented Aug 14, 2024

> So we have a race of cleanup process against podman wait by the look of things.
>
> I think the expectation would be that podman wait on a --rm container would wait until it's gone, but... I don't think the docs say it will do more than wait for exit?

I don't understand how this even works here: we wait on the conmon pid, conmon waits for podman container cleanup to finish, and cleanup should have removed the ctr. I wonder if it is related to the super short-lived true process, because podman wait might never see the running conmon pid.

@mheon (Member) commented Aug 14, 2024 via email

@Luap99 (Member Author) commented Aug 14, 2024

> Do we actually wait for the cleanup process as rootless?

Which we?

conmon waits for the result of the podman container cleanup unless it was killed, so if we wait for the conmon pid to exit (which my rewrite does) that cannot fail, unless the pid was already dead; in that case this wait code exits early, as we are still locked at that point, so cleanup had no chance to run yet.

@mheon (Member) commented Aug 14, 2024

We meaning Conmon. I thought that, as part of rootless init/joining the rootless userns, Podman would do some forks, which could cause Conmon to think Podman had exited. On looking more closely, I do not believe that's the case, though.

@Luap99 (Member Author) commented Aug 14, 2024

> We meaning Conmon. I thought that, as part of rootless init/joining the rootless userns, Podman would do some forks, which could cause Conmon to think Podman had exited. On looking more closely, I do not believe that's the case, though.

Sure, we do that, but podman run doesn't matter here: conmon exec's podman container cleanup once the ctr process exits and then waits on the cleanup process to finish. The parent process doesn't interact with that process at all and is not relevant.

@Luap99 (Member Author) commented Aug 14, 2024

OK, so the problem was basically that syncContainer() already reads the exit file, so there was a chance we called it before the cleanup process got the lock. As such it didn't wait for the ctr to be removed.
I think the current version should fix this.

@mheon (Member) commented Aug 14, 2024

LGTM

@baude (Member) commented Aug 14, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2024
@Luap99 Luap99 force-pushed the wait branch 2 times, most recently from a60f611 to 9c6f5dd Compare August 15, 2024 08:35
The current code did several complicated state checks that simply do not
work properly on a fast restarting container. It used a special case for
--restart=always but forgot to take care of --restart=on-failure, which
always hangs for 20s until it runs into the timeout.

The old logic also used to call CheckConmonRunning() but synced the
state beforehand, which means it may have checked a new conmon every
time and thus missed exits.

The new code is much simpler: check the conmon pid; if it is no longer
running, check the exit file and get the exit code.

This is related to containers#23473 but I am not sure if this fixes it
because we cannot reproduce it.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Init containers are meant to exit early before other containers are
started. Thus stopping the infra container in such a case is wrong.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Now that on-failure exits right away, the test is racy, as the
RestartCount is not at the value we expect while the container is still
restarting in the background. As such, add a timer-based approach.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
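
As an illustration of a timer-based check like the one described in this commit (a hypothetical helper, not the actual test code): poll the value until it reaches the expectation or a deadline expires, rather than asserting on a single racy read.

package restarttest

import (
	"fmt"
	"time"
)

// waitForCount polls get() until it returns at least want or the timeout
// expires; either way it returns the last observed value.
func waitForCount(get func() int, want int, timeout, interval time.Duration) (int, error) {
	deadline := time.Now().Add(timeout)
	for {
		got := get()
		if got >= want {
			return got, nil
		}
		if time.Now().After(deadline) {
			return got, fmt.Errorf("count %d did not reach %d within %s", got, want, timeout)
		}
		time.Sleep(interval)
	}
}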
We cannot take the volume lock first and then the container locks. Other
code paths always have to lock the container first and then lock the
volumes, i.e. to mount/umount them. As such, locking the volume first can
always result in ABBA deadlocks.

To fix this, move the lock down after the container removal. The removal
code is racy regardless of the lock, as the volume lock on create is no
longer taken since commit 3cc9db8 due to another deadlock there.

Fixes containers#23613

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
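
For context, a minimal sketch of the ABBA pattern this commit avoids (hypothetical locks, not libpod's): two code paths taking the same pair of locks in opposite order can each block on the lock the other holds. The fix is the single ordering used everywhere else, container first and volume second, which the removal path achieves by taking the volume lock only after the containers are dealt with.

package lockorder

import "sync"

// Stand-ins for the real container and volume locks.
var (
	containerLock sync.Mutex
	volumeLock    sync.Mutex
)

// mountVolume follows the required order: container first, then volume.
func mountVolume() {
	containerLock.Lock()
	defer containerLock.Unlock()
	volumeLock.Lock()
	defer volumeLock.Unlock()
	// ... mount the volume into the container ...
}

// removeVolumeBad takes the volume lock first and the container lock second;
// running concurrently with mountVolume this can deadlock (ABBA).
func removeVolumeBad() {
	volumeLock.Lock()
	defer volumeLock.Unlock()
	containerLock.Lock()
	defer containerLock.Unlock()
	// ... remove containers using the volume, then the volume itself ...
}

// removeVolumeFixed loosely mirrors the commit: handle the containers while
// holding only the container lock, then take the volume lock afterwards.
func removeVolumeFixed() {
	containerLock.Lock()
	// ... remove the containers using the volume ...
	containerLock.Unlock()

	volumeLock.Lock()
	defer volumeLock.Unlock()
	// ... remove the volume itself ...
}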
removeVolume() already does the same check so we do not need it twice.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Waiting now actually makes sure to exit on the first container exit. Also
note that it does not wait for --rm to have removed the container at
this point.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99 (Member Author) commented Aug 15, 2024

Ok this should be good now, so much for a "simple" change.

@mheon (Member) commented Aug 15, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 15, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit b282902 into containers:main Aug 15, 2024
82 of 83 checks passed
@Luap99 Luap99 deleted the wait branch August 15, 2024 12:36
Labels: approved, lgtm, release-note