Fix race condition #3212
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here
@nalind PTAL, does this make sense to you?
Have we gotten errors that match it ("no such process")? I'd expect that perhaps if the runtime's `state` command exited out from under us and was reaped by someone else, though the waitpid(2) man page suggests we'd get an ECHILD ("no child processes") for cases like that.
@nalind @giuseppe @edsantiago PTAL. I am now setting the stopped flag when pid1 of the container exits. Would this solve the problem?
LGTM in principle. You'll need to re-push to get [NO TESTS NEEDED] to trigger.
Ed has found situations where the container exits before we can check the state, causing a failure in cases where I think we can complete successfully.
Fixes: https://github.com/containers/buildah/issues/3113
[NO TESTS NEEDED] since I have no way to generate this race condition.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
I'd wait to see if the fix in crun addresses the race we've seen. It might take some time between sending the KILL signal and the process being really terminated by the kernel, so I am afraid this change introduces a new race condition.
@giuseppe I don't see this as being any worse than what we currently have, and quite possibly better.
the issue could be in the sequence:
The container could still be alive by the time we call At least runc and crun wait for the process to exit before the |
sorry nevermind my last comment, I got confused.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe, rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here