podman: workaround race during container creation #1659
PR created, I'm still verifying the last revision of this patch locally on my environment.
# runtime configures the container. If that happens, recreate
# the container as long as we get the same error code or
# until start timeout preempts us.
while [ $rc -eq 127 ] && (echo "$out" | grep -q "cgroup.*scope not found") ; do
Shouldn't we have some limit here to prevent an endless loop? It shouldn't happen, but an extra safeguard wouldn't hurt.
Pacemaker itself preempts this loop as soon as the start operation timeout is reached, so it cannot run forever. This is the usual idiom in the resource agents: keep looping for as long as we're allowed.
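For readers unfamiliar with that idiom, here is a minimal, self-contained sketch of the retry pattern. It is not the patch itself: `create_container`, the counter file, and the error text it prints are hypothetical stand-ins for the real podman call, and only the `while` condition is taken from the diff above. The loop has no retry cap because, in a resource agent, the start operation timeout is what ends it.

```shell
#!/bin/sh
# Sketch only. create_container is a hypothetical stand-in for the real
# container creation command: it fails twice with a made-up message that
# matches the race pattern, then succeeds.
state=$(mktemp)
echo 0 > "$state"

create_container() {
    # The counter lives in a file because $(...) runs in a subshell.
    n=$(( $(cat "$state") + 1 ))
    echo "$n" > "$state"
    if [ "$n" -le 2 ]; then
        echo "Error: cgroup libpod-demo.scope not found" >&2
        return 127
    fi
    echo "container created"
}

out=$(create_container 2>&1)
rc=$?
# Retry only on the specific race signature (exit code 127 plus the
# cgroup message). Any other failure falls through immediately.
# No iteration limit here: under Pacemaker, the start timeout would
# kill the agent if the race never clears.
while [ $rc -eq 127 ] && (echo "$out" | grep -q "cgroup.*scope not found") ; do
    out=$(create_container 2>&1)
    rc=$?
done

tries=$(cat "$state")
rm -f "$state"
echo "rc=$rc tries=$tries out=$out"
```

Run standalone, nothing enforces a timeout, so a stand-in that never succeeds would spin forever; that is exactly the case the review comment asks about and the one Pacemaker covers in production.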
OK, I verified locally that this patch recovers from that race.
podman and OCI runtime have a race that sometimes causes a container to fail to be created and run [1] if the cgroup to be used is not available yet. When that happens, try to recreate it until it succeeds or the start timeout is reached. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1972209
I just pushed a new revision to get rid of a spurious trailing whitespace, but the patch is still the same so it should be good to go.
lgtm
LGTM. Thanks.