podman: workaround race during container creation #1659

Merged
oalbrigt merged 1 commit into ClusterLabs:master on Jun 16, 2021

dciabrin
Contributor

podman and OCI runtime have a race that sometimes causes
a container to fail to be created and run [1] if the
cgroup to be used is not available yet. When that happens,
try to recreate it until it succeeds or the start
timeout is reached.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1972209

@dciabrin
Contributor Author

PR created. I'm still verifying the last revision of this patch locally in my environment.

# runtime configures the container. If that happens, recreate
# the container as long as we get the same error code or
# until start timeout preempts us.
while [ $rc -eq 127 ] && (echo "$out" | grep -q "cgroup.*scope not found") ; do

Shouldn't we have some limit here to prevent an endless loop? It shouldn't happen, but an extra safeguard wouldn't hurt.

Contributor Author

Pacemaker itself preempts this infinite loop as soon as the start operation timeout is reached. This is usually the idiom we use in the resource agents to loop while we're allowed.
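
For readers unfamiliar with that idiom, here is a minimal, self-contained sketch of the retry pattern. It is not the agent's actual code: `create_container` is a hypothetical stand-in that simulates two failures with the cgroup race error before succeeding, and the error text and container ID are placeholders. Only the loop condition mirrors the line under review. Note that the sketch has no timeout of its own; in the real agent the loop relies on Pacemaker aborting the start operation once its timeout is reached.

```shell
#!/bin/sh
# Hypothetical stand-in for the real container creation step.
# It fails twice with the cgroup race error, then succeeds.
attempts=0
create_container() {
    attempts=$((attempts + 1))
    if [ "$attempts" -le 2 ]; then
        # Placeholder error text modeled on the OCI runtime message
        out="setting cgroup config for procHooks process caused: Unit libpod-example.scope not found."
        rc=127
    else
        # Placeholder container ID
        out="example-container-id"
        rc=0
    fi
}

create_container
# Retry only while we see the specific race: exit code 127 plus the
# "scope not found" cgroup message. Any other failure leaves the loop.
while [ "$rc" -eq 127 ] && echo "$out" | grep -q "cgroup.*scope not found"; do
    echo "WARNING: Internal podman error while assigning cgroup. Retrying."
    create_container
done

echo "rc=$rc attempts=$attempts out=$out"
```

Because the loop is keyed on both the exit code and the error text, an unrelated failure is reported immediately instead of being retried.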

@dciabrin
Contributor Author

OK I verified locally that this patch recovers from that race:

Jun 16 09:25:41 controller-2 podman(galera-bundle-podman-2)[7499]: INFO: running container galera-bundle-podman-2 for the first time
Jun 16 09:25:47 controller-2 podman(galera-bundle-podman-2)[8466]: ERROR: Error: OCI runtime error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod-9a7a86ceba0dfdc2357f799e1b259846a9e51cd7537a2a3d67d548ad58f28c66.scope not found.
Jun 16 09:25:48 controller-2 podman(galera-bundle-podman-2)[8479]: WARNING: Internal podman error while assigning cgroup. Retrying.
Jun 16 09:25:54 controller-2 podman(galera-bundle-podman-2)[9195]: INFO: 555e280bbfa9d3afdd739b45aad700c8ef73539f60cf6937521690f39d0ef3c4
Jun 16 09:25:58 controller-2 podman(galera-bundle-podman-2)[9431]: INFO: Creating drop-in dependency for "galera-bundle-podman-2" (555e280bbfa9d3afdd739b45aad700c8ef73539f60cf6937521690f39d0ef3c4)
Jun 16 09:26:02 controller-2 podman(galera-bundle-podman-2)[9778]: NOTICE: Container galera-bundle-podman-2  started successfully
Jun 16 09:26:02 controller-2 pacemaker-controld[3233]:  notice: Result of start operation for galera-bundle-podman-2 on controller-2: ok

@dciabrin
Contributor Author

I just pushed a new revision to get rid of a spurious trailing whitespace, but the patch is still the same so it should be good to go.

Contributor

@mbaldessari mbaldessari left a comment

lgtm

@oalbrigt oalbrigt merged commit 38d4d1e into ClusterLabs:master Jun 16, 2021
@oalbrigt
Contributor

LGTM. Thanks.
