podman: workaround race during container creation #1659

dciabrin · 2021-06-15T18:06:47Z

podman and the OCI runtime have a race that sometimes causes
a container to fail to be created and run [1] if the
cgroup to be used is not available yet. When that happens,
try to recreate the container until it succeeds or the start
timeout is reached.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1972209

dciabrin · 2021-06-15T18:07:31Z

PR created; I'm still verifying the latest revision of this patch locally in my environment.

cjeanner · 2021-06-16T07:33:34Z

heartbeat/podman

+		# runtime configures the container. If that happens, recreate
+		# the container as long as we get the same error code or
+		# until start timeout preempts us.
+		while [ $rc -eq 127 ] && (echo "$out" | grep -q "cgroup.*scope not found") ; do


Shouldn't we have some limit here, to prevent an endless loop? It shouldn't happen, but some extra safety wouldn't hurt.

dciabrin · 2021-06-16

Pacemaker itself preempts this loop as soon as the start operation timeout is reached, so it cannot run forever. This is the usual idiom in the resource agents: keep looping for as long as we're allowed.
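
For readers unfamiliar with the idiom, here is a minimal, self-contained sketch of the retry loop under discussion. It is not the patch itself: `create_container` is a hypothetical stub that fails twice with an error modeled on the cgroup race and then succeeds, and the placeholder container ID and `libpod-example.scope` unit name are made up for illustration. Only the `while` condition is taken from the diff above.

```shell
#!/bin/sh
# Hypothetical stand-in for the real container creation call: fails twice
# with a cgroup race error, then succeeds. Not part of the resource agent.
attempts=0
create_container() {
    attempts=$((attempts + 1))
    if [ "$attempts" -le 2 ]; then
        out="Error: OCI runtime error: setting cgroup config for procHooks process caused: Unit libpod-example.scope not found."
        return 127
    fi
    out="container-id-placeholder"
    return 0
}

create_container
rc=$?
# No iteration cap here: in a resource agent, the cluster manager's start
# timeout is what ends the loop if the error never clears.
while [ $rc -eq 127 ] && (echo "$out" | grep -q "cgroup.*scope not found"); do
    echo "WARNING: Internal podman error while assigning cgroup. Retrying."
    create_container
    rc=$?
done
echo "rc=$rc attempts=$attempts"
```

The loop exits as soon as the return code or the error text changes, so any other failure falls through to the normal error handling instead of being retried.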

dciabrin · 2021-06-16T09:39:09Z

OK I verified locally that this patch recovers from that race:

Jun 16 09:25:41 controller-2 podman(galera-bundle-podman-2)[7499]: INFO: running container galera-bundle-podman-2 for the first time
Jun 16 09:25:47 controller-2 podman(galera-bundle-podman-2)[8466]: ERROR: Error: OCI runtime error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod-9a7a86ceba0dfdc2357f799e1b259846a9e51cd7537a2a3d67d548ad58f28c66.scope not found.
Jun 16 09:25:48 controller-2 podman(galera-bundle-podman-2)[8479]: WARNING: Internal podman error while assigning cgroup. Retrying.
Jun 16 09:25:54 controller-2 podman(galera-bundle-podman-2)[9195]: INFO: 555e280bbfa9d3afdd739b45aad700c8ef73539f60cf6937521690f39d0ef3c4
Jun 16 09:25:58 controller-2 podman(galera-bundle-podman-2)[9431]: INFO: Creating drop-in dependency for "galera-bundle-podman-2" (555e280bbfa9d3afdd739b45aad700c8ef73539f60cf6937521690f39d0ef3c4)
Jun 16 09:26:02 controller-2 podman(galera-bundle-podman-2)[9778]: NOTICE: Container galera-bundle-podman-2  started successfully
Jun 16 09:26:02 controller-2 pacemaker-controld[3233]:  notice: Result of start operation for galera-bundle-podman-2 on controller-2: ok
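
As a quick sanity check, the `grep` pattern from the diff can be run against the error text in the log above. The sketch below copies that message verbatim; the second, non-matching message is a made-up example added only to show that unrelated errors would not trigger a retry.

```shell
#!/bin/sh
# Error text copied from the ERROR line in the log excerpt.
msg='Error: OCI runtime error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod-9a7a86ceba0dfdc2357f799e1b259846a9e51cd7537a2a3d67d548ad58f28c66.scope not found.'
# Hypothetical unrelated error, for contrast only.
other='Error: some unrelated failure'

# Same pattern as in the patch's while condition.
pattern="cgroup.*scope not found"

if echo "$msg" | grep -q "$pattern"; then matched=yes; else matched=no; fi
if echo "$other" | grep -q "$pattern"; then other_matched=yes; else other_matched=no; fi

echo "race error matched: $matched"
echo "unrelated error matched: $other_matched"
```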

dciabrin · 2021-06-16T09:41:23Z

I just pushed a new revision to get rid of some spurious trailing whitespace, but the patch is otherwise the same, so it should be good to go.

mbaldessari

lgtm

oalbrigt · 2021-06-16T12:06:29Z

LGTM. Thanks.

cjeanner reviewed Jun 16, 2021

dciabrin force-pushed the podman-cgroup branch from f41bcec to 7850aea Compare June 16, 2021 09:40

mbaldessari approved these changes Jun 16, 2021

oalbrigt merged commit 38d4d1e into ClusterLabs:master Jun 16, 2021
