
Flake: Deleting pod: something-cgroup-conmon: EBUSY #14057

Closed · edsantiago opened this issue Apr 28, 2022 · 7 comments

Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug), kube

Comments

@edsantiago (Collaborator)

Yep, jinxed myself yesterday.

# podman [options] play kube --replace /tmp/podman_test1113683552/kube.yaml --configmap /tmp/podman_test1113683552/foo-cm.yaml
time="2022-04-28T17:02:07Z" level=error msg="Deleting pod 6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6 cgroup /libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: device or resource busy"
Error: error removing pod 6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6 conmon cgroup: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: device or resource busy

Looks similar to #11946 but different enough to merit a new issue.

Seen just now in int f36 root on an in-flight PR; logs show it happening since April 2:

Podman play kube [It] podman play kube test env value from configmap and --replace should reuse the configmap volume

Podman play kube [It] podman play kube teardown
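
For context on the error above: removing a cgroup directory is an rmdir(2), and the kernel refuses with EBUSY while the cgroup still has member processes. A minimal Go sketch of what that looks like, with a hypothetical path standing in for the pod's conmon cgroup (this is not Podman code):

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Hypothetical stand-in for the pod's conmon cgroup path from the log above.
	dir := "/sys/fs/cgroup/libpod_parent/<pod-id>/conmon"

	// os.Remove calls rmdir(2) on a directory; the kernel refuses with EBUSY
	// while the cgroup still has member processes (or live child cgroups).
	if err := os.Remove(dir); errors.Is(err, syscall.EBUSY) {
		// cgroup.procs lists the PIDs keeping the cgroup busy.
		procs, _ := os.ReadFile(dir + "/cgroup.procs")
		fmt.Printf("cgroup still busy, members:\n%s", procs)
	}
}
```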

@edsantiago added the flakes (Flakes from Continuous Integration) and kind/bug (Categorizes issue or PR as related to a bug) labels Apr 28, 2022
giuseppe added a commit to giuseppe/libpod that referenced this issue Apr 29, 2022
It solves a race where a container cleanup process launched because of
the container process exiting normally would hang.

It also solves a problem when running as rootless on cgroup v1 since
it is not possible to force pids.max = 1 on conmon to limit spawning
the cleanup process.

Partially copied from containers#13403

Related to: containers#14057

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
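
On the pids.max point in the commit message above: on cgroup v2, capping conmon's cgroup at a single process prevents it from forking the cleanup process at all, which is the knob the commit notes is unavailable when running rootless on cgroup v1. A hedged sketch of that mechanism (hypothetical path, not the actual Podman code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// limitPids writes "1" to pids.max so the cgroup cannot grow beyond a single
// process; conmon then cannot fork a cleanup process. Sketch only.
func limitPids(cgroupDir string) error {
	return os.WriteFile(filepath.Join(cgroupDir, "pids.max"), []byte("1\n"), 0o644)
}

func main() {
	// Hypothetical cgroup v2 path for a pod's conmon cgroup.
	if err := limitPids("/sys/fs/cgroup/libpod_parent/<pod-id>/conmon"); err != nil {
		fmt.Fprintln(os.Stderr, "cannot set pids.max:", err)
	}
}
```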
@giuseppe (Member)

I'm not sure this is the right fix, since I wasn't able to reproduce it locally, but there is one race condition I am aware of in the pod cleanup code: #14061

@vrothberg (Member)

@edsantiago have you seen it flake after #14061?

@edsantiago (Collaborator, Author)

No, but only three PRs have merged since then.

@edsantiago (Collaborator, Author)

Yep, still happening (f36 root; the PR is based on the very latest main).

@edsantiago (Collaborator, Author)

f35 root too, in a different test

giuseppe added a commit to giuseppe/common that referenced this issue May 3, 2022
if the cgroup cleanup fails with EBUSY, attempt to kill the
processes.

Related to: containers/podman#14057

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe (Member) commented May 3, 2022

let's also kill the processes contained in the cgroup, in the same way as the systemd backend does: containers/common#1019

A bigger hammer, but I think it solves the race condition we are observing
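
For illustration, the "bigger hammer" amounts to: if rmdir of the cgroup fails with EBUSY, SIGKILL every PID still listed in cgroup.procs and retry the removal. A rough sketch of the idea under those assumptions; the helper name, path, and retry policy below are made up and this is not the containers/common code:

```go
package main

import (
	"bufio"
	"errors"
	"os"
	"strconv"
	"syscall"
	"time"
)

// removeCgroupKillingMembers is a hypothetical helper: try to remove the
// cgroup directory, and if it is still busy, SIGKILL whatever remains in
// cgroup.procs and try once more.
func removeCgroupKillingMembers(dir string) error {
	err := os.Remove(dir)
	if !errors.Is(err, syscall.EBUSY) {
		return err // success, or a different failure
	}
	f, err := os.Open(dir + "/cgroup.procs")
	if err != nil {
		return err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if pid, convErr := strconv.Atoi(sc.Text()); convErr == nil {
			_ = syscall.Kill(pid, syscall.SIGKILL)
		}
	}
	// Give the kernel a moment to reap the members before retrying.
	time.Sleep(100 * time.Millisecond)
	return os.Remove(dir)
}

func main() {
	_ = removeCgroupKillingMembers("/sys/fs/cgroup/libpod_parent/<pod-id>/conmon")
}
```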

@rhatdan (Member) commented May 3, 2022

When containers/common gets merged into Podman, we can close this issue.

mheon pushed a commit to mheon/libpod that referenced this issue May 3, 2022
@rhatdan added the kube label May 17, 2022
gbraad pushed a commit to gbraad-redhat/podman that referenced this issue Jul 13, 2022