
Flake: Deleting pod: something-cgroup-conmon: EBUSY #14057

Closed · edsantiago opened this issue Apr 28, 2022 · 7 comments

Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug), kube

Comments

@edsantiago (Collaborator)

Yep, jinxed myself yesterday.

# podman [options] play kube --replace /tmp/podman_test1113683552/kube.yaml --configmap /tmp/podman_test1113683552/foo-cm.yaml
time="2022-04-28T17:02:07Z" level=error msg="Deleting pod 6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6 cgroup /libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: device or resource busy"
Error: error removing pod 6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6 conmon cgroup: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: remove /sys/fs/cgroup/libpod_parent/6eeba76dfdf7c4ab35cd5e7064e286b43371aa36d8630ecdca4ef08181f6a3c6/conmon: device or resource busy

Looks similar to #11946 but different enough to merit a new issue.

Seen just now in int f36 root on an in-flight PR; logs show it happening since April 2:

Podman play kube [It] podman play kube test env value from configmap and --replace should reuse the configmap volume

Podman play kube [It] podman play kube teardown
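
For context on the error above: removing a cgroup directory is an rmdir(2), and the kernel refuses with EBUSY while the cgroup still has member processes. A minimal Go sketch of what that looks like, with a hypothetical path standing in for the pod's conmon cgroup (this is not Podman code):

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Hypothetical stand-in for the pod's conmon cgroup path from the log above.
	dir := "/sys/fs/cgroup/libpod_parent/<pod-id>/conmon"

	// os.Remove calls rmdir(2) on a directory; the kernel refuses with EBUSY
	// while the cgroup still has member processes (or live child cgroups).
	if err := os.Remove(dir); errors.Is(err, syscall.EBUSY) {
		// cgroup.procs lists the PIDs keeping the cgroup busy.
		procs, _ := os.ReadFile(dir + "/cgroup.procs")
		fmt.Printf("cgroup still busy, members:\n%s", procs)
	}
}
```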

@edsantiago added the flakes (Flakes from Continuous Integration) and kind/bug (Categorizes issue or PR as related to a bug) labels Apr 28, 2022
giuseppe added a commit to giuseppe/libpod that referenced this issue Apr 29, 2022
It solves a race where a container cleanup process launched because of
the container process exiting normally would hang.

It also solves a problem when running as rootless on cgroup v1 since
it is not possible to force pids.max = 1 on conmon to limit spawning
the cleanup process.

Partially copied from containers#13403

Related to: containers#14057

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
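
On the pids.max point in the commit message above: on cgroup v2, capping conmon's cgroup at a single process prevents it from forking the cleanup process at all, which is the knob the commit notes is unavailable when running rootless on cgroup v1. A hedged sketch of that mechanism (hypothetical path, not the actual Podman code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// limitPids writes "1" to pids.max so the cgroup cannot grow beyond a single
// process; conmon then cannot fork a cleanup process. Sketch only.
func limitPids(cgroupDir string) error {
	return os.WriteFile(filepath.Join(cgroupDir, "pids.max"), []byte("1\n"), 0o644)
}

func main() {
	// Hypothetical cgroup v2 path for a pod's conmon cgroup.
	if err := limitPids("/sys/fs/cgroup/libpod_parent/<pod-id>/conmon"); err != nil {
		fmt.Fprintln(os.Stderr, "cannot set pids.max:", err)
	}
}
```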
@giuseppe (Member)

I'm not sure this is the right fix, since I wasn't able to reproduce it locally, but there is one race condition I am aware of in the pod cleanup code: #14061

@vrothberg (Member)

@edsantiago have you seen it flake after #14061?

@edsantiago (Collaborator, Author)

No, but only three PRs have merged since then.

@edsantiago (Collaborator, Author)

Yep, still happening (f36 root; the PR is based on the very latest main).

@edsantiago (Collaborator, Author)

f35 root too, in a different test

giuseppe added a commit to giuseppe/common that referenced this issue May 3, 2022
if the cgroup cleanup fails with EBUSY, attempt to kill the
processes.

Related to: containers/podman#14057

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe (Member) commented May 3, 2022

let's also kill the processes contained in the cgroup, in the same way as the systemd backend does: containers/common#1019

A bigger hammer, but I think it solves the race condition we are observing
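
For illustration, the "bigger hammer" amounts to: if rmdir of the cgroup fails with EBUSY, SIGKILL every PID still listed in cgroup.procs and retry the removal. A rough sketch of the idea under those assumptions; the helper name, path, and retry policy below are made up and this is not the containers/common code:

```go
package main

import (
	"bufio"
	"errors"
	"os"
	"strconv"
	"syscall"
	"time"
)

// removeCgroupKillingMembers is a hypothetical helper: try to remove the
// cgroup directory, and if it is still busy, SIGKILL whatever remains in
// cgroup.procs and try once more.
func removeCgroupKillingMembers(dir string) error {
	err := os.Remove(dir)
	if !errors.Is(err, syscall.EBUSY) {
		return err // success, or a different failure
	}
	f, err := os.Open(dir + "/cgroup.procs")
	if err != nil {
		return err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if pid, convErr := strconv.Atoi(sc.Text()); convErr == nil {
			_ = syscall.Kill(pid, syscall.SIGKILL)
		}
	}
	// Give the kernel a moment to reap the members before retrying.
	time.Sleep(100 * time.Millisecond)
	return os.Remove(dir)
}

func main() {
	_ = removeCgroupKillingMembers("/sys/fs/cgroup/libpod_parent/<pod-id>/conmon")
}
```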

@rhatdan (Member) commented May 3, 2022

When containers/common gets merged into Podman, we can close this issue.

mheon pushed a commit to mheon/libpod that referenced this issue May 3, 2022
@rhatdan added the kube label May 17, 2022
gbraad pushed a commit to gbraad-redhat/podman that referenced this issue Jul 13, 2022