create --init-ctr once: once again, seems to be incorrectly running #16046

Closed
edsantiago opened this issue Oct 4, 2022 · 4 comments · Fixed by #16057
Labels
flakes Flakes from Continuous Integration locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago
Collaborator

Another init-ctr flake; it actually looks identical to #11682 except for the warning, which is new:

podman make sure once container is removed
 /var/tmp/go/src/github.com/containers/podman/test/e2e/pod_initcontainers_test.go:100
 
 [BeforeEach] Podman init containers
   /var/tmp/go/src/github.com/containers/podman/test/e2e/pod_initcontainers_test.go:22
 [It] podman make sure once container is removed
   /var/tmp/go/src/github.com/containers/podman/test/e2e/pod_initcontainers_test.go:100
$ podman [options] create --init-ctr once --pod new:foobar quay.io/libpod/alpine:latest bin/sh -c echo AjWwhTHctcuAxhxK > /dev/shm/XVlBzgbaiCMR
SHA1
$ podman [options] create --pod foobar -t quay.io/libpod/alpine:latest top
SHA2
$ podman [options] pod start foobar
time="2022-09-29T01:38:01Z" level=error msg="Removing container SHA1 from database: no container with ID SHA1 found in DB: no such container"
SHA3
$ podman [options] container exists SHA1
$ podman [options] pod stop foobar
SHA3
$ podman [options] pod start foobar
SHA3
$ podman [options] exec -it SHA2 cat /dev/shm/XVlBzgbaiCMR
AjWwhTHctcuAxhxK  <<<<------ UNEXPECTED! This causes test to fail, because this file should not exist

Only two instances in the past month; before that, May.

Podman init containers [It] podman make sure once container is removed
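
For anyone trying to reproduce this by hand, here is a rough sketch of the command sequence from the log above, written as a small Go helper that only builds and prints the argument lists. The helper name `reproSteps` and the placeholder container IDs are made up for illustration. The global options shown as `[options]` in the log are left out because the log elides them, and grouping the `echo ... > file` redirect into a single `-c` argument is an assumption, since the log prints arguments flattened.

```go
package main

import (
	"fmt"
	"strings"
)

// reproSteps returns the podman argument lists seen in the failing run, in order.
// It is an illustrative helper, not part of the podman test suite.
func reproSteps(pod, marker, file string) [][]string {
	const image = "quay.io/libpod/alpine:latest"
	return [][]string{
		// Init container that should run once and then be removed.
		// Assumption: the shell command is passed as one argument to -c.
		{"create", "--init-ctr", "once", "--pod", "new:" + pod, image,
			"bin/sh", "-c", "echo " + marker + " > " + file},
		// Long-running container in the same pod.
		{"create", "--pod", pod, "-t", image, "top"},
		{"pod", "start", pod},
		// Placeholder: the log uses the real ID of the init container here.
		{"container", "exists", "<init container ID>"},
		{"pod", "stop", pod},
		{"pod", "start", pod},
		// Placeholder: the log uses the real ID of the top container here.
		{"exec", "-it", "<top container ID>", "cat", file},
	}
}

func main() {
	for _, args := range reproSteps("foobar", "AjWwhTHctcuAxhxK", "/dev/shm/XVlBzgbaiCMR") {
		fmt.Println("podman " + strings.Join(args, " "))
	}
}
```

The test fails when the final `exec ... cat` succeeds and prints the marker, since the file written by the init container is not expected to exist after the pod is restarted.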

@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Oct 4, 2022
@rhatdan
Member

rhatdan commented Oct 5, 2022

Could this be a race condition where containers from the first pod start have not finished closing before the second pod start, which then ends up continuing to use the same /dev/shm?
Is this possible? Does the infra container wait for all of the containers to actually stop, or does it mark the pod stopped as soon as it sends the stop signal to its containers?

Do we need a podman pod wait?

@rhatdan
Member

rhatdan commented Oct 5, 2022

@mheon WDYT?

mheon added a commit to mheon/libpod that referenced this issue Oct 5, 2022
We have a test to verify that init containers in pods are
deleted when the `--init-ctr=once` option is specified. The test
creates two containers, one of them an init container, starts the
pod, stops the pod, and restarts the pod, checking for the
presence of a file created by the init container during the
second start. We're seeing a race where the file still exists,
which I'm fairly certain comes down to the SHM mount not being
cleaned up after the pod is stopped.

Fortunately, we already have code to do this - just flip the bool
that controls cleanup from false to true.

[NO NEW TESTS NEEDED] Fixes a difficult to reproduce race
condition.

Fixes containers#16046

Signed-off-by: Matthew Heon <mheon@redhat.com>
@mheon
Member

mheon commented Oct 5, 2022

podman pod stop guarantees that all containers in the pod have stopped, but it only cleans them up if it's explicitly asked to (and we don't ask - the false in https://github.com/containers/podman/blob/main/pkg/domain/infra/abi/pods.go#L198) - so we are racing on the post-stop cleanup process to actually unmount the SHM.

Flip the false to a true and this ought to resolve itself. #16057 should fix.
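
To make the explanation above concrete, here is a toy model in Go of why the cleanup flag matters. This is not podman's code: the `pod` type, its fields, and the `start`/`stop` methods are all hypothetical, and the race is simplified to its losing outcome (with `cleanup` false, the model assumes the post-stop cleanup has not yet unmounted the SHM when the pod is started again).

```go
package main

import "fmt"

// pod is a toy stand-in for a pod; it is not podman's data structure.
type pod struct {
	shm     map[string]string // contents of the shared /dev/shm mount
	mounted bool              // whether the SHM mount is still in place
	initRan bool              // a "once" init container runs only on the first start
}

// start mounts a fresh SHM only if no mount is left over from a previous run,
// then runs the init container if it has never run.
func (p *pod) start() {
	if !p.mounted {
		p.shm = map[string]string{}
		p.mounted = true
	}
	if !p.initRan {
		p.shm["marker"] = "written by init container"
		p.initRan = true
	}
}

// stop stops the pod. It unmounts the SHM only when cleanup is true.
// With cleanup false, the model assumes the later cleanup has not run yet.
func (p *pod) stop(cleanup bool) {
	if cleanup {
		p.shm = nil
		p.mounted = false
	}
}

// markerSurvivesRestart runs start, stop, start and reports whether the
// file written by the init container is still visible afterwards.
func markerSurvivesRestart(cleanup bool) bool {
	p := &pod{}
	p.start()
	p.stop(cleanup)
	p.start()
	_, ok := p.shm["marker"]
	return ok
}

func main() {
	fmt.Println("cleanup=false, marker survives:", markerSurvivesRestart(false))
	fmt.Println("cleanup=true, marker survives:", markerSurvivesRestart(true))
}
```

In this model the marker survives the restart only when `stop` is called without cleanup, which matches the flake: the second start reuses a mount that still holds the init container's file.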

@edsantiago
Collaborator Author

Thank you!

mheon added a commit to mheon/libpod that referenced this issue Oct 18, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 13, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2023
3 participants