podman-kube@.service template: expected inactive, got active #17093

Closed
edsantiago opened this issue Jan 12, 2023 · 14 comments · Fixed by #17108

@edsantiago
Collaborator

New flake in system tests:

not ok 318 podman-kube@.service template
...
# podman pod kill test_pod
# 5a7b23b8b7028aa541d7c4643abd78c8bed281101c255b6f182bc75f8871c2e6
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: systemd service transitioned to 'inactive' state: podman-kube@-tmp-podman_bats.QBwJWn-test.yaml.service
# #| expected: 'inactive'
# #|   actual: 'active'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[sys] 305 podman-kube@.service template

...as well as a few minutes ago in f36 root

@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Jan 12, 2023
vrothberg added a commit to vrothberg/libpod that referenced this issue Jan 13, 2023
Increase the loop range from 5 to 20 to make sure we give the service
enough time to transition to inactive.  Other tests have the same range
with 0.5-second sleeps, so I expect the new value to be sufficient and
consistent.

Fixes: containers#17093
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
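
To make the change concrete, here is a minimal sketch of the kind of polling loop that commit adjusts in the BATS system test. The variable $service_name and the run/is helpers are placeholders for whatever the real test uses; this is not the literal code from #17108:

  # Sketch of the polling pattern the commit describes: retry up to 20 times
  # with 0.5-second sleeps until the unit leaves the 'active' state.
  for i in $(seq 1 20); do
      run systemctl is-active "$service_name"
      if [[ "$output" == "inactive" ]]; then
          break
      fi
      sleep 0.5
  done
  is "$output" "inactive" "systemd service transitioned to 'inactive' state: $service_name"
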
@edsantiago
Collaborator Author

Seen yesterday in f36 rootless, in a PR that I confirmed is based on main, which includes #17108.

not ok 319 podman-kube@.service template
...
$ podman auto-update --dry-run --format {{.Unit}},{{.Container}},{{.Image}},{{.Updated}},{{.Policy}}
podman-kube@-tmp-podman_bats.KXwNCZ-test.yaml.service,3c8e549f7cc0 (test_pod-a),quay.io/libpod/testimage:20221018,false,local
podman-kube@-tmp-podman_bats.KXwNCZ-test.yaml.service,b8cbd1543291 (test_pod-b),quay.io/libpod/testimage:20221018,false,registry
$ podman pod kill test_pod
67938ccecd7c6f327196a878237a2f32e16f439dda5a5b0ece9b839da0dcf59d
#/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#|     FAIL: systemd service transitioned to 'inactive' state: podman-kube@-tmp-podman_bats.KXwNCZ-test.yaml.service
#| expected: 'inactive'
#|   actual: 'active'
#\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Reopening, sorry.

@edsantiago edsantiago reopened this Jan 25, 2023
@edsantiago
Collaborator Author

Seen Jan 27 in f36 root. This is a v4.4 PR, but I've looked at the history and am 99% sure it includes #17108.

@github-actions

github-actions bot commented Mar 2, 2023

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Collaborator Author

Tuesday, f37 rootless, in a v4.4.1-rhel PR.

@edsantiago
Collaborator Author

Slightly different symptom:

$ podman auto-update --format {{.Unit}},{{.Container}},{{.Image}},{{.Updated}},{{.Policy}}
podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service,8318174f781d (test_pod-a),quay.io/libpod/testimage:20221018,failed,registry
podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service,fc19694d7b22 (test_pod-b),localhost/image:KITdAADDq7,failed,local
Error: restarting unit podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service during rollback: error starting systemd unit "podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service" expected "done" but received "failed"

in f37 rootless sqlite

@vrothberg
Member

in f37 rootless sqlite

I am unable to locate the journal logs. Can you help me, @edsantiago?

@edsantiago
Collaborator Author

I'm so sorry! This got lost amid other things today. From the colorized log page:

  1. Scroll to top (Home key)
  2. Click the Task ID
  3. Click the red accordion banner to make it fold up, because it's too long.
  4. Click the run journal accordion. Then, to view the full logs, click the arrow-in-box icon at the top right.

The really important thing to remember is to scroll to the top of the colorized log.

@vrothberg
Member

Thanks so much, @edsantiago!

@vrothberg
Member

vrothberg commented Apr 5, 2023

Slightly different symptom:

Aha!

Apr 03 08:20:56 cirrus-task-6282826287415296 podman[104241]: Error: 1 error occurred:
Apr 03 08:20:56 cirrus-task-6282826287415296 podman[104241]: * removing container 3081f86090646d68ba441ef63431b3b25b43d256f9b276d5c61d31809c900b8b from pod 84c78f59d46c1f2eec37008137119953fccdbac895c506addd15f231787010f3: removing container 3081f86090646d68ba441ef63431b3b25b43d256f9b276d5c61d31809c900b8b root filesystem: 1 error occurred:
Apr 03 08:20:56 cirrus-task-6282826287415296 podman[104241]: * unlinkat /home/some20100dude/.local/share/containers/storage/overlay-containers/3081f86090646d68ba441ef63431b3b25b43d256f9b276d5c61d31809c900b8b/userdata/shm: device or resource busy

Looks like our good old friend, EBUSY.

@vrothberg
Member

The other linked flakes show

Jan 24 07:08:34 packer-63c7bbc7-e9aa-44a4-c9f5-bca2897e2841 systemd[3374]: Started podman-92305.scope.
Jan 24 07:08:34 packer-63c7bbc7-e9aa-44a4-c9f5-bca2897e2841 podman[92311]: Error: no pod with name or ID test_pod found: no such pod

which I will investigate now. I suspect something in kube play --replace.

@vrothberg
Member

Also seeing

Error: open /tmp/podman_bats.BNwBhn/test.yaml: no such file or directory

@vrothberg
Member

OK, I am fairly confident that commit 1541ce5 fixed the issue in main (notice the s/inactive/failed/ change in the test). 4.4.1-rhel does not have this commit (see https://github.com/containers/podman/blob/v4.4.1-rhel/test/system/250-systemd.bats#L429-L441), so the test will flake there unless we file a BZ and backport.
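
For reference, the s/inactive/failed/ change amounts to flipping the expected terminal state in that wait loop, roughly like this. This is an illustrative sketch based on the description above, not the literal diff; $service_name again stands in for the generated unit name:

  # Old expectation: after 'podman pod kill', wait for the unit to report 'inactive'.
  # New expectation per 1541ce5: wait for 'failed' instead, presumably because the
  # killed pod leaves the kube unit with a non-zero exit status.
  run systemctl is-active "$service_name"
  is "$output" "failed" "systemd service transitioned to 'failed' state: $service_name"
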

As mentioned above, the "slightly different symptom" is the EBUSY fart (see #17216) preventing container cleanup.

@edsantiago feel free to double-check. The "Error: no pod with name or ID test_pod found: no such pod" messages are red herrings that popped up while cleaning up the systemd service after the test failure.

@vrothberg
Member

Shall we leave the issue open (i.e., backport to the branches), or are we cool given that main is good?

@vrothberg
Member

Closing as I believe 1541ce5 fixed the issue. Please reopen if the flake still happens on the main branch.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2023