podman-kube@.service template: expected inactive, got active #17093

Closed
edsantiago opened this issue Jan 12, 2023 · 14 comments · Fixed by #17108

@edsantiago
Collaborator

New flake in system tests:

not ok 318 podman-kube@.service template
...
# podman pod kill test_pod
# 5a7b23b8b7028aa541d7c4643abd78c8bed281101c255b6f182bc75f8871c2e6
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: systemd service transitioned to 'inactive' state: podman-kube@-tmp-podman_bats.QBwJWn-test.yaml.service
# #| expected: 'inactive'
# #|   actual: 'active'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[sys] 305 podman-kube@.service template

...as well as a few minutes ago in f36 root

@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Jan 12, 2023
vrothberg added a commit to vrothberg/libpod that referenced this issue Jan 13, 2023
Increase the loop range from 5 to 20 to make sure we give the service
enough time to transition to inactive.  Other tests have the same range
with 0.5-second sleeps, so I expect the new value to be sufficient and
consistent.

Fixes: containers#17093
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
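
To make the change concrete, here is a minimal sketch of the kind of polling loop that commit adjusts in the BATS system test. The variable $service_name and the run/is helpers are placeholders for whatever the real test uses; this is not the literal code from #17108:

  # Sketch of the polling pattern the commit describes: retry up to 20 times
  # with 0.5-second sleeps until the unit leaves the 'active' state.
  for i in $(seq 1 20); do
      run systemctl is-active "$service_name"
      if [[ "$output" == "inactive" ]]; then
          break
      fi
      sleep 0.5
  done
  is "$output" "inactive" "systemd service transitioned to 'inactive' state: $service_name"
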
@edsantiago
Collaborator Author

Seen yesterday in f36 rootless, in a PR that I confirmed is based on main, which includes #17108.

not ok 319 podman-kube@.service template
...
$ podman auto-update --dry-run --format {{.Unit}},{{.Container}},{{.Image}},{{.Updated}},{{.Policy}}
podman-kube@-tmp-podman_bats.KXwNCZ-test.yaml.service,3c8e549f7cc0 (test_pod-a),quay.io/libpod/testimage:20221018,false,local
podman-kube@-tmp-podman_bats.KXwNCZ-test.yaml.service,b8cbd1543291 (test_pod-b),quay.io/libpod/testimage:20221018,false,registry
$ podman pod kill test_pod
67938ccecd7c6f327196a878237a2f32e16f439dda5a5b0ece9b839da0dcf59d
#/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#|     FAIL: systemd service transitioned to 'inactive' state: podman-kube@-tmp-podman_bats.KXwNCZ-test.yaml.service
#| expected: 'inactive'
#|   actual: 'active'
#\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Reopening, sorry.

@edsantiago edsantiago reopened this Jan 25, 2023
@edsantiago
Collaborator Author

Seen Jan 27 in f36 root. This is a v4.4 PR, but I've looked at the history and am 99% sure it includes #17108.

@github-actions

github-actions bot commented Mar 2, 2023

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Collaborator Author

Tuesday, f37 rootless, in a v4.4.1-rhel PR.

@edsantiago
Collaborator Author

Slightly different symptom:

$ podman auto-update --format {{.Unit}},{{.Container}},{{.Image}},{{.Updated}},{{.Policy}}
podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service,8318174f781d (test_pod-a),quay.io/libpod/testimage:20221018,failed,registry
podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service,fc19694d7b22 (test_pod-b),localhost/image:KITdAADDq7,failed,local
Error: restarting unit podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service during rollback: error starting systemd unit "podman-kube@-tmp-podman_bats.7CpZcR-test.yaml.service" expected "done" but received "failed"

in f37 rootless sqlite

@vrothberg
Member

in f37 rootless sqlite

I am unable to locate the journal logs. Can you help me, @edsantiago?

@edsantiago
Collaborator Author

I'm so sorry! This got lost amid other things today. From the colorized log page:

  1. Scroll to top (Home key)
  2. Click the Task ID
  3. Click the red accordion banner to make it fold up, because it's too long.
  4. Click the run journal accordion. Then, to view the full logs, click the arrow-in-box icon at the top right.

The really important thing to remember is to scroll to the top of the colorized log.

@vrothberg
Member

Thanks so much, @edsantiago!

@vrothberg
Member

vrothberg commented Apr 5, 2023

Slightly different symptom:

Aha!

Apr 03 08:20:56 cirrus-task-6282826287415296 podman[104241]: Error: 1 error occurred:
Apr 03 08:20:56 cirrus-task-6282826287415296 podman[104241]: * removing container 3081f86090646d68ba441ef63431b3b25b43d256f9b276d5c61d31809c900b8b from pod 84c78f59d46c1f2eec37008137119953fccdbac895c506addd15f231787010f3: removing container 3081f86090646d68ba441ef63431b3b25b43d256f9b276d5c61d31809c900b8b root filesystem: 1 error occurred:
Apr 03 08:20:56 cirrus-task-6282826287415296 podman[104241]: * unlinkat /home/some20100dude/.local/share/containers/storage/overlay-containers/3081f86090646d68ba441ef63431b3b25b43d256f9b276d5c61d31809c900b8b/userdata/shm: device or resource busy

Looks like our good old friend, EBUSY.

@vrothberg
Member

The other linked flakes show

Jan 24 07:08:34 packer-63c7bbc7-e9aa-44a4-c9f5-bca2897e2841 systemd[3374]: Started podman-92305.scope.
Jan 24 07:08:34 packer-63c7bbc7-e9aa-44a4-c9f5-bca2897e2841 podman[92311]: Error: no pod with name or ID test_pod found: no such pod

which I will investigate now. I suspect something in kube play --replace.

@vrothberg
Member

Also seeing

Error: open /tmp/podman_bats.BNwBhn/test.yaml: no such file or directory

@vrothberg
Member

OK, I am fairly confident that commit 1541ce5 fixed the issue in main (notice the s/inactive/failed/ change in the test). 4.4.1-rhel does not have this commit (see https://github.com/containers/podman/blob/v4.4.1-rhel/test/system/250-systemd.bats#L429-L441), so the test will flake there unless we file a BZ and backport.
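
For reference, the s/inactive/failed/ change amounts to flipping the expected terminal state in that wait loop, roughly like this. This is an illustrative sketch based on the description above, not the literal diff; $service_name again stands in for the generated unit name:

  # Old expectation: after 'podman pod kill', wait for the unit to report 'inactive'.
  # New expectation per 1541ce5: wait for 'failed' instead, presumably because the
  # killed pod leaves the kube unit with a non-zero exit status.
  run systemctl is-active "$service_name"
  is "$output" "failed" "systemd service transitioned to 'failed' state: $service_name"
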

As mentioned above, the "slightly different symptom" is the EBUSY fart (see #17216) preventing container cleanup.

@edsantiago feel free to double-check. The "Error: no pod with name or ID test_pod found: no such pod" messages are red herrings that popped up while cleaning up the systemd service after the test failure.

@vrothberg
Member

Shall we leave the issue open (i.e., backport to the branches), or are we cool given that main is good?

@vrothberg
Member

Closing as I believe 1541ce5 fixed the issue. Please reopen if the flake still happens on the main branch.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2023