Set StoppedByUser earlier in the process of stopping #17077

mheon · 2023-01-11T15:31:36Z

The StoppedByUser variable indicates that the container was requested to stop by a user. It's used to prevent restart policy from firing (so that a restart=always container won't restart if the user does a `podman stop`). The problem is we were setting it very late in the stop() function. Originally, this was fine, but after the changes to add the new Stopping state, the logic that triggered restart policy was firing before StoppedByUser was even set - so the container would still restart.

Setting it earlier shouldn't hurt anything and guarantees that checks will see that the container was stopped manually.
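To illustrate the ordering problem described above, here is a minimal, deterministic sketch. It is not podman's code: the `container` type, its string states, and the `observe` callback are hypothetical stand-ins, and the callback simply marks the point in the Stopping window where another code path could evaluate restart policy.

```go
package main

import "fmt"

// container is a hypothetical stand-in, not podman's Container type.
type container struct {
	restartPolicy string // "always" or "no"
	state         string // "running", "stopping", or "exited"
	stoppedByUser bool
}

// shouldRestart models the check described above: restart policy fires
// only if the policy asks for it and the user did not request the stop.
func (c *container) shouldRestart() bool {
	return c.restartPolicy == "always" && !c.stoppedByUser
}

// stop simulates a user-requested stop. observe is invoked while the
// container is in the "stopping" state, which is where a separate code
// path could run its restart-policy check.
func (c *container) stop(setFlagEarly bool, observe func(*container)) {
	if setFlagEarly {
		c.stoppedByUser = true // the fix: record intent before anything else
	}
	c.state = "stopping"
	observe(c)
	c.state = "exited"
	if !setFlagEarly {
		c.stoppedByUser = true // the old ordering: too late for the check above
	}
}

// restartFiresDuringStop reports whether a check made in the Stopping
// window would decide to restart the container.
func restartFiresDuringStop(setFlagEarly bool) bool {
	c := &container{restartPolicy: "always", state: "running"}
	fired := false
	c.stop(setFlagEarly, func(c *container) { fired = c.shouldRestart() })
	return fired
}

func main() {
	fmt.Printf("flag set late, restart fires: %v\n", restartFiresDuringStop(false))
	fmt.Printf("flag set early, restart fires: %v\n", restartFiresDuringStop(true))
}
```

With the flag set late, the check in the Stopping window still sees `stoppedByUser == false` and decides to restart; with the flag set first, the same check sees the user's intent.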

Fixes #17069

Does this PR introduce a user-facing change?

Fixed a bug where containers with a restart policy set could still restart even after a manual `podman stop`.

mheon · 2023-01-11T15:31:54Z

Still needs a test

mheon · 2023-01-11T16:53:04Z

Test added

edsantiago · 2023-01-11T17:19:47Z

Do you have a sense for how easy/hard it is to trigger the restart? I started a loop-test, running against main, and have not seen it reproduce yet. (Test started a few minutes after you added your test). Any hints on how to manifest the restart bug?

mheon · 2023-01-11T17:25:25Z

Hm. It should be 100%. I'll look into it more...

edsantiago · 2023-01-11T17:40:56Z

Here's what I'm seeing:

If I run top with restart=always, then stop it, it stays stopped. Even on main.
If I run date, or something else that exits by itself, and then stop that, it continues restarting despite my stop.

Does that make any sense to you? Does it help?

edsantiago · 2023-01-11T17:50:18Z

Update: even with your PR, podman stop on the date container does not actually keep it stopped:

$ bin/podman run -d --restart=always --name foo quay.io/libpod/testimage:20221018 date
[cid]
$ bin/podman logs foo | wc -l
26
$ bin/podman stop foo
foo
$ bin/podman logs foo|wc -l
195
$ !!
216

mheon · 2023-01-11T17:51:04Z

I think that's probably a bug... Just a separate one. I'll look into what Docker does to be sure.

mheon · 2023-01-11T18:32:23Z

Yep, that's a separate bug. I'll handle it here.

edsantiago · 2023-01-11T18:50:33Z

Ack. Any idea how to reproduce the original bug that you set out to fix?

edsantiago · 2023-01-11T18:52:41Z

And, #17083 (purported fix for the hang) is merged, please rebase before pushing

mheon · 2023-01-11T18:56:46Z

@edsantiago The only reproducer I'm aware of uses podman-compose, which introduces enough questions into what's going on that I haven't been able to reproduce without it.

rhatdan · 2023-01-11T22:51:04Z

LGTM

TomSweeneyRedHat · 2023-01-11T23:28:07Z

test/e2e/run_test.go

+		Expect(stop).Should(Exit(0))
+
+		// This is ugly, but I don't see a better way
+		time.Sleep(10 * time.Second)


Does it need the 10? Could we get away with less? Perhaps 5?

TomSweeneyRedHat · 2023-01-11T23:35:05Z

Fedora root tests are timing out after 90 minutes. @edsantiago or @cevich I'm seeing this in a number of places, do we know the cause?

TomSweeneyRedHat · 2023-01-11T23:35:16Z

Changes LGTM with happy tests

edsantiago · 2023-01-12T01:06:51Z

@TomSweeneyRedHat timeout is fixed (:crossed_fingers:) in #17083

edsantiago

Blocking for merge because test needed (as discussed in conversation, test passes on main). It's possible that there's no reproducer that will fail on main, and maybe that's fine, but I don't want anyone to restart the flakes, have them pass, and then merge this overnight.

mheon · 2023-01-12T19:45:16Z

At this point, I don't really know what's going on with restart policy and podman-compose, but this does not reproduce on main. Still, we aren't actually testing that restart policy works in this way anywhere, so I think this is still worth merging just to make sure we don't regress in the future. I'll rebase to fix CI.

The StoppedByUser variable indicates that the container was requested to stop by a user. It's used to prevent restart policy from firing (so that a restart=always container won't restart if the user does a `podman stop`). The problem is we were setting it *very* late in the stop() function. Originally, this was fine, but after the changes to add the new Stopping state, the logic that triggered restart policy was firing before StoppedByUser was even set - so the container would still restart.

Setting it earlier shouldn't hurt anything and guarantees that checks will see that the container was stopped manually.

Fixes containers#17069

Signed-off-by: Matthew Heon <matthew.heon@pm.me>

edsantiago · 2023-01-12T19:49:31Z

SGTM. I was tempted to suggest ditching the expensive test, ... but I guess it's still a good regression check.

openshift-ci · 2023-01-12T22:54:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, mheon

Needs approval from an approver in each of these files:

~~OWNERS~~ [edsantiago,mheon]

rhatdan · 2023-01-12T23:34:54Z

/lgtm

openshift-ci bot added the release-note and approved labels Jan 11, 2023

mheon mentioned this pull request Jan 11, 2023

[Bug]: podman stop results in panic: runtime error: invalid memory address or nil pointer dereference #17069

Closed

mheon force-pushed the set_stopping_early branch from daf7926 to 85e6be7 on January 11, 2023 16:52

TomSweeneyRedHat reviewed Jan 11, 2023

edsantiago requested changes Jan 12, 2023

mheon force-pushed the set_stopping_early branch from 85e6be7 to 1ab833f on January 12, 2023 19:46

edsantiago approved these changes Jan 12, 2023

openshift-ci bot assigned rhatdan Jan 12, 2023

openshift-ci bot added the lgtm label Jan 12, 2023

openshift-merge-robot merged commit 3e229b0 into containers:main Jan 12, 2023

edsantiago mentioned this pull request Jan 13, 2023

play kube override with {tcp,udp} should keep {other} from YAML #17101

Closed

github-actions bot added the locked - please file new issue/PR label Sep 15, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 15, 2023
