
Supervisor restarts a stopped container on a change to an unrelated service variable #1549

Open · Ereski opened this issue Dec 14, 2020 · 8 comments

@Ereski commented Dec 14, 2020

To reproduce:

  1. Run a test app with two services, A and B
  2. Manually stop service B
  3. Add a service variable for A
  4. B gets restarted

Versions confirmed affected:

  • 11.14.0
  • 11.4.10
@jellyfish-bot

[ereski] This issue has an attached support thread: https://jel.ly.fish/9631c3da-8f3c-47ae-b023-024732e21b32

@20k-ultra (Contributor)

Thanks for the steps to reproduce, @Ereski!

@pipex (Contributor) commented Dec 15, 2020

The next step is to try to replicate this on the latest Supervisor version.

@pwt commented Apr 19, 2021

FYI: I'm still seeing this behaviour on the latest supervisor version: 12.4.6 (balenaOS 2.73.1+rev1).

@20k-ultra (Contributor)

Thanks for confirming that, @pwt. I believe this is happening because the Supervisor evaluates the target state as a whole rather than individual changes. Whenever a change is made, the Supervisor compares the current state against the entire target state. Stopping a container on the device is not persisted to the target state, so the Supervisor identifies the mismatch and tries to correct it by starting that service.
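
For illustration, a simplified sketch of that whole-state comparison (illustrative TypeScript, not the actual Supervisor code):

```typescript
// Simplified illustration of whole-state reconciliation: a container stopped
// locally still shows running: true in the target, so any target state change
// (such as adding a variable to service A) re-triggers a start for service B.
interface ServiceState {
  serviceName: string;
  running: boolean;
}

function planSteps(current: ServiceState[], target: ServiceState[]): string[] {
  const steps: string[] = [];
  for (const wanted of target) {
    const actual = current.find((s) => s.serviceName === wanted.serviceName);
    // A manually stopped service lands here on every comparison.
    if (wanted.running && !(actual && actual.running)) {
      steps.push(`start ${wanted.serviceName}`);
    }
  }
  return steps;
}
```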

I think it's fair for us to support not starting a container if you stopped it via the Supervisor API (or the dashboard), but a container manually stopped via `balena stop ...` would be significantly harder to fit into our architecture. What do you think?

@pwt commented Apr 21, 2021

Thanks for responding to this, @20k-ultra.

> Thanks for confirming that, @pwt. I believe this is happening because the Supervisor evaluates the target state as a whole rather than individual changes. Whenever a change is made, the Supervisor compares the current state against the entire target state. Stopping a container on the device is not persisted to the target state, so the Supervisor identifies the mismatch and tries to correct it by starting that service.

That's what I figured. (This is why I needed to implement a monitor thread that checks if a stopped container has been restarted, then stops it again.)
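
(For reference, a minimal sketch of that kind of watchdog using dockerode; the container name and interval are assumptions for illustration, not the actual implementation:)

```typescript
import Docker from 'dockerode';

// Hypothetical watchdog: if a container we intentionally stopped comes back
// up, stop it again. 'b_service' and the 10s poll interval are made up.
const docker = new Docker({ socketPath: '/var/run/docker.sock' });
const intentionallyStopped = new Set(['b_service']);

async function enforceStops(): Promise<void> {
  const running = await docker.listContainers(); // running containers only
  for (const info of running) {
    const name = info.Names[0]?.replace(/^\//, '');
    if (name !== undefined && intentionallyStopped.has(name)) {
      await docker.getContainer(info.Id).stop();
    }
  }
}

setInterval(() => void enforceStops().catch(console.error), 10_000);
```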

> I think it's fair for us to support not starting a container if you stopped it via the Supervisor API (or the dashboard), but a container manually stopped via `balena stop ...` would be significantly harder to fit into our architecture. What do you think?

I think the specified restart semantics for a container should be respected regardless of how the container is stopped, provided it was stopped intentionally. I.e., if `restart: unless-stopped` is specified, an intentionally stopped container should not be restarted as a side effect of some unrelated action. Essentially, the same semantics Docker applies to a container.

For my use cases, I'm happy if the API/dashboard behaviour alone is addressed, if it's architecturally challenging to also address the `balena stop` case.

@20k-ultra (Contributor)

I think one of the philosophies of the Supervisor's development has been that the data being manipulated (Docker) is only expected to be interacted with via the Supervisor's API. Any changes made via the CLI on the device aren't tracked in the cloud, so they will always be reverted the next time we compare against the target state. Allowing the Supervisor to work in conjunction with user changes on the device isn't something we have aimed to do. For example, if you start adding images to the device, the Supervisor might remove them as part of its cleanup process.

However, in the case of the `unless-stopped` restart policy, it makes sense to ensure the Supervisor respects it. We could take that opportunity to rewrite how we decide to start stopped containers. The mechanism causing the restart is that the Supervisor persists in memory which containers it has started; that's why, after subsequent target state polls, it won't start a container it has already started, unless the Supervisor was restarted and lost that memory. We could modify this to inspect a stopped container's restart policy and exit code to understand why it stopped and whether we should start it again.
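
Roughly, that inspection could look like this (an illustrative dockerode sketch, not a committed design):

```typescript
import Docker from 'dockerode';

const docker = new Docker({ socketPath: '/var/run/docker.sock' });

// Illustrative only: read why a container stopped so the start decision can
// be driven by its restart policy rather than by in-memory bookkeeping.
async function stopReason(containerId: string) {
  const info = await docker.getContainer(containerId).inspect();
  return {
    policy: info.HostConfig.RestartPolicy.Name, // e.g. 'unless-stopped'
    exitCode: info.State.ExitCode,              // 0 = clean stop
  };
}
```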

For the immediate future, I know that making the dashboard's stop action persist to the target state isn't a straightforward conversation; we discussed it before and agreed, as a product decision, that those stops should be ephemeral. If we respect the `unless-stopped` policy, though, this isn't a problem, because we won't care what the target state says: we'll be using the docker-compose restart policy. Which makes me think: I'm guessing the target state includes the `running` value just so it matches the current state schema, but if we don't allow specifying the running state in the target and don't plan to, then we should just remove it, or ignore it as I'm about to suggest.

I created #1668 to enforce the following when evaluating stopped containers, regardless of the target state's `running` value:

  • unless-stopped: if exit code is 0 then start
  • always: exit code is irrelevant, always start
  • on-failure: if exit code is 1 then start
  • no: exit code is irrelevant, never start
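
As a sketch, that decision table maps to something like the following (illustrative only, not the actual code in #1668):

```typescript
type RestartPolicy = 'unless-stopped' | 'always' | 'on-failure' | 'no';

// Illustrative mapping of the table above to a start decision.
function shouldStart(policy: RestartPolicy, exitCode: number): boolean {
  switch (policy) {
    case 'unless-stopped':
      return exitCode === 0; // start only after a clean exit
    case 'always':
      return true;           // exit code is irrelevant
    case 'on-failure':
      return exitCode === 1; // start only on failure
    case 'no':
      return false;          // never start
  }
}
```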

This should satisfy your requirements and make the Supervisor more sensible. Thoughts, @pwt?

@pwt commented Apr 22, 2021

> I created #1668 to enforce the following when evaluating stopped containers, regardless of the target state's `running` value:
>
>   • unless-stopped: if exit code is 0 then start
>   • always: exit code is irrelevant, always start
>   • on-failure: if exit code is 1 then start
>   • no: exit code is irrelevant, never start
>
> This should satisfy your requirements and make the Supervisor more sensible. Thoughts, @pwt?

Yes, I think this is exactly the right approach. Thanks.
