
Supervisor restarts a stopped container on a change to an unrelated service variable #1549

Open · Ereski opened this issue Dec 14, 2020 · 8 comments

@Ereski commented Dec 14, 2020

To reproduce:

  1. Run a test app with two services, A and B
  2. Manually stop service B
  3. Add a service variable for A
  4. B gets restarted

Versions confirmed affected:

  • 11.14.0
  • 11.4.10
@jellyfish-bot

[ereski] This issue has an attached support thread: https://jel.ly.fish/9631c3da-8f3c-47ae-b023-024732e21b32

@20k-ultra (Contributor)

Thanks for the steps to reproduce, @Ereski!

@pipex (Contributor) commented Dec 15, 2020

The next step is to try to replicate this on the latest Supervisor version.

@pwt commented Apr 19, 2021

FYI: I'm still seeing this behaviour on the latest supervisor version: 12.4.6 (balenaOS 2.73.1+rev1).

@20k-ultra (Contributor)

Thanks for confirming that, @pwt. I believe this is happening because the Supervisor evaluates the target state as a whole rather than individual changes. Whenever a change is made, the Supervisor compares the current state against the entire target state. Stopping a container on the device is not persisted to the target state, so the Supervisor identifies the mismatch and tries to correct it by starting that service.
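
For illustration, a simplified sketch of that whole-state comparison (illustrative TypeScript, not the actual Supervisor code):

```typescript
// Simplified illustration of whole-state reconciliation: a container stopped
// locally still shows running: true in the target, so any target state change
// (such as adding a variable to service A) re-triggers a start for service B.
interface ServiceState {
  serviceName: string;
  running: boolean;
}

function planSteps(current: ServiceState[], target: ServiceState[]): string[] {
  const steps: string[] = [];
  for (const wanted of target) {
    const actual = current.find((s) => s.serviceName === wanted.serviceName);
    // A manually stopped service lands here on every comparison.
    if (wanted.running && !(actual && actual.running)) {
      steps.push(`start ${wanted.serviceName}`);
    }
  }
  return steps;
}
```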

I think it's fair for us to support not starting a container if you stopped it via the Supervisor API (or the dashboard), but a container manually stopped via `balena stop ...` would be significantly harder to fit into our architecture. What do you think?

@pwt commented Apr 21, 2021

Thanks for responding to this, @20k-ultra.

> Thanks for confirming that, @pwt. I believe this is happening because the Supervisor evaluates the target state as a whole rather than individual changes. Whenever a change is made, the Supervisor compares the current state against the entire target state. Stopping a container on the device is not persisted to the target state, so the Supervisor identifies the mismatch and tries to correct it by starting that service.

That's what I figured. (This is why I needed to implement a monitor thread that checks if a stopped container has been restarted, then stops it again.)
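
(For reference, a minimal sketch of that kind of watchdog using dockerode; the container name and interval are assumptions for illustration, not the actual implementation:)

```typescript
import Docker from 'dockerode';

// Hypothetical watchdog: if a container we intentionally stopped comes back
// up, stop it again. 'b_service' and the 10s poll interval are made up.
const docker = new Docker({ socketPath: '/var/run/docker.sock' });
const intentionallyStopped = new Set(['b_service']);

async function enforceStops(): Promise<void> {
  const running = await docker.listContainers(); // running containers only
  for (const info of running) {
    const name = info.Names[0]?.replace(/^\//, '');
    if (name !== undefined && intentionallyStopped.has(name)) {
      await docker.getContainer(info.Id).stop();
    }
  }
}

setInterval(() => void enforceStops().catch(console.error), 10_000);
```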

> I think it's fair for us to support not starting a container if you stopped it via the Supervisor API (or the dashboard), but a container manually stopped via `balena stop ...` would be significantly harder to fit into our architecture. What do you think?

I think the specified restart semantics for a container should be respected regardless of how the container is stopped, provided it was stopped intentionally. I.e., if `restart: unless-stopped` is specified, an intentionally stopped container should not be restarted as a side effect of some unrelated action. Essentially, the same semantics Docker applies to a container.

For my use cases, I'm happy if the API/dashboard behaviour alone is addressed, if it's architecturally challenging to also address the `balena stop` case.

@20k-ultra (Contributor)

I think one of the philosophies of the Supervisor's development has been that the data being manipulated (Docker) is only expected to be interacted with via the Supervisor's API. Any changes made via the CLI on the device aren't tracked in the cloud, so they will always be reverted the next time we compare against the target state. Allowing the Supervisor to work in conjunction with user changes on the device isn't something we have aimed to do. For example, if you start adding images to the device, the Supervisor might remove them as part of its cleanup process.

However, in the case of the `unless-stopped` restart policy, it makes sense to ensure the Supervisor respects it. We could take that opportunity to rewrite how we decide to start stopped containers. The mechanism causing the restart is that the Supervisor persists in memory which containers it has started; that's why, after subsequent target state polls, it won't start a container it has already started, unless the Supervisor was restarted and lost that memory. We could modify this to inspect a stopped container's restart policy and exit code to understand why it stopped and whether we should start it again.
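
Roughly, that inspection could look like this (an illustrative dockerode sketch, not a committed design):

```typescript
import Docker from 'dockerode';

const docker = new Docker({ socketPath: '/var/run/docker.sock' });

// Illustrative only: read why a container stopped so the start decision can
// be driven by its restart policy rather than by in-memory bookkeeping.
async function stopReason(containerId: string) {
  const info = await docker.getContainer(containerId).inspect();
  return {
    policy: info.HostConfig.RestartPolicy.Name, // e.g. 'unless-stopped'
    exitCode: info.State.ExitCode,              // 0 = clean stop
  };
}
```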

For the immediate future, I know that making the dashboard's stop action persist to the target state isn't a straightforward conversation; we discussed it before and agreed, as a product decision, that those stops should be ephemeral. If we respect the `unless-stopped` policy, though, this isn't a problem, because we won't care what the target state says: we'll be using the docker-compose restart policy. Which makes me think: I'm guessing the target state includes the `running` value just so it matches the current state schema, but if we don't allow specifying the running state in the target and don't plan to, then we should just remove it, or ignore it as I'm about to suggest.

I created #1668 to enforce the following when evaluating stopped containers, regardless of the target state's `running` value:

  • unless-stopped: if exit code is 0 then start
  • always: exit code is irrelevant, always start
  • on-failure: if exit code is 1 then start
  • no: exit code is irrelevant, never start
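
As a sketch, that decision table maps to something like the following (illustrative only, not the actual code in #1668):

```typescript
type RestartPolicy = 'unless-stopped' | 'always' | 'on-failure' | 'no';

// Illustrative mapping of the table above to a start decision.
function shouldStart(policy: RestartPolicy, exitCode: number): boolean {
  switch (policy) {
    case 'unless-stopped':
      return exitCode === 0; // start only after a clean exit
    case 'always':
      return true;           // exit code is irrelevant
    case 'on-failure':
      return exitCode === 1; // start only on failure
    case 'no':
      return false;          // never start
  }
}
```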

This should satisfy your requirements and make the Supervisor more sensible. Thoughts, @pwt?

@pwt commented Apr 22, 2021

> I created #1668 to enforce the following when evaluating stopped containers, regardless of the target state's `running` value:
>
>   • unless-stopped: if exit code is 0 then start
>   • always: exit code is irrelevant, always start
>   • on-failure: if exit code is 1 then start
>   • no: exit code is irrelevant, never start
>
> This should satisfy your requirements and make the Supervisor more sensible. Thoughts, @pwt?

Yes, I think this is exactly the right approach. Thanks.
