Move coreos-setgoodroot invocation out of update_engine #2292

Closed
bgilbert opened this Issue Dec 14, 2017 · 1 comment

Member

bgilbert commented Dec 14, 2017

Issue Report

Feature Request

Environment

Any

Desired Feature

coreos-setgoodroot is responsible for marking a USR partition good. update_engine currently runs coreos-setgoodroot unconditionally, 45 seconds after update_engine starts. That happens without any attempt to verify that the system has booted successfully, is running its intended workload, or is stable.

Move coreos-setgoodroot invocation out of update_engine to a separate systemd service:

  • setgoodroot should be enabled by default, either directly or via a timer unit that runs a few minutes after boot.
  • Users can add dependencies to the setgoodroot unit via drop-ins. For example, perhaps the user wants a failure of docker.service to prevent setgoodroot from running.
  • If the setgoodroot unit fails to start, the system automatically reboots (i.e. into the old partition). This should be possible via OnFailure= and OnFailureJobMode=replace-irreversibly.
  • If the booted partition is already marked good, failure of the setgoodroot unit should not cause a reboot (in case e.g. Docker fails for an unrelated reason). We might still want to pull in the unit's dependencies, though, to be consistent between update and non-update boots. That can be done by e.g. writing a flag file indicating that this is an update boot, and giving the setgoodroot unit a ConditionPathExists= on that file: a failed condition skips the unit without marking it failed, while its dependencies are still started.
  • The setgoodroot unit can be disabled by the user. (So, it will need to be default-enabled via a systemd preset and a coreos-postinst stanza for upgrades, not as a static unit symlinked in /usr.) In that case, it can be pulled in by a user-defined unit or started manually by a script if the user wants to do arbitrarily complicated acceptance tests without leaving services stuck in starting state. In this scenario, the user is responsible for rebooting the system if validation fails. We might also choose to document the option of bypassing the setgoodroot unit and running coreos-setgoodroot directly.
  • To avoid clobbering a USR partition that we may need later, update_engine will not run unless the booted partition has been marked good. That can't easily be expressed via systemd dependencies given the above, so update_engine might run unconditionally and check for a flag indicating that it's okay to proceed.
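
As a rough, untested sketch of how the pieces above might fit together: the unit names, file paths, flag-file location, timer delay, and the location of the coreos-setgoodroot binary are all assumptions made for illustration, not existing Container Linux files.

```ini
# --- setgoodroot.service (hypothetical unit name) ---
[Unit]
Description=Mark the booted USR partition good
# Hypothetical flag file that exists only on the first boot into an
# updated partition. When it is absent the unit is skipped, not failed,
# so OnFailure= below is not triggered on ordinary boots.
ConditionPathExists=/run/setgoodroot-needed
# Reboot (back into the old partition) if this unit fails.
OnFailure=reboot.target
OnFailureJobMode=replace-irreversibly

[Service]
Type=oneshot
# Binary path is an assumption.
ExecStart=/usr/bin/coreos-setgoodroot

# --- setgoodroot.timer (hypothetical; enabled via a systemd preset) ---
[Unit]
Description=Run setgoodroot a few minutes after boot

[Timer]
# "A few minutes" is taken as 5 minutes here; the delay is an assumption.
OnBootSec=5min

[Install]
WantedBy=timers.target

# --- /etc/systemd/system/setgoodroot.service.d/10-docker.conf ---
# Example user drop-in: require Docker to be up before marking good.
[Unit]
Requires=docker.service
After=docker.service
```

One thing that would need checking in practice is whether a start job that fails only because a Requires= dependency failed actually puts setgoodroot.service into the failed state and so triggers OnFailure=; if it does not, the drop-in would need a different way to express "docker.service failed, so reboot".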

cc: @crawford

Member

bgilbert commented May 21, 2018

Thank you for reporting this issue. Unfortunately, we don't think we'll end up addressing it in Container Linux.

As we recently announced, we're working on a successor to Container Linux, and we expect most major development to occur there instead. Meanwhile, Container Linux won't see many new features, but will still be fully maintained into 2020. Stay tuned for more details about that.

@bgilbert bgilbert closed this May 21, 2018
