Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add start_after_enable property to systemd_unit resource #14188

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

adsr
Copy link

@adsr adsr commented Jan 18, 2024

Description

It's a common operational task is to start a service after enabling it. systemd has a flag for this (--now). I didn't see a way to cleanly add that flag in the implementation so I added an explicit start.

Note this feature is different than the :start action which always starts a service.

A stop_after_disable property may also be useful.

Related Issue

#14187

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (non-breaking change that does not add functionality or fix an issue)

Checklist:

  • I have read the CONTRIBUTING document.
  • I have run the pre-merge tests locally and they pass.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • If Gemfile.lock has changed, I have used --conservative to do it and included the full output in the Description above.
  • All new and existing tests passed.
  • All commits have been signed-off for the Developer Certificate of Origin.

It is a common operational task is to start a service after enabling it. systemd
even has a flag for this (`--now`). I didn't see a way to cleanly add that flag
in the implementation so I added an explicit start.

Note this feature is different than the `:start` action which always starts the
service.

A `stop_after_disable` property may also be useful.
@adsr adsr requested review from a team as code owners January 18, 2024 23:38
@github-actions github-actions bot added the documentation How do we use this project? label Jan 18, 2024
Copy link

sonarcloud bot commented Jan 18, 2024

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@johnmccrae
Copy link
Contributor

@adsr Hi, can you please add a DCO to this. This looks great, thanks for the feature. We're going to review this at the community meeting next week. Can you attend? It's at 1 PST on Tuesdays. https://progress.zoom.us/j/94141161621?from=addon

@tpowell-progress tpowell-progress self-assigned this Jan 30, 2024
@tpowell-progress
Copy link
Contributor

@dafyddcrosby May have some suggestions on the --now flag inclusion.

@dafyddcrosby
Copy link
Contributor

Glad others are also thinking about this! :-D

You'd mentioned the --now flag, which came about in systemd 220, so it's not new, but I don't know if that's the version on all supported Infra Client platforms, either. Even if it is, what I'll suggest next won't be rendered moot either way.

What I like about --now is that we get a single atomic operation where we used to need 2, and can nearly halve systemctl shellouts. In the case of bootstrapping new hosts, that can be a nice speedup ❤️

One way to sidestep the systemd version issue, AND get the intended atomicity, AND save an extra resource would be to create :enable_and_start and :disable_and_stop (and maybe :mask_and_stop, too) actions. Creating new actions also means we don't have to muddy the semantics about the existing enable behavior, which keeps it easy to reason about if other neat systemsctl flags pop up. We'd probably want to check pre-220 behavior with --now to see if we can catch it for error handling (maybe @tpowell-progress has access to some ancient VMs to do this ;-)

@Tensibai
Copy link
Contributor

Tensibai commented May 6, 2024

This current implementation doesn't really add anything than defining the service with action [:enable, :start] and it is a footgun likely to bite someone using only :enable and this property.
With this if the service doesn't start on first run, subsequent run won't try to start it again as the action won't converge if the service is already enabled.

Which means once systemd has exhausted the tries in the unit, next chef run won't try to get it started again and it will stay dead in the water until someone get on the system to start it manually.

If that's the desired behavior then something like:

service 'xyz'
  action :enable
  notify :start, 'service[xyz]', :immediately
end

Should achieve the same thing as this PR and is more explicit about how it behaves IMHO.

@adsr
Copy link
Author

adsr commented May 6, 2024

Assuming you're running chef-client on a cron, action :start can be undesirable as there are cases where you want a service to remain stopped (e.g., after a failure or if stopped by a human). If you omit :start and use only :enable, then you have a new problem -- newly added services won't start until the next boot. A :start_after_enable action solves this scenario. As I mentioned it also maps cleanly to an existing systemctl action (--now).

The notify solution is clever. It's definitely better than [:enable, :start] but of course it would cause the service to start if anything in the resource changed, not just alongside an enable. It's also less readable.

Naming it :enable_and_start might be more intuitive to some, but I felt it may be confused as a shortcut for [:enable, :start].

@Tensibai
Copy link
Contributor

Tensibai commented May 7, 2024

Well, that's kinda breaking the promise theory if you're not describing a desired state but a workflow of actions.

re: "but of course it would cause the service to start if anything in the resource changed"

This recipe wouldn't have this side effect:

systemd_unit 'xyz' do
  action [:create; :reload] # reload optional here, depends if you want systemd to be updated or not
  content [your unit description]
end

systemd_unit 'enable_start_xyz' do
  unit-name 'xyz'
  action :enable
  notifies :start, 'systemd_unit[xyz]', :immediately
end

This should behave as you're aiming, but that's clearly creating an unpredictable state for your node at the end as you can never be sure the unit is started or not and what version the service is using. This goes against the idea of describing a desired state IMHO.

If merged, I would prefer a :start_once_after_enable action to be extra explicit this won't fix a dead unit on subsequent runs.

Implementing the --now is another subject and an interesting one to save a system call.
An approach could be checking if both enable and start are in the actions and if the current state is not enabled then use an internal action doing the enable with this parameter.
It's a bit a "one shot" thing in the node lifecycle however, so the balance is between doing this check on each run vs the gains on the first run. (No strong opinion from my part on this)

@jaymzh
Copy link
Collaborator

jaymzh commented May 21, 2024

OK there was much discussion in the week's meeting about this. We came to a few conclusions:

  1. enable_after_start as a property and/or enable_and_start as an action are not inherently valuable (heck [:enable, :start] is as many characters as :enable_and_start) if it does the same underlying thing.
  2. There is potential value if we are saving shellouts, so using systemd's --now, would be potentially valuable. There was general consensus that this should be a separate action and not a property, as that fits the model of chef in which you define the desired state of the system.
  3. However, the vast majority of users are not managing their services with systemd_unit, they're doing so with service, so not only is this not likely to provide the possible benefits here, worse it'll encourage people to move away from the service abstraction to gain the performance benefits of halving their shellouts.

So, if the goal is to improve performance with less shellouts, then the new actions (:enable_and_start, :stop_and_disable) should be added not just here, but in service, and implemented in child resources as efficiently as possible. So for provider/service/systemd.rb, we'd use --now, and there's probably something similar for other OSes (probably solaris, maybe mac), For when there is not, we'll emulate with two calls. Obviously you wouldn't be expected to figure that out for every OS, so the service.rb could define these as calling enable/start functions, and we could implement the os-specific providers over time.

If the goal is just to have a single action that does two things, that's an anti-pattern, and we'd rather not introduce that - in reality the user is requesting two states of their system: that a service be enabled, and that a service be started, and it should be two actions. The only reason for bucking that pattern is to halve the number of shellouts for those who are trying to eek out as much performance as possible (e.g. Meta would benefit from it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation How do we use this project?
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants