Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait for {starting, stopping} units #35636

Open
wants to merge 2 commits into
base: devel
Choose a base branch
from
Open

Conversation

sethp-nr
Copy link
Contributor

@sethp-nr sethp-nr commented Feb 2, 2018

SUMMARY

When ansible runs systemd: name=foo state=started against a unit that is currently activating (starting up), it moves on immediately. For a unit that can take many seconds or minutes to start up, this causes some difficulty interacting with that unit later in the play. Further, a unit that was deactivating (stopping) would not await the ultimate shutdown of the unit.

This change waits for units that are activating when state=started is requested, and units that are deactivating when state=stopped is requested.

Additionally, some attempt has been made to rationalize the difference in semantics between state=reloaded and systemd's notion of "reload". In the latter, attempting to reload anything besides a fully running service is an error, and per b49aa70 the former makes an attempt to start any non-active service that's "reloaded." Systemd is nice to us here in that enqueuing a start job for any service that's failed or deactivating will (eventually) try to start it back up, which is considered to have the same rough effect as attempting a reload in Ansible, so we take that tack.

Unfortunately, still ambiguous is what to do when the unit is activating: does the unit have the latest configuration, or does it need to be reloaded? Here, we try to change the behavior of the systemd module to wait for the unit to start before issuing the requested reload (preferring too many reloads to too few), but truth be told I'm not sure the complexity that behavior adds to the implementation is a net positive, given the relative rarity of the case.

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

systemd

ANSIBLE VERSION
ansible 2.3.2.0
  config file =
  configured module search path = Default w/o overrides
  python version = 2.7.14 (default, Sep 25 2017, 09:53:22) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
ADDITIONAL INFORMATION

Systemd reports units as being in one of five states: active, inactive, activating, deactivating, and failed. Combined with the four settings for the state parameter, produces 20 possible permutations. This change maps those 20 settings to systemd jobs in the following way:

state=started: active [ok], activating / deactivating / inactive / failed -> start
state=stopped: inactive[ok], failed [ok], activating / deactivating / inactive -> stop
state=restarted: active, inactive, failed, activating, deactivating -> restart
state=reloaded: active -> reload, inactive / deactivating / failed -> start, activating -> wait for startup, then reload

Systemd reports units as being in one of five states: active, inactive, activating, deactivating, and failed. Combined with the four settings for the `state` parameter, produces 20 possible permutations. This change maps those 20 settings to systemd jobs in the following way:

state=started: active [ok], activating / deactivating / inactive / failed -> start 
state=stopped: inactive[ok], failed [ok], activating / deactivating / inactive -> stop
state=restarted: active, inactive, failed, activating, deactivating -> restart 
state=reloaded: active -> reload, inactive / deactivating / failed -> start, activating -> wait for startup, then reload
@ansibot ansibot added bugfix_pull_request core_review In order to be merged, this PR must follow the core review workflow. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Feb 2, 2018
@ansibot
Copy link
Contributor

ansibot commented Feb 2, 2018

The test ansible-test sanity --test pep8 [?] failed with the following errors:

lib/ansible/modules/system/systemd.py:425:51: W291 trailing whitespace
lib/ansible/modules/system/systemd.py:434:161: E501 line too long (162 > 160 characters)

The test ansible-test sanity --test pylint [?] failed with the following error:

lib/ansible/modules/system/systemd.py:425:0: trailing-whitespace Trailing whitespace

click here for bot help

@ansibot ansibot added ci_verified Changes made in this PR are causing tests to fail. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed core_review In order to be merged, this PR must follow the core review workflow. labels Feb 2, 2018
@mkrizek
Copy link
Contributor

mkrizek commented Feb 2, 2018

cc @bcoca

@mkrizek mkrizek removed the needs_triage Needs a first human triage before being processed. label Feb 2, 2018
@ansibot ansibot added core_review In order to be merged, this PR must follow the core review workflow. and removed ci_verified Changes made in this PR are causing tests to fail. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Feb 2, 2018
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Feb 10, 2018
@ansibot ansibot added bug This issue/PR relates to a bug. and removed bugfix_pull_request labels Mar 2, 2018
@ansibot ansibot added the affects_2.6 This issue/PR affects Ansible v2.6 label May 21, 2018
@maxamillion
Copy link
Contributor

I'm in favor of the change, the code looks good too. 👍

shipit

cc @bcoca for review

@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label May 23, 2018
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label May 31, 2018
@intjonathan
Copy link

Any updates here? We would love to use this from upstream.

Copy link
Member

@bcoca bcoca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure the change does what it is meant to do, 'activating' and 'deactivating' might require special consideration:

  • loop and check again until we get desired state (with timeout)?
  • for started/stopped return OK but with note 'service was already activating/deactivating'?
  • for reload/restart, same do we wait until we have a stable state?

if module.params['state'] == 'started':
if not is_running_service(result['status']):
if service_state != 'active':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure we should activate the service if it was already 'activating'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're "activating", then I think we want to run systemctl start UNIT so the module waits for the startup to complete.

action = 'start'
elif module.params['state'] == 'stopped':
if is_running_service(result['status']):
if service_state not in set(['inactive', 'failed']):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we count 'deactivating' also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so – we're saying "run systemctl stop for any state not these two", or IOW "please attempt and await shutdown unless we're already at rest"

if not module.check_mode:
if module.params['no_block']:
module.fail_json(msg="Unable to reload service %s: %s" % (unit, "cannot wait for service startup to complete with --no-block"))
(rc, out, err) = module.run_command("%s %s '%s'" % (systemctl, 'start', unit))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure i undestand this, you 'start' a unit that is already 'activating'?

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed core_review In order to be merged, this PR must follow the core review workflow. labels Jul 24, 2018
@intjonathan
Copy link

Thanks for the review @bcoca, I'll pass this along to @sethp-nr

@sethp-nr
Copy link
Contributor Author

hi @bcoca – this in my testing, I found that issuing a systemctl start for a unit already activating does indeed wait for the unit to enter the active state. I tested with the following setup:

$ systemctl cat foo
# /etc/systemd/system/foo.service
[Unit]
Name=foo.service

[Service]
ExecStartPre=/usr/bin/sleep 5
ExecStart=/usr/bin/sleep 3600

and I observed later start commands waiting for a previously enqueued start job:

systemctl restart --no-block foo && time systemctl start foo

real	0m5.005s
user	0m0.003s
sys	0m0.000s

What I believe is happening is that systemd is "merging" our second start job into the first: https://github.com/systemd/systemd/blob/f330408d62ebffe166d2b1284b9f021129cfe125/src/core/job.c#L202-L208

In that case, the second start we're issuing wouldn't turn into another job at all, but instead ends up causing our systemctl command to await completion of the in-progress start job. That's what we're doing on line 435 as well: it's not a "start" action as such but rather a request to "await startup."

@sethp-nr
Copy link
Contributor Author

Hi @bcoca! Did my comment above make sense to you? I believe the code is correct for the goal of waiting for units going through their activation or deactivation phases as it's written, would you be willing to take another look?

@ansibot ansibot added the system System category label Feb 17, 2019
@ansibot ansibot added the needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html label Aug 16, 2019
@relrod relrod added the P3 Priority 3 - Approved, No Time Limitation label Apr 3, 2020
@ansibot ansibot added collection Related to Ansible Collections work collection:community.general needs_collection_redirect https://github.com/ansible/ansibullbot/blob/master/docs/collection_migration.md support:community This issue/PR relates to code supported by the Ansible community. and removed support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels May 16, 2020
@ansibot ansibot added support:core This issue/PR relates to code supported by the Ansible Engineering Team. and removed collection:community.general needs_collection_redirect https://github.com/ansible/ansibullbot/blob/master/docs/collection_migration.md support:community This issue/PR relates to code supported by the Ansible community. labels May 27, 2020
@ansibot ansibot added pre_azp This PR was last tested before migration to Azure Pipelines. and removed stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Dec 6, 2020
@AVee
Copy link

AVee commented Oct 18, 2021

I any of this still current? I'm running into issues with the started state when systemd is 'activating'.

Here's what I'm doing:

  • Push config for a yet to be install package.
  • Install the package.
  • Check the newly installed service has started correctly, to catch any error caused by the configuration.

This seemed to work, however when the new service is slow to start the startup check might not catch the fact the service never started because of config errors. It seems this patch would prevent that...

@ansibot ansibot removed the has_issue label Jul 12, 2023
@ansibot ansibot added needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html and removed needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html labels Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects_2.6 This issue/PR affects Ansible v2.6 bug This issue/PR relates to a bug. collection Related to Ansible Collections work module This issue/PR relates to a module. needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. P3 Priority 3 - Approved, No Time Limitation pre_azp This PR was last tested before migration to Azure Pipelines. support:core This issue/PR relates to code supported by the Ansible Engineering Team. system System category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants