"service_facts" module cannot report whether a service is enabled or not if it is in "failed" state #81115

Open
amg1127 opened this issue Jun 23, 2023 · 4 comments
Labels
affects_2.12; bug (This issue/PR relates to a bug.); module (This issue/PR relates to a module.); needs_verified (This issue needs to be verified/reproduced by maintainer)

Comments

amg1127 commented Jun 23, 2023

Summary

I use a playbook that handles failures and restarts a service if it is enabled on a server, and does nothing if it is disabled. To detect whether the playbook should take action on the service, it relies on the service_facts module.

- name: Get service facts
  service_facts:
  become: yes

- name: Act on the service
  vars:
    app_service: 'foo.service'
    app_service_cache_dir: '/var/cache/foo'
    app_service_user: 'foo'
  when: 'ansible_facts.services[app_service].status == "enabled" '
  block:

    - name: Stop service
      service:
        name: "{{ app_service }}"
        state: stopped
      become: yes

    - name: Clear the cache folder
      file:
        path: "{{ app_service_cache_dir }}"
        state: absent
      become: yes
  
    - name: Recreate the cache folder
      file:
        path: "{{ app_service_cache_dir }}"
        state: directory
        owner: "{{ app_service_user }}"
        mode: "0700"
      become: yes

    - name: Start service
      service:
        name: "{{ app_service }}"
        state: started
      become: yes

The playbook worked fine under Ansible 2.9.27, but it stopped working under Ansible 2.14.2 when the service enters a failed state in systemd (for example, after a crash). Under Ansible 2.14.2, the service_facts module sets the status attribute to failed, so the existing service facts no longer tell me whether the service is enabled, and the playbook cannot decide whether to act on it. As a workaround, I had to invoke systemctl is-enabled foo.service manually with the command module.
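The workaround mentioned above might look roughly like the sketch below. It is only an illustration, not the exact tasks from my playbook: the registered variable name app_service_is_enabled is made up for this example, and the debug task stands in for the real block. It relies on systemctl is-enabled printing the enablement state on stdout and returning a non-zero exit code for a disabled unit, which is why the task is told not to fail on that.

```yaml
# Sketch only: ask systemd directly instead of relying on service_facts.
- name: Check whether the service is enabled
  command:
    argv:
      - systemctl
      - is-enabled
      - foo.service
  register: app_service_is_enabled   # hypothetical variable name
  changed_when: false                # a read-only query never changes the host
  failed_when: false                 # a non-zero exit code (e.g. "disabled") is expected

- name: Act on the service only when systemd reports it as enabled
  debug:
    msg: "foo.service is enabled, the real block would run here"
  when: app_service_is_enabled.stdout == "enabled"
```

Comparing against the literal string "enabled" mirrors the original when condition; other values that systemctl can print (such as static or masked) would need their own handling.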

git bisect identified 82bab063 as the commit that introduced the error:

$ git bisect start
$ git bisect bad v2.12.10
$ git bisect good v2.10.0
$ git bisect run ../bisect.sh
82bab063e7c60b77596c5c87258d5c3398b5efc2 is the first bad commit

I may have coded the playbook incorrectly. From the existing documentation for the service_facts module, I could not figure out which attribute says whether a service is configured to start at boot time, nor which attribute says whether the service is currently running. The definitions of the status and state attributes are not clear.
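For anyone who wants to look at the raw attributes on their own host, a minimal sketch like the following prints everything service_facts returns for a single unit. The unit name foo.service is just the placeholder used throughout this report.

```yaml
# Sketch only: dump the facts that service_facts reports for one unit.
- name: Get service facts
  service_facts:

- name: Show the reported attributes for the unit
  debug:
    var: ansible_facts.services['foo.service']
```

On the affected versions, the reproduction output further down shows this dictionary containing the keys name, state, status and source for a systemd unit.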

Issue Type

Bug Report

Component Name

service_facts

Ansible Version

$ ansible --version
[WARNING]: You are running the development version of Ansible. You should only run Ansible from "devel" if you are modifying the Ansible engine, or trying out features under development. This is a rapidly changing source of code and can become unstable at any point.
ansible [core 2.12.0.dev0] (detached HEAD 82bab063e7) last updated 2023/06/23 23:08:39 (GMT +1300)
  config file = None
  configured module search path = ['/home/admin/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/admin/ansible/lib/ansible
  ansible collection location = /home/admin/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/admin/ansible/bin/ansible
  python version = 3.9.16 (main, Mar  7 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)]
  jinja version = 2.11.3
  libyaml = True

Configuration

# if using a version older than ansible-core 2.12 you should omit the '-t all'
$ ansible-config dump --only-changed -t all
[WARNING]: You are running the development version of Ansible. You should only run Ansible from "devel" if you are modifying the Ansible engine, or trying out features under development. This is a rapidly changing source of code and can become unstable at any point.

OS / Environment

CentOS Stream 9

Steps to Reproduce

This is a minimal playbook that reproduces the issue.

---
# service_facts.yml
- hosts: localhost
  gather_facts: no
  vars:
    test_service: 'foo.service'
  tasks:

    - name: Ensure that a failed service exists
      block:

        - name: Verify whether a failed service exists
          command:
            argv:
              - systemctl
              - 'is-failed'
              - "{{ test_service }}"

      rescue:

        - name: Install a service that fails
          copy:
            content: |
              [Unit]
              After=local-fs.target

              [Service]
              Type=simple
              ExecStart=/bin/false

              [Install]
              WantedBy=multi-user.target
            dest: "/etc/systemd/system/{{ test_service }}"
          become: yes

        - name: Reload systemd and start the test service
          systemd:
            name: "{{ test_service }}"
            state: started
            daemon_reload: yes
          become: yes
          ignore_errors: yes

        - name: Ensure that the installed failed service exists
          command:
            argv:
              - systemctl
              - 'is-failed'
              - "{{ test_service }}"

    - name: Ensure that the failed service is disabled
      systemd:
        name: "{{ test_service }}"
        enabled: no
      become: yes

    - name: Get service facts
      service_facts:
      become: yes

    - name: Store facts about the failed service that is disabled
      set_fact:
        test_service_disabled: "{{ ansible_facts.services[test_service] | dict2items }}"

    - name: Ensure that the failed service is enabled
      systemd:
        name: "{{ test_service }}"
        enabled: yes
      become: yes

    - name: Get service facts again
      service_facts:
      become: yes

    - name: Store facts about the failed service that is enabled
      set_fact:
        test_service_enabled: "{{ ansible_facts.services[test_service] | dict2items }}"

    - name: Compare service details
      vars:
        test_service_symmetric_diff: "{{ test_service_disabled | symmetric_difference(test_service_enabled) }}"
      assert:
        that:
          - '(test_service_symmetric_diff | length) > 0'
        success_msg: "service information is different: '{{ test_service_symmetric_diff | to_json }}'"
        fail_msg: "! service information is equal !!! '{{ test_service_disabled | to_json }}'"

This is the script used with git bisect.

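#!/bin/bash
# Exit codes follow the "git bisect run" convention:
# 0 = good commit, 1 = bad commit, 125 = cannot test (skip this commit).
set -o pipefail

# Resolve the directory that contains this script, following a symlink if needed.
curdir="${0}"
[ -h "${curdir}" ] && curdir="$(readlink -f "${curdir}")"
curdir="$(dirname "${curdir}")"

# Load the Ansible checkout under test into the environment.
. "${curdir}/ansible/hacking/env-setup" -q

tempfile="$(mktemp)"
[ -f "${tempfile}" ] || exit 125

if ansible-playbook "${curdir}/service_facts.yml" | tee "${tempfile}"; then
    # The final assertion passed: enabled and disabled facts differ.
    rm -f "${tempfile}"
    exit 0
elif tail --lines=10 "${tempfile}" | grep -qF '! service information is equal !!!'; then
    # The final assertion failed: the facts are identical.
    rm -f "${tempfile}"
    exit 1
else
    # The playbook failed for an unrelated reason.
    rm -fv "${tempfile}"
    exit 125
fi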

Expected Results

After invoking the service_facts module to get information about a service that has entered a failed state on a managed host, it should be possible to identify whether a service is enabled or not (that is, whether a service is configured to start at boot time or not).

Actual Results

running  '../bisect.sh'
[WARNING]: You are running the development version of Ansible. You should only
run Ansible from "devel" if you are modifying the Ansible engine, or trying out
features under development. This is a rapidly changing source of code and can
become unstable at any point.
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************

TASK [Verify whether a failed service exists] **********************************
changed: [localhost]

TASK [Ensure that the failed service is disabled] ******************************
changed: [localhost]

TASK [Get service facts] *******************************************************
ok: [localhost]

TASK [Store facts about the failed service that is disabled] *******************
ok: [localhost]

TASK [Ensure that the failed service is enabled] *******************************
changed: [localhost]

TASK [Get service facts again] *************************************************
ok: [localhost]

TASK [Store facts about the failed service that is enabled] ********************
ok: [localhost]

TASK [Compare service details] *************************************************
fatal: [localhost]: FAILED! => {
    "assertion": "(test_service_symmetric_diff | length) > 0",
    "changed": false,
    "evaluated_to": false,
    "msg": "! service information is equal !!! '[{\"key\": \"name\", \"value\": \"foo.service\"}, {\"key\": \"state\", \"value\": \"stopped\"}, {\"key\": \"status\", \"value\": \"failed\"}, {\"key\": \"source\", \"value\": \"systemd\"}]'"
}

PLAY RECAP *********************************************************************
localhost                  : ok=7    changed=3    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

82bab063e7c60b77596c5c87258d5c3398b5efc2 is the first bad commit

Code of Conduct

  • I agree to follow the Ansible Code of Conduct

ansibot commented Jun 23, 2023

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.


ansibot added labels on Jun 23, 2023: affects_2.12; bug (This issue/PR relates to a bug.); module (This issue/PR relates to a module.); needs_triage (Needs a first human triage before being processed.)

amg1127 commented Jun 23, 2023

+label affects_2.14

s-hertel added needs_verified (This issue needs to be verified/reproduced by maintainer) and removed needs_triage (Needs a first human triage before being processed.) on Jun 27, 2023

ekolkman commented Feb 5, 2024

For some reason the state and status fields are messed up when a service has failed.
According to ansible-doc 'failed' is a state value (which is logical, as 'running' and 'stopped' are also for the state).
But in the current version 'failed' shows up in the status of the service fact, thus hiding the 'enabled' or 'disabled' status.
A 'systemctl reset-failed' on the server removes the failed status of services and then the enabled/disabled shows up fine. But 'reset-failed' is not something I could find in the ansible-builtin modules.
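Since there does not seem to be a built-in module option for this, one possible stopgap is to call systemctl through the command module and then gather the facts again. This is only a sketch under that assumption: foo.service is a placeholder unit name, and the command may return an error if the unit is not loaded on the host.

```yaml
# Sketch only: clear the failed state by hand, then re-read the facts.
- name: Clear the failed state of the unit
  command:
    argv:
      - systemctl
      - reset-failed
      - foo.service
  become: yes

- name: Gather service facts again
  service_facts:
  become: yes
```

Note that this throws away the information that the service had failed, so it only fits cases where the enabled/disabled status is all that matters.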


amg1127 commented Feb 5, 2024

+label affects_2.16
The issue is reproducible under Ansible 2.16.2, as well.

$ ansible-playbook service_facts.yml --ask-become-pass 
BECOME password: 
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [localhost] **********************************************************************************************

TASK [Verify whether a failed service exists] **********************************************************************************************
Tuesday 06 February 2024  12:37:37 +1300 (0:00:00.008)       0:00:00.008 ****** 
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["systemctl", "is-failed", "foo.service"], "delta": "0:00:00.007093", "end": "2024-02-06 12:37:37.759622", "msg": "non-zero return code", "rc": 4, "start": "2024-02-06 12:37:37.752529", "stderr": "", "stderr_lines": [], "stdout": "inactive", "stdout_lines": ["inactive"]}

TASK [Install a service that fails] **********************************************************************************************
Tuesday 06 February 2024  12:37:37 +1300 (0:00:00.320)       0:00:00.329 ****** 
changed: [localhost]

TASK [Reload systemd and start the test service] **********************************************************************************************
Tuesday 06 February 2024  12:37:38 +1300 (0:00:00.648)       0:00:00.978 ****** 
changed: [localhost]

TASK [Ensure that the installed failed service exists] **********************************************************************************************
Tuesday 06 February 2024  12:37:39 +1300 (0:00:01.021)       0:00:01.999 ****** 
changed: [localhost]

TASK [Ensure that the failed service is disabled] **********************************************************************************************
Tuesday 06 February 2024  12:37:39 +1300 (0:00:00.231)       0:00:02.231 ****** 
changed: [localhost]

TASK [Get service facts] **********************************************************************************************
Tuesday 06 February 2024  12:37:40 +1300 (0:00:00.780)       0:00:03.011 ****** 
ok: [localhost]

TASK [Store facts about the failed service that is disabled] **********************************************************************************************
Tuesday 06 February 2024  12:37:46 +1300 (0:00:05.973)       0:00:08.985 ****** 
ok: [localhost]

TASK [Ensure that the failed service is enabled] **********************************************************************************************
Tuesday 06 February 2024  12:37:46 +1300 (0:00:00.039)       0:00:09.024 ****** 
changed: [localhost]

TASK [Get service facts again] **********************************************************************************************
Tuesday 06 February 2024  12:37:47 +1300 (0:00:00.764)       0:00:09.789 ****** 
ok: [localhost]

TASK [Store facts about the failed service that is enabled] **********************************************************************************************
Tuesday 06 February 2024  12:37:52 +1300 (0:00:05.670)       0:00:15.459 ****** 
ok: [localhost]

TASK [Compare service details] **********************************************************************************************
Tuesday 06 February 2024  12:37:52 +1300 (0:00:00.036)       0:00:15.496 ****** 
fatal: [localhost]: FAILED! => {
    "assertion": "(test_service_symmetric_diff | length) > 0",
    "changed": false,
    "evaluated_to": false,
    "msg": "! service information is equal !!! '[{\"key\": \"name\", \"value\": \"foo.service\"}, {\"key\": \"state\", \"value\": \"stopped\"}, {\"key\": \"status\", \"value\": \"failed\"}, {\"key\": \"source\", \"value\": \"systemd\"}]'"
}

PLAY RECAP **********************************************************************************************
localhost                  : ok=9    changed=5    unreachable=0    failed=1    skipped=0    rescued=1    ignored=0   

$ ansible --version
ansible [core 2.16.2]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/amg1127/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /home/amg1127/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801] (/usr/bin/python)
  jinja version = 3.1.3
  libyaml = True
