ansible service_facts module returning failed loaded service on ubuntu 20.04 #83360

Open
1 task done
belope opened this issue Jun 4, 2024 · 12 comments · May be fixed by #83424
Labels
affects_2.15 affects_2.17 bug This issue/PR relates to a bug. has_pr This issue has an associated PR. module This issue/PR relates to a module.

Comments

belope commented Jun 4, 2024

Summary

When running
- name: populate service facts
  service_facts:

debugging the ansible_facts.services variable on Ubuntu hosts shows a service in a failed state, specifically:
'loaded': {'name': 'loaded', 'state': 'stopped', 'status': 'failed', 'source': 'systemd'}

running systemctl --failed on the host does not return any failed services

systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.

Tested on several versions of Ansible; all behave the same.

Issue Type

Bug Report

Component Name

lib/ansible/modules/service_facts.py

Ansible Version

$ ansible --version
ansible [core 2.17.0]
  config file = /home/petr/ansible/ansible_mai_update/ansible.cfg
  configured module search path = ['/home/petr/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/petr/.local/pipx/venvs/ansible-core/lib/python3.10/site-packages/ansible
  ansible collection location = /home/petr/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/petr/.local/pipx/venvs/ansible-core/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/home/petr/.local/pipx/venvs/ansible-core/bin/python)
  jinja version = 3.1.4
  libyaml = True

Configuration

# if using a version older than ansible-core 2.12 you should omit the '-t all'
$ ansible-config dump --only-changed -t all

OS / Environment

DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=21.3

Steps to Reproduce

- name: populate service facts
  service_facts:
- name: debug
  debug:
    msg: "{{ ansible_facts.services }}"

Expected Results

expected no services in failed state

Actual Results

Shortened output:
'loaded': {'name': 'loaded', 'state': 'stopped', 'status': 'failed', 'source': 'systemd'},

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibot ansibot added bug This issue/PR relates to a bug. needs_triage Needs a first human triage before being processed. affects_2.15 module This issue/PR relates to a module. labels Jun 4, 2024
Contributor

ansibot commented Jun 4, 2024

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the component bot command.

Contributor

ansibot commented Jun 4, 2024

@belope ansible-core 2.15 is not supported and no longer receives bug fixes. Please test against one of the supported versions of ansible-core, preferably the most recent one, to see whether the bug has been fixed.

@bcoca bcoca removed the needs_triage Needs a first human triage before being processed. label Jun 4, 2024
Member

bcoca commented Jun 4, 2024

This is intended: service_facts returns data on every service it can find, along with its state. For example, this helps people looking for services in a particular combination of state, status, and enablement.

If you don't want failed services, they are easy to filter out of the results, but the data itself should reflect the state of the target.
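
A minimal sketch of that kind of filtering, written in plain Python rather than as a playbook. It assumes the facts have the shape shown in the report (a mapping from service name to a dict with a 'status' key); the helper name without_failed and the ssh.service entry are hypothetical and only there for illustration.

```python
def without_failed(services):
    """Return a copy of the services mapping with 'failed' entries dropped."""
    return {
        name: info
        for name, info in services.items()
        if info.get('status') != 'failed'
    }


# Sample data: the first entry is taken from the report,
# the second one is a made-up healthy service for contrast.
services = {
    'loaded': {'name': 'loaded', 'state': 'stopped',
               'status': 'failed', 'source': 'systemd'},
    'ssh.service': {'name': 'ssh.service', 'state': 'running',
                    'status': 'enabled', 'source': 'systemd'},
}

print(sorted(without_failed(services)))  # ['ssh.service']
```

Filtering like this hides the odd "loaded" entry but does not explain where it comes from.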

@bcoca bcoca closed this as completed Jun 4, 2024
Author

belope commented Jun 4, 2024

I apologise if I didn't explain my problem properly. I know that service_facts returns failed services, and I want that. On other operating systems (RHEL-based, Debian), when you run systemctl --failed and there are no failed services, Ansible and the direct command show the same thing. Only on Ubuntu, even though systemctl --failed, systemctl list-units --failed, and systemctl status show no failed services, the Ansible module still returns this "loaded" systemd service as failed, even though there is no "loaded" service when asking the Ubuntu system directly:
systemctl status loaded
Unit loaded.service could not be found

Member

bcoca commented Jun 4, 2024

We do both:

systemctl list-units --type service --all

and

systemctl list-unit-files --type service --all

To ensure we get a full list.

There are many 'views' systemd returns, especially given the different ways it was implemented; this is the way we found to minimize discrepancies and get the most data. If you run different queries, deviations are expected.

Author

belope commented Jun 4, 2024

ok, on the affected system I can see the following:
systemctl list-units --type service --all|grep failed
grub-initrd-fallback.service loaded inactive dead GRUB failed boot detection
I believe that is why I see the "loaded" service: Ansible is taking "failed" not from the state, but from the description.
Could you please check if my assumption is correct?

Member

bcoca commented Jun 4, 2024

No, we take that from the different state fields (systemd has many); IIRC it's ActiveState.

Author

belope commented Jun 4, 2024

OK, any idea why I'm seeing the "loaded" systemd service? I would really like to figure this out, as Ubuntu is the only system where I see it. I would like either to not see it in the Ansible output, if it's there erroneously, or to find out which failed service needs fixing.
On another Ubuntu server I can see two failed services in the Ansible facts, loaded and motd-news.service, but only motd-news appears in the systemctl --failed output and in the list-units output:
systemctl list-units --type service --all|grep failed
grub-initrd-fallback.service loaded inactive dead GRUB failed boot detection
● motd-news.service loaded failed failed Message of the Day

On the server with only loaded as failed I don't see any service where either ACTIVE or SUB are failed.

Author

belope commented Jun 11, 2024

I have narrowed the problem with the help of my colleague to this portion of the code, lines 275-279:
for bad in self.BAD_STATES:
    if bad in fields:  # dot is 0
        status_val = bad
        fields = fields[1:]
        break
using the BAD_STATES = frozenset(['not-found', 'masked', 'failed'])
on ubuntu the incorrectly parsed service is as follows:
grub-initrd-fallback.service loaded inactive dead GRUB failed boot detection
The not-found, masked, or failed services should normally have a ● sign at the beginning of the line, so removing the first element of fields (fields = fields[1:]) is necessary. However, in our situation the "bad" state is matched in the service description, so shifting the index creates an incorrect failed service with the name "loaded".
The solution should probably be to not include the description in the check for bad fields, I guess? Or to use systemctl list-units --type service --all --plain to eliminate the ● sign altogether?
Could you look into this?
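
The mis-parse described above can be reproduced outside Ansible with a small sketch. It is a simplified stand-in for the loop quoted above, not the module's actual code: the function name parse_unit_line and the way name and status are extracted around the loop are assumptions made for illustration. The two sample lines are the ones from this thread.

```python
BAD_STATES = frozenset(['not-found', 'masked', 'failed'])


def parse_unit_line(line):
    """Simplified sketch of the parsing under discussion (not the real module)."""
    fields = line.split()
    for bad in BAD_STATES:
        if bad in fields:
            status_val = bad
            # Intended to drop the leading bullet marker, but it also runs
            # when the bad word was only found in the description.
            fields = fields[1:]
            break
    else:
        # No bad word anywhere on the line: take the ACTIVE column.
        status_val = fields[2]
    return fields[0], status_val


GRUB_LINE = ("grub-initrd-fallback.service loaded inactive dead "
             "GRUB failed boot detection")
MOTD_LINE = "● motd-news.service loaded failed failed Message of the Day"

print(parse_unit_line(GRUB_LINE))  # ('loaded', 'failed')
print(parse_unit_line(MOTD_LINE))  # ('motd-news.service', 'failed')
```

The line with the bullet marker parses as intended, while the grub line has no marker, so dropping the first field discards the unit name and leaves "loaded" in its place.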

@bcoca bcoca reopened this Jun 11, 2024
Member

bcoca commented Jun 11, 2024

I'll try using --plain; it seems like a better solution.

diff --git a/lib/ansible/modules/service_facts.py b/lib/ansible/modules/service_facts.py
index c15533b1bb..916e3ec9e5 100644
--- a/lib/ansible/modules/service_facts.py
+++ b/lib/ansible/modules/service_facts.py
@@ -263,7 +263,7 @@ class SystemctlScanService(BaseService):
     def _list_from_units(self, systemctl_path, services):
 
         # list units as systemd sees them
-        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all" % systemctl_path, use_unsafe_shell=True)
+        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all --plain" % systemctl_path, use_unsafe_shell=True)
         if rc != 0:
             self.module.warn("Could not list units from systemd: %s" % stderr)
         else:
@@ -275,7 +275,6 @@ class SystemctlScanService(BaseService):
                 for bad in self.BAD_STATES:
                     if bad in fields:  # dot is 0
                         status_val = bad
-                        fields = fields[1:]
                         break
                 else:
                     # active/inactive

Let me know if that fixes it for you. The 'list from unit files' step should overlap/overwrite this in any case, and that pulls from ActiveState; I'm looking at the code now, trying to figure out why that is not the case for you.

@bcoca bcoca linked a pull request Jun 11, 2024 that will close this issue
@ansibot ansibot added the has_pr This issue has an associated PR. label Jun 11, 2024
Author

belope commented Jun 17, 2024

I tried the proposed fix; it fixes one half of the problem. It correctly parses the fields. However, it still wrongly flags any service that has one of the words 'not-found', 'masked', or 'failed' in the description field (in my case grub-initrd-fallback.service).
As the --plain option has already removed the need to shift the fields index, I would propose removing the whole for loop that checks "bad in fields", since the status value should now be in the correct position (status_val = fields[2]). As long as the for loop stays in the code, it detects those bad values in any field and sets status_val regardless of the value of fields[2], because the else branch is never entered.
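
A minimal sketch of the fixed-position approach proposed here. It assumes --plain output with the column order shown in the systemctl header earlier in this thread (UNIT LOAD ACTIVE SUB DESCRIPTION); the function name parse_plain_line and the returned dict layout are hypothetical, and the sketch does not try to skip header or footer lines.

```python
def parse_plain_line(line):
    """Split one --plain row into columns, keeping the description intact."""
    # Split at most four times so the free-text description is one field.
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None  # fewer than four columns: cannot be a unit row
    unit, load, active, sub = parts[:4]
    description = parts[4] if len(parts) > 4 else ''
    return {'name': unit, 'load': load, 'active': active,
            'sub': sub, 'description': description}


row = parse_plain_line(
    "grub-initrd-fallback.service loaded inactive dead "
    "GRUB failed boot detection"
)
print(row['name'], row['active'])  # grub-initrd-fallback.service inactive
```

Because the status is read by position, the word "failed" in the description no longer influences the result.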

Member

bcoca commented Jun 17, 2024

That code handled the fact that some versions of systemd were not consistent about showing a failed status, even when the service had clearly failed as indicated by other fields. I'll need to check whether all currently used versions now do so before removing it, though a better fix might be to simply exclude the description, as your own example shows grub as 'loaded', 'inactive', and 'dead'.
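
One way the "avoid the description" idea could look, as a sketch only. It keeps a scan for bad states but limits it to the LOAD, ACTIVE, and SUB columns, and it assumes --plain output so that no bullet marker shifts the columns. The function name status_from_state_columns and the example.service line are hypothetical; this is not the change made in the linked pull request.

```python
BAD_STATES = frozenset(['not-found', 'masked', 'failed'])


def status_from_state_columns(line):
    """Return (unit, status), looking for bad states in state columns only."""
    # Raises ValueError if the line has fewer than four columns.
    unit, load, active, sub = line.split(None, 4)[:4]
    for column in (load, active, sub):
        if column in BAD_STATES:
            return unit, column
    # Nothing bad in the state columns: fall back to the ACTIVE column.
    return unit, active


print(status_from_state_columns(
    "grub-initrd-fallback.service loaded inactive dead "
    "GRUB failed boot detection"
))  # ('grub-initrd-fallback.service', 'inactive')
print(status_from_state_columns(
    "motd-news.service loaded failed failed Message of the Day"
))  # ('motd-news.service', 'failed')
```

This keeps the tolerance for systemd versions that report a bad state in only one of the state columns, while ignoring whatever words appear in the description.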
