Running service_facts stops Graphites carbon-cache #81239

Open
1 task done
chgarling opened this issue Jul 12, 2023 · 7 comments
Assignees
Labels
affects_2.15 bug This issue/PR relates to a bug. module This issue/PR relates to a module. needs_verified This issue needs to be verified/reproduced by maintainer P3 Priority 3 - Approved, No Time Limitation

Comments

@chgarling

chgarling commented Jul 12, 2023

Summary

I ran into a problem where running service_facts reproducibly stops carbon-cache. After a rollout I wondered why the service had been stopped, so I looked at the logs and found this:

Jul 11 13:35:36 szvgraphite python3[2897927]: ansible-ansible.legacy.setup Invoked with gather_subset=['all'] gather_timeout=10 filter=[] fact_path=/etc/ansible/facts.d
Jul 11 13:35:45 szvgraphite python3[2898022]: ansible-service_facts Invoked
Jul 11 13:35:46 szvgraphite systemd[1]: Stopping Graphite Carbon Cache...
Jul 11 13:35:47 szvgraphite systemd[1]: carbon-cache.service: Succeeded.
Jul 11 13:35:47 szvgraphite systemd[1]: Stopped Graphite Carbon Cache.

I retried the service_facts run twice, and in both cases carbon-cache was stopped directly after the invocation. I do not know whether the problem occurs because carbon-cache is also a python3 process.

Issue Type

Bug Report

Component Name

service_facts

Ansible Version

$ ansible --version
ansible [core 2.15.1]
  config file = None
  configured module search path = ['/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.16 (main, Mar  7 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Configuration

# if using a version older than ansible-core 2.12 you should omit the '-t all'
$ ansible-config dump --only-changed -t all
CONFIG_FILE() = None

OS / Environment

Ansible is running as part of AWX installed with awx-operator 2.2.1.

The target host is an Ubuntu 20.04.5 LTS host.

Steps to Reproduce

---
- hosts: all
  become: true
  become_method: sudo

  tasks:
    - name: gather service facts
      service_facts:

Expected Results

No services should be stopped accidentally when invoking service_facts.

Actual Results

The process /usr/bin/python3 /usr/bin/carbon-cache is stopped whenever service_facts is invoked.

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@chgarling chgarling changed the title Running service_facts stop Graphites carbon-cache Running service_facts stops Graphites carbon-cache Jul 12, 2023
@ansibot
Contributor

ansibot commented Jul 12, 2023

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.


@ansibot ansibot added affects_2.15 bug This issue/PR relates to a bug. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. labels Jul 12, 2023
@jborean93 jborean93 added needs_verified This issue needs to be verified/reproduced by maintainer and removed needs_triage Needs a first human triage before being processed. labels Jul 13, 2023
@jborean93 jborean93 self-assigned this Jul 13, 2023
@jborean93 jborean93 added the P3 Priority 3 - Approved, No Time Limitation label Jul 13, 2023
@jborean93
Contributor

The code that is being run here is:

class SystemctlScanService(BaseService):

    BAD_STATES = frozenset(['not-found', 'masked', 'failed'])

    def systemd_enabled(self):
        # Check if init is the systemd command, using comm as cmdline could be symlink
        try:
            f = open('/proc/1/comm', 'r')
        except IOError:
            # If comm doesn't exist, old kernel, no systemd
            return False
        for line in f:
            if 'systemd' in line:
                return True
        return False

    def _list_from_units(self, systemctl_path, services):
        # list units as systemd sees them
        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all" % systemctl_path, use_unsafe_shell=True)
        if rc != 0:
            self.module.warn("Could not list units from systemd: %s" % stderr)
        else:
            for line in [svc_line for svc_line in stdout.split('\n') if '.service' in svc_line]:
                state_val = "stopped"
                status_val = "unknown"
                fields = line.split()
                for bad in self.BAD_STATES:
                    if bad in fields:  # dot is 0
                        status_val = bad
                        fields = fields[1:]
                        break
                else:
                    # active/inactive
                    status_val = fields[2]
                # array is normalize so predictable now
                service_name = fields[0]
                if fields[3] == "running":
                    state_val = "running"
                services[service_name] = {"name": service_name, "state": state_val, "status": status_val, "source": "systemd"}

    def _list_from_unit_files(self, systemctl_path, services):
        # now try unit files for complete picture and final 'status'
        rc, stdout, stderr = self.module.run_command("%s list-unit-files --no-pager --type service --all" % systemctl_path, use_unsafe_shell=True)
        if rc != 0:
            self.module.warn("Could not get unit files data from systemd: %s" % stderr)
        else:
            for line in [svc_line for svc_line in stdout.split('\n') if '.service' in svc_line]:
                # there is one more column (VENDOR PRESET) from `systemctl list-unit-files` for systemd >= 245
                try:
                    service_name, status_val = line.split()[:2]
                except IndexError:
                    self.module.fail_json(msg="Malformed output discovered from systemd list-unit-files: {0}".format(line))
                if service_name not in services:
                    rc, stdout, stderr = self.module.run_command("%s show %s --property=ActiveState" % (systemctl_path, service_name), use_unsafe_shell=True)
                    state = 'unknown'
                    if not rc and stdout != '':
                        state = stdout.replace('ActiveState=', '').rstrip()
                    services[service_name] = {"name": service_name, "state": state, "status": status_val, "source": "systemd"}
                elif services[service_name]["status"] not in self.BAD_STATES:
                    services[service_name]["status"] = status_val

    def gather_services(self):
        services = {}
        if self.systemd_enabled():
            systemctl_path = self.module.get_bin_path("systemctl", opt_dirs=["/usr/bin", "/usr/local/bin"])
            if systemctl_path:
                self._list_from_units(systemctl_path, services)
                self._list_from_unit_files(systemctl_path, services)
        return services
It essentially runs the following commands to list all the systemd units:

systemctl list-units --no-pager --type service --all
systemctl list-unit-files --no-pager --type service --all

I do notice that it might run systemctl show "${SERVICE}" --property=ActiveState if the unit file in question was not listed in list-units. Does running any of those commands stop or restart your service? I'm not sure why they would in the first place; the module certainly isn't sending any restart signal that I can see.
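One way to check this outside Ansible is to run the same three commands by hand and read the unit's state after each one. The following is only a minimal sketch and not part of the module: the helper names `run` and `check_commands` are hypothetical, the unit name `carbon-cache.service` is assumed from the journal lines quoted earlier, and `systemctl is-active` is used to read the state. Unlike the module, which passes a single string through a shell, this sketch passes argument lists directly.

```python
import subprocess

UNIT = "carbon-cache.service"  # assumed unit name, taken from the journal output

# The commands the module is described as running. The `show` call is
# issued here for one unit only, not for every unit file.
COMMANDS = [
    ["systemctl", "list-units", "--no-pager", "--type", "service", "--all"],
    ["systemctl", "list-unit-files", "--no-pager", "--type", "service", "--all"],
    ["systemctl", "show", UNIT, "--property=ActiveState"],
]


def run(cmd):
    """Run a command and return its stdout as text (hypothetical helper)."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def check_commands(runner=run, unit=UNIT, commands=COMMANDS):
    """Run each command, then record what `systemctl is-active` reports."""
    results = []
    for cmd in commands:
        runner(cmd)
        state = runner(["systemctl", "is-active", unit]).strip()
        results.append((" ".join(cmd), state))
    return results
```

Calling `check_commands()` on the target host returns one `(command, state)` pair per command, so a state other than `active` points at the command that preceded the stop. Running it as root would match the `become: true` setting in the playbook.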

@chgarling
Author

I tested the commands you provided, but none of them stops the service. Then I created a playbook with only the service_facts task (previously I had some other steps after it), and it stopped carbon-cache again. So it is surely the service_facts task that "kills" the application.

@jborean93
Contributor

Did you disable fact gathering altogether and try just service_facts? Add gather_facts: false underneath the play's hosts setting and try again. The problem is that I don't see how service_facts could be doing this, as those 3 commands are the only ones I see it running in the module code.

@chgarling
Author

It still happens:

Jul 18 13:54:34 szvgraphite systemd[1]: Started Session 16502 of user ansible_deploy.
Jul 18 13:54:35 szvgraphite python3[483526]: ansible-service_facts Invoked
Jul 18 13:54:36 szvgraphite systemd[1]: Stopping Graphite Carbon Cache...
Jul 18 13:54:37 szvgraphite systemd[1]: carbon-cache.service: Succeeded.
Jul 18 13:54:37 szvgraphite systemd[1]: Stopped Graphite Carbon Cache.

My playbook:

---
- hosts: all
  gather_facts: false
  become: true
  become_method: sudo

  tasks:
    - name: gather service facts
      service_facts:

@chgarling
Author

(image attached)

@bcoca
Member

bcoca commented Jul 18, 2023

@chgarling we still cannot reproduce this and, looking at what is executed, cannot fathom how this is caused by that module, unless your systemd units are set up to execute a shutdown when their status is queried?!

I'm going to guess that this is a confluence of things happening (resource starvation?) that the module execution aggravates. I would suggest looking at dmesg, carbon service logs and other system logs for a cause of the service shutdown.
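As a possible starting point for that investigation, the sketch below builds a `journalctl` query for a short window around the stop and a `systemctl show` query for unit properties that allow systemd to stop a unit without an explicit request. The helper names `journal_cmd` and `show_cmd`, the 60-second window, the unit name, and the selection of properties are all assumptions made for illustration; nothing here is confirmed to be related to the cause.

```python
from datetime import datetime, timedelta

UNIT = "carbon-cache.service"  # assumed unit name, taken from the journal output

# Unit properties through which systemd may stop a unit on its own,
# for example when a unit it is bound to goes away.
PROPERTIES = ["BindsTo", "PartOf", "Conflicts", "StopWhenUnneeded"]


def journal_cmd(stop_time, window_seconds=60, unit=None):
    """Build a journalctl argv covering a window around stop_time."""
    fmt = "%Y-%m-%d %H:%M:%S"
    since = stop_time - timedelta(seconds=window_seconds)
    until = stop_time + timedelta(seconds=window_seconds)
    cmd = ["journalctl", "--no-pager",
           "--since", since.strftime(fmt),
           "--until", until.strftime(fmt)]
    if unit:
        # Restrict the output to one unit; omit to see everything in the window.
        cmd += ["-u", unit]
    return cmd


def show_cmd(unit=UNIT, properties=PROPERTIES):
    """Build a systemctl argv that prints the selected unit properties."""
    return ["systemctl", "show", unit, "--property=" + ",".join(properties)]
```

For the second log excerpt in this thread, `journal_cmd(datetime(2023, 7, 18, 13, 54, 36))` would cover the minute before and after the "Stopping Graphite Carbon Cache..." line (the year is assumed from the comment date). Leaving `unit` unset keeps messages from other units in view, which matters if something else triggered the stop.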
