Ansible controller exponential memory usage when using handler listeners in collection #83392

ShawnHardwick · 2024-06-06T16:58:57Z

Summary

In Ansible 2.17 and 2.18, when handlers within an Ansible collection use the listen parameter, memory usage on the Ansible controller increases exponentially with the number of handlers in the playbook. Depending on the playbook tasks, the controller can consume all memory on the host until the operating system kills the process.

I believe the issue was introduced as part of this commit, which I describe in more detail in the reproduction steps:
#82854

[Image: ansible-playbook consuming all memory right before the kernel sends a SIGKILL]

The only workarounds I have at the moment are to use Ansible 2.16 or to remove all uses of the listen parameter from handlers.

Issue Type

Bug Report

Component Name

core

Ansible Version

ansible [core 2.18.0.dev0]
  config file = /home/shawn.hardwick/.ansible.cfg
  configured module search path = ['/home/shawn.hardwick/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/shawn.hardwick/code/venv/ansible-latest/lib/python3.10/site-packages/ansible
  ansible collection location = /home/shawn.hardwick/code/roles:/home/shawn.hardwick/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/home/shawn.hardwick/code:/home/shawn.hardwick/code/ansible_collections
  executable location = /home/shawn.hardwick/code/venv/ansible-latest/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/home/shawn.hardwick/code/venv/ansible-latest/bin/python)
  jinja version = 3.1.3
  libyaml = True

Configuration

CONFIG_FILE() = /home/shawn.hardwick/.ansible.cfg

OS / Environment

Ansible controller:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"

Steps to Reproduce

Create a collection; let's name it foo.test.

In this collection, create the following files:
roles/test_role/tasks/main.yml

---
- name: Task 1
  ansible.builtin.debug:
    msg: Task 1 executed
  changed_when: true
  notify: handler_2_listen

- name: Task 2
  ansible.builtin.debug:
    msg: Task 2 executed
  changed_when: true
  notify: handler_2_listen

- name: Task 3
  ansible.builtin.debug:
    msg: Task 3 executed
  changed_when: true
  notify: Handler 3

roles/test_role/handlers/main.yml

---
- name: Handler 1
  ansible.builtin.debug:
    msg: Handler 1 executed

- name: Handler 2
  listen:
    - handler_2_listen
  ansible.builtin.debug:
    msg: Handler 2 executed

- name: Handler 3
  ansible.builtin.debug:
    msg: Handler 3 executed

Create a playbook; let's call it playbook.yml.
Notes:

The behavior is reproducible against both localhost and a remote host; localhost is used here to keep the reproduction simple.
The behavior is not unique to executing the same role over and over; the same role is reused here only to avoid creating multiple unique roles. It is important, though, that the role is called multiple times so that the memory usage balloons and is observable.

- name: Test Handlers
  hosts: localhost
  become: false
  gather_facts: false
  tasks:
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role
    - name: Import a role
      ansible.builtin.import_role:
        name: foo.test.test_role

Execute the playbook with the command below:
ansible-playbook ./playbook.yml

In your process monitor of choice (I use htop), observe the memory usage of the ansible-playbook process.

For additional debugging context, I added the display.display lines below to lib/ansible/plugins/strategy/__init__.py to show what is happening to the list that is causing the memory leak:

        for handler in handlers:
+           display.display(f"Parsing handler: {handler.name}")
            if listeners := handler.listen:
+               display.display(f"Listeners array length: {len(listeners)}")
                listeners = handler.get_validated_value(
                    'listen',
                    handler.fattributes.get('listen'),
                    listeners,
                    templar,
                )
+               display.display(f"Listeners array length after validated value: {len(listeners)}")
                if handler._role is not None:
                    for listener in listeners.copy():
+                       display.display(f"Parsing listener {listener} in listeners array length {len(listeners)}")
                        listeners.extend([
                            handler._role.get_name(include_role_fqcn=True) + ' : ' + listener,
                            handler._role.get_name(include_role_fqcn=False) + ' : ' + listener
                        ])

Expected Results

Memory usage should stay small.

Actual Results

Memory is consumed until either the listeners are fully resolved (dependent on the handler list) or the machine runs out of memory. On my machine, the process consumes half of my total CPU and increases RAM usage by 2GB every minute.

Using the display.display statements from the reproduction steps, the output might look like this:

[truncated for brevity]
Parsing listener test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : test_role : test_role : handler_2_listen in listeners array length 2956403
Parsing listener foo.test.test_role : test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : test_role : test_role : handler_2_listen in listeners array length 2956405
Parsing listener test_role : test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : test_role : test_role : handler_2_listen in listeners array length 2956407
Parsing listener foo.test.test_role : foo.test.test_role : test_role : foo.test.test_role : foo.test.test_role : foo.test.test_role : test_role : test_role : handler_2_listen in listeners array length 2956409
[truncated for brevity]
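
The lengths in this output are consistent with the list tripling on every pass: each existing entry gains two role-prefixed copies, so a list of n entries becomes 3n. Below is a minimal Python sketch of that behavior. It assumes the loop mutates the same list object that is stored on the handler, which the debug output suggests but does not prove. The names expand_in_place and stored_listen are hypothetical stand-ins, not Ansible APIs.

```python
def expand_in_place(listeners, fqcn_role, short_role):
    """Append role-prefixed aliases for every entry, mutating `listeners`."""
    # Iterate over a snapshot, but extend the original list.
    for listener in listeners.copy():
        listeners.extend([
            f"{fqcn_role} : {listener}",
            f"{short_role} : {listener}",
        ])
    return listeners


# Hypothetical stand-in for handler.listen; assumed to be reused
# (not copied) every time the handler's listeners are evaluated.
stored_listen = ["handler_2_listen"]

lengths = []
for _ in range(5):
    expand_in_place(stored_listen, "foo.test.test_role", "test_role")
    lengths.append(len(stored_listen))

print(lengths)  # [3, 9, 27, 81, 243]
```

Under this assumption the list holds 3^k entries after k passes. The length 2956403 in the output above falls between 3^13 = 1594323 and 3^14 = 4782969, which is what a pass that is partway through would show.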

Code of Conduct

I agree to follow the Ansible Code of Conduct

ansibot · 2024-06-06T17:03:22Z

Files identified in the description:

None

If these files are incorrect, please update the component name section of the description or use the component bot command.

sivel · 2024-06-06T17:13:08Z

Looks like we just need to ensure we are operating on a copy of the handler.listen list before extending it for evaluation:

diff --git a/lib/ansible/plugins/strategy/__init__.py b/lib/ansible/plugins/strategy/__init__.py
index efd69efe9b..9cba974d07 100644
--- a/lib/ansible/plugins/strategy/__init__.py
+++ b/lib/ansible/plugins/strategy/__init__.py
@@ -558,7 +558,7 @@ class StrategyBase:
                     handler.fattributes.get('listen'),
                     listeners,
                     templar,
-                )
+                ).copy()
                 if handler._role is not None:
                     for listener in listeners.copy():
                         listeners.extend([
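
The effect of the added .copy() can be illustrated outside Ansible with a small Python sketch. It assumes, as the patch implies, that the validated value is the same list object stored on the handler. The names expand_on_copy and stored_listen are hypothetical and only mirror the shape of the loop in the diff.

```python
def expand_on_copy(stored, fqcn_role, short_role):
    """Build the expanded listener names without touching `stored`."""
    listeners = stored.copy()  # private copy, analogous to the patched line
    for listener in listeners.copy():
        listeners.extend([
            f"{fqcn_role} : {listener}",
            f"{short_role} : {listener}",
        ])
    return listeners


# Hypothetical stand-in for handler.listen.
stored_listen = ["handler_2_listen"]

results = [
    expand_on_copy(stored_listen, "foo.test.test_role", "test_role")
    for _ in range(5)
]

print(len(stored_listen))        # 1
print([len(r) for r in results])  # [3, 3, 3, 3, 3]
```

Because the stored list is never extended, every evaluation starts from the original entries and the result stays at three names no matter how often it is computed.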

ShawnHardwick · 2024-06-06T18:04:29Z

Seems like that suggested fix works.
playbook_output.txt

mkrizek · 2024-06-07T12:00:22Z

> Looks like we just need to ensure we are operating on a copy of the handler.listen list before extending it for evaluation:

I was wondering if we could/should take it a step further and do the work of validating/extending listen only once, at task validation, since listen is static (#83400)?
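
The idea of doing the expansion once can be sketched in Python as follows. This is only an illustration of the approach, not the implementation in #83400. HandlerSketch and its attributes are hypothetical and are not the Ansible Handler class.

```python
class HandlerSketch:
    """Hypothetical stand-in for a handler; expands `listen` exactly once."""

    def __init__(self, name, listen, fqcn_role=None, short_role=None):
        self.name = name
        expanded = list(listen)
        # Add the role-prefixed aliases once, at construction time.
        if fqcn_role is not None and short_role is not None:
            for topic in listen:
                expanded.append(f"{fqcn_role} : {topic}")
                expanded.append(f"{short_role} : {topic}")
        # A tuple cannot be extended by later lookups.
        self.listen = tuple(expanded)

    def is_listening(self, topic):
        """Pure lookup: no validation or expansion happens here."""
        return topic in self.listen


handler = HandlerSketch(
    "Handler 2", ["handler_2_listen"], "foo.test.test_role", "test_role"
)

# Repeated notifications only read the precomputed names.
hits = sum(handler.is_listening("handler_2_listen") for _ in range(1000))
print(hits, len(handler.listen))  # 1000 3
```

Since listen is static, computing the names up front makes each notification a read-only membership test, so there is nothing left to grow at notification time.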

briantist · 2024-06-07T18:48:08Z

Since this problem affects 2.17 as well as 2.18, it would be great if whichever PR is chosen is a candidate for backporting. I can't really tell whether @mkrizek's #83400 or @ShawnHardwick's #83393 is the better candidate for that; I'm just mentioning it since it's important to us and affects all of our internal collection testing.

v2.17.1
=======

Minor Changes
-------------

- ansible-test - Update ``pypi-test-container`` to version 3.1.0.

Bugfixes
--------

- Fix rapid memory usage growth when notifying handlers using the ``listen`` keyword (ansible/ansible#83392)
- Fix the task attribute ``resolved_action`` to show the FQCN instead of ``None`` when ``action`` or ``local_action`` is used in the playbook.
- Fix using ``module_defaults`` with ``local_action``/``action`` (ansible/ansible#81905).
- fixed unit test test_borken_cowsay to address mock not been properly applied when existing unix system already have cowsay installed.
- powershell - Implement more robust deletion mechanism for C# code compilation temporary files. This should avoid scenarios where the underlying temporary directory may be temporarily locked by antivirus tools or other IO problems. A failure to delete one of these temporary directories will result in a warning rather than an outright failure.
- shell plugin - properly quote all needed components of shell commands (ansible/ansible#82535)

ansibot added the bug, needs_triage, and affects_2.18 labels Jun 6, 2024

ShawnHardwick mentioned this issue Jun 6, 2024

Fix issue where using listen parameter for handlers in collections could lead to memory issues #83393

Closed

ansibot added the has_pr label Jun 6, 2024

mattclay removed the needs_triage label Jun 6, 2024

mkrizek added a commit to mkrizek/ansible that referenced this issue Jun 10, 2024

Validate and process Handler.listen only once

0ab4dfa

Fixes ansible#83392

mkrizek mentioned this issue Jun 10, 2024

Validate and process Handler.listen only once #83400

Merged

sivel closed this as completed in #83400 Jun 10, 2024

sivel pushed a commit that referenced this issue Jun 10, 2024

Validate and process Handler.listen only once (#83400)

cbbf068

Fixes #83392

mkrizek added a commit to mkrizek/ansible that referenced this issue Jun 10, 2024

Validate and process Handler.listen only once (ansible#83400)

db1689a

Fixes ansible#83392 (cherry picked from commit cbbf068)

sivel pushed a commit that referenced this issue Jun 10, 2024

Validate and process Handler.listen only once (#83400) (#83405)

b15c684

Fixes #83392 (cherry picked from commit cbbf068)

ansible locked and limited conversation to collaborators Jun 24, 2024
