
any_errors_fatal doesn't work when a remote host is unreachable in the given batch #82834

Open
1 task done
kurzandras opened this issue Mar 18, 2024 · 6 comments · May be fixed by #82852
Assignees
Labels
affects_2.12 affects_2.16 bug This issue/PR relates to a bug. has_pr This issue has an associated PR. P3 Priority 3 - Approved, No Time Limitation verified This issue has been verified/reproduced by maintainer

Comments

@kurzandras

Summary

It seems that any_errors_fatal does not always work as expected. If there is an unreachable host in the current batch, any_errors_fatal stops working entirely:

PLAY [Testing] ***********************************************************************************************************************************************************************************************************

TASK [Testing] ***********************************************************************************************************************************************************************************************************
Monday 18 March 2024  14:54:08 +0100 (0:00:00.036)       0:00:00.036 ********** 
Monday 18 March 2024  14:54:08 +0100 (0:00:00.035)       0:00:00.035 ********** 
changed: [testhost2]
fatal: [testhost1]: FAILED! => {"changed": true, "cmd": ["/test/check_users", "30", "40"], "delta": "0:00:00.031467", "end": "2024-03-18 14:54:13.072137", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2024-03-18 14:54:13.040670", "stderr": "", "stderr_lines": [], "stdout": "Warning: 33 user(s) currently logged in", "stdout_lines": ["Warning: 33 user(s) currently logged in"]}
changed: [testhost3]
changed: [testhost4]
fatal: [testhost5]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 111.111.111.111 port 22: Connection timed out", "unreachable": true}

NO MORE HOSTS LEFT *******************************************************************************************************************************************************************************************************

PLAY [Testing] ***********************************************************************************************************************************************************************************************************

TASK [Testing] ***********************************************************************************************************************************************************************************************************
Monday 18 March 2024  14:54:13 +0100 (0:00:05.093)       0:00:05.129 ********** 
Monday 18 March 2024  14:54:13 +0100 (0:00:05.093)       0:00:05.128 ********** 
fatal: [testhost6]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname 222.222.222.222\\305\\261: Name or service not known", "unreachable": true}
changed: [testhost7]
changed: [testhost8]
changed: [testhost9]
fatal: [testhost10]: FAILED! => {"changed": true, "cmd": ["/test/check_users", "30", "40"], "delta": "0:00:00.033527", "end": "2024-03-18 14:54:14.477162", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2024-03-18 14:54:14.443635", "stderr": "", "stderr_lines": [], "stdout": "Warning: 33 user(s) currently logged in", "stdout_lines": ["Warning: 33 user(s) currently logged in"]}
changed: [testhost11]
changed: [testhost12]
changed: [testhost13]

It should have stopped the whole playbook because the task failed on testhost1.

Issue Type

Bug Report

Component Name

any_errors_fatal

Ansible Version

ansible [core 2.12.10]
  config file = /home/ak3/projects/asd-controller/ansible.cfg
  configured module search path = ['/home/ak3/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /home/ak3/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]
  jinja version = 3.0.3
  libyaml = True

Configuration

ANSIBLE_PIPELINING(/home/ak3/projects/asd-controller/ansible.cfg) = False
CALLBACKS_ENABLED(/home/ak3/projects/asd-controller/ansible.cfg) = ['timer', 'profile_tasks', 'profile_roles']
DEFAULT_ASK_VAULT_PASS(/home/ak3/projects/asd-controller/ansible.cfg) = False
DEFAULT_FORKS(/home/ak3/projects/asd-controller/ansible.cfg) = 50
DEFAULT_GATHER_TIMEOUT(/home/ak3/projects/asd-controller/ansible.cfg) = 5
DEFAULT_TIMEOUT(/home/ak3/projects/asd-controller/ansible.cfg) = 5
DEFAULT_VAULT_PASSWORD_FILE(/home/ak3/projects/asd-controller/ansible.cfg) = /home/ak3/projects/asd-controller/.ansible_vault_password
DEPRECATION_WARNINGS(/home/ak3/projects/asd-controller/ansible.cfg) = False
HOST_KEY_CHECKING(/home/ak3/projects/asd-controller/ansible.cfg) = False
INTERPRETER_PYTHON(/home/ak3/projects/asd-controller/ansible.cfg) = /usr/bin/python
INVENTORY_ENABLED(/home/ak3/projects/asd-controller/ansible.cfg) = ['host_list', 'script', 'auto', 'yaml', 'ini', 'toml']
RETRY_FILES_ENABLED(/home/ak3/projects/asd-controller/ansible.cfg) = True
RETRY_FILES_SAVE_PATH(/home/ak3/projects/asd-controller/ansible.cfg) = /home/ak3/projects/asd-controller

BECOME:
======

CACHE:
=====

CALLBACK:
========

CLICONF:
=======

CONNECTION:
==========

paramiko_ssh:
____________
host_key_checking(/home/ak3/projects/asd-controller/ansible.cfg) = False
ssh_args(/home/ak3/projects/asd-controller/ansible.cfg) = -C -o ServerAliveInterval=30 -o ControlMaster=auto -o ControlPersist=60s

ssh:
___
host_key_checking(/home/ak3/projects/asd-controller/ansible.cfg) = False
pipelining(/home/ak3/projects/asd-controller/ansible.cfg) = False
ssh_args(/home/ak3/projects/asd-controller/ansible.cfg) = -C -o ServerAliveInterval=30 -o ControlMaster=auto -o ControlPersist=60s
timeout(/home/ak3/projects/asd-controller/ansible.cfg) = 5

HTTPAPI:
=======

INVENTORY:
=========

LOOKUP:
======

NETCONF:
=======

SHELL:
=====

VARS:
====

OS / Environment

Ubuntu 22.04.2 LTS
WSL1

Steps to Reproduce

---
- name: Testing
  hosts: all
  any_errors_fatal: true
  gather_facts: false
  become: true
  serial:
    - 5
    - 20
  remote_user: ansible
  tasks:
    - name: Testing
      ansible.builtin.command: /test/check_users 30 40
      changed_when: true
      register: command_result
      failed_when: command_result.rc != 0
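
To reproduce, the play above needs an inventory with more hosts than the first serial batch, including at least one unreachable host in that batch. A hypothetical example (all host names and addresses are placeholders, not the reporter's actual inventory):

```ini
; Hypothetical inventory; host names and IPs are placeholders.
[all]
testhost1 ansible_host=10.0.0.1
testhost2 ansible_host=10.0.0.2
testhost3 ansible_host=10.0.0.3
testhost4 ansible_host=10.0.0.4
testhost5 ansible_host=111.111.111.111 ; unreachable host in the first batch of 5
testhost6 ansible_host=222.222.222.222
testhost7 ansible_host=10.0.0.7
```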

Expected Results

I expect the playbook to stop execution for the current batch on failure, even when one or more hosts in the batch are unreachable.

Actual Results

The playbook execution did not stop even though a task failed during execution on the first 5 hosts: the task failed on testhost1 and testhost5 was unreachable, yet the play continued. If there are no unreachable hosts in the given batch, it behaves correctly and stops on failure.

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibot
Contributor

ansibot commented Mar 18, 2024

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the component bot command.

@ansibot ansibot added bug This issue/PR relates to a bug. needs_triage Needs a first human triage before being processed. affects_2.12 labels Mar 18, 2024
@ansibot
Contributor

ansibot commented Mar 18, 2024

@kurzandras ansible-core 2.12 is not supported and no longer receives bug fixes. Please test against one of the supported versions of ansible-core, preferably the most recent one, to see whether the bug has been fixed.

click here for bot help

@limaofu

limaofu commented Mar 19, 2024

@kurzandras ansible-core 2.12 is not supported and no longer receives bug fixes. Please test against one of the supported versions of ansible-core, preferably the most recent one, to see whether the bug has been fixed.

click here for bot help

what? ansible-core 2.12 is still in use

@flowerysong
Contributor

It might still be in use (people still use Ansible 1.9), but it's not supported and no longer receives bug or security fixes. 2.12 reached end of life 2023-05-22.

The oldest currently supported version is 2.14, which will itself reach EOL in a couple of months (2024-05-20).

@kurzandras
Author

Thank you for your answers! This is a big problem for me, since I cannot upgrade due to missing Python dependencies. Could you please at least suggest a workaround for this issue? Any help is appreciated; thank you very much in advance!

@s-hertel s-hertel added needs_verified This issue needs to be verified/reproduced by maintainer and removed needs_triage Needs a first human triage before being processed. labels Mar 19, 2024
@mkrizek
Contributor

mkrizek commented Mar 19, 2024

I believe this is a minimal reproducer:

- hosts: unreachable_host,host1,host2
  gather_facts: false
  any_errors_fatal: true
  serial:
    - 1
  tasks:
    - command: "false"

The first batch, which contains the unreachable host, should fail the whole play, but execution continues to the second batch (the reachable host1), where the failing task finally fails the whole play without continuing to host2.

I am working on a fix. The issue seems to be that for any_errors_fatal we fail all hosts, including the unreachable ones, so when we count failed hosts to see whether the whole batch failed, unreachable hosts are counted twice, once as failed and once as unreachable:

failed_hosts_count = len(self._tqm._failed_hosts) + len(self._tqm._unreachable_hosts) - \
    (previously_failed + previously_unreachable)
if len(batch) == failed_hosts_count:
    break_play = True
    break
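
The double counting can be illustrated with a standalone sketch. The sets below stand in for `self._tqm._failed_hosts` and `self._tqm._unreachable_hosts`; the names and values are illustrative, not actual strategy-plugin code:

```python
# Sketch of the counting bug: with any_errors_fatal, every host in the
# batch is marked failed, including the unreachable one, which also
# remains in the unreachable set.

batch = ["host1", "host2", "host3"]          # 3 hosts in the batch
failed_hosts = {"host1", "host2", "host3"}   # any_errors_fatal fails everyone
unreachable_hosts = {"host3"}                # host3 was also unreachable

previously_failed = 0
previously_unreachable = 0

failed_hosts_count = len(failed_hosts) + len(unreachable_hosts) - \
    (previously_failed + previously_unreachable)

# host3 is counted twice, so the count overshoots the batch size and the
# strict equality check in the strategy never sets break_play.
print(failed_hosts_count)                 # 4
print(len(batch) == failed_hosts_count)   # False -> play wrongly continues
```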

@mkrizek mkrizek added P3 Priority 3 - Approved, No Time Limitation verified This issue has been verified/reproduced by maintainer affects_2.16 and removed needs_verified This issue needs to be verified/reproduced by maintainer labels Mar 19, 2024
mkrizek added a commit to mkrizek/ansible that referenced this issue Mar 20, 2024
@mkrizek mkrizek linked a pull request Mar 20, 2024 that will close this issue
@ansibot ansibot added the has_pr This issue has an associated PR. label Mar 20, 2024
6 participants