failure in run_once task while using strategy free leads to all other hosts finishing immediately #80737

GlysVenture · 2023-05-08T08:57:27Z

Summary

When I use the run_once keyword with strategy: free and the task carrying the keyword fails, the keyword is not ignored. Instead, all other running hosts finish their current task and the play terminates.

This is especially problematic when including roles written by others: run_once may be set somewhere you are not aware of, and it can break your plays if you use the free strategy.

Issue Type

Bug Report

Component Name

run_once

Ansible Version

$ ansible --version
ansible [core 2.14.5]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /home/user/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/user/.local/bin/ansible
  python version = 3.11.3 (main, Apr  5 2023, 15:52:25) [GCC 12.2.1 20230201] (/usr/bin/python)
  jinja version = 3.1.2
  libyaml = True

Configuration

# if using a version older than ansible-core 2.12 you should omit the '-t all'
$ ansible-config dump --only-changed -t all
CONFIG_FILE() = /etc/ansible/ansible.cfg

OS / Environment

Archcraft x86_64 - kernel 6.3.1-arch1-1 (Arch)

But I also experienced the same issue using
Debian 5.10.103-1

Steps to Reproduce

playbook.yml

- name: Test
  hosts: all
  gather_facts: false
  strategy: free
  tasks:
    - name: fail if success_run_once is false
      fail:
      run_once: true
      when: success_run_once == false

    - name: wait some time
      ansible.builtin.wait_for:
        timeout: 10
      delegate_to: localhost

    - name: simulate say hi
      debug:
        msg: "hi from {{ inventory_hostname }}"

    - name: simulate do fail
      fail:
      when: inventory_hostname == 'host1'

    - name: say hi
      debug:
        msg: "{{ inventory_hostname }} just finished its tasks"

inventory.yml

all:
  hosts:
    host1:
      ansible_hostname: 127.0.0.1
      success_run_once: true
    host2:
      ansible_hostname: 127.0.0.1
      success_run_once: true
    host3:
      ansible_hostname: 127.0.0.1
      success_run_once: false

Run ansible-playbook:
ansible-playbook -i inventory.yml playbook.yml

Expected Results

I expected run_once to not have any effect on the play and tasks, as if it wasn't there.

Sample expected output obtained by removing the run_once keyword:

PLAY [Test] *************************************************************************************************************************************************************************************************

TASK [fail if success_run_once is false] ********************************************************************************************************************************************************************
skipping: [host1]
skipping: [host2]
fatal: [host3]: FAILED! => {"changed": false, "msg": "Failed as requested from task"}

TASK [wait some time] ***************************************************************************************************************************************************************************************
ok: [host1 -> localhost]
ok: [host2 -> localhost]

TASK [simulate say hi] **************************************************************************************************************************************************************************************
ok: [host1] => {
    "msg": "hi from host1"
}
ok: [host2] => {
    "msg": "hi from host2"
}

TASK [simulate do fail] *************************************************************************************************************************************************************************************
fatal: [host1]: FAILED! => {"changed": false, "msg": "Failed as requested from task"}
skipping: [host2]

TASK [say hi] ***********************************************************************************************************************************************************************************************
ok: [host2] => {
    "msg": "host2 just finished its tasks"
}

PLAY RECAP **************************************************************************************************************************************************************************************************
host1                      : ok=2    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0   
host2                      : ok=3    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0   
host3                      : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Actual Results

After one of the hosts fails in the run_once task, all the other hosts finish their current task and the play ends by reporting stats, with no indication that anything went wrong on the other hosts (apart from the lower total of tasks run).


PLAY [Test] *************************************************************************************************************************************************************************************************
[WARNING]: Using run_once with the free strategy is not currently supported. This task will still be executed for every host in the inventory list.

TASK [fail if success_run_once is false] ********************************************************************************************************************************************************************
skipping: [host1]
skipping: [host2]
fatal: [host3]: FAILED! => {"changed": false, "msg": "Failed as requested from task"}

TASK [wait some time] ***************************************************************************************************************************************************************************************
ok: [host2 -> localhost]
ok: [host1 -> localhost]

PLAY RECAP **************************************************************************************************************************************************************************************************
host1                      : ok=1    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
host2                      : ok=1    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
host3                      : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Code of Conduct

I agree to follow the Ansible Code of Conduct

ansibot · 2023-05-08T09:01:46Z

Files identified in the description:

test/integration/targets/include_import/run_once/include_me.yml

If these files are incorrect, please update the component name section of the description or use the !component bot command.

bcoca · 2023-05-08T13:25:28Z

run_once implies 'apply result to all hosts', which includes the task failure.

GlysVenture · 2023-05-08T17:14:41Z

Ok, I can understand the logic, and it is written in the documentation. But the warning led me to believe that run_once was not implemented for strategy free and that the keyword was being ignored.
Seems wording is important, my bad.
I do think it should be ignored if it's not "supported", to prevent this kind of silent "failure".

bcoca · 2023-05-08T17:17:55Z

Leaving this open to verify the behavior and to reconsider whether this is a bug in the 'partially working' case. The part of run_once that the strategy does not support is limiting execution to a single host, so it should either create a sync point or not apply the first host's result to the rest, which can give weird results even on success.

mkrizek · 2023-05-09T08:21:41Z

Whenever the base strategy checks run_once, the check needs to be guarded so that it applies only to strategies that actually support it. This fixes the issue:

diff --git a/lib/ansible/plugins/strategy/__init__.py b/lib/ansible/plugins/strategy/__init__.py
index edab7aed0b4..8815769867d 100644
--- a/lib/ansible/plugins/strategy/__init__.py
+++ b/lib/ansible/plugins/strategy/__init__.py
@@ -590,7 +590,7 @@ class StrategyBase:
                     # save the current state before failing it for later inspection
                     state_when_failed = iterator.get_state_for_host(original_host.name)
                     display.debug("marking %s as failed" % original_host.name)
-                    if original_task.run_once:
+                    if original_task.run_once and iterator._play.strategy in add_internal_fqcns(('linear',)):
                         # if we're using run_once, we have to fail every host here
                         for h in self._inventory.get_hosts(iterator._play.hosts):
                             if h.name not in self._tqm._unreachable_hosts:

We already do the same in the task debugger:

ansible/lib/ansible/plugins/strategy/__init__.py, line 194 in 4b0d014:

    if task.run_once and iterator._play.strategy in add_internal_fqcns(('linear',)) and result.is_failed():

That said, more of a "system fix" would be better; related: #73483.

bcoca · 2023-05-09T13:40:40Z

@mkrizek something like adding a '_supports_run_once = False' property to the base class and making it true for linear?
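A minimal sketch of that idea, for illustration only. The class names (`StrategySketch`, `LinearSketch`, `FreeSketch`) and the helper `hosts_to_fail` are hypothetical stand-ins and are not Ansible's actual classes or methods; only the `_supports_run_once` flag name comes from the comment above. The point is that the base class defaults the flag to false and a strategy has to opt in before a run_once failure is applied to every host.

```python
class StrategySketch:
    """Hypothetical stand-in for a strategy base class (not Ansible's real one)."""

    # Capability flag proposed in the thread: off unless a strategy opts in.
    _supports_run_once = False

    def hosts_to_fail(self, run_once, failed_host, play_hosts, unreachable):
        """Return the hosts to mark as failed after a task fails on failed_host."""
        if run_once and self._supports_run_once:
            # run_once applies the result to every reachable host in the play.
            return [h for h in play_hosts if h not in unreachable]
        # Otherwise the failure stays with the host that actually failed.
        return [failed_host]


class LinearSketch(StrategySketch):
    _supports_run_once = True


class FreeSketch(StrategySketch):
    pass  # inherits _supports_run_once = False


hosts = ["host1", "host2", "host3"]
print(LinearSketch().hosts_to_fail(True, "host3", hosts, set()))
print(FreeSketch().hosts_to_fail(True, "host3", hosts, set()))
```

With a flag like this, each place that currently tests `run_once` could consult a single attribute instead of repeating a strategy-name comparison.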

jborean93 · 2024-01-10T19:40:21Z

I think we need to decide on what the desired behaviour is for this particular scenario. If this is the desired behaviour today it should be at least documented.

bcoca · 2024-01-10T21:23:10Z

I see a few options for run_once:

it is 'totally' ignored when 'unsupported', emitting a warning or error
it becomes a sync point and functions 'normally'
it stays mostly as is: it can set facts for all hosts, but not status
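To make the difference between these three options concrete, here is a small toy model. It is one interpretation of the options, not Ansible code: `RunOncePolicy` and `apply_run_once_result` are hypothetical names, and the model only tracks which hosts end up failed and which hosts receive the facts produced by the first host's run.

```python
from enum import Enum


class RunOncePolicy(Enum):
    IGNORE = "ignore"          # option 1: run_once is dropped entirely
    SYNC_POINT = "sync_point"  # option 2: behaves as under a supporting strategy
    FACTS_ONLY = "facts_only"  # option 3: facts are shared, status is not


def apply_run_once_result(policy, first_host, hosts, failed, facts):
    """Return (failed_hosts, facts_by_host) for a run_once task result."""
    own_failure = {first_host} if failed else set()
    if policy is RunOncePolicy.IGNORE:
        # The result only affects the host that produced it.
        return own_failure, {first_host: dict(facts)}
    shared = {h: dict(facts) for h in hosts}
    if policy is RunOncePolicy.SYNC_POINT:
        # Both facts and status are applied to every host.
        return (set(hosts) if failed else set()), shared
    # FACTS_ONLY: every host gets the facts, only the first host gets the status.
    return own_failure, shared


hosts = ["host1", "host2", "host3"]
for policy in RunOncePolicy:
    print(policy.name, apply_run_once_result(policy, "host3", hosts, True, {"x": 1}))
```

In this model only the sync-point option lets a single failure fail every host, which is the behaviour reported in this issue under strategy: free.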

ansibot added the affects_2.14, bug, and needs_triage labels May 8, 2023

GlysVenture closed this as not planned May 8, 2023

bcoca reopened this May 8, 2023

s-hertel removed the needs_triage label May 9, 2023

jborean93 added the P3 (Priority 3 - Approved, No Time Limitation) label Jan 10, 2024
