Abort/fail if host is unreachable #18782

EthanStrider · 2016-12-06T21:05:47Z

ISSUE TYPE

Feature Idea

COMPONENT NAME

Abort/fail if host is unreachable

ANSIBLE VERSION

ansible 2.2.0.0
  config file =
  configured module search path = Default w/o overrides

CONFIGURATION

N/A

OS / ENVIRONMENT

N/A

SUMMARY

Ansible does not seem to treat an unreachable host as a failure. In some cases (especially tasks where Ansible is involved in monitoring), I need any unreachable host to make the run exit with a failure immediately, instead of continuing on to other hosts.

It looks like I'm not the only one either:
http://stackoverflow.com/questions/25930503/aborting-ansible-playbook-if-a-host-is-unreachable
http://stackoverflow.com/questions/31221165/ansible-abort-execution-if-a-host-is-unreachable
http://stackoverflow.com/questions/39657431/ansible-unreachable-instead-failed

FWIW, none of the suggestions in the posts above worked for me.

STEPS TO REPRODUCE

The feature would hopefully be as simple as a boolean that a user can toggle, e.g. 'fail_on_unreachable: true'

EXPECTED RESULTS

I expected some method for failing/aborting on unreachable hosts.

ACTUAL RESULTS

Couldn't find a good way to do it.


krzysztof-magosa · 2016-12-08T18:31:24Z

Maybe any_errors_fatal could help?

linsomniac · 2017-04-18T17:22:06Z

@krzysztof-magosa: Unfortunately, unreachable doesn't seem to be considered fatal in a way that "any_errors_fatal" or "max_fail_percentage" will detect. Even when combined with "gather_facts: false" and running a ping action on the host, the unreachable host shows up in the output with a "fatal" prefix but still doesn't trigger termination of the play with any combination of "any_errors_fatal" or "max_fail_percentage" I've tried. According to one of the Stack Overflow discussions I've read, this behavior changed in 2.1; it worked before that.

jimi-c · 2017-06-16T06:40:26Z

Hi @linsomniac, have you tested recently to see if the issue has been corrected?

bcoca · 2017-07-27T19:05:37Z

This checks that the hosts currently in the play are the same as the hosts initially selected for the play. It detects unreachable hosts, but also 'failed' hosts.

    - assert:
        that:
            - ansible_play_hosts == ansible_play_hosts_all

Another way is to check the registered variable for an 'unreachable' key (via hostvars or by clearing host errors).

    - fail: msg='host {{item}} was unreachable'
      when: "'unreachable' in hostvars[item]['registeredvar']"
      with_items: "{{ ansible_play_hosts_all }}"
      run_once: True
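
A minimal sketch of how the second check could be wired into a play. It assumes a `ping` task that registers its result under the hypothetical name `reach_check`, and it assumes, as described above, that the result stored for an unreachable host carries an `unreachable` key. The play layout, the variable name, and the use of `any_errors_fatal` to stop the remaining hosts are illustrative assumptions, not a confirmed recipe for any particular Ansible version.

```yaml
- hosts: all
  gather_facts: false          # avoid losing hosts during fact gathering
  any_errors_fatal: true       # assumption: lets the fail task below stop every host
  tasks:
    - name: Probe each host and record the result
      ping:
      register: reach_check    # hypothetical variable name

    - name: Fail if any host's recorded result is marked unreachable
      fail:
        msg: "host {{ item }} was unreachable"
      # default({}) guards against hosts that have no recorded result at all
      when: "'unreachable' in hostvars[item]['reach_check'] | default({})"
      with_items: "{{ ansible_play_hosts_all }}"
      run_once: true
```

The check loops over `ansible_play_hosts_all` rather than the current hosts so that hosts already dropped from the play are still inspected.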

bcoca · 2017-07-27T19:07:09Z

@jimi-c As of 2.4, any_errors_fatal only looks at failed tasks; it still does not consider unreachable hosts 'failed'.

    - hosts: localhost,  unreachable_host
      gather_facts: false
      any_errors_fatal: yes
      tasks:
        - debug: msg='changed'
        - ping:
        - debug: msg='changed'

results in

PLAY [localhost, unreachable_host] ********************************************************************************************************************************************

TASK [debug] ******************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": "changed"
}
ok: [unreachable_host] => {
    "msg": "changed"
}

TASK [ping] *******************************************************************************************************************************************************************
ok: [localhost]
fatal: [unreachable_host]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname unreachable_host: Name or service not known\r\n", "unreachable": true}

TASK [debug] ******************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": "changed"
}

PLAY RECAP ********************************************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0
unreachable_host           : ok=1    changed=0    unreachable=1    failed=0

Playbook run took 0 days, 0 hours, 0 minutes, 0 seconds

briceburg · 2017-08-01T15:40:30Z

@bcoca We have a deployment playbook against a set of static hosts. If one is unreachable, we need to fail (to avoid potential inconsistency among the hosts). We're on Ansible 2.3. Do you recommend adding

    - assert:
        that:
            - ansible_play_hosts == ansible_play_hosts_all

to each playbook that demands consensus among hosts? The second suggestion,

    - fail: msg='host {{item}} was unreachable'
      when: "'unreachable' in hostvars[item]['registeredvar']"
      with_items: "{{ ansible_play_hosts_all }}"
      run_once: True

seems like it could break in future versions if the string or variable name changes.

awiddersheim · 2017-12-07T14:38:53Z

I recently hit this issue, I believe in 2.4. We rotate nodes out of our load balancer and wait for connections to drain using wait_for. However, during the wait polling, the connection became UNREACHABLE and the playbook moved on to the next host despite any_errors_fatal being turned on. This left two nodes out of rotation at the same time, which isn't good.

TASK [somerole : wait for connections to drain] ********************************
fatal: [foo]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to foo closed.\r\n", "unreachable": true}

NO MORE HOSTS LEFT *************************************************************

PLAY [app] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [bar]

maurociancio · 2018-09-14T12:38:11Z

Is there any workaround for this issue?
The problem I'm having is that a host becomes unreachable in the MIDDLE of the playbook for some unknown reason (maybe a network glitch), and the playbook does not abort. Ansible has already run commands on this host, so checking Ansible facts for the host doesn't help, because the facts had already been gathered. I'm doing a rolling deploy, and Ansible continues with other hosts after a host has become unreachable.
I'd like Ansible to abort immediately after an unreachable host was found.
Would you please post your workarounds?
Thanks!

seb54000 · 2018-10-07T11:24:27Z

Hello there, I'd like to share what is working for me:

    - name: Check SSH hosts reachability.
      hosts: all
      tasks:
        - name: Simple command (ping).
          ping:
        - name: Check if ansible_play_hosts == ansible_play_hosts_all (means UNREACHABLE hosts detected)
          run_once: True
          assert:
            that:
              - ansible_play_hosts == ansible_play_hosts_all

Results of the play

[13:21:40] Simple command (ping). | teclair-seb-master-1 | SUCCESS | 654ms
teclair-seb-etcd-2 | SUCCESS | 695ms
teclair-seb-ingress-1 | SUCCESS | 739ms
teclair-seb-etcd-1 | SUCCESS | 785ms
teclair-seb-etcd-3 | SUCCESS | 836ms
teclair-seb-worker-1 | SUCCESS | 893ms
teclair-seb-bastion | SUCCESS | 967ms
teclair-seb-worker-2 | UNREACHABLE!: SSH Error: data could not be sent to remote host "172.50.0.103". Make sure this host can be reached over ssh
[13:21:44] Check if ansible_play_hosts == ansible_play_hosts_all (means UNREACHABLE hosts detected) | teclair-seb-bastion | FAILED | 124ms
{
  - changed: False
  - assertion: ansible_play_hosts == ansible_play_hosts_all
  - evaluated_to: False
}
        to retry, use: --limit @/home/d824277/catalyse-core/install/platforms/site.retry
[13:21:44] system | -- Play recap --
teclair-seb-bastion        : ok=1    changed=0    unreachable=0    failed=1
teclair-seb-etcd-1         : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-etcd-2         : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-etcd-3         : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-ingress-1      : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-master-1       : ok=3    changed=0    unreachable=0    failed=0
teclair-seb-worker-1       : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-worker-2       : ok=0    changed=0    unreachable=1    failed=0
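
A variant of the play above that also names the hosts that dropped out, sketched under a few assumptions: it swaps the `assert` for a `fail` task with a message, uses the `difference` filter to compare the two host lists, and sets `gather_facts: false` so the `ping` task is the first thing to touch each host. Like the original, it reports failed hosts as well as unreachable ones, since both are removed from `ansible_play_hosts`.

```yaml
- name: Check SSH host reachability and report missing hosts
  hosts: all
  gather_facts: false   # assumption: skip fact gathering so ping is the first contact
  tasks:
    - name: Simple command (ping)
      ping:

    - name: Fail once, naming every host that is no longer in the play
      run_once: true
      fail:
        msg: "Hosts no longer in play: {{ ansible_play_hosts_all | difference(ansible_play_hosts) | join(', ') }}"
      # Only fail when at least one initially selected host has dropped out
      when: ansible_play_hosts_all | difference(ansible_play_hosts) | length > 0
```

In the run shown above, the message would be expected to list teclair-seb-worker-2, the one host that went unreachable.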

vrevelas · 2019-12-07T23:35:35Z

Reading #15523, it looks like aborting on unreachable hosts was how Ansible behaved in v1. It was then changed to ignore unreachable hosts in v2, reverted to failing in 2.2, and then changed to ignore again in 2.4.

Although the suggested workaround of asserting ansible_play_hosts == ansible_play_hosts_all works for a single task, it's not practical for playbooks that contain many tasks and require fail-fast behaviour on unreachable hosts throughout.

Given that Ansible's behaviour has gone back and forth between failing and ignoring unreachable hosts a few times, perhaps making the behaviour configurable would be a reasonable approach? Perhaps a play/ansible.cfg/environment variable config item like any_unreachable_fatal that could be used alongside any_errors_fatal?

arvisha16 · 2022-02-24T16:57:01Z

Hello team,

I need quick help!!!

We are on Ansible 2.9. Any idea whether this problem has been fixed in 2.9?

https://docs.ansible.com/ansible/latest/index.html

Please let me know.

Thanks,
Arvind

ebastos · 2023-02-18T18:23:37Z

I think this is covered by the max_fail_percentage feature now.
See the error handling documentation.
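
For reference, a minimal sketch of a play that sets `max_fail_percentage`. The group name `webservers` and the `serial` value are hypothetical. Whether an unreachable host counts toward the threshold is not confirmed anywhere in this thread: issue #56265, which mentions this one, is titled "When host becomes unreachable any_errors_fatal aborts the play but max_fail_percentage=0 continues", so the behaviour likely depends on the Ansible version and should be tested before relying on it.

```yaml
- hosts: webservers          # hypothetical group name
  gather_facts: false
  serial: 1                  # rolling update, one host per batch (assumption)
  max_fail_percentage: 0     # abort once more than 0% of the batch has failed
  tasks:
    - name: Confirm the host is reachable before doing any work
      ping:
```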

ansibot · 2023-08-08T11:26:09Z

Thank you very much for your submission to Ansible. It means a lot to us that you've taken time to contribute.

Unfortunately, this issue has been open for some time while waiting for a contributor to take it up but there does not seem to have been anyone that did so. So we are going to close this issue to clear up the queues and make it easier for contributors to browse possible implementation targets.

However, we're absolutely always up for discussion. Because this project is very active, we're unlikely to see comments made on closed tickets and we lock them after some time. If you or anyone else has any further questions, please let us know by using any of the communication methods listed in the page below:

https://docs.ansible.com/ansible/latest/community/communication.html

In the future, sometimes starting a discussion on the development list prior to proposing or implementing a feature can make getting things included a little easier, but it's not always necessary.

Thank you once again for this and your interest in Ansible!


ansibot added the affects_2.2, feature_idea, and module labels and removed the module and plugin labels Dec 13, 2016

ansibot added the support:core label Jun 29, 2017

ansibot added the feature label and removed the feature_idea label Mar 2, 2018

jautz mentioned this issue May 9, 2019 in "When host becomes unreachable any_errors_fatal aborts the play but max_fail_percentage=0 continues" #56265 (open)

ansibot added the needs_triage label May 17, 2020

mkrizek removed the needs_triage label May 18, 2020

bcoca added the affects_2.12 label May 14, 2021

s-hertel added the waiting_on_contributor label Aug 3, 2022

ansibot added the bot_closed label Aug 8, 2023

ansibot removed the waiting_on_contributor label Aug 8, 2023

ansibot closed this as not planned Aug 8, 2023

ansible locked and limited conversation to collaborators Aug 22, 2023
