Abort/fail if host is unreachable #18782

Closed
EthanStrider opened this issue Dec 6, 2016 · 13 comments
Labels
  • affects_2.2: This issue/PR affects Ansible v2.2
  • affects_2.12
  • bot_closed
  • feature: This issue/PR relates to a feature request.
  • support:core: This issue/PR relates to code supported by the Ansible Engineering Team.

Comments

@EthanStrider

ISSUE TYPE
  • Feature Idea
COMPONENT NAME

Abort/fail if host is unreachable

ANSIBLE VERSION
ansible 2.2.0.0
  config file =
  configured module search path = Default w/o overrides
CONFIGURATION

N/A

OS / ENVIRONMENT

N/A

SUMMARY

Ansible does not seem to treat an unreachable host as a failure. In some cases (especially tasks where Ansible is involved in monitoring), I need any unreachable host to make the run exit with a failure immediately, instead of continuing on to other hosts.

It looks like I'm not the only one either:
http://stackoverflow.com/questions/25930503/aborting-ansible-playbook-if-a-host-is-unreachable
http://stackoverflow.com/questions/31221165/ansible-abort-execution-if-a-host-is-unreachable
http://stackoverflow.com/questions/39657431/ansible-unreachable-instead-failed

FWIW, none of the suggestions in the posts above worked for me.

STEPS TO REPRODUCE

The feature would hopefully be as simple as a boolean that a user can toggle, e.g. 'fail_on_unreachable: true'

EXPECTED RESULTS

I expected some method for failing/aborting on unreachable hosts.

ACTUAL RESULTS

Couldn't find a good way to do it.

@krzysztof-magosa
Contributor

Maybe any_errors_fatal could help?

@ansibot ansibot added affects_2.2 This issue/PR affects Ansible v2.2 feature_idea module This issue/PR relates to a module. and removed module This issue/PR relates to a module. plugin labels Dec 13, 2016
@linsomniac
Contributor

@krzysztof-magosa: Unfortunately, unreachable doesn't seem to be considered fatal in a way that "any_errors_fatal" or "max_failure_percentage" will detect. Even when combined with "gather_facts: false" and running a ping action on the host, it shows up in the output with a "fatal" prefix, but it still doesn't trigger termination of the play with any combination of "any_errors_fatal" or "max_failure_percentage" I've tried. According to one of the Stack Overflow discussions I've read, this behavior changed in 2.1; it worked before that.

@jimi-c
Member

jimi-c commented Jun 16, 2017

Hi @linsomniac, have you tested recently to see if the issue has been corrected?

@ansibot ansibot added the support:core This issue/PR relates to code supported by the Ansible Engineering Team. label Jun 29, 2017
@bcoca
Member

bcoca commented Jul 27, 2017

This makes sure that 'current hosts in play' are the same as the initial hosts selected for the play, will detect unreachable, but also 'failed' hosts.

    - assert:
        that:
            - ansible_play_hosts == ansible_play_hosts_all

another way is checking the registered var value for 'unreachable' key (via hostvars or clearing host errors).

- fail: msg='host {{item}} was unreachable'
  when: "'unreachable' in hostvars[item]['registeredvar']"
  with_items: "{{ ansible_play_hosts_all }}"
  run_once: True
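
A fuller sketch of the registered-variable check above, assuming the probe task is `ping` and its result is registered as `ping_result` (a hypothetical name standing in for `registeredvar`). It relies on the behaviour described in this comment, that an unreachable result is still stored with an `unreachable` key, so it is worth verifying on the Ansible version in use:

```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Probe every host and keep the result
      ping:
      register: ping_result   # hypothetical variable name

    - name: Fail if any targeted host recorded an unreachable result
      run_once: true
      fail:
        msg: "host {{ item }} was unreachable"
      # default({}) avoids an undefined-variable error for hosts
      # that never got a registered result
      when: "'unreachable' in (hostvars[item]['ping_result'] | default({}))"
      with_items: "{{ ansible_play_hosts_all }}"
```

The loop runs over `ansible_play_hosts_all` rather than `ansible_play_hosts` because, as noted above, the latter no longer contains hosts that have dropped out of the play.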

@bcoca
Member

bcoca commented Jul 27, 2017

@jimi-c as of 2.4, any_errors_fatal only looks at failed tasks, still does not consider unreachable 'failed'

- hosts: localhost, unreachable_host
  gather_facts: false
  any_errors_fatal: yes
  tasks:
    - debug: msg='changed'
    - ping:
    - debug: msg='changed'

results in

PLAY [localhost, unreachable_host] ********************************************************************************************************************************************

TASK [debug] ******************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": "changed"
}
ok: [unreachable_host] => {
    "msg": "changed"
}

TASK [ping] *******************************************************************************************************************************************************************
ok: [localhost]
fatal: [unreachable_host]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname unreachable_host: Name or service not known\r\n", "unreachable": true}

TASK [debug] ******************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": "changed"
}

PLAY RECAP ********************************************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0
unreachable_host           : ok=1    changed=0    unreachable=1    failed=0

Playbook run took 0 days, 0 hours, 0 minutes, 0 seconds

@briceburg
Contributor

briceburg commented Aug 1, 2017

@bcoca we have a deployment playbook against a set of static hosts. if one is unreachable, we need to fail (to avoid the potential of having inconsistency amongst the hosts). we're on ansible 2.3 -- do you recommend adding

    - assert:
        that:
            - ansible_play_hosts == ansible_play_hosts_all

to each playbook that demands consensus amongst hosts? The second suggestion,

- fail: msg='host {{item}} was unreachable'
  when: "'unreachable' in hostvars[item]['registeredvar']"
  with_items: "{{ ansible_play_hosts_all }}"
  run_once: True

seems like it could break in future versions if string/varname changes.

@awiddersheim
Contributor

I recently hit this issue, I believe in 2.4. We rotate nodes out of our load balancer and wait for connections to drain using wait_for. However, during the wait polling the connection became UNREACHABLE, and the run simply moved on to the next host despite any_errors_fatal being turned on. This resulted in 2 nodes being out of rotation at the same time, which isn't good.

TASK [somerole : wait for connections to drain] ********************************
fatal: [foo]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to foo closed.\r\n", "unreachable": true}

NO MORE HOSTS LEFT *************************************************************

PLAY [app] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [bar]

@ansibot ansibot added feature This issue/PR relates to a feature request. and removed feature_idea labels Mar 2, 2018
@maurociancio

Is there any workaround for this issue?
The problem I'm having is that a host goes unreachable in the MIDDLE of the playbook for some unknown reason (maybe a network-related glitch) and the playbook does not abort. Ansible has already run commands on this host, so checking Ansible facts for the host cannot help me here, because the facts had already been gathered. I'm doing a rolling deploy, and Ansible continues with other hosts when a host was unreachable.
I'd like Ansible to abort immediately after an unreachable host was found.
Would you please post your workarounds?
Thanks!

@seb54000

seb54000 commented Oct 7, 2018

Hello there, I'd like to share what is working for me

- name: Check SSH hosts reachability.
  hosts: all
  tasks:
    - name: Simple command (ping).
      ping:
    - name: Check if ansible_play_hosts == ansible_play_hosts_all (means UNREACHABLE hosts detected)
      run_once: True
      assert:
        that:
          - ansible_play_hosts == ansible_play_hosts_all

Results of the play

[13:21:40] Simple command (ping). | teclair-seb-master-1 | SUCCESS | 654ms
teclair-seb-etcd-2 | SUCCESS | 695ms
teclair-seb-ingress-1 | SUCCESS | 739ms
teclair-seb-etcd-1 | SUCCESS | 785ms
teclair-seb-etcd-3 | SUCCESS | 836ms
teclair-seb-worker-1 | SUCCESS | 893ms
teclair-seb-bastion | SUCCESS | 967ms
teclair-seb-worker-2 | UNREACHABLE!: SSH Error: data could not be sent to remote host "172.50.0.103". Make sure this host can be reached over ssh
[13:21:44] Check if ansible_play_hosts == ansible_play_hosts_all (means UNREACHABLE hosts detected) | teclair-seb-bastion | FAILED | 124ms
{
  - changed: False
  - assertion: ansible_play_hosts == ansible_play_hosts_all
  - evaluated_to: False
}
        to retry, use: --limit @/home/d824277/catalyse-core/install/platforms/site.retry
[13:21:44] system | -- Play recap --
teclair-seb-bastion        : ok=1    changed=0    unreachable=0    failed=1
teclair-seb-etcd-1         : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-etcd-2         : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-etcd-3         : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-ingress-1      : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-master-1       : ok=3    changed=0    unreachable=0    failed=0
teclair-seb-worker-1       : ok=1    changed=0    unreachable=0    failed=0
teclair-seb-worker-2       : ok=0    changed=0    unreachable=1    failed=0

@vrevelas

vrevelas commented Dec 7, 2019

Reading #15523, it looks like aborting on unreachable hosts used to be how Ansible behaved in v1. It was then changed to ignore unreachable hosts in v2, reverted to failing in 2.2, and then changed to ignore again in 2.4.

Although the suggested workaround of asserting ansible_play_hosts == ansible_play_hosts_all works for a single task, it's not practical for playbooks that contain many tasks and require fail-fast behaviour on unreachable hosts throughout.

Given that Ansible's behaviour has gone back and forth between failing and ignoring unreachable hosts a few times, perhaps making the behaviour configurable would be a reasonable approach? Perhaps a play/ansible.cfg/environment variable config item like any_unreachable_fatal that could be used alongside any_errors_fatal?
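
One way to reduce the repetition described above, sketched under assumptions: keep the check in a small task file and import it at each point where the play must not continue with a missing host. The file names, the `app` group, and the step tasks are hypothetical; `import_tasks` is available from Ansible 2.4. This only detects a dropped host at the next checkpoint, not at the moment it becomes unreachable.

```yaml
# tasks/assert_all_reachable.yml (hypothetical file name)
- name: Abort if any host dropped out of the play (failed or unreachable)
  run_once: true
  assert:
    that:
      # hosts targeted by the play minus hosts still active must be empty
      - ansible_play_hosts_all | difference(ansible_play_hosts) | length == 0
    msg: "Hosts no longer in play: {{ ansible_play_hosts_all | difference(ansible_play_hosts) | join(', ') }}"
---
# site.yml (hypothetical playbook): import the check after each critical step
- hosts: app
  gather_facts: false
  tasks:
    - name: First step
      ping:

    - import_tasks: tasks/assert_all_reachable.yml

    - name: Second step
      debug:
        msg: all targeted hosts are still in the play

    - import_tasks: tasks/assert_all_reachable.yml
```

Using `difference` instead of a plain equality test makes the check independent of list order and lets the message name the hosts that went missing.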

@ansibot ansibot added the needs_triage Needs a first human triage before being processed. label May 17, 2020
@mkrizek mkrizek removed the needs_triage Needs a first human triage before being processed. label May 18, 2020
@arvisha16

Hello team,

I need quick help!!!

We are on Ansible 2.9. Any idea whether this problem has been fixed in 2.9?

https://docs.ansible.com/ansible/latest/index.html

Please let me know.

Thanks,
Arvind

@s-hertel s-hertel added the waiting_on_contributor This would be accepted but there are no plans to actively work on it. label Aug 3, 2022
@ebastos

ebastos commented Feb 18, 2023

I think this is covered by the max_fail_percentage feature now.
See error handling.
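
A minimal sketch of a rolling play that uses these play keywords, assuming a hypothetical `app` group. `serial`, `max_fail_percentage`, and `any_errors_fatal` are existing play keywords, but whether an unreachable host counts toward them depends on the Ansible version; earlier comments in this thread report that it did not in 2.4, so test this against an unreachable host before relying on it.

```yaml
- hosts: app                # hypothetical group name
  serial: 1                 # rolling deploy, one host per batch
  max_fail_percentage: 0    # abort the play once any host in a batch fails
  any_errors_fatal: true
  gather_facts: false
  tasks:
    - name: Connectivity probe
      ping:

    - name: Work that should not run if a host has dropped out
      debug:
        msg: deploying
```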

@ansibot
Contributor

ansibot commented Aug 8, 2023

Thank you very much for your submission to Ansible. It means a lot to us that you've taken time to contribute.

Unfortunately, this issue has been open for some time while waiting for a contributor to take it up but there does not seem to have been anyone that did so. So we are going to close this issue to clear up the queues and make it easier for contributors to browse possible implementation targets.

However, we're absolutely always up for discussion. Because this project is very active, we're unlikely to see comments made on closed tickets and we lock them after some time. If you or anyone else has any further questions, please let us know by using any of the communication methods listed in the page below:

In the future, sometimes starting a discussion on the development list prior to proposing or implementing a feature can make getting things included a little easier, but it's not always necessary.

Thank you once again for this and your interest in Ansible!


@ansibot ansibot removed the waiting_on_contributor This would be accepted but there are no plans to actively work on it. label Aug 8, 2023
@ansibot ansibot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 8, 2023
@ansible ansible locked and limited conversation to collaborators Aug 22, 2023