
'any_errors_fatal' Broken For Unreachable Hosts v2 #15523

Closed
mattmonkey83 opened this issue Apr 21, 2016 · 11 comments
@mattmonkey83

ISSUE TYPE
  • Bug Report
ANSIBLE VERSION
ansible 2.0.2.0
CONFIGURATION

Default

OS / ENVIRONMENT

N/A

SUMMARY

We've been testing our playbooks on v2 and noticed that any_errors_fatal isn't respected when a host is unreachable.

The same playbooks aborted under v1.9.5 but proceed under v2.

STEPS TO REPRODUCE

Run this playbook, where test is a server whose SSH key is not loaded, to force a permission-denied error.

$ cat test.yml

---

- hosts: test
  any_errors_fatal: yes
  user: ansible
  gather_facts: false

  tasks:

  - ping:

- hosts: localhost
  connection: local
  gather_facts: false

  tasks:

  - debug:
      msg: 'Should not run'
EXPECTED RESULTS

Playbook should abort after failed ping task on test.

ACTUAL RESULTS

Playbook continues to next host group and completes debug task.

$ ansible-playbook test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-204-28-247.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}

PLAY [localhost] ***************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}
        to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-204-28-247.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=1    changed=0    unreachable=0    failed=0
@mattmonkey83 mattmonkey83 changed the title 'any_errors_fatal' Unreachable Ansible 2 'any_errors_fatal' Broken For Unreachable Hosts v2 Apr 22, 2016
@mattmonkey83 (Author)

Just noticed something while comparing v2 with v1.9.5.

If I run the playbook above against two hosts, one reachable and the other not, in v1.9.5 I get this -

$ ansible-playbook -i test_inv test.yml

PLAY [test] *******************************************************************

TASK: [ping ] *****************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com] => SSH Error: Permission denied (publickey).
    while connecting to 10.154.152.240:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/matthewmcdonagh/test.retry

ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
ec2-54-242-223-5.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0

If I re-run the playbook against the same hosts, but with gather_facts: true, in v1.9.5 I get this -

$ ansible-playbook -i test_inv test.yml

PLAY [test] *******************************************************************

GATHERING FACTS ***************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com] => SSH Error: Permission denied (publickey).
    while connecting to 10.154.152.240:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

TASK: [ping ] *****************************************************************
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

PLAY [localhost] **************************************************************

GATHERING FACTS ***************************************************************
ok: [localhost]

TASK: [debug ] ****************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/matthewmcdonagh/test.retry

ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
ec2-54-242-223-5.compute-1.amazonaws.com : ok=2    changed=0    unreachable=0    failed=0
localhost                  : ok=2    changed=0    unreachable=0    failed=0

The behaviour of any_errors_fatal is inconsistent both between versions and depending on whether gather_facts is true or false. This could be quite dangerous in some scenarios.
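
Until this is fixed, one possible workaround is a guard play. A minimal sketch, assuming the inventory group is named test as above: unreachable hosts never get to run set_fact, so a follow-up play can abort when the fact is missing.

---

- hosts: test
  any_errors_fatal: yes
  gather_facts: false

  tasks:

  - ping:

  - set_fact:
      reached: true  # only ever set on hosts the play could actually reach

- hosts: localhost
  connection: local
  gather_facts: false

  tasks:

  - name: Abort if any host in 'test' was unreachable
    fail:
      msg: "{{ item }} was unreachable, aborting"
    when: hostvars[item].reached is not defined
    with_items: "{{ groups['test'] }}"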

@jimi-c (Member) commented May 17, 2016

Hi @mattmonkey83, this appears to be resolved by the feature branch I've pushed based mainly on @vroy's work: https://github.com/ansible/ansible/compare/vroy_backward-compatible-executor

Using an inventory with an unreachable host named test and your example playbook, I get the following output:

[root@jimi 15523]# ansible-playbook -vv -i inv test.yml 
Using /etc/ansible/ansible.cfg as config file
PLAYBOOK: test.yml *************************************************************
2 plays in test.yml
PLAY [test] ********************************************************************
TASK [ping] ********************************************************************
task path: /root/testing/15523/test.yml:10
fatal: [test]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
PLAY RECAP *********************************************************************
test                       : ok=0    changed=0    unreachable=1    failed=0   
[root@jimi 15523]# 

So the unreachable host halts the play execution. If you can confirm, I'll close this when we merge that feature branch in.

@jimi-c jimi-c added needs_info This issue requires further information. Please answer any outstanding questions. pending_action labels May 17, 2016
@mattmonkey83 (Author) commented May 18, 2016

Hi @jimi-c, thanks for looking at this.

I've finally managed to get round to testing your branch. There's some improvement -

When there are only unreachable hosts, it matches the v1.9 behaviour, which is great -

$ ansible --version
ansible 2.2.0 (vroy_backward-compatible-executor 887850bbc7) last updated 2016/05/18 17:21:44 (GMT +000)
  lib/ansible/modules/core: (detached HEAD 92bf802cb8) last updated 2016/05/18 17:21:28 (GMT +000)
  lib/ansible/modules/extras: (detached HEAD e710dc47fe) last updated 2016/05/18 17:21:30 (GMT +000)
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
$ cat inv
[test]
ec2-54-235-218-190.compute-1.amazonaws.com
#ec2-54-160-195-217.compute-1.amazonaws.com
$ ansible-playbook -i inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
    to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0

Unfortunately there is still a difference when only a subset of hosts are unreachable -

$ ansible --version
ansible 2.2.0 (vroy_backward-compatible-executor 887850bbc7) last updated 2016/05/18 17:21:44 (GMT +000)
  lib/ansible/modules/core: (detached HEAD 92bf802cb8) last updated 2016/05/18 17:21:28 (GMT +000)
  lib/ansible/modules/extras: (detached HEAD e710dc47fe) last updated 2016/05/18 17:21:30 (GMT +000)
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
$ cat inv
[test]
ec2-54-235-218-190.compute-1.amazonaws.com
ec2-54-160-195-217.compute-1.amazonaws.com
$ ansible-playbook -i inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
ok: [ec2-54-160-195-217.compute-1.amazonaws.com]

PLAY [localhost] ***************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}
    to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-160-195-217.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0
ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=1    changed=0    unreachable=0    failed=0

Versus what I'd consider to be correct behaviour in v.1.9 -

$ ansible --version
ansible 1.9.5
  configured module search path = None
$ cat inv
[test]
ec2-54-235-218-190.compute-1.amazonaws.com
ec2-54-160-195-217.compute-1.amazonaws.com
$ ansible-playbook -i inv test.yml

PLAY [test] *******************************************************************

TASK: [ping ] *****************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com] => SSH Error: Permission denied (publickey).
    while connecting to 10.234.24.10:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
ok: [ec2-54-160-195-217.compute-1.amazonaws.com]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/matthewmcdonagh/test.retry

ec2-54-160-195-217.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0
ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0

@jimi-c (Member) commented May 18, 2016

@mattmonkey83 I actually think this is more correct? The second play uses a different host than the first, so shouldn't it continue running?

@mattmonkey83 (Author)

@jimi-c I can't really agree. Yes, it is a different host that is targeted in the second part of the example playbook, but there are occasions when we want a playbook to abort rather than proceed.

For example, if a playbook has tasks to configure three web servers followed by tasks to add them to the load balancer, we might not want the load balancer tasks to run if one of the web servers wasn't successfully configured. That's the reason we use 'any_errors_fatal'. A sketch of this pattern follows below.

I hope that makes sense. Let me know if you need further clarification.
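
To illustrate the pattern, a minimal sketch with hypothetical group names (webservers, loadbalancer) and placeholder tasks standing in for the real configuration:

---

- hosts: webservers
  any_errors_fatal: yes

  tasks:

  - name: configure web server (placeholder for the real configuration tasks)
    ping:

- hosts: loadbalancer

  tasks:

  - name: add web servers to the pool (placeholder)
    debug:
      msg: 'Must only run if every web server above succeeded'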

@jimi-c (Member) commented May 18, 2016

@mattmonkey83 ahh yes, I wasn't really thinking of this in the context of any_errors_fatal. In that case, I'd agree with you.

@jimi-c jimi-c removed needs_info This issue requires further information. Please answer any outstanding questions. pending_action labels May 18, 2016
@jimi-c jimi-c self-assigned this Jun 6, 2016
@jimi-c jimi-c added this to the stable-2.1 milestone Jun 6, 2016
@jimi-c jimi-c closed this as completed in fbec2d9 Jun 8, 2016
@jimi-c (Member) commented Jun 8, 2016

Closing This Ticket

Hi!

We believe the above commit should resolve this problem for you. This will also be included in the next release.

If you continue seeing any problems related to this issue, or if you have any further questions, please let us know by stopping by one of the two mailing lists, as appropriate.

Because this project is very active, we're unlikely to see comments made on closed tickets, but the mailing list is a great way to ask questions, or post if you don't think this particular issue is resolved.

Thank you!

jimi-c added a commit that referenced this issue Jun 8, 2016
This allows the PlaybookExecutor to receive more information regarding
what happened internal to the TaskQueueManager and strategy, to determine
things like whether or not the play iteration should stop.

Fixes #15523

(cherry picked from commit fbec2d9)
@mattmonkey83 (Author)

Thanks @jimi-c - Re-tested using latest from source and the problem now appears to be resolved 😄

@virtusademo

Hi all,
Whenever I try to bring up the OpenShift cluster, I get the following error. I ran the command below:
sudo ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

fatal: [g_all_hosts | default([])]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname g_all_hosts | default([]): Name or service not known\r\n", "unreachable": true}

[rhnuser3@ip-172-31-10-250 ~]$ ansible -m ping all
ip-172-31-10-250.ca-central-1.compute.internal | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Host key verification failed.\r\n",
    "unreachable": true
}

We have verified the following steps.

Master communicating with the node:

[rhnuser3@ip-172-31-10-250 ~]$ ssh rhnuser3@35.182.190.60
Last login: Wed Aug 16 08:05:16 2017 from master
[rhnuser3@node ~]$

Node communicating with the master:

[rhnuser3@ip-172-31-9-57 ~]$ ssh rhnuser3@172.31.10.250
Last login: Wed Aug 16 07:56:06 2017 from node
[rhnuser3@master ~]$

We have made the necessary changes to the configuration files on the master and node servers.

vi /etc/ansible/ansible.cfg

inventory = /etc/ansible/hosts
sudo_user = rhnuser3

  • Removed the comments from the following lines.

We have updated the /etc/hosts file; it looks like this:

[rhnuser3@ip-172-31-10-250 ~]$ cat /etc/hosts
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

172.31.10.250 master
172.31.9.57 node

sudo vi /etc/ssh/sshd_config
#PasswordAuthentication yes
#PermitEmptyPasswords no
#PasswordAuthentication no

sudo cat /var/log/secure

Aug 16 08:28:09 localhost sshd[22792]: Disconnecting: Too many authentication failures for root [preauth]
Aug 16 08:28:09 localhost sshd[22792]: PAM 5 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=blk-222-40-174.eastlink.ca user=root
Aug 16 08:28:09 localhost sshd[22792]: PAM service(sshd) ignoring max retries; 6 > 3

On the master server I performed the following commands:
ssh-keygen -t rsa

cat /home/rhnuser3/.ssh/id_rsa.pub

sudo vi /home/rhnuser3/.ssh/authorized_keys

sudo chmod 600 .ssh/authorized_keys

sudo chown rhnuser3:rhnuser3 .ssh/authorized_keys

cat /home/rhnuser3/.ssh/id_rsa

On the node server I performed the following commands:

cd /home/rhnuser3

mkdir .ssh

chmod 700 .ssh

chown rhnuser3:rhnuser3 .ssh

sudo vi /home/rhnuser3/.ssh/authorized_keys

sudo chmod 600 .ssh/authorized_keys

sudo chown rhnuser3:rhnuser3 .ssh/authorized_keys

sudo vi /home/rhnuser3/.ssh/id_rsa

sudo chmod 600 .ssh/id_rsa

sudo chown rhnuser3:rhnuser3 .ssh/id_rsa
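
For reference, the manual key installation above is roughly what ssh-copy-id automates; a one-line equivalent run from the master, using the user and host names from this comment, would be:

ssh-copy-id rhnuser3@node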

Please suggest what we should try for these issues.
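
A side note on the error itself: "Host key verification failed" usually means the control node has not yet accepted the target's SSH host key. One possible workaround, a sketch suitable for a lab rather than production, is to SSH to each host once and accept the key, or to disable host key checking in /etc/ansible/ansible.cfg:

[defaults]
host_key_checking = False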

@virtusademo

[rhnuser3@master ~]$ ansible --version
ansible 2.3.1.0
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides
python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

[rhnuser3@master ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.1 (Maipo)
[rhnuser3@master ~]$

@jimi-c (Member) commented Aug 17, 2017

@virtusademo that may be a question better directed at the openshift-ansible team, which is not directly affiliated with us here in the Ansible project.

@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 7, 2018
@ansible ansible locked and limited conversation to collaborators Apr 25, 2019