
'any_errors_fatal' Broken For Unreachable Hosts v2 #15523

Closed
mattmonkey83 opened this issue Apr 21, 2016 · 11 comments
@mattmonkey83

ISSUE TYPE
  • Bug Report
ANSIBLE VERSION
ansible 2.0.2.0
CONFIGURATION

Default

OS / ENVIRONMENT

N/A

SUMMARY

We've been testing our playbooks on v2 and noticed that any_errors_fatal isn't respected when a host is unreachable.

The same playbooks aborted under v1.9.5 but proceed under v2.

STEPS TO REPRODUCE

Run this playbook, where test is a server whose SSH key is not loaded, to force a permission-denied error.

$ cat test.yml

---

- hosts: test
  any_errors_fatal: yes
  user: ansible
  gather_facts: false

  tasks:

  - ping:

- hosts: localhost
  connection: local
  gather_facts: false

  tasks:

  - debug:
      msg: 'Should not run'
EXPECTED RESULTS

Playbook should abort after failed ping task on test.

ACTUAL RESULTS

Playbook continues to next host group and completes debug task.

$ ansible-playbook test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-204-28-247.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}

PLAY [localhost] ***************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}
        to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-204-28-247.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=1    changed=0    unreachable=0    failed=0
@mattmonkey83 mattmonkey83 changed the title 'any_errors_fatal' Unreachable Ansible 2 'any_errors_fatal' Broken For Unreachable Hosts v2 Apr 22, 2016
@mattmonkey83 (Author)

Just noticed something while comparing v2 with v1.9.5.

If I run the playbook above against two hosts, one reachable and the other not, in v1.9.5 I get this -

$ ansible-playbook -i test_inv test.yml

PLAY [test] *******************************************************************

TASK: [ping ] *****************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com] => SSH Error: Permission denied (publickey).
    while connecting to 10.154.152.240:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/matthewmcdonagh/test.retry

ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
ec2-54-242-223-5.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0

If I re-run the playbook against the same hosts, but with gather_facts: true, in v1.9.5 I get this -

$ ansible-playbook -i test_inv test.yml

PLAY [test] *******************************************************************

GATHERING FACTS ***************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com] => SSH Error: Permission denied (publickey).
    while connecting to 10.154.152.240:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

TASK: [ping ] *****************************************************************
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

PLAY [localhost] **************************************************************

GATHERING FACTS ***************************************************************
ok: [localhost]

TASK: [debug ] ****************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/matthewmcdonagh/test.retry

ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
ec2-54-242-223-5.compute-1.amazonaws.com : ok=2    changed=0    unreachable=0    failed=0
localhost                  : ok=2    changed=0    unreachable=0    failed=0

The behaviour of any_errors_fatal is inconsistent both between versions and depending on whether gather_facts is true or false. This could be quite dangerous in some scenarios.
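
Until this is fixed, one possible workaround is a guard play. A minimal sketch, assuming the inventory group is named test as above: unreachable hosts never get to run set_fact, so a follow-up play can abort when the fact is missing.

---

- hosts: test
  any_errors_fatal: yes
  gather_facts: false

  tasks:

  - ping:

  - set_fact:
      reached: true  # only ever set on hosts the play could actually reach

- hosts: localhost
  connection: local
  gather_facts: false

  tasks:

  - name: Abort if any host in 'test' was unreachable
    fail:
      msg: "{{ item }} was unreachable, aborting"
    when: hostvars[item].reached is not defined
    with_items: "{{ groups['test'] }}"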

@jimi-c (Member) commented May 17, 2016

Hi @mattmonkey83, this appears to be resolved by the feature branch I've pushed based mainly on @vroy's work: https://github.com/ansible/ansible/compare/vroy_backward-compatible-executor

Using an inventory with an unreachable host named test and your example playbook, I get the following output:

[root@jimi 15523]# ansible-playbook -vv -i inv test.yml 
Using /etc/ansible/ansible.cfg as config file
PLAYBOOK: test.yml *************************************************************
2 plays in test.yml
PLAY [test] ********************************************************************
TASK [ping] ********************************************************************
task path: /root/testing/15523/test.yml:10
fatal: [test]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
PLAY RECAP *********************************************************************
test                       : ok=0    changed=0    unreachable=1    failed=0   
[root@jimi 15523]# 

So the unreachable host halts the play execution. If you can confirm, I'll close this when we merge that feature branch in.

@jimi-c jimi-c added needs_info This issue requires further information. Please answer any outstanding questions. pending_action labels May 17, 2016
@mattmonkey83 (Author) commented May 18, 2016

Hi @jimi-c, thanks for looking at this.

I've finally managed to get round to testing your branch. There's some improvement -

When there are only unreachable hosts, it matches the v1.9 behaviour, which is great -

$ ansible --version
ansible 2.2.0 (vroy_backward-compatible-executor 887850bbc7) last updated 2016/05/18 17:21:44 (GMT +000)
  lib/ansible/modules/core: (detached HEAD 92bf802cb8) last updated 2016/05/18 17:21:28 (GMT +000)
  lib/ansible/modules/extras: (detached HEAD e710dc47fe) last updated 2016/05/18 17:21:30 (GMT +000)
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
$ cat inv
[test]
ec2-54-235-218-190.compute-1.amazonaws.com
#ec2-54-160-195-217.compute-1.amazonaws.com
$ ansible-playbook -i inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
    to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0

Unfortunately there is still a difference when only a subset of hosts are unreachable -

$ ansible --version
ansible 2.2.0 (vroy_backward-compatible-executor 887850bbc7) last updated 2016/05/18 17:21:44 (GMT +000)
  lib/ansible/modules/core: (detached HEAD 92bf802cb8) last updated 2016/05/18 17:21:28 (GMT +000)
  lib/ansible/modules/extras: (detached HEAD e710dc47fe) last updated 2016/05/18 17:21:30 (GMT +000)
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
$ cat inv
[test]
ec2-54-235-218-190.compute-1.amazonaws.com
ec2-54-160-195-217.compute-1.amazonaws.com
$ ansible-playbook -i inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
ok: [ec2-54-160-195-217.compute-1.amazonaws.com]

PLAY [localhost] ***************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}
    to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-160-195-217.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0
ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=1    changed=0    unreachable=0    failed=0

Versus what I'd consider to be correct behaviour in v.1.9 -

$ ansible --version
ansible 1.9.5
  configured module search path = None
$ cat inv
[test]
ec2-54-235-218-190.compute-1.amazonaws.com
ec2-54-160-195-217.compute-1.amazonaws.com
$ ansible-playbook -i inv test.yml

PLAY [test] *******************************************************************

TASK: [ping ] *****************************************************************
fatal: [ec2-54-235-218-190.compute-1.amazonaws.com] => SSH Error: Permission denied (publickey).
    while connecting to 10.234.24.10:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
ok: [ec2-54-160-195-217.compute-1.amazonaws.com]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/matthewmcdonagh/test.retry

ec2-54-160-195-217.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0
ec2-54-235-218-190.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0

@jimi-c (Member) commented May 18, 2016

@mattmonkey83 I actually think this is more correct? The second play uses a different host than the first, so shouldn't it continue running?

@mattmonkey83 (Author)

@jimi-c I can't really agree. Yes, it is a different host that is targeted in the second part of the example playbook, but there are occasions when we want a playbook to abort rather than proceed.

For example, if a playbook has tasks to configure three web servers followed by tasks to add them to the load balancer, we might not want the load balancer tasks to run if one of the web servers wasn't successfully configured. That's the reason we use 'any_errors_fatal'. A sketch of this pattern follows below.

I hope that makes sense. Let me know if you need further clarification.
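
To illustrate the pattern, a minimal sketch with hypothetical group names (webservers, loadbalancer) and placeholder tasks standing in for the real configuration:

---

- hosts: webservers
  any_errors_fatal: yes

  tasks:

  - name: configure web server (placeholder for the real configuration tasks)
    ping:

- hosts: loadbalancer

  tasks:

  - name: add web servers to the pool (placeholder)
    debug:
      msg: 'Must only run if every web server above succeeded'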

@jimi-c (Member) commented May 18, 2016

@mattmonkey83 ahh yes, I wasn't really thinking of this in the context of any_errors_fatal. In that case, I'd agree with you.

@jimi-c jimi-c removed needs_info This issue requires further information. Please answer any outstanding questions. pending_action labels May 18, 2016
@jimi-c jimi-c self-assigned this Jun 6, 2016
@jimi-c jimi-c added this to the stable-2.1 milestone Jun 6, 2016
@jimi-c jimi-c closed this as completed in fbec2d9 Jun 8, 2016
@jimi-c (Member) commented Jun 8, 2016

Closing This Ticket

Hi!

We believe the above commit should resolve this problem for you. This will also be included in the next release.

If you continue seeing any problems related to this issue, or if you have any further questions, please let us know by stopping by one of the two mailing lists, as appropriate.

Because this project is very active, we're unlikely to see comments made on closed tickets, but the mailing list is a great way to ask questions, or post if you don't think this particular issue is resolved.

Thank you!

jimi-c added a commit that referenced this issue Jun 8, 2016
This allows the PlaybookExecutor to receive more information regarding
what happened internal to the TaskQueueManager and strategy, to determine
things like whether or not the play iteration should stop.

Fixes #15523

(cherry picked from commit fbec2d9)
@mattmonkey83 (Author)

Thanks @jimi-c - Re-tested using latest from source and the problem now appears to be resolved 😄

@virtusademo

Hi all,
Whenever I try to bring up the OpenShift cluster, I get the following error. I ran the command below:
sudo ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

fatal: [g_all_hosts | default([])]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname g_all_hosts | default([]): Name or service not known\r\n", "unreachable": true}

[rhnuser3@ip-172-31-10-250 ~]$ ansible -m ping all
ip-172-31-10-250.ca-central-1.compute.internal | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Host key verification failed.\r\n",
    "unreachable": true
}

We have verified the following steps.

Master communicating with the node:

[rhnuser3@ip-172-31-10-250 ~]$ ssh rhnuser3@35.182.190.60
Last login: Wed Aug 16 08:05:16 2017 from master
[rhnuser3@node ~]$

Node communicating with the master:

[rhnuser3@ip-172-31-9-57 ~]$ ssh rhnuser3@172.31.10.250
Last login: Wed Aug 16 07:56:06 2017 from node
[rhnuser3@master ~]$

We have made the necessary changes to the configuration files on the master and node servers.

vi /etc/ansible/ansible.cfg

inventory = /etc/ansible/hosts
sudo_user = rhnuser3

  • Removed the comments from the following lines.

We have updated the /etc/hosts file; it looks like this:

[rhnuser3@ip-172-31-10-250 ~]$ cat /etc/hosts
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

172.31.10.250 master
172.31.9.57 node

sudo vi /etc/ssh/sshd_config
#PasswordAuthentication yes
#PermitEmptyPasswords no
#PasswordAuthentication no

sudo cat /var/log/secure

Aug 16 08:28:09 localhost sshd[22792]: Disconnecting: Too many authentication failures for root [preauth]
Aug 16 08:28:09 localhost sshd[22792]: PAM 5 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=blk-222-40-174.eastlink.ca user=root
Aug 16 08:28:09 localhost sshd[22792]: PAM service(sshd) ignoring max retries; 6 > 3

On the master server I performed the following commands:
ssh-keygen -t rsa

cat /home/rhnuser3/.ssh/id_rsa.pub

sudo vi /home/rhnuser3/.ssh/authorized_keys

sudo chmod 600 .ssh/authorized_keys

sudo chown rhnuser3:rhnuser3 .ssh/authorized_keys

cat /home/rhnuser3/.ssh/id_rsa

On the node server I performed the following commands:

cd /home/rhnuser3

mkdir .ssh

chmod 700 .ssh

chown rhnuser3:rhnuser3 .ssh

sudo vi /home/rhnuser3/.ssh/authorized_keys

sudo chmod 600 .ssh/authorized_keys

sudo chown rhnuser3:rhnuser3 .ssh/authorized_keys

sudo vi /home/rhnuser3/.ssh/id_rsa

sudo chmod 600 .ssh/id_rsa

sudo chown rhnuser3:rhnuser3 .ssh/id_rsa
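
For reference, the manual key installation above is roughly what ssh-copy-id automates; a one-line equivalent run from the master, using the user and host names from this comment, would be:

ssh-copy-id rhnuser3@node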

Please suggest what we should try for these issues.
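
A side note on the error itself: "Host key verification failed" usually means the control node has not yet accepted the target's SSH host key. One possible workaround, a sketch suitable for a lab rather than production, is to SSH to each host once and accept the key, or to disable host key checking in /etc/ansible/ansible.cfg:

[defaults]
host_key_checking = False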

@virtusademo

[rhnuser3@master ~]$ ansible --version
ansible 2.3.1.0
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides
python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

[rhnuser3@master ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.1 (Maipo)
[rhnuser3@master ~]$

@jimi-c (Member) commented Aug 17, 2017

@virtusademo that may be a question better directed at the openshift-ansible team, which is not directly affiliated with us here in the Ansible project.

@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 7, 2018
@ansible ansible locked and limited conversation to collaborators Apr 25, 2019