
Playbook fails when ssh host key changes #452

Closed
astraios opened this issue Oct 17, 2017 · 21 comments
@astraios

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • AWX task?
SUMMARY

When I run a Job a second time against a set of hosts I've just rebuilt with Terraform, it fails because the host keys have changed, triggering a possible-spoofing warning.

ENVIRONMENT
  • AWX version: 1.0.1.31
  • AWX install method: docker on linux
  • Ansible version: 2.4
  • Operating System: CentOS 7 docker image
  • Web Browser: Chrome Version 61.0.3163.100
STEPS TO REPRODUCE
  • Create a set of virtual machines
  • Execute a Job template a first time
  • Destroy and then re-provision the same set of VMs
  • Execute again the same Job template
EXPECTED RESULTS

As stated in issue #387, host keys are ignored, so the job execution should not fail for this reason.

ACTUAL RESULTS

At the second execution, it fails with this output:

fatal: [node1.test]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nThe ECDSA host key for node1.test.somedomain.com has changed,\r\nand the key for the corresponding IP address xxx.xxx.xxx.xxx \r\nis unknown. This could either mean that\r\nDNS SPOOFING is happening or the IP address for the host\r\nand its host key have changed at the same time.\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe fingerprint for the ECDSA key sent by the …
@falencastro
Contributor

falencastro commented Oct 20, 2017

I'm also facing this

WORKAROUND
docker exec -i -t awx_task bash
sed '/^node1.test/d' -i /root/.ssh/known_hosts
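
A hedged sketch of the same cleanup using OpenSSH's own tooling: `ssh-keygen -R` removes every key recorded for a host (and leaves a `known_hosts.old` backup), so no hand-written sed pattern is needed. Inside the container that would be roughly `docker exec -i awx_task ssh-keygen -R node1.test -f /root/.ssh/known_hosts`; the demo below runs against a throwaway file:

```shell
# Build a throwaway known_hosts with two fake entries.
kh=$(mktemp)
printf 'node1.test ssh-ed25519 AAAAfakekey1\nother.host ssh-ed25519 AAAAfakekey2\n' > "$kh"

# Remove every key stored for node1.test; ssh-keygen backs the file up
# to "$kh.old" before rewriting it.
ssh-keygen -R node1.test -f "$kh" 2>/dev/null

grep -c fakekey "$kh"   # only the other.host entry is left, so this prints 1
rm -f "$kh" "$kh.old"
```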

@rahst12

rahst12 commented Nov 15, 2017

We hit the same thing; we fixed it with something like this in a bash script.

echo "remove fqdn ($fqdn) and ip ($ip) from known hosts"
sed -i '/^'$fqdn'/d' ~/.ssh/known_hosts
sed -i '/^'$ip'/d' ~/.ssh/known_hosts
ssh-keyscan $fqdn >> ~/.ssh/known_hosts
ssh-keyscan $ip >> ~/.ssh/known_hosts

It'd be nice if there was a way to invoke this locally against an inventory.
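
One hedged way to get partway there is to wrap the snippet above in a small function and feed it any list of hosts (e.g. dumped from your inventory). The known_hosts path is a parameter, and `hosts.txt` is a hypothetical file name, not an AWX feature:

```shell
# refresh_host_key HOST KNOWN_HOSTS_FILE
# Drops HOST's stale entry (matching "host " or "host,ip" at line start;
# the unescaped regex dot is good enough for a quick cleanup) and tries
# to re-learn the current key. ssh-keyscan failures are tolerated so a
# dead host doesn't abort the loop.
refresh_host_key() {
  host=$1 kh=$2
  sed -i "/^${host}[ ,]/d" "$kh"
  ssh-keyscan -T 5 "$host" >> "$kh" 2>/dev/null || true
}

# Hypothetical usage, one host per line in hosts.txt:
#   while read -r h; do refresh_host_key "$h" ~/.ssh/known_hosts; done < hosts.txt
```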

matburt pushed a commit to matburt/awx that referenced this issue Nov 16, 2017
disable GCE inventory caching w/ a .ini file
@rahst12

rahst12 commented Dec 1, 2017

Giant pain in Ansible Tower. Anytime an IP is recycled, we have to manually clear it from known_hosts.

This is my +1 to hopefully help the merge pull request here.

@wenottingham
Contributor

Maybe I'm missing something... I don't see a PR?

@rahst12

rahst12 commented Dec 1, 2017

Is there one from ryanpetrello 452? I could be totally misreading GitHub here: matburt@4510cd1

@wenottingham
Contributor

wenottingham commented Dec 1, 2017

Unrelated PR tagged into this issue due to naive GitHub matching of the number "452".

@rahst12

rahst12 commented Dec 1, 2017

Dang, alright, well my moral support is provided! A little more background: I (we) use OpenStack, and when we terminate a server and then recycle its IP, this issue hits. How often it hits depends on what we're doing. We try to create Ansible playbooks for all new server components (expand disks, install certs, etc.) that run nightly to make sure things are up to date.

@kbrady-cognizant

kbrady-cognizant commented Mar 28, 2018

Also experiencing this in a VMware private cloud. Does Tower not take into account the project's ansible.cfg? I ask because I have the following in it:

[ssh_connection]
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no

I was under the impression that StrictHostKeyChecking=no means it doesn't matter what the host key is, so IPs can be recycled?

@astraios
Author

I think it should be combined with another option:

-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no

As a workaround, I mount /dev/null like this in my docker-compose for the awx_task container:

    volumes:
      - /dev/null:/root/.ssh/known_hosts

@kbrady-cognizant

OK, will try adding -o UserKnownHostsFile=/dev/null as well @notuscloud

@ParagMalshe

I face the same issue. I have the following in ansible.cfg on the awx_task container:

host_key_checking = False

It correctly translates into the ssh connection parameters Ansible uses for the target host, and yet I get the host key changed error:

ansible-playbook 2.5.4
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/var/lib/awx/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 2.7.5 (default, Apr 11 2018, 07:36:10) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
Using /etc/ansible/ansible.cfg as config file
SSH password:
PLAYBOOK: vncplaybook.yml ******************************************************
1 plays in vncplaybook.yml
PLAY [vnc playbook] ************************************************************
TASK [Gathering Facts] *********************************************************
task path: /var/lib/awx/projects/_308__foobar_project_3454f7c5_c305_444e_9a4e_c87483ce00461eaf2218_95ca_4438_9748_da07b125ab65/vncplaybook.yml:2
<10.31.127.84> ssh_retry: attempt: 0, ssh return code is 255. cmd (['sshpass', '-d14', 'ssh', '-C', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'StrictHostKeyChecking=no', '-o', 'User=foobar', '-o', 'ConnectTimeout=10', '-o', 'ControlPath=/tmp/awx_462__Pvzgp/cp/af797df922', '10.31.127.84', "/bin/sh -c 'echo ~foobar && sleep 0'"]...), pausing for 0 seconds
<10.31.127.84> ssh_retry: attempt: 1, ssh return code is 255. cmd (['sshpass', '-d14', 'ssh', '-C', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'StrictHostKeyChecking=no', '-o', 'User=foobar', '-o', 'ConnectTimeout=10', '-o', 'ControlPath=/tmp/awx_462__Pvzgp/cp/af797df922', '10.31.127.84', "/bin/sh -c 'echo ~foobar && sleep 0'"]...), pausing for 1 seconds
<10.31.127.84> ssh_retry: attempt: 2, ssh return code is 255. cmd (['sshpass', '-d14', 'ssh', '-C', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'StrictHostKeyChecking=no', '-o', 'User=foobar', '-o', 'ConnectTimeout=10', '-o', 'ControlPath=/tmp/awx_462__Pvzgp/cp/af797df922', '10.31.127.84', "/bin/sh -c 'echo ~foobar && sleep 0'"]...), pausing for 3 seconds
fatal: [10.31.127.84]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe fingerprint for the ECDSA key sent by the remote host is\nSHA256:DuE8GJ41siGMVl+jRHXRXPVYd6PsLkBrxKeno432t/w.\r\nPlease contact your system administrator.\r\nAdd correct host key in /root/.ssh/known_hosts to get rid of this message.\r\nOffending ECDSA key in /root/.ssh/known_hosts:6\r\nPassword authentication is disabled to avoid man-in-the-middle attacks.\r\nKeyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.\r\nPermission denied (publickey,password).\r\n", "unreachable": true}
PLAY RECAP *********************************************************************
10.31.127.84               : ok=0    changed=0    unreachable=1    failed=0

@sudomateo

+1 on this issue. AWX doesn't seem to be respecting the ansible.cfg or the environment variable ANSIBLE_HOST_KEY_CHECKING set in the UI.

@sudomateo

I worked around this by setting "ANSIBLE_SSH_ARGS": "-C -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" in the environment variables in the AWX GUI.

@snerge

snerge commented Feb 6, 2019

+1 on this issue

I worked around this by setting "ANSIBLE_SSH_ARGS": "-C -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" in the environment variables in the AWX GUI.

Thanks @sudomateo, your workaround did the job!

@guilhermeaiolfi

It should at least remove the SSH fingerprint from the known_hosts file when removing the host from the GUI.
@sudomateo's hint worked, thanks.

@siw36

siw36 commented Jun 17, 2019

You probably don't want to disable host key checking "tower wide".
One approach is to override the ansible.cfg variable in each inventory that contains nodes that may change their SSH keys over time.

---
defaults:
  vars:
    host_key_checking: false

@ssuperuser

None of the suggestions above worked for me on AWX 9.2 with Ansible 2.9. The ansible.cfg is not being ignored: running the job in -vvv verbose mode, I could see that StrictHostKeyChecking=no was set, yet I still got the SSH key changed error.

So I had to add the following to the inventory file instead, and it worked:

[all:vars]
ansible_ssh_common_args='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'

I got this solution from
https://route1.ph/2020/01/14/disable-strict-host-key-checking-in-ansible/

@NC-Yungd

Not the most efficient, but I've created a playbook that I can execute from AWX: it reads through my inventory group and, using the shell module, removes the matching entries from /root/.ssh/known_hosts in the docker container awx_task.

Again, it's not efficient: with a ton of hosts in your inventory group, this runs in linear time.

- hosts: localhost
  become: yes
  become_method: sudo
  tasks:
  - name: Remove DHCP addresses that are present in known_hosts inside the docker container awx_task
    # No -t on docker exec: allocating a TTY fails when the playbook
    # runs without an interactive terminal.
    shell: |
      docker exec awx_task /bin/bash -c "sed -i '/^{{ item }}/d' /root/.ssh/known_hosts"
    with_inventory_hostnames:
      - awx_host_group
    ignore_errors: yes
    delegate_to: awx_server.your_domain.com

Welcome to hear suggestions and feedback on how this can be improved!
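
A hedged alternative sketch: Ansible ships a known_hosts module that deletes a host's entries idempotently, which avoids shelling out to docker exec + sed entirely (the group name awx_host_group and the known_hosts path are carried over from the playbook above, not AWX specifics):

```yaml
- hosts: localhost
  become: yes
  tasks:
    # state: absent drops every key stored for the given host name;
    # the task reports "changed" only when an entry was actually removed.
    - name: Drop stale host keys for every host in the group
      known_hosts:
        name: "{{ item }}"
        path: /root/.ssh/known_hosts
        state: absent
      with_inventory_hostnames:
        - awx_host_group
```

This would run wherever the known_hosts file lives, so for a containerized awx_task it assumes the play executes inside that container or against a mounted path.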

@shanemcd
Member

shanemcd commented Apr 9, 2021

I believe this issue is no longer relevant under the new Execution Environment model. Each run is a new container, so the known_hosts should never get reused.

@shanemcd shanemcd closed this as completed Apr 9, 2021
@azrdev

azrdev commented Apr 27, 2021

While we're not running a new enough AWX version to have EEs, I managed to run an ad-hoc job against localhost, using the shell module to execute sed -i=~2021-04-27 -e '/reusable_host/d' /root/.ssh/known_hosts and remove from the "cache" all host keys matching the given regex.

@BobrPetr

BobrPetr commented Feb 18, 2023

How can I clean known_hosts in the awx-task pod on k8s (MicroK8s v1.26.1 revision 4595)?
Or mount a volume at /root/.ssh?
Via kubectl exec there are not enough rights to access /root/.ssh,
and sudo needs the password of the awx user, which I don't have.
