Ansible hosts are randomly unreachable #18188
Comments
Can you add some output with
needs_info
This really isn't easy. I can try to do what you want, but keep in mind that this would produce hundreds of thousands of debug lines. We manage about 2000 servers with Ansible, and the problem can only be reproduced reliably when I run it against all of them. On the other hand, setting the "retries" option in the config file to a high value fixed this as a workaround. With -vv I now get a lot of lines like: ssh_retry: attempt: 1, ssh return code is 255. cmd (/bin/sh -c 'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python'...)
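OpenSSH exits with status 255 when the connection itself fails (rather than the remote command), which is what the "ssh return code is 255" retries above are reacting to. The sketch below is loosely modeled on that idea and is not Ansible's actual code: it retries only on 255, with exponential backoff. The function name `run_with_ssh_retries` and the backoff schedule are assumptions for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence, Tuple

# OpenSSH returns 255 for connection-level errors; other codes come from the
# remote command and should not be retried.
SSH_CONNECTION_ERROR = 255


def run_with_ssh_retries(
    cmd: Sequence[str],
    retries: int = 3,
    base_delay: float = 1.0,
    runner: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Run cmd, retrying up to `retries` extra times on exit code 255.

    Returns (final return code, number of attempts made).
    `runner` and `sleep` are injectable so the logic can be tested offline.
    """
    attempt = 0
    while True:
        attempt += 1
        rc = runner(cmd)
        # Stop on success, on a non-connection error, or when retries run out.
        if rc != SSH_CONNECTION_ERROR or attempt > retries:
            return rc, attempt
        # Exponential backoff between attempts: base_delay, 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

For example, `run_with_ssh_retries(["ssh", "-o", "BatchMode=yes", "web01", "true"], retries=5)` would try the (hypothetical) host `web01` up to six times. Retrying only on 255 hides transient connection failures without masking real task failures, which matches why raising "retries" worked as a workaround here.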
I observed the same problem with a large number of hosts. Reducing the number of
I see the same behavior with just 3 hosts, using Ansible 1.7.2, on Debian 9 (stretch) on 2 VMware VMs and Debian 8 (jessie) on a Raspberry Pi. It happens completely at random.
I'm facing random "UNREACHABLE!" errors on hosts. Ansible version: 2.3.1.0
I think this can be solved with the retries setting:
https://docs.ansible.com/ansible/latest/intro_configuration.html#retries
Add this to your
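The `retries` option lives in the `[ssh_connection]` section of ansible.cfg. As a sketch, assuming you want to set it programmatically across many control nodes, the hypothetical helper below uses Python's `configparser` to set that key while keeping the other sections. Interpolation is disabled because ansible.cfg values such as `control_path` often contain `%` sequences.

```python
import configparser
import io


def set_ssh_retries(cfg_text: str, retries: int) -> str:
    """Return ansible.cfg text with [ssh_connection] retries set to `retries`."""
    # interpolation=None keeps values like "%(directory)s" untouched.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(cfg_text)
    if not config.has_section("ssh_connection"):
        config.add_section("ssh_connection")
    config.set("ssh_connection", "retries", str(retries))
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()
```

Note that `configparser` drops comments when it rewrites a file, so for a hand-maintained ansible.cfg it may be simpler to edit the file directly or to set the `ANSIBLE_SSH_RETRIES` environment variable for a single run.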
This problem comes down to local considerations. Many solutions that can help reduce failures have been proposed here.
If you have further questions, please stop by IRC or the mailing list.
ISSUE TYPE
COMPONENT NAME
SSH connectivity
ANSIBLE VERSION
CONFIGURATION
OS / ENVIRONMENT
RedHat 7.2
SUMMARY
We have about 2000 hosts managed by Ansible, and every time I run any playbook or command against all of them, about 3% are reported as "UNREACHABLE". When I rerun the task, a different random set of servers is UNREACHABLE. However, they are not actually unreachable, and there is no network outage or anything like that.
If I run a bash for loop that opens an SSH connection to each of these 2000 servers, it works without trouble, so there are clearly no issues with SSH or network connectivity itself.
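One difference worth noting is that a bash for loop connects to hosts one at a time, while Ansible opens several connections in parallel (controlled by its `forks` setting). The sketch below, which assumes key-based SSH access and uses hypothetical function names, runs the same reachability check with a configurable number of parallel workers, so the sequential case (`workers=1`) can be compared with a parallel one.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def ssh_ok(host: str) -> bool:
    """Return True if `ssh host true` succeeds."""
    # BatchMode avoids hanging on password prompts; ConnectTimeout bounds each attempt.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_unreachable(
    hosts: Iterable[str],
    workers: int = 1,
    check: Callable[[str], bool] = ssh_ok,
) -> List[str]:
    """Check every host with `workers` parallel threads; return failing hosts in input order."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with host_list.
        results = list(pool.map(check, host_list))
    return [h for h, ok in zip(host_list, results) if not ok]
```

If `find_unreachable(hosts, workers=1)` returns nothing but a run with many workers reports random failures, that would point at load on the control node, sshd, or the network under concurrency rather than at individual hosts.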
I suspect this is a problem with timeouts and with the way Ansible determines that a host is unreachable.
STEPS TO REPRODUCE
EXPECTED RESULTS
I expect the command to execute on 100% of the hosts.
ACTUAL RESULTS
It is executed on only some of the hosts, and random hosts are reported as unreachable.
Note that similar bug reports can be found on various forums regarding Amazon EC2: http://stackoverflow.com/questions/39973103/ansible-ec2-random-ssh-connection-failures-after-provision