Wrong error message when unable to connect via ssh #12916
Comments
Interesting. With both the ad hoc and playbook commands, I get the following for a VM that's offline:
"msg": "ERROR! SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue"
Have you changed any settings in your ansible.cfg which might affect the SSH connection params?
These are my ansible.cfg settings:
@evanccnyc I haven't been able to reproduce this problem either (I ran an ansible command with Could you please try it and see?
I did a clean reinstall recently (rm -Rf, then git clone) and it seems to have gone away. I'll close and reopen if I see it again.
@amenonsen I've seen this error with the beta (v2.0.0-0.4.beta2) and the latest (c64ac90). I get the error even with the patch. Can this issue be reopened, or should I open a new issue? My ansible.cfg is:
[defaults]
retry_files_save_path = $HOME/.ansible-retries
inventory = ./inventory/
vault_password_file = $HOME/.ansible_vault
remote_user = ubuntu
forks = 25
host_key_checking = False
gathering = smart
[privilege_escalation]
become = True
[ssh_connection]
control_path = %(directory)s/%%h-%%r
pipelining = True
@leedm777 Could you perhaps send me output of a problematic run with
I'm able to duplicate the issue by adding a bad route to the host, so that it's unreachable.
$ sudo route add -host X.X.X.X -iface lo0
$ ssh -vvv X.X.X.X
OpenSSH_6.9p1, LibreSSL 2.1.7
debug1: Reading configuration data /Users/dlee/.ssh/config
debug1: /Users/dlee/.ssh/config line 1: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 20: Applying options for *
debug1: /etc/ssh/ssh_config line 102: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to X.X.X.X [X.X.X.X] port 22.
# wait about 75 seconds
debug1: connect to address X.X.X.X port 22: Operation timed out
ssh: connect to host X.X.X.X port 22: Operation timed out
Full playbook output is on pastebin
$ ansible-playbook --limit X.X.X.X -vvvv -- deploy.yml
# snip
14 1447106272.11609: executing the command /bin/sh -c 'sudo -H -n -S -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-snhljbvnujrniptfyjnawvhxroahegzd; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python'"'"'' through the connection
<X.X.X.X> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<X.X.X.X> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=60s)
<X.X.X.X> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
<X.X.X.X> SSH: ansible_password/ansible_ssh_pass not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
<X.X.X.X> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=ubuntu)
<X.X.X.X> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=10)
<X.X.X.X> SSH: PlayContext set ssh_common_args: ()
<X.X.X.X> SSH: PlayContext set ssh_extra_args: ()
<X.X.X.X> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/root/.ansible/cp/%h-%r)
<X.X.X.X> SSH: EXEC ssh -C -vvv -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/%h-%r X.X.X.X /bin/sh -c 'sudo -H -n -S -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-snhljbvnujrniptfyjnawvhxroahegzd; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python'"'"''
14 1447106272.12518: Initial state: awaiting_escalation: BECOME-SUCCESS-snhljbvnujrniptfyjnawvhxroahegzd
14 1447106272.12924: stderr chunk (state=1):
>>>OpenSSH_6.7p1 Debian-5, OpenSSL 1.0.1k 8 Jan 2015
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
<<<
14 1447106272.12966: stderr chunk (state=1):
>>>debug1: auto-mux: Trying existing master
debug1: Control socket "/root/.ansible/cp/X.X.X.X-ubuntu" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to X.X.X.X [X.X.X.X] port 22.
debug2: fd 3 setting O_NONBLOCK
<<<
# about a 10 second delay here
14 1447106282.14050: done running TaskExecutor() for X.X.X.X/TASK: setup
14 1447106282.14081: sending task result
14 1447106282.14197: done sending task result
39 1447106282.15155: worker 1 has data to read
39 1447106282.15443: got a result from worker 1: <ansible.executor.task_result.TaskResult object at 0x7fd3d110aa50>
39 1447106282.15452: sending result: [u'host_task_failed', u'<ansible.executor.task_result.TaskResult object at 0x7fd3d110aa50>']
39 1447106282.15563: done sending result
1 1447106282.16237: got result from result worker: [u'host_task_failed', u'<ansible.executor.task_result.TaskResult object at 0x7fd3d1021e90>']
1 1447106282.16248: marking X.X.X.X as failed
fatal: [X.X.X.X]: FAILED! => {"failed": true, "msg": "ERROR! Timeout (10s) waiting for privilege escalation prompt: "}
#snip
PLAY RECAP *********************************************************************
X.X.X.X : ok=0 changed=0 unreachable=0 failed=1
OK, so it actually is a timeout waiting for the escalation prompt then. :-) I don't think we can do better here unless we wait for the ssh process to time out and die, and I don't think we want to do that.
While you are technically correct, the error message is pretty unhelpful in terms of trying to debug what's happening when things go wrong. You could set It would make more sense to me if the privilege escalation prompt timer started once SSH was connected. If not that, it should at least be set to a value greater than the SSH connect timeout. Thoughts?
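To make the first suggestion concrete, here is a minimal sketch of a prompt timer that is armed only once the connection is up. It is a toy model and not Ansible's code: the function `classify` and the event names `connected`, `prompt`, and `ssh_exit` are hypothetical, and a run is described as a time-ordered list of `(seconds_since_start, event_name)` pairs.

```python
def classify(events, escalation_timeout):
    """Classify a run from a time-ordered list of (seconds, event_name) pairs.

    Hypothetical event names: "connected", "prompt", "ssh_exit".
    The escalation timer is armed only after "connected" is seen.
    """
    connected_at = None
    for when, name in events:
        # If the timer is armed, it may have expired before this event arrived.
        if connected_at is not None and when - connected_at > escalation_timeout:
            return "escalation_timeout"
        if name == "connected":
            connected_at = when
        elif name == "prompt" and connected_at is not None:
            return "ok"
        elif name == "ssh_exit":
            # ssh died; if we never connected, blame reachability, not escalation.
            return "unreachable" if connected_at is None else "ssh_failed"
    # No more events: the timer can only fire if it was ever armed.
    return "escalation_timeout" if connected_at is not None else "unreachable"


# An unreachable host: ssh gives up after its connect timeout, never connecting.
print(classify([(10.0, "ssh_exit")], escalation_timeout=10))
# A reachable host whose prompt never shows up.
print(classify([(0.5, "connected")], escalation_timeout=10))
```

With the timer deferred like this, an unreachable host can never be reported as an escalation timeout, whatever the two timeout values are.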
Huh. Looks like the
@amenonsen I've submitted a PR which sets the escalation timeout to double that of the connection timeout. It's a bit of a hack, but it should work pretty well. Here's the behavior with my patch:
TASK [setup] *******************************************************************
# wait about 10 seconds
fatal: [X.X.X.X]: UNREACHABLE! => {"changed": false, "msg": "ERROR! SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
It was set to match the SSH connect timeout. Unfortunately, they would race when ssh fails to connect, and the connect timeout usually failed. This led to some misleading error messages. Fixes ansible#12916
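The race described in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions, not the code from the PR: `visible_error` is a hypothetical helper, and `ssh_exit_overhead` is an assumed small delay between ssh hitting its connect timeout and the parent process noticing that ssh has exited.

```python
def visible_error(connect_timeout, escalation_timeout, ssh_exit_overhead=0.01):
    """Return which error wins the race for a host that never answers.

    Both timers are assumed to start at the same moment. ssh gives up at
    connect_timeout and then needs ssh_exit_overhead seconds (an assumed
    value) to exit and be noticed by the parent process.
    """
    ssh_exits_at = connect_timeout + ssh_exit_overhead
    if escalation_timeout <= ssh_exits_at:
        # The prompt timer fires first, hiding the real cause.
        return "escalation_timeout"
    return "unreachable"


connect_timeout = 10
print(visible_error(connect_timeout, connect_timeout))      # equal timeouts
print(visible_error(connect_timeout, 2 * connect_timeout))  # doubled, as in the PR
```

In this model the first call reports `escalation_timeout` and the second reports `unreachable`, which matches the before and after behavior shown in the thread. Any escalation timeout comfortably larger than the connect timeout would do; doubling is simply the value the PR description mentions.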
@leedm777 any opinion on @amenonsen's PR above? To me, that does seem to be correct and a simpler fix.
@jimi-c His patch didn't work for me. My Python skills aren't strong enough to know why it didn't work, though.
Just a note for the record: both patches were needed (and have now been merged).
Version:
ansible-playbook 2.0.0 (devel 8f77dd1) last updated 2015/10/26 11:46:26 (GMT -400)
lib/ansible/modules/core: (detached HEAD 06f301b) last updated 2015/10/26 11:46:28 (GMT -400)
lib/ansible/modules/extras: (detached HEAD 405c3cb) last updated 2015/10/26 11:46:30 (GMT -400)
Issue:
When the server is not online, the error message is confusing.
What happens now:
Get the error message:
What should happen: