Wrong error message when unable to connect via ssh #12916

Closed
evanccnyc opened this issue Oct 26, 2015 · 15 comments · Fixed by #13147
Labels
bug This issue/PR relates to a bug.

@evanccnyc
Contributor

Version:

ansible-playbook 2.0.0 (devel 8f77dd1) last updated 2015/10/26 11:46:26 (GMT -400)
lib/ansible/modules/core: (detached HEAD 06f301b) last updated 2015/10/26 11:46:28 (GMT -400)
lib/ansible/modules/extras: (detached HEAD 405c3cb) last updated 2015/10/26 11:46:30 (GMT -400)

Issue:

When the server is not online, the error message is confusing.

What happens now:

Get the error message:

fatal: [10.10.67.157]: FAILED! => {"failed": true, "msg": "ERROR! Timeout (10s) waiting for privilege escalation prompt: "}

What should happen:

fatal: [10.10.67.157]: FAILED! => {"failed": true, "msg": "ERROR! Unable to connect to server "}
@jimi-c jimi-c added this to the v2 milestone Oct 27, 2015
@jimi-c
Member

jimi-c commented Oct 27, 2015

Interesting, with both the adhoc and playbook commands, I get the following for a VM that's offline:

    "msg": "ERROR! SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue", 

Have you changed any settings in your ansible.cfg which might affect the SSH connection params?

@evanccnyc
Contributor Author

These are my ansible.cfg settings:

[defaults]
host_key_checking = False
jinja2_extensions = jinja2.ext.do
hash_behaviour=merge
[ssh_connection]
pipelining = True
control_path = /tmp/ansible-ssh-%%h-%%p-%%r

@amenonsen
Contributor

@evanccnyc I haven't been able to reproduce this problem either. I ran an ansible command with -sKU to enable sudo, but the failure came on the first connection, where it just creates the ansible directory without sudo. I suspect the fix is along the lines of the patch at https://gist.github.com/amenonsen/a8131bf563524f694186

Could you please try it and see?

@evanccnyc
Contributor Author

I recently did a clean reinstall (rm -Rf followed by git clone) and the problem seems to have gone away. I'll close this and reopen it if I see it again.

@leedm777
Contributor

leedm777 commented Nov 5, 2015

@amenonsen I've seen this error with the beta (v2.0.0-0.4.beta2) and the latest (c64ac90). I get the error even with the patch.

Can this issue be reopened, or should I open a new issue?

My ansible.cfg is:

[defaults]
retry_files_save_path = $HOME/.ansible-retries
inventory = ./inventory/
vault_password_file = $HOME/.ansible_vault
remote_user = ubuntu
forks = 25
host_key_checking = False
gathering = smart

[privilege_escalation]
become = True

[ssh_connection]
control_path = %(directory)s/%%h-%%r
pipelining = True

@evanccnyc evanccnyc reopened this Nov 5, 2015
@amenonsen
Contributor

@leedm777 Could you please send me the output of a problematic run with -vvvv and ANSIBLE_DEBUG=1? I'd like to understand what's happening. Also, please explain what "unable to connect" means in your specific case. If possible, run ssh -vvv (or better still, the exact command that ansible generates, which you'll see in the -vvv output) against the host and send me that output too.

@leedm777
Contributor

leedm777 commented Nov 9, 2015

I'm able to duplicate the issue by adding a bad route to the host, so that it's unreachable.

$ sudo route add -host X.X.X.X -iface lo0
$ ssh -vvv X.X.X.X
OpenSSH_6.9p1, LibreSSL 2.1.7
debug1: Reading configuration data /Users/dlee/.ssh/config
debug1: /Users/dlee/.ssh/config line 1: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 20: Applying options for *
debug1: /etc/ssh/ssh_config line 102: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to X.X.X.X [X.X.X.X] port 22.
# wait about 75 seconds
debug1: connect to address X.X.X.X port 22: Operation timed out
ssh: connect to host X.X.X.X port 22: Operation timed out

Full playbook output is on pastebin

$ ansible-playbook --limit X.X.X.X -vvvv -- deploy.yml
# snip
    14 1447106272.11609: executing the command /bin/sh -c 'sudo -H -n -S -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-snhljbvnujrniptfyjnawvhxroahegzd; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python'"'"'' through the connection
<X.X.X.X> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<X.X.X.X> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=60s)
<X.X.X.X> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
<X.X.X.X> SSH: ansible_password/ansible_ssh_pass not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
<X.X.X.X> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=ubuntu)
<X.X.X.X> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=10)
<X.X.X.X> SSH: PlayContext set ssh_common_args: ()
<X.X.X.X> SSH: PlayContext set ssh_extra_args: ()
<X.X.X.X> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/root/.ansible/cp/%h-%r)
<X.X.X.X> SSH: EXEC ssh -C -vvv -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/%h-%r X.X.X.X /bin/sh -c 'sudo -H -n -S -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-snhljbvnujrniptfyjnawvhxroahegzd; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python'"'"''
    14 1447106272.12518: Initial state: awaiting_escalation: BECOME-SUCCESS-snhljbvnujrniptfyjnawvhxroahegzd
    14 1447106272.12924: stderr chunk (state=1):
>>>OpenSSH_6.7p1 Debian-5, OpenSSL 1.0.1k 8 Jan 2015
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
<<<
    14 1447106272.12966: stderr chunk (state=1):
>>>debug1: auto-mux: Trying existing master
debug1: Control socket "/root/.ansible/cp/X.X.X.X-ubuntu" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to X.X.X.X [X.X.X.X] port 22.
debug2: fd 3 setting O_NONBLOCK
<<<
# about a 10 second delay here
    14 1447106282.14050: done running TaskExecutor() for X.X.X.X/TASK: setup 
    14 1447106282.14081: sending task result
    14 1447106282.14197: done sending task result
    39 1447106282.15155: worker 1 has data to read
    39 1447106282.15443: got a result from worker 1: <ansible.executor.task_result.TaskResult object at 0x7fd3d110aa50>
    39 1447106282.15452: sending result: [u'host_task_failed', u'<ansible.executor.task_result.TaskResult object at 0x7fd3d110aa50>']
    39 1447106282.15563: done sending result
     1 1447106282.16237: got result from result worker: [u'host_task_failed', u'<ansible.executor.task_result.TaskResult object at 0x7fd3d1021e90>']
     1 1447106282.16248: marking X.X.X.X as failed
fatal: [X.X.X.X]: FAILED! => {"failed": true, "msg": "ERROR! Timeout (10s) waiting for privilege escalation prompt: "}

#snip
PLAY RECAP *********************************************************************
X.X.X.X              : ok=0    changed=0    unreachable=0    failed=1   

@amenonsen
Contributor

OK, so it actually is a timeout waiting for the escalation prompt then. :-) I don't think we can do better here unless we wait for the ssh process to time out and die, and I don't think we want to do that.

@leedm777
Contributor

While you are technically correct, the error message is pretty unhelpful in terms of trying to debug what's happening when things go wrong.

You could set -o ConnectTimeout=X to reduce the SSH connection timeout.

It would make more sense to me if the privilege escalation prompt timer started once SSH was connected. If not that, it should at least be set to a value greater than the SSH connect timeout.

Thoughts?

@leedm777
Contributor

Huh. Looks like the ConnectTimeout is already set to 10. I guess it's racing with the select timeout, and the select timeout is normally winning.
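The race described here can be reproduced outside Ansible. The following is a minimal sketch, not Ansible's actual code: run_fake_ssh is a hypothetical helper, and the child process is a stand-in for ssh that stays silent for connect_timeout seconds and then exits with status 255 (the status ssh uses for connection errors). The parent waits on the child's output with select, as a caller waiting for an escalation prompt might. It assumes a POSIX system, where select works on pipes.

```python
import select
import subprocess
import sys


def run_fake_ssh(connect_timeout, prompt_timeout):
    """Report which failure the caller observes first (illustrative only)."""
    # Stand-in for ssh against an unreachable host: no output, then exit 255.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys, time; time.sleep(float(sys.argv[1])); sys.exit(255)",
         str(connect_timeout)],
        stdout=subprocess.PIPE,
    )
    try:
        # Wait for output (a "prompt") for at most prompt_timeout seconds.
        readable, _, _ = select.select([child.stdout], [], [], prompt_timeout)
        if not readable:
            # select gave up before the child did.
            return "timeout waiting for privilege escalation prompt"
        # Readable with no data means EOF: the child has exited.
        if child.stdout.read() == b"" and child.wait() == 255:
            return "unreachable"
        return "got output"
    finally:
        child.kill()
        child.wait()
        child.stdout.close()


if __name__ == "__main__":
    # Equal timers race; a larger prompt timer lets the connect failure surface.
    print(run_fake_ssh(0.3, 0.3))
    print(run_fake_ssh(0.3, 3.0))
```

With equal values the caller's timer starts slightly before the child's, so the select timeout tends to expire first, which matches the misleading message reported in this issue.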

@leedm777
Contributor

@amenonsen I've submitted a PR which sets the escalation timeout to double that of the connection timeout. It's a bit of a hack, but it should work pretty well. Here's the behavior with my patch:

TASK [setup] *******************************************************************
# wait about 10 seconds
fatal: [X.X.X.X]: UNREACHABLE! => {"changed": false, "msg": "ERROR! SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
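Why doubling helps can be shown with a little arithmetic. This is a sketch under stated assumptions rather than the patch itself: first_failure and the ssh_startup delay are hypothetical, and the model simply assumes the caller's prompt timer starts at t=0 while ssh arms its ConnectTimeout a moment later.

```python
def first_failure(connect_timeout, prompt_timeout, ssh_startup=0.05):
    """Return which timer expires first for an unreachable host.

    ssh_startup is a hypothetical delay (seconds) before ssh starts its
    own ConnectTimeout countdown; it is not a measured value.
    """
    ssh_gives_up_at = ssh_startup + connect_timeout
    if prompt_timeout <= ssh_gives_up_at:
        # The caller stops waiting before ssh can report the failure.
        return "Timeout waiting for privilege escalation prompt"
    # ssh exits first, so the connection error can be reported as such.
    return "UNREACHABLE"


timeout = 10
print(first_failure(timeout, timeout))      # equal timers
print(first_failure(timeout, 2 * timeout))  # prompt timer doubled
```

Any prompt timeout comfortably larger than the connect timeout would behave the same way in this model; doubling is just one simple choice.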

leedm777 added a commit to leedm777/ansible that referenced this issue Nov 13, 2015
It was set to match the SSH connect timeout. Unfortunately, they would
race when ssh fails to connect, and the connect timeout usually lost.
This led to some misleading error messages.

Fixes ansible#12916
@jimi-c
Member

jimi-c commented Nov 13, 2015

@leedm777 any opinion on @amenonsen's PR above? To me, that does seem to be correct and a simpler fix.

@leedm777
Contributor

@jimi-c His patch didn't work for me. My Python skills aren't strong enough to know why it didn't work, though.

@amenonsen
Contributor

Just a note for the record: both patches were needed (and have now been merged).

@ajarv

ajarv commented May 31, 2016

@ansible ansible locked and limited conversation to collaborators Apr 25, 2019