
Vagrant up infinitely hangs due to retrying after permanent ssh errors #273

Closed
gsauthof opened this issue Jan 28, 2018 · 5 comments

@gsauthof

How to reproduce - case A:

  1. set override.ssh.private_key_path to an ed25519 ssh key
  2. make sure that your net-ssh rubygem throws on ed25519 keys (e.g. use the vagrant 1.9.1/vagrant-digitalocean 0.9.3 packaged with Fedora 26)
  3. vagrant up --provider=digital_ocean

Actual result: the command hangs after it prints ==> dropletname: Assigned IP address: xxx.xxx.xx.xxx

Expected result: Command exits with exit status != 0 after printing a clear error message regarding ssh key type incompatibility. Ideally, the plugin would check whether net-ssh supports ed25519 before uploading the ed25519 key to DO (under the Vagrant key name) and proceed with the upload if and only if it does.
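A rough Ruby sketch of the suggested pre-upload check follows. It only inspects the key type, i.e. the first field of the OpenSSH public key file (`private_key_path + '.pub'`). Whether the installed net-ssh can actually load ed25519 keys is passed in as a flag, since detecting that reliably depends on the net-ssh version. `preflight_key_check`, `public_key_type` and `KeyTypeUnsupported` are hypothetical names, not part of vagrant-digitalocean.

```ruby
# Hypothetical error raised before anything is uploaded to DigitalOcean.
class KeyTypeUnsupported < StandardError; end

# Key type ("ssh-rsa", "ssh-ed25519", ...) from the first
# whitespace-separated field of an OpenSSH public key line.
def public_key_type(pub_line)
  pub_line.strip.split(/\s+/, 2).first
end

# Sketch of a pre-upload check: refuse ed25519 keys when the SSH library
# cannot handle them. `ed25519_supported` is an assumption supplied by
# the caller; this sketch does not detect net-ssh support itself.
def preflight_key_check(private_key_path, ed25519_supported:)
  pub_path = File.expand_path(private_key_path) + '.pub'
  type = public_key_type(File.read(pub_path))
  if type == 'ssh-ed25519' && !ed25519_supported
    raise KeyTypeUnsupported,
          "#{pub_path} is an ed25519 key, but the installed net-ssh " \
          'cannot load ed25519 keys; use an RSA key or upgrade net-ssh'
  end
  type
end
```

Failing here, before the key is uploaded under the Vagrant key name, avoids leaving an unusable key in the DO settings.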

How to reproduce - case B:

  1. configure an ssh key in the DO settings with name 'Vagrant'
  2. don't change the default provider.ssh_key_name in the Vagrantfile
  3. make sure that override.ssh.private_key_path points to yet another ssh key (i.e. one with a different fingerprint)
  4. vagrant up --provider=digital_ocean
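For reference, the two settings involved in both reproductions could look like this in a Vagrantfile. This is a minimal sketch, not a complete configuration: the box, image, region and size settings that vagrant-digitalocean also needs are omitted, the key path is only an example, and reading the API token from a `DIGITALOCEAN_TOKEN` environment variable is an assumption.

```ruby
Vagrant.configure('2') do |config|
  config.vm.define 'dropletname' do |droplet|
    droplet.vm.provider :digital_ocean do |provider, override|
      # Case A: an ed25519 key that the bundled net-ssh cannot load.
      # Case B: a key whose fingerprint differs from the one already
      # stored under the name 'Vagrant' in the DigitalOcean settings.
      override.ssh.private_key_path = '~/.ssh/id_ed25519'
      provider.token = ENV['DIGITALOCEAN_TOKEN'] # assumed variable name
      # Left at its default ('Vagrant') in case B:
      # provider.ssh_key_name = 'Vagrant'
    end
  end
end
```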

Actual result: the command hangs after it prints ==> dropletname: Assigned IP address: xxx.xxx.xx.xxx

Expected result: Command exits with exit status != 0 after printing an error message indicating that public key authentication has failed. Ideally, the plugin would check if the ssh key name slot is already occupied in the DO settings. If it is, it would compare the key fingerprints and error out if the already uploaded public key doesn't match the one specified in the Vagrantfile (i.e. in override.ssh.private_key_path + '.pub').
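The fingerprint comparison could be sketched as below. It assumes the provider reports the stored key's fingerprint in the classic colon-separated MD5 form (the format DigitalOcean shows for SSH keys) and computes the same value locally from the base64 key blob in the `.pub` line. Fetching the remote fingerprint is left out; `md5_fingerprint`, `verify_uploaded_key!` and `KeyMismatch` are hypothetical names.

```ruby
require 'digest/md5'

# Hypothetical error for a key-name slot that holds a different key.
class KeyMismatch < StandardError; end

# Colon-separated MD5 fingerprint ("aa:bb:...") of an OpenSSH public key
# line, computed over the base64-decoded key blob (its second field).
def md5_fingerprint(pub_line)
  blob = pub_line.strip.split(/\s+/)[1].unpack1('m')
  Digest::MD5.hexdigest(blob).scan(/../).join(':')
end

# Returns the local fingerprint if it matches the one already stored
# under `key_name`; raises KeyMismatch otherwise, so the caller can
# error out before waiting for an SSH login that cannot succeed.
def verify_uploaded_key!(remote_fingerprint, local_pub_line, key_name: 'Vagrant')
  local = md5_fingerprint(local_pub_line)
  return local if remote_fingerprint.downcase == local

  raise KeyMismatch,
        "SSH key '#{key_name}' has fingerprint #{remote_fingerprint}, " \
        "but the local public key has #{local}; public key authentication would fail"
end
```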

Additional information:

Turning on debug mode (VAGRANT_LOG=debug vagrant up ...) reveals the root causes:

D, [2018-01-28T12:17:27.168873 #20828] DEBUG -- net.ssh.authentication.session[abcdef]: trying publickey

DEBUG ssh: == Net-SSH connection debug-level log END ==
 INFO ssh: SSH not up: #<Vagrant::Errors::SSHKeyTypeNotSupported:
The private key you're 
attempting to use with this Vagrant box uses
an unsupported encryption type. The SSH library Vagrant uses does not support
this key type. Please use `ssh-rsa` or `ssh-dss` instead. Note that
sometimes keys in your ssh-agent can interfere with this as well,
so verify the keys are valid there in addition to standard
file paths.>
 INFO retryable: Retryable exception raised: #<RuntimeError: not ready>

And for case B:

E, [2018-01-28T12:38:19.554251 #21088] ERROR -- net.ssh.authentication.session[abcdef]: all authorization methods failed (tried none, publickey)

DEBUG ssh: == Net-SSH connection debug-level log END ==
 INFO ssh: SSH not up: #<Vagrant::Errors::SSHAuthenticationFailed:
    SSH authentication failed!
This is typically caused by the public/private
keypair for the SSH user not being properly set on the guest VM. Please
verify that the guest VM is setup with the proper public key, and that
the private key path for Vagrant is setup properly as well.>
 INFO retryable: Retryable exception raised: #<RuntimeError: not ready>

Sure, it would be great if those errors weren't classified as 'Retryable exception' and if those messages had the proper severity, such that they are always displayed without having to increase the verbosity to DEBUG.
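One way to get that behavior is to split the exceptions seen by the wait loop into permanent and transient ones. The Ruby sketch below retries only transient failures (such as the generic "not ready" in the log) and re-raises permanent ones immediately. The log shows Vagrant raising `Vagrant::Errors::SSHKeyTypeNotSupported` and `Vagrant::Errors::SSHAuthenticationFailed`; the sketch defines local stand-ins so it runs without Vagrant, and `wait_for_ssh` is a hypothetical helper, not Vagrant's actual wait loop.

```ruby
# Local stand-ins for Vagrant's permanent SSH errors.
class SSHKeyTypeNotSupported < StandardError; end
class SSHAuthenticationFailed < StandardError; end

PERMANENT_ERRORS = [SSHKeyTypeNotSupported, SSHAuthenticationFailed].freeze

# Calls the block until it returns a truthy value. Transient failures
# (including a falsy result, turned into "not ready") are retried up to
# `tries` attempts with `sleep_s` seconds in between. Permanent errors
# are re-raised at once so the caller can report them and exit non-zero.
def wait_for_ssh(tries: 30, sleep_s: 2)
  attempt = 0
  begin
    attempt += 1
    yield || raise('not ready')
  rescue *PERMANENT_ERRORS
    raise # listed first: permanent errors must not reach the retry branch
  rescue StandardError
    raise if attempt >= tries
    sleep(sleep_s)
    retry
  end
end
```

The order of the `rescue` clauses matters: Ruby uses the first matching clause, and the permanent error classes are themselves `StandardError` subclasses.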

There seem to be several bug reports about hung commands and unclear ssh failures related to the above root causes, often collecting several variations of the original issue. Thus, fixing the above should reduce the number of such reports in the future. Examples:

@JonasVerhofste

It seems the underlying problem lies with ssh-agent, on my machine anyway. Maybe most users reporting the issues mentioned above don't have it running (you can check this with pgrep ssh-agent, which should print a process ID).

To start it, run eval $(ssh-agent) and then add your key with ssh-add ~/.ssh/id_rsa (adjusting the path, of course).

You can automate this process at login in various ways; a couple are explained on the Arch Wiki, for example.

Can someone else with this issue confirm that starting ssh-agent correctly solves (some of) these problems?

@gsauthof
Author

@JonasVerhofste - having problems with the ssh-agent is perhaps just another example of a permanent ssh error condition where vagrant up hangs because it retries indefinitely instead of displaying the error details and exiting with a non-zero exit status.

In the 2 cases I described in my original report, ssh-agent didn't play any role at all. In fact, the involved ssh private keys weren't password protected.

At that time, I worked around A) by using a non-ed25519 key and I worked around B) by manually deleting the vagrant key name slot in the Digital Ocean Web GUI.

@JonasVerhofste

JonasVerhofste commented Nov 10, 2018

Quoting @gsauthof: "In fact, the involved ssh private keys weren't password protected."

Mine aren't either. Even better: my key was always RSA, and the problem above still occurred. I noticed something was off when I tried to manually ssh into the server with ssh -i ~/.ssh/id_rsa 192.168.0.1 (with a correct IP address, of course), and my terminal said the following:

The authenticity of host '192.168.0.1 (192.168.0.1)' can't be established.
ECDSA key fingerprint is SHA256:somehash.

This notice is pretty normal the first time you authenticate, but it shouldn't say "ECDSA" for an RSA key, I reckon. Starting ssh-agent and adding my (unencrypted) key solved it. I also think that running multiple instances of ssh-agent can be part of the problem, as that was a problem a friend of mine had. Killing all instances and properly starting one solved the problem for him as well.

@gsauthof
Author

gsauthof commented Feb 3, 2019

@JonasVerhofste The ECDSA message you are seeing is about the host key (of the server) and not about the RSA key you are using to authenticate the client. Usually, servers have multiple host keys (e.g. RSA, ECDSA, ...) and basically use the one the client asks for (ECDSA being the default for both recent and not-so-recent OpenSSH versions).

You get that message if the fingerprint of that host key isn't in your known_hosts file yet. The ssh client's default is then to ask whether to add it, but only if it runs interactively. Thus, it would be plausible for it to fail repeatedly with that message when the client runs non-interactively.
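To see whether a host already has an entry before connecting non-interactively, the known_hosts file can be searched. The Ruby sketch below handles only plain entries: hashed entries (`|1|...`), wildcard patterns and `@cert-authority`/`@revoked` markers are out of scope, and with OpenSSH, `ssh-keygen -F <host>` is the more complete way to do this lookup. `known_host?` is a hypothetical helper.

```ruby
# True if `host` is listed in the comma-separated host field of a plain
# (non-hashed) known_hosts entry. "[host]:port" entries are matched
# literally; comments and blank lines are skipped.
def known_host?(known_hosts_text, host)
  known_hosts_text.each_line.any? do |line|
    line = line.strip
    next false if line.empty? || line.start_with?('#')

    hosts = line.split(/\s+/, 2).first
    hosts.split(',').include?(host)
  end
end
```

For example, `known_host?(File.read(File.expand_path('~/.ssh/known_hosts')), '192.168.0.1')` returns false for a freshly created droplet, which is exactly the situation where an interactive ssh would prompt.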

@seth-reeser
Member

Hello, please provide an update to either your resolution or if this continues to be a problem and we'll do our best to help in a timely manner.
