Ssh retry #2359
Conversation
I am a bit concerned that the "while" and the "else" clause seem to have different exception handling in this, but I understand what you are trying to work around. That being said, the idea of inserting retries when we don't need them just makes talking to down hosts slower. @lwade -- any thoughts as to why this may happen and whether just using the ec2 module would avoid it?
I wanted to offload the user suggestions to the 'else:' branch and allow it to behave just like the original code if the user didn't want to retry the connection. I think the issue could be observed wherever the cloud-init package is deployed, not just in EC2. But I only work with EC2 at the moment...
This isn't limited to EC2, but it is exaggerated by it, I guess; the same situation could occur when talking to cobbler to build some systems and then configuring them afterwards. This relates directly to the "running doesn't mean running" stuff we discussed before. Perhaps just use wait_for and monitor the SSH port of the instance (although this would need to be parallel)? Or use a generic wait period between the EC2 launch and the configuration tasks. The wait_for module seems best; it's not very optimal, but once the task has completely finished (i.e. iterated over all systems), at least you know the instances are all ready. I would suggest using a slim image and replacing cloud-init with Ansible tasks entirely to mitigate against this ;) :D Ref: cloud-init example: https://help.ubuntu.com/community/CloudInit
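The wait_for approach suggested above could be sketched roughly like this (a hedged sketch only: the hosts group name and the timing values are assumptions, not from this discussion):

```yaml
# Sketch: poll each new instance's SSH port before running any
# configuration tasks. "launched_instances", delay and timeout
# are illustrative placeholders.
- hosts: launched_instances
  gather_facts: no
  tasks:
    - name: wait for sshd to accept connections
      local_action: wait_for host={{ inventory_hostname }} port=22 delay=10 timeout=300
```

As noted below, this only proves the port is open, not that authentication will succeed yet.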
100% agree
Well, wait_for just asserts that sshd is accepting connections, but the configuration might not be there yet. For example, the public key might not yet be in place, and one would get a few authentication errors before being able to log in. So the user might need to retry the connection. Maybe not by default, though...
That would be great, but it's hardly an option if one is obliged to run AMIs that are "provided as is", e.g. RHEL AMIs ;)
On 11 Mar 2013 16:33, "milan" notifications@github.com wrote: …
Absolutely, but if this is the case I think it would be an issue with cloud-init. I'm not very familiar with cloud-init; I'd need to look into exactly what …
:( I guess this is an account restriction? Build your own RHEL AMI? I'm trying to read @mpdehaan's mind a bit here, but I think what he is …
I understand mpdehaan's concern that detecting dead nodes will take longer this way. But still, given the parameters of the connect call, paramiko already does some retries implicitly. If only there were a switch to set a retry policy / connection settings based on the user's wishes... I wish somebody else could find a reasonable use case for the retry, too ;)
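The configurable retry policy wished for above might look something like this sketch (the function name, defaults, and parameters are made up for illustration; this is not part of Ansible or paramiko):

```python
import time

def connect_with_retry(connect, retries=3, delay=5.0, retry_on=(Exception,)):
    """Call connect() up to `retries` times, sleeping `delay` seconds
    between failed attempts. Which exceptions trigger a retry is the
    caller's policy -- e.g. only authentication errors, so that a dead
    host still fails fast on connection refused."""
    last_error = None
    for attempt in range(retries):
        try:
            return connect()
        except retry_on as err:
            last_error = err
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error
```

Passing a narrow `retry_on` tuple (say, only an authentication-failure exception) would address the concern that retries make talking to genuinely down hosts slower.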
It seems the wait_for module should just take a post delay in this case, or use of the 'sleep' module would be appropriate. If you are using cobbler, you'd just set up your key in the kickstart; there is no "SSH port up but no key" issue, which seems to be unique to EC2 key injection, and I'd consider it a buglet :)
Well, I'd go with the buglet if only connect didn't already do an implicit retry in paramiko in the current implementation in the first place ;) If that is for user convenience, why not extend that convenience to an explicit retry ;) Moreover, the retry logic could prove useful in any deployment that relies on dynamic user configuration and has just booted with sshd running (wait_for on the host's port 22 gives one a green light)... Other options for me to implement, in the order of my preference:
What would you prefer/recommend? Thanks!
It does not do a retry at all right now; I'm confused as to where your discussion of the implicit retry is coming from. There already is wait_for, which can do what you want in the SSH case; is there a reason wait_for does not work for you? You can also pause arbitrarily as it stands, should you want to, and there are delays that can be used as part of wait_for. Hopefully one of those works? I'm closing this one for now given that the above exist, but let me know if you have other thoughts.
Hi,
In case the SSH port is already reachable --- e.g. via the wait_for construct --- but the user's public key is not yet in place, it makes sense to give the SSH connection another chance. This is the case especially in EC2 when creating a new instance: a booted instance undergoes some cloud-init tuning even though the sshd service is already running. Therefore, I'd like to propose the following patch.
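The proposed behaviour can be sketched roughly like this (a simplified stand-in, not the actual patch: `ssh_connect`, the exception type, and the defaults are placeholders). It also illustrates the "while"/"else" split mentioned in the review above: the else branch preserves the original single-attempt behaviour.

```python
import time

def establish_connection(ssh_connect, max_retries=0, delay=2.0):
    """Sketch of the proposed retry. With max_retries=0 this behaves
    like the original code (one attempt, errors propagate); otherwise
    failed attempts are retried while cloud-init finishes deploying
    the user's public key."""
    attempts = 0
    while attempts < max_retries:
        try:
            return ssh_connect()
        except IOError:  # stand-in for paramiko's AuthenticationException
            attempts += 1
            time.sleep(delay)
    else:
        # No retries requested (or all retries exhausted): behave like
        # the original code and let any error propagate to the caller.
        return ssh_connect()
```

Note the different exception handling in the two branches, as pointed out in the review: the loop swallows and retries the failure, while the else branch lets it propagate.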
Thanks!