Ssh retry #2359

Closed

wants to merge 2 commits into from

Conversation

dparalen
Contributor

Hi,
In case the ssh port is already reachable --- e.g. via the wait_for construct --- but the user's public key is not yet in place, it makes sense to give the ssh connection another chance. This is especially the case in EC2 while creating a new instance: a freshly booted instance still undergoes some cloud-init tuning even though the sshd service is already running. Therefore, I'd like to propose the following patch.

Thanks!
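For illustration, a minimal sketch of the idea, assuming paramiko's SSHClient API; the helper name, retry count and delay below are placeholders rather than the actual patch:

```python
import time
import paramiko

def connect_with_retry(host, user, keyfile, retries=5, delay=5):
    """Retry the SSH connection while authentication is still being rejected."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    for attempt in range(retries):
        try:
            client.connect(host, username=user, key_filename=keyfile, timeout=10)
            return client
        except paramiko.AuthenticationException:
            # sshd is already up, but cloud-init may not have installed the key yet
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```

Only authentication failures are retried in this sketch, so a host that is genuinely down would still fail fast on the connection attempt itself.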

@mpdehaan
Contributor

I am a bit concerned that the "while" and the "else" clause seem to have different exception handling in this, but I understand what you are trying to work around.

That all being said, the idea of inserting retries when we don't need them just makes talking to down hosts slower.

@lwade -- any thoughts as to why this may happen and if just using the ec2 module would avoid it?

@dparalen
Contributor Author

I am a bit concerned that the "while" and the "else" clause seem to have different exception handling in this, but understand what you are trying to work around.

I wanted to offload the user suggestions to the 'else:' branch and somehow allow it to behave just like the original code if the user didn't want to retry the connection.

I think the issue could be observed wherever the cloud-init package is deployed, not just in ec2. But I do work with ec2 only atm...

@lwade
Contributor

lwade commented Mar 11, 2013

This isn't limited to EC2 but exaggerated by it, I guess; the same situation could occur if talking to cobbler to build some systems and then configuring them afterwards?

This relates directly to the "Running doesn't mean running" stuff we discussed before. Perhaps just use a wait_for and monitor the SSH port of the instance (although this would need to be parallel)? Or use a generic wait period between ec2 launch and the configuration tasks.

The wait_for module seems best; it's not optimal, but once the task has completely finished (i.e. iterated over all systems), at least you know the instances are all ready.

I would suggest using a slim image and replacing cloud-init with ansible tasks entirely to mitigate against this ;) :D

Ref: cloud init example: https://help.ubuntu.com/community/CloudInit

@dparalen
Contributor Author

This isn't limited to EC2 but exaggerated by it I guess, the same situation could occur if talking to cobbler to build some systems and then configure them afterwards?

100% agree

This relates directly to the "Running doesn't mean running" stuff we discussed before. Perhaps just use a wait_for and monitor the SSH port of the instance (although this would need to be parallel)? Or use a generic wait period between ec2 launch and the configuration tasks.

wait_for module seems best, it's not very optimal but once the task has completely finished (i.e. iterated over all systems), then at least you know the instances are all ready.

Well, wait_for just asserts that sshd is accepting connections, but the configuration might not be there yet. For example, the public key might not yet be in place and one would get a few authentication errors before being able to log in. So the user might want to retry the connection. Maybe not by default, though... (A small illustration of this gap is at the end of this comment.)

I would suggest using a slim image and replacing cloud-init with ansible tasks entirely to mitigate against this ;) :D

Would be great, but it's hardly an option if one is obliged to run AMIs that are "provided as is", e.g. RHEL AMIs ;)
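To make that gap concrete (a hypothetical standalone check, not Ansible code): the TCP-level probe that wait_for performs can succeed while key-based authentication is still rejected, because cloud-init has not yet written the authorized key. The address, username and key path below are placeholders.

```python
import os
import socket
import paramiko

host = "203.0.113.10"  # placeholder instance address

# The port-level check that wait_for effectively performs: sshd accepts TCP connections.
socket.create_connection((host, 22), timeout=10).close()

# An actual login attempt may still be rejected at this point.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
    client.connect(host, username="ec2-user",
                   key_filename=os.path.expanduser("~/.ssh/id_rsa"), timeout=10)
    print("key is in place, login works")
except paramiko.AuthenticationException:
    print("port 22 is open, but the public key is not installed yet")
finally:
    client.close()
```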

@lwade
Contributor

lwade commented Mar 11, 2013

On 11 Mar 2013 16:33, "milan" notifications@github.com wrote:

This isn't limited to EC2 but exaggerated by it I guess, the same situation could occur if talking to cobbler to build some systems and then configure them afterwards?

100% agree

This relates directly to the "Running doesn't mean running" stuff we discussed before. Perhaps just use a wait_for and monitor the SSH port of the instance (although this would need to be parallel)? Or use a generic wait period between ec2 launch and the configuration tasks.

wait_for module seems best, it's not very optimal but once the task has completely finished (i.e. iterated over all systems), then at least you know the instances are all ready.

well, the wait_for just asserts sshd is accepting connections. But the configuration might not be there yet. For example, the public key might not yet be in place and one would get a few authentication errors before being able to log in. So the user might require to retry the connection. Maybe not by default, though...

Absolutely, but if this is the case I think it would be an issue with cloud-init. The last thing cloud-init should be doing is curl'ing for the public key and then starting sshd. The daemon should be stopped by default and only started when the system is ready for user interaction.

I'm not very familiar with cloud-init; I'd need to look into exactly what it does in this regard.

I would suggest using a slim image and replacing cloud-init with ansible tasks entirely to mitigate against this ;) :D

would be great, but is hardly the choice if one is obliged to run amis that are "provided as is" e.g. RHEL amis ;)

:( I guess this is an account restriction? Build your own RHEL ami?

I'm trying to read @mpdehaan's mind a bit here, but I think what he is suggesting is that your patch could be useful; but then, if we are having to wait on a system like this, are we not looking in the wrong place to fix it?



@dparalen
Contributor Author

I understand mpdehaan's concern that detecting dead nodes will take longer this way. But still, given the parameters of the connect call, paramiko does some retries already implicitly. If only there were a switch to set a retry policy/connection settings based upon the user's wishes... I wish somebody else could find a reasonable use case for the retry, too ;)

@mpdehaan
Contributor

It seems the wait_for module should just take a post delay in this case, or use of the 'sleep' module would be appropriate.

If you are using cobbler, you'd just set up your key in the kickstart, so there is no "ssh port but no key" issue; this seems to be unique to EC2 key injection and I'd consider it a buglet :)

@dparalen
Contributor Author

Well, I'd go with the buglet if only connect didn't already do an implicit retry in paramiko in the current implementation in the first place ;) If that is there for user convenience, why not extend that convenience to an explicit retry ;) Moreover, the retry logic could prove useful in any deployment option that relies on dynamic user configuration and has just booted with sshd running (wait_for on the host's port 22 gives one a green light)...

Other options for me to implement, in order of my preference:

  • a merciful paramiko connection module
  • a wait_ssh module (a rough sketch of the idea is at the end of this comment)
  • a sleep module

What would you prefer/recommend?

Thanks!
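A rough sketch of the wait_ssh idea mentioned above (a hypothetical helper, not an existing module): keep attempting a full login until a deadline, treating authentication failures and connection errors as "not ready yet".

```python
import time
import paramiko

def wait_for_ssh_auth(host, user, keyfile, timeout=300, poll=10):
    """Poll until a real SSH login succeeds, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            client.connect(host, username=user, key_filename=keyfile, timeout=10)
            client.close()
            return True   # a real login works, so the instance is actually ready
        except (paramiko.AuthenticationException, paramiko.SSHException, OSError):
            time.sleep(poll)  # key not installed yet, or sshd still settling
    return False
```

Unlike a plain port check, this only reports success once the key is actually usable.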

@mpdehaan
Contributor

It does not do a retry at all right now; I'm confused as to where your discussion of the implicit retry is coming from.

There is already 'wait_for', which can do what you want in the SSH case; is there a reason wait_for does not work for you? You can also pause arbitrarily as it stands, should you want to, and there are also delays that can be used as part of wait_for.

Hopefully one of those work?

I'm closing this one for now given the above exist but let me know if you have other thoughts.

@mpdehaan mpdehaan closed this Mar 17, 2013
robinro pushed a commit to robinro/ansible that referenced this pull request Dec 9, 2016
@ansible ansible locked and limited conversation to collaborators Apr 24, 2019