Ssh retry #2359

Closed

wants to merge 2 commits into from

Conversation

dparalen
Contributor

Hi,
In case the ssh port is already reachable --- e.g. via the wait_for construct --- but the user's public key is not yet in place, it makes sense to give the ssh connection another chance. This is especially the case in EC2 while creating a new instance: a freshly booted instance still undergoes some cloud-init tuning even though the sshd service is already running. Therefore, I'd like to propose the following patch.

Thanks!
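For illustration, a minimal sketch of the idea, assuming paramiko's SSHClient API; the helper name, retry count and delay below are placeholders rather than the actual patch:

```python
import time
import paramiko

def connect_with_retry(host, user, keyfile, retries=5, delay=5):
    """Retry the SSH connection while authentication is still being rejected."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    for attempt in range(retries):
        try:
            client.connect(host, username=user, key_filename=keyfile, timeout=10)
            return client
        except paramiko.AuthenticationException:
            # sshd is already up, but cloud-init may not have installed the key yet
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```

Only authentication failures are retried in this sketch, so a host that is genuinely down would still fail fast on the connection attempt itself.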

@mpdehaan
Contributor

I am a bit concerned that the "while" and the "else" clause seem to have different exception handling in this, but I understand what you are trying to work around.

That all being said, the idea of inserting retries when we don't need them just makes talking to down hosts slower.

@lwade -- any thoughts as to why this may happen and if just using the ec2 module would avoid it?

@dparalen
Contributor Author

I am a bit concerned that the "while" and the "else" clause seem to have different exception handling in this, but understand what you are trying to work around.

I wanted to offload the user suggestions to the 'else:' branch and somehow allow it to behave just like the original code if the user didn't want to retry the connection.

I think the issue could be observed wherever the cloud-init package is deployed, not just in ec2. But I do work with ec2 only atm...

@lwade
Contributor

lwade commented Mar 11, 2013

This isn't limited to EC2 but exaggerated by it, I guess; the same situation could occur if talking to cobbler to build some systems and then configuring them afterwards?

This relates directly to the "Running doesn't mean running" stuff we discussed before. Perhaps just use a wait_for and monitor the SSH port of the instance (although this would need to be parallel)? Or use a generic wait period between ec2 launch and the configuration tasks.

The wait_for module seems best; it's not optimal, but once the task has completely finished (i.e. iterated over all systems), at least you know the instances are all ready.

I would suggest using a slim image and replacing cloud-init with ansible tasks entirely to mitigate against this ;) :D

Ref: cloud init example: https://help.ubuntu.com/community/CloudInit

@dparalen
Contributor Author

This isn't limited to EC2 but exaggerated by it I guess, the same situation could occur if talking to cobbler to build some systems and then configure them afterwards?

100% agree

This relates directly to the "Running doesn't mean running" stuff we discussed before. Perhaps just use a wait_for and monitor the SSH port of the instance (although this would need to be parallel)? Or use a generic wait period between ec2 launch and the configuration tasks.

wait_for module seems best, it's not very optimal but once the task has completely finished (i.e. iterated over all systems), then at least you know the instances are all ready.

Well, wait_for just asserts that sshd is accepting connections, but the configuration might not be there yet. For example, the public key might not yet be in place and one would get a few authentication errors before being able to log in. So the user might want to retry the connection. Maybe not by default, though... (A small illustration of this gap is at the end of this comment.)

I would suggest using a slim image and replacing cloud-init with ansible tasks entirely to mitigate against this ;) :D

Would be great, but it's hardly an option if one is obliged to run AMIs that are "provided as is", e.g. RHEL AMIs ;)
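To make that gap concrete (a hypothetical standalone check, not Ansible code): the TCP-level probe that wait_for performs can succeed while key-based authentication is still rejected, because cloud-init has not yet written the authorized key. The address, username and key path below are placeholders.

```python
import os
import socket
import paramiko

host = "203.0.113.10"  # placeholder instance address

# The port-level check that wait_for effectively performs: sshd accepts TCP connections.
socket.create_connection((host, 22), timeout=10).close()

# An actual login attempt may still be rejected at this point.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
    client.connect(host, username="ec2-user",
                   key_filename=os.path.expanduser("~/.ssh/id_rsa"), timeout=10)
    print("key is in place, login works")
except paramiko.AuthenticationException:
    print("port 22 is open, but the public key is not installed yet")
finally:
    client.close()
```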

@lwade
Contributor

lwade commented Mar 11, 2013

On 11 Mar 2013 16:33, "milan" notifications@github.com wrote:

This isn't limited to EC2 but exaggerated by it I guess, the same situation could occur if talking to cobbler to build some systems and then configure them afterwards?

100% agree

This relates directly to the "Running doesn't mean running" stuff we discussed before. Perhaps just use a wait_for and monitor the SSH port of the instance (although this would need to be parallel)? Or use a generic wait period between ec2 launch and the configuration tasks.

wait_for module seems best, it's not very optimal but once the task has completely finished (i.e. iterated over all systems), then at least you know the instances are all ready.

well, the wait_for just asserts sshd is accepting connections. But the configuration might not be there yet. For example, the public key might not yet be in place and one would get a few authentication errors before being able to log in. So the user might require to retry the connection. Maybe not by default, though...

Absolutely, but if this is the case I think it would be an issue with cloud-init. The last thing cloud-init should be doing is curl'ing for the public key and then starting sshd. The daemon should be stopped by default and only started when the system is ready for user interaction.

I'm not very familiar with cloud-init; I'd need to look into exactly what it does in this regard.

I would suggest using a slim image and replacing cloud-init with ansible tasks entirely to mitigate against this ;) :D

would be great, but is hardly the choice if one is obliged to run amis that are "provided as is" e.g. RHEL amis ;)

:( I guess this is an account restriction? Build your own RHEL ami?

I'm trying to read @mpdehaan's mind a bit here, but I think what he is suggesting is that your patch could be useful; but then, if we are having to wait on a system like this, are we not looking in the wrong place to fix it?



@dparalen
Contributor Author

I understand mpdehaan's concern that detecting dead nodes will take longer this way. But still, given the parameters of the connect call, paramiko does some retries already implicitly. If only there were a switch to set a retry policy/connection settings based upon the user's wishes... I wish somebody else could find a reasonable use case for the retry, too ;)

@mpdehaan
Contributor

It seems the wait_for module should just take a post delay in this case, or use of the 'sleep' module would be appropriate.

If you are using cobbler, you'd just set up your key in the kickstart, so there is no "ssh port but no key" issue; this seems to be unique to EC2 key injection and I'd consider it a buglet :)

@dparalen
Contributor Author

Well, I'd go with the buglet if only connect didn't already do an implicit retry in paramiko in the current implementation in the first place ;) If that is there for user convenience, why not extend that convenience to an explicit retry ;) Moreover, the retry logic could prove useful in any deployment option that relies on dynamic user configuration and has just booted with sshd running (wait_for on the host's port 22 gives one a green light)...

Other options for me to implement, in order of my preference:

  • a merciful paramiko connection module
  • a wait_ssh module (a rough sketch of the idea is at the end of this comment)
  • a sleep module

What would you prefer/recommend?

Thanks!
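A rough sketch of the wait_ssh idea mentioned above (a hypothetical helper, not an existing module): keep attempting a full login until a deadline, treating authentication failures and connection errors as "not ready yet".

```python
import time
import paramiko

def wait_for_ssh_auth(host, user, keyfile, timeout=300, poll=10):
    """Poll until a real SSH login succeeds, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            client.connect(host, username=user, key_filename=keyfile, timeout=10)
            client.close()
            return True   # a real login works, so the instance is actually ready
        except (paramiko.AuthenticationException, paramiko.SSHException, OSError):
            time.sleep(poll)  # key not installed yet, or sshd still settling
    return False
```

Unlike a plain port check, this only reports success once the key is actually usable.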

@mpdehaan
Contributor

It does not do a retry at all right now; I'm confused as to where your discussion of the implicit retry is coming from.

There is already 'wait_for', which can do what you want in the SSH case; is there a reason wait_for does not work for you? You can also pause arbitrarily as it stands, should you want to, and there are also delays that can be used as part of wait_for.

Hopefully one of those work?

I'm closing this one for now given the above exist but let me know if you have other thoughts.

@mpdehaan mpdehaan closed this Mar 17, 2013
robinro pushed a commit to robinro/ansible that referenced this pull request Dec 9, 2016
@ansible ansible locked and limited conversation to collaborators Apr 24, 2019