knife ec2 server create attempting to connect to SSH too early, intermittently failing to bootstrap new nodes #659

Closed
tekbot opened this issue Sep 28, 2020 · 2 comments
Labels
Status: Untriaged (An issue that has yet to be triaged.) · Type: Bug (Does not work as expected.)

Comments

@tekbot

tekbot commented Sep 28, 2020

Version:

aws-sdk-ec2 (1.195.0)
knife-ec2 (2.0.4)

Environment:

Ubuntu 18.04
Chef Infra Client: 16.5.64

Scenario:

When creating an EC2 instance with knife ec2 server create, about 50% of the time the bootstrap process fails to connect over SSH even though the readiness check succeeds. The failure output is pasted below. If I try to SSH into the host immediately after the failure, it succeeds. An option to wait briefly before bootstrapping (e.g. --bootstrap-delay) might solve this. It's also possible that the readiness check itself has a bug and returns true even though the SSH daemon isn't ready to accept connections yet. Another possible solution would be a configurable number of connection retries before the bootstrap is attempted.

SSH Target Address: (public_dns_name)
done

SSH Target Address: (public_dns_name)
Connecting to 172.16.10.172 using ssh
WARN: [SSH] connection failed, terminating (#<Errno::ECONNREFUSED: Connection refused - connect(2) for 172.16.10.172:22>)
ERROR: Train::Transports::SSHFailed: SSH session could not be established
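Until a delay or retry option exists, one workaround for the retry idea above is to wait in the calling script until sshd on the target actually answers before (re)running the bootstrap. This is a minimal sketch, not part of knife-ec2: `wait_for_ssh` and `probe_ssh_banner` are hypothetical names made up for illustration, and the script assumes bash (for its `/dev/tcp` redirection) and GNU coreutils `timeout`, both available on Ubuntu 18.04. It treats the host as ready only once the port returns an `SSH-` identification banner.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of knife-ec2): wait until a host's SSH daemon
# answers with its identification banner before attempting a bootstrap.
# Assumes bash (for /dev/tcp) and GNU coreutils `timeout`.

# probe_ssh_banner HOST PORT
# Succeeds if a TCP connection to HOST:PORT yields a line starting with "SSH-"
# within 5 seconds; fails on refused connections, timeouts, or other banners.
probe_ssh_banner() {
  local host="$1" port="$2"
  # $0 and $1 inside the inner script are the host and port passed after it.
  timeout 5 bash -c '
    exec 3<>"/dev/tcp/$0/$1" || exit 1
    IFS= read -r -t 5 banner <&3 || exit 1
    [[ $banner == SSH-* ]]
  ' "$host" "$port" 2>/dev/null
}

# wait_for_ssh HOST [PORT] [ATTEMPTS] [DELAY_SECONDS]
# Retries probe_ssh_banner up to ATTEMPTS times, sleeping DELAY_SECONDS between
# attempts. Returns 0 as soon as sshd answers, 1 if it never does.
wait_for_ssh() {
  local host="$1" port="${2:-22}" attempts="${3:-30}" delay="${4:-5}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    if probe_ssh_banner "$host" "$port"; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_for_ssh 172.16.10.172 22 30 5` makes up to 30 attempts, 5 seconds apart, and a zero exit status means it should be safe to retry the bootstrap by hand. Reading the banner is a slightly stronger check than a bare TCP connect, since it only passes once something on the port is actually speaking the SSH protocol.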

Steps to Reproduce:

The arguments I pass are below. The shell variables are set from input given to the script at runtime. Note that the instance itself is always created as expected; the failure always occurs once the instance is online and the bootstrap process is about to begin. The instance type I've been standing up most frequently is m5.large, if that helps!

knife ec2 server create --image $AMI -f $INSTSIZE -g $SECGROUP --subnet $SUBNET -Z $ZONE --region $REGION -S $AWSKEY -N $HNAME --tags Name=$HNAME,Rack=$NODEENV -U ubuntu --sudo -i $SSHKEY -E $NODEENV -r $ROLES $EXTRAARGS -a private_ip_address

Expected Result:

The instance is created and the bootstrap process runs.

Actual Result:

About half the time, the initial SSH connection fails and I have to either stand up a new instance or bootstrap the new node by hand, which is tedious.

@tekbot tekbot added the Status: Untriaged and Type: Bug labels Sep 28, 2020
@kapilchouhan99 kapilchouhan99 self-assigned this Oct 28, 2020
@kapilchouhan99

Hi @tekbot, I have verified that it's working fine on the latest version of knife-ec2.
Could you please check with the latest version of knife-ec2?

@dheerajd-msys
Contributor

Closing this issue for now based on the comment above. Please feel free to reopen it if you face the same issue with the latest knife-ec2.


3 participants