aws-ha-release - servers are considered healthy before they actually are #32
Comments
I've noticed this too. elb-describe-instance-health returns the status of the EC2 instance at first instead of the ELB status for that instance.
@dfevre It'd be great if you could add your comments to an Amazon forum post I created here: https://forums.aws.amazon.com/thread.jspa?threadID=125859&tstart=50 In the meantime I'm working on a fix that basically requires an instance to be InService for some amount of time before considering it healthy. I started working on it in bash but then found it too annoying to implement, so I'm rewriting the whole script in Ruby and making aws-missing-tools a gem. Still in progress, but I'm close to done: https://github.com/Lytro/aws-missing-tools/tree/aws_ha_release_ruby
Done. I'm running Windows instances, so I'm thinking of testing with PowerShell. I'd say this is a problem with the API though, so it's probably not worth rewriting anything.
@anujbiyani - just looked at https://github.com/Lytro/aws-missing-tools/tree/aws_ha_release_ruby - that is awesome.
@dfevre Thanks for commenting; hopefully they'll give it more attention now. Normally I wouldn't write code to fix a bug in a third party's API, but in this case manually cycling servers is annoying my team, so I figured I'd work on a solution anyway in case either 1) the API problem is actually something I screwed up, or 2) Amazon doesn't fix it. Plus I've been wanting to rewrite some of this stuff in Ruby ever since I had to brush up on my Bash when I first worked on aws-ha-release. @colinbjohnson Thanks! Hopefully I'll have it done within a day or two and have a pull request open.
I addressed this issue in #34 |
I just reproduced this problem with PowerShell. Definitely an API issue. I had a thought that might fix it: an ELB requires multiple health checks to pass before it considers an instance to be in service, and I think aws-ha-release should do the same. After seeing the instance as InService, it should poll again 10 seconds later and require two consecutive healthy responses before terminating the other instance. I might have a look at implementing this soon.
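A minimal Ruby sketch of that "two consecutive healthy polls" idea; the method name and the `check_health` callable are illustrative only, not part of aws-ha-release:

```ruby
# Block until `required` consecutive polls report InService.
# check_health is a stand-in for whatever queries the ELB
# (e.g. elb-describe-instance-health); a single failed poll resets the streak.
def wait_until_stably_in_service(check_health, required: 2, interval: 10)
  consecutive = 0
  until consecutive >= required
    if check_health.call == 'InService'
      consecutive += 1
    else
      consecutive = 0 # one bad poll means we start counting again
    end
    sleep interval unless consecutive >= required
  end
end
```

With `required: 2` and a 10-second interval, the false positive described in this issue (pass, then fail, then pass again) would no longer count as healthy on its own.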
@dfevre I've implemented pretty much what you suggested in a Ruby version of aws-ha-release in the pull request linked just above your post. Basically I require instances to be InService for some time period before terminating an old instance.

https://github.com/colinbjohnson/aws-missing-tools/pull/34/files#L3R25
https://github.com/colinbjohnson/aws-missing-tools/pull/34/files#L9R101

```ruby
def all_instances_inservice_for_time_period?(load_balancers, change_in_time)
  if all_instances_inservice?(load_balancers)
    if @time_spent_inservice >= @opts[:min_inservice_time]
      return true
    else
      puts "\nAll instances have been InService for #{@time_spent_inservice} seconds."
      @time_spent_inservice += change_in_time
      return false
    end
  else
    # any unhealthy poll resets the in-service clock
    @time_spent_inservice = 0
    return false
  end
end
```
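The grace-period logic can be exercised outside AWS with a small stub harness; `FakeReleaser` and the scripted health responses below are hypothetical, for illustration only:

```ruby
# Self-contained harness for the grace-period check: the ELB health query is
# stubbed with scripted true/false responses so the reset/accumulate behavior
# can be observed locally. Not part of the actual pull request.
class FakeReleaser
  def initialize(min_inservice_time, scripted_health)
    @opts = { min_inservice_time: min_inservice_time }
    @time_spent_inservice = 0
    @scripted_health = scripted_health # e.g. [true, false, true, ...]
  end

  # Stub standing in for the real ELB health query.
  def all_instances_inservice?(_load_balancers)
    @scripted_health.shift
  end

  def all_instances_inservice_for_time_period?(load_balancers, change_in_time)
    if all_instances_inservice?(load_balancers)
      if @time_spent_inservice >= @opts[:min_inservice_time]
        true
      else
        @time_spent_inservice += change_in_time
        false
      end
    else
      @time_spent_inservice = 0 # unhealthy poll resets the clock
      false
    end
  end
end
```

With a 20-second minimum and 10-second polls, one unhealthy result in the middle forces the clock to restart, so the flapping health check described in this issue no longer causes an old instance to be terminated early.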
I just tried to reproduce this problem with the bash script numerous times on our test and production environments, and it seems that AWS has fixed the bug. The bash script now works well.
I'm still able to reproduce the AWS bug D: . @dfevre have you changed any settings on your end that might have helped, or did things just randomly start working? |
This is actually a bug with AWS, but I wanted to open an issue here so that anyone using aws-ha-release.sh might see this.
It seems that when an instance is being spun up, the ELB health checks first fail, then pass (a false positive; they can't possibly pass because Passenger (in my case) is still starting up), then fail, and then pass (a true positive; the web requests are being processed).
aws-ha-release sees the first pass and thinks the instance is healthy and so moves on before the instance is actually healthy.
I'm checking to see if there's already a bug post with AWS; if not, I'll make one.
We could add some sort of tolerance to aws-ha-release where it requires an instance to be in service for some amount of time before moving forward.