aws-ha-release - servers are considered healthy before they actually are #32

Open
anujbiyani opened this issue May 29, 2013 · 10 comments

@anujbiyani
Contributor

This is actually a bug with AWS but I wanted to open an issue here so that anyone using aws-ha-release.sh might see this.

It seems that when an instance is being spun up, the ELB health checks first fail, then pass (a false positive; they can't possibly pass because Passenger (in my case) is still starting up), then fail, and then pass (a true positive; web requests are actually being processed).

aws-ha-release sees the first pass, concludes the instance is healthy, and moves on before the instance actually is.

I'm checking to see if there's already a bug report with AWS; if not, I'll file one.

We could add some sort of tolerance to aws-ha-release where it requires an instance to be in service for some amount of time before moving forward.
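
Roughly what I have in mind, as Ruby-ish pseudocode (just a sketch; all_instances_inservice? stands in for whatever health-check call we end up using, and the 30/10 second values are arbitrary):

def wait_until_stable(min_inservice_time = 30, poll_interval = 10)
  time_inservice = 0
  while time_inservice < min_inservice_time
    sleep poll_interval
    if all_instances_inservice?
      time_inservice += poll_interval  # healthy poll: credit the elapsed interval
    else
      time_inservice = 0               # any blip resets the clock
    end
  end
end

That way a single flapping InService response wouldn't be enough to start terminating old instances.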

@dfevre
Contributor

dfevre commented Jun 2, 2013

I've noticed this too. elb-describe-instance-health returns the status of the EC2 instance at first instead of the ELB status for that instance.

@anujbiyani
Contributor Author

@dfevre It'd be great if you could add your comments to an Amazon forum post I created here: https://forums.aws.amazon.com/thread.jspa?threadID=125859&tstart=50

In the meantime I'm working on a fix that basically requires an instance to be InService for some amount of time before considering it healthy. I started working on it in bash but then found it too annoying to implement, so I'm rewriting the whole script in Ruby and making aws-missing-tools a gem.

Still in progress, but I'm close to done: https://github.com/Lytro/aws-missing-tools/tree/aws_ha_release_ruby

@dfevre
Contributor

dfevre commented Jun 3, 2013

Done. I'm running Windows instances, so I'm thinking of testing with PowerShell. I'd say this is a problem with the API, though, so it's probably not worth rewriting anything.

@colinbjohnson
Collaborator

@anujbiyani - just looked at https://github.com/Lytro/aws-missing-tools/tree/aws_ha_release_ruby - that is awesome.

@anujbiyani
Contributor Author

@dfevre Thanks for commenting; hopefully they'll give it more attention now. Normally I wouldn't write code to work around a bug in a third party's API, but in this case manually cycling servers is annoying my team, so I figured I'd work on a solution anyway in case either 1) the API problem is actually something I screwed up, or 2) Amazon doesn't fix it. Plus I've been wanting to rewrite some of this stuff in Ruby ever since I had to brush up on my Bash when I first worked on aws-ha-release.

@colinbjohnson Thanks! Hopefully I'll have it done within a day or two and have a pull request open.

@anujbiyani
Contributor Author

I addressed this issue in #34

@dfevre
Contributor

dfevre commented Jun 5, 2013

I just reproduced this problem with PowerShell. Definitely an API issue. I had a thought that might fix it: an ELB requires multiple health checks to pass before it considers an instance to be in service, and I think aws-ha-release should do the same. After seeing the instance as InService, it should poll again 10 seconds later and require 2 consecutive healthy responses before terminating the other instance. I might have a look at implementing this soon.

@anujbiyani
Contributor Author

@dfevre I've implemented pretty much what you suggested in a Ruby version of aws-ha-release in the pull request linked just above your post. Basically I require instances to be InService for some time period before terminating an old instance.

https://github.com/colinbjohnson/aws-missing-tools/pull/34/files#L3R25
-m, --min-inservice-time TIME - Minimum time an instance must be in service before it is considered healthy (seconds). Defaults to 30

https://github.com/colinbjohnson/aws-missing-tools/pull/34/files#L9R101

def all_instances_inservice_for_time_period?(load_balancers, change_in_time)
  if all_instances_inservice?(load_balancers)
    if @time_spent_inservice >= @opts[:min_inservice_time]
      return true
    else
      puts "\nAll instances have been InService for #{@time_spent_inservice} seconds."

      @time_spent_inservice += change_in_time
      return false
    end
  else
    @time_spent_inservice = 0
    return false
  end
end
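
With the defaults that means every instance has to stay InService for a full 30 seconds, and the counter resets to zero the moment any instance drops out, so the window is continuous rather than cumulative. Usage will look roughly like this (the exact executable name and flags other than -m/--min-inservice-time may still change, so check the PR for the final interface):

aws-ha-release.rb -a my-as-group --min-inservice-time 60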

@dfevre
Contributor

dfevre commented Jul 3, 2013

I just tried to reproduce this problem with the bash script numerous times on our test and production environments, and it seems that AWS has fixed the bug. The bash script now works well.

@anujbiyani
Contributor Author

I'm still able to reproduce the AWS bug D: @dfevre, have you changed any settings on your end that might have helped, or did things just randomly start working?
