Fix haproxy updater #560

Merged
merged 11 commits into master from fix-haproxy-updater on Apr 30, 2015

3 participants

@hansode
Member
hansode commented Apr 29, 2015

Problem

Sometimes the haproxy_updator upstart system job fails because it starts before wakame-init has initialized networking.

Possible Solution

  1. change respawn limitation to unlimited
  2. add a proper sleep time before executing new process
  3. build new lb machine image

I've already gotten good results in #559.
So if this PR is merged, please close #559.
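
Items 1 and 2 of the proposed solution could look roughly like the sketch below. It is not the job file from this PR, which is not shown in this thread: `start_after_delay`, `STARTUP_DELAY`, the 3-second default, and the `haproxy_updater` logger tag are all assumed names and values, and the exact `respawn` stanzas depend on the upstart version in the image.

```shell
#!/bin/sh
# Hypothetical body of an upstart `script` stanza; the real job file is
# not shown in this thread. The job would also carry a `respawn` stanza.

STARTUP_DELAY="${STARTUP_DELAY:-3}"    # assumed delay in seconds
LOG_TAG="${LOG_TAG:-haproxy_updater}"  # assumed logger tag

start_after_delay() {
  # Log through syslog; tolerate hosts where logger is unavailable.
  logger -t "$LOG_TAG" "waiting ${STARTUP_DELAY}s before start" 2>/dev/null || true
  # The pause bounds how often a failing job can respawn.
  sleep "$STARTUP_DELAY"
  # exec replaces this shell so upstart supervises the command itself.
  exec "$@"
}
```

A job would call it with the real updater command line as arguments. Because the sleep runs on every start, even an unlimited respawn can restart the job at most once per `STARTUP_DELAY` seconds.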

hansode added some commits Apr 24, 2015
@hansode hansode change respawn limitation to unlimited. 985c675
@hansode hansode remove unnecessary exit. dc5172e
@hansode hansode improbe complex if-statement. 767ea97
@hansode hansode remove unnecessary eval. 7584759
@hansode hansode make sure to use "exec". 55ef33c
@hansode hansode use logger instead of stderr. 9b8e6cc
@hansode hansode add logger tag. bce17bc
@hansode hansode remove unnecessary echo. fa67cda
@hansode hansode add a proper sleep. 49c1512
@hansode hansode remove unnecessary directory test. 663e771
@hansode hansode update comments. 504a1ed
@hansode hansode self-assigned this Apr 29, 2015
@hansode hansode added the Type : Bug label Apr 29, 2015
@axsh-bot
Member

504a1ed success - wakame-ci/rspec

@axsh-bot
Member

504a1ed success - wakame-ci/rpmbuild

@axsh-bot
Member

504a1ed success - wakame-ci/to-s3

@axsh-bot
Member

504a1ed failure - wakame-ci/dummy.smoke

@axsh-bot
Member

504a1ed failure - wakame-ci/kvm.smoke

@axsh-bot
Member

504a1ed failure - wakame-ci/vz.smoke

@axsh-bot
Member

504a1ed failure - wakame-ci/kvm.smoke.allowed-failure

@axsh-bot
Member

504a1ed failure - wakame-ci/lxc.smoke.allowed-failure

@axsh-bot
Member

504a1ed success - wakame-ci/rspec

@axsh-bot
Member

504a1ed success - wakame-ci/rpmbuild

@axsh-bot
Member

504a1ed success - wakame-ci/to-s3

@axsh-bot
Member

504a1ed failure - wakame-ci/lxc.smoke.allowed-failure

@axsh-bot
Member

504a1ed success - wakame-ci/dummy.smoke

@axsh-bot
Member

504a1ed success - wakame-ci/kvm.smoke.allowed-failure

@axsh-bot
Member

504a1ed success - wakame-ci/kvm.smoke

@axsh-bot
Member

504a1ed success - wakame-ci/vz.smoke

@Metallion
Member

First of all, thank you so much for fixing this! 😺

But I feel like respawn unlimited is a little too aggressive in this case. How about calling initctl start haproxy_updater at the end of the wakame-init script instead?

@hansode
Member
hansode commented Apr 30, 2015

How about calling initctl start haproxy_updater at the end of the wakame-init script instead?

How do you call it?

@hansode
Member
hansode commented Apr 30, 2015

But I feel like respawn unlimited is a little too aggressive in this case.

Btw. Why did you use unlimited respawn in stud?

@Metallion
Member

How do you call it?

This solution will only work if wakame-init is called again on reboot. Can you confirm if that is the case?

Btw. Why did you use unlimited respawn in stud?

The problem was that STUD forks worker processes and then sleeps forever. When calling initctl restart stud, the main process would restart before the old child processes had died. It's easier if I explain step by step.

STUD first start

  1. STUD upstart job starts on boot
    1.1) STUD main process forks worker child processes (1 per CPU core)
    1.2.1) STUD children start listening on TCP port xxxx.
    1.2.2) STUD main process sleeps forever

STUD restart

  2. User makes PUT call to load balancer WebAPI
    2.1) Wakame-vdc sends AMQP call with new settings to load balancer
    2.2) The updater script in load balancer updates config files and calls initctl restart stud
    2.3) Upstart terminates the main STUD process (but children are still alive)
    2.4) Upstart starts new STUD main process (the old children are still alive)
    2.5) New STUD process tries to listen on TCP port xxxx (the old children are still alive)
    2.6) The new STUD process fails because TCP port xxxx is still in use by the old children

If STUD trapped the SIGTERM signal and killed off its children before exiting, the respawn unlimited would not have been necessary.
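
The pattern described in that last sentence can be sketched with a small shell stand-in. STUD itself is not a shell script, so this only illustrates the idea: a parent traps SIGTERM, terminates and reaps its workers, and only then exits, so nothing is left holding the port when the next instance starts. `run_parent`, its arguments, and the `sleep` workers are hypothetical.

```shell
#!/bin/sh
# Stand-in for a forking daemon: a parent that starts workers and
# forwards SIGTERM to them. Names and arguments are hypothetical.

run_parent() {
  worker_count="$1"   # number of workers to fork
  pid_file="$2"       # where to record the worker PIDs
  pids=""

  # On SIGTERM: stop every worker, reap them, then exit.
  trap 'kill $pids 2>/dev/null || true; wait; exit 0' TERM

  i=0
  while [ "$i" -lt "$worker_count" ]; do
    sleep 300 &       # placeholder for a worker holding a TCP port
    pids="$pids $!"
    i=$((i + 1))
  done
  echo "$pids" > "$pid_file"

  wait                # the parent idles until it is signalled
}
```

Running `run_parent 2 /tmp/workers.pids &` and then sending SIGTERM to that background process should leave none of the recorded worker PIDs alive. Without the trap, the workers would outlive the parent, which is the situation in steps 2.3 to 2.6.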

@hansode
Member
hansode commented Apr 30, 2015

Thanks for the stud background.

@hansode
Member
hansode commented Apr 30, 2015

I think that respawn parameter should not be changed in this case/issue.
The purpose of this change is to add a sleep.

@Metallion
Member

After discussing in person, I believe the unlimited respawn is OK for the following reasons:

  • My idea of adding a start to wakame-init doesn't work because the load balancer is using the standard instance's wakame-init. I wrongly assumed that it uses a specialized version.
  • This is a specialized upstart job that will only be used for this Wakame-vdc load balancer appliance. There is no need to account for a scenario where HAProxy isn't running.
  • The sleep that is in place right now will prevent high CPU load from fast respawns.

Merging.

@Metallion Metallion merged commit aa9acd5 into master Apr 30, 2015

7 of 8 checks passed

wakame-ci/lxc.smoke.allowed-failure The build was failure on wakame-ci #19238 (504a1ed3).
wakame-ci/dummy.smoke The build was success on wakame-ci #19239 (504a1ed3).
wakame-ci/kvm.smoke The build was success on wakame-ci #19244 (504a1ed3).
wakame-ci/kvm.smoke.allowed-failure The build was success on wakame-ci #19240 (504a1ed3).
wakame-ci/rpmbuild The build was success on wakame-ci #19230 (504a1ed3).
wakame-ci/rspec The build was success on wakame-ci #19228 (504a1ed3).
wakame-ci/to-s3 The build was success on wakame-ci #19232 (504a1ed3).
wakame-ci/vz.smoke The build was success on wakame-ci #19245 (504a1ed3).
@Metallion Metallion deleted the fix-haproxy-updater branch Apr 30, 2015