Description
problem
Sometimes Keepalived_healthcheckers (keepalived's healthchecker process) stops working without dying or writing anything to the logs.
example
# journalctl -u keepalived.service
…
Jan 02 10:47:14 redacted Keepalived_healthcheckers[3911]: HTTP status code error to server [10.36.6.35]:80.
Jan 03 18:38:23 redacted Keepalived[3910]: Keepalived_healthcheckers exited due to signal 9
Jan 03 18:38:23 redacted Keepalived[3910]: Healthcheck child process(3911) died: Respawning
Jan 03 18:38:23 redacted Keepalived[3910]: Starting Healthcheck child process, pid=14114
Jan 03 18:38:23 redacted Keepalived_healthcheckers[14114]: Initializing ipvs
…
Jan 03 18:38:23 redacted Keepalived_healthcheckers[14114]: Activating healthchecker for service [10.36.6.35]:80
Here, between Jan 02 10:47:14 and Jan 03 18:38:23, server [10.36.6.35]:80 "came back to life", but Keepalived_healthcheckers was stuck/frozen and did not notice it.
At Jan 03 18:38:23 I ran kill -9 $(cat /run/checkers.pid); the healthchecker respawned and everything went back to normal ([10.36.6.35]:80 came back into the backend).
note
- This happens on multiple versions of keepalived (e.g. v2.0.10).
- I don't know whether it happens on recent keepalived versions.
- I don't know how or why the healthchecker freezes.
- the freeze is completely silent in the logs
- These freezes are rare, so a kill -9 … is OK as a workaround, but they are too frequent and too impactful for us to "sweep it under the carpet".
solution/feature I would like
As the freeze is completely silent, I wonder whether there is some signal, socket API, or anything else that would allow me to check the health of the healthcheckers.
My goal is to create a monitoring probe to check the liveness of the healthcheckers.
I looked at the code (current master) but didn't find anything.
- Either I missed something and there is already a way to check the healthcheckers → in that case, can you point me to it? (and maybe add a paragraph to the docs)
- Or there is nothing… and I think it would be a good feature to add.
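Absent a built-in mechanism, one possible stopgap probe is a purely external heuristic: read the PID from /run/checkers.pid, sample /proc/<pid>/stat twice, and treat the process as suspect if it is stopped/zombie or has consumed no CPU time between the two samples. This is only a sketch under assumptions, not a keepalived feature: the function names and the 60-second window are hypothetical, and a healthy but very idle checker could be flagged by mistake, so the window should be well above the configured check interval.

```python
import os
import time


def parse_stat(stat_text):
    """Return (state, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so split on the LAST ')' before tokenizing the remaining fields.
    rest = stat_text.rsplit(")", 1)[1].split()
    state = rest[0]                            # field 3: process state
    cpu_ticks = int(rest[11]) + int(rest[12])  # fields 14 (utime) + 15 (stime)
    return state, cpu_ticks


def sample(pid_file="/run/checkers.pid", proc_root="/proc"):
    """Return (pid, state, cpu_ticks); raises OSError if the process is gone."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        state, cpu_ticks = parse_stat(f.read())
    return pid, state, cpu_ticks


def looks_frozen(before, after):
    """Heuristic only: compare two samples taken some time apart."""
    if after[0] != before[0]:
        return False  # PID changed: the checker was respawned in between
    if after[1] in ("T", "t", "Z", "X"):
        return True   # stopped, traced, zombie or dead
    return after[2] == before[2]  # no CPU time consumed during the window


if __name__ == "__main__":
    first = sample()
    time.sleep(60)  # assumed window; tune to the real check interval
    second = sample()
    print("SUSPECT" if looks_frozen(first, second) else "OK")
```

A monitoring system could run this periodically and alert on SUSPECT, leaving the decision to kill -9 the process to an operator. It cannot tell which checks are stuck, which is why a native liveness interface would still be preferable.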
I think this feature would benefit not only me, but also others.
Thank you in advance