add probes for load balancer health checks #3

frisi · 2017-12-20T18:25:48Z

see http://hvelarde.blogspot.co.at/2017/12/configuring-better-load-balancing-and.html for details how to set this up for haproxy

```
option tcp-check
tcp-check send health_db_connected\r\n
tcp-check expect string OK

default-server maxconn 4 inter 2s slowstart 1m

server instance1 127.0.0.1:8081 check port 8881
```
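To try the same exchange by hand, without haproxy, the sketch below opens a TCP connection to the monitor port, sends the probe name followed by `\r\n`, and looks for `OK` in the reply, much like the `tcp-check send` / `tcp-check expect string` pair above. The helper names `run_probe` and `is_healthy` are made up for this illustration and are not part of the package. It assumes the monitor server closes the connection after answering; if it does not, reading simply stops at the timeout.

```python
import socket


def run_probe(host, port, command, timeout=2.0):
    """Send one monitor command and return the decoded, stripped reply.

    Hypothetical helper for illustration; not part of collective.monitor.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Same payload as `tcp-check send <command>\r\n` in the haproxy config.
        sock.sendall(command.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                # No more data within the timeout: use what we have so far.
                break
            if not data:
                # The server closed the connection: the reply is complete.
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace").strip()


def is_healthy(reply):
    """Substring match, analogous to `tcp-check expect string OK`."""
    return "OK" in reply
```

With the instance from the config above running, `is_healthy(run_probe("127.0.0.1", 8881, "health_db_connected"))` would be the equivalent of one haproxy check cycle.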

hvelarde

LGTM, @frisi did you check that the "ok" probe returns "OK" even when the database is not connected?

frisi · 2017-12-20T21:20:07Z

> LGTM, @frisi did you check that the "ok" probe returns "OK" even when the database is not connected?

yes - it does.

when i stopped the zeo-server while the client was still running i got these errors in the instance.log now and then:

2017-12-20T02:05:45 WARNING ZEO.zrpc (11986) CW: error connecting to ('127.0.0.1', 8501): ECONNREFUSED

however, i could still access the App and do things like this with a pdb in the health_* probes

>>> from Zope2 import app as App
>>> app = App()
>>> app.plone.getObjectIds()
[...]

sometimes you'll get a ClientDisconnected error

>>> app.plone
2017-12-20 22:17:59 ERROR ZODB.Connection Couldn't load state for 0x02107a
Traceback (most recent call last):
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZODB/Connection.py", line 860, in setstate
    self._setstate(obj)
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZODB/Connection.py", line 901, in _setstate
    p, serial = self._storage.load(obj._p_oid, '')
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZEO/ClientStorage.py", line 833, in load
    data, tid = self._server.loadEx(oid)
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZEO/ClientStorage.py", line 88, in __getattr__
    raise ClientDisconnected()
ClientDisconnected

this might also have to do with the zodb-cache...

with a stopped zeo i could also ask for the database size, and got a value without any error

>>> app.Control_Panel.Database[dbname]._getDB()._storage.getSize()
201613344L

i dug through the source code and found the is_connected() method, which returned False when the zeo-server was stopped and True when it was running.
this might not work on all setups but looks good for a start.
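For illustration, the idea behind the probe can be sketched as follows. This is not the code from this PR: the calling convention (a dict mapping database names to storage objects) and the `FakeStorage` class are assumptions made so the example runs on its own. The only thing taken from the discussion is that the storage's `is_connected()` returns False while the zeo-server is down. Storages without that method are treated as connected here, which is also an assumption.

```python
def health_db_connected(storages):
    """Return 'OK' if every storage reports a live server connection.

    `storages` maps a database name to its storage object. This calling
    convention is hypothetical and only used for this sketch.
    """
    for name, storage in sorted(storages.items()):
        is_connected = getattr(storage, "is_connected", None)
        # Assumption: a storage without is_connected() has no server
        # connection that could be lost, so it counts as connected.
        if is_connected is not None and not is_connected():
            return "database %s is not connected" % name
    return "OK"


class FakeStorage:
    """Stand-in for a ZEO client storage, for demonstration only."""

    def __init__(self, connected):
        self._connected = connected

    def is_connected(self):
        return self._connected


print(health_db_connected({"main": FakeStorage(True)}))
print(health_db_connected({"main": FakeStorage(False)}))
```

Returning a plain message instead of raising keeps the reply easy to match from a load balancer: anything that does not contain `OK` counts as a failed check.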

here is the output of the probe:

# zeo and instance running:
$ bin/instance1 monitor health_db_connected
OK

# zeo stopped
$ bin/instance1 monitor health_db_connected
database main is not connected

hvelarde · 2017-12-20T22:45:40Z

awesome! anyway, you have to be careful: you could mark an instance as unusable when it can actually keep serving some content because of the ZODB cache.

I think I prefer to look for other errors (like an increase of 503 responses on the backend) instead.

frisi · 2017-12-21T11:22:21Z

@bsuttor are you ok with these changes, especially the naming/docstrings of the probes?

eventually i'll add haproxy demo settings to readme or docs/healthchecks.rst in another pr or commit docs directly to master

bsuttor · 2017-12-21T13:40:14Z

Thank you,
LGTM.

Indeed, an HAProxy config file example is a good idea to help people use this package for HAProxy health checks.

I will merge this PR; you can add the docs directly on the master branch.

bsuttor · 2017-12-21T13:44:24Z

@frisi
Can you send me your PyPI username, please?
I'm going to add you as a collective.monitor maintainer, so you can make a release.

frisi · 2017-12-21T14:31:48Z

thanks for your feedback @bsuttor!
my pypi user is frisi as well

bsuttor · 2017-12-21T14:36:17Z

You now have the rights to make a release ;-)

frisi requested review from bsuttor and hvelarde December 20, 2017 18:25

hvelarde approved these changes Dec 20, 2017

improve docstrings

a8d8690

bsuttor merged commit 9e7789f into master Dec 21, 2017

bsuttor deleted the healthcheck branch December 21, 2017 13:44
