add probes for load balancer health checks #3

Merged: 2 commits merged on Dec 21, 2017

frisi (Member) commented Dec 20, 2017

See http://hvelarde.blogspot.co.at/2017/12/configuring-better-load-balancing-and.html for details on how to set this up for HAProxy:

```
option tcp-check
tcp-check send health_db_connected\r\n
tcp-check expect string OK

default-server maxconn 4 inter 2s slowstart 1m

server instance1 127.0.0.1:8081 check port 8881
```
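To make it clearer what HAProxy does with this config on every check interval (`inter 2s`), here is a rough Python imitation of the `tcp-check send`/`tcp-check expect` pair, assuming the monitor server listens on port 8881 as in the config. The function name `tcp_check` and its parameters are made up for illustration; this is a sketch for poking at an instance by hand, not how HAProxy itself implements the check.

```python
import socket


def tcp_check(host: str, port: int, command: str = "health_db_connected",
              expected: bytes = b"OK", timeout: float = 2.0) -> bool:
    """Hypothetical client-side imitation of the HAProxy tcp-check.

    Sends `command` followed by CRLF (like `tcp-check send ...\r\n`) and
    returns True once `expected` shows up in the reply (like
    `tcp-check expect string OK`). Refused connections, timeouts and
    replies that never contain `expected` all count as a failed check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(command.encode("ascii") + b"\r\n")
            reply = b""
            # Keep reading until the expected string appears or the server closes.
            while expected not in reply:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                reply += chunk
    except OSError:  # covers ConnectionRefusedError and socket.timeout
        return False
    return expected in reply


if __name__ == "__main__":
    # 8881 is the monitor port used in the HAProxy config above (assumption:
    # the instance's monitor server is running locally on that port).
    print(tcp_check("127.0.0.1", 8881))
```

Because the function stops reading as soon as `OK` appears, a reply that merely contains `OK` somewhere also passes, which mirrors the substring nature of `expect string`.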

hvelarde (Member) left a comment

LGTM, @frisi did you check that the "ok" probe returns "OK" even when the database is not connected?

frisi (Member, Author) commented Dec 20, 2017

> LGTM, @frisi did you check that the "ok" probe returns "OK" even when the database is not connected?

Yes, it does.

When I stopped the ZEO server while the client was still running, I got errors like this in instance.log now and then:

```
2017-12-20T02:05:45 WARNING ZEO.zrpc (11986) CW: error connecting to ('127.0.0.1', 8501): ECONNREFUSED
```

However, I could still access the app and do things like this from a pdb session inside the health_* probes:

```
>>> from Zope2 import app as App
>>> app = App()
>>> app.plone.getObjectIds()
[...]
```

Sometimes you'll get a ClientDisconnected error:

```
>>> app.plone
2017-12-20 22:17:59 ERROR ZODB.Connection Couldn't load state for 0x02107a
Traceback (most recent call last):
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZODB/Connection.py", line 860, in setstate
    self._setstate(obj)
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZODB/Connection.py", line 901, in _setstate
    p, serial = self._storage.load(obj._p_oid, '')
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZEO/ClientStorage.py", line 833, in load
    data, tid = self._server.loadEx(oid)
  File "/home/frisi/.buildout/eggs/ZODB3-3.10.7-py2.7-linux-x86_64.egg/ZEO/ClientStorage.py", line 88, in __getattr__
    raise ClientDisconnected()
ClientDisconnected
```

This might also have to do with the ZODB cache.

With ZEO stopped, I could also ask for the database size and got a value without any error:

```
>>> app.Control_Panel.Database[dbname]._getDB()._storage.getSize()
201613344L
```

I dug through the source code and found the is_connected() method, which returned False when the ZEO server was stopped and True when it was running.
This might not work on all setups, but it looks good for a start.

Here is the output of the probe:

```
# ZEO and instance running:
$ bin/instance1 monitor health_db_connected
OK

# ZEO stopped:
$ bin/instance1 monitor health_db_connected
database main is not connected
```
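For illustration, the logic behind those two outputs could boil down to something like the following. This is a hypothetical sketch, not the code merged in this PR: `FakeStorage` and the `storages` mapping are assumptions standing in for the real storages (reachable, per the session above, via `app.Control_Panel.Database[dbname]._getDB()._storage`), and storages that have no `is_connected()` method are simply treated as connected.

```python
import io


class FakeStorage:
    """Hypothetical stand-in for a ZEO ClientStorage; only is_connected() matters."""

    def __init__(self, connected: bool) -> None:
        self._connected = connected

    def is_connected(self) -> bool:
        return self._connected


def health_db_connected(connection, storages) -> None:
    """Hypothetical probe body: write OK only if every storage is connected.

    `connection` is any object with a write() method (e.g. the monitor
    session); `storages` maps database names to storage objects. Storages
    without is_connected() (e.g. non-ZEO setups) are treated as connected.
    """
    for name in sorted(storages):
        is_connected = getattr(storages[name], "is_connected", None)
        if is_connected is not None and not is_connected():
            # Report the first disconnected database, as in the output above.
            connection.write("database %s is not connected\n" % name)
            return
    connection.write("OK\n")


if __name__ == "__main__":
    for connected in (True, False):
        out = io.StringIO()
        health_db_connected(out, {"main": FakeStorage(connected)})
        print(out.getvalue(), end="")
```

Falling back to "connected" when `is_connected()` is missing keeps the probe from failing on storages that have no notion of a remote connection, which is one way to read "this might not work on all setups".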

hvelarde (Member) commented:

Awesome! Anyway, you have to be careful: you could mark an instance as unusable when it can actually keep serving some content from the ZODB cache.

I think I prefer to look for other errors (like an increase of 503 responses on the backend) instead.
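The alternative of watching for a rise in 503 responses, instead of asking the database directly, could look roughly like this. This is a hypothetical sketch: the `ErrorRateMonitor` class, the sliding window of recent responses and the 50% threshold are illustrative assumptions, not part of collective.monitor or HAProxy.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sketch: flag a backend when 503s pile up in recent traffic.

    Keeps the status codes of the last `window` responses and reports the
    backend as unhealthy once the share of 503s reaches `threshold`.
    """

    def __init__(self, window: int = 100, threshold: float = 0.5) -> None:
        if window <= 0 or not 0.0 < threshold <= 1.0:
            raise ValueError("window must be > 0 and 0 < threshold <= 1")
        self.threshold = threshold
        # Oldest status codes drop off automatically once the window is full.
        self.statuses = deque(maxlen=window)

    def record(self, status: int) -> None:
        self.statuses.append(status)

    def error_rate(self) -> float:
        if not self.statuses:
            return 0.0
        return sum(1 for s in self.statuses if s == 503) / len(self.statuses)

    def healthy(self) -> bool:
        return self.error_rate() < self.threshold


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window=4, threshold=0.5)
    for status in (200, 200, 503, 503):
        monitor.record(status)
    print(monitor.error_rate(), monitor.healthy())  # 0.5 False
```

Because only actual responses count, an instance that keeps serving pages from the ZODB cache while ZEO is down stays healthy for as long as those pages keep coming back with 200.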

frisi (Member, Author) commented Dec 21, 2017

@bsuttor, are you OK with these changes, especially the naming and docstrings of the probes?

Eventually I'll add HAProxy demo settings to the README or docs/healthchecks.rst, either in another PR or by committing docs directly to master.

bsuttor (Member) commented Dec 21, 2017

Thank you,
LGTM,

Indeed, an HAProxy config file example is a good idea to help people use this package for HAProxy health checks.

I will merge this PR; you can add docs directly on the master branch.

bsuttor merged commit 9e7789f into master Dec 21, 2017
bsuttor (Member) commented Dec 21, 2017

@frisi, can you send me your PyPI username, please?
I'm going to add you as a collective.monitor maintainer so you can make a release.

bsuttor deleted the healthcheck branch December 21, 2017 13:44
frisi (Member, Author) commented Dec 21, 2017

Thanks for your feedback, @bsuttor!
My PyPI username is frisi as well.

bsuttor (Member) commented Dec 21, 2017

You now have the rights to make a release ;-)
