Healthcheck plugin for nginx.  It polls backends and, if they respond with
HTTP 200 plus an optional expected response body, they are marked good.
Otherwise, they are marked bad.  Similar to haproxy/varnish health checks.

For help on all the options, see the docblocks inside the .c file where each
option is defined.

Note this also gives you access to a health status page that lets you see
how your healthchecks are doing.
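As a rough illustration, a minimal config might look like the sketch below.  The directive names (healthcheck_enabled, healthcheck_delay, healthcheck_send, healthcheck_status) and the /health path are my recollection of the module's interface; check sample_ngx_config.conf and the docblocks in the .c file for the authoritative names and defaults.

```
upstream backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    healthcheck_enabled;
    healthcheck_delay 1000;                           # ms between checks
    healthcheck_send "GET /health HTTP/1.0" 'Host: www.example.com';
}

server {
    listen 80;
    location /health_status {
        healthcheck_status;                           # the status page mentioned above
    }
}
```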


==Important==
Nginx gives upstream modules full freedom in choosing which server peer to
pick.  This means the healthchecking plugin is only a tool that other
upstream modules must know about in order to use.  So your upstream code
MUST SUPPORT HEALTHCHECKS.  It's actually pretty easy to modify the code to
support them.

See the .h file for details, as well as the upstream_hash patch, which shows
how to modify upstream_hash to support healthchecks.

For an example plugin modified to support healthchecks, see my modifications
to the upstream_hash plugin here:

http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks

==Limitations==
The module only supports HTTP 1.0, not 1.1.  In practice, that means it
doesn't understand chunked encoding.  You should ask for a 1.0 response in
your healthcheck, unless you're sure the upstream won't send back chunked
encoding.  See the sample config for an example.

==INSTALL==
# Similar to the upstream_hash module

cd nginx-0.7.62 # or whatever
patch -p1 < /path/to/this/directory/nginx.patch
./configure --add-module=/path/to/this/directory
make
make install

==How the module works==
My first attempt was to spawn a pthread inside the master process, but nginx
freaks out on all kinds of levels when you try to have multiple threads
running at the same time.  Then I thought, fine, I'll just fork my own child
process.  But that caused lots of issues when I tried to HUP the master
process, because my child wasn't getting signals.  Neither approach felt
like the nginx way of doing things, so I decided to work directly with the
worker process model.

When each worker process starts, it adds a repeating event to the event
tree asking for ownership of a server's healthcheck.  When that ownership
event comes up, the worker locks the server's healthcheck and tries to claim
it with its pid.  If it can't claim the healthcheck, it retries later, in
case the worker that currently owns it has died.

The worker that does own it inserts a healthcheck event into nginx's event
tree.  When that event triggers, the worker opens a peer connection to the
server and goes to town sending and receiving data.  When the healthcheck
finishes, or times out, it updates the shared memory structure and schedules
another healthcheck.

One issue I ran into: when nginx tries to shut down, it waits for the event
tree to empty out.  To get around this, I check for ngx_quit and all kinds
of other variables.  This means that when you HUP nginx, your worker has to
sit around doing nothing until *something* in the healthcheck event tree
comes up, after which it can clear all the healthcheck events and move on.
I could fix this if nginx added a per-module callback on HUP, maybe a
'cleanup' or something.  The current exit_process callback is called after
the event tree is empty, not after a request to shut down a worker.

==Extending==
It should be very easy to extend this module to work with fastcgi or even
generic TCP backends.  You would just need to change, or abstract out,
ngx_http_healthcheck_process_recv.  Patches that do this are welcome, and
I'm happy to help out with any questions.  I'm also happy to help with
extending your upstream-picking modules to work with healthchecks.  Your
code can even stay compatible with healthcheck-free builds by surrounding
the changes with #if (NGX_HTTP_HEALTHCHECK)

==Config==
See sample_ngx_config.conf

Author: Jack Lindamood <jack@facebook.com>

==License==

Apache License, Version 2.0