limit the number of connections to upstream for NGINX
C
Switch branches/tags
Clone or download
cfsego
Latest commit 06031cf Apr 17, 2018

README

Limit upstream module for nginx

=OVERVIEW===============================================================

  When you use nginx as a proxy, there is a problem you may have to face:
a upstream server is able to achieve high QPS within a relatively low resource
stress. For example, mysql works well at the speed of 8000-9000 QPS within 60 parallel
connections, but only 2000 QPS when the number of connections increases to 500.

  I do not find any module that will solve the problem because they all face client,
not the upstream. I tried the limit_req, but failed. when I write down "limit_req rate=8000/s",
I find there are over 2000 active connections between nginx and mysql,
and when I write down "limit_req rate=60/s", Jesus, it is really 60 QPS for my nginx.

  So, I have a try, and the limit_upstream module comes. This module will limit the
number of connections to a upstream server identified by its ip address. With it, I
could control the upstream connections. When it reaches the set threshold, new requests
will suspend and wait until a active upstream request finishes and wake them up,
otherwise, they will die when they get timeout. Of course, I can also specify the length
of waiting queue, and the waiting timeout, as limit_req do.

  In this module, each worker suspend its requests when upstreams' counters reach the
threshold, and resume them or cancel them all by itself. The only things they share
are the global upstreams' counters. A special case for a request be allowed to go to
upstream even when the threshold is reached is that, no request related to that upstream
is being processed at that time for the worker. as a result, the maximum number of
established connections to a upstream server is ('limit' + 'worker_processes').

=MANUAL=================================================================

Example:

stream {
    server {
        proxy_pass spool;
    }

    limit_upstream_zone stest 10m;

    upstream spool {
        server 10.232.36.98:3112;
        ...

        limit_upstream_conn limit=260 zone=stest backlog=10000 timeout=180s;
        limit_upstream_log_level error;
    }
}

http {
    server {
        location           =/test {
            proxy_pass      http://pool;
        }
    }

    limit_upstream_zone test 10m;

    upstream pool {
        server 10.232.36.98:3111;
        ...

        limit_upstream_conn limit=260 zone=test backlog=10000 timeout=180s;
        limit_upstream_log_level error;
    }
}

syntax:  limit_upstream_zone zone_name size;
default: -
context: http, stream

  Define a shared memory pool for the module.

syntax:  limit_upstream_conn zone=zone_name limit=limit [backlog=length] [timeout=timeout | nodelay] [instant_hook];
default: -
context: upstream

  Set the zone used, the maximum allowed for each server of the upstream,
the backlog length, and the timeout for each suspended request.
The default for 'backlog' is 1000. Moreover, the default for 'timeout' is
the same as the reading timeout set by proxy. For example, when using fastcgi,
it is the same as the value set by 'fastcgi_read_timeout' or the default
value of 'fastcgi_read_timeout' when not set.

  When backlog for a server is full, it may use other server, Otherwise,
it wait in the queue. Timeout will cause 408 return status code.

  Parameter 'instant_hook' shouild be used when you are with a dynamic upstream
updating module, eg. ngx_dyn_ups_module. What about with Nginx Plus? I didn't test that.
For more details, you can refer to https://github.com/cfsego/nginx-limit-upstream/issues/13

syntax:  limit_upstream_log_level [ error | warn | notice | info ];
default: limit_upstream_log_level notice;
context: http, stream, upstream

  Define the log level of ordinary logs for the module.

=INSTALL===============================================================

patch -p2 < nginx.patch (nginx 1.0.X, 1.2.X)
or
patch -p1 < nginx-1.4.4.patch (nginx 1.4.X, nginx 1.6.X)
or
patch -p0 < nginx-1.8.1.patch  (nginx 1.8.X, nginx 1.9.X)
or
patch -p0 < nginx-1.10.1.patch  (nginx 1.10.X)
or
cp /path/to/module/nginx.1.10.1.patch /path/to/nginx-1.10.0/debian/patches (nginx 1.10 on debian)
or
patch -p0 < nginx-1.12.1.patch  (nginx 1.12.X)

if you want to use the module for stream, use this patch in addition:
patch -p1 < nginx-1.12.1-stream.patch

./configure --add-module=/path/to/module
make
make install

=PATCH=================================================================

  The nginx.patch is based on nginx 1.0.11, and it is compatible with nginx 1.2.0.
However, I did not test it on the other versions :)
  The nginx.1.4.4.patch is based on nginx 1.4.4, and tested only under nginx 1.4.4.
  And there is patches for nginx.1.8.1 and 1.10.1.
  Thanks to @ivan-antonyuck, who made patch for nginx 1.10.X and 1.10 on debian.

=BUGS==================================================================

* bugfixed    2018.2.6
  1. Some logs do not follow the directive 'limit_upstream_log_level', are always WARN.
  2. There are some warnings in compilation.

* bugfixed    2016.6.13
  The requests are choked when proxying https to backend. The cause is it transfers
  invalid data to set_session().

* bugfixed    2016.6.1
  The rbtree node are always deleted after reload, because snode->last is the timestamp
  used for removing useless node, but has never been set a value. This will cause the
  module to allow any request to pass after reload.

* bugfixed    2014.6.5
  Local variable 'pc' is only used for printing debug infomation, so there is a
warning when compiling nginx without the configuration '--with-debug'.

* bugfixed    2013.11.11
  The worker crushes after it processes a request. The problem occurs under
nginx 1.4.3, because nginx sets u->sockaddr to NULL before my module uses it.

* bugfixed    2012.7.9
  There is a critical bug that may make upstreams stuck. The description is
if a request uses a server, and the server fails, then the request tries another
server with the help of ngx_http_upstream_next, however, the counter of the
former server is not decreased, then the whole system fails to work after a number
of such fails.

=CHANGES===============================================================

  1.3.0 [2018.4.17] now we can use this module with stream.

  1.2.3 [2018.2.6] change the default of 'timeout' to the same as the reading timeout
                   set by proxy.

  1.2.2 [2016.7.13] now can work with ngx_dyn_ups_module.

  1.2.1 [2013.6.5] fixed a bug variable 'pc' is unused.

  1.2.0 [2013.11.11] fixed a bug occured under nginx 1.4.4, new patch for nginx 1.4.4

  1.1.0 [2012.7.17] bugfixed, and optimize for efficiency, 10% faster.

  1.0.0 [2012.5.31] initial version, including some bug fixes.

=USED WITH STREAM======================================================

  On timeout, nginx will close the connection immediately, so the client may receive
TCP datagram with RST flag. When the module drops a session, it returns NGX_DECLINED, so
nginx will try another peer and mark current peer as failed, this may take effects on
load balancer. If this cause any problem, let me know.

=USED WITH UPSTREAM KEEPALIVE==========================================

  Once nginx upstream keepalive module returns NGX_DONE, it means there is already
one setup connection, so my module let the request pass, no matter it overpasses the
threshold or not. There is a problem with this case (when the number reaches, nginx
trends to distribute upstream connections for newcomers among workers at the proportion
of connections already setup, and it is unfair). I think set keepalive as formula below,
  keepalive = limit_conn / worker_processes
may resolve the problem.

=MISUSE================================================================

1. someone uses it like this:

    limit_upstream_zone test 10m;

    upstream bb {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;

        limit_upstream_conn limit=8 zone=test backlog=2000 timeout=1500ms;
    }

    upstream cc {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;

        limit_upstream_conn limit=16 zone=test backlog=1000 timeout=500ms;
    }

  It will absolutely fail to work. The correct is:

    limit_upstream_zone test_high 10m;
    limit_upstream_zone test_low 10m;

    upstream bb {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;

        limit_upstream_conn limit=8 zone=test_low backlog=2000 timeout=1500ms;
    }

    upstream cc {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;

        limit_upstream_conn limit=16 zone=test_high backlog=1000 timeout=500ms;
    }