
Persistent database connections with async workers on Django >= 1.6 fails #996

Closed
Starefossen opened this Issue Mar 20, 2015 · 10 comments

@Starefossen
Contributor

Starefossen commented Mar 20, 2015

I noticed something when switching from sync to async eventlet workers in Gunicorn: Postgres, which had been working perfectly so far, began complaining that the maximum connection limit was reached and started refusing new connections. When I enabled monitoring I could clearly see that a new connection to the database was established for each HTTP request. Even though CONN_MAX_AGE was set to 60, database connections were not reused.

After some digging around I found this post on Server Fault, which describes a very similar situation and concludes that persistent database connections are indeed set up – but they are never reused by subsequent HTTP requests, and they are eventually closed when they reach the max age.

Is this a known issue? I cannot see it mentioned in this issue tracker, nor in eventlet's.

django v1.7
gunicorn v19.3
eventlet v0.17
psycopg2 v2.6
@ramiro

ramiro commented Apr 15, 2015

I'd change the title to something that makes clear the report is about Django >= 1.6 persistent DB connections feature.

@Starefossen Starefossen changed the title from Persistent database connections with async workers to Persistent database connections with async workers on Django >= 1.6 fails Apr 15, 2015

@ramiro

ramiro commented Apr 15, 2015

I'm going to try to add some extra info with an excerpt of an internal issue in which we are exploring & testing changing CONN_MAX_AGE from its current default value of 0. We use the gevent worker:

Django handles DB connections like this:

  • If CONN_MAX_AGE = 0, connections are opened when needed and closed at the end of every request. This is implemented with the request_finished signal.
  • If CONN_MAX_AGE != 0 (n or None), they are handled per-[1]process: every worker process started by the web server can have one connection which it uses for the requests it's in charge of. It reuses it (or reopens it if it's closed) when needed BUT doesn't necessarily close it at the end of requests. They are recycled every n seconds (by closing them once the TTL has been reached; this is implemented with both the request_started and request_finished signals) or never (None).
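For reference, CONN_MAX_AGE is set per database alias in settings.py. A minimal sketch (the alias, backend, and database name here are illustrative placeholders):

```python
# Sketch of the relevant Django settings (values are illustrative).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",
        # CONN_MAX_AGE semantics:
        #   0    -> open when needed, close on request_finished (the default)
        #   60   -> reuse for up to 60 seconds, recycled via the
        #           request_started/request_finished signals
        #   None -> unlimited persistent connections
        "CONN_MAX_AGE": 60,
    }
}
```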

If the web server uses a concurrency model based on greenlets (eventlet or gevent workers), then connections are still opened when needed on every request, and:

  • In the default case of CONN_MAX_AGE = 0 there aren't problems as they are closed at the end of the request.
  • Now, if we set CONN_MAX_AGE != 0 then what happens is that connections aren't closed at the end of requests. So they start to accumulate.

Some options are:

  1. Make connections persistent with CONN_MAX_AGE != 0 but switch the gunicorn worker to the 'sync' one (process-based)
  2. Set CONN_MAX_AGE = 0 and switch to a DB connection pooling solution.

[1] This is actually per thread, but in actual web server deployment scenarios with e.g. Linux + Apache or gunicorn, the multithreading model is rarely used. Sync multi-process (which means a one-to-one process-thread mapping) or async greenlet-based concurrency models are the usual choices.

If the above analysis is correct, I'm not sure it's in gunicorn's scope to 'fix' this. I don't think it's in Django's scope either (i.e., to [try to] detect which concurrency model the web server it has been deployed to uses, and adapt its handling of these persistent DB connections accordingly).

@tilgovi

Collaborator

tilgovi commented Apr 15, 2015

@ramiro I agree with your analysis.

@tilgovi

Collaborator

tilgovi commented Dec 31, 2015

I think, given the analysis, that this will not be fixed. If multiple concurrent connections are made by an async worker, the application needs to use a thread-safe connection pool to get safe, correct behavior.

@tilgovi tilgovi closed this Dec 31, 2015

@kevart

kevart commented Feb 23, 2017

Re: "Now, if we set CONN_MAX_AGE != 0 then what happens is that connections aren't closed at the end of requests. So they start to accumulate."

Why is that an issue with async workers? As soon as an async worker completes a request, doesn't it still close the connection? Why would connections accumulate then? By accumulate, do you mean the number of connections keeps growing (and finally exceeds the DB's connection limit)?

I can understand that async workers produce more connections (because a single worker can handle more than one request at a time, hence opening more DB connections), but once a certain number of connections is reached, it should converge.

Additionally can someone explain why/how https://github.com/jneight/django-db-geventpool would help in this case?

@DanielStevenLewis

DanielStevenLewis commented May 5, 2017

Could somebody (perhaps @ramiro or @tilgovi) please confirm that https://github.com/jneight/django-db-geventpool should actually help in this case? I think I ran into this problem, but it kept occurring after installing django-db-geventpool (with the import from eventlet included and the import from gevent not included). I was getting a lot of "FATAL: sorry, too many clients already", which indicates that I opened more than the allowed limit of connections to the database. CPU was also maxing out, so perhaps that's why I kept getting the same error after installing django-db-geventpool...?

@DanielStevenLewis

DanielStevenLewis commented May 5, 2017

Following up from my last post, it seems possible that the CPU max out was (at least partially) caused by the issue @Starefossen raised above. http://stackoverflow.com/a/43799635/805141 indicates this is a possibility.

Also, shouldn't this issue be documented somewhere?

@DanielStevenLewis

DanielStevenLewis commented May 6, 2017

The lack of issue resolution that I experienced may have been caused by jneight/django-db-geventpool#21

mtyaka added a commit to open-craft/configuration that referenced this issue Jul 3, 2018

Make edx_django_service CONN_MAX_AGE default to 0.
The default gunicorn worker class is 'gevent', which is not compatible
with django's CONN_MAX_AGE because it leaks database connections.
See: benoitc/gunicorn#996 (comment)

This changes the default from 60 to 0.

@jxltom

jxltom commented Sep 20, 2018

> Re: "Now, if we set CONN_MAX_AGE != 0 then what happens is that connections aren't closed at the end of requests. So they start to accumulate."
>
> Why is that an issue with async workers? As soon as an async worker completes a request it still closes the connection? Why would connections accumulate then? By accumulate you mean connections are getting more and more (and finally exceeding the DB's connection limit)?
>
> I can understand that async workers produce more connections (because a single worker can handle more than 1 request at a time, hence opening more db connections) but once a certain number of connections is reached, it should converge.
>
> Additionally can someone explain why/how https://github.com/jneight/django-db-geventpool would help in this case?

I think it is because, when conn_max_age is not 0, established connections are only closed when a request starts or finishes. So if async workers die for some reason, such as a timeout, they won't receive new requests, and then their DB connections can never be closed (they can only be closed when a request starts or finishes).

As for django-db-geventpool, it is used when conn_max_age is 0. When it is 0, every new request would create a new DB connection; the DB pool avoids that and improves performance.

Correct me if I'm wrong. :D
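If that reading is right, the pooled backend replaces the stock engine in settings. A sketch based on django-db-geventpool's README at the time (the engine path and the MAX_CONNS option are that project's names as I understand them; verify against its current docs before use):

```python
# Hypothetical settings sketch for django-db-geventpool -- check the
# project's README for the exact engine path and options.
DATABASES = {
    "default": {
        "ENGINE": "django_db_geventpool.backends.postgresql_psycopg2",
        "NAME": "mydb",
        "CONN_MAX_AGE": 0,       # must stay 0: the pool owns lifetimes
        "OPTIONS": {
            "MAX_CONNS": 20,     # upper bound shared by all greenlets
        },
    }
}
```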

@benoitc

Owner

benoitc commented Sep 21, 2018

In the current design, connections are kept alive to handle HTTP keep-alive before being released. Incoming requests are handled in a new greenlet.

The Django documentation states:

Caveats

Since each thread maintains its own connection, your database must support at least as many simultaneous connections as you have worker threads.

I guess the way django persistent connections work is to associate them with the thread id. If threads are patched in the gevent or eventlet workers, it is probably creating a new connection per greenlet, bounded only by the maximum number of connections, which may explain what you observe. If that's the case, then maybe django could be patched to actually use the real thread on which it runs? Or to limit the number of concurrent connections.

snopoke added a commit to dimagi/commcare-cloud that referenced this issue Nov 21, 2018

remove CONN_MAX_AGE
* setting is not respected by celery < v4.2
  * celery/celery#4292
* not compatible with 'gevent' gunicorn worker class
  * benoitc/gunicorn#996 (comment)