Support database connection pooling for postgresql #16881
Conversation
Note that this would be usable as-is in an existing project (i.e. Django 4.2) if one extracts the code into a custom database backend (it would be a bit of copy and paste, but maybe better than running on main).
You are storing the pool inside the `DatabaseWrapper` instance. Unless I'm confused/mistaken, `DatabaseWrapper`s are stored in thread/task locals (`asgiref.Local`). So it will end up creating a pool per thread/task, but surely it should be process-global?
Thread workers can be useful even with the GIL since DB queries release the GIL. |
Yup, you are right about the threading behavior. Moving the pool process-wide will be easy enough though; will push a patch later. Also agree wrt the GIL, but how well this works for threads strongly depends on your application (I have more details in https://forum.djangoproject.com/t/possible-performance-improvements-of-psycopg3-support-in-django-4-2/18978/11) -- I just don't want to get people's hopes up here too much for the non-async case.
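The thread-local point above can be made concrete with a small, self-contained sketch (plain Python, not Django or `asgiref` code; the class names here are made up for illustration): storing state on a `threading.local` yields one object per thread, while a class-level dict guarded by a lock yields one object per process.

```python
import threading

class PerThreadWrapper:
    """State on a threading.local -> each thread gets its own 'pool'."""
    _local = threading.local()

    @property
    def pool(self):
        if not hasattr(self._local, "pool"):
            self._local.pool = object()  # stand-in for a ConnectionPool
        return self._local.pool

class ProcessWideWrapper:
    """State in a class-level dict under a lock -> one 'pool' per process."""
    _pools = {}
    _lock = threading.Lock()

    def pool(self, key):
        with self._lock:
            return self._pools.setdefault(key, object())

# Two threads reading .pool from the same instance see different objects:
pools = []
w = PerThreadWrapper()
threads = [threading.Thread(target=lambda: pools.append(w.pool)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert pools[0] is not pools[1]  # one pool per thread

# The class-level dict always hands back the same object for a given key:
shared = ProcessWideWrapper()
assert shared.pool("default") is shared.pool("default")  # process-wide
```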
Pushed a new commit which makes the pool actually process-wide (please double check). While testing this I also noticed that runserver takes out one connection and never closes it again (probably to check the migration status, among other things :D)
I wonder if we should call |
I forgot to mention that we're already using something very similar in production (since the Django 4.2 RC release), with moderate traffic, and it works well. So I'll be very happy to see this go into Django itself. I think the current pool initialization approach is not thread-safe; or at least it can cause two pools to be created for the same alias. Optional suggestion: Ideally, the |
Mind sharing that code :)
Mhm, good point. I hate to add a threading.Lock for that, but I guess that is the only way to get this done safely. Another option would be to use
Maybe; I'm not too sure, but this is something Django as a whole could benefit from, and not just the DB backends?
Agreed, but we don't have a concept of a shutdown in Django at all. Maybe we can enable it at least for the ASGI lifecycle events, though I guess we won't have to add much inside the backend for that.
Here's the code I've used. I'm sure it's buggy and incomplete, but it was good enough for us. I used the singleton "double-checked locking" pattern to avoid taking the lock if the pool is already initialized.
```python
from typing import Any, ClassVar
import threading

from django.core.exceptions import ImproperlyConfigured
from django.db.backends.postgresql.base import (
    Cursor,
    ServerBindingCursor,
)
from django.db.backends.postgresql.base import (
    DatabaseWrapper as PostgresqlDatabaseWrapper,
)
from django.utils.asyncio import async_unsafe
import psycopg
from psycopg_pool import ConnectionPool


class DatabaseWrapper(PostgresqlDatabaseWrapper):
    _pools_lock: ClassVar = threading.Lock()
    # (unique connection params key) -> ConnectionPool
    _pools: ClassVar[dict[object, ConnectionPool]] = {}

    def _configure_new_conn(self, connection: psycopg.Connection[Any]) -> None:
        # self.isolation_level must be set:
        # - after connecting to the database in order to obtain the database's
        #   default when no value is explicitly specified in options.
        # - before calling _set_autocommit() because if autocommit is on, that
        #   will set connection.isolation_level to ISOLATION_LEVEL_AUTOCOMMIT.
        options = self.settings_dict['OPTIONS']
        try:
            isolation_level_value = options['isolation_level']
        except KeyError:
            self.isolation_level = psycopg.IsolationLevel.READ_COMMITTED
        else:
            # Set the isolation level to the value from OPTIONS.
            try:
                self.isolation_level = psycopg.IsolationLevel(isolation_level_value)
            except ValueError:
                raise ImproperlyConfigured(
                    f'Invalid transaction isolation level {isolation_level_value} '
                    f'specified. Use one of the psycopg.IsolationLevel values.'
                ) from None
        connection.isolation_level = self.isolation_level
        connection.cursor_factory = (
            ServerBindingCursor
            if options.get('server_side_binding') is True
            else Cursor
        )

    @async_unsafe
    def get_new_connection(self, conn_params) -> psycopg.Connection[Any]:
        dbname = conn_params.get('dbname')
        key = (self.alias, dbname, conn_params.get('options'))
        cls = self.__class__
        # Double-checked locking: only take the lock when no pool exists yet
        # for this key, then re-check under the lock before creating one.
        if key not in cls._pools:
            with cls._pools_lock:
                if key not in cls._pools:
                    cls._pools[key] = ConnectionPool(
                        min_size=self.settings_dict.get('POOL_MIN_SIZE', 4),
                        max_size=self.settings_dict.get('POOL_MAX_SIZE', None),
                        open=True,
                        name=f'dbbackend-pool-{self.alias}',
                        timeout=self.settings_dict.get('POOL_TIMEOUT', 30.0),
                        max_waiting=self.settings_dict.get('POOL_MAX_WAITING', 0),
                        max_lifetime=self.settings_dict.get('POOL_MAX_LIFETIME', 60 * 60.0),
                        max_idle=self.settings_dict.get('POOL_MAX_IDLE', 5 * 60.0),
                        reconnect_timeout=self.settings_dict.get('POOL_RECONNECT_TIMEOUT', 5 * 60.0),
                        reconnect_failed=None,
                        num_workers=self.settings_dict.get('POOL_NUM_WORKERS', 3),
                        configure=self._configure_new_conn,
                        reset=None,
                        kwargs=conn_params,
                    )
        return self._pools[key].getconn()

    def _close(self) -> None:
        # Instead of closing the connection, hand it back to its pool.
        if self.connection is not None:
            with self.wrap_database_errors:
                pool: ConnectionPool = self.connection.connection._pool
                pool.putconn(self.connection)
```
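For anyone wanting to try the snippet above: wiring it into a project would look roughly like this. The dotted path `myproject.db_backend` and the credentials are placeholders, and the `POOL_*` keys are the ones `get_new_connection()` reads from `settings_dict`.

```python
# settings.py (sketch): ENGINE points at the hypothetical package containing
# the custom DatabaseWrapper; POOL_* keys are consumed by get_new_connection().
DATABASES = {
    "default": {
        "ENGINE": "myproject.db_backend",  # hypothetical backend path
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        "HOST": "localhost",
        "PORT": "5432",
        "POOL_MIN_SIZE": 4,
        "POOL_MAX_SIZE": 20,
        "POOL_TIMEOUT": 30.0,
    }
}
```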
Yes I think so, but how would the DatabaseWrapper hook into it - using signals maybe?
I was mostly thinking about test frameworks or such, that may initialize Django as a library multiple times with different settings during the same run. Mutable globals are problematic for this (see PEP 684 "Per-Interpreter GIL" for an analogous problem in CPython itself). But if this is not something that's supported currently, it's probably fine to ignore here.
Thanks, good to see that we did it pretty similarly. Interestingly enough you are using
I guess it won't matter. It really is only an issue during startup, later on it would always find a pool.
Yeah, tests will probably be tricky :) |
@bluetech Thank you for your thorough reviews so far. I have opted for |
I'll close this PR since I currently do not have the time to bring this over the finish line. If someone wants to get this merged I guess only docs & tests are missing. |
So this needs tests and release notes. But I figured there might be one or two people out there who want to try this as is.
To get the pool running, either set `pool` to `True` in the database `OPTIONS` dict, or set it to a dict which then gets passed on to https://www.psycopg.org/psycopg3/docs/api/pool.html#psycopg_pool.ConnectionPool

Please note that this pool is most likely only really useful in the async case. Using a pool with a multi-process deployment (like standard gunicorn workers) is of little use, since there will be just one concurrent request per process. Realistically speaking, this only becomes interesting in the sync case once we can have multithreaded GIL-free Python :D
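As a concrete illustration of the description above (database name and pool sizes are placeholders):

```python
# settings.py (sketch): "pool" in OPTIONS either simply enables pooling with
# defaults, or carries keyword arguments for psycopg_pool.ConnectionPool.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "OPTIONS": {
            "pool": True,
            # ...or, passing kwargs through to ConnectionPool:
            # "pool": {"min_size": 2, "max_size": 4, "timeout": 10},
        },
    }
}
```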