
Support database connection pooling for postgresql #16881

Closed. Wants to merge 8 commits.

Conversation

@apollo13 (Member) commented May 20, 2023

So this needs tests and release notes. But I figured there might be one or two people out there who want to try this as is.

To get the pool running either set pool to True in the database OPTIONS dict or set it to a dict which then gets passed on to https://www.psycopg.org/psycopg3/docs/api/pool.html#psycopg_pool.ConnectionPool
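A hypothetical settings sketch of the two forms described above: `pool` in the database `OPTIONS` is either `True` (use psycopg_pool defaults) or a dict forwarded to `psycopg_pool.ConnectionPool`. The alias names and connection details below are made up; the accepted keys in the dict form are whatever `ConnectionPool` itself takes (`min_size`, `max_size`, `timeout`, ...).

```python
# Hypothetical DATABASES configuration for the proposed feature.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "OPTIONS": {
            # Simplest form: enable the pool with psycopg_pool defaults.
            "pool": True,
        },
    },
    "reporting": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "reports",
        "OPTIONS": {
            # Dict form: passed on to psycopg_pool.ConnectionPool(...).
            "pool": {"min_size": 2, "max_size": 10, "timeout": 30.0},
        },
    },
}
```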

Please note that this pool is most likely only really useful in the async case. Using a pool with a multi-process deployment (like standard gunicorn workers) is kinda useless since there will be just one concurrent request per process. Realistically speaking, this only becomes interesting in the sync case once we can have multithreaded GIL-free Python :D

@apollo13 (Member, Author)

Note that this would be usable as is in an existing project (i.e. Django 4.2) if one extracts the code into a custom database backend (it would be a bit of copy and paste, but maybe better than running on main).

@bluetech (Contributor)

You are storing the pool inside the DatabaseWrapper instance. Unless I'm confused/mistaken, DatabaseWrappers are stored in thread/task local (asgiref.Local). So it will end up creating a pool per thread/task, but surely it should be process-global?
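A minimal stand-alone illustration of this point, using `threading.local` as a stand-in for `asgiref.Local` and a hypothetical `FakeWrapper` in place of `DatabaseWrapper`: if the pool lives on the per-thread wrapper instance, every worker thread ends up with its own pool.

```python
import threading

class FakeWrapper:
    """Stand-in for DatabaseWrapper: the pool lives on the instance."""
    def __init__(self):
        self.pool = object()  # pretend this is a ConnectionPool

local = threading.local()  # asgiref.Local behaves similarly per thread/task
pools_created = []
lock = threading.Lock()

def handle_request():
    # Each thread lazily creates "its" wrapper, as Django's connection
    # handling does, so each thread also ends up with its own pool.
    if not hasattr(local, "wrapper"):
        local.wrapper = FakeWrapper()
        with lock:
            pools_created.append(local.wrapper.pool)

threads = [threading.Thread(target=handle_request) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Four threads -> four distinct pools, defeating the purpose of pooling.
assert len(set(map(id, pools_created))) == 4
```

Storing the pool on the class (or in a module-level dict keyed by connection params), as the later commits do, avoids this.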

Realistically speaking, this only becomes interesting in the sync case once we can have multithreaded gil-free python :D

Thread workers can be useful even with the GIL since DB queries release the GIL.
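A quick sketch of why thread workers still help: a blocking call that releases the GIL (here `time.sleep` stands in for a psycopg query waiting on the server) lets threads overlap their wait time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query():
    # time.sleep releases the GIL, much like psycopg does while waiting
    # on the server, so thread workers can overlap these "queries".
    time.sleep(0.2)

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(fake_query)
elapsed = time.monotonic() - start

# Four 0.2s "queries" overlap instead of taking 0.8s serially.
assert elapsed < 0.6
```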

@apollo13 (Member, Author) commented May 22, 2023 via email

@apollo13 (Member, Author)

Pushed a new commit which makes the pool actually process-wide (please double check). While testing this I also noticed that runserver takes out one connection and never closes it again (probably to check the migration status, among other things :D)

@apollo13 (Member, Author)

I wonder if we should call DISCARD ALL when returning the connection to the pool. Then we could configure stuff like timezone/role per request again (though I am not convinced that we actually would want that :D). As it currently is, we might leave the connection in a "suboptimal" state if someone changed some session settings. Then again, it is no different from what persistent connections have now, so I think that would be okay.
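For illustration, a sketch of what such a reset could look like as a `reset=` callback for `psycopg_pool.ConnectionPool`. The function name is hypothetical; note that PostgreSQL refuses to run DISCARD ALL inside a transaction block, hence the autocommit toggle. No psycopg import is needed here since the callback only touches the connection object it is handed.

```python
def reset_discard_all(conn):
    """Hypothetical `reset=` callback for psycopg_pool.ConnectionPool.

    DISCARD ALL resets session state (SET values, temporary tables,
    prepared statements, LISTEN registrations) but cannot run inside a
    transaction block, so autocommit is toggled on for the duration.
    """
    previous = conn.autocommit
    conn.autocommit = True
    try:
        conn.execute("DISCARD ALL")
    finally:
        conn.autocommit = previous
```

The pool would then be constructed with `ConnectionPool(..., reset=reset_discard_all)`; keep in mind the reset hook runs in the pool's worker thread, not the request thread.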

@bluetech (Contributor)

I forgot to mention that we're already using something very similar in production (since the Django 4.2 RC release), with moderate traffic, and it works well. So I'll be very happy to see this go into Django itself.

I think the current pool initialization approach is not thread safe. Or at least it can cause two pools to be created for the same alias.

Optional suggestion: Ideally, the DatabaseWrapper infrastructure will provide some thread-safe "hooks" for process-wide state. This will avoid the need for other backends to duplicate the locking code, and will ensure it is done correctly. Also, it should perhaps provide a shutdown hook for closing the pool. While not strictly necessary ATM, I think it's a good habit to have a defined lifecycle for every resource.
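A minimal sketch of what such a hook could look like, under the assumption that it lives outside any particular backend. Everything here (the class name, `get_or_create`, the finalizer-based `shutdown`) is hypothetical, not existing Django API.

```python
import threading

class ProcessGlobalState:
    """Hypothetical helper: thread-safe process-wide storage with a
    defined shutdown hook, so backends don't duplicate locking code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}
        self._finalizers = []

    def get_or_create(self, key, factory, finalizer=None):
        # Double-checked creation under a single lock: the factory runs
        # at most once per key, and later callers get the stored object.
        with self._lock:
            if key not in self._state:
                self._state[key] = factory()
                if finalizer is not None:
                    self._finalizers.append((key, finalizer))
            return self._state[key]

    def shutdown(self):
        # Defined lifecycle: run each finalizer (e.g. pool.close()) and
        # drop the stored objects so a fresh init is possible.
        with self._lock:
            for key, finalizer in self._finalizers:
                finalizer(self._state.pop(key))
            self._finalizers.clear()
```

A backend would then call something like `state.get_or_create(alias, make_pool, finalizer=lambda p: p.close())` instead of managing its own class-level dict and lock.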

@apollo13 (Member, Author)

I forgot to mention that we're already using something very similar in production (since Django 4.2 rc release), with moderate traffic, and it works well.

Mind sharing that code :)

I think the current pool initialization approach is not thread safe. Or at least it can cause two pools to be created for the same alias.

Mhm, good point. I hate to add a threading.Lock for that, but I guess that is the only way to get this done safely. Another option would be to use setdefault which is hopefully threadsafe?

Optional suggestion: Ideally, the DatabaseWrapper infrastructure will provide some thread-safe "hooks" for process-wide state.

Maybe, not too sure; this is something Django as a whole could benefit from, not just the DB backends?

While not strictly necessary ATM, I think it's a good habit to have a defined lifecycle for every resource.

Agreed, but we don't have a concept of a shutdown in Django at all. Maybe we can enable it at least for the ASGI lifecycle events, though I guess we won't have to add much inside the backend for that.
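As a sketch of the ASGI side, closing pools on the lifespan shutdown event could look roughly like this. The `close_all_pools` callable is a hypothetical hook the backend would expose; the message types are from the ASGI lifespan protocol.

```python
import asyncio

def with_pool_shutdown(app, close_all_pools):
    """Wrap an ASGI app so process-wide pools are closed on
    lifespan.shutdown. `close_all_pools` is a hypothetical hook."""
    async def wrapper(scope, receive, send):
        if scope["type"] != "lifespan":
            await app(scope, receive, send)
            return
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                close_all_pools()
                await send({"type": "lifespan.shutdown.complete"})
                return
    return wrapper
```

This only helps servers that speak the lifespan protocol; a plain WSGI deployment would still have no shutdown signal.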

@bluetech (Contributor)

Mind sharing that code :)

Here's the code I've used. I'm sure it's buggy and incomplete, but it was good enough for us. I used the double-checked locking pattern to avoid taking the lock once already initialized.


from typing import Any, ClassVar
import threading

from django.core.exceptions import ImproperlyConfigured
from django.db.backends.postgresql.base import (
    Cursor,
    ServerBindingCursor,
)
from django.db.backends.postgresql.base import (
    DatabaseWrapper as PostgresqlDatabaseWrapper,
)
from django.utils.asyncio import async_unsafe
import psycopg
from psycopg_pool import ConnectionPool


class DatabaseWrapper(PostgresqlDatabaseWrapper):
    _pools_lock: ClassVar = threading.Lock()
    # (unique connection params key) -> ConnectionPool
    _pools: ClassVar[dict[object, ConnectionPool]] = {}

    def _configure_new_conn(self, connection: psycopg.Connection[Any]) -> None:
        # self.isolation_level must be set:
        # - after connecting to the database in order to obtain the database's
        #   default when no value is explicitly specified in options.
        # - before calling _set_autocommit() because if autocommit is on, that
        #   will set connection.isolation_level to ISOLATION_LEVEL_AUTOCOMMIT.
        options = self.settings_dict['OPTIONS']
        try:
            isolation_level_value = options['isolation_level']
        except KeyError:
            self.isolation_level = psycopg.IsolationLevel.READ_COMMITTED
        else:
            # Set the isolation level to the value from OPTIONS.
            try:
                self.isolation_level = psycopg.IsolationLevel(isolation_level_value)
            except ValueError:
                raise ImproperlyConfigured(
                    f'Invalid transaction isolation level {isolation_level_value} '
                    f'specified. Use one of the psycopg.IsolationLevel values.'
                ) from None
            connection.isolation_level = self.isolation_level

        connection.cursor_factory = (
            ServerBindingCursor
            if options.get('server_side_binding') is True
            else Cursor
        )

    @async_unsafe
    def get_new_connection(self, conn_params) -> psycopg.Connection[Any]:
        dbname = conn_params.get('dbname')
        key = (self.alias, dbname, conn_params.get('options'))
        cls = self.__class__
        if key not in cls._pools:
            with cls._pools_lock:
                if key not in cls._pools:
                    cls._pools[key] = ConnectionPool(
                        min_size=self.settings_dict.get('POOL_MIN_SIZE', 4),
                        max_size=self.settings_dict.get('POOL_MAX_SIZE', None),
                        open=True,
                        name=f'dbbackend-pool-{self.alias}',
                        timeout=self.settings_dict.get('POOL_TIMEOUT', 30.0),
                        max_waiting=self.settings_dict.get('POOL_MAX_WAITING', 0),
                        max_lifetime=self.settings_dict.get('POOL_MAX_LIFETIME', 60 * 60.0),
                        max_idle=self.settings_dict.get('POOL_MAX_IDLE', 5 * 60.0),
                        reconnect_timeout=self.settings_dict.get('POOL_RECONNECT_TIMEOUT', 5 * 60.0),
                        reconnect_failed=None,
                        num_workers=self.settings_dict.get('POOL_NUM_WORKERS', 3),
                        configure=self._configure_new_conn,
                        reset=None,
                        kwargs=conn_params,
                    )

        return self._pools[key].getconn()

    def _close(self) -> None:
        if self.connection is not None:
            with self.wrap_database_errors:
                pool: ConnectionPool = self.connection.connection._pool
                pool.putconn(self.connection)

Mhm, good point. I hate to add a threading.Lock for that, but I guess that is the only way to get this done safely. Another option would be to use setdefault which is hopefully threadsafe?

setdefault will be thread safe but it can still cause multiple pools to be created. The redundant ones will be immediately discarded -- maybe that's OK and worth it for avoiding a lock.
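The semantics can be shown with a plain dict and a stand-in factory: the factory runs unconditionally, so two racing callers may each build a pool, but `dict.setdefault` atomically keeps exactly one and every caller gets that one back.

```python
pools = {}

def get_pool(key, factory):
    # The factory runs before setdefault, so under contention a redundant
    # pool can be built; setdefault atomically stores only the first one,
    # and the loser's pool is simply discarded (never handed out).
    candidate = factory()
    return pools.setdefault(key, candidate)

first = get_pool("default", object)
second = get_pool("default", object)  # builds and discards a new object
assert first is second  # later callers always get the stored pool
```

With psycopg_pool this is mostly harmless as long as the discarded pool is also closed, since an open `ConnectionPool` starts background workers and connections.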

Maybe, not to sure, this is something Django as a whole could benefit from and not just the DB backends?

Yes I think so, but how would the DatabaseWrapper hook into it - using signals maybe?

Agreed, but we don't have a concept of a shutdown in Django at all. Maybe we can enable it at least for the ASGI lifecycle events, though I guess we won't have to add much inside the backend for that.

I was mostly thinking about test frameworks or such, that may initialize Django as a library multiple times with different settings etc. during the same run. Mutable globals are problematic for this (see PEP 684 "Per-Interpreter GIL" for an analogous problem in CPython itself). But if this is not something that's supported currently, it's probably fine to ignore here.

@apollo13 (Member, Author)

Here's the code I've used. I'm sure it's buggy and incomplete but was good enough for us.

Thanks, good to see that we did it pretty similarly. Interestingly enough, you are using wrap_database_errors with putconn, which we can enable as well now that we removed the _reset callback (because that runs in its own thread: https://github.com/psycopg/psycopg/blob/master/psycopg_pool/psycopg_pool/pool.py#L244-L248).

setdefault will be thread safe but it can still cause multiple pools to be created. The redundant ones will be immediately discarded -- maybe that's OK and worth it for avoiding a lock.

I guess it won't matter. It really is only an issue during startup, later on it would always find a pool.

I was mostly thinking about test frameworks or such, that may initialize Django as a library multiple times with different settings etc. during the same run.

Yeah, tests will probably be tricky :)

@apollo13 (Member, Author)

@bluetech Thank you for your thorough reviews so far. I have opted for setdefault now. Let's see if there is some more interest in this PR and maybe someone wants to write some tests and docs :)

@apollo13 (Member, Author) commented Jul 2, 2023

I'll close this PR since I currently do not have the time to bring this over the finish line. If someone wants to get this merged, I guess only docs & tests are missing.
