asyncpg uses prepared statements despite being disabled #1058

Closed
ionsome opened this issue Aug 11, 2023 · 14 comments · Fixed by #1065

Comments

@ionsome

ionsome commented Aug 11, 2023

  • asyncpg version: 0.28.0
  • PostgreSQL version: 14
  • Python version: 3.10
  • Platform: OSX
  • Do you use pgbouncer?: yes (transactional mode)
  • Did you install asyncpg with pip?: yes

Started from sqlalchemy/sqlalchemy#10226.

asyncpg uses prepared statements despite being disabled.

To reproduce (thanks @zzzeek for the script!):

import asyncpg
import asyncio
import uuid

async def main():
    conn = await asyncpg.connect(
        user="scott",
        password="tiger",
        host="127.0.0.1",
        database="test",
        port=6432,
        statement_cache_size=0,
    )

    pps = await conn.prepare("select 1", name=f'__asyncpg_{uuid.uuid4()}__')
    rows = await pps.fetch()

    # remove this 'del' and the error goes away
    del pps

    (await conn.fetchrow(";"))


asyncio.run(main())

Error

Traceback (most recent call last):
  File "/user/app/script.py", line 57, in <module>
    asyncio.run(main())
  File "/user/.pyenv/versions/3.10.3/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/user/.pyenv/versions/3.10.3/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/user/app/script.py", line 53, in main
    (await conn.fetchrow(";"))
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 678, in fetchrow
    data = await self._execute(
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 1658, in _execute
    result, _ = await self.__execute(
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 1683, in __execute
    return await self._do_execute(
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 1730, in _do_execute
    result = await executor(stmt, None)
  File "asyncpg/protocol/protocol.pyx", line 201, in bind_execute
asyncpg.exceptions.InvalidSQLStatementNameError: unnamed prepared statement does not exist
HINT:  
NOTE: pgbouncer with pool_mode set to "transaction" or
"statement" does not support prepared statements properly.
You have two options:

* if you are using pgbouncer for connection pooling to a
  single server, switch to the connection pool functionality
  provided by asyncpg, it is a much better option for this
  purpose;

* if you have no option of avoiding the use of pgbouncer,
  then you can set statement_cache_size to 0 when creating
  the asyncpg connection object.

@elprans
Member

elprans commented Aug 15, 2023

statement_cache_size=0 does not mean that asyncpg will not use prepared statements at all, only that it will not attempt to use named prepared statements and re-use them. Unnamed prepared statements will still be used, as they're an essential part of the PostgreSQL extended query protocol.

Unfortunately, I'm unable to reproduce the error using your script (I tried on Python 3.10 and 3.11).

@zzzeek

zzzeek commented Aug 15, 2023

hi @elprans -

this error reproduces if you run it against pgbouncer in "transaction" mode, here is a config file I used for pgbouncer to reproduce this:

[databases]
test = host=127.0.0.1 dbname=test auth_user=scott password=tiger

[pgbouncer]
pool_mode = transaction
listen_port = 6432
listen_addr = localhost
auth_type = md5
auth_file = users.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = someuser
stats_users = stat_collector

Also, can we look at the error message that asyncpg is emitting, which is also in the FAQ? It suggests using "statement_cache_size = 0" in order to mitigate the issue, but our test case here shows that this is not sufficient. If you were able to reproduce this, would it be considered a bug, or is the whole notion of using "transaction" mode in pgbouncer something that asyncpg should say is explicitly not supported?

@elprans
Member

elprans commented Aug 15, 2023

this error reproduces if you run it against pgbouncer in "transaction" mode, here is a config file I used for pgbouncer to reproduce this

OK, thanks, can repro now.

if you were able to reproduce this, would it be considered a bug, or is the whole notion of using "transaction" mode in pgbouncer something that asyncpg should say is explicitly not supported?

It's technically a bug. Or, rather, insufficiently strong mitigation. For now a solution would be to force another Parse before Bind/Execute when statement_cache_size == 0, which would impose a certain performance penalty. Avoiding the second Parse might be possible, but not without a big code reorg.

That said, ps = conn.prepare(); ps.execute() would obviously still not work as it faces the same problem of prepared statement persistence. I don't think lying about statement "preparedness" here is a good idea, so you'll need to apply a similar workaround in SQLAlchemy for the pgbouncer case (i.e. only use prepare() for introspection and instead of calling ps.fetch() redo the query with conn.fetch()).

elprans added a commit that referenced this issue Aug 16, 2023
It appears that PgBouncer's `transaction` pooling mode does not consider
implicit transactions properly, and so in a [`Parse`, `Flush`, `Bind`,
`Execute`, `Sync`] sequence, `Flush` would be (incorrectly) considered by
PgBouncer as a transaction boundary and it will happily send the
following `Bind` / `Execute` messages to a different backend process.

This makes it so that when `statement_cache_size` is set to `0`, asyncpg
assumes a pessimistic stance on prepared statement persistence and does
not rely on them even in implicit transactions.  The above message
sequence thus becomes `Parse`, `Flush`, `Parse` (a second time), `Bind`,
`Execute`, `Sync`.

This obviously has negative performance impact due to the extraneous
`Parse`.

Fixes: #1058
Fixes: #1041
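The before/after message sequences described in the commit message above can be modeled as a tiny illustrative sketch. This is not asyncpg's actual implementation; the function and constant names here are hypothetical, for illustration only:

```python
# Hypothetical model of the protocol message sequences described above.
DEFAULT_SEQUENCE = ["Parse", "Flush", "Bind", "Execute", "Sync"]

def message_sequence(statement_cache_size: int) -> list[str]:
    if statement_cache_size == 0:
        # Pessimistic mode: issue a second Parse right before Bind so the
        # unnamed statement exists on whichever backend PgBouncer routed
        # the Bind/Execute to, at the cost of an extra round of parsing.
        return ["Parse", "Flush", "Parse", "Bind", "Execute", "Sync"]
    return list(DEFAULT_SEQUENCE)
```

The extra Parse is the "insufficiently strong mitigation" being strengthened: it trades a per-query performance penalty for correctness when PgBouncer splits the implicit transaction across backends.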
@zzzeek

zzzeek commented Aug 16, 2023

When used in this mode (and as seen in the test case), we are giving our prepared statements names and they run on pgbouncer without issue. The issue here is that a totally unrelated statement using conn.fetch() appears to be polluted by previous, unrelated activity on the connection. Why would conn.fetch() do anything with prepared statements when it is not using conn.prepare()?

We can "fix" this issue on our end by using conn.prepare() with a name in all cases; indeed, the poster here first proposed we simply revert our change that uses conn.fetch() for this one particular operation that does not require the overhead of prepared statements.

Additionally it seems, though maybe you can clarify, that it would be quite wasteful to do conn.prepare(), get the attributes for the result, and then do conn.fetch() anyway?

@elprans
Member

elprans commented Aug 16, 2023

when used in this mode (and as seen in the test case) we are giving our prepared statements names and they run on pgbouncer without issue

This is surprising to me. This is from the PgBouncer FAQ:

"How to use prepared statements with transaction pooling?

To make prepared statements work in this mode would need PgBouncer to keep track of them internally, which it does not do. So the only way to keep using PgBouncer in this mode is to disable prepared statements in the client."

why would conn.fetch() do anything with prepared statements when it is not using conn.prepare()

fetch() relies on the PostgreSQL extended query protocol, where prepared statements (named or unnamed) are part of the normal flow. asyncpg relies on prepared statement introspection in order to actually be able to encode argument data in Bind and decode row data in DataRow responses. The issue here is that PgBouncer seems to be completely broken with respect to implicit transactions at the protocol level.

@elprans
Member

elprans commented Aug 16, 2023

Here's an illustration of what PgBouncer sends to Postgres in the above script (I modified fetchrow(";") to fetchrow('select $1::int', 1)):

[2653098] LOG:  duration: 0.104 ms  statement: SET client_encoding='''utf-8''';
[2653098] LOG:  duration: 0.118 ms  parse __asyncpg_790766f0-8d39-42e6-9089-8afaa114e054__: select 1
[2653098] LOG:  duration: 0.057 ms  bind __asyncpg_790766f0-8d39-42e6-9089-8afaa114e054__: select 1
[2653098] LOG:  duration: 0.008 ms  execute __asyncpg_790766f0-8d39-42e6-9089-8afaa114e054__: select 1
[2653099] LOG:  duration: 0.121 ms  statement: SET client_encoding='''utf-8''';
[2653099] LOG:  duration: 0.159 ms  parse <unnamed>: select $1::int
[2653098] LOG:  duration: 0.025 ms  statement: SET client_encoding='''utf-8''';
[2653098] LOG:  duration: 0.063 ms  parse <unnamed>: select $1::int
[2653098] LOG:  duration: 0.020 ms  bind <unnamed>: select $1::int
[2653098] DETAIL:  parameters: $1 = '1'
[2653098] LOG:  duration: 0.003 ms  execute <unnamed>: select $1::int
[2653098] DETAIL:  parameters: $1 = '1'

Note two things:

  1. injection of SET client_encoding (this is what causes the unnamed prepared statement to get blown up)
  2. change of PID, indicating that the initial parse and the subsequent bind/execute run on different backends despite being logically part of the same transaction (there is no Sync message between the Parse and the Bind).

There is no reason why this wouldn't happen with explicit named prepared statements.

elprans added a commit that referenced this issue Aug 16, 2023
(same commit message as above)
@zzzeek

zzzeek commented Aug 16, 2023

This is surprising to me. This is from the PgBouncer FAQ:

"How to use prepared statements with transaction pooling?

To make prepared statements work in this mode would need PgBouncer to keep track of them internally, which it does not do. So the only way to keep using PgBouncer in this mode is to disable prepared statements in the client."

So it's weird they say that, because I think I understand the problem now, and that FAQ is not really accurate. I've googled a bit for other takes on this issue, and the issue really is nothing more than that "transaction" mode is going to give you a different connection for each command, as you've illustrated in your example above. So what the FAQ does not say is that prepared statements won't work if you did not actually begin a transaction. Because that's what "transaction" mode means: pgbouncer will give you the same connection for all commands if you open a transaction and work inside of it.

So we can make the above test case work just like this:

import asyncio
import uuid

import asyncpg

async def main():
    conn = await asyncpg.connect(
        user="scott",
        password="tiger",
        host="127.0.0.1",
        database="test",
        port=6432,
        statement_cache_size=0,
    )

    async with conn.transaction():
        pps = await conn.prepare("select 1", name=f'__asyncpg_{uuid.uuid4()}__')
        rows = await pps.fetch()

        # del or not, doesn't matter
        del pps

        (await conn.fetchrow(";"))


asyncio.run(main())

The other part of it is that the prepared statement we use explicitly has a name, using the feature added by #837; otherwise asyncpg will name it something that might already be present on a different PG connection that nonetheless looks like the same connection to asyncpg, as pgbouncer switches connections around behind the scenes. This itself could also be mitigated by using server_reset_query_always, while potentially (though not necessarily) setting server_reset_query to use DEALLOCATE ALL instead of DISCARD ALL. Again, pgbouncer's docs here are misleading when they use terminology like "This setting is for working around broken setups that run applications that use session features over a transaction-pooled PgBouncer."
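For reference, a PgBouncer fragment along those lines might look like the following. This is illustrative only; whether DEALLOCATE ALL (which drops only prepared statements) is preferable to the default DISCARD ALL depends on the setup:

```
[pgbouncer]
pool_mode = transaction
; drop prepared statements (only) when a server connection is reused
server_reset_query = DEALLOCATE ALL
; run the reset query after every transaction, not just on session release
server_reset_query_always = 1
```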

So what's happening in SQLAlchemy is that we've wrapped asyncpg behind a layer that provides pep-249 compatibility, which means all SQL commands implicitly auto-begin a transaction. That's why things work for us. This particular ";" statement is a liveness ping that occurs outside of any transaction, but only on the asyncpg driver; on other drivers like pg8000, the DBAPI always uses a transaction.

So: great news for asyncpg. I can fix this on my end just by making sure our "ping" runs inside a transaction. Clearly asyncpg can't do much here; it would be nice if pgbouncer had less opinionated documentation that wasn't so confusing to everyone.

@elprans
Member

elprans commented Aug 16, 2023

if you did not actually begin a transaction. Because that's what "transaction" mode means; pgbouncer will give you the same connection for all commands if you open a transaction and work inside that.

Yes, except they completely overlook that implicit transactions exist and break those.

clearly asyncpg can't do much here

I think I'm still going to merge the "redo the Parse if statement cache is off" PR.

@zzzeek

zzzeek commented Aug 16, 2023

sounds great!

elprans added a commit that referenced this issue Aug 17, 2023
…1065)

(same commit message as above)
@nsbradford

nsbradford commented Jan 7, 2024

Hi all - is there any workaround for this? I'm using SQLModel/SQLAlchemy on Supabase, but this is blocking our move to async.

Setting the cache to 0 doesn't seem to work; perhaps I'm doing this incorrectly? (The error only happens on the second query.)

  • asyncpg version: 0.29.0
  • SQLModel version: 0.0.11
  • sqlalchemy version: 1.4.51
  • PostgreSQL version: PostgreSQL 15.1
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: Supabase; did not attempt a local reproduction
  • Python version: 3.11.6
  • Platform: MacOS
  • Do you use pgbouncer?: Supabase recently switched to its own connection pooler, Supavisor: https://supabase.com/blog/supavisor-1-million
  • Did you install asyncpg with pip?: no, with poetry
  • If you built asyncpg locally, which version of Cython did you use?: N/A
  • Can the issue be reproduced under both asyncio and uvloop?: did not attempt

async_engine = create_async_engine(
    async_connection_str,
    pool_pre_ping=True,
    pool_size=8,
    connect_args={"statement_cache_size": 0},  # doesn't seem to work?
)

# first query is fine
async with AsyncSession(async_engine) as session:
    result = await session.get(MyRow, "123")
  
# second query throws error
async with AsyncSession(async_engine) as session:
    result = await session.get(MyRow, "123")

# sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) 
# <class 'asyncpg.exceptions.DuplicatePreparedStatementError'>: 
# prepared statement "__asyncpg_stmt_1__" already exists

edit: it's now started failing on the first query as well.

@wwarne

wwarne commented Jan 24, 2024

# <class 'asyncpg.exceptions.DuplicatePreparedStatementError'>: 
# prepared statement "__asyncpg_stmt_1__" already exists

Hi @nsbradford. I've had the same error and fixed it by adding 'prepared_statement_name_func': lambda: f'__asyncpg_{uuid.uuid4()}__', to connect_args, as described here (please check the Warning section there): https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#prepared-statement-name
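A minimal sketch of that name generator (the `make_name` helper is a hypothetical name; SQLAlchemy 2.0+ passes the callable via `connect_args`, shown as a comment):

```python
import uuid

def make_name() -> str:
    # Each call yields a globally unique statement name, so a pooled server
    # connection can never already hold a statement under the same name.
    return f"__asyncpg_{uuid.uuid4()}__"

# Hypothetical wiring (SQLAlchemy 2.0+):
# create_async_engine(url, connect_args={"prepared_statement_name_func": make_name})
```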

@nsbradford

nsbradford commented Jan 27, 2024

Thanks for the link! But now I get

connect() got an unexpected keyword argument 'prepared_statement_name_func'

Presumably because I'm on SQLAlchemy 1.4 instead of 2.0+. The legacy docs (https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.asyncpg) seem to suggest

engine = create_async_engine("postgresql+asyncpg://user:pass@hostname/dbname?prepared_statement_cache_size=0")

However, this doesn't seem to work either: I still get prepared statement "__asyncpg_stmt_1__" already exists and can't find another way to remove it.

        self.async_engine = create_async_engine(
            # https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.asyncpg
            postgres_connection_string.replace(
                "postgresql://", "postgresql+asyncpg://", 1
            ) + "?prepared_statement_cache_size=0",
            poolclass=NullPool,
            connect_args={"statement_cache_size": 0},
        )
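The DSN rewrite in that snippet can be factored into a small helper. A sketch only: `to_asyncpg_url` is a hypothetical name, and it assumes the input DSN starts with the plain `postgresql://` prefix:

```python
def to_asyncpg_url(dsn: str) -> str:
    # Swap in the asyncpg driver prefix, then disable SQLAlchemy's own
    # prepared-statement cache via a query-string parameter, using "&"
    # when the DSN already carries parameters.
    base = dsn.replace("postgresql://", "postgresql+asyncpg://", 1)
    sep = "&" if "?" in base else "?"
    return base + sep + "prepared_statement_cache_size=0"
```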

@wwarne

wwarne commented Jan 28, 2024

@nsbradford
Then maybe this approach with a custom Connection class could help: sqlalchemy/sqlalchemy#6467 (comment)
Could you try it, please?
(Adding ?prepared_statement_cache_size to the server URI didn't help me either.)

@nkhitrov

@nsbradford in my project I added prepared_statement_cache_size as a connection param instead of a URL param. Seems to be working with QueuePool and the overridden _get_unique_id (tested on staging).

# sqlalchemy 1.4 example
import uuid

from asyncpg import Connection
from sqlalchemy.ext.asyncio import create_async_engine

class UniquePreparedStatementConnection(Connection):
    def _get_unique_id(self, prefix: str) -> str:
        return f"__asyncpg_{prefix}_{uuid.uuid4()}__"

engine = create_async_engine(
    url=url,
    connect_args={
        "connection_class": UniquePreparedStatementConnection,
        "statement_cache_size": 0,
        "prepared_statement_cache_size": 0,
    },
)

# sqlalchemy 2.0 example

engine = create_async_engine(
    url=url,
    connect_args={
        "prepared_statement_name_func": lambda: f"__asyncpg_{uuid.uuid4()}__",
        "statement_cache_size": 0,
        "prepared_statement_cache_size": 0,
    },
)
