asyncpg uses prepared statements despite being disabled #1058

Closed
ionsome opened this issue Aug 11, 2023 · 14 comments · Fixed by #1065

Comments

@ionsome

ionsome commented Aug 11, 2023

  • asyncpg version: 0.28.0
  • PostgreSQL version: 14
  • Python version: 3.10
  • Platform: OSX
  • Do you use pgbouncer?: yes (transactional mode)
  • Did you install asyncpg with pip?: yes

Started from sqlalchemy/sqlalchemy#10226.

asyncpg uses prepared statements despite being disabled.

To reproduce (thanks @zzzeek for the script!):

import asyncpg
import asyncio
import uuid

async def main():
    conn = await asyncpg.connect(
        user="scott",
        password="tiger",
        host="127.0.0.1",
        database="test",
        port=6432,
        statement_cache_size=0,
    )

    pps = await conn.prepare("select 1", name=f'__asyncpg_{uuid.uuid4()}__')
    rows = await pps.fetch()

    # remove this 'del' and the error goes away
    del pps

    (await conn.fetchrow(";"))


asyncio.run(main())

Error

Traceback (most recent call last):
  File "/user/app/script.py", line 57, in <module>
    asyncio.run(main())
  File "/user/.pyenv/versions/3.10.3/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/user/.pyenv/versions/3.10.3/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/user/app/script.py", line 53, in main
    (await conn.fetchrow(";"))
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 678, in fetchrow
    data = await self._execute(
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 1658, in _execute
    result, _ = await self.__execute(
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 1683, in __execute
    return await self._do_execute(
  File "/user/app/venv/lib/python3.10/site-packages/asyncpg/connection.py", line 1730, in _do_execute
    result = await executor(stmt, None)
  File "asyncpg/protocol/protocol.pyx", line 201, in bind_execute
asyncpg.exceptions.InvalidSQLStatementNameError: unnamed prepared statement does not exist
HINT:  
NOTE: pgbouncer with pool_mode set to "transaction" or
"statement" does not support prepared statements properly.
You have two options:

* if you are using pgbouncer for connection pooling to a
  single server, switch to the connection pool functionality
  provided by asyncpg, it is a much better option for this
  purpose;

* if you have no option of avoiding the use of pgbouncer,
  then you can set statement_cache_size to 0 when creating
  the asyncpg connection object.

@elprans
Member

elprans commented Aug 15, 2023

statement_cache_size=0 does not mean that asyncpg will not use prepared statements at all, only that it will not attempt to use named prepared statements and re-use them. Unnamed prepared statements will still be used, as they're an essential part of the PostgreSQL extended query protocol.

Unfortunately, I'm unable to reproduce the error using your script (I tried on Python 3.10 and 3.11).

@zzzeek

zzzeek commented Aug 15, 2023

hi @elprans -

this error reproduces if you run it against pgbouncer in "transaction" mode, here is a config file I used for pgbouncer to reproduce this:

[databases]
test = host=127.0.0.1 dbname=test auth_user=scott password=tiger

[pgbouncer]
pool_mode = transaction
listen_port = 6432
listen_addr = localhost
auth_type = md5
auth_file = users.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = someuser
stats_users = stat_collector

Also, can we look at the error message that asyncpg is emitting, which is also in the FAQ? It suggests using "statement_cache_size = 0" in order to mitigate the issue, but our test case here shows that this is not sufficient. If you were able to reproduce this, would it be considered a bug, or is the whole notion of using "transaction" mode in pgbouncer something that asyncpg should say is explicitly not supported?

@elprans
Member

elprans commented Aug 15, 2023

this error reproduces if you run it against pgbouncer in "transaction" mode, here is a config file I used for pgbouncer to reproduce this

OK, thanks, can repro now.

if you were able to reproduce this, would it be considered a bug, or is the whole notion of using "transaction" mode in pgbouncer something that asyncpg should say is explicitly not supported?

It's technically a bug. Or, rather, insufficiently strong mitigation. For now a solution would be to force another Parse before Bind/Execute when statement_cache_size == 0, which would impose a certain performance penalty. Avoiding the second Parse might be possible, but not without a big code reorg.

That said, ps = conn.prepare(); ps.execute() would obviously still not work as it faces the same problem of prepared statement persistence. I don't think lying about statement "preparedness" here is a good idea, so you'll need to apply a similar workaround in SQLAlchemy for the pgbouncer case (i.e. only use prepare() for introspection and instead of calling ps.fetch() redo the query with conn.fetch()).

elprans added a commit that referenced this issue Aug 16, 2023
It appears that PgBouncer's `transaction` pooling mode does not consider
implicit transactions properly, and so in a [`Parse`, `Flush`, `Bind`,
`Execute`, `Sync`] sequence, `Flush` would be (incorrectly) considered by
PgBouncer as a transaction boundary and it will happily send the
following `Bind` / `Execute` messages to a different backend process.

This makes it so that when `statement_cache_size` is set to `0`, asyncpg
assumes a pessimistic stance on prepared statement persistence and does
not rely on them even in implicit transactions.  The above message
sequence thus becomes `Parse`, `Flush`, `Parse` (a second time), `Bind`,
`Execute`, `Sync`.

This obviously has negative performance impact due to the extraneous
`Parse`.

Fixes: #1058
Fixes: #1041
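The before/after message sequences described in the commit message above can be modeled as a tiny illustrative sketch. This is not asyncpg's actual implementation; the function and constant names here are hypothetical, for illustration only:

```python
# Hypothetical model of the protocol message sequences described above.
DEFAULT_SEQUENCE = ["Parse", "Flush", "Bind", "Execute", "Sync"]

def message_sequence(statement_cache_size: int) -> list[str]:
    if statement_cache_size == 0:
        # Pessimistic mode: issue a second Parse right before Bind so the
        # unnamed statement exists on whichever backend PgBouncer routed
        # the Bind/Execute to, at the cost of an extra round of parsing.
        return ["Parse", "Flush", "Parse", "Bind", "Execute", "Sync"]
    return list(DEFAULT_SEQUENCE)
```

The extra Parse is the "insufficiently strong mitigation" being strengthened: it trades a per-query performance penalty for correctness when PgBouncer splits the implicit transaction across backends.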
@zzzeek

zzzeek commented Aug 16, 2023

When used in this mode (and as seen in the test case), we are giving our prepared statements names and they run on pgbouncer without issue. The issue here is that a totally unrelated statement using conn.fetch() appears to be polluted by previous, unrelated activity on the connection. Why would conn.fetch() do anything with prepared statements when it is not using conn.prepare()?

We can "fix" this issue on our end by using conn.prepare() with a name in all cases; indeed, the poster here first proposed we simply revert our change that uses conn.fetch() for this one particular operation that does not require the overhead of prepared statements.

Additionally it seems, though maybe you can clarify, that it would be quite wasteful to do conn.prepare(), get the attributes for the result, and then do conn.fetch() anyway?

@elprans
Member

elprans commented Aug 16, 2023

when used in this mode (and as seen in the test case) we are giving our prepared statements names and they run on pgbouncer without issue

This is surprising to me. This is from the PgBouncer FAQ:

"How to use prepared statements with transaction pooling?

To make prepared statements work in this mode would need PgBouncer to keep track of them internally, which it does not do. So the only way to keep using PgBouncer in this mode is to disable prepared statements in the client."

why would conn.fetch() do anything with prepared statements when it is not using conn.prepare()

fetch() relies on the PostgreSQL extended query protocol, where prepared statements (named or unnamed) are part of the normal flow. asyncpg relies on prepared statement introspection in order to actually be able to encode argument data in Bind and decode row data in DataRow responses. The issue here is that PgBouncer seems to be completely broken with respect to implicit transactions at the protocol level.

@elprans
Member

elprans commented Aug 16, 2023

Here's an illustration of what PgBouncer sends to Postgres in the above script (I modified fetchrow(";") to fetchrow('select $1::int', 1)):

[2653098] LOG:  duration: 0.104 ms  statement: SET client_encoding='''utf-8''';
[2653098] LOG:  duration: 0.118 ms  parse __asyncpg_790766f0-8d39-42e6-9089-8afaa114e054__: select 1
[2653098] LOG:  duration: 0.057 ms  bind __asyncpg_790766f0-8d39-42e6-9089-8afaa114e054__: select 1
[2653098] LOG:  duration: 0.008 ms  execute __asyncpg_790766f0-8d39-42e6-9089-8afaa114e054__: select 1
[2653099] LOG:  duration: 0.121 ms  statement: SET client_encoding='''utf-8''';
[2653099] LOG:  duration: 0.159 ms  parse <unnamed>: select $1::int
[2653098] LOG:  duration: 0.025 ms  statement: SET client_encoding='''utf-8''';
[2653098] LOG:  duration: 0.063 ms  parse <unnamed>: select $1::int
[2653098] LOG:  duration: 0.020 ms  bind <unnamed>: select $1::int
[2653098] DETAIL:  parameters: $1 = '1'
[2653098] LOG:  duration: 0.003 ms  execute <unnamed>: select $1::int
[2653098] DETAIL:  parameters: $1 = '1'

Note two things:

  1. injection of SET client_encoding (this is what causes the unnamed prepared statement to get blown up)
  2. change of PID, indicating that the initial parse and the subsequent bind/execute run on different backends despite being logically part of the same transaction (there is no Sync message between the Parse and the Bind).

There is no reason why this wouldn't happen with explicit named prepared statements.

elprans added a commit that referenced this issue Aug 16, 2023
(same commit message as above)
@zzzeek

zzzeek commented Aug 16, 2023

This is surprising to me. This is from the PgBouncer FAQ:

"How to use prepared statements with transaction pooling?

To make prepared statements work in this mode would need PgBouncer to keep track of them internally, which it does not do. So the only way to keep using PgBouncer in this mode is to disable prepared statements in the client."

So it's weird they say that, because I think I understand the problem now, and that FAQ is not really accurate. I've googled a bit for other takes on this issue, and the issue really is nothing more than that "transaction" mode is going to give you a different connection for each command, as you've illustrated in your example above. So what the FAQ does not say is that prepared statements won't work if you did not actually begin a transaction. Because that's what "transaction" mode means: pgbouncer will give you the same connection for all commands if you open a transaction and work inside of it.

So we can make the above test case work just like this:

import asyncio
import uuid

import asyncpg

async def main():
    conn = await asyncpg.connect(
        user="scott",
        password="tiger",
        host="127.0.0.1",
        database="test",
        port=6432,
        statement_cache_size=0,
    )

    async with conn.transaction():
        pps = await conn.prepare("select 1", name=f'__asyncpg_{uuid.uuid4()}__')
        rows = await pps.fetch()

        # del or not, doesn't matter
        del pps

        (await conn.fetchrow(";"))


asyncio.run(main())

The other part of it is that the prepared statement we use explicitly has a name, using the feature added by #837; otherwise asyncpg will name it something that might already be present on a different PG connection that nonetheless looks like the same connection to asyncpg, as pgbouncer switches connections around behind the scenes. This itself could also be mitigated by using server_reset_query_always, while potentially (though not necessarily) setting server_reset_query to use DEALLOCATE ALL instead of DISCARD ALL. Again, pgbouncer's docs here are misleading when they use terminology like "This setting is for working around broken setups that run applications that use session features over a transaction-pooled PgBouncer."
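For reference, a PgBouncer fragment along those lines might look like the following. This is illustrative only; whether DEALLOCATE ALL (which drops only prepared statements) is preferable to the default DISCARD ALL depends on the setup:

```
[pgbouncer]
pool_mode = transaction
; drop prepared statements (only) when a server connection is reused
server_reset_query = DEALLOCATE ALL
; run the reset query after every transaction, not just on session release
server_reset_query_always = 1
```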

So what's happening in SQLAlchemy is that we've wrapped asyncpg behind a layer that provides pep-249 compatibility, which means all SQL commands implicitly auto-begin a transaction. That's why things work for us. This particular ";" statement is a liveness ping that occurs outside of any transaction, but only on the asyncpg driver; on other drivers like pg8000, the DBAPI always uses a transaction.

So: great news for asyncpg. I can fix this on my end just by making sure our "ping" runs inside a transaction. Clearly asyncpg can't do much here; it would be nice if pgbouncer had less opinionated documentation that wasn't so confusing to everyone.

@elprans
Member

elprans commented Aug 16, 2023

if you did not actually begin a transaction. Because that's what "transaction" mode means; pgbouncer will give you the same connection for all commands if you open a transaction and work inside that.

Yes, except they completely overlook that implicit transactions exist and break those.

clearly asyncpg can't do much here

I think I'm still going to merge the "redo the Parse if statement cache is off" PR.

@zzzeek

zzzeek commented Aug 16, 2023

sounds great!

elprans added a commit that referenced this issue Aug 17, 2023
…1065)

(same commit message as above)
@nsbradford

nsbradford commented Jan 7, 2024

Hi all - is there any workaround for this? I'm using SQLModel/SQLAlchemy on Supabase, but this is blocking our move to async.

Setting the cache to 0 doesn't seem to work; perhaps I'm doing this incorrectly? (The error only happens on the second query.)

  • asyncpg version: 0.29.0
  • SQLModel version: 0.0.11
  • sqlalchemy version: 1.4.51
  • PostgreSQL version: PostgreSQL 15.1
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: Supabase; did not attempt a local reproduction
  • Python version: 3.11.6
  • Platform: MacOS
  • Do you use pgbouncer?: Supabase recently switched to its own connection pooler, Supavisor: https://supabase.com/blog/supavisor-1-million
  • Did you install asyncpg with pip?: no, with poetry
  • If you built asyncpg locally, which version of Cython did you use?: N/A
  • Can the issue be reproduced under both asyncio and uvloop?: did not attempt

async_engine = create_async_engine(
    async_connection_str,
    pool_pre_ping=True,
    pool_size=8,
    connect_args={"statement_cache_size": 0},  # doesn't seem to work?
)

# first query is fine
async with AsyncSession(async_engine) as session:
    result = await session.get(MyRow, "123")
  
# second query throws error
async with AsyncSession(async_engine) as session:
    result = await session.get(MyRow, "123")

# sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) 
# <class 'asyncpg.exceptions.DuplicatePreparedStatementError'>: 
# prepared statement "__asyncpg_stmt_1__" already exists

edit: it's now started failing on the first query as well.

@wwarne

wwarne commented Jan 24, 2024

# <class 'asyncpg.exceptions.DuplicatePreparedStatementError'>: 
# prepared statement "__asyncpg_stmt_1__" already exists

Hi @nsbradford. I've had the same error and fixed it by adding 'prepared_statement_name_func': lambda: f'__asyncpg_{uuid.uuid4()}__', to connect_args, as described here (please check the Warning section there): https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#prepared-statement-name
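A minimal sketch of that name generator (the `make_name` helper is a hypothetical name; SQLAlchemy 2.0+ passes the callable via `connect_args`, shown as a comment):

```python
import uuid

def make_name() -> str:
    # Each call yields a globally unique statement name, so a pooled server
    # connection can never already hold a statement under the same name.
    return f"__asyncpg_{uuid.uuid4()}__"

# Hypothetical wiring (SQLAlchemy 2.0+):
# create_async_engine(url, connect_args={"prepared_statement_name_func": make_name})
```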

@nsbradford

nsbradford commented Jan 27, 2024

Thanks for the link! But now I get

connect() got an unexpected keyword argument 'prepared_statement_name_func'

Presumably because I'm on SQLAlchemy 1.4 instead of 2.0+. The legacy docs (https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.asyncpg) seem to suggest

engine = create_async_engine("postgresql+asyncpg://user:pass@hostname/dbname?prepared_statement_cache_size=0")

However, this doesn't seem to work either: I still get prepared statement "__asyncpg_stmt_1__" already exists and can't find another way to remove it.

        self.async_engine = create_async_engine(
            # https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.asyncpg
            postgres_connection_string.replace(
                "postgresql://", "postgresql+asyncpg://", 1
            ) + "?prepared_statement_cache_size=0",
            poolclass=NullPool,
            connect_args={"statement_cache_size": 0},
        )
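The DSN rewrite in that snippet can be factored into a small helper. A sketch only: `to_asyncpg_url` is a hypothetical name, and it assumes the input DSN starts with the plain `postgresql://` prefix:

```python
def to_asyncpg_url(dsn: str) -> str:
    # Swap in the asyncpg driver prefix, then disable SQLAlchemy's own
    # prepared-statement cache via a query-string parameter, using "&"
    # when the DSN already carries parameters.
    base = dsn.replace("postgresql://", "postgresql+asyncpg://", 1)
    sep = "&" if "?" in base else "?"
    return base + sep + "prepared_statement_cache_size=0"
```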

@wwarne

wwarne commented Jan 28, 2024

@nsbradford
Then maybe this approach with a custom Connection class could help: sqlalchemy/sqlalchemy#6467 (comment)
Could you try it, please?
(Adding ?prepared_statement_cache_size to the server URI didn't help me either.)

@nkhitrov

@nsbradford in my project I added prepared_statement_cache_size as a connection param instead of a URL param. Seems to be working with QueuePool and the overridden _get_unique_id (tested on staging).

# sqlalchemy 1.4 example
import uuid

from asyncpg import Connection
from sqlalchemy.ext.asyncio import create_async_engine

class UniquePreparedStatementConnection(Connection):
    def _get_unique_id(self, prefix: str) -> str:
        return f"__asyncpg_{prefix}_{uuid.uuid4()}__"

engine = create_async_engine(
    url=url,
    connect_args={
        "connection_class": UniquePreparedStatementConnection,
        "statement_cache_size": 0,
        "prepared_statement_cache_size": 0,
    },
)

# sqlalchemy 2.0 example

engine = create_async_engine(
    url=url,
    connect_args={
        "prepared_statement_name_func": lambda: f"__asyncpg_{uuid.uuid4()}__",
        "statement_cache_size": 0,
        "prepared_statement_cache_size": 0,
    },
)
