Feature request: 'get_attributes' without prepared statements #837

nikitagashkov · 2021-10-12T15:29:33Z

Hello!

Right now I'm trying out the AsyncSession in the SQLAlchemy@1.4 which uses asyncpg under the hood and as it turns out that specific configuration does not work well with pgbouncer even if you disable statement caching both in asyncpg and SQLAlchemy.

As discussed in the corresponding issue — sqlalchemy/sqlalchemy#6467 — SQLAlchemy uses Connection.prepare() regardless of user's decision on whether or not to use prepared statements. It is required to fetch the query attributes.

It looks like if there would be an ability to fetch attributes of the query without preparing it first, the issue could be mitigated. Therefore, I wanted to ask if there is a way to fetch those attributes without calling Connection.prepare() first?

cc @zzzeek

The text was updated successfully, but these errors were encountered:

elprans · 2021-10-12T16:34:02Z

I wanted to ask if there is a way to fetch those attributes without calling Connection.prepare() first?

Alas, no. Asyncpg is fundamentally built around the Postgres extended query protocol, and so the API and the implementation are built around that. The other way to get information about the query result is to run it with the simple query protocol, but we can't use that, because the protocol does not return enough information about the attribute types to decode them effectively and so we must introspect the type catalog to build the decoding pipeline.

Looking at sqlalchemy/sqlalchemy#6467, it seems like the issue isn't necessarily that prepare() doesn't work, it's that prepared statement names aren't unique across connections and pgbouncer mixes them up. Merging #775 might help with this, although I need to think whether that should be unconditional or based on some connection parameter, since UUID generation isn't necessarily very fast.

zzzeek · 2021-10-12T18:24:05Z

an option to provide a unique identifier would be great because then i dont have to change our prepared statement approach. if there's a way to allow the caller to provide the identifier i could allow it to be pluggable in case the uuid thing is "too slow" - but also if we are using prepared statement caching, that would reduce the uuid generation in any case since the uuid would be per prepared stattement only is that right?

elprans · 2021-10-12T18:52:46Z

an option to provide a unique identifier would be great

I'd like to avoid cluttering the API with too many knobs if we can avoid it.

if we are using prepared statement caching

prepare() intentionally does not use the builtin LRU cache, because the intent is to always return a new prepared statement (which you might or might not cache yourself). This is because you sometimes need to get a fresh and specific query plan instead of using the generic plan that might have been cached by Postgres for the asyncpg-cached prepared statement.

To clarify, I'm chiefly worried about systems where os.urandom() might block or is slow. uuid.uuid1() would potentially be less variadic, but it poses privacy risks (you'd be exposing your MAC address to the server). Maybe using a value from Python's PRNG would be enough?

zzzeek · 2021-10-12T20:17:44Z

yeah i know all about knobs and switches. so we cache those prepared statements on our end because we found this to improve performance a lot. so...we are talking about uuid/urandom/whatever when the prepared statement is created, and that's it, right.

PRNG is usually enough but you can also potentially combine it with os.pid() and perhaps something local to the host, such as the MAC address of the network interface, which i guess is about the same as uuid1(), although if you sha256 or sha512 that data that makes it...difficult? to reverse it, since brute force is the only approach. or you can provide the "mac" portion to uuid.uuid1() via the node parameter that can either be a PRNG or perhaps a sha512 of getnode().

this is the thing, on a private network, which is the 99% use case for python programs talking to postgresql databases, if the machines are on the same subnet the server sees the mac address anyway. then there is that use case where someone is connecting over routed internet to a potentially hostile postgresql server where they dont want you to be able to extract the mac address from the hash, being able to turn this feature on/off makes it a lot easier to suit these different cases.

this is why if you let the calling user send "the id", then the security of these tokens could be our problem and we would offer all the bells and whistles to provide it in different ways, not asyncpg. if this identifier is part of PG's prepared statement protocol then it's reasonable that asyncpg would expose this.

elprans · 2021-10-12T21:32:57Z

Fair enough. Adding the optional name keyword argument to prepare() wouldn't be too bad, I think.

zzzeek · 2021-10-12T23:04:52Z

so this is we assume the only issue people are having w/ pgbouncer? id collision of prepared statements from different connections?

elprans · 2021-10-12T23:10:24Z

Yeah, I think so.

This adds the new `name` keyword argument to `Connection.prepare()` and `PreparedStatement.get_name()` method returning the name of a statement. Some users of asyncpg might find it useful to be able to control how prepared statements are named, especially when a custom prepared statement caching scheme is in use. Specifically, This should help with pgbouncer support in SQLAlchemy asyncpg dialect. Fixes: #837.

…846) This adds the new `name` keyword argument to `Connection.prepare()` and `PreparedStatement.get_name()` method returning the name of a statement. Some users of asyncpg might find it useful to be able to control how prepared statements are named, especially when a custom prepared statement caching scheme is in use. Specifically, This should help with pgbouncer support in SQLAlchemy asyncpg dialect. Fixes: #837.

elprans mentioned this issue Nov 7, 2021

Make it possible to specify a statement name in Connection.prepare() #846

Merged

elprans closed this as completed in #846 Nov 16, 2021

zzzeek mentioned this issue Nov 16, 2021

Using statement_cache_size asyncpg setting / prepared statement name for asyncpg w pgbouncer sqlalchemy/sqlalchemy#6467

Closed

zzzeek mentioned this issue Aug 16, 2023

asyncpg uses prepared statements despite they're disabled #1058

Closed

awalias mentioned this issue Jan 29, 2024

How to use supavisor with asyncpg? supabase/supavisor#287

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: 'get_attributes' without prepared statements #837

Feature request: 'get_attributes' without prepared statements #837

nikitagashkov commented Oct 12, 2021

elprans commented Oct 12, 2021

zzzeek commented Oct 12, 2021

elprans commented Oct 12, 2021

zzzeek commented Oct 12, 2021

elprans commented Oct 12, 2021

zzzeek commented Oct 12, 2021

elprans commented Oct 12, 2021

Feature request: 'get_attributes' without prepared statements #837

Feature request: 'get_attributes' without prepared statements #837

Comments

nikitagashkov commented Oct 12, 2021

elprans commented Oct 12, 2021

zzzeek commented Oct 12, 2021

elprans commented Oct 12, 2021

zzzeek commented Oct 12, 2021

elprans commented Oct 12, 2021

zzzeek commented Oct 12, 2021

elprans commented Oct 12, 2021