Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: 'get_attributes' without prepared statements #837

Closed
nikitagashkov opened this issue Oct 12, 2021 · 7 comments · Fixed by #846
Closed

Feature request: 'get_attributes' without prepared statements #837

nikitagashkov opened this issue Oct 12, 2021 · 7 comments · Fixed by #846

Comments

@nikitagashkov
Copy link

Hello!

Right now I'm trying out the AsyncSession in the SQLAlchemy@1.4 which uses asyncpg under the hood and as it turns out that specific configuration does not work well with pgbouncer even if you disable statement caching both in asyncpg and SQLAlchemy.

As discussed in the corresponding issue — sqlalchemy/sqlalchemy#6467SQLAlchemy uses Connection.prepare() regardless of user's decision on whether or not to use prepared statements. It is required to fetch the query attributes.

It looks like if there would be an ability to fetch attributes of the query without preparing it first, the issue could be mitigated. Therefore, I wanted to ask if there is a way to fetch those attributes without calling Connection.prepare() first?

cc @zzzeek

@elprans
Copy link
Member

elprans commented Oct 12, 2021

I wanted to ask if there is a way to fetch those attributes without calling Connection.prepare() first?

Alas, no. Asyncpg is fundamentally built around the Postgres extended query protocol, and so the API and the implementation are built around that. The other way to get information about the query result is to run it with the simple query protocol, but we can't use that, because the protocol does not return enough information about the attribute types to decode them effectively and so we must introspect the type catalog to build the decoding pipeline.

Looking at sqlalchemy/sqlalchemy#6467, it seems like the issue isn't necessarily that prepare() doesn't work, it's that prepared statement names aren't unique across connections and pgbouncer mixes them up. Merging #775 might help with this, although I need to think whether that should be unconditional or based on some connection parameter, since UUID generation isn't necessarily very fast.

@zzzeek
Copy link

zzzeek commented Oct 12, 2021

an option to provide a unique identifier would be great because then i dont have to change our prepared statement approach. if there's a way to allow the caller to provide the identifier i could allow it to be pluggable in case the uuid thing is "too slow" - but also if we are using prepared statement caching, that would reduce the uuid generation in any case since the uuid would be per prepared stattement only is that right?

@elprans
Copy link
Member

elprans commented Oct 12, 2021

an option to provide a unique identifier would be great

I'd like to avoid cluttering the API with too many knobs if we can avoid it.

if we are using prepared statement caching

prepare() intentionally does not use the builtin LRU cache, because the intent is to always return a new prepared statement (which you might or might not cache yourself). This is because you sometimes need to get a fresh and specific query plan instead of using the generic plan that might have been cached by Postgres for the asyncpg-cached prepared statement.

To clarify, I'm chiefly worried about systems where os.urandom() might block or is slow. uuid.uuid1() would potentially be less variadic, but it poses privacy risks (you'd be exposing your MAC address to the server). Maybe using a value from Python's PRNG would be enough?

@zzzeek
Copy link

zzzeek commented Oct 12, 2021

yeah i know all about knobs and switches. so we cache those prepared statements on our end because we found this to improve performance a lot. so...we are talking about uuid/urandom/whatever when the prepared statement is created, and that's it, right.

PRNG is usually enough but you can also potentially combine it with os.pid() and perhaps something local to the host, such as the MAC address of the network interface, which i guess is about the same as uuid1(), although if you sha256 or sha512 that data that makes it...difficult? to reverse it, since brute force is the only approach. or you can provide the "mac" portion to uuid.uuid1() via the node parameter that can either be a PRNG or perhaps a sha512 of getnode().

this is the thing, on a private network, which is the 99% use case for python programs talking to postgresql databases, if the machines are on the same subnet the server sees the mac address anyway. then there is that use case where someone is connecting over routed internet to a potentially hostile postgresql server where they dont want you to be able to extract the mac address from the hash, being able to turn this feature on/off makes it a lot easier to suit these different cases.

this is why if you let the calling user send "the id", then the security of these tokens could be our problem and we would offer all the bells and whistles to provide it in different ways, not asyncpg. if this identifier is part of PG's prepared statement protocol then it's reasonable that asyncpg would expose this.

@elprans
Copy link
Member

elprans commented Oct 12, 2021

Fair enough. Adding the optional name keyword argument to prepare() wouldn't be too bad, I think.

@zzzeek
Copy link

zzzeek commented Oct 12, 2021

so this is we assume the only issue people are having w/ pgbouncer? id collision of prepared statements from different connections?

@elprans
Copy link
Member

elprans commented Oct 12, 2021

Yeah, I think so.

elprans added a commit that referenced this issue Nov 7, 2021
This adds the new `name` keyword argument to `Connection.prepare()` and
`PreparedStatement.get_name()` method returning the name of a statement.

Some users of asyncpg might find it useful to be able to control how
prepared statements are named, especially when a custom prepared
statement caching scheme is in use.  Specifically, This should help with
pgbouncer support in SQLAlchemy asyncpg dialect.

Fixes: #837.
elprans added a commit that referenced this issue Nov 16, 2021
…846)

This adds the new `name` keyword argument to `Connection.prepare()` and
`PreparedStatement.get_name()` method returning the name of a statement.

Some users of asyncpg might find it useful to be able to control how
prepared statements are named, especially when a custom prepared
statement caching scheme is in use.  Specifically, This should help with
pgbouncer support in SQLAlchemy asyncpg dialect.

Fixes: #837.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants