-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support pyformat and keyword args #9
Comments
+1 for pyformat: we use massively this format for our queries, it's the most explicit, at least to me. |
I'll let this issue stay open for a while to see if a lot of people is interested in this feature. Adding named parameters will require us to tokenize SQL, which is something I don't want to do unless it's absolutely necessary. |
Will tokenizing hurt performance? I, too, prefer the dictionary vs ordered list approach. |
I can say that if this feature were added I would use it over ordered list. |
@1st1 Interested here as well. It makes everything so much easier |
@1st1 I'd like to see this implemented as well. It would make maintaining queries easier. With just ordered parameters you have to be very careful when updating queries, and it's easy to get it wrong, especially if you have lots of parameters. |
named keywords have proven more reliable for me as well. On brand with the python-zen 'explicit > implicit' credo as well. |
Why don't you just use little wrappers like this?
|
I'd very much like to see named parameters as well - ordered ones have proven error prone to the point where we've made it a code style across the company to use only named parameters unless there's some reason it's impossible. |
If tokenizing I'd suggest re-using any available code in libpq/psycopg2 |
psycopg2 is LGPL, we can't reuse any code from there. |
meant as a library :) |
See issue #9 for details. This is not ready to be committed yet. My main concern is the prolification of 'kwargs' argument in all APIs. Perhaps a better approach would be implemnt a Plugin/Middleware class.
There is a number of wrappers for asyncpg and SQLAlchemy: asyncpgsa, metapensiero.sqlalchemy.asyncpg, GINO, etc. All of those use SQLAlchemy to generate queries with the correct paramstyle. #196 has been on a backburner for a while, but it's not dead. We need to figure out how to do the query "middleware" interface properly. |
@1st1 hi there, This is an awesome library, but this is one of the features which is sorely missing. We've been thinking about making a wrapper which transforms the sql statement based on the kwargs positions, and then calling the vanilla execution, but I feel like we're missing something (the solution seems too simple to have been overlooked). What do you think? |
Hi @juandiegopalomino I've built buildpg for this purpose. It's well tested and I'm using it in production now, however I haven't yet got round to writing to docs, however, the unit tests should demonstrate what it can do. Basic usage is import asyncio
from buildpg import Var, asyncpg, render, select_fields
print(render('SELECT :x', x=4))
#> ('SELECT $1', [4])
async def main():
conn = await asyncpg.connect_b('postgresql://postgres:waffle@localhost/test')
# Execute a statement to create a new table.
await conn.fetch_b('SELECT * FROM foobar WHERE a=:a AND b=:b', a='foo', b='bar')
# executes 'SELECT * FROM foobar WHERE a=$1 AND b=$2' with params ['foo', 'bar']
await conn.fetchval_b(
'SELECT :fields FROM foobar WHERE :where',
fields=select_fields('first_name', 'last_name'),
where=(Var('foo') > 2) & ((Var('bar') == 4) | (Var('spam') != Var('whatever') ** 2))
)
# executes 'SELECT first_name, last_name FROM foobar WHERE foo > $1 AND (bar = $2 OR spam != whatever ^ $3)'
# with params [2, 4, 2]
await conn.close()
asyncio.get_event_loop().run_until_complete(main()) In summary:
If anyone's keen to use this, let me know and I'll write proper docs. I used colon variable references eg. |
@samuelcolvin hey, thanks for writing that but for now we're writing a wrapper that handles named & indexed parameters and multiple statements via sqlparse and the default string formatter. Thanks for the input though! |
Having server side query formatting is good for performance and security. I wish we could always rely on this in asyncpg. |
I'd love this feature as well... |
Would it be possible to transform a pyformat query to a native Postgres parameterized query ? |
"pyformat query" is a little ambiguous, but I think I understand what you mean. That's what buildpg does, it takes "sql + keyword arguments" and modifies that into "modified sql + positional arguments", then passes that to the standard asyncpg methods to in turn be passed to postgresql's prepared statements. See this class for details. I think it's the opinion of @1st1 that this should be done in a separate library, rather than as part of asyncpg, to avoid bloat. |
@1st1 - same here, would be great to have this feature. Has a decision been made it will not be implemented? From what I can see here it looks like it's been three years with no progress |
We have made a decision to not support keyword args directly in asyncpg. The goal of asyncpg is to be a minimal interface around native Postgres features. Query tokenization and rendering fall outside of this scope, and, as several commenters mentioned above, can be implemented in a wrapper. We are still open to the concept of middleware, as proposed in #152, which would make plugging in support for custom query syntaxes easier to accomplish. If someone is willing to champion that effort, I'd be happy to mentor and review. |
Testing it today: https://gitlab.com/osfda/asyncpg_utility Let me know what you think. (a standalone utility module, as you had preferred...) |
@bitmodeler Good start. I wish to see tests, documentation improvement and less repetitions in code. and absence of |
Unit test: can add a jacket that furnishes a query of one's choosing, and an expected result on one's choosing.
More Documentation: coming, once I'm pretty sure all the functionality is stable (didn't want to have to be revising it in lockstep, as I thought up better ways to implement it...)
Removed the code redundancies in the raising of errors (did not want to complicate the nesting of the traceback further, but that's alright...)
What would you want in place of sys.exit??
(I deemed it a critical error, so I presumed you would want the the code shut down...)
|
asyncio.get_event_loop().close() ?? |
Dependencies: why does the library depend on Unit tests are good even on the beginning. They show how should I use and how I shouldn't use your library, and also show where are possible usage obstacles and how to use them easier. This part doesn't replace documentation, though Documentation: the current documentation is really not enough even to try your code. Application exit: throwing an exception is more than enough to say something is going wrong, it's not libraries' work to decide when and how application should be shut down. |
asyncpg dependency: it prepares the statement, and passes through values to your fetch routines; rather than two calls for each, it's done with just one (no repetition; right?) As for insufficient documentation (keeping in mind more to come...): it shows a COMPLETE working example! If a person can''t get it from that, I doubt they will ever... |
Libraries' work is just to prepare statement, but I'd prefer to control every step on my own, because layer which prepares the statement and arguments should be separated from each other, just because of security and architecture reasons (e.g. log statements, set connection parameters before an execution, execute many statements on the same conn and more). But this library now works like I see scope of such library is:
I see things which are out of scope are done in this library:
docs:
BTW: I'm happy that you've written this anyway and want to shape it to a really usable code in production with tests and good documentation. |
"creating a pool and a wrapper around run_until_complete which are not required at all even for demo purposes" Those are just convenience helpers I added; one can opt to disregard them. I use them often (less wordy!) So you'll note it's a general purpose asyncpg "utility" module; not just the "prepared statement" module. "_exit (exit_code) static method which is dangerous and should be avoided (it's programmer's task to decide when to stop a program. This also doesn't required even for demo purposes (and will break tests)" Presently it's a pseudo-exit, added to allow a programmer to override it with the handling one chooses. Perhaps I call it "_ab_end" instead? "_key_err static method which can be inlined as it just a wrapper above KeyError(message)" A lot of controversy on methods for inlining in python; seeing as how this is not called often, I'll leave it be. Could use the "inline" module, but articles say it messes up cython (which is getting greater adoption now...) Or maybe I go back to introducing the minor repetition? "raising SyntaxError which is reserved for syntax errors for python code. You should use ValueError or make a subclass of it" SOLD! Will replace it. "why did you choose {{name}}, but not {name} which is more logical and pyformat compatible?" It's a big django standard for template variables, and was less likely to be accidentally used for other purposes in one's fabrication of SQL queries. Adding unit testing today; will make a temporary table as part of the unit test... |
I feel this conversation has gone off topic. |
? We're providing a solution to the problem that the original topic was all about... |
Providing a solution involves a single comment with a link to a library that can be used as a workaround. Any further discussion about that workaround belongs in my opinion on that libraries issue tracker. |
OK; signing off this thread. Anybody needing the named parameter solution can find it at the git link posted above. It's an add-on class that leverages off of the existing asyncpg codebase. [As I say on the git: I think named parameters are much safer than positional ones, and in my opinion it should be the rule and not the exception...] |
PS Thanks for the constructive suggestions, Arseny... |
@samuelcolvin thank you for a notification @bitmodeler Thank you for a discussion |
By the way, Arseny: you were right! The query parsing needed to be decoupled from the statement generation, for reasons of thread efficiency (otherwise, the threads using prepared statements might get bottlenecked on one connection; accomplished that by using two classes...) [Just wanted to credit Arseny for the good advice in this thread, Sam...] |
pyformat could be simply converted to psql-format using import collections
import itertools
from typing import Any, Dict, Tuple, List
def pyformat2psql(query: str, named_args: Dict[str, Any]) -> Tuple[str, List[Any]]:
positional_generator = itertools.count(1)
positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
formatted_query = query % positional_map
positional_items = sorted(
positional_map.items(),
key=lambda item: int(item[1].replace('$', '')),
)
positional_args = [named_args[named_arg] for named_arg, _ in positional_items]
return formatted_query, positional_args Example: query = "SELECT * FROM user WHERE name = %(name)s, email = %(email)s"
named_args = {'name': 'John Cena', 'email': 'cena@example.org'}
new_query, positional_args = pyformat2psql(query, named_args)
print(new_query) # prints SELECT * FROM user WHERE name = $1, email = $2
print(positional_args) # prints ['John Cena', 'cena@example.org'] |
Is there any sql alchemy core implementation for asyncpg? I don't want to use raw queries
Working code:
But I want sqlalchemy version without raw query and table name. |
SQLAlchemy 1.4 supports asyncpg natively: https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#asynchronous-io-support-for-core-and-orm |
@sany2k8 there is also asyncpgsa : https://github.com/CanopyTax/asyncpgsa |
Thank you @cordalace for the polished solution for developers who want optional named parameters! 👍 Should seriously be considered for inclusion as a core utility function, considering how popular this story is! Also to be fair many of us do not want to introduce ORM bloat. Named parameters are a highly valuable (for many), yet tiny, feature. That said, thank you for asyncpg. I'm astounded at the quality of this project! @elprans @1st1 |
@cordalace built upon your code above for a "many" variant, list of dicts... seems to work, but curious how you would implement!? Thank you for your code too! def pyformat2psql_many(query: str, named_args: List[Dict[str, Any]]) -> Tuple[str, List[Any]]:
arg_items = []
positional_generator = itertools.count(1)
positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
formatted_query = query % positional_map
for arg_item in named_args:
positional_items = sorted(
positional_map.items(),
key=lambda item: int(item[1].replace('$', '')),
)
positional_args = [arg_item[named_arg] for named_arg, _ in positional_items]
arg_items.append(positional_args)
return formatted_query, arg_items |
Considering Python dictionaries are now ordered and that the format_map method doesn't create a new dictionary on every call, like format does, one could use a Serial dict to solve this problem: # Serial Dict
#############
class Serial(dict):
def __getitem__(self, key):
return f"${list(self.keys()).index(key) + 1}"
# Tests
#######
parameters = Serial(foo='FOO', bar='BAR', baz='BAZ')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $1 from $2 where baz=$3" # Ordered Parameters
parameters = Serial(baz='BAZ', foo='FOO', bar='BAR')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $2 from $3 where baz=$1" # Unordered Parameters
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
query = "select {foo} from {foo} where baz={baz}".format_map(parameters)
assert query == "select $1 from $1 where baz=$2" # Repeated Parameters
# Mocking Fetch
###############
def mock_fetch(query, *args):
assert query == "select $1 from $1 where baz=$2"
assert args == ('FOO', 'BAZ')
query = "select {foo} from {foo} where baz={baz}"
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
mock_fetch(query.format_map(parameters), *parameters.values()) NOTE: The examples here are not valid PostgreSQL statements. These were used just to demonstrate the usage of format_map to create serial numeric placeholders based on a dictionary.
https://www.postgresql.org/docs/current/xfunc-sql.html Hope this might be helpful for some of you! :D |
Wanted to handle auto-removal of arguments not present in the query, instead of throwing an error. Based on @tebanep's snippet. Enjoy: # Serial Dictionary.
class Serial(dict):
def __getitem__(self, key):
return f"${list(self.keys()).index(key) + 1}"
# Pyformat to psql format.
def pyformat2psql(query: str, args_dict: Dict[str, Any]) -> Tuple[str, List[Any]]:
# Remove args not present in query.
args_list = list(args_dict.keys())
for value in args_list:
if f"{{{value}}}" not in query:
args_dict.pop(value, None)
# Generate query with serial positions.
args = Serial(args_dict)
query_formatted = query.format_map(args)
args_formatted = list(args.values())
return query_formatted, args_formatted Usage # Will just ignore 'foo' instead of throwing error.
query, params = pyformat2psql("SELECT * FROM users where name={baz}", {'foo': 'bob2', 'baz': 'bob'})
result = await conn.fetch(query, *params) |
This is almost perfect, but what about an .executemany() variant? 😄 |
Based on the code provided by @cordalace and @tebanep
|
It looks like for prepared statements, statements have to be in the form
'select $1 from $2 where baz=$3
with the arguments('foo', 'bar', 'baz')
It is common in python database api's to support the pyformat int PEP 249. http://legacy.python.org/dev/peps/pep-0249/#paramstyle
Can this support at least one of the param styles in the pep? Preferably pyformat.
Ideally, the parameters could then be passed in as a dictionary instead of a ordered list.
conn.fetch(myquery, {bar='bar', foo='foo'})
The text was updated successfully, but these errors were encountered: