Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pyformat and keyword args #9

Closed
nhumrich opened this issue Jul 25, 2016 · 50 comments
Closed

Support pyformat and keyword args #9

nhumrich opened this issue Jul 25, 2016 · 50 comments

Comments

@nhumrich
Copy link

nhumrich commented Jul 25, 2016

It looks like for prepared statements, statements have to be in the form
'select $1 from $2 where baz=$3 with the arguments ('foo', 'bar', 'baz')

It is common in python database api's to support the pyformat int PEP 249. http://legacy.python.org/dev/peps/pep-0249/#paramstyle

Can this support at least one of the param styles in the pep? Preferably pyformat.

Ideally, the parameters could then be passed in as a dictionary instead of a ordered list.

conn.fetch(myquery, {bar='bar', foo='foo'})

@ludovic-gasc
Copy link

+1 for pyformat: we use massively this format for our queries, it's the most explicit, at least to me.

@1st1
Copy link
Member

1st1 commented Jul 26, 2016

I'll let this issue stay open for a while to see if a lot of people is interested in this feature. Adding named parameters will require us to tokenize SQL, which is something I don't want to do unless it's absolutely necessary.

@1st1 1st1 added the feedback label Jul 26, 2016
@gigatexal
Copy link

Will tokenizing hurt performance? I, too, prefer the dictionary vs ordered list approach.

@sethmlarson
Copy link

I can say that if this feature were added I would use it over ordered list.

@gglanzani
Copy link

@1st1 Interested here as well. It makes everything so much easier

@treycucco
Copy link

@1st1 I'd like to see this implemented as well. It would make maintaining queries easier. With just ordered parameters you have to be very careful when updating queries, and it's easy to get it wrong, especially if you have lots of parameters.

@bobcolner
Copy link

named keywords have proven more reliable for me as well. On brand with the python-zen 'explicit > implicit' credo as well.

@imbolc
Copy link

imbolc commented Oct 28, 2016

Why don't you just use little wrappers like this?

def insert(conn, table, fields=None, **kwargs):                                                                               
    fields = fields or {}                                                                                                     
    fields.update(kwargs)                                                                                                     
    items = list(fields.items())                                                                                              
    keys, values = [x[0] for x in items], [x[1] for x in items]                                                               
    placeholders = ['${}'.format(i) for i, _ in enumerate(values, 1)]                                                         
    query = 'INSERT INTO {} ({}) VALUES ({}) RETURNING id'.format(                                                            
        table, ', '.join(keys), ', '.join(placeholders))                                                                      
    return conn.fetchval(query, *values)

id = await insert(conn, 'users', {'email': 'foo@gmail.com', 'name': 'foo'})                                               
id = await insert(conn, 'users', email='bar@gmail.com', name='bar') 

@Daenyth
Copy link

Daenyth commented Sep 12, 2017

I'd very much like to see named parameters as well - ordered ones have proven error prone to the point where we've made it a code style across the company to use only named parameters unless there's some reason it's impossible.

@thehesiod
Copy link

If tokenizing I'd suggest re-using any available code in libpq/psycopg2

@elprans
Copy link
Member

elprans commented Sep 12, 2017

If tokenizing I'd suggest re-using any available code in libpq/psycopg2

psycopg2 is LGPL, we can't reuse any code from there.

@thehesiod
Copy link

meant as a library :)

1st1 added a commit that referenced this issue Sep 15, 2017
See issue #9 for details.

This is not ready to be committed yet.  My main concern is the
prolification of 'kwargs' argument in all APIs.  Perhaps a better
approach would be implemnt a Plugin/Middleware class.
@samuelcolvin
Copy link
Contributor

is there any progress on this?

Both #196 and #152 seem to be cold. For me this is by far the biggest missing feature from asyncpg.

Or would it actually be simpler to leave this out of asyncpg and implement it in a wrapper?

@elprans
Copy link
Member

elprans commented May 30, 2018

There is a number of wrappers for asyncpg and SQLAlchemy: asyncpgsa, metapensiero.sqlalchemy.asyncpg, GINO, etc. All of those use SQLAlchemy to generate queries with the correct paramstyle.

#196 has been on a backburner for a while, but it's not dead. We need to figure out how to do the query "middleware" interface properly.

@juandiegopalomino
Copy link

@1st1 hi there,

This is an awesome library, but this is one of the features which is sorely missing. We've been thinking about making a wrapper which transforms the sql statement based on the kwargs positions, and then calling the vanilla execution, but I feel like we're missing something (the solution seems too simple to have been overlooked). What do you think?

@samuelcolvin
Copy link
Contributor

samuelcolvin commented Jul 31, 2018

Hi @juandiegopalomino I've built buildpg for this purpose.

It's well tested and I'm using it in production now, however I haven't yet got round to writing to docs, however, the unit tests should demonstrate what it can do.

Basic usage is

import asyncio
from buildpg import Var, asyncpg, render, select_fields

print(render('SELECT :x', x=4))
#> ('SELECT $1', [4])

async def main():
    conn = await asyncpg.connect_b('postgresql://postgres:waffle@localhost/test')
    # Execute a statement to create a new table.
    await conn.fetch_b('SELECT * FROM foobar WHERE a=:a AND b=:b', a='foo', b='bar')
    # executes 'SELECT * FROM foobar WHERE a=$1 AND b=$2' with params ['foo', 'bar']

    await conn.fetchval_b(
        'SELECT :fields FROM foobar WHERE :where',
        fields=select_fields('first_name', 'last_name'),
        where=(Var('foo') > 2) & ((Var('bar') == 4) | (Var('spam') != Var('whatever') ** 2))
    )
    # executes 'SELECT first_name, last_name FROM foobar WHERE foo > $1 AND (bar = $2 OR spam != whatever ^ $3)'
    # with params [2, 4, 2]

    await conn.close()

asyncio.get_event_loop().run_until_complete(main())

In summary:

  • the render method can be used to prepare sql and positional parameters
  • from buildpg import asyncpg gets you a thin wrapper around asyncpg, this provides connect_b and create_pool_b (the b suffix denotes the query "builder" variant), these just use a connection class BuildPgConnection .
  • BuildPgConnection just adds execute_b, fetch_b, fetchval_b etc. which take keyword arguments instead of positional arguments
  • buildpg supports simple keyword arguments but also arbitrarily complex query construction. I find this approach of writing sql but allowing logic construction in python much easier to use than having to reverse engineer a full ORM like sqlalchemy.

If anyone's keen to use this, let me know and I'll write proper docs.

I used colon variable references eg. :foo rather than {foo}) to avoid confusion with sql braces and since that approach seems to be more common, but it's possible to create your own render which uses different regexes.

@juandiegopalomino
Copy link

@samuelcolvin hey, thanks for writing that but for now we're writing a wrapper that handles named & indexed parameters and multiple statements via sqlparse and the default string formatter. Thanks for the input though!

@bersace
Copy link

bersace commented Oct 22, 2018

Having server side query formatting is good for performance and security. I wish we could always rely on this in asyncpg.

@pvanliefland
Copy link

I'd love this feature as well...

@bersace
Copy link

bersace commented Dec 3, 2018

Would it be possible to transform a pyformat query to a native Postgres parameterized query ?

@samuelcolvin
Copy link
Contributor

"pyformat query" is a little ambiguous, but I think I understand what you mean.

That's what buildpg does, it takes "sql + keyword arguments" and modifies that into "modified sql + positional arguments", then passes that to the standard asyncpg methods to in turn be passed to postgresql's prepared statements. See this class for details.

I think it's the opinion of @1st1 that this should be done in a separate library, rather than as part of asyncpg, to avoid bloat.

@mikhaildubov
Copy link

@1st1 - same here, would be great to have this feature. Has a decision been made it will not be implemented? From what I can see here it looks like it's been three years with no progress

@elprans
Copy link
Member

elprans commented Oct 3, 2019

We have made a decision to not support keyword args directly in asyncpg. The goal of asyncpg is to be a minimal interface around native Postgres features. Query tokenization and rendering fall outside of this scope, and, as several commenters mentioned above, can be implemented in a wrapper.

We are still open to the concept of middleware, as proposed in #152, which would make plugging in support for custom query syntaxes easier to accomplish. If someone is willing to champion that effort, I'd be happy to mentor and review.

@elprans elprans closed this as completed Oct 3, 2019
@bitmodeler
Copy link

Testing it today: https://gitlab.com/osfda/asyncpg_utility

Let me know what you think.

(a standalone utility module, as you had preferred...)

@eirnym
Copy link

eirnym commented Nov 15, 2019

@bitmodeler Good start. I wish to see tests, documentation improvement and less repetitions in code. and absence of sys.exit, obviously.

@bitmodeler
Copy link

bitmodeler commented Nov 15, 2019 via email

@bitmodeler
Copy link

asyncio.get_event_loop().close() ??

@eirnym
Copy link

eirnym commented Nov 16, 2019

Dependencies: why does the library depend on asyncpg and asyncio while it do just string formatting and nothing else?

Unit tests are good even on the beginning. They show how should I use and how I shouldn't use your library, and also show where are possible usage obstacles and how to use them easier. This part doesn't replace documentation, though

Documentation: the current documentation is really not enough even to try your code.

Application exit: throwing an exception is more than enough to say something is going wrong, it's not libraries' work to decide when and how application should be shut down.

@bitmodeler
Copy link

bitmodeler commented Nov 16, 2019

asyncpg dependency: it prepares the statement, and passes through values to your fetch routines; rather than two calls for each, it's done with just one (no repetition; right?)

As for insufficient documentation (keeping in mind more to come...): it shows a COMPLETE working example! If a person can''t get it from that, I doubt they will ever...

@eirnym
Copy link

eirnym commented Nov 17, 2019

@bitmodeler,

Libraries' work is just to prepare statement, but I'd prefer to control every step on my own, because layer which prepares the statement and arguments should be separated from each other, just because of security and architecture reasons (e.g. log statements, set connection parameters before an execution, execute many statements on the same conn and more). But this library now works like format function will print out data to /dev/stdout.

I see scope of such library is:

  1. re-format statement with names to statement with numbers
  2. provide a dictionary name to list of indices and a simple wrapper around (I'd prefer to have an ability of my own implementations for an optimization)
  3. no additional functions included (they are could be nice for a demo, but not required or dangerous for a production code)

I see things which are out of scope are done in this library:

  1. creating a pool and a wrapper around run_until_complete which are not required at all even for demo purposes
  2. _exit (exit_code) static method which is dangerous and should be avoided (it's programmer's task to decide when to stop a program. This also doesn't required even for demo purposes (and will break tests)
  3. _key_err static method which can be inlined as it just a wrapper above KeyError(message)
  4. raising SyntaxError which is reserved for syntax errors for python code. You should use ValueError or make a subclass of it
  5. some reasons to replace ReferenceError with something else
  6. calling asyncpg. A programmer can do it himself.
  7. absence of ability to just format values without a asyncpg call.

docs:

  1. I see, there's just no syntax highlighting.
  2. No comments in an example code highlighting features
  3. There should be an example with custom types
  4. why did you choose {{name}}, but not {name} which is more logical and pyformat compatible?

BTW: I'm happy that you've written this anyway and want to shape it to a really usable code in production with tests and good documentation.

@bitmodeler
Copy link

"creating a pool and a wrapper around run_until_complete which are not required at all even for demo purposes"

Those are just convenience helpers I added; one can opt to disregard them. I use them often (less wordy!) So you'll note it's a general purpose asyncpg "utility" module; not just the "prepared statement" module.

"_exit (exit_code) static method which is dangerous and should be avoided (it's programmer's task to decide when to stop a program. This also doesn't required even for demo purposes (and will break tests)"

Presently it's a pseudo-exit, added to allow a programmer to override it with the handling one chooses. Perhaps I call it "_ab_end" instead?

"_key_err static method which can be inlined as it just a wrapper above KeyError(message)"

A lot of controversy on methods for inlining in python; seeing as how this is not called often, I'll leave it be. Could use the "inline" module, but articles say it messes up cython (which is getting greater adoption now...) Or maybe I go back to introducing the minor repetition?

"raising SyntaxError which is reserved for syntax errors for python code. You should use ValueError or make a subclass of it"

SOLD! Will replace it.
Open to a suggested replacement for ReferenceError.
[I would rather shy away from making custom exceptions, because that will just be more idiosyncratic stuff a person has to know to use it...]

"why did you choose {{name}}, but not {name} which is more logical and pyformat compatible?"

It's a big django standard for template variables, and was less likely to be accidentally used for other purposes in one's fabrication of SQL queries.

Adding unit testing today; will make a temporary table as part of the unit test...

@samuelcolvin
Copy link
Contributor

I feel this conversation has gone off topic.

@bitmodeler
Copy link

?

We're providing a solution to the problem that the original topic was all about...

@samuelcolvin
Copy link
Contributor

Providing a solution involves a single comment with a link to a library that can be used as a workaround.

Any further discussion about that workaround belongs in my opinion on that libraries issue tracker.

@bitmodeler
Copy link

OK; signing off this thread. Anybody needing the named parameter solution can find it at the git link posted above. It's an add-on class that leverages off of the existing asyncpg codebase.

[As I say on the git: I think named parameters are much safer than positional ones, and in my opinion it should be the rule and not the exception...]

@bitmodeler
Copy link

PS Thanks for the constructive suggestions, Arseny...

@eirnym
Copy link

eirnym commented Nov 17, 2019

@samuelcolvin thank you for a notification

@bitmodeler Thank you for a discussion

@bitmodeler
Copy link

By the way, Arseny: you were right! The query parsing needed to be decoupled from the statement generation, for reasons of thread efficiency (otherwise, the threads using prepared statements might get bottlenecked on one connection; accomplished that by using two classes...)

[Just wanted to credit Arseny for the good advice in this thread, Sam...]

@cordalace
Copy link

pyformat could be simply converted to psql-format using defaultdict:

import collections
import itertools
from typing import Any, Dict, Tuple, List


def pyformat2psql(query: str, named_args: Dict[str, Any]) -> Tuple[str, List[Any]]:
    positional_generator = itertools.count(1)
    positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
    formatted_query = query % positional_map
    positional_items = sorted(
        positional_map.items(),
        key=lambda item: int(item[1].replace('$', '')),
    )
    positional_args = [named_args[named_arg] for named_arg, _ in positional_items]
    return formatted_query, positional_args

Example:

query = "SELECT * FROM user WHERE name = %(name)s, email = %(email)s"
named_args = {'name': 'John Cena', 'email': 'cena@example.org'}
new_query, positional_args = pyformat2psql(query, named_args)
print(new_query)  # prints SELECT * FROM user WHERE name = $1, email = $2
print(positional_args)  # prints ['John Cena', 'cena@example.org']

@sany2k8
Copy link

sany2k8 commented Dec 1, 2020

Is there any sql alchemy core implementation for asyncpg? I don't want to use raw queries

statement = (
                insert(table_obj).
                    values(named_args)
            )

Working code:

async def insert_named_parameter(self, table, fields=None, **kwargs):
     fields = fields or {}
     fields.update(kwargs)
     items = list(fields.items())
     keys, values = [x[0] for x in items], [x[1] for x in items]
     placeholders = ['${}'.format(i) for i, _ in enumerate(values, 1)]
     query = 'INSERT INTO {} ({}) VALUES ({})'.format(table, ', '.join(keys), ', '.join(placeholders))
     return await self.connection.execute(query, *values)

 async def insert_keyword(self, keyword_data):

     if keyword_data:
         keyword_data['update_timestamp'] = datetime.datetime.now()
         row = await self.insert_named_parameter('keywords', keyword_data)
         return row
     else:
         return None

But I want sqlalchemy version without raw query and table name.

@elprans
Copy link
Member

elprans commented Dec 1, 2020

SQLAlchemy 1.4 supports asyncpg natively: https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#asynchronous-io-support-for-core-and-orm

@nhumrich
Copy link
Author

@sany2k8 there is also asyncpgsa : https://github.com/CanopyTax/asyncpgsa

@gnat
Copy link

gnat commented Jan 1, 2021

Thank you @cordalace for the polished solution for developers who want optional named parameters! 👍 Should seriously be considered for inclusion as a core utility function, considering how popular this story is!

Also to be fair many of us do not want to introduce ORM bloat. Named parameters are a highly valuable (for many), yet tiny, feature.

That said, thank you for asyncpg. I'm astounded at the quality of this project! @elprans @1st1

@singlecheeze
Copy link

@cordalace built upon your code above for a "many" variant, list of dicts... seems to work, but curious how you would implement!? Thank you for your code too!

def pyformat2psql_many(query: str, named_args: List[Dict[str, Any]]) -> Tuple[str, List[Any]]:
    arg_items = []
    positional_generator = itertools.count(1)
    positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
    formatted_query = query % positional_map

    for arg_item in named_args:
        positional_items = sorted(
            positional_map.items(),
            key=lambda item: int(item[1].replace('$', '')),
        )
        positional_args = [arg_item[named_arg] for named_arg, _ in positional_items]
        arg_items.append(positional_args)

    return formatted_query, arg_items

@tebanep
Copy link

tebanep commented Jul 1, 2021

Considering Python dictionaries are now ordered and that the format_map method doesn't create a new dictionary on every call, like format does, one could use a Serial dict to solve this problem:

# Serial Dict
#############

class Serial(dict):
    def __getitem__(self, key):
        return f"${list(self.keys()).index(key) + 1}"


# Tests
#######

parameters = Serial(foo='FOO', bar='BAR', baz='BAZ')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $1 from $2 where baz=$3"  # Ordered Parameters


parameters = Serial(baz='BAZ', foo='FOO', bar='BAR')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $2 from $3 where baz=$1"  # Unordered Parameters


parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
query = "select {foo} from {foo} where baz={baz}".format_map(parameters)
assert query == "select $1 from $1 where baz=$2"  # Repeated Parameters


# Mocking Fetch
###############

def mock_fetch(query, *args):
    assert query == "select $1 from $1 where baz=$2"
    assert args == ('FOO', 'BAZ')


query = "select {foo} from {foo} where baz={baz}"
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})

mock_fetch(query.format_map(parameters), *parameters.values())

NOTE: The examples here are not valid PostgreSQL statements. These were used just to demonstrate the usage of format_map to create serial numeric placeholders based on a dictionary.

SQL function arguments can only be used as data values, not as identifiers.

https://www.postgresql.org/docs/current/xfunc-sql.html

Hope this might be helpful for some of you! :D

@gnat
Copy link

gnat commented Jul 21, 2021

Wanted to handle auto-removal of arguments not present in the query, instead of throwing an error. Based on @tebanep's snippet.

Enjoy:

# Serial Dictionary.
class Serial(dict):
    def __getitem__(self, key):
        return f"${list(self.keys()).index(key) + 1}"

# Pyformat to psql format.
def pyformat2psql(query: str, args_dict: Dict[str, Any]) -> Tuple[str, List[Any]]:
    # Remove args not present in query.
    args_list = list(args_dict.keys())
    for value in args_list:
        if f"{{{value}}}" not in query:
            args_dict.pop(value, None)
    # Generate query with serial positions.
    args = Serial(args_dict)
    query_formatted = query.format_map(args)
    args_formatted = list(args.values())
    return query_formatted, args_formatted

Usage

# Will just ignore 'foo' instead of throwing error.
query, params = pyformat2psql("SELECT * FROM users where name={baz}", {'foo': 'bob2', 'baz': 'bob'})
result = await conn.fetch(query, *params)

@singlecheeze
Copy link

Considering Python dictionaries are now ordered and that the format_map method doesn't create a new dictionary on every call, like format does, one could use a Serial dict to solve this problem:

# Serial Dict
#############

class Serial(dict):
    def __getitem__(self, key):
        return f"${list(self.keys()).index(key) + 1}"


# Tests
#######

parameters = Serial(foo='FOO', bar='BAR', baz='BAZ')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $1 from $2 where baz=$3"  # Ordered Parameters


parameters = Serial(baz='BAZ', foo='FOO', bar='BAR')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $2 from $3 where baz=$1"  # Unordered Parameters


parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
query = "select {foo} from {foo} where baz={baz}".format_map(parameters)
assert query == "select $1 from $1 where baz=$2"  # Repeated Parameters


# Mocking Fetch
###############

def mock_fetch(query, *args):
    assert query == "select $1 from $1 where baz=$2"
    assert args == ('FOO', 'BAZ')


query = "select {foo} from {foo} where baz={baz}"
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})

mock_fetch(query.format_map(parameters), *parameters.values())

NOTE: The examples here are not valid PostgreSQL statements. These were used just to demonstrate the usage of format_map to create serial numeric placeholders based on a dictionary.

SQL function arguments can only be used as data values, not as identifiers.

https://www.postgresql.org/docs/current/xfunc-sql.html

Hope this might be helpful for some of you! :D

This is almost perfect, but what about an .executemany() variant? 😄

@svaponi
Copy link

svaponi commented Jul 27, 2024

Based on the code provided by @cordalace and @tebanep

import collections
import itertools
import typing


def pyformat2psql(
    query: str,
    named_args: dict[str, typing.Any] = None,
    /,
    **kwargs,
) -> (str, list[typing.Any]):
    named_args = named_args or {}
    named_args.update(kwargs)
    positional_generator = itertools.count(1)
    positional_map = collections.defaultdict(
        lambda: "${}".format(next(positional_generator))
    )
    formatted_query = query % positional_map
    positional_items = sorted(
        positional_map.items(),
        key=lambda item: int(item[1].replace("$", "")),
    )
    positional_args = [named_args[named_arg] for named_arg, _ in positional_items]
    return formatted_query, positional_args


class _Serial(dict):
    def __getitem__(self, key):
        return f"${list(self.keys()).index(key) + 1}"


def format2psql(
    query: str,
    named_args: dict[str, typing.Any] = None,
    /,
    **kwargs,
) -> (str, list[typing.Any]):
    named_args = named_args or {}
    named_args.update(kwargs)
    parameters = _Serial(named_args)
    return query.format_map(parameters), list(parameters.values())


def test_pyformat2psql():
    query, positional_args = pyformat2psql(
        "SELECT * FROM user WHERE foo = %(foo)s AND name = %(name)s AND email = %(email)s",
        {"name": "John Cena", "email": "cena@example.org"},
        foo="FOO",
    )
    assert query == "SELECT * FROM user WHERE foo = $1 AND name = $2 AND email = $3"
    assert positional_args == ["FOO", "John Cena", "cena@example.org"]


def test_format2psql():
    query, positional_args = format2psql(
        "SELECT * FROM user WHERE foo = {foo} AND name = {name} AND email = {email}",
        {"name": "John Cena", "email": "cena@example.org"},
        foo="FOO",
    )
    assert query == "SELECT * FROM user WHERE foo = $3 AND name = $1 AND email = $2"
    assert positional_args == ["John Cena", "cena@example.org", "FOO"]

    query, positional_args = format2psql(
        "select {foo} from {bar} where baz={baz}", foo="FOO", bar="BAR", baz="BAZ"
    )
    assert query == "select $1 from $2 where baz=$3"
    assert positional_args == ["FOO", "BAR", "BAZ"]

    query, positional_args = format2psql(
        "select {foo} from {bar} where baz={baz}", baz="BAZ", foo="FOO", bar="BAR"
    )
    assert query == "select $2 from $3 where baz=$1"
    assert positional_args == ["BAZ", "FOO", "BAR"]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests