
Investigate the feasibility of implementing executemany #36

Closed
styvane opened this issue Oct 27, 2016 · 6 comments

Comments

@styvane

styvane commented Oct 27, 2016

How do I insert multiple rows without a traditional for loop?
Something like executemany in psycopg2.

I am asking because I could not find any example or method for this in the documentation.

What I am currently doing is:

async def insertdemo(data, dsn=DSN):
    async with asyncpg.create_pool(dsn) as pool:
        async with pool.acquire() as con:
            async with con.transaction():
                # Insert all rows in one statement by passing each column
                # of the data as an array to unnest().
                stmt = '''insert into demo (num, name) select * from unnest($1::int[], $2::varchar[]);'''
                await con.execute(stmt, *zip(*data))

I would like to avoid having to unzip the data array.
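For reference, the zip(*data) in the snippet above is the step that transposes row tuples into the per-column sequences expected by unnest(). A minimal pure-Python sketch of just that transposition (the sample rows are made up for illustration):

```python
# Row-oriented data, as it would be passed to insertdemo() above.
data = [(1, "alice"), (2, "bob"), (3, "carol")]

# zip(*data) transposes rows into columns: one sequence per query
# parameter, matching $1::int[] and $2::varchar[] in the unnest() query.
nums, names = zip(*data)
print(nums)   # (1, 2, 3)
print(names)  # ('alice', 'bob', 'carol')
```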

@styvane styvane changed the title Insert many row with prepare query How to insert many rows? Oct 27, 2016
@wrobell

wrobell commented Nov 8, 2016

According to https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/, asyncpg supports arrays and composite types, so I can imagine that determining the semantics of value in con.execute(query, value) can be tricky.

However, it would be good to have at least an executemany method, so that

for row in data:
    await con.execute(query, *row)

can be executed as

await con.executemany(query, data)

where data could be any iterable.

BTW, how exactly does con.execute(query, *row) executed multiple times work? Does it send each statement over the network one by one, or are multiple rows bundled into a batch?

@wrobell

wrobell commented Nov 8, 2016

Also, looking at http://initd.org/psycopg/docs/usage.html

# Pass data to fill a query placeholders and let Psycopg perform
# the correct conversion (no more SQL injections!)
>>> cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)",
...      (100, "abc'def"))

so asyncpg's connection execute method seems to be incompatible with the DB-API?

@wrobell

wrobell commented Nov 8, 2016

What about changing execute method signature to

    async def execute(self, query: str, *args, data=None, timeout: float=None) -> str:

If args is non-empty, then raise a deprecation warning. If both args and data are specified, then raise TypeError. After a few asyncpg versions, the args positional parameters could be removed.

@elprans
Member

elprans commented Nov 14, 2016

asyncpg explicitly supports prepared statements, so there is no need for a dedicated executemany method. Simply iterate over the data in a loop, calling the prepared statement:

ps = await conn.prepare('INSERT INTO test(num, data) VALUES($1, $2)')
for row in data:
    await ps.fetchval(*row)

Or, since the statements are prepared by default, you can use an even simpler form:

for row in data:
    await conn.execute('INSERT INTO test(num, data) VALUES($1, $2)', *row)
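To make the round-trip behavior of this loop form concrete, here is a sketch against a stub connection (StubConnection is a made-up stand-in for asyncpg's Connection, used only so the example runs without a database): it shows that the loop issues one execute() call per row.

```python
import asyncio

class StubConnection:
    """Made-up stand-in for asyncpg's Connection; records each call."""
    def __init__(self):
        self.calls = []

    async def execute(self, query, *args):
        self.calls.append((query, args))
        return "INSERT 0 1"  # asyncpg-style status string

async def insert_rows(conn, data):
    # The loop form from the comment above: one execute() per row.
    for row in data:
        await conn.execute('INSERT INTO test(num, data) VALUES($1, $2)', *row)

conn = StubConnection()
asyncio.run(insert_rows(conn, [(100, "abc"), (200, "def")]))
print(len(conn.calls))  # 2 -- one statement execution per row
```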

@elprans elprans closed this as completed Nov 14, 2016
@wrobell

wrobell commented Nov 15, 2016

IMHO, a dedicated executemany method would simplify the API. It is easier to write

await conn.executemany('...', data)

than the whole loop. If this were implemented at the Cython level, wouldn't it be faster as well?

I hope you do not mind that I repeat the question from my first comment: are the rows iterated in the loop sent one by one, or are the statements batched?

@elprans
Member

elprans commented Nov 15, 2016

IMHO, a dedicated executemany method would simplify the API. It is easier to write

await conn.executemany('...', data)

than the whole loop. If this were implemented at the Cython level, wouldn't it be faster as well?

No, high-level APIs in asyncpg are not Cythonized, and even if they were, the performance difference would be minimal.

I hope you do not mind that I repeat the question from my first comment: are the rows iterated in the loop sent one by one, or are the statements batched?

There is no batching here; psycopg2 does the same row-by-row statement execution.

That said, the event loop overhead needs to be benchmarked for this case. If it turns out to be significant, we will look into implementing the executemany method.

Meanwhile you can create a helper method using the loop technique mentioned above.
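Such a helper, built on the loop technique above, could look like the sketch below (executemany here is a hypothetical name, and StubConnection is a made-up object standing in for a real asyncpg connection so the example runs without a database):

```python
import asyncio

async def executemany(conn, query, args_iter):
    """Hypothetical helper: run `query` once per argument tuple.

    `conn` is assumed to expose an asyncpg-style
    execute(query, *args) coroutine.
    """
    for args in args_iter:
        await conn.execute(query, *args)

class StubConnection:
    """Made-up connection that records arguments instead of hitting a DB."""
    def __init__(self):
        self.calls = []

    async def execute(self, query, *args):
        self.calls.append(args)

conn = StubConnection()
asyncio.run(executemany(conn, 'INSERT INTO t(a, b) VALUES($1, $2)',
                        [(1, 'x'), (2, 'y'), (3, 'z')]))
print(conn.calls)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```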

@elprans elprans reopened this Nov 15, 2016
@elprans elprans changed the title How to insert many rows? Investigate the feasibility of implementing executemany Nov 15, 2016
elprans added a commit that referenced this issue Nov 16, 2016
The executemany() method runs an SQL command for each sequence
of arguments in the given iterable.

Closes: #36