Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
the issue with a local PostgreSQL install?: local
Python version: 3.7
Platform: Windows
Do you use pgbouncer?: No
Did you install asyncpg with pip?: Yes
If you built asyncpg locally, which version of Cython did you use?: N/A (installed with pip)
Can the issue be reproduced under both asyncio and uvloop?: not tested with uvloop
I am building a DB for storing financial tick data, so I was testing insert performance for different DBs using different libs. For Postgres I used psycopg2 in sync mode, aiopg, and asyncpg. I was inserting 120000 rows of symbol and OHLC data.
I was getting very bad insert performance with asyncpg's executemany().
For creating sample data:
import asyncio
import string
import random
from time import time

import numpy as np
import pandas as pd

import asyncpg
import psycopg2
import aiopg

# Number of securities to insert
SECURITIES = 2000


def string_gen(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))


def generate_random_data():
    """Generate random data for inserting to DB"""
    index = pd.date_range(start='2018-01-01 00:00:01',
                          end='2018-01-01 00:01:00', freq='s')
    dflist = []
    for _ in range(SECURITIES):
        data = np.random.rand(len(index), 4)
        data = pd.DataFrame(data, index=index,
                            columns=['open', 'high', 'low', 'close'])
        data['symbol'] = string_gen()
        dflist.append(data)
    data = pd.concat(dflist)
    data.index.name = 'time'
    data = data.reset_index()
    data = data[['time', 'symbol', 'open', 'high', 'low', 'close']]
    return [tuple(x) for x in data.values]
For psycopg2, the insert took 5 to 6 seconds:
template = "(%s, %s, %s, %s, %s, %s)"  # one placeholder per column
args_str = b','.join(cur.mogrify(template, row) for row in data)
args_str = args_str.decode('utf-8')  # convert byte string to UTF-8
cur.execute("INSERT INTO ohlc (time, symbol, open, high, low, close) VALUES " + args_str)
conn.commit()
For asyncpg, executemany() took 30 seconds, so I tried building the full INSERT statement with psycopg2's mogrify and executing it through asyncpg; that still took 7 to 8 seconds.
Similarly, with aiopg I was getting 6 to 7 seconds.
I guess I am doing something wrong. Since the DB is the same for all three libs, we can rule out performance issues with the DB itself.
You are comparing apples to oranges. connection.executemany() essentially runs the INSERT query 2000 times, whereas in your psycopg2 test you mogrify arguments and run the query only once. In your case I recommend using copy_records_to_table() instead:
Connection.executemany() is not the fastest choice for bulk inserts;
copy_records_to_table() is a better choice, so make it easier to find by
putting a note in the executemany() documentation. See #346 for an
example of the resulting performance confusion.