Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
the issue with a local PostgreSQL install?: local
Python version: 3.7
Platform: Windows
Do you use pgbouncer?: No
Did you install asyncpg with pip?: Yes
If you built asyncpg locally, which version of Cython did you use?: N/A (installed with pip)
Can the issue be reproduced under both asyncio and uvloop?: not tested with uvloop
I am building a DB for storing financial tick data, so I was testing insert performance for different DBs using different libs. For Postgres I used psycopg2 in sync mode, aiopg, and asyncpg. I was inserting 120000 rows of symbol and OHLC data.
I was getting very bad insert performance with asyncpg's executemany().
For creating sample data:
import asyncio
import string
import random
from time import time

import numpy as np
import pandas as pd

import asyncpg
import psycopg2
import aiopg

# Number of securities to insert
SECURITIES = 2000


def string_gen(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))


def generate_random_data():
    """Generate random data for inserting to DB"""
    index = pd.date_range(start='2018-01-01 00:00:01',
                          end='2018-01-01 00:01:00', freq='s')
    dflist = []
    for _ in range(SECURITIES):
        data = np.random.rand(len(index), 4)
        data = pd.DataFrame(data, index=index,
                            columns=['open', 'high', 'low', 'close'])
        data['symbol'] = string_gen()
        dflist.append(data)
    data = pd.concat(dflist)
    data.index.name = 'time'
    data = data.reset_index()
    data = data[['time', 'symbol', 'open', 'high', 'low', 'close']]
    return [tuple(x) for x in data.values]
For psycopg2, the insert took 5 to 6 seconds:
template = "(%s, %s, %s, %s, %s, %s)"  # one placeholder per column
args_str = b','.join(cur.mogrify(template, row) for row in data)
args_str = args_str.decode('utf-8')  # convert byte string to UTF-8
cur.execute("INSERT INTO ohlc (time, symbol, open, high, low, close) VALUES " + args_str)
conn.commit()
For asyncpg, executemany() took 30 seconds, so I tried building the full INSERT statement with psycopg2's mogrify and executing it through asyncpg; that still took 7 to 8 seconds.
Similarly, with aiopg I was getting 6 to 7 seconds.
I guess I am doing something wrong. Since the DB is the same for all three libs, we can rule out performance issues with the DB itself.
You are comparing apples to oranges. connection.executemany() essentially runs the INSERT query 2000 times, whereas in your psycopg2 test you mogrify arguments and run the query only once. In your case I recommend using copy_records_to_table() instead:
Connection.executemany() is not the fastest choice for bulk inserts;
copy_records_to_table() is a better choice, so make it easier to find by
putting a note in the executemany() documentation. See #346 for an
example of the resulting performance confusion.