
Postgres: execute_batch still significantly slower than copy command #47

Status: Open
gvenzl opened this issue Jun 14, 2020 · 3 comments
Labels: enhancement (New feature or request), Postgres (Postgres related issues)

gvenzl (Collaborator) commented Jun 14, 2020

As discovered as part of #44, the execute_batch facility in psycopg2, used for Postgres, still leaves a lot of room for improvement. In a local test, the native Postgres COPY command loaded the same amount of data about 40x faster (1m51.6s vs. 2.8s) than batch inserts using execute_batch.

Using execute_batch – 1m51s:

$ wc -l 201701-citibike-tripdata.csv
  726677 201701-citibike-tripdata.csv

$ ls -lh 201701-citibike-tripdata.csv
   117M Jun 13 12:35 201701-citibike-tripdata.csv

$ time csv2db load -f 201701-citibike-tripdata.csv -o postgres -u test -d test -t test

Loading file 201701-citibike-tripdata.csv
File loaded.

real	1m51.615s
user	0m17.277s
sys	0m1.015s
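For context, the batch path presumably follows the standard psycopg2 execute_batch pattern. A minimal sketch, assuming the "test" table, the column list, and the connection are placeholders (not csv2db's actual code):

```python
# Sketch only: psycopg2 must be installed, and conn is an assumed
# live connection; the helper itself is hypothetical.
# from psycopg2.extras import execute_batch

def build_insert(table, columns):
    """Build the parameterized INSERT statement execute_batch expects:
    one %s placeholder per column."""
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders)

# Usage sketch (requires a live connection):
# with conn.cursor() as cur:
#     execute_batch(cur, build_insert("test", cols), rows, page_size=1000)
#     conn.commit()
```

execute_batch reduces round trips by concatenating page_size statements per server call, but each row is still an individual INSERT, which is why it cannot approach COPY.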

Using COPY – 2.7s:

test=> \timing
Timing is on.
test=> \copy test from '/tests/201701-citibike-tripdata.csv' DELIMITER ',' CSV HEADER;
COPY 726676
Time: 2755.240 ms (00:02.755)
@gvenzl gvenzl added the Postgres Postgres related issues label Jun 14, 2020
@gvenzl gvenzl self-assigned this Jun 14, 2020
@gvenzl gvenzl added the enhancement New feature or request label Jun 14, 2020
gvenzl (Collaborator, Author) commented Jun 14, 2020

Issue psycopg/psycopg2#491 (comment) acknowledges that:

Adding functions into the extras module to implement the ideas discussed here and in the ML thread without changing the executemany() implementation.

@altaurog has reported the following speedups over a transatlantic connection:

  • cur.executemany(): 761.322767019 sec
  • execute_batch(): 14.6529989243 sec
  • execute_values(): 12.4037430286 sec
  • pgcopy: 2.85529208183 sec
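For reference, execute_values (one of the extras functions the quoted comment refers to) takes a statement with a single VALUES %s placeholder that the helper expands into a multi-row VALUES list, rather than per-row placeholders. A sketch, with table and column names as assumed placeholders:

```python
# Sketch of the psycopg2.extras.execute_values call pattern; the
# connection and "test" table are assumed, and the helper is hypothetical.
# from psycopg2.extras import execute_values

def values_statement(table, columns):
    """Build the statement execute_values() expects: a single %s that
    it replaces with a multi-row VALUES list."""
    return "INSERT INTO {} ({}) VALUES %s".format(table, ", ".join(columns))

# Usage sketch (requires a live connection):
# with conn.cursor() as cur:
#     execute_values(cur, values_statement("test", cols), rows, page_size=1000)
#     conn.commit()
```

Packing many rows into one statement is what makes execute_values edge out execute_batch in the numbers above, though both remain well behind pgcopy.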

gvenzl (Collaborator, Author) commented Jun 14, 2020

It seems there is no easy wrapper function in psycopg2 that gives simple batch INSERT statements the same performance mechanism that COPY provides.

However, the driver does support the cursor.copy_from() and cursor.copy_expert() functions, which allow the client to tap into the COPY command instead. This could be used to load the data via in-memory io.StringIO or io.BytesIO buffers, speeding up the data load to the numbers shown above.

A question still to be answered is how to deal with the extra encoding/decoding of values already read from the file.
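The copy_expert() route could look roughly like the following. This is a sketch only: the "test" table, the already-parsed rows, and the connection are assumptions, and the buffer-building helper is hypothetical.

```python
import csv
import io

def rows_to_csv_buffer(rows):
    """Serialize already-parsed rows into an in-memory CSV buffer
    suitable for feeding to cursor.copy_expert() as STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # handles quoting/escaping of field values
    writer.writerows(rows)
    buf.seek(0)  # rewind so copy_expert reads from the start
    return buf

# Usage sketch (requires a live psycopg2 connection):
# with conn.cursor() as cur:
#     cur.copy_expert(
#         "COPY test FROM STDIN WITH (FORMAT csv)",
#         rows_to_csv_buffer(rows))
#     conn.commit()
```

Note this re-serializes values that were just parsed from the source file, which is exactly the extra encoding/decoding overhead mentioned above.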

gvenzl (Collaborator, Author) commented Jun 14, 2020

After some more research, it looks like psycopg2 will never offer an executemany() DBAPI method that gets anywhere close to the performance of the COPY command (see this long email thread). There is a chance this will happen with psycopg3, but that is still in development, and it is far from clear whether it will even implement the DBAPI, see https://www.varrazzo.com/blog/2020/03/06/thinking-psycopg3/
