I don't mind using adbc_ingest() to populate my database, but later in its lifecycle I need to upsert records and more. For example, I need to do something like:
which is too slow. Apparently executemany() is extremely inefficient for this task. What causes such poor performance? Where is the bottleneck?
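For reference, the slow path presumably looks something like the sketch below (hypothetical, since the original snippet is elided; the statement shape and placeholder style are assumptions, though the table and column names come from the later example):

```python
# A sketch of the slow per-row upsert path (hypothetical; the exact
# statement is assumed, not taken from the original post).
UPSERT_SQL = (
    'INSERT INTO test_table ("col1", "col2", "col3") VALUES ($1, $2, $3)\n'
    'ON CONFLICT ("col1") DO UPDATE SET "col2" = EXCLUDED."col2", "col3" = 0;'
)

def upsert_rows(uri, rows):
    # Deferred import so the sketch can be read without the driver installed.
    import adbc_driver_postgresql.dbapi
    with adbc_driver_postgresql.dbapi.connect(uri) as conn:
        with conn.cursor() as cursor:
            # executemany() binds and executes the prepared statement once
            # per row, so 10k rows mean on the order of 10k roundtrips.
            cursor.executemany(UPSERT_SQL, rows)
        conn.commit()
```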
The same outcome can be achieved much faster by first ingesting the data into a staging table and then having Postgres run the more complex operation from that table rather than from bound input:
```python
# 0.2s
with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    with conn.cursor() as cursor:
        cursor.adbc_ingest('test_table2', table, mode="replace")
        query = (
            'INSERT INTO test_table ("col1", "col2", "col3")\n'
            'SELECT "col1", "col2", "col3" FROM test_table2\n'
            'ON CONFLICT ("col1") DO UPDATE SET "col2" = EXCLUDED."col2", "col3" = 0;'
        )
        cursor.execute(query)
    conn.commit()
```
This approach gives reasonable performance, but is this how one is supposed to do it? Is there anything that can easily be improved? I do not know much about Postgres's backend operation and what optimisations it applies during ingestion, but I suspect it is not best practice to create intermediate tables (which are not even TEMPORARY) when we just want to stream data.
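If the concern is leaving a non-temporary staging table behind: recent versions of the ADBC DBAPI accept a temporary flag on adbc_ingest(), which makes the staging table session-local so Postgres drops it automatically. A sketch under that assumption (check that your installed driver version supports temporary=True; the "staging" table name is illustrative):

```python
# Sketch: bulk-load into a TEMPORARY staging table, then upsert from it.
# Assumes an adbc-driver-postgresql version whose adbc_ingest() accepts
# temporary=True; table/column names mirror the example above.
MERGE_SQL = (
    'INSERT INTO test_table ("col1", "col2", "col3")\n'
    'SELECT "col1", "col2", "col3" FROM staging\n'
    'ON CONFLICT ("col1") DO UPDATE SET "col2" = EXCLUDED."col2", "col3" = 0;'
)

def upsert_via_temp_table(uri, table):
    # Deferred import so the sketch can be read without the driver installed.
    import adbc_driver_postgresql.dbapi
    with adbc_driver_postgresql.dbapi.connect(uri) as conn:
        with conn.cursor() as cursor:
            # COPY-based ingest into a session-local table that Postgres
            # drops automatically when the session ends.
            cursor.adbc_ingest("staging", table, mode="create_append",
                               temporary=True)
            cursor.execute(MERGE_SQL)
        conn.commit()
```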
The difference is that a bulk ingest runs via COPY, which can't do a fancy upsert but is optimized to throw data into the database as fast as possible. A regular insert, by contrast, uses bind parameters, which requires a database roundtrip for every single row. I would say a temporary table is reasonable for your use case.
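For intuition, the two wire-level shapes are roughly the following (illustrative only, not the driver's literal internals):

```python
# adbc_ingest(): one streaming COPY command covers all rows at once.
COPY_SQL = 'COPY test_table2 ("col1", "col2", "col3") FROM STDIN (FORMAT binary)'

# executemany(): a prepared INSERT is bound and executed once per row,
# so 10k rows cost on the order of 10k bind/execute roundtrips.
INSERT_SQL = 'INSERT INTO test_table ("col1", "col2", "col3") VALUES ($1, $2, $3)'
```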
The difference is that a bulk ingest runs via COPY
Just to clarify: the "COPY" you are referring to, is it the COPY command of the Postgres wire protocol, or something else? And is it called/implemented in adbc_driver_postgresql via the libpq library?
What would you like help with?

executemany() much slower than adbc_ingest()?

I want to insert and update records in a table using the Python API of adbc_driver_postgresql. I noticed that executemany() is much slower than adbc_ingest() for ingesting data. Let's say I have 10k rows:

as compared to