-
Notifications
You must be signed in to change notification settings - Fork 59
Description
What happens?
We have a Prefect flow using DuckDB that started raising the following error in the past days, after ~30 minutes:
duckdb.duckdb.Error: Failed to execute COPY: ERROR: canceling statement due to statement timeout
The table to be copied is around 370-400 GB (~318 million rows). Attempts to increase concurrency and memory were made in order to reduce the time of the query execution, and then it was noticed that the error message seems misleading, since the time to execute the query was reduced to ~15 minutes but still saying that it's due to statement timeout.
Outputting the statement timeout in Postgres:
logger.info(
con.sql(
"SELECT * FROM postgres_query('db', 'select setting from pg_settings where name = ''statement_timeout''');"
)
)
┌─────────┐
│ setting │
│ varchar │
├─────────┤
│ 10000 │
└─────────┘
Which doesn't seem to be aligned to the query time execution, since the query is still able to run much longer than the given statement timeout.
We also tried setting a different http_timeout
through DuckDB settings, which wasn't helpful.
Tested using DuckDB v0.10.1 and v1.1.3
To Reproduce
import duckdb
MAX_FILE_SIZE = "250M"
local_folder = "/tmp/data/"
con = duckdb.connect()
con.sql("INSTALL postgres;")
con.sql(f"ATTACH '{postgres_uri}' AS db (TYPE postgres, READ_ONLY);")
con.sql(
f"""COPY (SELECT * FROM db.{postgres_table_name})
TO '{local_folder}' (
FORMAT 'parquet',
COMPRESSION 'zstd',
PER_THREAD_OUTPUT,
OVERWRITE_OR_IGNORE,
FILE_SIZE_BYTES '{MAX_FILE_SIZE}');"""
)
OS:
linux/amd64
DuckDB Version:
1.1.3
DuckDB Client:
Python
Hardware:
CPU: 16GB, Memory: 64GB
Full Name:
Rina Sakurai
Affiliation:
Mentimeter
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot easily share my data sets due to their large size
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have