
Misleading statement timeout error message #273

@leticiarina

Description


What happens?

Over the past few days, a Prefect flow of ours that uses DuckDB has started raising the following error after roughly 30 minutes:

duckdb.duckdb.Error: Failed to execute COPY: ERROR: canceling statement due to statement timeout

The table being copied is around 370-400 GB (~318 million rows). We tried increasing concurrency and memory to speed up the query, which brought the execution time down to ~15 minutes, yet the job still fails with the same statement-timeout error. That is what makes the message look misleading: the runtime changed substantially, but the failure is still reported as a statement timeout.

Querying the configured statement timeout in Postgres:

logger.info(
    con.sql(
        "SELECT * FROM postgres_query('db', 'select setting from pg_settings where name = ''statement_timeout''');" 
    )
)

┌─────────┐
│ setting │
│ varchar │
├─────────┤
│ 10000   │
└─────────┘

That value does not line up with the observed execution time: the query is able to run much longer than the configured statement timeout before failing.
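
For clarity, pg_settings also exposes the unit of each setting, so the same kind of query confirms that 10000 is expressed in milliseconds, i.e. a 10 second timeout (a minimal follow-up check reusing the connection and logger from above):

# Confirm the unit of statement_timeout; pg_settings reports it as 'ms',
# so the configured value of 10000 corresponds to 10 seconds.
logger.info(
    con.sql(
        "SELECT * FROM postgres_query('db', 'select setting, unit from pg_settings where name = ''statement_timeout''');"
    )
)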

We also tried setting a different http_timeout through DuckDB's settings, but it did not help.
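
For reference, that change amounted to something along the following lines (the value is illustrative, and this is a sketch of the kind of SET we ran rather than an exact transcript):

# Illustrative only: raise DuckDB's http_timeout setting (exact value arbitrary).
# This had no effect on the failure described above.
con.sql("SET http_timeout = 600000;")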

Tested using DuckDB v0.10.1 and v1.1.3

To Reproduce

import duckdb

MAX_FILE_SIZE = "250M"

local_folder = "/tmp/data/"

# Placeholders: point these at the Postgres database and table being exported.
postgres_uri = "postgresql://user:password@host:5432/database"
postgres_table_name = "my_table"

con = duckdb.connect()
con.sql("INSTALL postgres;")
con.sql("LOAD postgres;")
con.sql(f"ATTACH '{postgres_uri}' AS db (TYPE postgres, READ_ONLY);")

# Export the whole table to zstd-compressed Parquet files, one file per thread,
# rolling over to a new file roughly every 250 MB.
con.sql(
    f"""COPY (SELECT * FROM db.{postgres_table_name})
    TO '{local_folder}' (
    FORMAT 'parquet',
    COMPRESSION 'zstd',
    PER_THREAD_OUTPUT,
    OVERWRITE_OR_IGNORE,
    FILE_SIZE_BYTES '{MAX_FILE_SIZE}');"""
)
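
As an additional, untested sketch (an assumption on our side, not something the issue depends on): libpq connection strings accept an options parameter, so attaching with an explicit statement_timeout override should help confirm whether the Postgres setting is involved at all. The credentials below are placeholders.

# Untested sketch: pass "-c statement_timeout=0" to the server through the libpq
# 'options' URI parameter (percent-encoded), assuming the postgres extension
# hands the connection string to libpq unchanged. Credentials are placeholders.
pg_uri_no_timeout = (
    "postgresql://user:password@host:5432/database"
    "?options=-c%20statement_timeout%3D0"
)
con.sql(f"ATTACH '{pg_uri_no_timeout}' AS db_nt (TYPE postgres, READ_ONLY);")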

OS:

linux/amd64

DuckDB Version:

1.1.3

DuckDB Client:

Python

Hardware:

CPU: 16 cores, Memory: 64 GB

Full Name:

Rina Sakurai

Affiliation:

Mentimeter

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot easily share my data sets due to their large size

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
