make Postgres connect timeout configurable #148

Merged
krisztianfekete merged 2 commits into main from fix/more-robust-startup on May 14, 2026

@krisztianfekete
Contributor

This PR makes the Postgres connect-retry budget tunable via AGENTEVALS_DB_CONNECT_TIMEOUT_S with a 600s default.

Copilot AI left a comment

Pull request overview

This PR makes the Postgres “connect with retry” wall-clock budget configurable via AGENTEVALS_DB_CONNECT_TIMEOUT_S, with a new 600s default intended to better tolerate slow Kubernetes database bring-up during startup.

Changes:

  • Replace direct use of the fixed connect-retry deadline constant with a resolver function (connect_deadline_seconds()) that reads AGENTEVALS_DB_CONNECT_TIMEOUT_S.
  • Increase the default connect-retry budget from 60s to 600s and document the env var override behavior.
  • Add a Helm value (database.postgres.connectTimeoutSeconds) and wire it into the Deployment as AGENTEVALS_DB_CONNECT_TIMEOUT_S.
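
To make the idea of a wall-clock connect-retry budget concrete, here is a minimal sketch of a retry loop bounded by a deadline. It is not the code from this PR: the function name `connect_with_retry`, the backoff parameters, and the choice of `OSError` as the retryable error are all assumptions made for illustration.

```python
import asyncio
import time


async def connect_with_retry(connect, deadline_s, base_delay_s=0.5, max_delay_s=5.0):
    """Await `connect()` until it succeeds or `deadline_s` seconds have elapsed.

    Hypothetical helper: names, defaults, and the retryable exception type
    are illustrative assumptions, not taken from the PR.
    """
    start = time.monotonic()
    delay = base_delay_s
    while True:
        try:
            return await connect()
        except OSError:
            # Time left in the overall budget, measured on a monotonic clock.
            remaining = deadline_s - (time.monotonic() - start)
            if remaining <= 0:
                # Budget exhausted: surface the last connection error.
                raise
            # Never sleep past the deadline, and cap the exponential backoff.
            sleep_for = min(delay, max_delay_s, remaining)
            await asyncio.sleep(sleep_for)
            delay *= 2
```

In a loop of this shape, a larger budget (such as the 600s default described above) only changes how long the caller keeps retrying; it does not delay a connection that succeeds on the first attempt.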

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/agentevals/storage/postgres/pool.py | Uses the new deadline resolver when retrying asyncpg pool warmup. |
| src/agentevals/storage/postgres/migrator.py | Introduces env-var based connect-retry budget resolution and updates retry deadline usage/default. |
| charts/agentevals/values.yaml | Adds a configurable Helm value for the DB connect timeout with documentation. |
| charts/agentevals/templates/deployment.yaml | Plumbs the Helm value into the container env as AGENTEVALS_DB_CONNECT_TIMEOUT_S. |
Comments suppressed due to low confidence (1)

src/agentevals/storage/postgres/migrator.py:288

  • connect_deadline_seconds() accepts non-finite floats like NaN (e.g. env var value "nan"), which will propagate into the retry deadline math and can make sleep_for become NaN, causing asyncio.sleep() to raise and abort startup. Consider rejecting non-finite values (e.g. math.isfinite(val)), and falling back to the default with a warning as intended by the docstring.
def connect_deadline_seconds() -> float:
    """Resolve the connect-retry budget. Reads ``AGENTEVALS_DB_CONNECT_TIMEOUT_S``
    and falls back to :data:`CONNECT_RETRY_DEADLINE_S` if the env var is
    unset, empty, non-numeric, or non-positive."""
    raw = os.getenv("AGENTEVALS_DB_CONNECT_TIMEOUT_S")
    if raw is None or raw == "":
        return CONNECT_RETRY_DEADLINE_S
    try:
        val = float(raw)
    except ValueError:
        logger.warning(
            "Invalid AGENTEVALS_DB_CONNECT_TIMEOUT_S=%r (not a number); using default %.0fs",
            raw,
            CONNECT_RETRY_DEADLINE_S,
        )
        return CONNECT_RETRY_DEADLINE_S
    if val <= 0:
        logger.warning(
            "Invalid AGENTEVALS_DB_CONNECT_TIMEOUT_S=%r (must be positive); using default %.0fs",
            raw,
            CONNECT_RETRY_DEADLINE_S,
        )
        return CONNECT_RETRY_DEADLINE_S
    return val
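
The suggestion in the suppressed comment could look roughly like the sketch below, which adds a `math.isfinite` check so that values such as "nan" or "inf" fall back to the default with a warning. This is a standalone illustration, not the merged code: the function name `resolve_deadline_seconds` and the constant `DEFAULT_DEADLINE_S` are hypothetical stand-ins, and the 600s value is taken from the PR description.

```python
import logging
import math
import os

logger = logging.getLogger(__name__)

# Assumed stand-in for CONNECT_RETRY_DEADLINE_S (600s per the PR description).
DEFAULT_DEADLINE_S = 600.0


def resolve_deadline_seconds(env_var="AGENTEVALS_DB_CONNECT_TIMEOUT_S",
                             default=DEFAULT_DEADLINE_S):
    """Return the connect-retry budget in seconds, or `default` if the env var
    is unset, empty, non-numeric, non-finite, or non-positive."""
    raw = os.getenv(env_var)
    if raw is None or raw == "":
        return default
    try:
        val = float(raw)
    except ValueError:
        logger.warning("Invalid %s=%r (not a number); using default %.0fs",
                       env_var, raw, default)
        return default
    # float("nan") and float("inf") parse successfully, and NaN also slips
    # past a plain `val <= 0` check, so reject non-finite values explicitly.
    if not math.isfinite(val) or val <= 0:
        logger.warning("Invalid %s=%r (must be a finite positive number); "
                       "using default %.0fs", env_var, raw, default)
        return default
    return val
```

Checking `math.isfinite` before the sign check also rejects "inf", which would otherwise pass `val <= 0` and remove the retry deadline entirely.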

Comment thread src/agentevals/storage/postgres/migrator.py
Comment thread charts/agentevals/values.yaml Outdated
@krisztianfekete krisztianfekete merged commit 094f9e8 into main May 14, 2026
5 checks passed
@krisztianfekete krisztianfekete deleted the fix/more-robust-startup branch May 14, 2026 11:39

2 participants