
Add DB pool observability, OTel pg instrumentation, and statement_timeout #520

Merged
iscekic merged 3 commits into main from igor/db-observability-and-timeout on Feb 24, 2026

@iscekic commented Feb 24, 2026

Summary

Adds application-side database observability and enforces the statement_timeout guard, whose env var was validated at startup but never applied.

1. Pool metrics → Axiom (via Vercel log drain)

Logs pool.totalCount, pool.idleCount, and pool.waitingCount every 30s as structured JSON for both primary and replica pools. This gives the client-side view of connection usage per Vercel instance/region, complementing the server-side view from the new Supabase → Grafana integration.

Example log line:

{"type":"pool_metrics","region":"iad1","primary":{"total":8,"idle":5,"waiting":0},"replica":{"total":3,"idle":2,"waiting":0}}

Queryable in Axiom with:

['vercel'] | where ['fields.type'] == 'pool_metrics'
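A minimal TypeScript sketch of the periodic logger described above, assuming node-postgres (pg) pools, which expose totalCount, idleCount, and waitingCount. The PoolCounters interface, the function names, and reading the region from the VERCEL_REGION environment variable are assumptions for illustration, not the actual code in src/lib/drizzle.ts. It reproduces the shape of the example log line and, as the review below notes, skips logging when NODE_ENV is "test" and calls unref() on the timer.

```typescript
// Minimal shape of the counters a node-postgres Pool exposes.
// Declared locally (an assumption of this sketch) so it runs without pg or a database.
interface PoolCounters {
  totalCount: number;
  idleCount: number;
  waitingCount: number;
}

interface PoolSnapshot {
  total: number;
  idle: number;
  waiting: number;
}

interface PoolMetricsLine {
  type: "pool_metrics";
  region: string;
  primary: PoolSnapshot;
  replica: PoolSnapshot;
}

// Copies the three live counters into a plain object at the moment of logging.
function snapshot(pool: PoolCounters): PoolSnapshot {
  return { total: pool.totalCount, idle: pool.idleCount, waiting: pool.waitingCount };
}

// Builds one structured record with the same fields as the example log line.
function buildPoolMetrics(
  primary: PoolCounters,
  replica: PoolCounters,
  region: string,
): PoolMetricsLine {
  return { type: "pool_metrics", region, primary: snapshot(primary), replica: snapshot(replica) };
}

// Starts the periodic logger; returns undefined (no timer) when NODE_ENV is "test".
function startPoolMetricsLogging(
  primary: PoolCounters,
  replica: PoolCounters,
  intervalMs = 30_000,
): NodeJS.Timeout | undefined {
  if (process.env.NODE_ENV === "test") return undefined;
  // Hypothetical region source; "unknown" is a fallback outside Vercel.
  const region = process.env.VERCEL_REGION ?? "unknown";
  const timer = setInterval(() => {
    // One JSON object per line so the log drain can index fields such as type.
    console.log(JSON.stringify(buildPoolMetrics(primary, replica, region)));
  }, intervalMs);
  // unref() keeps this timer from holding the process open on its own.
  timer.unref();
  return timer;
}
```

In the app, the two arguments would be the primary and replica pg Pool instances (for example, hypothetical primaryPool and replicaPool). A pg Pool already has these three numeric properties, so it satisfies PoolCounters structurally without an adapter.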

2. statement_timeout enforcement

Applies POSTGRES_MAX_QUERY_TIME (already validated as a required env var) as statement_timeout on both primary and replica pools. Previously the env var was checked at startup but never used, so queries had no server-side timeout guard.
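A minimal sketch of applying the env var to both pools, assuming node-postgres, whose pool config accepts a statement_timeout option in milliseconds, and assuming POSTGRES_MAX_QUERY_TIME also holds a millisecond value. The helper names and the PoolConfigSubset interface are hypothetical, and the validation only mirrors the PR's "required env var" check in spirit.

```typescript
// Subset of the node-postgres pool config used here (declared locally so the
// sketch runs without the pg package). statement_timeout is in milliseconds.
interface PoolConfigSubset {
  connectionString: string;
  statement_timeout: number;
}

// Parses the max query time; assumes the env var holds whole milliseconds.
function parseMaxQueryTime(raw: string | undefined): number {
  const ms = Number(raw);
  if (raw === undefined || raw.trim() === "" || !Number.isInteger(ms) || ms <= 0) {
    throw new Error(`POSTGRES_MAX_QUERY_TIME must be a positive integer (ms), got: ${raw}`);
  }
  return ms;
}

// Builds a config carrying the server-side timeout; call it once per pool
// (primary and replica) so replica reads are bounded as well.
function poolConfigWithTimeout(
  connectionString: string,
  maxQueryTimeRaw: string | undefined,
): PoolConfigSubset {
  return { connectionString, statement_timeout: parseMaxQueryTime(maxQueryTimeRaw) };
}
```

Usage would look like `new Pool(poolConfigWithTimeout(primaryUrl, process.env.POSTGRES_MAX_QUERY_TIME))`, with primaryUrl standing in for whatever connection string the app actually uses. Postgres then cancels any statement that runs past the limit, and the query rejects with SQLSTATE 57014 (query_canceled). Rejecting 0 is deliberate in this sketch: a statement_timeout of 0 disables the timeout in Postgres, which would quietly bring back the missing guard.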

Not included: OTel pg instrumentation

Sentry's postgresIntegration (enabled by default) already uses @opentelemetry/instrumentation-pg under the hood, so per-query spans are captured in Sentry traces without any code change.

Context

Follow-up to #497 (connection exhaustion fix). The Supabase dashboards provide limited visibility, so we're adding application-side observability to correlate with the new Grafana dashboards.

Changes

  • src/lib/drizzle.ts — pool metrics logging + statement_timeout on both pools

…eout

- Log pool metrics (totalCount, idleCount, waitingCount) every 30s as
  structured JSON, picked up by Vercel log drain → Axiom
- Register @opentelemetry/instrumentation-pg in OTel setup to get
  per-query spans (duration, errors) in Sentry traces
- Enforce POSTGRES_MAX_QUERY_TIME as statement_timeout on both primary
  and replica pools (was validated as required but never applied)
@iscekic self-assigned this Feb 24, 2026

@kilo-code-bot (bot) commented Feb 24, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Clean, well-structured change that adds two improvements to the database connection layer:

  1. Statement timeout (statement_timeout) applied to both primary and replica pools using the existing POSTGRES_MAX_QUERY_TIME env var. This prevents runaway queries from holding connections indefinitely — good defensive measure.

  2. Pool observability via periodic JSON-structured metrics logging (pool_metrics). Properly guarded with NODE_ENV !== 'test', uses .unref() to avoid keeping the process alive, and includes region context for log correlation.

Files Reviewed (1 file)
  • src/lib/drizzle.ts - 0 issues

…Integration

Sentry's postgresIntegration (enabled by default) uses the same
@opentelemetry/instrumentation-pg under the hood. Adding it explicitly
in registerOTel would double-instrument pg queries.
@iscekic requested review from RSO and markijbema February 24, 2026 17:21
@iscekic enabled auto-merge February 24, 2026 17:22
@iscekic merged commit e3464db into main Feb 24, 2026
12 checks passed
@iscekic deleted the igor/db-observability-and-timeout branch February 24, 2026 17:27