perf(api): AIN-222 C1 · runtime SQLAlchemy pool sized + pooler URL split #60

Merged: hizrianraz merged 1 commit into main from feat/ain-222-c1-sqlalchemy-pool on May 22, 2026

@hizrianraz hizrianraz commented May 22, 2026

Summary

  • Adds DATABASE_POOLER_URL setting. When set, the runtime engine uses the Supabase transaction-mode pooler (:6543); when unset, falls back to database_url (:5432). Alembic always uses database_url.
  • Sizes the runtime pool for warm reuse (pool_size=10, max_overflow=20, pool_recycle=300, pool_pre_ping=True).
  • Sets statement_cache_size=0 on asyncpg when the pooler is in use — required for transaction-mode multiplexing.
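The three bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `runtime_engine_config` and `make_runtime_engine` are hypothetical, and the example hosts in the usage note are placeholders. Only the pool values and the `statement_cache_size=0` connect argument come from the summary.

```python
from typing import Any, Optional


def runtime_engine_config(
    database_url: str, database_pooler_url: Optional[str] = None
) -> tuple[str, dict[str, Any]]:
    """Pick the runtime URL and build keyword arguments for create_async_engine.

    Hypothetical helper: the real settings object and engine module may differ.
    """
    use_pooler = bool(database_pooler_url)
    # Prefer the transaction-mode pooler when configured, else the direct URL.
    url = database_pooler_url or database_url
    kwargs: dict[str, Any] = {
        "pool_size": 10,        # warm connections kept open
        "max_overflow": 20,     # extra connections allowed under burst
        "pool_recycle": 300,    # seconds before a connection is replaced
        "pool_pre_ping": True,  # check liveness before handing a connection out
    }
    if use_pooler:
        # Disable asyncpg's prepared-statement cache only behind the pooler.
        kwargs["connect_args"] = {"statement_cache_size": 0}
    return url, kwargs


def make_runtime_engine(database_url: str, database_pooler_url: Optional[str] = None):
    """Build the async engine; SQLAlchemy is imported lazily so the config is testable alone."""
    from sqlalchemy.ext.asyncio import create_async_engine

    url, kwargs = runtime_engine_config(database_url, database_pooler_url)
    return create_async_engine(url, **kwargs)
```

With placeholder URLs, `runtime_engine_config("postgresql+asyncpg://u:p@db.example:5432/app")` returns the direct URL and no `connect_args`, while passing a second `:6543` URL switches the runtime engine to the pooler and adds the cache setting. Migrations would keep reading `database_url` directly.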

Why

Single-call TTFB on the live API measured 0.47–1.07s today; the AIN-222 gate is <150ms. Cold connection setup dominates that time, and with the default pool_size=5 and no pre-warming, most requests pay it. This change is necessary but not sufficient: setting DATABASE_POOLER_URL in the Railway prod env is the second half (Discipline #6: an env var change is a founder action; see deliverables for the URL shape to set).

Safety

  • Code-side change only. Without DATABASE_POOLER_URL set, the engine uses the existing direct URL — no behavior change vs today's deploy.
  • Alembic path unchanged. alembic/env.py reads database_url explicitly and uses NullPool — already correct.
  • Pre-commit ran: ruff + mypy strict + pytest (unit + smoke) all green locally.

Test plan

  • CI: lint + typecheck + integration (PG service + alembic upgrade head + seed) all green.
  • Post-merge: founder sets DATABASE_POOLER_URL in Railway env.
  • Single /v1/* TTFB drops below 150ms on prod after Railway redeploys.
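One way to check the TTFB item in the test plan is a small standard-library probe. This is a sketch under assumptions: `measure_ttfb` and `meets_gate` are hypothetical helper names, and the timing covers client connection setup plus the wait for response headers, which is close to but not necessarily identical to how the numbers in this PR were taken.

```python
import http.client
import time
from urllib.parse import urlsplit


def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Seconds from starting the request until the response headers have arrived.

    A new client connection is opened per call, so TCP (and TLS for https)
    setup is included in the result.
    """
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        start = time.perf_counter()
        conn.request("GET", path)      # connects, then sends the request
        response = conn.getresponse()  # returns once status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()                # drain the body so the socket closes cleanly
        return elapsed
    finally:
        conn.close()


def meets_gate(samples: list[float], gate_seconds: float = 0.150) -> bool:
    """True when there is at least one sample and every sample is under the gate."""
    return bool(samples) and all(s < gate_seconds for s in samples)
```

Calling `measure_ttfb` a few times against a `/v1/*` URL on the live origin and passing the later (warmed) samples to `meets_gate` gives a pass/fail against the 150ms gate.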

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because it changes database connection behavior (URL selection, pool sizing, and asyncpg connect args), which can impact runtime stability and performance if misconfigured or if the pooler behaves differently under load.

Overview
Adds an optional database_pooler_url setting so the runtime SQLAlchemy engine can connect via a Supabase transaction-mode pooler when configured, while keeping migrations on the direct database_url.

Updates DB engine construction to tune pooling (pool_size, max_overflow, pool_recycle, pool_pre_ping) and conditionally disables asyncpg prepared-statement caching (statement_cache_size=0) when using the pooler.

Reviewed by Cursor Bugbot for commit 2897bc7.

Adds DATABASE_POOLER_URL setting. When set, the runtime engine uses
the Supabase transaction-mode pooler (:6543); when unset, falls back
to database_url (:5432). Alembic always uses database_url since DDL
needs session-level state the pooler does not preserve — alembic/env.py
already pulls from settings.database_url and uses NullPool, so no
change there.

Runtime pool sized for warm reuse:
  pool_size = 10
  max_overflow = 20
  pool_recycle = 300s
  pool_pre_ping = True (kept)

asyncpg + Supabase transaction-mode pooler requires
statement_cache_size=0 — prepared statements break when the pooler
hands each transaction a fresh session. Applied only when the pooler
URL is set (no-op against the direct port).

Single-call TTFB on the live api today: 0.47–1.07s — gate is <150ms.
This change is necessary but not sufficient: setting DATABASE_POOLER_URL
in Railway production env is the second half. See deliverables for the
env var to set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear-code Bot commented May 22, 2026

AIN-222 [perf] Dashboard slow — DB connection-per-request + sequential waterfall (not the query)

Owner: Aule (exec) · Counsel/audit: CC · Measured 2026-05-22. The DB is not the bottleneck — the connection path is.

Evidence (measured live, not theorized)

| Layer | Measured | Verdict |
| --- | --- | --- |
| DB query `select * from models where active` (EXPLAIN ANALYZE) | 0.126ms exec, 4.5ms plan | 🟢 not the problem |
| API /v1/models TTFB (warm, ×N) | ~620ms | 🔴 ~595ms is overhead, not query |
| API /v1/providers TTFB | ~550ms | 🔴 same signature |
| API /v1/audit/public?limit=20 TTFB | ~617ms | 🔴 same |
| API /health (no DB) TTFB ×5 | 0.16–0.50s (first 0.85 incl TLS) | 🟡 Railway runtime floor / worker jitter |
| app.ainfera.ai/ | 307 redirect, 1.34s | 🟡 extra hop for logged-in users |

A 0.13ms query that takes 620ms over the wire points to a new DB connection per request (TCP + TLS + Postgres-auth handshake on every call) and/or a cross-region hop between Railway and Supabase (us-east-1). The dashboard then fires agents + models + audit + billing sequentially, which stacks into a ~2–3s waterfall. That's the "slow dashboard."

Fixes — ranked by win-per-effort

P0 — Persistent DB connection pool in the api

  • SQLAlchemy engine with a real pool (pool_size, max_overflow, pool_pre_ping=True), reused across requests. Do NOT open/close a connection per request.
  • If the api is async (asyncpg) and sits behind the Supabase transaction pooler on :6543, set statement_cache_size=0 and pick an appropriate pool class.
  • Expected: 620ms → ~80ms per call. Single biggest win.
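To make the cost difference concrete, here is a toy model of connection-per-request versus a reused pool. It is not SQLAlchemy's pool implementation; `FakeConnection`, `TinyPool`, and `handle_requests` are hypothetical names, and the "handshake" is only a counter standing in for the TCP + TLS + Postgres-auth setup described above.

```python
import queue
from contextlib import contextmanager
from typing import Optional


class FakeConnection:
    """Stand-in for a DB connection; counts how many handshakes happened."""

    handshakes = 0

    def __init__(self) -> None:
        FakeConnection.handshakes += 1  # models the per-connection setup cost


class TinyPool:
    """Keeps up to `size` idle connections and hands them back out."""

    def __init__(self, size: int) -> None:
        self._idle: queue.LifoQueue = queue.LifoQueue(maxsize=size)

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()  # reuse a warm connection
        except queue.Empty:
            conn = FakeConnection()         # handshake only when none is idle
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)  # return it for the next request
            except queue.Full:
                pass                         # pool is full: drop the extra


def handle_requests(n: int, pool: Optional[TinyPool] = None) -> int:
    """Serve n sequential requests and return how many handshakes were paid."""
    FakeConnection.handshakes = 0
    for _ in range(n):
        if pool is None:
            FakeConnection()  # connection-per-request: handshake every time
        else:
            with pool.connection():
                pass          # the query would run here
    return FakeConnection.handshakes
```

In this model, 100 sequential requests without a pool pay 100 handshakes, while the same requests through `TinyPool(10)` pay one. Real pools add liveness checks, recycling, and overflow handling on top of this idea.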

P0 — Collapse the dashboard waterfall

  • Add an aggregate endpoint GET /v1/dashboard/summary returning agents + models + audit-tail + billing in ONE round-trip, OR Promise.all the existing client fetches.
  • Expected: 2–3s → ~0.6s.
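On the server side, an aggregate endpoint could gather the four sections concurrently instead of one after another. The sketch below only simulates that with `asyncio.sleep`; `fetch_section`, `summary_sequential`, `summary_concurrent`, and `timed` are hypothetical names, and the delays are made up. One caveat if this is built on SQLAlchemy: an `AsyncSession` is not meant for concurrent use across tasks, so each concurrent fetch would need its own session.

```python
import asyncio
import time

SECTIONS = ("agents", "models", "audit", "billing")


async def fetch_section(name: str, delay: float) -> dict:
    """Stand-in for one query or upstream call."""
    await asyncio.sleep(delay)
    return {"section": name}


async def summary_sequential(delay: float = 0.05) -> dict:
    # Waterfall: each fetch starts only after the previous one finishes.
    return {name: await fetch_section(name, delay) for name in SECTIONS}


async def summary_concurrent(delay: float = 0.05) -> dict:
    # All fetches start together; total time is roughly the slowest one.
    results = await asyncio.gather(*(fetch_section(n, delay) for n in SECTIONS))
    return dict(zip(SECTIONS, results))


def timed(coro_fn) -> tuple[dict, float]:
    """Run an async function to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start
```

Both functions return the same payload; the concurrent version finishes in about one delay rather than four, which is the same effect `Promise.all` has on the client.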

P1 — Confirm Railway region == Supabase us-east-1

  • If the api service is in a different region, every call eats 50–150ms RTT ×N. Co-locate.

P1 — Cache the static catalog

  • models/providers change rarely → Cache-Control + React Query/SWR staleTime (e.g. 5min), or edge cache. Stop re-fetching every load.

P2 — Perceived speed

  • Loading skeletons + stream the shell (React Suspense / streaming SSR) so first paint never blocks on data.

P2 — Connection via transaction pooler :6543

  • Pooling survives container scaling + gives IPv4 (ties to the DATABASE_URL split: runtime→:6543, Alembic→:5432).

P2 — Redirect hop

  • app.ainfera.ai/ 307s at 1.34s — route logged-in users straight to /dashboard.

Definition of Done

Dashboard interactive < 1s on a warm load, verified live (browser timing on app.ainfera.ai/dashboard), AND a single warmed /v1/* call < 150ms TTFB. Repo-grep ≠ done — measure the live origin.

@hizrianraz hizrianraz merged commit c7d30e0 into main May 22, 2026
4 checks passed
@hizrianraz hizrianraz deleted the feat/ain-222-c1-sqlalchemy-pool branch May 22, 2026 08:39