Fix /api/markets/movers timeout and make /api/health honest #11

Merged
VittorioC13 merged 1 commit into main from fix/movers-timeout-and-honest-health
May 13, 2026

Summary

Production /api/markets/movers has timed out on 100% of requests for 8 days (since May 4, 16:00 UTC). The marketing bot built on top of the API has been silently broken the whole time: /api/health kept returning "status": "healthy" because it only probed 10 markets per source and never checked KV reachability or movers freshness.

Root cause: the movers handler did 1,600 sequential kv.get() calls inside detectMovers() (one per tracked market), plus 1,600 KV writes via recordPriceSnapshots(), all in the request path. Once @vercel/kv's Upstash migration added a small amount of per-request latency, total runtime crossed the 25s function timeout and never recovered.
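
To make the arithmetic behind the root cause concrete, here is a rough back-of-the-envelope sketch. It assumes the 1,600 writes were awaited one at a time like the reads, and `perOpLatencyMs` is a hypothetical input: the PR does not report the measured per-request latency.

```typescript
// Rough estimate of request-path time when every KV operation is awaited in sequence.
// perOpLatencyMs is a hypothetical figure; the PR does not state the measured value.
function sequentialRuntimeMs(reads: number, writes: number, perOpLatencyMs: number): number {
  return (reads + writes) * perOpLatencyMs;
}

const FUNCTION_TIMEOUT_MS = 25_000; // 25s function timeout, from the PR description
const TRACKED_MARKETS = 1_600; // one read and one write per tracked market

// Per-operation latency at which 1,600 reads + 1,600 writes exactly reach the timeout.
const breakEvenLatencyMs = FUNCTION_TIMEOUT_MS / (TRACKED_MARKETS * 2);

// Batching reads in chunks of 100 leaves 16 read round trips instead of 1,600.
const batchedReadCalls = Math.ceil(TRACKED_MARKETS / 100);
```

Under these assumptions the break-even point is roughly 7.8 ms per operation, so even a small latency increase is enough to push a fully sequential handler past the timeout.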

Changes

  • api/lib/price-snapshots.ts (new): detectMoversBatch() uses kv.mget() in chunks of 100, replacing 1,600 individual reads with ~16 batched calls. recordPriceSnapshots() now reads via mget and caps each market's snapshot array at 300 entries.
  • api/cron/refresh-markets.ts (new): runs every 2 minutes, refreshes markets, records snapshots, precomputes movers for buckets 0.02/0.05/0.1/0.2 into movers:precomputed:<bucket> with a 5-minute TTL. Writes freshness timestamps for /api/health to consume.
  • api/markets/movers.ts (rewrite): now read-only. It does a single kv.get() of the precomputed bucket, filters in memory for limit/category, and keeps a 20s in-process response cache. Returns 503 if the cron hasn't run since deploy.
  • api/health.ts (rewrite): 4 honest sub-checks — real market counts (Poly ≥ 800, Kalshi ≥ 200), KV read+write probe, snapshot freshness (< 5min), movers freshness (< 5min). Returns 503 on any degraded check so external monitors actually fire.
  • vercel.json: added refresh-markets cron entry and rewrite.
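
The batching and capping described in the list above can be sketched as follows. This is a minimal illustration, not the code in api/lib/price-snapshots.ts: `KvLike`, `MemoryKv`, `readHistories`, `appendCapped`, the `Snapshot` shape, and the `price_history:<id>` key suffix are all assumptions made for the example. The only library behavior relied on is that the KV client offers a variadic `mget`.

```typescript
// Minimal KV surface for this sketch; the real client would be @vercel/kv.
interface KvLike {
  mget<T>(...keys: string[]): Promise<(T | null)[]>;
}

// Hypothetical snapshot shape; the PR does not show the stored structure.
interface Snapshot {
  price: number;
  ts: number;
}

const CHUNK_SIZE = 100; // chunk size from the PR
const MAX_SNAPSHOTS = 300; // per-market cap from the PR

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Read every market's history with one mget per chunk instead of one get per market.
async function readHistories(kv: KvLike, marketIds: string[]): Promise<Map<string, Snapshot[]>> {
  const result = new Map<string, Snapshot[]>();
  for (const ids of chunk(marketIds, CHUNK_SIZE)) {
    const keys = ids.map((id) => `price_history:${id}`); // key format is assumed
    const values = await kv.mget<Snapshot[]>(...keys);
    ids.forEach((id, i) => result.set(id, values[i] ?? []));
  }
  return result;
}

// Append a snapshot and keep only the newest MAX_SNAPSHOTS entries.
function appendCapped(history: Snapshot[], next: Snapshot): Snapshot[] {
  return [...history, next].slice(-MAX_SNAPSHOTS);
}

// In-memory stand-in that counts round trips, so the sketch runs offline.
class MemoryKv implements KvLike {
  calls = 0;
  private store: Map<string, unknown>;

  constructor(store: Map<string, unknown>) {
    this.store = store;
  }

  async mget<T>(...keys: string[]): Promise<(T | null)[]> {
    this.calls += 1;
    return keys.map((k) => (this.store.get(k) as T | undefined) ?? null);
  }
}
```

With 1,600 market IDs, `readHistories` issues 16 `mget` round trips. Markets with no stored history come back as empty arrays rather than `null`, which keeps downstream code free of null checks.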

Test plan

  • pnpm typecheck (already green locally)
  • Deploy to a preview env and confirm /api/cron/refresh-markets (with Authorization: Bearer $CRON_SECRET) completes in <10s and writes both meta:last_snapshot_run and movers:precomputed:0.05 keys
  • time curl https://<preview>/api/markets/movers returns in <1s with count > 0
  • curl https://<preview>/api/health | jq shows polymarket.markets >= 1200, kalshi.markets >= 400, and status: "healthy"
  • After preview burns in, promote to production and re-run the marketing bot end-to-end
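
The health expectations in the test plan rest on the four sub-checks listed under Changes. A minimal sketch of how those checks might combine into a status and HTTP code is shown below. The function and field names, and the "degraded" status string, are assumptions for illustration; only the thresholds (Poly ≥ 800, Kalshi ≥ 200, freshness < 5min) and the 503-on-failure behavior come from the PR.

```typescript
// Inputs gathered by the handler before evaluation; all names are hypothetical.
interface HealthInputs {
  polymarketCount: number;
  kalshiCount: number;
  kvProbeOk: boolean; // result of the KV read+write probe
  lastSnapshotRunMs: number | null; // epoch ms, null if the cron never ran
  lastMoversRunMs: number | null;
  nowMs: number;
}

interface HealthResult {
  status: "healthy" | "degraded"; // "degraded" label is an assumption
  httpStatus: 200 | 503;
  checks: { markets: boolean; kv: boolean; snapshots: boolean; movers: boolean };
}

const MIN_POLYMARKET = 800;
const MIN_KALSHI = 200;
const MAX_AGE_MS = 5 * 60 * 1000; // freshness window: under 5 minutes

// A timestamp is fresh only if it exists and is younger than the window.
function isFresh(ts: number | null, nowMs: number): boolean {
  return ts !== null && nowMs - ts < MAX_AGE_MS;
}

// Any single failing check degrades the whole response to 503.
function evaluateHealth(input: HealthInputs): HealthResult {
  const checks = {
    markets: input.polymarketCount >= MIN_POLYMARKET && input.kalshiCount >= MIN_KALSHI,
    kv: input.kvProbeOk,
    snapshots: isFresh(input.lastSnapshotRunMs, input.nowMs),
    movers: isFresh(input.lastMoversRunMs, input.nowMs),
  };
  const healthy = Object.values(checks).every(Boolean);
  return {
    status: healthy ? "healthy" : "degraded",
    httpStatus: healthy ? 200 : 503,
    checks,
  };
}
```

Keeping the evaluation a pure function of its inputs means the thresholds can be unit-tested without a live KV store or market APIs.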

Rollback

Phase 1 (mget batching) has no schema change. If anything goes wrong after deploy, git revert restores the prior endpoint; precomputed KV keys naturally expire in 5 minutes. The price_history:* data shape is unchanged.

🤖 Generated with Claude Code

Production movers endpoint timed out 100% of the time for 8 days starting
May 4, 16:00 UTC. Root cause: handler did 1,600 sequential kv.get() calls
inside detectMovers() (one per tracked market), plus 1,600 KV writes via
recordPriceSnapshots(), all in the request path. Once @vercel/kv's
Upstash migration added a small amount of per-request latency, total
runtime crossed the 25s function timeout and never recovered.

Changes:
- Extract snapshot/movers logic to api/lib/price-snapshots.ts.
  detectMoversBatch() uses kv.mget() in chunks of 100, replacing 1,600
  individual gets with ~16 batched calls. recordPriceSnapshots() now
  also reads via mget and caps each market's snapshot array at 300
  entries so values no longer grow unbounded until the 7-day TTL expires.
- Add api/cron/refresh-markets.ts. Runs every 2 minutes, fetches markets,
  records snapshots, precomputes movers for buckets 0.02/0.05/0.1/0.2,
  and writes them to movers:precomputed:<bucket> with a 5-minute TTL.
  Writes freshness timestamps to meta:last_snapshot_run and
  meta:last_movers_run for /api/health to consume.
- Rewrite api/markets/movers.ts as a read-only endpoint: single kv.get()
  of the precomputed bucket, in-memory filter for limit/category, plus
  a 20s in-process response cache. Returns 503 if the cron hasn't run.
- Rewrite api/health.ts as a 4-check honest health probe: real market
  counts (Poly >= 800, Kalshi >= 200), KV read+write probe, snapshot
  freshness (< 5min), movers freshness (< 5min). Returns 503 on any
  degraded check so external monitors actually fire.
- Update vercel.json with the new cron entry and rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented May 12, 2026

Project: musashi
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): May 12, 2026 1:54pm

@VittorioC13 VittorioC13 merged commit cfd4bfd into main May 13, 2026
2 checks passed