Fix /api/markets/movers timeout and make /api/health honest #11
Merged
Production movers endpoint timed out 100% of the time for 8 days starting 5/4 16:00 UTC.

Root cause: handler did 1,600 sequential kv.get() calls inside detectMovers() (one per tracked market), plus 1,600 KV writes via recordPriceSnapshots(), all in the request path. Once @vercel/kv's Upstash migration added a small amount of per-request latency, total runtime crossed the 25s function timeout and never recovered.

Changes:
- Extract snapshot/movers logic to api/lib/price-snapshots.ts. detectMoversBatch() uses kv.mget() in chunks of 100, replacing 1,600 individual gets with ~16 batched calls. recordPriceSnapshots() now also reads via mget and caps each market's snapshot array at 300 entries so values stop growing without bound until the 7-day TTL expires.
- Add api/cron/refresh-markets.ts. Runs every 2 minutes, fetches markets, records snapshots, precomputes movers for buckets 0.02/0.05/0.1/0.2, and writes them to movers:precomputed:<bucket> with a 5-minute TTL. Writes freshness timestamps to meta:last_snapshot_run and meta:last_movers_run for /api/health to consume.
- Rewrite api/markets/movers.ts as a read-only endpoint: single kv.get() of the precomputed bucket, in-memory filter for limit/category, plus a 20s in-process response cache. Returns 503 if the cron hasn't run.
- Rewrite api/health.ts as a 4-check honest health probe: real market counts (Poly >= 800, Kalshi >= 200), KV read+write probe, snapshot freshness (< 5min), movers freshness (< 5min). Returns 503 on any degraded check so external monitors actually fire.
- Update vercel.json with the new cron entry and rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
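For reviewers who want the batching idea in isolation, here is a minimal sketch of chunked `mget` reads. It is not the code in api/lib/price-snapshots.ts: the `KvReader` interface, the `chunk` and `mgetChunked` helpers, the `FakeKv` stand-in, and the `price_history:<index>` key format are all assumptions made for illustration, and a real client such as @vercel/kv would need to be adapted to this shape.

```typescript
// Hypothetical minimal slice of a KV client: only the batched read is needed.
interface KvReader {
  mget(...keys: string[]): Promise<(unknown | null)[]>;
}

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Read all keys with one mget per chunk instead of one get per key.
async function mgetChunked(
  kv: KvReader,
  keys: string[],
  chunkSize = 100,
): Promise<Map<string, unknown | null>> {
  const result = new Map<string, unknown | null>();
  for (const group of chunk(keys, chunkSize)) {
    const values = await kv.mget(...group);
    // mget returns values in the same order as the requested keys.
    group.forEach((key, i) => result.set(key, values[i] ?? null));
  }
  return result;
}

// In-memory stand-in that counts round trips (illustration only).
class FakeKv implements KvReader {
  calls = 0;
  private store: Map<string, unknown>;
  constructor(store: Map<string, unknown>) {
    this.store = store;
  }
  async mget(...keys: string[]): Promise<(unknown | null)[]> {
    this.calls += 1;
    return keys.map((k) => this.store.get(k) ?? null);
  }
}
```

With 1,600 keys and a chunk size of 100, this issues 16 `mget` round trips, which is where the "~16 batched calls" figure comes from. The chunks are awaited sequentially here for simplicity; whether the real handler runs them in parallel is not stated in this PR.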
Summary
Production `/api/markets/movers` has been timing out 100% of the time for 8 days (since 5/4 16:00 UTC). The marketing bot built on top of the API has been silently broken: `/api/health` returned `"status": "healthy"` the whole time because it only probed 10 markets per source and never checked KV reachability or movers freshness.

Root cause: the movers handler did 1,600 sequential `kv.get()` calls inside `detectMovers()` (one per tracked market), plus 1,600 KV writes via `recordPriceSnapshots()`, all in the request path. Once `@vercel/kv`'s Upstash migration added a small amount of per-request latency, total runtime crossed the 25s function timeout and never recovered.

Changes
- `api/lib/price-snapshots.ts` (new): `detectMoversBatch()` uses `kv.mget()` in chunks of 100, replacing 1,600 individual reads with ~16 batched calls. `recordPriceSnapshots()` now reads via `mget` and caps each market's snapshot array at 300 entries.
- `api/cron/refresh-markets.ts` (new): runs every 2 minutes, refreshes markets, records snapshots, precomputes movers for buckets `0.02/0.05/0.1/0.2` into `movers:precomputed:<bucket>` with a 5-minute TTL. Writes freshness timestamps for `/api/health` to consume.
- `api/markets/movers.ts` (rewrite): now read-only. Single `kv.get()` of the precomputed bucket, in-memory filter for `limit`/`category`, 20s in-process response cache. Returns 503 if the cron hasn't run since deploy.
- `api/health.ts` (rewrite): 4 honest sub-checks: real market counts (Poly ≥ 800, Kalshi ≥ 200), KV read+write probe, snapshot freshness (< 5min), movers freshness (< 5min). Returns 503 on any degraded check so external monitors actually fire.
- `vercel.json`: added `refresh-markets` cron entry and rewrite.

Test plan
- `pnpm typecheck` (already green locally)
- `/api/cron/refresh-markets` (with `Authorization: Bearer $CRON_SECRET`) completes in <10s and writes both `meta:last_snapshot_run` and `movers:precomputed:0.05` keys
- `time curl https://<preview>/api/markets/movers` returns in <1s with `count > 0`
- `curl https://<preview>/api/health | jq` shows `polymarket.markets >= 1200`, `kalshi.markets >= 400`, and `status: "healthy"`

Rollback
Phase 1 (mget batching) has no schema change. If anything goes wrong after deploy, `git revert` restores the prior endpoint; precomputed KV keys naturally expire in 5 minutes. The `price_history:*` data shape is unchanged.

🤖 Generated with Claude Code
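As a reading aid for the health rewrite, the four sub-checks and the 503-on-any-failure rule can be sketched as a pure function. This is a sketch under stated assumptions, not the contents of api/health.ts: the `HealthInputs` and `HealthReport` types, the `evaluateHealth` and `isFresh` names, the `"degraded"` status string, and the use of epoch-millisecond timestamps are hypothetical. Only the thresholds (Poly ≥ 800, Kalshi ≥ 200, freshness < 5min) come from this PR.

```typescript
// Hypothetical inputs gathered by the handler before evaluation.
interface HealthInputs {
  polymarketCount: number;
  kalshiCount: number;
  kvProbeOk: boolean; // result of a KV read+write probe
  lastSnapshotRunMs: number | null; // assumed epoch ms, null if never written
  lastMoversRunMs: number | null; // assumed epoch ms, null if never written
  nowMs: number;
}

interface HealthReport {
  status: "healthy" | "degraded"; // "degraded" label is an assumption
  httpStatus: 200 | 503;
  checks: { markets: boolean; kv: boolean; snapshots: boolean; movers: boolean };
}

const MAX_AGE_MS = 5 * 60 * 1000; // freshness window from the PR: < 5min

// A missing timestamp counts as stale, so a cron that never ran fails the check.
function isFresh(ts: number | null, nowMs: number): boolean {
  return ts !== null && nowMs - ts < MAX_AGE_MS;
}

function evaluateHealth(input: HealthInputs): HealthReport {
  const checks = {
    markets: input.polymarketCount >= 800 && input.kalshiCount >= 200,
    kv: input.kvProbeOk,
    snapshots: isFresh(input.lastSnapshotRunMs, input.nowMs),
    movers: isFresh(input.lastMoversRunMs, input.nowMs),
  };
  // Any single failing check degrades the whole probe and yields a 503.
  const healthy = checks.markets && checks.kv && checks.snapshots && checks.movers;
  return {
    status: healthy ? "healthy" : "degraded",
    httpStatus: healthy ? 200 : 503,
    checks,
  };
}
```

Keeping the evaluation free of I/O means the thresholds can be unit-tested without a KV connection; the handler would only be responsible for collecting the inputs and writing the response.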