[api] AIN-232+AIN-234 W2: gateway resilience probes + cost kill-switch #89
…ll-switch W2/9 SHIP-NOW.

Brings in the gateway hardening from closed PR #77 (re-baselined). Wires through brain + inference router + deploys to Railway on merge.

routers/health.py (NEW):
- GET /healthz: in-process liveness, no I/O (HEALTHCHECK target)
- GET /readyz: aggregate readiness (process_drain + db + audit + ks snapshot); 503 on any probe fail
- get_readiness_gate() flipped FALSE on SIGTERM for drain

services/cost_killswitch.py (NEW):
- guard_or_raise() called at dispatch_with_brain entry
- rolling-window spend (default today UTC) vs AINFERA_SPEND_KILLSWITCH_USD
- Default $50 + enabled; ops env-config without restart
- Pinned passthroughs bypass guard by design (moat-safe)
- Aggregate-only logging (no PII)

routing_brain.py:
- await cost_killswitch.guard_or_raise(db) before brain runs
- Disc #12 preserved: scoring/candidate-set/weights untouched

inference.py:
- Catch CostKillswitchEngagedError → 503 with code + spent/threshold

main.py:
- Register health.router; rename inline /health → health_legacy

Tests:
- test_health_probes.py (4) + test_cost_killswitch.py (20) + openapi contract (4: /healthz, /readyz documented as non-v1)
- All 28 pass

Founder config (set in Railway env on api):
    AINFERA_SPEND_KILLSWITCH_USD=<real_threshold>  # default $50
    AINFERA_SPEND_KILLSWITCH_ENABLED=1  # default

Refs: AIN-232 · AIN-234 · supersedes closed PR #77

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AIN-232 [Foundation] Gateway resilience — kill the fleet SPOF (circuit breakers + gateway-independent fallback)
Fleet pressure-test finding #1 (LOCKED 2026-05-22). The dogfood thesis (every agent routes through Ainfera Routing) is ALSO the fleet's single biggest point of failure. Gateway down → all 7 agents go blind simultaneously. This ticket hardens the gateway so a single failure is contained, not fleet-wide.
Problem
What it must do
Acceptance criteria
Why M1
Gates the whole one-router invariant's safety. Every agent (Phase 5) inherits this. Pairs with AIN-226 (the gateway itself).

AIN-234 [Foundation] Fleet cost governance + financial kill-switch (per-agent metering + circuit breakers)
Fleet pressure-test finding #5 (LOCKED 2026-05-22). 7 agents, Námo overnight Brain-Factory, Ulmo continuous monitors, Tulkas red-team sweeps, all routing through the gateway. Aggregate inference cost can balloon, and a cascading failure can exhaust the API budget in minutes. The 2026 playbook prescription: treat the agent swarm as governed software with hard financial kill-switches + per-agent cost observability. We already have the immutable audit trail (L4); we're missing the meters and the switches.
What it must do
Acceptance criteria
Why M1
Cost governance + kill-switch are foundational guardrails the whole fleet (Phase 5) runs inside. Pre-revenue, runaway inference is an existential risk for a solo founder.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.
Reviewed by Cursor Bugbot for commit aa4b0d4. Configure here.
    # router. This is a guard, NOT a routing change: it fires BEFORE the
    # brain is invoked; if it doesn't fire, the engine runs exactly as
    # before. Disc #12 preserved (no scoring, no candidate-set change).
    await cost_killswitch.guard_or_raise(db)
Streaming path doesn't catch new CostKillswitchEngagedError
High Severity
dispatch_with_brain now raises CostKillswitchEngagedError (via cost_killswitch.guard_or_raise), and the non-streaming /v1/inference handler catches it properly. However, services/streaming.py also calls dispatch_with_brain (for /v1/messages with stream=true) and only catches NoCandidateError and AllCandidatesFailedError. When the killswitch engages during a streaming request, the unhandled exception will propagate through the SSE async generator, producing a broken stream or 500 instead of a graceful termination frame or structured error.
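One way the streaming path could handle this is sketched below. It is not the code in services/streaming.py: `stream_events`, `sse`, `collect`, the stand-in `dispatch_with_brain`, the exception's attribute names, and the SSE event names are all assumptions made for illustration. The idea is only that the generator catches `CostKillswitchEngagedError` before any frame is sent and emits one structured error frame instead of letting the exception escape.

```python
import asyncio
import json
from typing import AsyncIterator, List


class CostKillswitchEngagedError(Exception):
    """Stand-in for the PR's exception; attribute names are assumed."""

    def __init__(self, spent: float, threshold: float) -> None:
        super().__init__("cost_killswitch_engaged")
        self.spent = spent
        self.threshold = threshold


async def dispatch_with_brain(engaged: bool) -> List[str]:
    """Hypothetical stand-in: raises at entry when the guard trips."""
    if engaged:
        raise CostKillswitchEngagedError(spent=51.0, threshold=50.0)
    return ["hello", "world"]


def sse(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_events(engaged: bool) -> AsyncIterator[str]:
    try:
        chunks = await dispatch_with_brain(engaged)
    except CostKillswitchEngagedError as exc:
        # Terminate gracefully with a structured error frame (shape assumed).
        yield sse("error", {
            "code": "cost_killswitch_engaged",
            "spent": exc.spent,
            "threshold": exc.threshold,
        })
        return
    for chunk in chunks:
        yield sse("delta", {"text": chunk})
    yield sse("done", {})


async def collect(engaged: bool) -> List[str]:
    """Drain the generator into a list, for demonstration only."""
    return [frame async for frame in stream_events(engaged)]
```

With the guard engaged, `asyncio.run(collect(True))` yields a single `error` frame; otherwise it yields the delta frames followed by `done`.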


Summary (W2/9 — ship-now)
Gateway hardening from the closed SP-5 PR #77, re-baselined onto current main.
New: /healthz + /readyz (AIN-232)
- /healthz: liveness, in-process only, no I/O. Platform HEALTHCHECK target.
- /readyz: aggregate readiness: process_drain gate · DB · audit chain · killswitch snapshot. Returns 503 on any probe fail so the LB stops routing.

New: cost kill-switch (AIN-234)
- Guard runs at the entry of dispatch_with_brain, before the brain runs.
- Rolling-window spend is compared against AINFERA_SPEND_KILLSWITCH_USD (default $50, configurable, hot-reload).
- When engaged: 503 cost_killswitch_engaged with spent/threshold/window_start in body.

Disc #12
This is observability + a guard, never a routing-engine change: scoring, candidate set, and weights are untouched.

Tests
- test_health_probes.py (4): liveness, readyz ok, readyz db-fail → 503, drain-gate flip.
- test_cost_killswitch.py (20): env reads, spend rollup, snapshot composition, guard raise/no-raise, edge cases, disabled posture.
- test_openapi_contract.py (4): /healthz + /readyz documented as non-v1 platform endpoints.

Founder action (post-merge)
Set in Railway env on ainfera-api:
    AINFERA_SPEND_KILLSWITCH_USD=<real_threshold>  # default $50
    AINFERA_SPEND_KILLSWITCH_ENABLED=1  # default

Refs
AIN-232 · AIN-234 · supersedes closed PR #77 · Disc #12 preserved
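The kill-switch behaviour described in this PR can be sketched roughly as follows. This is an illustration under assumptions, not the contents of services/cost_killswitch.py: the real `guard_or_raise(db)` is awaited and reads spend from the database, whereas the hypothetical `check_spend` here takes the spend as an argument. The `>=` comparison, the fallback for unparseable values, the accepted "off" spellings, and the helper names (`read_threshold_usd`, `is_enabled`, `window_start_utc`, `to_http_error`) are also assumptions.

```python
import os
from datetime import datetime, timezone
from typing import Optional, Tuple

DEFAULT_THRESHOLD_USD = 50.0  # the PR's stated default


class CostKillswitchEngagedError(Exception):
    """Stand-in for the PR's exception; attribute names are assumed."""

    def __init__(self, spent: float, threshold: float, window_start: datetime) -> None:
        super().__init__("cost_killswitch_engaged")
        self.spent = spent
        self.threshold = threshold
        self.window_start = window_start


def read_threshold_usd() -> float:
    """Read the env var on every call instead of caching it at import."""
    raw = os.environ.get("AINFERA_SPEND_KILLSWITCH_USD", "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_THRESHOLD_USD
    # Assumption: non-positive or NaN values fall back to the default.
    return value if value > 0 else DEFAULT_THRESHOLD_USD


def is_enabled() -> bool:
    """Enabled unless explicitly switched off (accepted spellings assumed)."""
    raw = os.environ.get("AINFERA_SPEND_KILLSWITCH_ENABLED", "1")
    return raw.strip().lower() not in {"0", "false", "no", "off"}


def window_start_utc(now: Optional[datetime] = None) -> datetime:
    """Start of the default window: midnight UTC of the current day."""
    now = now or datetime.now(timezone.utc)
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


def check_spend(spent_usd: float, pinned: bool = False) -> None:
    """Raise when spend in the window reaches the threshold."""
    if pinned or not is_enabled():
        return  # pinned passthroughs bypass the guard by design
    threshold = read_threshold_usd()
    if spent_usd >= threshold:  # assumption: engage at, not only above
        raise CostKillswitchEngagedError(spent_usd, threshold, window_start_utc())


def to_http_error(exc: CostKillswitchEngagedError) -> Tuple[int, dict]:
    """Map the exception to the 503 body described in the PR."""
    return 503, {
        "code": "cost_killswitch_engaged",
        "spent": exc.spent,
        "threshold": exc.threshold,
        "window_start": exc.window_start.isoformat(),
    }
```

Taking the spend as a plain argument keeps the threshold logic testable without a database; the rollup query can then be tested on its own.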
🤖 Generated with Claude Code
Note
Medium Risk
Changes operational behavior (LB routing via /readyz, possible 503 on all routed inference when killswitch trips) and spend aggregation across all inferences for the day; routing engine itself is unchanged.
Overview
Adds Kubernetes-style gateway probes and a global spend circuit breaker on routed inference, without changing routing logic.
Health (/healthz, /readyz): A new health router exposes /healthz (liveness, no I/O) and /readyz (readiness: drain gate, DB SELECT 1, audit table read, plus killswitch snapshot in the body). Failed readiness probes return 503 with a per-probe breakdown. Legacy /health stays as health_legacy for existing monitors. A process-local readiness gate can flip to draining (intended for SIGTERM) so load balancers stop new traffic while work drains.

Cost kill-switch: New cost_killswitch service sums daily inferences.cost_usd against AINFERA_SPEND_KILLSWITCH_USD (default $50, hot-reloaded) and AINFERA_SPEND_KILLSWITCH_ENABLED. guard_or_raise runs at the start of dispatch_with_brain (before the brain). When engaged, routed POST /v1/inference returns 503 with cost_killswitch_engaged and spend/threshold/window metadata; pinned vendor passthroughs are unchanged.

Tests: Unit tests for env parsing, engagement math, liveness/legacy health, and drain gate; OpenAPI contract notes probes are platform-level, not part of the v1 surface.
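The readiness aggregation and drain gate can be illustrated with a framework-free sketch. It assumes probes are plain callables returning a boolean and that a raising probe counts as failed; `ReadinessGate`, `install_sigterm_handler`, `readyz`, and the response body shape are hypothetical names, not the PR's routers/health.py. The killswitch snapshot is left out to keep the example short.

```python
import signal
from typing import Callable, Dict, Tuple


class ReadinessGate:
    """Process-local flag that readiness checks consult."""

    def __init__(self) -> None:
        self.accepting = True

    def start_draining(self) -> None:
        self.accepting = False


def install_sigterm_handler(gate: ReadinessGate) -> None:
    """Flip the gate on SIGTERM so the load balancer stops sending traffic.

    Must be called from the main thread, as signal.signal requires.
    """
    signal.signal(signal.SIGTERM, lambda signum, frame: gate.start_draining())


def readyz(gate: ReadinessGate,
           probes: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run every probe and return (status_code, per-probe breakdown)."""
    results = {"process_drain": gate.accepting}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # Assumption: a probe that raises is reported as failed.
            results[name] = False
    ready = all(results.values())
    return (200 if ready else 503), {"ready": ready, "probes": results}
```

Running every probe even after one fails is a deliberate choice here: the 503 body then shows the full breakdown rather than only the first failure.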