feat(api): AIN-154 Phase A · router hardening tables (policy + health + breakers) by hizrianraz · Pull Request #27 · ainfera-ai/api

hizrianraz · 2026-05-18T13:32:21Z

Summary

Lands the three persistent stores for L2 router production hardening per AIN-154 Phase 0 prep
All Phase 0 founder-recommended decisions (D-154-1 weights / D-154-2 veto / D-154-3 penalty) encoded as DEFAULT + CHECK constraints at the DB layer

Tables

| Table | Purpose | PK |
| --- | --- | --- |
| `tenant_routing_policy` | Per-tenant routing weight + fallback config (5 preset policies CHECK-locked) | tenant_id |
| `provider_health_checks` | Synthetic probe results, rolling | UUID surrogate |
| `circuit_breakers` | Per-(provider, model) state machine (CLOSED / OPEN / HALF_OPEN) | composite (provider_slug, model_slug) |

Phase 0 decisions all locked in DB

| Decision | DB enforcement |
| --- | --- |
| D-154-1 weights (Q 0.40 / L 0.30 / C 0.30) | DEFAULT + sum=1.00 CHECK |
| D-154-2 veto threshold 0.50 | DEFAULT + bounds [0, 1] CHECK |
| D-154-3 fallback penalty 5% | DEFAULT + bounds [0, 100] CHECK |
| 5 preset policy names | CHECK locked vocabulary |
| breaker state {CLOSED, OPEN, HALF_OPEN} | CHECK locked vocabulary |

Stack note

Base: feat/ain-153-phase-a-workflows-tasks-schema (PR api#26)
Migration 0013 chains off 0012 — when api#26 merges, this auto-rebases to main

Pre-commit hooks

ruff + ruff format + mypy --strict + pytest -x — all green.

Test plan

Migration follows existing alembic pattern
Pre-commit hooks pass
CI green
Deploy: alembic upgrade head after merge (no backfill required)
Phase B follow-up: circuit_breaker state machine + fallback chain + ATS scoring (per ticket Phase A through M)

Refs

AIN-154 (parent — Sprint v1.8 router hardening)
AIN-154 Phase 0 prep comment (2026-05-18 PM)
AIN-153 (PR api#26 — base branch)

🤖 Generated with Claude Code

Note

Medium Risk
Adds new Postgres tables and constraints used by routing decisions; while isolated from existing data, schema/constraint mistakes could block writes or degrade query performance once the router starts using them.

Overview
Adds Alembic migration 20260518_0013_router_hardening_tables.py to introduce three new persistent stores for router hardening: tenant_routing_policy, provider_health_checks, and circuit_breakers.

The migration encodes Phase 0 routing defaults and guardrails at the DB layer via server defaults + CHECK constraints (policy name vocabulary, weights sum/bounds, fallback penalty/veto bounds), plus adds indexes for common health-check recency queries and breaker state lookup; downgrade drops indexes/tables in reverse order.

Reviewed by Cursor Bugbot for commit 825a66e. Bugbot is set up for automated code reviews on this repo.

… + breakers)

Lands the three persistent stores for L2 router production hardening per AIN-154 Phase 0 prep:

## Tables

### tenant_routing_policy

- One row per tenant (PK = tenant_id, CASCADE on tenant delete)
- 5 preset `policy_name` values locked via CHECK
- Default weights Q 0.40 / L 0.30 / C 0.30 sum=1.00 per D-154-1
- weights_sum_lock CHECK enforces invariant at DB layer
- compliance_veto_threshold default 0.50 per D-154-2
- fallback_cost_penalty_pct default 5% per D-154-3
- Bounds CHECKs on all numeric fields

### provider_health_checks

- Synthetic probe results, one row per probe
- (provider_slug, model_slug) NOT FK: supports soft-deleted catalog rows per AIN-141 archival semantics
- outcome CHECK locked to 5 values matching InvocationResult.status from AIN-154 architecture
- Composite DESC index on (provider, model, probed_at) serves both dashboard latency rollup + routing decision read path

### circuit_breakers

- One row per (provider, model): composite PK, no surrogate
- 3-state machine (CLOSED / OPEN / HALF_OPEN) CHECK-locked
- Tracks opened_at + half_open_at + closed_at timestamps for the 60s cool-off window logic
- consecutive_failures + trip_count + last_failure_at + last_success_at for the 5-fail-in-60s trip rule quick-path
- State index for "show me all open breakers" dashboard query

## Phase 0 decisions all locked in DB

D-154-1, D-154-2, D-154-3 encoded as DEFAULT + CHECK so the DB enforces the policy invariants. The app layer can override per tenant, but defaults match the founder-recommended values from Phase 0 prep.

## What this migration does NOT include (Phase B+ scope)

- Per-provider rate-limit tracking: lives in Redis (ephemeral, TTL) not Postgres; durable trace surfaces via audit chain
- ATS scores table: separate sprint deliverable (Phase E of AIN-154)
- routing.decided audit event integration: separate PR; will reference workflow_id from AIN-153 migration 0012

## Stack note

Migration 0013 chains off 0012 (AIN-153 Phase A). This PR's base is the AIN-153 PR branch so the migration chain stays linear. When AIN-153 merges to main, AIN-154 PR auto-rebases.

## Cross-refs

- AIN-154 (parent epic, Sprint v1.8 router hardening)
- AIN-154 Phase 0 prep comment (2026-05-18 PM)
- AIN-153 Phase A (PR api#26, chained parent migration)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear-code · 2026-05-18T13:32:25Z

AIN-154 [L2] Auto-route hardening: circuit breakers, fallback chains, ATS-weighted routing, observability

Parent epic for Sprint v1.8 Launch Prerequisites. Founder locked 2026-05-18 PM.

Scope

Today's L2 Routing does provider-neutral dispatch across 10 providers + AAMC voter selection. It works on the happy path but lacks production-grade hardening: no circuit breakers, no automatic fallback chains, no ATS-weighted routing, no per-provider rate-limit tracking, no observability of routing decisions.

Founder ask: harden auto-route to production-grade before launch.

Hardening dimensions

1. Resilience (circuit breakers + fallback chains)

Per-provider circuit breaker: 5 consecutive failures within 60s → OPEN (1-minute cool-off) → HALF-OPEN (1 probe) → CLOSED on success
Failure types: HTTP 5xx, timeout (>30s), connection refused, malformed response
Fallback chain per model class:
- claude-opus-4-7 → fall back to claude-sonnet-4-6 → fall back to gpt-5-5
- gpt-5-5 → fall back to claude-opus-4-7 → fall back to gemini-3-1-pro
- gemini-3-1-pro → fall back to gpt-5-5 → fall back to claude-opus-4-7
- grok-4 → fall back to gpt-5-5 → fall back to claude-opus-4-7
- mistral-large-3 → fall back to mistral-medium-3 → fall back to gpt-5-5
Fallback adds 5% cost penalty to discourage habitual fallback
Per-tenant policy: opt-out of fallback (strict-routing tenants pay 402 instead of falling back)
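
The breaker rule above (5 consecutive failures within 60s, a 1-minute cool-off, one HALF_OPEN probe) can be sketched as a small state machine. This is a minimal illustration, not the Phase B implementation: the `Breaker` class and its method names are hypothetical, the 60s window is simplified to "counted from the first failure of the current streak", and reopening immediately on a failed probe is an assumption the ticket does not state.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 5      # consecutive failures needed to trip
FAILURE_WINDOW_S = 60.0    # the streak must fall within this window
COOL_OFF_S = 60.0          # time spent OPEN before a probe is allowed


@dataclass
class Breaker:
    state: str = "CLOSED"
    consecutive_failures: int = 0
    first_failure_at: Optional[float] = None
    opened_at: Optional[float] = None

    def allow_request(self, now: float) -> bool:
        """Return True if a call may be sent to the provider at time `now`."""
        if self.state == "OPEN" and now - self.opened_at >= COOL_OFF_S:
            self.state = "HALF_OPEN"   # cool-off elapsed: let one probe through
            return True
        return self.state == "CLOSED"  # HALF_OPEN blocks everything but the probe

    def record_success(self) -> None:
        self.state = "CLOSED"
        self.consecutive_failures = 0
        self.first_failure_at = None

    def record_failure(self, now: float) -> None:
        if self.state == "HALF_OPEN":
            self._trip(now)            # failed probe: reopen (assumption)
            return
        # Restart the streak if the previous one is older than the window.
        if self.first_failure_at is None or now - self.first_failure_at > FAILURE_WINDOW_S:
            self.first_failure_at = now
            self.consecutive_failures = 0
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = "OPEN"
        self.opened_at = now
```

Timestamps are passed in rather than read from a clock so the transitions can be exercised deterministically in tests.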

2. Latency-aware routing

Track p50/p95/p99 latency per provider × model over rolling 1-hour window
For latency-sensitive requests (header X-Ainfera-Priority: latency), prefer lower-latency provider within same model class
Surface latency stats at GET /v1/router/health (admin endpoint)
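
As a rough sketch of the rolling-window tracking described above, the following keeps (timestamp, latency) samples for one (provider, model) pair and derives p50/p95/p99 on demand. The `LatencyWindow` class is hypothetical, and the choice of the "inclusive" quantile method is an assumption; the ticket does not specify how percentiles are computed.

```python
from collections import deque
from statistics import quantiles

WINDOW_S = 3600.0  # rolling 1-hour window


class LatencyWindow:
    def __init__(self) -> None:
        self._samples = deque()  # (timestamp, latency_ms), oldest first

    def add(self, ts: float, latency_ms: float) -> None:
        self._samples.append((ts, latency_ms))

    def percentiles(self, now: float) -> dict:
        # Evict samples older than the window (assumes timestamps arrive in order).
        while self._samples and now - self._samples[0][0] > WINDOW_S:
            self._samples.popleft()
        values = [latency for _, latency in self._samples]
        if len(values) < 2:
            raise ValueError("need at least two samples in the window")
        cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
        return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```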

3. Cost-aware routing

For cost-sensitive requests (header X-Ainfera-Priority: cost), pick cheapest provider within quality threshold (defined by ATS score floor)
Default: balance latency + cost via a weighted score: score = 0.4*quality + 0.3*latency + 0.3*cost
Configurable per-tenant via tenant_routing_policy table
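
To make the default weighting concrete, here is a small sketch that ranks two candidates by the weighted score. It assumes each component is already normalized to [0, 1] with higher meaning better, which the ticket does not state; the provider names and component values are made up for illustration.

```python
def routing_score(quality: float, latency: float, cost: float,
                  w_quality: float = 0.4, w_latency: float = 0.3,
                  w_cost: float = 0.3) -> float:
    """Weighted score; inputs assumed normalized to [0, 1], higher is better."""
    return w_quality * quality + w_latency * latency + w_cost * cost


# Hypothetical candidates: (quality, latency, cost) component scores.
candidates = {
    "provider_a": (0.9, 0.5, 0.4),
    "provider_b": (0.7, 0.9, 0.8),
}
best = max(candidates, key=lambda name: routing_score(*candidates[name]))
```

With these inputs provider_a scores about 0.63 and provider_b about 0.79, so the better latency and cost of provider_b outweigh its lower quality under the default weights.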

4. Quality-aware routing (ATS-weighted)

Each (provider, model) has an ATS score (Agent Trust Score for inference quality)
ATS dimensions for routing: Reliability (30%) · Quality (25%) · Cost-efficiency (20%) · Latency (15%) · Compliance (10%)
Compliance veto: if ATS shows zero on Compliance (e.g., provider violates Annex IV reporting), provider is excluded regardless of other scores
ATS scores updated nightly from rolling 24-hour data
Reasoning models get higher quality weight; chat models get higher cost weight
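
One way to read the dimension weights and the compliance veto together is sketched below. It assumes every dimension is on a 0 to 1 scale and that the veto compares the compliance dimension against a threshold (0.50 by default, per D-154-2 in the PR description); neither the scale nor the exact comparison is spelled out in the source, and `ats_composite` is a hypothetical name.

```python
from typing import Dict, Optional

ATS_WEIGHTS = {
    "reliability": 0.30,
    "quality": 0.25,
    "cost_efficiency": 0.20,
    "latency": 0.15,
    "compliance": 0.10,
}


def ats_composite(dims: Dict[str, float],
                  veto_threshold: float = 0.50) -> Optional[float]:
    """Weighted ATS composite, or None when the compliance veto applies."""
    if dims["compliance"] < veto_threshold:
        return None  # excluded regardless of the other dimensions
    return sum(weight * dims[name] for name, weight in ATS_WEIGHTS.items())
```

Returning `None` rather than a score of zero keeps a vetoed provider out of any ranking step instead of merely placing it last.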

5. Rate-limit awareness

Track per-provider rate limit usage (tokens/min, requests/min) via response headers
When approaching 80% of rate limit window, prefer alternate provider
When at 100%, queue request for next window (max 30s wait) OR fall back
Surface rate-limit status at GET /v1/router/health
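
The thresholds above reduce to a small decision function. This sketch assumes the "queue OR fall back" choice at 100% is made on the estimated wait until the next window (queue if it fits in 30s, otherwise fall back); that tie-break and the action names are assumptions, not part of the ticket.

```python
SOFT_LIMIT = 0.8       # start preferring an alternate provider
MAX_QUEUE_WAIT_S = 30  # longest a request may wait for the next window


def rate_limit_action(used: int, limit: int, wait_for_window_s: float) -> str:
    """Map current usage of a rate-limit window to a routing action."""
    usage = used / limit
    if usage >= 1.0:
        # Window exhausted: queue only if the next window opens soon enough.
        return "queue" if wait_for_window_s <= MAX_QUEUE_WAIT_S else "fallback"
    if usage >= SOFT_LIMIT:
        return "prefer_alternate"
    return "proceed"
```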

6. Reasoning-token floor enforcement (already locked)

Hard rule: max_tokens >= 80 for gpt-5-5, gemini-3-1-pro, claude-opus-4-7
Below floor → 400 Bad Request with helpful error message
Auto-bump option (opt-in via header X-Ainfera-Auto-Bump: true) to silently raise to 80
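
The floor rule can be sketched as a single check run before dispatch. `TokenFloorError` and `enforce_token_floor` are hypothetical names; in the API the error would presumably be translated into the 400 response, and `auto_bump` would be derived from the X-Ainfera-Auto-Bump header.

```python
REASONING_MODELS = frozenset({"gpt-5-5", "gemini-3-1-pro", "claude-opus-4-7"})
MIN_MAX_TOKENS = 80


class TokenFloorError(ValueError):
    """Raised when max_tokens is below the floor; maps to HTTP 400."""


def enforce_token_floor(model: str, max_tokens: int, auto_bump: bool = False) -> int:
    """Return the max_tokens value to use, raising if it is below the floor."""
    if model not in REASONING_MODELS or max_tokens >= MIN_MAX_TOKENS:
        return max_tokens
    if auto_bump:
        return MIN_MAX_TOKENS  # opt-in: silently raise to the floor
    raise TokenFloorError(
        f"max_tokens={max_tokens} is below the floor of {MIN_MAX_TOKENS} for {model}"
    )
```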

7. Provider health monitor (background task)

Continuous synthetic probe every 60s per (provider, model) using sacrificial test key
Stores results in provider_health_checks table
Feeds into circuit breaker + latency stats + cost-efficiency calculations
Surfaces in /v1/router/health for transparency

8. Observability — routing decision in audit chain

Every inference call appends a new audit event: routing.decided with:
- Requested model
- Selected provider + model (may differ if fallback)
- Decision rationale (e.g., primary_available, fallback_circuit_open, rate_limit_exceeded)
- Score breakdown (quality, latency, cost, compliance)
- Latency p99 used for decision
- Tenant routing policy applied
Audit-verifiable: customers can prove which provider answered + why
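
For illustration, a routing.decided event carrying the fields listed above might be shaped like this. The field names, the provider slug, and the numeric values are all hypothetical; the ticket lists what the event contains but not its schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RoutingDecided:
    requested_model: str
    selected_provider: str
    selected_model: str
    rationale: str          # e.g. "primary_available"
    scores: dict            # quality / latency / cost / compliance
    latency_p99_ms: float
    policy_name: str
    event_type: str = "routing.decided"


event = RoutingDecided(
    requested_model="claude-opus-4-7",
    selected_provider="example-provider",  # hypothetical provider slug
    selected_model="claude-sonnet-4-6",
    rationale="fallback_circuit_open",
    scores={"quality": 0.8, "latency": 0.7, "cost": 0.6, "compliance": 1.0},
    latency_p99_ms=1850.0,
    policy_name="balanced",
)
# Stable key order makes the serialized form easier to hash into an audit chain.
payload = json.dumps(asdict(event), sort_keys=True)
```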

9. Per-tenant routing policy

CREATE TABLE tenant_routing_policy (
  tenant_id uuid PRIMARY KEY REFERENCES tenants(id),
  policy_name text NOT NULL DEFAULT 'balanced',  -- balanced / cost_first / quality_first / latency_first / strict_no_fallback
  cost_weight numeric(3,2) NOT NULL DEFAULT 0.30,
  quality_weight numeric(3,2) NOT NULL DEFAULT 0.40,
  latency_weight numeric(3,2) NOT NULL DEFAULT 0.30,
  fallback_enabled boolean NOT NULL DEFAULT true,
  fallback_cost_penalty_pct numeric(4,2) NOT NULL DEFAULT 5.00,
  compliance_veto_threshold numeric(3,2) NOT NULL DEFAULT 0.50,
  CONSTRAINT weights_sum CHECK (cost_weight + quality_weight + latency_weight = 1.00)
);
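
An application-side mirror of these constraints can catch bad policy rows before they reach the database. The sketch below is not part of the migration: `policy_violations` is a hypothetical helper, and the [0, 100] and [0, 1] bounds are taken from the PR description rather than from the DDL above. `Decimal` is used because the sum check is an exact equality, which is fragile with binary floats.

```python
from decimal import Decimal

POLICY_NAMES = frozenset({
    "balanced", "cost_first", "quality_first", "latency_first", "strict_no_fallback",
})


def policy_violations(
    policy_name: str = "balanced",
    cost_weight: Decimal = Decimal("0.30"),
    quality_weight: Decimal = Decimal("0.40"),
    latency_weight: Decimal = Decimal("0.30"),
    fallback_cost_penalty_pct: Decimal = Decimal("5.00"),
    compliance_veto_threshold: Decimal = Decimal("0.50"),
) -> list:
    """Return the constraints a candidate row would violate (empty if none)."""
    problems = []
    if policy_name not in POLICY_NAMES:
        problems.append("policy_name not in preset vocabulary")
    if cost_weight + quality_weight + latency_weight != Decimal("1.00"):
        problems.append("weights must sum to 1.00")
    if not Decimal(0) <= fallback_cost_penalty_pct <= Decimal(100):
        problems.append("fallback_cost_penalty_pct outside [0, 100]")
    if not Decimal(0) <= compliance_veto_threshold <= Decimal(1):
        problems.append("compliance_veto_threshold outside [0, 1]")
    return problems
```

Calling `policy_violations()` with no arguments checks the Phase 0 defaults themselves, which should produce an empty list.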

10. Graceful degradation

When all preferred providers fail: emit routing.degraded audit event + fall through to backup tier (e.g., Together + Groq) with explicit cost penalty
If all tiers fail: HTTP 503 with detailed reason in body (helps customer's agent retry intelligently)
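
Putting the fallback chains from the Resilience section together with the degradation rule, model selection might look roughly like the following. `select_model`, `RoutingError`, and the "degraded_backup_tier" rationale are hypothetical; `is_available` stands in for whatever combines breaker state and rate-limit status, and the backup tier is passed in because the ticket names it only by example.

```python
from typing import Callable, Iterable, Tuple

FALLBACK_CHAINS = {
    "claude-opus-4-7": ["claude-sonnet-4-6", "gpt-5-5"],
    "gpt-5-5": ["claude-opus-4-7", "gemini-3-1-pro"],
    "gemini-3-1-pro": ["gpt-5-5", "claude-opus-4-7"],
    "grok-4": ["gpt-5-5", "claude-opus-4-7"],
    "mistral-large-3": ["mistral-medium-3", "gpt-5-5"],
}


class RoutingError(Exception):
    def __init__(self, http_status: int, reason: str) -> None:
        super().__init__(reason)
        self.http_status = http_status


def select_model(requested: str, is_available: Callable[[str], bool],
                 fallback_enabled: bool = True,
                 backup_tier: Iterable[str] = ()) -> Tuple[str, str]:
    """Return (model, rationale), or raise RoutingError with 402 or 503."""
    if is_available(requested):
        return requested, "primary_available"
    if not fallback_enabled:
        raise RoutingError(402, "strict routing: primary unavailable")
    for model in FALLBACK_CHAINS.get(requested, []):
        if is_available(model):
            return model, "fallback_circuit_open"
    for model in backup_tier:
        if is_available(model):
            # A real router would also emit routing.degraded here.
            return model, "degraded_backup_tier"
    raise RoutingError(503, "all tiers failed for " + requested)
```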

Architecture

New code paths in api/ainfera_api/services/router.py:

router/
├── core.py              # main route() entry point
├── circuit_breaker.py   # per-provider breaker state machine
├── fallback_chain.py    # model-class → fallback chain mapping
├── scoring.py           # ATS-weighted score computation
├── health_monitor.py    # background synthetic probes
├── rate_limit.py        # per-provider rate-limit tracking
├── policy.py            # tenant_routing_policy lookup + application
└── audit.py             # emit routing.decided events

Provider adapters get standardized failure interface:

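from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class InvocationResult:
    status: Literal["ok", "retriable_error", "terminal_error", "rate_limited"]
    latency_ms: float
    rate_limit_headers: Optional[dict] = None  # present only if the provider exposes them
    error_class: Optional[str] = None          # input to the circuit breaker decision

class ProviderAdapter:
    async def invoke(self, *args, **kwargs) -> InvocationResult:
        raise NotImplementedError  # each provider adapter supplies its own call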

Phases (sub-tickets after design lock)

Acceptance criteria

Out of scope (this epic)

Provider onboarding tooling (manual Charter v3 update + adapter implementation)
Customer self-service routing policy editor in dashboard (admin API only for v1)
Geographic routing (US-only providers vs EU vs APAC) — post-launch
Provider failover with state migration (e.g., resume streaming after provider death) — Phase 2
Multi-region active-active router — Series A architecture

Sprint targeting

Sprint v1.8 — Launch Window (after Sprint v1.7 closes ~June 30, ships D30–D45 ~mid-July).

ROUTER HARDENING is the production-stability prerequisite for launch. Without it, the first burst of preview-user inferences would expose unhardened paths.

Cross-references

🛣️ L2 Routing — current state
📊 ATS v1.0 — scoring dimensions
📜 AAMC v1.0 — Council voter discipline (no OpenRouter)
🛡️ Tulkas — chaos testing role
📜 L4 Audit — routing.decided event addition

Ontology vocabulary additions (need locking)

RoutingDecision — new audit event sub-type
CircuitBreaker — internal state, not customer-exposed
FallbackChain — model-class to fallback-sequence mapping
TenantRoutingPolicy — new entity
ProviderHealthCheck — new entity

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

cursor · 2026-05-18T13:35:38Z

+        ),
+        sa.Column(
+            "fallback_cost_penalty_pct",
+            sa.Numeric(4, 2),


Numeric(4,2) column cannot store upper CHECK bound 100

Medium Severity

fallback_cost_penalty_pct is typed Numeric(4, 2), which in PostgreSQL supports values up to 99.99 (4 total digits, 2 after decimal → 2 before decimal). The CHECK constraint claims <= 100 is valid, and the PR description specifies bounds [0, 100], but inserting 100 would cause a "numeric field overflow" error before the CHECK is even evaluated. The column type needs to be Numeric(5, 2) to accommodate the documented upper bound.

Additional Locations (1)

alembic/versions/20260518_0013_router_hardening_tables.py#L131-L132
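
The arithmetic behind this finding is easy to check: a NUMERIC(precision, scale) column holds at most `precision - scale` digits before the decimal point. The helper below is only an illustration of that rule (`numeric_max` is a hypothetical name, not a PostgreSQL or SQLAlchemy function).

```python
from decimal import Decimal


def numeric_max(precision: int, scale: int) -> Decimal:
    """Largest value a NUMERIC(precision, scale) column can hold."""
    return Decimal(10 ** precision - 1) / (Decimal(10) ** scale)


upper_bound = Decimal("100")                   # documented CHECK upper bound
fits_in_4_2 = upper_bound <= numeric_max(4, 2)  # False: max is 99.99
fits_in_5_2 = upper_bound <= numeric_max(5, 2)  # True: max is 999.99
```

So with Numeric(4, 2) the values between 99.99 and 100 that the CHECK permits are unreachable, while Numeric(5, 2) leaves the CHECK as the effective upper limit.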

cursor Bot reviewed May 18, 2026

hizrianraz deleted the branch feat/ain-153-phase-a-workflows-tasks-schema May 18, 2026 14:53

hizrianraz closed this May 18, 2026

hizrianraz mentioned this pull request May 18, 2026

feat(api): AIN-154 Phase A + AIN-152 Phase A · router hardening + waitlist (re-PR after stacked base collapse) #37

Merged

This was referenced May 23, 2026

chore(api): AIN-243 · purge sweep · retire AAMC vocab from code surface #69

Merged

[test] AIN-243 W6: grep-gate for retired ATS/AAMC/TrustScore terms #92

Merged

hizrianraz wants to merge 1 commit into feat/ain-153-phase-a-workflows-tasks-schema from feat/ain-154-phase-a-router-hardening-schema
