Skip to content

feat(api): AIN-154 Phase A · router hardening tables (policy + health + breakers)#27

Closed
hizrianraz wants to merge 1 commit into
feat/ain-153-phase-a-workflows-tasks-schemafrom
feat/ain-154-phase-a-router-hardening-schema
Closed

feat(api): AIN-154 Phase A · router hardening tables (policy + health + breakers)#27
hizrianraz wants to merge 1 commit into
feat/ain-153-phase-a-workflows-tasks-schemafrom
feat/ain-154-phase-a-router-hardening-schema

Conversation

@hizrianraz
Copy link
Copy Markdown
Contributor

@hizrianraz hizrianraz commented May 18, 2026

Summary

  • Lands the three persistent stores for L2 router production hardening per AIN-154 Phase 0 prep
  • All Phase 0 founder-recommended decisions (D-154-1 weights / D-154-2 veto / D-154-3 penalty) encoded as DEFAULT + CHECK constraints at the DB layer

Tables

Table Purpose PK
tenant_routing_policy Per-tenant routing weight + fallback config (5 preset policies CHECK-locked) tenant_id
provider_health_checks Synthetic probe results, rolling UUID surrogate
circuit_breakers Per-(provider, model) state machine (CLOSED / OPEN / HALF_OPEN) composite (provider_slug, model_slug)

Phase 0 decisions all locked in DB

Decision DB enforcement
D-154-1 weights (Q 0.40 / L 0.30 / C 0.30) DEFAULT + sum=1.00 CHECK
D-154-2 veto threshold 0.50 DEFAULT + bounds [0, 1] CHECK
D-154-3 fallback penalty 5% DEFAULT + bounds [0, 100] CHECK
5 preset policy names CHECK locked vocabulary
breaker state {CLOSED, OPEN, HALF_OPEN} CHECK locked vocabulary

Stack note

  • Base: feat/ain-153-phase-a-workflows-tasks-schema (PR api#26)
  • Migration 0013 chains off 0012 — when api#26 merges, this auto-rebases to main

Pre-commit hooks

ruff + ruff format + mypy --strict + pytest -x — all green.

Test plan

  • Migration follows existing alembic pattern
  • Pre-commit hooks pass
  • CI green
  • Deploy: alembic upgrade head after merge (no backfill required)
  • Phase B follow-up: circuit_breaker state machine + fallback chain + ATS scoring (per ticket Phase A through M)

Refs

  • AIN-154 (parent — Sprint v1.8 router hardening)
  • AIN-154 Phase 0 prep comment (2026-05-18 PM)
  • AIN-153 (PR api#26 — base branch)

🤖 Generated with Claude Code


Note

Medium Risk
Adds new Postgres tables and constraints used by routing decisions; while isolated from existing data, schema/constraint mistakes could block writes or degrade query performance once the router starts using them.

Overview
Adds Alembic migration 20260518_0013_router_hardening_tables.py to introduce three new persistent stores for router hardening: tenant_routing_policy, provider_health_checks, and circuit_breakers.

The migration encodes Phase 0 routing defaults and guardrails at the DB layer via server defaults + CHECK constraints (policy name vocabulary, weights sum/bounds, fallback penalty/veto bounds), plus adds indexes for common health-check recency queries and breaker state lookup; downgrade drops indexes/tables in reverse order.

Reviewed by Cursor Bugbot for commit 825a66e. Bugbot is set up for automated code reviews on this repo. Configure here.

… + breakers)

Lands the three persistent stores for L2 router production hardening
per AIN-154 Phase 0 prep:

## Tables

### tenant_routing_policy
- One row per tenant (PK = tenant_id, CASCADE on tenant delete)
- 5 preset `policy_name` values locked via CHECK
- Default weights Q 0.40 / L 0.30 / C 0.30 sum=1.00 per D-154-1
- weights_sum_lock CHECK enforces invariant at DB layer
- compliance_veto_threshold default 0.50 per D-154-2
- fallback_cost_penalty_pct default 5% per D-154-3
- Bounds CHECKs on all numeric fields

### provider_health_checks
- Synthetic probe results, one row per probe
- (provider_slug, model_slug) NOT FK — supports soft-deleted catalog
  rows per AIN-141 archival semantics
- outcome CHECK locked to 5 values matching InvocationResult.status
  from AIN-154 architecture
- Composite DESC index on (provider, model, probed_at) serves both
  dashboard latency rollup + routing decision read path

### circuit_breakers
- One row per (provider, model) — composite PK, no surrogate
- 3-state machine (CLOSED / OPEN / HALF_OPEN) CHECK-locked
- Tracks opened_at + half_open_at + closed_at timestamps for the
  60s cool-off window logic
- consecutive_failures + trip_count + last_failure_at + last_success_at
  for the 5-fail-in-60s trip rule quick-path
- State index for "show me all open breakers" dashboard query

## Phase 0 decisions all locked in DB

D-154-1, D-154-2, D-154-3 encoded as DEFAULT + CHECK so the DB enforces
the policy invariants — app layer can override per tenant but defaults
match the founder-recommended values from Phase 0 prep.

## What this migration does NOT include (Phase B+ scope)

- Per-provider rate-limit tracking — lives in Redis (ephemeral, TTL)
  not Postgres; durable trace surfaces via audit chain
- ATS scores table — separate sprint deliverable (Phase E of AIN-154)
- routing.decided audit event integration — separate PR; will reference
  workflow_id from AIN-153 migration 0012

## Stack note

Migration 0013 chains off 0012 (AIN-153 Phase A). This PR's base is
the AIN-153 PR branch so the migration chain stays linear. When AIN-153
merges to main, AIN-154 PR auto-rebases.

## Cross-refs

- AIN-154 (parent epic — Sprint v1.8 router hardening)
- AIN-154 Phase 0 prep comment (2026-05-18 PM)
- AIN-153 Phase A (PR api#26 — chained parent migration)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 18, 2026

AIN-154 [L2] Auto-route hardening: circuit breakers, fallback chains, ATS-weighted routing, observability

Parent epic for Sprint v1.8 Launch Prerequisites. Founder locked 2026-05-18 PM.

Scope

Today's L2 Routing does provider-neutral dispatch across 10 providers + AAMC voter selection. It works in happy path but lacks production-grade hardening: no circuit breakers, no automatic fallback chains, no ATS-weighted routing, no per-provider rate-limit tracking, no observability of routing decisions.

Founder ask: harden auto-route to production-grade before launch.

Hardening dimensions

1. Resilience (circuit breakers + fallback chains)

  • Per-provider circuit breaker: 5 consecutive failures within 60s → OPEN (1-minute cool-off) → HALF-OPEN (1 probe) → CLOSED on success
  • Failure types: HTTP 5xx, timeout (>30s), connection refused, malformed response
  • Fallback chain per model class:
    • claude-opus-4-7 → fall back to claude-sonnet-4-6 → fall back to gpt-5-5
    • gpt-5-5 → fall back to claude-opus-4-7 → fall back to gemini-3-1-pro
    • gemini-3-1-pro → fall back to gpt-5-5 → fall back to claude-opus-4-7
    • grok-4 → fall back to gpt-5-5 → fall back to claude-opus-4-7
    • mistral-large-3 → fall back to mistral-medium-3 → fall back to gpt-5-5
  • Fallback adds 5% cost penalty to discourage habitual fallback
  • Per-tenant policy: opt-out of fallback (strict-routing tenants pay 402 instead of falling back)

2. Latency-aware routing

  • Track p50/p95/p99 latency per provider × model over rolling 1-hour window
  • For latency-sensitive requests (header X-Ainfera-Priority: latency), prefer lower-latency provider within same model class
  • Surface latency stats at GET /v1/router/health (admin endpoint)

3. Cost-aware routing

  • For cost-sensitive requests (header X-Ainfera-Priority: cost), pick cheapest provider within quality threshold (defined by ATS score floor)
  • Default: balance latency + cost via weighted score score = 0.4*quality + 0.3*latency + 0.3*cost
  • Configurable per-tenant via tenant_routing_policy table

4. Quality-aware routing (ATS-weighted)

  • Each (provider, model) has an ATS score (Agent Trust Score for inference quality)
  • ATS dimensions for routing: Reliability (30%) · Quality (25%) · Cost-efficiency (20%) · Latency (15%) · Compliance (10%)
  • Compliance veto: if ATS shows zero on Compliance (e.g., provider violates Annex IV reporting), provider is excluded regardless of other scores
  • ATS scores updated nightly from rolling 24-hour data
  • Reasoning models get higher quality weight; chat models get higher cost weight

5. Rate-limit awareness

  • Track per-provider rate limit usage (tokens/min, requests/min) via response headers
  • When approaching 80% of rate limit window, prefer alternate provider
  • When at 100%, queue request for next window (max 30s wait) OR fall back
  • Surface rate-limit status at GET /v1/router/health

6. Reasoning-token floor enforcement (already locked)

  • Hard rule: max_tokens >= 80 for gpt-5-5, gemini-3-1-pro, claude-opus-4-7
  • Below floor → 400 Bad Request with helpful error message
  • Auto-bump option (opt-in via header X-Ainfera-Auto-Bump: true) to silently raise to 80

7. Provider health monitor (background task)

  • Continuous synthetic probe every 60s per (provider, model) using sacrificial test key
  • Stores results in provider_health_checks table
  • Feeds into circuit breaker + latency stats + cost-efficiency calculations
  • Surfaces in /v1/router/health for transparency

8. Observability — routing decision in audit chain

  • Every inference call appends a new audit event: routing.decided with:
    • Requested model
    • Selected provider + model (may differ if fallback)
    • Decision rationale (e.g., primary_available, fallback_circuit_open, rate_limit_exceeded)
    • Score breakdown (quality, latency, cost, compliance)
    • Latency p99 used for decision
    • Tenant routing policy applied
  • Audit-verifiable: customers can prove which provider answered + why

9. Per-tenant routing policy

CREATE TABLE tenant_routing_policy (
  tenant_id uuid PRIMARY KEY REFERENCES tenants(id),
  policy_name text NOT NULL DEFAULT 'balanced',  -- balanced / cost_first / quality_first / latency_first / strict_no_fallback
  cost_weight numeric(3,2) NOT NULL DEFAULT 0.30,
  quality_weight numeric(3,2) NOT NULL DEFAULT 0.40,
  latency_weight numeric(3,2) NOT NULL DEFAULT 0.30,
  fallback_enabled boolean NOT NULL DEFAULT true,
  fallback_cost_penalty_pct numeric(4,2) NOT NULL DEFAULT 5.00,
  compliance_veto_threshold numeric(3,2) NOT NULL DEFAULT 0.50,
  CONSTRAINT weights_sum CHECK (cost_weight + quality_weight + latency_weight = 1.00)
);

10. Graceful degradation

  • When all preferred providers fail: emit routing.degraded audit event + fall through to backup tier (e.g., Together + Groq) with explicit cost penalty
  • If all tiers fail: HTTP 503 with detailed reason in body (helps customer's agent retry intelligently)

Architecture

New code paths in api/ainfera_api/services/router.py:

router/
├── core.py              # main route() entry point
├── circuit_breaker.py   # per-provider breaker state machine
├── fallback_chain.py    # model-class → fallback chain mapping
├── scoring.py           # ATS-weighted score computation
├── health_monitor.py    # background synthetic probes
├── rate_limit.py        # per-provider rate-limit tracking
├── policy.py            # tenant_routing_policy lookup + application
└── audit.py             # emit routing.decided events

Provider adapters get standardized failure interface:

class ProviderAdapter:
    async def invoke(self, ...) -> InvocationResult:
        # InvocationResult has:
        #   - status: ok | retriable_error | terminal_error | rate_limited
        #   - latency_ms
        #   - rate_limit_headers (if exposed)
        #   - error_class (for circuit breaker decision)

Phases (sub-tickets after design lock)

  • Phase A — Circuit breaker per provider + state persistence (Redis) + 5-fail-in-60s trip rule
  • Phase B — Fallback chain per model class + 5% cost penalty + per-tenant opt-out
  • Phase C — Latency tracking (rolling 1hr p50/p95/p99) + latency-aware selection
  • Phase D — Cost-aware selection (within quality threshold)
  • Phase E — ATS-weighted scoring + nightly ATS update job
  • Phase F — Rate-limit tracking (per provider × model) + queue/fallback decision logic
  • Phase G — Reasoning-floor enforcement (hard error vs auto-bump opt-in)
  • Phase H — Provider health monitor (60s synthetic probes + provider_health_checks table)
  • Phase I — Audit event routing.decided with full decision rationale
  • Phase J — tenant_routing_policy table + lookup + 5 preset policies
  • Phase K — Graceful degradation: backup tier + routing.degraded event + 503 with reason
  • Phase L — Admin endpoints: GET /v1/router/health, GET /v1/router/policies
  • Phase M — Tulkas-driven chaos test: kill each provider in turn, verify fallback chain
  • Phase N — Documentation: customer-facing /docs/router explaining policies + how to set them

Acceptance criteria

  • Provider returning 5xx 5 times in 60s → circuit OPEN, audit event emitted, fallback engaged on next request
  • Circuit HALF-OPEN after 60s, single probe → CLOSED on success or back to OPEN on fail
  • Customer can set X-Ainfera-Priority: cost and verify cheaper provider selected in audit log
  • Customer can verify in audit log: requested model + actual provider + score breakdown + rationale
  • Rate-limit-aware: synthetic 100-call burst within rate window → some requests routed to alternate provider
  • Tulkas chaos test: kill anthropic + openai simultaneously, claude-opus-4-7 request → falls back to gemini-3-1-pro → 200 + cost penalty applied
  • GET /v1/router/health returns current circuit states + latency stats + rate-limit utilization + last 100 routing decisions (sanitized)
  • Reasoning floor: max_tokens: 20 for claude-opus-4-7 → 400 with helpful error; with X-Ainfera-Auto-Bump: true → silent bump to 80 + audit event
  • All routing decisions auditable: customer can prove every inference's routing rationale
  • AAMC voters: NEVER use OpenRouter, NEVER fall back outside Council set (binding lock — Tulkas verifies)

Out of scope (this epic)

  • Provider onboarding tooling (manual Charter v3 update + adapter implementation)
  • Customer self-service routing policy editor in dashboard (admin API only for v1)
  • Geographic routing (US-only providers vs EU vs APAC) — post-launch
  • Provider failover with state migration (e.g., resume streaming after provider death) — Phase 2
  • Multi-region active-active router — Series A architecture

Sprint targeting

Sprint v1.8 — Launch Window (after Sprint v1.7 closes ~June 30, ships D30–D45 ~mid-July).

ROUTER HARDENING is the production-stability prerequisite for launch. Without it, the first burst of preview-user inferences would expose unhardened paths.

Cross-references

Ontology vocabulary additions (need locking)

  • RoutingDecision — new audit event sub-type
  • CircuitBreaker — internal state, not customer-exposed
  • FallbackChain — model-class to fallback-sequence mapping
  • TenantRoutingPolicy — new entity
  • ProviderHealthCheck — new entity

Review in Linear

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

Reviewed by Cursor Bugbot for commit 825a66e. Configure here.

),
sa.Column(
"fallback_cost_penalty_pct",
sa.Numeric(4, 2),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numeric(4,2) column cannot store upper CHECK bound 100

Medium Severity

fallback_cost_penalty_pct is typed Numeric(4, 2), which in PostgreSQL supports values up to 99.99 (4 total digits, 2 after decimal → 2 before decimal). The CHECK constraint claims <= 100 is valid, and the PR description specifies bounds [0, 100], but inserting 100 would cause a "numeric field overflow" error before the CHECK is even evaluated. The column type needs to be Numeric(5, 2) to accommodate the documented upper bound.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 825a66e. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant