feat(api): SP-4 PR-A · forward capture-coverage guard (AIN-244 instrumentation) by hizrianraz · Pull Request #73 · ainfera-ai/api

hizrianraz · 2026-05-23T17:14:46Z

Summary

Adds the durable forward-coverage guarantee for §16 capture. Every routed dispatch — canonical `ainfera-inference` OR any of the 3 SP-1 aliases — writes exactly one `routing_outcomes` row regardless of outcome (success / reject / fallback / fail). Pinned passthroughs write zero AND carry a `router: "direct"` audit marker.

Stacks on SP-2 api#72. Base is `feat/ain271-streaming-tooluse` so this PR can verify the stream-close capture path SP-2 added.

Moat-sensitive scope · pure observability

ZERO change to routing decisions, scores, weights, thresholds, ordering, M_allowed, q_prior, q_empirical, ruleset_hash. `git diff feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py ainfera_api/routing/` returns empty.
`routing_outcomes` schema unchanged; no migration.
`routing/ainfera_routing/decide.py` untouched.

§0/P2 denominator finding

Live read against Supabase `dftfpwzqxoebwzepygzl`: 778 historical inferences / 5 outcome rows / 0 historical routed_request. All 778 were pinned passthroughs (vendor slugs). The 773-row "gap" is honest fleet posture (AULE_PLANNER / YAVANNA_X_MODEL opt-outs), not a capture failure. No backfill is owed (§D3). PR-A's value is the forward guarantee.

What's new

`services/capture_invariant.py` — `route_outcome_kind`, `assert_capture_invariant`, `find_passthrough_audit_event`, `DispatchCaptureCounter` (headline signal `dispatch_without_capture_total` stays 0 in green builds).
`tests/unit/test_capture_invariant.py` — 9 pure tests locking classifier + counter semantics.
`tests/integration/test_capture_coverage.py` — parametrized sweep across canonical + 3 aliases × {success, reject-floor} + 2 vendor slugs (passthrough). Coverage-sweep test asserts `dispatch_without_capture_total == 0`.

Pre-commit

ruff + ruff-format + mypy --strict + pytest unit+smoke = 523 green.

Test plan

Branch DB: `RUN_INTEGRATION=1 pytest tests/integration/test_capture_coverage.py -v` → every exit path's capture status confirmed
Post-deploy: a fresh non-stream and a fresh stream routed call against prod each add a row to `routing_outcomes` (newest, `outcome_status='succeeded'`)
A pinned-vendor call against prod adds 0 rows and a `router: "direct"` audit event

🤖 Generated with Claude Code

Note

Low Risk
Low risk: adds a new invariant-checking helper module and new unit/integration tests without changing routing/dispatch behavior or database schema; main impact is additional integration test coverage that could fail if capture/audit semantics drift.

Overview
Adds a new services/capture_invariant.py guard module to classify requests as routed vs passthrough, assert expected routing_outcomes row counts, and track a process-local regression counter (dispatch_without_capture_total).

Introduces unit tests for the classifier/counter and an integration sweep that exercises routed success + routed reject-floor + vendor-slug passthroughs, asserting routed calls always write exactly one routing_outcomes row while passthroughs write zero and include a router: "direct" audit marker.

^{Reviewed by Cursor Bugbot for commit e6ee280. Bugbot is set up for automated code reviews on this repo. Configure here.}

linear-code · 2026-05-23T17:14:49Z

AIN-244 [Routing] Engine build — v0 → v1 → v2 (methodology v1.2 → flagship)

Turn the ainfera-ai/routing repo from a demo shell into the real routing brain. Phase 2 of the build order (methodology → flagship engine → platform bugs → platform → OS+agents).

Starting point: AIN-188 shipped the scaffold (README, templates/, schema/routing-policy.schema.json, LICENSE) + the E2E demo + policy editor (PR #49). The engine — q_prior, constrained objective, §16 capture, replay — is NOT in the repo yet. This epic builds it.

Methodology: Routing v1.2. v0 Build Spec (Notion): https://www.notion.so/368b49507d6c8111bb49edec97f72841

🔒 FLAGSHIP NAME — LOCKED 2026-05-23 (Discipline #12, founder-authorized)

Model string (API contract): ainfera-mithril — replaces ainfera-auto. Keep ainfera-auto as a silent alias → ainfera-mithril during a grace period (NOT a hard-cut; nothing breaks mid-flight). Hyphenated + brand-namespaced, consistent with catalog convention (gpt-5-5, claude-opus-4-7).
Product / brand name (human): "Mithril" — replaces "Ainfera Routing" on web/marketing. Pattern = GPT (brand) / gpt-5.5 (string).
Prime-brokerage posture (OpenRouter differentiator): Ainfera researches 24/7 for the optimum intelligence-cost route; an agent points at ainfera-mithril by default and trusts the researched decision — vendor passthrough is the explicit opt-out (inverse of OpenRouter, where the user picks the model and passthrough is default). The routing decision IS the product.
Future tiers reserved under the Mithril brand. Single flagship now.
Applies across: AIN-226 (gateway serves the string), AIN-245 (v0 router), AIN-243 (web/marketing brand migration "Ainfera Routing"→"Mithril"), SDK + quickstart + /routing page.

Children

v0 (AIN-245, ✅ Done) — static router: q_prior seed (5 frontier models) + M_allowed veto → quality-floor-then-min-cost + §16 write-only capture + exact-match cache + provider fallback + drain-proof budgets + deterministic replay. No learning.
v1 (AIN-246) — LinUCB online learning; q_empirical goes live and overrides q_prior as outcome data accrues (seeded by the internal fleet per AIN-210).
v2 (AIN-247) — cross-attention contextual feature model; A/B vs. v1 baseline.

Done (epic)

v0 shipped to ainfera-ai/routing with curl-200 + deterministic replay + full §16 capture (✅ AIN-245); v1/v2 scoped and gated behind v0 outcome volume. Flagship name ainfera-mithril live on the gateway (AIN-226) + migrated across SDK/web.

Review in Linear

cursor · 2026-05-23T17:20:56Z

+    expected_outcome_rows: int
+    actual_outcome_rows: int
+    note: str = ""
+


Frozen dataclass Exception produces empty error messages

Medium Severity

CaptureInvariantViolationError is @dataclass(frozen=True) extending Exception. The dataclass-generated __init__ never calls Exception.__init__(), so self.args remains an empty tuple. This means str(exception) returns '', making the detailed note field (which contains debugging instructions) completely invisible in tracebacks, log messages, and pytest failure output. When this exception is eventually raised, developers will see an empty message after the class name instead of the carefully written diagnostic text.

^{Reviewed by Cursor Bugbot for commit 107bd5b. Configure here.}

…atches Adds the durable forward-coverage guarantee for §16 capture: every routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1 aliases) writes exactly one `routing_outcomes` row, regardless of outcome (success / reject / fallback / fail). Pinned passthroughs (vendor slugs) write zero AND carry a `router: "direct"` audit marker. Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72) — that PR's stream-close capture path is the last exit covered by this guard. ## Moat-sensitive scope (read this first) This PR is **pure observability**. Per the SP-4 §1 guardrails: - ZERO change to routing decisions, scores, weights, thresholds, candidate ordering, `M_allowed`, `q_prior`, `q_empirical`, ruleset_hash. The diff against `services/routing_brain.py` and `services/routing.py` is **empty**. Verifiable: `git diff feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py` shows no hunks. - `routing_outcomes` schema is unchanged. No new columns, no migration. The row is written by the existing `insert_decision()` / `complete_decision()` calls in `dispatch_with_brain` (§0/P3 walk-through confirmed every exit path already writes the row). - `routing/ainfera_routing/decide.py` is untouched. ## What's new 1. `ainfera_api/services/capture_invariant.py`: - `route_outcome_kind(model_slug) -> "routed" | "passthrough"` — pure classifier keyed off the SP-1 alias resolver's `ROUTING_TARGETS`, so any string added to the resolver becomes "routed" without a second edit. - `assert_capture_invariant(db, inference_id, kind)` — read-only post-condition check the test sweep runs after every probe. Raises `CaptureInvariantViolationError` with diagnostic context when a routed call returns without a row or a passthrough produces one unexpectedly. - `find_passthrough_audit_event()` — helper for the test sweep to assert the `router: "direct"` marker is present. - `DispatchCaptureCounter.dispatch_without_capture_total` — the headline regression signal. Stays 0 in green builds; production scrape (future Prometheus surface) alerts on any non-zero. 2. `tests/unit/test_capture_invariant.py` — 9 pure tests locking the classifier (canonical + 3 aliases → routed; vendor slugs + typos → passthrough) + the counter semantics (routed-miss bumps the regression signal; passthrough-captured-unexpectedly bumps the contamination signal; reset zeros everything). 3. `tests/integration/test_capture_coverage.py` — parametrized sweep that drives a routed-success call for EACH of the 4 routing targets, a reject-floor routed call, and passthrough calls against two vendor slugs (anthropic native + openai). After each, asserts: - routed success → exactly 1 routing_outcomes row, `outcome_status='succeeded'` - reject path → 1 row, `outcome_status='rejected_floor'`, `inference_id IS NULL` (the only branch where it's NULL by design — see RoutingOutcomeORM docstring) - passthrough → 0 rows AND `router: "direct"` in the audit chain (distinguishes a properly-bypassed passthrough from a routed call that silently lost its row) Plus a coverage-sweep test that asserts `DispatchCaptureCounter.dispatch_without_capture_total == 0` at the end of a mixed dispatch sequence. ## §0/P2 denominator finding (documented for the audit chain) Live read against Supabase `dftfpwzqxoebwzepygzl`: - 778 historical inferences / 5 routing_outcomes rows - 0 historical `request_payload.model` was a routing string (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto) - ALL 778 were pinned passthroughs — vendor slugs (claude-opus-4-7 x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...) - The 3 succeeded outcome rows are integration-test side effects **The 773-row "gap" is honest fleet posture, not a capture failure.** The fleet's been on pinned passthroughs (AULE_PLANNER / YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value is the forward GUARANTEE: every NEW routed call going forward writes exactly one row. ## Pre-commit ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke all green (523 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cal, quality/cost gated (AIN-244 dial) Exposes `routing_preference: "quality" | "balanced" | "cost"` in the routing_hint body as sugar over the existing caps. **`balanced` is byte-identical to today's behavior** (the dial is a no-op when balanced is selected — proved by the parametrized regression lock in the test file). **`quality` / `cost` are accepted on the wire but INERT** until the env gate `AINFERA_ROUTING_PREFERENCE_LIVE=1` is set (founder Disc#12 authorization of the lever values). Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-4 PR-A (#73 capture-coverage). ## Moat-sensitive scope · Disc#12 boundary This PR is Disc#12-adjacent — the dial CAN change routing decisions once the env gate is on. To stay safe: - The default (gate OFF) means `quality`/`cost` resolve to today's policy IDENTICALLY to `balanced`. SP-4 ships with the gate OFF. - Explicit caller `min_quality` always wins. The dial only nudges the default-derived floor — a quality-conscious caller never has their floor silently lowered by a `cost` preference. - Safety clamps: dial output is bounded by [good=0.50, frontier=0.85] so neither lever can exclude every voter or admit a sub-floor model. - Pure-function `_apply_preference()` is deterministic — same input → same output, testable without the brain. ## Proposed mapping (Aulë's conservative starting point — founder authorizes) `balanced` — no-op. Resolves exactly as today. `quality` — bump default min_quality by +0.10 (default 0.50 → 0.60), clamped to the `frontier` tier (0.85). Caller's explicit `min_quality` wins if higher. `cost` — drop default min_quality by -0.10, clamped to the `good` tier (0.50). Caller's explicit `min_quality` wins if higher. Both bumps are conservative: ≤0.10 delta, with hard safety clamps. No weighted-λ, no score surgery, no candidate-ordering changes. The dial moves the FLOOR; the engine still picks cheapest-clearing-floor. The founder reviews + authorizes the exact lever values in this PR. Once signed off, `railway env set AINFERA_ROUTING_PREFERENCE_LIVE=1` on the api service flips the gate ON. Until then, only `balanced` ships live behavior. ## What's new - `services/routing_brain.py`: - `VALID_PREFERENCES` frozenset + `DEFAULT_PREFERENCE = "balanced"`. - `_apply_preference(base_min_q, preference) -> Decimal` — pure function honoring the gate-off semantic. - `_routing_preference_live()` — env-var read at call time so ops can flip the gate without restart. - `_PREFERENCE_FLOOR_DELTA` + safety clamps `_SAFETY_MIN_QUALITY` + `_SAFETY_MAX_QUALITY` (= good / frontier tier numerics). - `resolve_policy()` reads `routing_preference` from the hint and applies the dial ONLY when the caller did NOT pass an explicit `min_quality` — preserves caller-intent-wins semantics. - `models/inference.py`: `InferenceRequest.routing_hint` description documents the new key (so it surfaces in openapi.json). - `tests/unit/test_routing_preference_dial.py`: - 8-case parametrized **byte-identical regression lock** for `balanced` — the moat invariant. Any divergence fails the build. - Dial-inert-when-gate-off coverage × all 3 preferences. - Dial-active mapping × bumps + clamps + explicit-caller-wins. - Unknown / typo preference values fall through to `balanced`. - 23 tests; all pure (no DB). ## Pre-commit ruff + ruff-format + mypy --strict + pytest unit+smoke = 528 green. ## Out of scope (per SP-4 §1) - methodology v1.3 changes - weights / λ-blending - online learning (AIN-246 — Backlog/deferred) - `M_allowed` / `q_prior` / `q_empirical` semantics - engine code in `routing/ainfera_routing/decide.py` — untouched ## Public copy (founder/Varda) Drafted README/STRATEGY paragraph for the routing repo describing the dial — see `docs/routing-preference.md` in the next PR after founder sign-off on the mapping values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…er-gated) Adds three pieces of scaffolding for the dark-host activation pass. **Activates nothing. Zero schema change. Zero catalog change.** Per the SP-4 §1 moat guardrails, this PR ships ONLY founder-gated artifacts. Stacks on SP-2 api#72 (\`feat/ain271-streaming-tooluse\`); independent of PR-A (#73) and PR-B (#74). ## What's new ### 1. \`scripts/dark_host_smoke.py\` — adapter smoke harness A CLI that exercises the existing ProviderAdapter against a (provider, upstream_model, base_url) target and prints a JSON latency/cost/shape report. Two consecutive \`.chat()\` calls give a coarse cold-vs-warm variance read. - Reads keys from env (Doppler-injectable) — never argv. - Covers the 5 open-weight venues (Groq, DeepInfra, Together, Fireworks, Novita) + Anthropic for parity check. - Returns JSON-serializable error dicts on every failure mode (no bare exceptions to stderr) so the founder can pipe the output straight into the activation runbook as evidence. - **Aulë does NOT run this** — the harness needs live provider credits (~\$45 total: DeepInfra \$15 + Together \$15 + Fireworks \$10 + Groq \$0 + Novita \$5) + Doppler keys. Founder runs it after topping up. ### 2. \`docs/dark-host-activation-runbook.md\` — the 4-phase tap The exact, ordered steps to light one (logical-model, venue) row: Phase 1 — smoke (founder, no DB): run the harness per venue, save the JSON reports for §16 audit. Phase 2 — Model x Host ontology decision (Disc#12): see proposal below; founder picks Path A / B / C. Phase 3 — activation migration TEMPLATE (not yet a real alembic file — lives as a snippet in the doc to keep the \`alembic/versions/\` directory clean until authorized). Parametrized on slug, upstream_model, costs, q_prior, brand. Phase 4 — verify (post-deploy): catalog row active, brain enrols it, audit chain intact. Rollback = \`alembic downgrade -1\`. The runbook is explicit that activation is **founder-gated** on three signals: credits + Doppler keys + ontology authorization. ### 3. \`docs/dark-host-ontology-proposal.md\` — Disc#12 schema decision Lays out 3 schema paths for representing the same logical model on multiple hosts (verified live: 0 cross-host slugs today; the schema is operationally one-model-one-host): Path A — flat \`models\` table, venue-suffixed slugs (\`llama-3.3-70b-groq\`). Lightest migration; zero engine change. Path B — \`model_hosts\` M:N junction. Cleanest semantics; biggest migration; touches \`routing_outcomes\` (§16 schema — violates SP-4 §1 immutability unless additive). Path C — Path A + nullable \`models.logical_slug\` for cross-venue aggregates. Aulë's recommendation: **Path A** for the SP-4 activation pass. Migrate to Path B in a follow-up sprint when the multi-host catalog density justifies the §16-additive migration. Four Disc#12 questions for the founder are listed at the bottom of the proposal. Activation runbook stays parked until they're answered. ## §0/P5 finding (documented for the audit chain) Live read against Supabase \`dftfpwzqxoebwzepygzl\`: - 47 inactive models distributed across 10 providers (novita 9 + deepinfra 6 + together 6 + gemini 5 + groq 5 + openai 4 + anthropic 3 + fireworks 3 + mistral 3 + xai 3). - **0 model slugs appear across multiple providers** — confirms one-model-one-host today. The Model x Host ontology change IS a real schema migration; PR-C ships ONLY the proposal doc. ## Pre-commit ruff + ruff-format + mypy --strict + pytest unit+smoke = 505 green. Zero new tests in this PR — the smoke harness is exercised against live providers (founder-run); the runbook + ontology are docs. ## Out of scope (per SP-4 §1 moat guardrails) - \`routing_outcomes\` schema — immutable, untouched. - The routing engine in \`routing/ainfera_routing/decide.py\` — untouched. - \`models\` schema — untouched. - Catalog activation — no model becomes \`active=true\` from this PR. - Online learning (AIN-246) — Backlog/deferred. - M_allowed / q_prior / q_empirical semantics. ## Founder action to unblock 1. \$45 credits across the 5 open-weight venues (DeepInfra \$15 + Together \$15 + Fireworks \$10 + Groq \$0 + Novita \$5). 2. Doppler keys mirroring those into the api Doppler env. 3. Disc#12 authorization of the Model x Host ontology path (the 4 questions at the bottom of the proposal doc). Once all three are in place, run the smoke harness per venue, then materialize the activation migration template into an actual alembic file and apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.}

^{Reviewed by Cursor Bugbot for commit 3237f7a. Configure here.}

cursor · 2026-05-23T23:08:51Z

+                "of services/routing_brain.dispatch_with_brain for a "
+                "missing complete_decision()."
+            ),
+        )


Docstring promises reject-path handling that isn't implemented

Low Severity

assert_capture_invariant documents that for routed inferences it handles the reject-path case where inference_id is NULL on the routing_outcomes row (by falling back to an audit-event join). However, the implementation only queries WHERE inference_id == inference_id, which will never match a NULL row in SQL. The documented "OR" branch is never executed — it unconditionally raises CaptureInvariantViolationError for any reject-path inference. A future caller trusting the docstring would receive false-positive violations for every reject case.

^{Reviewed by Cursor Bugbot for commit 3237f7a. Configure here.}

…raw string) Same class as the dashboard.py:127 fix landed in #71. The capture-invariant service + integration test compared `AuditEventORM.event_type == "inference_routed"` (underscored Python name), but the actual DB enum value is `inference.routed` (dotted) per migration 20260514_0001. Postgres rejected the literal with: invalid input value for enum audit_event_type: "inference_routed" Fix: pass `AuditEventType.inference_routed` (the enum *member*) instead of the raw string — SQLAlchemy's `values_callable` resolves it to the correct DB value (`inference.routed`). Docstring updated to spell the dotted form for any future reader. Unblocks the SP-4 PR-A integration tests: test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit No engine touch, no routing_outcomes touch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cal, quality/cost gated (AIN-244 dial) (#74) Exposes `routing_preference: "quality" | "balanced" | "cost"` in the routing_hint body as sugar over the existing caps. **`balanced` is byte-identical to today's behavior** (the dial is a no-op when balanced is selected — proved by the parametrized regression lock in the test file). **`quality` / `cost` are accepted on the wire but INERT** until the env gate `AINFERA_ROUTING_PREFERENCE_LIVE=1` is set (founder Disc#12 authorization of the lever values). Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-4 PR-A (#73 capture-coverage). ## Moat-sensitive scope · Disc#12 boundary This PR is Disc#12-adjacent — the dial CAN change routing decisions once the env gate is on. To stay safe: - The default (gate OFF) means `quality`/`cost` resolve to today's policy IDENTICALLY to `balanced`. SP-4 ships with the gate OFF. - Explicit caller `min_quality` always wins. The dial only nudges the default-derived floor — a quality-conscious caller never has their floor silently lowered by a `cost` preference. - Safety clamps: dial output is bounded by [good=0.50, frontier=0.85] so neither lever can exclude every voter or admit a sub-floor model. - Pure-function `_apply_preference()` is deterministic — same input → same output, testable without the brain. ## Proposed mapping (Aulë's conservative starting point — founder authorizes) `balanced` — no-op. Resolves exactly as today. `quality` — bump default min_quality by +0.10 (default 0.50 → 0.60), clamped to the `frontier` tier (0.85). Caller's explicit `min_quality` wins if higher. `cost` — drop default min_quality by -0.10, clamped to the `good` tier (0.50). Caller's explicit `min_quality` wins if higher. Both bumps are conservative: ≤0.10 delta, with hard safety clamps. No weighted-λ, no score surgery, no candidate-ordering changes. The dial moves the FLOOR; the engine still picks cheapest-clearing-floor. The founder reviews + authorizes the exact lever values in this PR. Once signed off, `railway env set AINFERA_ROUTING_PREFERENCE_LIVE=1` on the api service flips the gate ON. Until then, only `balanced` ships live behavior. ## What's new - `services/routing_brain.py`: - `VALID_PREFERENCES` frozenset + `DEFAULT_PREFERENCE = "balanced"`. - `_apply_preference(base_min_q, preference) -> Decimal` — pure function honoring the gate-off semantic. - `_routing_preference_live()` — env-var read at call time so ops can flip the gate without restart. - `_PREFERENCE_FLOOR_DELTA` + safety clamps `_SAFETY_MIN_QUALITY` + `_SAFETY_MAX_QUALITY` (= good / frontier tier numerics). - `resolve_policy()` reads `routing_preference` from the hint and applies the dial ONLY when the caller did NOT pass an explicit `min_quality` — preserves caller-intent-wins semantics. - `models/inference.py`: `InferenceRequest.routing_hint` description documents the new key (so it surfaces in openapi.json). - `tests/unit/test_routing_preference_dial.py`: - 8-case parametrized **byte-identical regression lock** for `balanced` — the moat invariant. Any divergence fails the build. - Dial-inert-when-gate-off coverage × all 3 preferences. - Dial-active mapping × bumps + clamps + explicit-caller-wins. - Unknown / typo preference values fall through to `balanced`. - 23 tests; all pure (no DB). ## Pre-commit ruff + ruff-format + mypy --strict + pytest unit+smoke = 528 green. ## Out of scope (per SP-4 §1) - methodology v1.3 changes - weights / λ-blending - online learning (AIN-246 — Backlog/deferred) - `M_allowed` / `q_prior` / `q_empirical` semantics - engine code in `routing/ainfera_routing/decide.py` — untouched ## Public copy (founder/Varda) Drafted README/STRATEGY paragraph for the routing repo describing the dial — see `docs/routing-preference.md` in the next PR after founder sign-off on the mapping values. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mentation) (#73) * feat(api): SP-4 PR-A · forward capture-coverage guard for routed dispatches Adds the durable forward-coverage guarantee for §16 capture: every routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1 aliases) writes exactly one `routing_outcomes` row, regardless of outcome (success / reject / fallback / fail). Pinned passthroughs (vendor slugs) write zero AND carry a `router: "direct"` audit marker. Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72) — that PR's stream-close capture path is the last exit covered by this guard. ## Moat-sensitive scope (read this first) This PR is **pure observability**. Per the SP-4 §1 guardrails: - ZERO change to routing decisions, scores, weights, thresholds, candidate ordering, `M_allowed`, `q_prior`, `q_empirical`, ruleset_hash. The diff against `services/routing_brain.py` and `services/routing.py` is **empty**. Verifiable: `git diff feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py` shows no hunks. - `routing_outcomes` schema is unchanged. No new columns, no migration. The row is written by the existing `insert_decision()` / `complete_decision()` calls in `dispatch_with_brain` (§0/P3 walk-through confirmed every exit path already writes the row). - `routing/ainfera_routing/decide.py` is untouched. ## What's new 1. `ainfera_api/services/capture_invariant.py`: - `route_outcome_kind(model_slug) -> "routed" | "passthrough"` — pure classifier keyed off the SP-1 alias resolver's `ROUTING_TARGETS`, so any string added to the resolver becomes "routed" without a second edit. - `assert_capture_invariant(db, inference_id, kind)` — read-only post-condition check the test sweep runs after every probe. Raises `CaptureInvariantViolationError` with diagnostic context when a routed call returns without a row or a passthrough produces one unexpectedly. - `find_passthrough_audit_event()` — helper for the test sweep to assert the `router: "direct"` marker is present. - `DispatchCaptureCounter.dispatch_without_capture_total` — the headline regression signal. Stays 0 in green builds; production scrape (future Prometheus surface) alerts on any non-zero. 2. `tests/unit/test_capture_invariant.py` — 9 pure tests locking the classifier (canonical + 3 aliases → routed; vendor slugs + typos → passthrough) + the counter semantics (routed-miss bumps the regression signal; passthrough-captured-unexpectedly bumps the contamination signal; reset zeros everything). 3. `tests/integration/test_capture_coverage.py` — parametrized sweep that drives a routed-success call for EACH of the 4 routing targets, a reject-floor routed call, and passthrough calls against two vendor slugs (anthropic native + openai). After each, asserts: - routed success → exactly 1 routing_outcomes row, `outcome_status='succeeded'` - reject path → 1 row, `outcome_status='rejected_floor'`, `inference_id IS NULL` (the only branch where it's NULL by design — see RoutingOutcomeORM docstring) - passthrough → 0 rows AND `router: "direct"` in the audit chain (distinguishes a properly-bypassed passthrough from a routed call that silently lost its row) Plus a coverage-sweep test that asserts `DispatchCaptureCounter.dispatch_without_capture_total == 0` at the end of a mixed dispatch sequence. ## §0/P2 denominator finding (documented for the audit chain) Live read against Supabase `dftfpwzqxoebwzepygzl`: - 778 historical inferences / 5 routing_outcomes rows - 0 historical `request_payload.model` was a routing string (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto) - ALL 778 were pinned passthroughs — vendor slugs (claude-opus-4-7 x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...) - The 3 succeeded outcome rows are integration-test side effects **The 773-row "gap" is honest fleet posture, not a capture failure.** The fleet's been on pinned passthroughs (AULE_PLANNER / YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value is the forward GUARANTEE: every NEW routed call going forward writes exactly one row. ## Pre-commit ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke all green (523 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(api): SP-CLOSE · capture-invariant uses AuditEventType enum (not raw string) Same class as the dashboard.py:127 fix landed in #71. The capture-invariant service + integration test compared `AuditEventORM.event_type == "inference_routed"` (underscored Python name), but the actual DB enum value is `inference.routed` (dotted) per migration 20260514_0001. Postgres rejected the literal with: invalid input value for enum audit_event_type: "inference_routed" Fix: pass `AuditEventType.inference_routed` (the enum *member*) instead of the raw string — SQLAlchemy's `values_callable` resolves it to the correct DB value (`inference.routed`). Docstring updated to spell the dotted form for any future reader. Unblocks the SP-4 PR-A integration tests: test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit No engine touch, no routing_outcomes touch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cal, quality/cost gated (AIN-244 dial) (#74) Exposes `routing_preference: "quality" | "balanced" | "cost"` in the routing_hint body as sugar over the existing caps. **`balanced` is byte-identical to today's behavior** (the dial is a no-op when balanced is selected — proved by the parametrized regression lock in the test file). **`quality` / `cost` are accepted on the wire but INERT** until the env gate `AINFERA_ROUTING_PREFERENCE_LIVE=1` is set (founder Disc#12 authorization of the lever values). Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-4 PR-A (#73 capture-coverage). ## Moat-sensitive scope · Disc#12 boundary This PR is Disc#12-adjacent — the dial CAN change routing decisions once the env gate is on. To stay safe: - The default (gate OFF) means `quality`/`cost` resolve to today's policy IDENTICALLY to `balanced`. SP-4 ships with the gate OFF. - Explicit caller `min_quality` always wins. The dial only nudges the default-derived floor — a quality-conscious caller never has their floor silently lowered by a `cost` preference. - Safety clamps: dial output is bounded by [good=0.50, frontier=0.85] so neither lever can exclude every voter or admit a sub-floor model. - Pure-function `_apply_preference()` is deterministic — same input → same output, testable without the brain. ## Proposed mapping (Aulë's conservative starting point — founder authorizes) `balanced` — no-op. Resolves exactly as today. `quality` — bump default min_quality by +0.10 (default 0.50 → 0.60), clamped to the `frontier` tier (0.85). Caller's explicit `min_quality` wins if higher. `cost` — drop default min_quality by -0.10, clamped to the `good` tier (0.50). Caller's explicit `min_quality` wins if higher. Both bumps are conservative: ≤0.10 delta, with hard safety clamps. No weighted-λ, no score surgery, no candidate-ordering changes. The dial moves the FLOOR; the engine still picks cheapest-clearing-floor. The founder reviews + authorizes the exact lever values in this PR. Once signed off, `railway env set AINFERA_ROUTING_PREFERENCE_LIVE=1` on the api service flips the gate ON. Until then, only `balanced` ships live behavior. ## What's new - `services/routing_brain.py`: - `VALID_PREFERENCES` frozenset + `DEFAULT_PREFERENCE = "balanced"`. - `_apply_preference(base_min_q, preference) -> Decimal` — pure function honoring the gate-off semantic. - `_routing_preference_live()` — env-var read at call time so ops can flip the gate without restart. - `_PREFERENCE_FLOOR_DELTA` + safety clamps `_SAFETY_MIN_QUALITY` + `_SAFETY_MAX_QUALITY` (= good / frontier tier numerics). - `resolve_policy()` reads `routing_preference` from the hint and applies the dial ONLY when the caller did NOT pass an explicit `min_quality` — preserves caller-intent-wins semantics. - `models/inference.py`: `InferenceRequest.routing_hint` description documents the new key (so it surfaces in openapi.json). - `tests/unit/test_routing_preference_dial.py`: - 8-case parametrized **byte-identical regression lock** for `balanced` — the moat invariant. Any divergence fails the build. - Dial-inert-when-gate-off coverage × all 3 preferences. - Dial-active mapping × bumps + clamps + explicit-caller-wins. - Unknown / typo preference values fall through to `balanced`. - 23 tests; all pure (no DB). ## Pre-commit ruff + ruff-format + mypy --strict + pytest unit+smoke = 528 green. ## Out of scope (per SP-4 §1) - methodology v1.3 changes - weights / λ-blending - online learning (AIN-246 — Backlog/deferred) - `M_allowed` / `q_prior` / `q_empirical` semantics - engine code in `routing/ainfera_routing/decide.py` — untouched ## Public copy (founder/Varda) Drafted README/STRATEGY paragraph for the routing repo describing the dial — see `docs/routing-preference.md` in the next PR after founder sign-off on the mapping values. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#80) * feat(api): SP-2 PR-A · AIN-271 streaming + tool-use lift on /v1/messages Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit. Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR. ## Adapter contract lift - `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None — back-compat preserved across all 5 adapters). - New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface. - New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`. - New `ToolsNotSupportedError` — adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation. - `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too. ## Per-adapter native streaming - AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively. - OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events. - OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug. - GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming. ## Streaming dispatch + /v1/messages - `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`). - `routers/anthropic_compat.py`: - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type. - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint. - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream. - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings). - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming. ## Tests - tests/unit/test_streaming_wire_format.py — 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag. - tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with SP-2 coverage: · stream:true → 200 + text/event-stream + ordered Anthropic frames · streaming writes §16 row on close · streaming honors silent-alias resolver (parametrized × 3) · non-empty tools passes through Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests). ## SP-2 v0 honesty caveat Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): SP-4 PR-A · forward capture-coverage guard (AIN-244 instrumentation) (#73) * feat(api): SP-4 PR-A · forward capture-coverage guard for routed dispatches Adds the durable forward-coverage guarantee for §16 capture: every routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1 aliases) writes exactly one `routing_outcomes` row, regardless of outcome (success / reject / fallback / fail). Pinned passthroughs (vendor slugs) write zero AND carry a `router: "direct"` audit marker. Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72) — that PR's stream-close capture path is the last exit covered by this guard. ## Moat-sensitive scope (read this first) This PR is **pure observability**. Per the SP-4 §1 guardrails: - ZERO change to routing decisions, scores, weights, thresholds, candidate ordering, `M_allowed`, `q_prior`, `q_empirical`, ruleset_hash. The diff against `services/routing_brain.py` and `services/routing.py` is **empty**. Verifiable: `git diff feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py` shows no hunks. - `routing_outcomes` schema is unchanged. No new columns, no migration. The row is written by the existing `insert_decision()` / `complete_decision()` calls in `dispatch_with_brain` (§0/P3 walk-through confirmed every exit path already writes the row). - `routing/ainfera_routing/decide.py` is untouched. ## What's new 1. `ainfera_api/services/capture_invariant.py`: - `route_outcome_kind(model_slug) -> "routed" | "passthrough"` — pure classifier keyed off the SP-1 alias resolver's `ROUTING_TARGETS`, so any string added to the resolver becomes "routed" without a second edit. - `assert_capture_invariant(db, inference_id, kind)` — read-only post-condition check the test sweep runs after every probe. Raises `CaptureInvariantViolationError` with diagnostic context when a routed call returns without a row or a passthrough produces one unexpectedly. - `find_passthrough_audit_event()` — helper for the test sweep to assert the `router: "direct"` marker is present. - `DispatchCaptureCounter.dispatch_without_capture_total` — the headline regression signal. Stays 0 in green builds; production scrape (future Prometheus surface) alerts on any non-zero. 2. `tests/unit/test_capture_invariant.py` — 9 pure tests locking the classifier (canonical + 3 aliases → routed; vendor slugs + typos → passthrough) + the counter semantics (routed-miss bumps the regression signal; passthrough-captured-unexpectedly bumps the contamination signal; reset zeros everything). 3. `tests/integration/test_capture_coverage.py` — parametrized sweep that drives a routed-success call for EACH of the 4 routing targets, a reject-floor routed call, and passthrough calls against two vendor slugs (anthropic native + openai). After each, asserts: - routed success → exactly 1 routing_outcomes row, `outcome_status='succeeded'` - reject path → 1 row, `outcome_status='rejected_floor'`, `inference_id IS NULL` (the only branch where it's NULL by design — see RoutingOutcomeORM docstring) - passthrough → 0 rows AND `router: "direct"` in the audit chain (distinguishes a properly-bypassed passthrough from a routed call that silently lost its row) Plus a coverage-sweep test that asserts `DispatchCaptureCounter.dispatch_without_capture_total == 0` at the end of a mixed dispatch sequence. ## §0/P2 denominator finding (documented for the audit chain) Live read against Supabase `dftfpwzqxoebwzepygzl`: - 778 historical inferences / 5 routing_outcomes rows - 0 historical `request_payload.model` was a routing string (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto) - ALL 778 were pinned passthroughs — vendor slugs (claude-opus-4-7 x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...) - The 3 succeeded outcome rows are integration-test side effects **The 773-row "gap" is honest fleet posture, not a capture failure.** The fleet's been on pinned passthroughs (AULE_PLANNER / YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value is the forward GUARANTEE: every NEW routed call going forward writes exactly one row. ## Pre-commit ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke all green (523 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(api): SP-CLOSE · capture-invariant uses AuditEventType enum (not raw string) Same class as the dashboard.py:127 fix landed in #71. The capture-invariant service + integration test compared `AuditEventORM.event_type == "inference_routed"` (underscored Python name), but the actual DB enum value is `inference.routed` (dotted) per migration 20260514_0001. Postgres rejected the literal with: invalid input value for enum audit_event_type: "inference_routed" Fix: pass `AuditEventType.inference_routed` (the enum *member*) instead of the raw string — SQLAlchemy's `values_callable` resolves it to the correct DB value (`inference.routed`). Docstring updated to spell the dotted form for any future reader. Unblocks the SP-4 PR-A integration tests: test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit No engine touch, no routing_outcomes touch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): SP-4 PR-B · routing_preference dial — balanced byte-identical, quality/cost gated (AIN-244 dial) (#74) Exposes `routing_preference: "quality" | "balanced" | "cost"` in the routing_hint body as sugar over the existing caps. **`balanced` is byte-identical to today's behavior** (the dial is a no-op when balanced is selected — proved by the parametrized regression lock in the test file). **`quality` / `cost` are accepted on the wire but INERT** until the env gate `AINFERA_ROUTING_PREFERENCE_LIVE=1` is set (founder Disc#12 authorization of the lever values). Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-4 PR-A (#73 capture-coverage). ## Moat-sensitive scope · Disc#12 boundary This PR is Disc#12-adjacent — the dial CAN change routing decisions once the env gate is on. To stay safe: - The default (gate OFF) means `quality`/`cost` resolve to today's policy IDENTICALLY to `balanced`. SP-4 ships with the gate OFF. - Explicit caller `min_quality` always wins. The dial only nudges the default-derived floor — a quality-conscious caller never has their floor silently lowered by a `cost` preference. - Safety clamps: dial output is bounded by [good=0.50, frontier=0.85] so neither lever can exclude every voter or admit a sub-floor model. - Pure-function `_apply_preference()` is deterministic — same input → same output, testable without the brain. ## Proposed mapping (Aulë's conservative starting point — founder authorizes) `balanced` — no-op. Resolves exactly as today. `quality` — bump default min_quality by +0.10 (default 0.50 → 0.60), clamped to the `frontier` tier (0.85). Caller's explicit `min_quality` wins if higher. `cost` — drop default min_quality by -0.10, clamped to the `good` tier (0.50). Caller's explicit `min_quality` wins if higher. Both bumps are conservative: ≤0.10 delta, with hard safety clamps. No weighted-λ, no score surgery, no candidate-ordering changes. The dial moves the FLOOR; the engine still picks cheapest-clearing-floor. The founder reviews + authorizes the exact lever values in this PR. Once signed off, `railway env set AINFERA_ROUTING_PREFERENCE_LIVE=1` on the api service flips the gate ON. Until then, only `balanced` ships live behavior. ## What's new - `services/routing_brain.py`: - `VALID_PREFERENCES` frozenset + `DEFAULT_PREFERENCE = "balanced"`. - `_apply_preference(base_min_q, preference) -> Decimal` — pure function honoring the gate-off semantic. - `_routing_preference_live()` — env-var read at call time so ops can flip the gate without restart. - `_PREFERENCE_FLOOR_DELTA` + safety clamps `_SAFETY_MIN_QUALITY` + `_SAFETY_MAX_QUALITY` (= good / frontier tier numerics). - `resolve_policy()` reads `routing_preference` from the hint and applies the dial ONLY when the caller did NOT pass an explicit `min_quality` — preserves caller-intent-wins semantics. - `models/inference.py`: `InferenceRequest.routing_hint` description documents the new key (so it surfaces in openapi.json). - `tests/unit/test_routing_preference_dial.py`: - 8-case parametrized **byte-identical regression lock** for `balanced` — the moat invariant. Any divergence fails the build. - Dial-inert-when-gate-off coverage × all 3 preferences. - Dial-active mapping × bumps + clamps + explicit-caller-wins. - Unknown / typo preference values fall through to `balanced`. - 23 tests; all pure (no DB). ## Pre-commit ruff + ruff-format + mypy --strict + pytest unit+smoke = 528 green. ## Out of scope (per SP-4 §1) - methodology v1.3 changes - weights / λ-blending - online learning (AIN-246 — Backlog/deferred) - `M_allowed` / `q_prior` / `q_empirical` semantics - engine code in `routing/ainfera_routing/decide.py` — untouched ## Public copy (founder/Varda) Drafted README/STRATEGY paragraph for the routing repo describing the dial — see `docs/routing-preference.md` in the next PR after founder sign-off on the mapping values. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…er-gated) Adds three pieces of scaffolding for the dark-host activation pass. **Activates nothing. Zero schema change. Zero catalog change.** Per the SP-4 §1 moat guardrails, this PR ships ONLY founder-gated artifacts. Stacks on SP-2 api#72 (\`feat/ain271-streaming-tooluse\`); independent of PR-A (#73) and PR-B (#74). ## What's new ### 1. \`scripts/dark_host_smoke.py\` — adapter smoke harness A CLI that exercises the existing ProviderAdapter against a (provider, upstream_model, base_url) target and prints a JSON latency/cost/shape report. Two consecutive \`.chat()\` calls give a coarse cold-vs-warm variance read. - Reads keys from env (Doppler-injectable) — never argv. - Covers the 5 open-weight venues (Groq, DeepInfra, Together, Fireworks, Novita) + Anthropic for parity check. - Returns JSON-serializable error dicts on every failure mode (no bare exceptions to stderr) so the founder can pipe the output straight into the activation runbook as evidence. - **Aulë does NOT run this** — the harness needs live provider credits (~\$45 total: DeepInfra \$15 + Together \$15 + Fireworks \$10 + Groq \$0 + Novita \$5) + Doppler keys. Founder runs it after topping up. ### 2. \`docs/dark-host-activation-runbook.md\` — the 4-phase tap The exact, ordered steps to light one (logical-model, venue) row: Phase 1 — smoke (founder, no DB): run the harness per venue, save the JSON reports for §16 audit. Phase 2 — Model x Host ontology decision (Disc#12): see proposal below; founder picks Path A / B / C. Phase 3 — activation migration TEMPLATE (not yet a real alembic file — lives as a snippet in the doc to keep the \`alembic/versions/\` directory clean until authorized). Parametrized on slug, upstream_model, costs, q_prior, brand. Phase 4 — verify (post-deploy): catalog row active, brain enrols it, audit chain intact. Rollback = \`alembic downgrade -1\`. The runbook is explicit that activation is **founder-gated** on three signals: credits + Doppler keys + ontology authorization. ### 3. \`docs/dark-host-ontology-proposal.md\` — Disc#12 schema decision Lays out 3 schema paths for representing the same logical model on multiple hosts (verified live: 0 cross-host slugs today; the schema is operationally one-model-one-host): Path A — flat \`models\` table, venue-suffixed slugs (\`llama-3.3-70b-groq\`). Lightest migration; zero engine change. Path B — \`model_hosts\` M:N junction. Cleanest semantics; biggest migration; touches \`routing_outcomes\` (§16 schema — violates SP-4 §1 immutability unless additive). Path C — Path A + nullable \`models.logical_slug\` for cross-venue aggregates. Aulë's recommendation: **Path A** for the SP-4 activation pass. Migrate to Path B in a follow-up sprint when the multi-host catalog density justifies the §16-additive migration. Four Disc#12 questions for the founder are listed at the bottom of the proposal. Activation runbook stays parked until they're answered. ## §0/P5 finding (documented for the audit chain) Live read against Supabase \`dftfpwzqxoebwzepygzl\`: - 47 inactive models distributed across 10 providers (novita 9 + deepinfra 6 + together 6 + gemini 5 + groq 5 + openai 4 + anthropic 3 + fireworks 3 + mistral 3 + xai 3). - **0 model slugs appear across multiple providers** — confirms one-model-one-host today. The Model x Host ontology change IS a real schema migration; PR-C ships ONLY the proposal doc. ## Pre-commit ruff + ruff-format + mypy --strict + pytest unit+smoke = 505 green. Zero new tests in this PR — the smoke harness is exercised against live providers (founder-run); the runbook + ontology are docs. ## Out of scope (per SP-4 §1 moat guardrails) - \`routing_outcomes\` schema — immutable, untouched. - The routing engine in \`routing/ainfera_routing/decide.py\` — untouched. - \`models\` schema — untouched. - Catalog activation — no model becomes \`active=true\` from this PR. - Online learning (AIN-246) — Backlog/deferred. - M_allowed / q_prior / q_empirical semantics. ## Founder action to unblock 1. \$45 credits across the 5 open-weight venues (DeepInfra \$15 + Together \$15 + Fireworks \$10 + Groq \$0 + Novita \$5). 2. Doppler keys mirroring those into the api Doppler env. 3. Disc#12 authorization of the Model x Host ontology path (the 4 questions at the bottom of the proposal doc). Once all three are in place, run the smoke harness per venue, then materialize the activation migration template into an actual alembic file and apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hizrianraz mentioned this pull request May 23, 2026

feat(api): SP-4 PR-B · routing_preference dial — balanced byte-identical, quality/cost gated #74

Merged

4 tasks

cursor Bot reviewed May 23, 2026

View reviewed changes

hizrianraz mentioned this pull request May 23, 2026

chore(api): SP-4 PR-C · dark-host activation scaffold (AIN-248, founder-gated) #75

Closed

5 tasks

hizrianraz force-pushed the feat/ain271-streaming-tooluse branch from 5a57625 to 7281e42 Compare May 23, 2026 23:00

hizrianraz force-pushed the feat/capture-coverage branch from 107bd5b to 3237f7a Compare May 23, 2026 23:00

cursor Bot reviewed May 23, 2026

View reviewed changes

hizrianraz merged commit dfa144e into feat/ain271-streaming-tooluse May 24, 2026
4 checks passed

hizrianraz mentioned this pull request May 24, 2026

fix(api): SP-Ω recovery · land #72+#73+#74 (stack-merge orphan rescue) #80

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(api): SP-4 PR-A · forward capture-coverage guard (AIN-244 instrumentation)#73

feat(api): SP-4 PR-A · forward capture-coverage guard (AIN-244 instrumentation)#73
hizrianraz merged 2 commits into
feat/ain271-streaming-toolusefrom
feat/capture-coverage

hizrianraz commented May 23, 2026 •

edited by cursor Bot

Loading

Uh oh!

linear-code Bot commented May 23, 2026 •

edited

Loading

🔒 FLAGSHIP NAME — LOCKED 2026-05-23 (Discipline #12, founder-authorized)

Children

Done (epic)

Uh oh!

cursor Bot May 23, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hizrianraz commented May 23, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Moat-sensitive scope · pure observability

§0/P2 denominator finding

What's new

Pre-commit

Test plan

Uh oh!

linear-code Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 FLAGSHIP NAME — LOCKED 2026-05-23 (Discipline #12, founder-authorized)

Children

Done (epic)

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Frozen dataclass Exception produces empty error messages

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Docstring promises reject-path handling that isn't implemented

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hizrianraz commented May 23, 2026 •

edited by cursor Bot

Loading

linear-code Bot commented May 23, 2026 •

edited

Loading