chore(api): SP-1 · ainfera-inference flagship rename + aa_index_source migration (AIN-271b) #70
Merged
…e migration (AIN-271b)
Inverts the AIN-244 routing-target lock per the 2026-05-23 founder
decision (Disc#12): `ainfera-inference` becomes the canonical wire
string; `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` are
demoted to silent aliases resolved at the router boundary.
Changes:
- routers/inference.py: INFERENCE_MODEL canonical; ROUTING_ALIASES
frozenset covers all 3 legacy strings; _log_alias_hit fires for each.
Back-compat module constants MITHRIL_MODEL / AUTO_MODEL now alias the
canonical so legacy imports keep working.
- routers/agent_surfaces.py: agent-card.json + llms.txt rewritten with
Ainfera Inference framing; zero dead strings on agent-discovery
surfaces.
- routers/anthropic_compat.py: docstring reframed; 501-on-stream /
422-on-tools surfaces preserved pending the streaming/tool-use lift
(separate follow-up).
- models/inference.py: InferenceRequest field descriptions (which feed
openapi.json) lead with ainfera-inference; aliases not mentioned.
- services/routing_brain.py: §16 audit "router" payload reports
canonical "ainfera-inference" regardless of alias requested.
- routing/{__init__,auto}.py: docstrings reframed.
- inference_gateway.md (renamed from MITHRIL_GATEWAY.md): contract doc
swept clean of product/wire dead strings.
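A minimal sketch of the alias resolution the Changes list describes. Only `INFERENCE_MODEL`, `ROUTING_ALIASES`, `_log_alias_hit`, `MITHRIL_MODEL` and `AUTO_MODEL` are named in this commit; the `resolve_model()` helper, the logger name and the log format are assumptions for illustration, not the code in routers/inference.py.

```python
from __future__ import annotations

import logging

# Hypothetical logger name; the real module may use a different one.
logger = logging.getLogger("ainfera.inference")

# Canonical wire string after the SP-1 rename.
INFERENCE_MODEL = "ainfera-inference"

# The three legacy strings, demoted to silent aliases.
ROUTING_ALIASES = frozenset({"ainfera-mithril", "ainfera-auto", "ainfera/auto"})

# Back-compat constants alias the canonical so legacy imports keep working.
MITHRIL_MODEL = INFERENCE_MODEL
AUTO_MODEL = INFERENCE_MODEL


def _log_alias_hit(alias: str) -> None:
    """Record that a legacy alias was requested (log format is assumed)."""
    logger.info("routing alias resolved: %s -> %s", alias, INFERENCE_MODEL)


def resolve_model(requested: str) -> str:
    """Hypothetical helper: map a legacy alias to the canonical string.

    Any other string (for example a pinned vendor slug) is returned as-is.
    """
    if requested in ROUTING_ALIASES:
        _log_alias_hit(requested)
        return INFERENCE_MODEL
    return requested
```

Resolving at the router boundary keeps everything downstream (audit payloads, OpenAPI descriptions) on a single string, which is what lets the audit "router" payload report the canonical value regardless of the alias requested.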
Tests:
- tests/unit/test_inference_alias.py (new; supersedes deleted
test_mithril_alias.py): canonical + 3-alias parametrized coverage.
- tests/unit/test_agent_surfaces.py: asserts ainfera-inference is the
default_model + dead-string regression lock on both /.well-known/
agent-card.json and /llms.txt.
- tests/integration/test_anthropic_compat.py: happy paths use canonical
string; silent-alias test parametrized over all 3 aliases.
- tests/integration/test_routing_v0.py: canonical happy path.
- tests/integration/test_routing_backends_invariants.py: post-migration
invariant — 0 rows with aa_index_source ILIKE '%aamc%'.
Migration:
- 0027_rename_aa_index_source_aamc_to_routing_backend.py — row-rewrite
of the 5 anchor models from 'aamc_v1_lock' to 'routing_backend_v1_lock'.
Branch-verify only via this commit; prod-apply on project
dftfpwzqxoebwzepygzl is in the founder action block.
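The row rewrite and its post-migration invariant can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions, not the Alembic revision itself: the table name `routing_backends`, the `slug` column and the `anchor-N` slugs are hypothetical, and SQLite's `LIKE` (case-insensitive for ASCII) stands in for the Postgres `ILIKE` used by the invariant test.

```python
import sqlite3

OLD_TAG = "aamc_v1_lock"
NEW_TAG = "routing_backend_v1_lock"


def rename_source_tag(conn: sqlite3.Connection) -> int:
    """Rewrite the legacy tag in place; returns the number of rows changed."""
    cur = conn.execute(
        "UPDATE routing_backends SET aa_index_source = ? WHERE aa_index_source = ?",
        (NEW_TAG, OLD_TAG),
    )
    return cur.rowcount


def legacy_rows(conn: sqlite3.Connection) -> int:
    """Count rows still carrying an 'aamc' source tag (the invariant wants 0)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM routing_backends WHERE aa_index_source LIKE '%aamc%'"
    ).fetchone()
    return row[0]


# Hypothetical fixture: five anchor rows seeded with the legacy literal.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE routing_backends (slug TEXT PRIMARY KEY, aa_index_source TEXT)"
)
conn.executemany(
    "INSERT INTO routing_backends VALUES (?, ?)",
    [(f"anchor-{i}", OLD_TAG) for i in range(5)],
)
rewritten = rename_source_tag(conn)
```

Running the rewrite a second time changes no rows, so the statement is safe to branch-verify before the prod apply.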
Linear gate: AIN-271 (P1-WS2 prod deploy of /v1/messages streaming +
tool-use) — this commit lands the rename half. Streaming + tool-use
land in a follow-up because the ProviderAdapter interface does not yet
carry tools/stream signatures across the 5 adapters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 23, 2026
…rename
Swap the legacy literal 'aamc_v1_lock' → 'routing_backend_v1_lock' in
scripts/seed_dev.py (5 anchor rows + idempotency-comment update).
The SP-1 rename migration 20260523_0027 row-rewrites
`aa_index_source = 'aamc_v1_lock' → 'routing_backend_v1_lock'`. On a
clean CI database the migrations run BEFORE seeding, so the rename
fires on an empty table; the seed script then inserts the 5 §C
anchors directly with the new literal.
Fix (a) over fix (b) per founder's two-guard authorization: re-running
an already-applied migration after seed is structurally awkward and
violates Alembic's once-per-revision contract. The rename migration
remains independently asserted by
`test_zero_rows_carry_legacy_aamc_source_tag` (integration).
Grep probe confirmed the literal is NOT shared with another test-path
expectation — only test_t9_catalog_migration.py:142 references it, and
that unit test reads the static catalog-migration tuple (frozen
historical data), not live DB state, so it's unaffected.
Unblocks:
tests/integration/test_routing_backends_invariants.py
::test_canonical_5_voters_use_v1_lock_source
::test_zero_rows_carry_legacy_aamc_source_tag
Fixture/packaging only. No engine touch, no routing_outcomes touch, no
methodology change. Disc#12 unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request on May 23, 2026
Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call, streamed or not, writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit.

Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR.

## Adapter contract lift

- `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None; back-compat preserved across all 5 adapters).
- New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface.
- New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`.
- New `ToolsNotSupportedError`: adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation.
- `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too.

## Per-adapter native streaming

- AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively.
- OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events.
- OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug.
- GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming.

## Streaming dispatch + /v1/messages

- `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`).
- `routers/anthropic_compat.py`:
  - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type.
  - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint.
  - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream.
  - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings).
  - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming.

## Tests

- tests/unit/test_streaming_wire_format.py: 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag.
- tests/integration/test_anthropic_compat.py: replaces SP-1 501/422 assertions with SP-2 coverage:
  - stream:true → 200 + text/event-stream + ordered Anthropic frames
  - streaming writes §16 row on close
  - streaming honors silent-alias resolver (parametrized × 3)
  - non-empty tools passes through

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests).

## SP-2 v0 honesty caveat

Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
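The default `stream_chat()` behavior described under "Adapter contract lift" (wrap `chat()` into one content_delta plus one message_delta) might look roughly like the sketch below. The field names on `AdapterResponse` and `StreamEvent`, the `EchoAdapter` class and the `collect()` helper are assumptions made for illustration; the real dataclasses in this PR may carry different fields.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator


@dataclass
class AdapterResponse:
    # Assumed minimal shape; the real class also carries content_blocks etc.
    text: str
    finish_reason: str = "stop"


@dataclass
class StreamEvent:
    # kind is one of: content_delta, tool_use_start, tool_use_delta, message_delta
    kind: str
    data: dict[str, Any] = field(default_factory=dict)


class ProviderAdapter:
    async def chat(self, messages, tools=None, tool_choice=None) -> AdapterResponse:
        raise NotImplementedError

    async def stream_chat(
        self, messages, tools=None, tool_choice=None
    ) -> AsyncIterator[StreamEvent]:
        """Default wrapper: run chat() to completion, then emit two events."""
        response = await self.chat(messages, tools=tools, tool_choice=tool_choice)
        yield StreamEvent("content_delta", {"text": response.text})
        yield StreamEvent("message_delta", {"finish_reason": response.finish_reason})


class EchoAdapter(ProviderAdapter):
    """Hypothetical adapter that only implements chat() and echoes the last message."""

    async def chat(self, messages, tools=None, tool_choice=None) -> AdapterResponse:
        return AdapterResponse(text=messages[-1]["content"])


async def collect(adapter: ProviderAdapter, messages) -> list[StreamEvent]:
    """Drain the async generator into a list (test convenience)."""
    return [event async for event in adapter.stream_chat(messages)]
```

An adapter that overrides `stream_chat()` with native SSE yields the same event kinds, so callers do not need to know which path produced them. The wrapper cannot improve TTFT, since the first event is only available after `chat()` returns.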
hizrianraz added a commit that referenced this pull request on May 23, 2026
…urface (AIN-263/264/265/266)
Lands the four read-only endpoints the `/dashboard` Glance page reads:
three new GETs plus an additive field on the existing decision endpoint.
Until this router landed, the three new GETs 404'd and the front door
crashed (digest 3541235483). The `dashboard_candidates[]` field on the
existing `/v1/inferences/{id}/decision` is additive: the
byte-reproducible `candidates[]` shape is unchanged, and
`dashboard_candidates[]` is the new collapsed shape SP-3 renders.
Endpoints (all tenant-scoped, read-only, §D3 honest-empty 200s):
- GET /v1/usage/daily?days=30 (AIN-263)
Returns `{ days:[{date, calls, cost_usd, by_status:{ok,fallback,error}}],
totals:{calls, cost_usd} }`. Empty tenant → `days:[]` zeros 200.
`by_status.fallback` is v0 honest 0 — the audit-chain signal exists
but per-day dedup is non-trivial; surfaced from a denormalized brain
column in a follow-up.
- GET /v1/caps/rollup (AIN-264)
`{ agents, policies_set, budget:{set,used_usd}, latency_p50_ms,
breaches:{quality,reliability} }`. `budget.set` sums
`spend_policy.daily_cap_usd` across the tenant's agents.
`latency_p50_ms` from `routing_outcomes.observed_latency_ms` (24h);
null when no §16 rows in window.
- GET /v1/agents/{id}/metrics (AIN-265)
24h window per-agent. 404s cross-tenant (same masking as
/v1/inferences/{id}/decision).
- /v1/inferences/{id}/decision (AIN-266)
Adds `dashboard_candidates: list[DashboardCandidate]` parallel to
`candidates[]`. Each row carries `chosen: bool` + `excluded: str |
null` so SP-3 renders "4 candidates, 1 excluded" without re-deriving.
The shape collapses three brain signals (`rejection_reason`,
`eligible`, `cleared_floor`) into one `excluded` string; explicit
reason takes precedence over `ineligible`/`below_floor` fallbacks.
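The `excluded` collapse for AIN-266 can be sketched as below. The dict-based candidate shape, the `model` key and the function signature are assumptions for illustration. The commit also states only that an explicit `rejection_reason` wins; the relative order of the `ineligible` and `below_floor` fallbacks here is a guess.

```python
from __future__ import annotations

from typing import Any


def excluded_reason(candidate: dict[str, Any]) -> str | None:
    """Collapse three brain signals into one `excluded` string (or None)."""
    reason = candidate.get("rejection_reason")
    if reason:
        # An explicit reason takes precedence over the fallbacks.
        return reason
    if candidate.get("eligible") is False:
        return "ineligible"
    if candidate.get("cleared_floor") is False:
        return "below_floor"
    return None


def candidate_dashboard_summary(
    candidates: list[dict[str, Any]], chosen_model: str
) -> list[dict[str, Any]]:
    """Hypothetical signature: build the collapsed dashboard rows."""
    return [
        {
            "model": c["model"],
            "chosen": c["model"] == chosen_model,
            "excluded": excluded_reason(c),
        }
        for c in candidates
    ]
```

With rows in this shape, a renderer can count `excluded is not None` to produce a line such as "4 candidates, 1 excluded" without re-deriving anything from the raw signals.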
Tests:
- tests/unit/test_dashboard_candidates.py — 8 pure tests against
`candidate_dashboard_summary()` (chosen marking, excluded
precedence, alt-key compat, etc.).
- tests/integration/test_dashboard.py — honest-empty 200s × 3
endpoints; cross-tenant 404 on /metrics; usage scoped to tenant
(A's call doesn't appear in B's daily); caps reflect real spend +
breach counts; decision endpoint surfaces dashboard_candidates;
no-mutation invariant on routing_outcomes (count before == after a
3-endpoint sweep).
- tests/smoke/test_openapi_contract.py · EXPECTED_OPERATIONS extended
with the 3 new GETs (the contract snapshot is the public-surface
source of truth).
Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER
that PR.
Pre-commit: ruff + ruff-format + mypy --strict + pytest tests/unit
+ tests/smoke all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request on May 24, 2026
…ges (#72)
hizrianraz added a commit that referenced this pull request on May 24, 2026
hizrianraz added a commit that referenced this pull request on May 24, 2026
#80)

* feat(api): SP-2 PR-A · AIN-271 streaming + tool-use lift on /v1/messages

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): SP-4 PR-A · forward capture-coverage guard (AIN-244 instrumentation) (#73)

* feat(api): SP-4 PR-A · forward capture-coverage guard for routed dispatches

Adds the durable forward-coverage guarantee for §16 capture: every routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1 aliases) writes exactly one `routing_outcomes` row, regardless of outcome (success / reject / fallback / fail). Pinned passthroughs (vendor slugs) write zero AND carry a `router: "direct"` audit marker.

Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72); that PR's stream-close capture path is the last exit covered by this guard.

## Moat-sensitive scope (read this first)

This PR is **pure observability**. Per the SP-4 §1 guardrails:

- ZERO change to routing decisions, scores, weights, thresholds, candidate ordering, `M_allowed`, `q_prior`, `q_empirical`, ruleset_hash. The diff against `services/routing_brain.py` and `services/routing.py` is **empty**. Verifiable: `git diff feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py` shows no hunks.
- `routing_outcomes` schema is unchanged. No new columns, no migration. The row is written by the existing `insert_decision()` / `complete_decision()` calls in `dispatch_with_brain` (§0/P3 walk-through confirmed every exit path already writes the row).
- `routing/ainfera_routing/decide.py` is untouched.

## What's new

1. `ainfera_api/services/capture_invariant.py`:
   - `route_outcome_kind(model_slug) -> "routed" | "passthrough"`: pure classifier keyed off the SP-1 alias resolver's `ROUTING_TARGETS`, so any string added to the resolver becomes "routed" without a second edit.
   - `assert_capture_invariant(db, inference_id, kind)`: read-only post-condition check the test sweep runs after every probe. Raises `CaptureInvariantViolationError` with diagnostic context when a routed call returns without a row or a passthrough produces one unexpectedly.
   - `find_passthrough_audit_event()`: helper for the test sweep to assert the `router: "direct"` marker is present.
   - `DispatchCaptureCounter.dispatch_without_capture_total`: the headline regression signal. Stays 0 in green builds; production scrape (future Prometheus surface) alerts on any non-zero.
2. `tests/unit/test_capture_invariant.py`: 9 pure tests locking the classifier (canonical + 3 aliases → routed; vendor slugs + typos → passthrough) + the counter semantics (routed-miss bumps the regression signal; passthrough-captured-unexpectedly bumps the contamination signal; reset zeros everything).
3. `tests/integration/test_capture_coverage.py`: parametrized sweep that drives a routed-success call for EACH of the 4 routing targets, a reject-floor routed call, and passthrough calls against two vendor slugs (anthropic native + openai). After each, asserts:
   - routed success → exactly 1 routing_outcomes row, `outcome_status='succeeded'`
   - reject path → 1 row, `outcome_status='rejected_floor'`, `inference_id IS NULL` (the only branch where it's NULL by design; see RoutingOutcomeORM docstring)
   - passthrough → 0 rows AND `router: "direct"` in the audit chain (distinguishes a properly-bypassed passthrough from a routed call that silently lost its row)

   Plus a coverage-sweep test that asserts `DispatchCaptureCounter.dispatch_without_capture_total == 0` at the end of a mixed dispatch sequence.

## §0/P2 denominator finding (documented for the audit chain)

Live read against Supabase `dftfpwzqxoebwzepygzl`:

- 778 historical inferences / 5 routing_outcomes rows
- 0 historical `request_payload.model` was a routing string (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto)
- ALL 778 were pinned passthroughs: vendor slugs (claude-opus-4-7 x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...)
- The 3 succeeded outcome rows are integration-test side effects

**The 773-row "gap" is honest fleet posture, not a capture failure.** The fleet's been on pinned passthroughs (AULE_PLANNER / YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value is the forward GUARANTEE: every NEW routed call going forward writes exactly one row.

## Pre-commit

ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke all green (523 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): SP-CLOSE · capture-invariant uses AuditEventType enum (not raw string)

Same class as the dashboard.py:127 fix landed in #71. The capture-invariant service + integration test compared `AuditEventORM.event_type == "inference_routed"` (underscored Python name), but the actual DB enum value is `inference.routed` (dotted) per migration 20260514_0001. Postgres rejected the literal with:

    invalid input value for enum audit_event_type: "inference_routed"

Fix: pass `AuditEventType.inference_routed` (the enum *member*) instead of the raw string. SQLAlchemy's `values_callable` resolves it to the correct DB value (`inference.routed`). Docstring updated to spell the dotted form for any future reader.

Unblocks the SP-4 PR-A integration tests:

    test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit

No engine touch, no routing_outcomes touch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): SP-4 PR-B · routing_preference dial — balanced byte-identical, quality/cost gated (AIN-244 dial) (#74)

Exposes `routing_preference: "quality" | "balanced" | "cost"` in the routing_hint body as sugar over the existing caps. **`balanced` is byte-identical to today's behavior** (the dial is a no-op when balanced is selected, proved by the parametrized regression lock in the test file). **`quality` / `cost` are accepted on the wire but INERT** until the env gate `AINFERA_ROUTING_PREFERENCE_LIVE=1` is set (founder Disc#12 authorization of the lever values).

Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-4 PR-A (#73 capture-coverage).

## Moat-sensitive scope · Disc#12 boundary

This PR is Disc#12-adjacent: the dial CAN change routing decisions once the env gate is on. To stay safe:

- The default (gate OFF) means `quality`/`cost` resolve to today's policy IDENTICALLY to `balanced`. SP-4 ships with the gate OFF.
- Explicit caller `min_quality` always wins. The dial only nudges the default-derived floor, so a quality-conscious caller never has their floor silently lowered by a `cost` preference.
- Safety clamps: dial output is bounded by [good=0.50, frontier=0.85] so neither lever can exclude every voter or admit a sub-floor model.
- Pure-function `_apply_preference()` is deterministic (same input → same output), testable without the brain.

## Proposed mapping (Aulë's conservative starting point; founder authorizes)

- `balanced`: no-op. Resolves exactly as today.
- `quality`: bump default min_quality by +0.10 (default 0.50 → 0.60), clamped to the `frontier` tier (0.85). Caller's explicit `min_quality` wins if higher.
- `cost`: drop default min_quality by -0.10, clamped to the `good` tier (0.50). Caller's explicit `min_quality` wins if higher.

Both bumps are conservative: ≤0.10 delta, with hard safety clamps. No weighted-λ, no score surgery, no candidate-ordering changes. The dial moves the FLOOR; the engine still picks cheapest-clearing-floor.

The founder reviews + authorizes the exact lever values in this PR. Once signed off, `railway env set AINFERA_ROUTING_PREFERENCE_LIVE=1` on the api service flips the gate ON. Until then, only `balanced` ships live behavior.

## What's new

- `services/routing_brain.py`:
  - `VALID_PREFERENCES` frozenset + `DEFAULT_PREFERENCE = "balanced"`.
  - `_apply_preference(base_min_q, preference) -> Decimal`: pure function honoring the gate-off semantic.
  - `_routing_preference_live()`: env-var read at call time so ops can flip the gate without restart.
  - `_PREFERENCE_FLOOR_DELTA` + safety clamps `_SAFETY_MIN_QUALITY` + `_SAFETY_MAX_QUALITY` (= good / frontier tier numerics).
  - `resolve_policy()` reads `routing_preference` from the hint and applies the dial ONLY when the caller did NOT pass an explicit `min_quality`, which preserves caller-intent-wins semantics.
- `models/inference.py`: `InferenceRequest.routing_hint` description documents the new key (so it surfaces in openapi.json).
- `tests/unit/test_routing_preference_dial.py`:
  - 8-case parametrized **byte-identical regression lock** for `balanced`, the moat invariant. Any divergence fails the build.
  - Dial-inert-when-gate-off coverage × all 3 preferences.
  - Dial-active mapping × bumps + clamps + explicit-caller-wins.
  - Unknown / typo preference values fall through to `balanced`.
  - 23 tests; all pure (no DB).

## Pre-commit

ruff + ruff-format + mypy --strict + pytest unit+smoke = 528 green.

## Out of scope (per SP-4 §1)

- methodology v1.3 changes
- weights / λ-blending
- online learning (AIN-246, Backlog/deferred)
- `M_allowed` / `q_prior` / `q_empirical` semantics
- engine code in `routing/ainfera_routing/decide.py` (untouched)

## Public copy (founder/Varda)

Drafted README/STRATEGY paragraph for the routing repo describing the dial; see `docs/routing-preference.md` in the next PR after founder sign-off on the mapping values.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 24, 2026
…urface (AIN-263/264/265/266)
Lands the four read-only endpoints the `/dashboard` Glance page reads.
Until this router landed, all three GETs 404'd and the front door
crashed (digest 3541235483). The candidates field on the existing
`/v1/inferences/{id}/decision` is additive — the byte-reproducible
`candidates[]` shape is unchanged; `dashboard_candidates[]` is the new
collapsed shape SP-3 renders.
Endpoints (all tenant-scoped, read-only, §D3 honest-empty 200s):
- GET /v1/usage/daily?days=30 (AIN-263)
Returns `{ days:[{date, calls, cost_usd, by_status:{ok,fallback,error}}],
totals:{calls, cost_usd} }`. Empty tenant → `days:[]` zeros 200.
`by_status.fallback` is v0 honest 0 — the audit-chain signal exists
but per-day dedup is non-trivial; surfaced from a denormalized brain
column in a follow-up.
- GET /v1/caps/rollup (AIN-264)
`{ agents, policies_set, budget:{set,used_usd}, latency_p50_ms,
breaches:{quality,reliability} }`. `budget.set` sums
`spend_policy.daily_cap_usd` across the tenant's agents.
`latency_p50_ms` from `routing_outcomes.observed_latency_ms` (24h);
null when no §16 rows in window.
- GET /v1/agents/{id}/metrics (AIN-265)
24h window per-agent. 404s cross-tenant (same masking as
/v1/inferences/{id}/decision).
- /v1/inferences/{id}/decision (AIN-266)
Adds `dashboard_candidates: list[DashboardCandidate]` parallel to
`candidates[]`. Each row carries `chosen: bool` + `excluded: str |
null` so SP-3 renders "4 candidates, 1 excluded" without re-deriving.
The shape collapses three brain signals (`rejection_reason`,
`eligible`, `cleared_floor`) into one `excluded` string; explicit
reason takes precedence over `ineligible`/`below_floor` fallbacks.
Tests:
- tests/unit/test_dashboard_candidates.py — 8 pure tests against
`candidate_dashboard_summary()` (chosen marking, excluded
precedence, alt-key compat, etc.).
- tests/integration/test_dashboard.py — honest-empty 200s × 3
endpoints; cross-tenant 404 on /metrics; usage scoped to tenant
(A's call doesn't appear in B's daily); caps reflect real spend +
breach counts; decision endpoint surfaces dashboard_candidates;
no-mutation invariant on routing_outcomes (count before == after a
3-endpoint sweep).
- tests/smoke/test_openapi_contract.py · EXPECTED_OPERATIONS extended
with the 3 new GETs (the contract snapshot is the public-surface
source of truth).
Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER
that PR.
Pre-commit: ruff + ruff-format + mypy --strict + pytest tests/unit
+ tests/smoke all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
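The `dashboard_candidates` collapse described in the commit message above (three brain signals folded into one `excluded` string, explicit reason first) can be illustrated as follows. This is a hedged sketch rather than the repository's `candidate_dashboard_summary()`: the helper names `excluded_reason` and `summarize_candidates`, the `model` key, and the dict-based row shape are all hypothetical.

```python
from typing import Any, Optional


def excluded_reason(candidate: dict[str, Any]) -> Optional[str]:
    """Collapse rejection_reason / eligible / cleared_floor into one string."""
    reason = candidate.get("rejection_reason")
    if reason:
        # An explicit reason takes precedence over the generic fallbacks.
        return str(reason)
    if candidate.get("eligible") is False:
        return "ineligible"
    if candidate.get("cleared_floor") is False:
        return "below_floor"
    return None


def summarize_candidates(
    candidates: list[dict[str, Any]], chosen_model: str
) -> list[dict[str, Any]]:
    """Build one collapsed row per candidate; the input list is not mutated."""
    return [
        {
            "model": c.get("model"),  # hypothetical identifier key
            "chosen": c.get("model") == chosen_model,
            "excluded": excluded_reason(c),
        }
        for c in candidates
    ]
```

With rows shaped like this, a renderer can count entries whose `excluded` is not `None` to produce a line such as "4 candidates, 1 excluded" without re-deriving the brain signals.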
Summary
- `ainfera-inference` becomes the canonical wire string; `ainfera-mithril`, `ainfera-auto`, `ainfera/auto` demoted to silent router-boundary aliases.
- `api/` (openapi via Pydantic Field descriptions, agent-card.json, llms.txt, decision-receipt router summary, audit `router` payload field).
- `aa_index_source = 'aamc_v1_lock'` → `'routing_backend_v1_lock'`. Branch-verified only; prod-apply on `dftfpwzqxoebwzepygzl` is in the FOUNDER ACTION block.
- `/v1/messages` stay 501/422 in this PR: the lift requires `stream` + `tools` signatures across the ProviderAdapter interface + each of the 5 adapter implementations. Tracked under AIN-271 follow-up.

Test plan
- `ruff check + format`, `mypy --strict`, `pytest tests/unit -x` all green (pre-commit confirms)
- `alembic upgrade head` + `select count(*) from models where aa_index_source ilike '%aamc%'` → 0
- `curl -X POST $URL/v1/messages -d '{"model":"ainfera-inference",...}'` → 200 + Anthropic-shape body + audit headers
- `"model":"ainfera-mithril"` → 200 (alias path) + `router_alias_hit legacy=ainfera-mithril canonical=ainfera-inference` in the structured log
- `curl /.well-known/agent-card.json` + `/llms.txt` show 0 dead strings
- `select` returns 0; `select count(*) where aa_index_source='routing_backend_v1_lock'` returns 5

🤖 Generated with Claude Code
Note
Medium Risk
Medium risk because it changes the canonical `model` routing string and the audit/receipt `router` field (client-visible contract) while adding multiple silent aliases; it also includes a production data rewrite of `models.aa_index_source` values.

Overview
Renames the flagship routing target from `ainfera-mithril` to `ainfera-inference` across API contract surfaces (Pydantic/OpenAPI docs, agent discovery surfaces, decision-receipt copy) and updates the router resolver so `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` continue to work as silent aliases with structured alias-hit logging.

Normalizes audit/telemetry to the new canonical name by reporting `router="ainfera-inference"` in routed-path audit events and related receipts/log lines regardless of which alias was sent.

Adds a row-rewrite migration + seed/test updates to rename `models.aa_index_source` from `aamc_v1_lock` to `routing_backend_v1_lock`, with new/updated unit+integration tests enforcing the alias behavior and a zero-dead-string invariant on agent-facing surfaces.

Reviewed by Cursor Bugbot for commit 96cccb2. Bugbot is set up for automated code reviews on this repo.
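The alias behavior summarized in this review note could look roughly like the sketch below. `INFERENCE_MODEL` and `ROUTING_ALIASES` are names taken from the PR description and the log line format comes from the test plan; the function name `resolve_routing_target`, the logger name, and the pass-through of non-alias strings are assumptions for illustration, not the contents of `routers/inference.py`.

```python
import logging

INFERENCE_MODEL = "ainfera-inference"
ROUTING_ALIASES = frozenset({"ainfera-mithril", "ainfera-auto", "ainfera/auto"})

logger = logging.getLogger("router")  # hypothetical logger name


def resolve_routing_target(model: str) -> str:
    """Map a legacy alias to the canonical wire string at the router boundary."""
    if model in ROUTING_ALIASES:
        # Silent to the caller, visible in logs: the alias still works and
        # every hit is recorded so remaining legacy traffic can be measured.
        logger.info(
            "router_alias_hit legacy=%s canonical=%s", model, INFERENCE_MODEL
        )
        return INFERENCE_MODEL
    # Assumption: any other model string passes through unchanged.
    return model
```

Resolving once at the boundary means downstream code, including the audit `router` field, only ever sees the canonical string regardless of which alias was sent.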