feat(api): SP-2 PR-A · AIN-271 streaming + tool-use lift on /v1/messages by hizrianraz · Pull Request #72 · ainfera-ai/api

hizrianraz · 2026-05-23T16:32:33Z

Summary

Completes the half of AIN-271 that SP-1 deferred. /v1/messages now honors stream:true (200 + text/event-stream with ordered Anthropic SSE frames) and tools[] (pass-through to backends, tool_use blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one routing_outcomes row plus matching audit events plus ledger debit.

Stacks on SP-1 (#70). Base is chore/sp1-inference-rename; merges AFTER that PR.

What lands

ProviderAdapter.chat() gains tools + tool_choice (defaults None — back-compat). New stream_chat() async generator + StreamEvent dataclass + ToolsNotSupportedError. AdapterResponse.content_blocks for tool round-trip.
AnthropicAdapter + OpenAICompatAdapter ship native SSE stream_chat overrides (sub-1s TTFT at the adapter layer). Mistral/Together/Gemini inherit native streaming via OpenAICompatAdapter.
services/streaming.py synthesizes Anthropic-shape SSE frames from a full DispatchResult, preserving §16 + ledger + audit invariants on the streaming path.
routers/anthropic_compat.py drops 501-on-stream + blanket 422-on-tools. Returns StreamingResponse(text/event-stream); alias resolver honored on streamed calls; polymorphic content[] (text + tool_use).
Audit headers (x-ainfera-agent-id, x-ainfera-audit-url) set on streamed responses identical to non-streaming.

SP-2 v0 honesty caveat

Contract surface is real (200, text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity, all 3 SP-1 aliases). TTFT is NOT sub-1s in v0 — the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives shipped above are ready for the follow-up that refactors dispatch_inference to consume them end-to-end. x-ainfera-stream-mode: wrapped today; flips to native after the follow-up so SDK probes can observe TTFT.

Tests

tests/unit/test_streaming_wire_format.py — 6 pure tests against default stream_chat() wrapper + AIN-176→Anthropic finish_reason mapping + supports_native_streaming().
tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with:
- stream:true → 200 + text/event-stream + ordered Anthropic frames
- streaming writes §16 row on close
- streaming honors silent-alias resolver (parametrized × 3)
- non-empty tools passes through (no blanket 422)

Pre-commit ran ruff + ruff-format + mypy --strict + pytest (unit + smoke) — all green (505 tests).

Test plan

CI green
Branch preview: curl -N -X POST $URL/v1/messages -d '{"model":"ainfera-inference","stream":true,...}' → 200 + content-type: text/event-stream + sequence of event: message_start → event: content_block_* → event: message_delta → event: message_stop
Same with model:"ainfera-mithril" → 200 + Railway log router_alias_hit legacy=ainfera-mithril canonical=ainfera-inference
Anthropic SDK with ANTHROPIC_BASE_URL=https://api.ainfera.ai streams text successfully
tools:[{...}] request → 200 with content[] containing a tool_use block when the model invoked the tool; stop_reason: "tool_use"
Read-only select on routing_outcomes for the streamed call's agent_id returns exactly 1 new row

🤖 Generated with Claude Code

Note

Medium Risk
Introduces new streaming and tool-calling surfaces across adapters and the /v1/messages router, which affects core inference I/O shapes and error handling. While largely additive, changes touch routing results and provider adapters and could impact compatibility if SSE framing or tool translation is incorrect.

Overview
Enables /v1/messages to accept stream=true and return 200 text/event-stream with Anthropic-style SSE frames, implemented via a new services/streaming.py wrapper that replays a completed routed inference as message_start/content_block_*/message_delta/message_stop while preserving §16/audit/ledger invariants.

Adds tool-calling plumb-through: ProviderAdapter.chat() gains tools/tool_choice, responses can carry structured content_blocks, and OpenAI-compat providers translate tool_calls into Anthropic-shaped tool_use blocks; unsupported backends raise ToolsNotSupportedError which /v1/messages surfaces as a structured 422.

Adds native provider-level SSE parsing for Anthropic and OpenAI-compat adapters (stream_chat() + normalized StreamEvent), plus tests updating the Anthropic-compat integration suite for the new streaming behavior and adding unit coverage for stream fallback and stop-reason mapping.

^{Reviewed by Cursor Bugbot for commit 7281e42. Bugbot is set up for automated code reviews on this repo. Configure here.}

linear-code · 2026-05-23T16:32:36Z

AIN-271 [Phase 5 · §0 · AIN-309 (planning)] Gate: P1-WS2 prod deploy of /v1/messages streaming + tool-use

Hard gate for Phase 5. WS1+ cannot start until this is CERT GREEN.

⚠️ NAMING MANDATE SUPERSEDES (LOCKED 2026-05-23 PM, Disc#12)

Founder locked ainfera-inference as the flagship wire string and HARD-DELETE of ainfera-mithril + ainfera-auto + all Tolkien PRODUCT names (Mithril/Valinor/Galvorn). Products are now DESCRIPTIVE: Ainfera Inference (flagship), Ainfera OS, Ainfera Robotics. Agent BEINGS keep Tolkien (Varda etc).
Therefore this deploy MUST ship ainfera-inference as canonical — NOT ainfera-mithril. All 4 CC sessions (2026-05-23 AM) stamped ainfera-mithril; that is now the dead name. The deploy + the rename-recode sweep are coupled: do not deploy the dead string to prod.

Reconciled state 2026-05-23 PM (post 4-session consolidation)

Probe	Result
`POST /v1/messages` base (non-stream)	LIVE — 401 unauth (was 404). Shim deployed via api#68.
`POST /v1/messages` stream=true	501/404 — streaming NOT implemented (the remaining keystone)
`POST /v1/messages` tools=[...]	422/404 — tool-use NOT implemented

Base shim is deployed. Streaming + tool-use is the highest-value remaining build — it also closes the WS5 perceived-latency issue (33s is 98.5% model inference, NOT a bug; streaming gives <1s first-token). Unblocks Aulë SDK + the Phase-5 MAF target.

What needs to happen

Implement streaming + tool-use on /v1/messages (AIN-174 Phase B) — SSE piped not buffered; tool round-trip on both dialects.
Ship with canonical string ainfera-inference (mithril/auto accepted as inbound aliases ONLY during grace, then hard-removed — founder wants hard-delete, so confirm grace length).
Founder rules "router" wire-format (hard-cut to ainfera-inference recommended given hard-delete stance).
Founder pushes + Railway deploy + re-cert.

Owner

Founder-only: branch + commit + deploy. Aulë drafts PR + cert checklist + the streaming/tool-use implementation on request.

Linked

Repo-root MASTER_LOG.md WS0 · ainfera-os/MASTER_LOG_P2.md §0 · the descriptive-rename sweep (needs its own ticket once Linear 250-cap freed).

Review in Linear

cursor · 2026-05-23T16:34:48Z

        finish_reason=response.finish_reason,
        receipt_id=receipt.id,
        provider=provider_slug,  # AIN-126
+        content_blocks=list(response.content_blocks),


Tools omitted from inference dispatch

High Severity

The /v1/messages endpoint accepts tools and tool_choice but doesn't forward them to the underlying inference handler. This occurs in both non-streaming and streaming paths, causing tool definitions to be ignored. Consequently, tool_use functionality and ToolsNotSupportedError handling don't work as intended, making tool-enabled calls behave as text-only.

Additional Locations (1)

ainfera_api/routers/anthropic_compat.py#L309-L318

^{Reviewed by Cursor Bugbot for commit 5a57625. Configure here.}

cursor · 2026-05-23T16:34:48Z

+    # SDK clients read `content[]` and dispatch on `type` per block.
+    blocks: list[dict[str, Any]] = list(inf_resp.content_blocks or [])
+    if not blocks:
+        blocks = [{"type": "text", "text": inf_resp.content}]


Non-stream drops adapter content blocks

High Severity

The non-streaming /v1/messages endpoint discards tool_use blocks. post_inference doesn't correctly transfer structured tool_use content from DispatchResult.content_blocks to InferenceResponse.content_blocks, causing responses to fall back to a single text block.

^{Reviewed by Cursor Bugbot for commit 5a57625. Configure here.}

cursor · 2026-05-23T16:34:48Z

+            idempotency_key=idempotency_key,
+            caller_task_type=task_type,
+            request_id=request_id,
+        )


Streaming skips vendor model passthrough

High Severity

When stream:true is used with /v1/messages, requests are always routed through the brain, even when a specific vendor model is provided. This diverges from non-streamed requests, which honor vendor pins directly. As a result, streamed calls can lead to unexpected provider selection, billing, and audit behavior.

Additional Locations (1)

ainfera_api/routers/anthropic_compat.py#L300-L308

^{Reviewed by Cursor Bugbot for commit 5a57625. Configure here.}

cursor · 2026-05-23T16:34:48Z

+            },
+        )
+        yield _sse(_EVENT_MESSAGE_STOP, {"type": "message_stop"})
+        return


Stream errors after HTTP 200

Medium Severity

stream_messages only catches NoCandidateError and AllCandidatesFailedError after StreamingResponse already returned 200. CapViolationError, InsufficientFundsError, AgentNotActiveError, and ProviderError from dispatch_with_brain propagate out of the generator, unlike non-stream /v1/messages which maps them to 402, 409, 502, or 422 JSON errors.

^{Reviewed by Cursor Bugbot for commit 5a57625. Configure here.}

cursor · 2026-05-23T16:34:48Z

+            # inference then replay). Native end-to-end is the planned
+            # follow-up; the adapter-level primitives shipped here.
+            "x-ainfera-stream-mode": "wrapped",
+        },


Stream missing audit headers

Medium Severity

Streamed /v1/messages responses set x-ainfera-agent-id, x-ainfera-audit-url, and x-ainfera-stream-mode before the body runs, but omit x-ainfera-inference-id and x-ainfera-receipt-id that non-stream post_messages sets after dispatch completes.

^{Reviewed by Cursor Bugbot for commit 5a57625. Configure here.}

Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit. Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR. ## Adapter contract lift - `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None — back-compat preserved across all 5 adapters). - New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface. - New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`. - New `ToolsNotSupportedError` — adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation. - `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too. ## Per-adapter native streaming - AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively. - OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events. - OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug. - GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming. ## Streaming dispatch + /v1/messages - `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`). - `routers/anthropic_compat.py`: - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type. - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint. - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream. - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings). - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming. ## Tests - tests/unit/test_streaming_wire_format.py — 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag. - tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with SP-2 coverage: · stream:true → 200 + text/event-stream + ordered Anthropic frames · streaming writes §16 row on close · streaming honors silent-alias resolver (parametrized × 3) · non-empty tools passes through Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests). ## SP-2 v0 honesty caveat Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

There are 8 total unresolved issues (including 5 from previous reviews).

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.}

^{Reviewed by Cursor Bugbot for commit 7281e42. Configure here.}

cursor · 2026-05-23T23:02:59Z

+        blocks = [{"type": "text", "text": inf_resp.content}]
    return MessagesResponse(
        id=f"msg_{uuid4().hex[:24]}",
-        content=[_TextBlock(text=inf_resp.content)],


Tools never reach dispatch

High Severity

The Anthropic /v1/messages endpoint accepts tools and tool_choice, but these aren't fully forwarded to the inference dispatch logic. For non-streaming requests, they're omitted from the InferenceRequest. For streaming, stream_messages receives them but doesn't pass them to dispatch_with_brain. This prevents tool definitions from reaching backend providers.

^{Reviewed by Cursor Bugbot for commit 7281e42. Configure here.}

cursor · 2026-05-23T23:02:59Z

+            tenant_id=tenant.id,
+            flattened_msgs=flattened_msgs,
+            idempotency_key=idempotency_key,
+        )


Stream ignores vendor model

Medium Severity

With stream:true, every request goes through _serve_messages_stream and dispatch_with_brain, and body.model is never passed into dispatch. Non-stream calls use post_inference, which routes vendor-pinned models via direct dispatch_inference. Pinned models with streaming are treated as brain-routed ainfera-inference, breaking vendor passthrough parity.

Additional Locations (1)

ainfera_api/routers/anthropic_compat.py#L439-L455

^{Reviewed by Cursor Bugbot for commit 7281e42. Configure here.}

cursor · 2026-05-23T23:02:59Z

+            # OpenAI Chat Completions tools shape — surface that clearly
+            # to the dispatcher rather than silently dropping tools.
+            if tools:
+                raise ToolsNotSupportedError(adapter_slug=f"{self.slug}/responses")


Unsupported tools mark failure

Medium Severity

When tools are eventually passed to dispatch, ToolsNotSupportedError from the OpenAI responses path is handled like an unknown exception in dispatch_inference, triggering _finalize_failure before post_messages can map it to 422 tools_not_supported_by_backend, leaving a failed inference and refund alongside the client-facing 422.

^{Reviewed by Cursor Bugbot for commit 7281e42. Configure here.}

#80) * feat(api): SP-2 PR-A · AIN-271 streaming + tool-use lift on /v1/messages Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit. Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR. ## Adapter contract lift - `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None — back-compat preserved across all 5 adapters). - New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface. - New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`. - New `ToolsNotSupportedError` — adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation. - `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too. ## Per-adapter native streaming - AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively. - OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events. - OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug. - GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming. ## Streaming dispatch + /v1/messages - `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`). - `routers/anthropic_compat.py`: - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type. - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint. - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream. - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings). - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming. ## Tests - tests/unit/test_streaming_wire_format.py — 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag. - tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with SP-2 coverage: · stream:true → 200 + text/event-stream + ordered Anthropic frames · streaming writes §16 row on close · streaming honors silent-alias resolver (parametrized × 3) · non-empty tools passes through Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests). ## SP-2 v0 honesty caveat Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): SP-4 PR-A · forward capture-coverage guard (AIN-244 instrumentation) (#73) * feat(api): SP-4 PR-A · forward capture-coverage guard for routed dispatches Adds the durable forward-coverage guarantee for §16 capture: every routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1 aliases) writes exactly one `routing_outcomes` row, regardless of outcome (success / reject / fallback / fail). Pinned passthroughs (vendor slugs) write zero AND carry a `router: "direct"` audit marker. Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72) — that PR's stream-close capture path is the last exit covered by this guard. ## Moat-sensitive scope (read this first) This PR is **pure observability**. Per the SP-4 §1 guardrails: - ZERO change to routing decisions, scores, weights, thresholds, candidate ordering, `M_allowed`, `q_prior`, `q_empirical`, ruleset_hash. The diff against `services/routing_brain.py` and `services/routing.py` is **empty**. Verifiable: `git diff feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py` shows no hunks. - `routing_outcomes` schema is unchanged. No new columns, no migration. The row is written by the existing `insert_decision()` / `complete_decision()` calls in `dispatch_with_brain` (§0/P3 walk-through confirmed every exit path already writes the row). - `routing/ainfera_routing/decide.py` is untouched. ## What's new 1. `ainfera_api/services/capture_invariant.py`: - `route_outcome_kind(model_slug) -> "routed" | "passthrough"` — pure classifier keyed off the SP-1 alias resolver's `ROUTING_TARGETS`, so any string added to the resolver becomes "routed" without a second edit. - `assert_capture_invariant(db, inference_id, kind)` — read-only post-condition check the test sweep runs after every probe. Raises `CaptureInvariantViolationError` with diagnostic context when a routed call returns without a row or a passthrough produces one unexpectedly. - `find_passthrough_audit_event()` — helper for the test sweep to assert the `router: "direct"` marker is present. - `DispatchCaptureCounter.dispatch_without_capture_total` — the headline regression signal. Stays 0 in green builds; production scrape (future Prometheus surface) alerts on any non-zero. 2. `tests/unit/test_capture_invariant.py` — 9 pure tests locking the classifier (canonical + 3 aliases → routed; vendor slugs + typos → passthrough) + the counter semantics (routed-miss bumps the regression signal; passthrough-captured-unexpectedly bumps the contamination signal; reset zeros everything). 3. `tests/integration/test_capture_coverage.py` — parametrized sweep that drives a routed-success call for EACH of the 4 routing targets, a reject-floor routed call, and passthrough calls against two vendor slugs (anthropic native + openai). After each, asserts: - routed success → exactly 1 routing_outcomes row, `outcome_status='succeeded'` - reject path → 1 row, `outcome_status='rejected_floor'`, `inference_id IS NULL` (the only branch where it's NULL by design — see RoutingOutcomeORM docstring) - passthrough → 0 rows AND `router: "direct"` in the audit chain (distinguishes a properly-bypassed passthrough from a routed call that silently lost its row) Plus a coverage-sweep test that asserts `DispatchCaptureCounter.dispatch_without_capture_total == 0` at the end of a mixed dispatch sequence. ## §0/P2 denominator finding (documented for the audit chain) Live read against Supabase `dftfpwzqxoebwzepygzl`: - 778 historical inferences / 5 routing_outcomes rows - 0 historical `request_payload.model` was a routing string (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto) - ALL 778 were pinned passthroughs — vendor slugs (claude-opus-4-7 x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...) - The 3 succeeded outcome rows are integration-test side effects **The 773-row "gap" is honest fleet posture, not a capture failure.** The fleet's been on pinned passthroughs (AULE_PLANNER / YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value is the forward GUARANTEE: every NEW routed call going forward writes exactly one row. ## Pre-commit ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke all green (523 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(api): SP-CLOSE · capture-invariant uses AuditEventType enum (not raw string) Same class as the dashboard.py:127 fix landed in #71. The capture-invariant service + integration test compared `AuditEventORM.event_type == "inference_routed"` (underscored Python name), but the actual DB enum value is `inference.routed` (dotted) per migration 20260514_0001. Postgres rejected the literal with: invalid input value for enum audit_event_type: "inference_routed" Fix: pass `AuditEventType.inference_routed` (the enum *member*) instead of the raw string — SQLAlchemy's `values_callable` resolves it to the correct DB value (`inference.routed`). Docstring updated to spell the dotted form for any future reader. Unblocks the SP-4 PR-A integration tests: test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit No engine touch, no routing_outcomes touch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): SP-4 PR-B · routing_preference dial — balanced byte-identical, quality/cost gated (AIN-244 dial) (#74) Exposes `routing_preference: "quality" | "balanced" | "cost"` in the routing_hint body as sugar over the existing caps. **`balanced` is byte-identical to today's behavior** (the dial is a no-op when balanced is selected — proved by the parametrized regression lock in the test file). **`quality` / `cost` are accepted on the wire but INERT** until the env gate `AINFERA_ROUTING_PREFERENCE_LIVE=1` is set (founder Disc#12 authorization of the lever values). Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-4 PR-A (#73 capture-coverage). ## Moat-sensitive scope · Disc#12 boundary This PR is Disc#12-adjacent — the dial CAN change routing decisions once the env gate is on. To stay safe: - The default (gate OFF) means `quality`/`cost` resolve to today's policy IDENTICALLY to `balanced`. SP-4 ships with the gate OFF. - Explicit caller `min_quality` always wins. The dial only nudges the default-derived floor — a quality-conscious caller never has their floor silently lowered by a `cost` preference. - Safety clamps: dial output is bounded by [good=0.50, frontier=0.85] so neither lever can exclude every voter or admit a sub-floor model. - Pure-function `_apply_preference()` is deterministic — same input → same output, testable without the brain. ## Proposed mapping (Aulë's conservative starting point — founder authorizes) `balanced` — no-op. Resolves exactly as today. `quality` — bump default min_quality by +0.10 (default 0.50 → 0.60), clamped to the `frontier` tier (0.85). Caller's explicit `min_quality` wins if higher. `cost` — drop default min_quality by -0.10, clamped to the `good` tier (0.50). Caller's explicit `min_quality` wins if higher. Both bumps are conservative: ≤0.10 delta, with hard safety clamps. No weighted-λ, no score surgery, no candidate-ordering changes. The dial moves the FLOOR; the engine still picks cheapest-clearing-floor. The founder reviews + authorizes the exact lever values in this PR. Once signed off, `railway env set AINFERA_ROUTING_PREFERENCE_LIVE=1` on the api service flips the gate ON. Until then, only `balanced` ships live behavior. ## What's new - `services/routing_brain.py`: - `VALID_PREFERENCES` frozenset + `DEFAULT_PREFERENCE = "balanced"`. - `_apply_preference(base_min_q, preference) -> Decimal` — pure function honoring the gate-off semantic. - `_routing_preference_live()` — env-var read at call time so ops can flip the gate without restart. - `_PREFERENCE_FLOOR_DELTA` + safety clamps `_SAFETY_MIN_QUALITY` + `_SAFETY_MAX_QUALITY` (= good / frontier tier numerics). - `resolve_policy()` reads `routing_preference` from the hint and applies the dial ONLY when the caller did NOT pass an explicit `min_quality` — preserves caller-intent-wins semantics. - `models/inference.py`: `InferenceRequest.routing_hint` description documents the new key (so it surfaces in openapi.json). - `tests/unit/test_routing_preference_dial.py`: - 8-case parametrized **byte-identical regression lock** for `balanced` — the moat invariant. Any divergence fails the build. - Dial-inert-when-gate-off coverage × all 3 preferences. - Dial-active mapping × bumps + clamps + explicit-caller-wins. - Unknown / typo preference values fall through to `balanced`. - 23 tests; all pure (no DB). ## Pre-commit ruff + ruff-format + mypy --strict + pytest unit+smoke = 528 green. ## Out of scope (per SP-4 §1) - methodology v1.3 changes - weights / λ-blending - online learning (AIN-246 — Backlog/deferred) - `M_allowed` / `q_prior` / `q_empirical` semantics - engine code in `routing/ainfera_routing/decide.py` — untouched ## Public copy (founder/Varda) Drafted README/STRATEGY paragraph for the routing repo describing the dial — see `docs/routing-preference.md` in the next PR after founder sign-off on the mapping values. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor Bot reviewed May 23, 2026

View reviewed changes

hizrianraz force-pushed the feat/ain271-streaming-tooluse branch from 5a57625 to 7281e42 Compare May 23, 2026 23:00

cursor Bot reviewed May 23, 2026

View reviewed changes

hizrianraz merged commit 10e1cd8 into chore/sp1-inference-rename May 24, 2026
4 checks passed

hizrianraz mentioned this pull request May 24, 2026

fix(api): SP-Ω recovery · land #72+#73+#74 (stack-merge orphan rescue) #80

Merged

Conversation

hizrianraz commented May 23, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What lands

SP-2 v0 honesty caveat

Tests

Test plan

Uh oh!

linear-code Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ NAMING MANDATE SUPERSEDES (LOCKED 2026-05-23 PM, Disc#12)

Reconciled state 2026-05-23 PM (post 4-session consolidation)

What needs to happen

Owner

Linked

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Tools omitted from inference dispatch

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Non-stream drops adapter content blocks

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Streaming skips vendor model passthrough

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Stream errors after HTTP 200

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Stream missing audit headers

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Tools never reach dispatch

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Stream ignores vendor model

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Unsupported tools mark failure

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hizrianraz commented May 23, 2026 •

edited by cursor Bot

Loading

linear-code Bot commented May 23, 2026 •

edited

Loading