feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt by hizrianraz · Pull Request #68 · ainfera-ai/api

hizrianraz · 2026-05-23T11:22:18Z

Summary — the Mithril keystone

This PR bundles AIN-226 (the Mithril gateway shim) + AIN-293 (the prime-brokerage decision receipt). They co-touch inference.py and would conflict if separated.

AIN-226 · POST /v1/messages — the Anthropic-Messages dialect shim that closes the one-router invariant for the fleet. Until this lands, fleet agents on the Claude Agent SDK (Aulë most importantly) cannot route through Ainfera and must call api.anthropic.com directly — which the Tulkas probe in ainfera-os/.github/workflows/framework-import-smoke.yml is specifically designed to ALARM on.

AIN-293 / WS4 G6 · GET /v1/inferences/{id}/decision — the prime-brokerage verification primitive. The /routing marketing page documents this curl; without this PR it 404s. Returns the §16 routing_outcomes row that backed the inference: candidates[], decision_rule, policy_version, ruleset_hash, q_prior_used, m_allowed_set, projected vs actual cost, observed latency, outcome_status, seed.

What ships

Component	File(s)	Notes
`/v1/messages` route	`ainfera_api/routers/anthropic_compat.py` (new, 332 lines)	Translates Anthropic Messages shape → InferenceRequest; system-prompt translation; finish-reason mapping; content-block round-trip
Wire it in	`ainfera_api/main.py`	adds `anthropic_compat.router` to the include chain
Routing-target lock	`ainfera_api/routers/inference.py`	`MITHRIL_MODEL` / `AUTO_MODEL` / `ROUTING_TARGETS` constants + `_is_routed` resolver + `_log_alias_hit`
Docstring refresh	`models/inference.py`, `routing/__init__.py`, `routing/auto.py`	Mithril is now the documented default; `ainfera-auto` is the legacy alias
Audit-payload wire value	`services/routing_brain.py`	Two `"router"` lines: `"ainfera-auto"` → `"ainfera-mithril"` (canonical per master prompt §0 P0; request still accepts both)
`/v1/inferences/{id}/decision`	`ainfera_api/routers/inference.py`	Tenant-scoped, NEVER public, returns the §16 routing_outcomes row
Tests	`tests/unit/test_mithril_alias.py` (15) + `tests/unit/test_decision_endpoint.py` (3) + `tests/integration/test_anthropic_compat.py` (515 lines)	Unit + integration coverage
Cert script	`scripts/cert-mithril-prod.sh`	Post-deploy curl probes for both `ainfera-mithril` and the silent alias
Deliverable doc	`MITHRIL_GATEWAY.md`	AIN-226 deliverable
Contract test	`tests/smoke/test_openapi_contract.py`	Adds `("post", "/v1/messages")` + `("get", "/v1/inferences/{inference_id}/decision")` to `EXPECTED_OPERATIONS`

Streaming + tool-use posture

Both intentionally return 501/422 on /v1/messages for now (mirrors the OpenAI-shim posture). AIN-174 Phase B is the separate ticket that wires these through post_inference() — that's the WS2 keystone follow-up. This PR unblocks the non-streaming, non-tool path which is the bulk of fleet traffic.

Privacy contract for `/decision` (locked in tests)

Tenant-scoped — owning tenant only.
Cross-tenant probe → same 404 as "id not found" (no existence oracle).
Unauthenticated → 401/403, NEVER 200 with data.
Reject-path inferences still have a row — chosen_model_slug/cost_actual null but candidates + decision_rule + m_allowed_set present, so the caller can see WHY no model was picked.

Pre-deploy verify (founder)

Diff size: ~1364 lines added, 15 removed. Eyeball the wire format ("router" field is canonical "ainfera-mithril"; the ainfera-auto request string is still accepted at the router boundary).
Pre-commit suite passed locally: 483 tests / ruff + mypy clean.
CI green (lint-and-typecheck, test, integration).

Post-deploy cert

After Railway deploys main → cert script in scripts/cert-mithril-prod.sh. Smokes:

POST /v1/messages with model="ainfera-mithril" → 200
POST /v1/messages with model="ainfera-auto" → 200 (alias)
POST /v1/messages with model="claude-opus-4-7" → 200 (passthrough)
POST /v1/messages with stream=true → 501
POST /v1/messages with non-empty tools[] → 422
GET /v1/inferences/{id}/decision authed-own → 200
GET /v1/inferences/{id}/decision unauthed → 401/403
GET /openapi.json advertises ainfera-mithril

Master super-prompt references

§WS0 (deploy gate — this IS the deploy) · §WS2 (the streaming/tool-use follow-up sits on top of this) · §WS4 G6 (decision receipt) · founder-authorized via "go deliver end to end" 2026-05-23.

Overview
Adds an Anthropic-dialect gateway surface by introducing POST /v1/messages that translates Anthropic Messages requests (including top-level system) into the existing inference pipeline and returns Anthropic-shaped responses, with explicit 501/422 behavior for streaming and tool calls.

Locks routing nomenclature by making ainfera-mithril the canonical routed target, treating ainfera-auto as a silent alias with telemetry logging, and emitting the canonical router name in audit/log payloads.

Exposes a verification primitive via GET /v1/inferences/{inference_id}/decision to return the stored §16 routing_outcomes decision record for an inference (tenant-scoped with existence-oracle masking).

Adds unit/integration tests, OpenAPI contract updates, and a post-deploy cert script/documentation for the new endpoints and alias behavior.

^{Reviewed by Cursor Bugbot for commit 2bf83a9. Bugbot is set up for automated code reviews on this repo. Configure here.}

Master super-prompt 2026-05-23 §WS4 G6: the /routing marketing page documents `curl .../v1/inferences/{id}/decision` but the endpoint was 404. This makes the public copy honest. The receipt is the prime-brokerage verification primitive: an authed caller can fetch it and confirm Mithril picked the cheapest candidate that cleared their quality floor. Returns the §16 routing_outcomes row that backed the inference (candidates[], decision_rule, policy_version, ruleset_hash, q_prior_used, m_allowed_set, cost_projected, cost_actual, observed_latency_ms, outcome_status, seed). Privacy contract (locked): - Tenant-scoped — only the OWNING tenant can read. Cross-tenant probe returns 404 with the same body as "id not found" (no existence oracle). - NEVER public — discoverable only with a bearer that matches the inference's owning tenant. - Reject-path inferences still have a decision row — chosen_model_slug and cost_actual will be null, but candidates + decision_rule + m_allowed_set are present so the caller can see WHY no model was picked. Tests (4 new unit tests in tests/unit/test_decision_endpoint.py): - Route exists at the documented path (locks the marketing curl). - Response shape carries all verification fields. - Unauthenticated requests → 401/403, NEVER 200 with data. - Bogus bearer → same rejection class. Full integration coverage (real tenant + agent + inference + routing_outcomes row, 200 with body shape) belongs in the integration suite and follows the existing test_routing_v0.py pattern. Follow-up. Contract test: added ("get", "/v1/inferences/{inference_id}/decision") to EXPECTED_OPERATIONS. The /v1/messages line is left for PR #67 or the founder's AIN-226 PR to add — pre-commit stash collapses unstaged anthropic_compat wiring out of the working tree, so locking it here would block this branch's pre-commit. The unrelated AIN-226 Mithril shim WIP in ainfera_api/ + the founder's untracked test_mithril_alias.py remain unstaged on this branch. Co-Authored-By: Claude <noreply@anthropic.com>

cursor · 2026-05-23T11:23:55Z

+        task_type=outcome.task_type,
+        cell=outcome.cell,
+        q_prior_used=outcome.q_prior_used,
+        candidates=[DecisionCandidate(**c) for c in outcome.candidates],


Candidate JSONB fields not mapped

High Severity

The DecisionCandidate Pydantic model's field names don't align with the keys stored in the routing_outcomes.candidates JSONB. This mismatch causes Pydantic to ignore critical verification details like projected costs and rejection reasons when building the response for /v1/inferences/{inference_id}/decision, leading to their omission from the API output.

^{Reviewed by Cursor Bugbot for commit 48f6f1d. Configure here.}

cursor · 2026-05-23T11:23:55Z

+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail="no routing decision recorded for this inference",
+        )


Reject receipts need inference id

Medium Severity

The handler requires a tenant-owned inferences row and loads routing_outcomes by inference_id, but floor/veto reject paths complete outcomes with inference_id NULL and never create an inference. Reject-path routing receipts therefore cannot be fetched at this URL despite the handler docstring and PR privacy contract describing them as available.

^{Reviewed by Cursor Bugbot for commit 48f6f1d. Configure here.}

…have one in CI) The bearer-resolution code path hits the tenants table to look up the key hash; in CI (no Postgres on localhost:5432) the test triggered an asyncpg connection error instead of the expected 401/403. Bearer coverage (bogus token → 401/403, cross-tenant → 404, owning → 200) belongs in the integration suite where the DB is wired. The unauthenticated test stays — it covers the auth gate without touching the DB (FastAPI rejects on the missing Header() check before bearer resolution runs). Co-Authored-By: Claude <noreply@anthropic.com>

cursor · 2026-05-23T11:33:45Z

+        await db.execute(
+            select(RoutingOutcomeORM).where(RoutingOutcomeORM.inference_id == inference_id)
+        )
+    ).scalar_one_or_none()


Duplicate outcome rows cause 500

Medium Severity

The handler loads routing_outcomes by inference_id with scalar_one_or_none(). RoutingOutcomeORM.inference_id is indexed but not unique, so more than one linked row raises MultipleResultsFound and returns 500 instead of a controlled error.

^{Reviewed by Cursor Bugbot for commit 4c401e7. Configure here.}

…eystone The Anthropic-Messages dialect shim that closes the one-router invariant for the fleet. Until this lands, fleet agents on the Claude Agent SDK (Aulë most importantly) cannot route through Ainfera and must call api.anthropic.com directly — which the Tulkas probe in ainfera-os/.github/workflows/framework-import-smoke.yml is specifically designed to ALARM on. What this PR ships: - `ainfera_api/routers/anthropic_compat.py` (new, 332 lines) — POST /v1/messages route that translates Anthropic Messages shape into the internal InferenceRequest and delegates to post_inference. System prompt translation (top-level `system` → synthetic system message), finish-reason mapping, content-block round-trip. - `ainfera_api/main.py` — wire the new router. anthropic_compat.router added to the include chain. - `ainfera_api/models/inference.py` — docstring update on the `model` field: `ainfera-mithril` is now the documented default; `ainfera-auto` is the legacy alias. - `ainfera_api/routing/__init__.py` + `ainfera_api/routing/auto.py` — docstring updates: the v1.0 `auto_route()` is deprecated in favor of `services/routing_brain.dispatch_with_brain` (AIN-245 brain wiring). - `ainfera_api/services/routing_brain.py` — two `"router"` audit-payload lines flip `"ainfera-auto"` → `"ainfera-mithril"` (canonical wire value per master super-prompt §0 P0 ruling; the request still accepts both strings — only the audit payload normalizes). - `MITHRIL_GATEWAY.md` (new, 207 lines) — deliverable doc. - `scripts/cert-mithril-prod.sh` (new, executable) — post-deploy cert script that probes /v1/messages with both `ainfera-mithril` and the silent alias. - `tests/integration/test_anthropic_compat.py` (new, 515 lines) — full end-to-end coverage (happy path, alias parity, system translation, streaming 501, tools 422, vendor passthrough). - `tests/unit/test_mithril_alias.py` (new, 106 lines) — pure-function coverage for the routing-target resolver + the Anthropic stop-reason mapping inversion. Also bundled (because they share inference.py and would otherwise be co-dependent): - AIN-293 / WS4 G6 — `GET /v1/inferences/{id}/decision` (the prime-brokerage receipt). Documented as a separate ticket; lives here because PR #68's inference.py edits would conflict with the AIN-226 work on the same file otherwise. Tests at `tests/unit/test_decision_endpoint.py`. Contract test: added ("post", "/v1/messages") AND ("get", "/v1/inferences/{inference_id}/decision") to EXPECTED_OPERATIONS. Streaming + tool-use surfaces are intentionally 501/422 on /v1/messages for now (AIN-174 Phase B — separate ticket, the WS2 keystone follow-up). This unblocks the non-streaming, non-tool path which is the bulk of fleet traffic. Master super-prompt 2026-05-23 §WS0 (deploy gate) + §WS2 (keystone) + §WS4 G6 (decision receipt). Founder-authorized via "go deliver end to end" 2026-05-23. Co-Authored-By: Claude <noreply@anthropic.com>

linear-code · 2026-05-23T11:40:32Z

AIN-226 [Foundation] Ainfera dual-dialect gateway — serve OpenAI + Anthropic Messages (one-router for all agents)

The Foundation layer every agent inherits — derived from Aulë's design (AIN-225), which is the gateway's hardest consumer.

What it must do

The Ainfera Routing gateway must serve TWO dialects so every fleet agent routes through it:

OpenAI chat-completions (/v1/...) — Varda (OpenClaw/NemoClaw), Yavanna (LangGraph), Námo (Letta). Already the primary path.
Anthropic Messages (/v1/messages + /v1/models) — Aulë (Claude Agent SDK). The NEW requirement.

Why (the one-router rule has a dialect problem)

Aulë's Claude Agent SDK speaks Anthropic Messages, not OpenAI. "All inference through Ainfera" is impossible unless Ainfera also speaks Anthropic. Confirmed via Anthropic's own LLM-gateway docs: SDK honors ANTHROPIC_BASE_URL; gateway must expose /v1/messages (+ optional /v1/models for the model picker).

Acceptance criteria

OpenAI-compatible endpoint (existing) — verified for Varda/Yavanna/Námo
Anthropic Messages endpoint /v1/messages — POST same body shape as Anthropic, SSE streaming piped not buffered, tool-use round-trip tested
/v1/models endpoint (Anthropic-format) — returns claude-prefixed model IDs for the Claude Code picker
Model catalog includes claude-opus-4-7, claude-sonnet-4-6 by exact name (no auto-translation)
Single-upstream contract: each framework pins ONE Ainfera provider; in-framework fallback disabled; SPOF fallback = Ainfera + explicit model (never direct provider)
Audit + settlement fire identically regardless of dialect (route, settle, audit on both)
Bootstrap option: LiteLLM shim hash-pinned (NOT 1.82.7/1.82.8 — malware), Tulkas-scanned
Product option: Ainfera-native /v1/messages (no third-party shim)
Tulkas probe: 100% of all-agent inference in audit chain; any direct-provider call (api.anthropic.com, api.openai.com, etc.) = ALARM

Blocks

Aulë (AIN-225) cannot route through Ainfera until /v1/messages is live. This is the critical-path prerequisite for the whole fleet's one-router invariant.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.}

^{Reviewed by Cursor Bugbot for commit 2bf83a9. Configure here.}

cursor · 2026-05-23T11:42:29Z

+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail="no routing decision recorded for this inference",
+        )


Within-tenant inference existence oracle

Low Severity

For the owning tenant, a missing inference returns "inference not found", while a valid inference without a §16 row returns "no routing decision recorded for this inference". That distinct 404 detail lets an authenticated caller probe UUIDs and learn which ids correspond to real inferences (e.g. vendor passthrough vs routed).

^{Reviewed by Cursor Bugbot for commit 2bf83a9. Configure here.}

cursor Bot reviewed May 23, 2026

View reviewed changes

hizrianraz changed the title ~~feat(api): AIN-293 / WS4 G6 · GET /v1/inferences/{id}/decision (the prime-brokerage receipt)~~ feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt May 23, 2026

cursor Bot reviewed May 23, 2026

View reviewed changes

hizrianraz merged commit cbb7292 into main May 23, 2026
4 checks passed

hizrianraz deleted the feat/ain-293-decision-endpoint branch May 23, 2026 11:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt#68

feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt#68
hizrianraz merged 3 commits into
mainfrom
feat/ain-293-decision-endpoint

hizrianraz commented May 23, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot May 23, 2026

Uh oh!

cursor Bot May 23, 2026

Uh oh!

cursor Bot May 23, 2026

Uh oh!

linear-code Bot commented May 23, 2026

What it must do

Why (the one-router rule has a dialect problem)

Acceptance criteria

Blocks

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hizrianraz commented May 23, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary — the Mithril keystone

What ships

Streaming + tool-use posture

Privacy contract for /decision (locked in tests)

Pre-deploy verify (founder)

Post-deploy cert

Master super-prompt references

Related

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Candidate JSONB fields not mapped

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Reject receipts need inference id

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Duplicate outcome rows cause 500

Uh oh!

linear-code Bot commented May 23, 2026

What it must do

Why (the one-router rule has a dialect problem)

Acceptance criteria

Blocks

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Within-tenant inference existence oracle

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hizrianraz commented May 23, 2026 •

edited by cursor Bot

Loading

Privacy contract for `/decision` (locked in tests)