feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt#68
Conversation
Master super-prompt 2026-05-23 §WS4 G6: the /routing marketing page
documents `curl .../v1/inferences/{id}/decision` but the endpoint was
404. This makes the public copy honest.
The receipt is the prime-brokerage verification primitive: an authed
caller can fetch it and confirm Mithril picked the cheapest candidate
that cleared their quality floor. Returns the §16 routing_outcomes row
that backed the inference (candidates[], decision_rule, policy_version,
ruleset_hash, q_prior_used, m_allowed_set, cost_projected, cost_actual,
observed_latency_ms, outcome_status, seed).
Privacy contract (locked):
- Tenant-scoped — only the OWNING tenant can read. Cross-tenant probe
returns 404 with the same body as "id not found" (no existence
oracle).
- NEVER public — discoverable only with a bearer that matches the
inference's owning tenant.
- Reject-path inferences still have a decision row — chosen_model_slug
and cost_actual will be null, but candidates + decision_rule +
m_allowed_set are present so the caller can see WHY no model was
picked.
Tests (4 new unit tests in tests/unit/test_decision_endpoint.py):
- Route exists at the documented path (locks the marketing curl).
- Response shape carries all verification fields.
- Unauthenticated requests → 401/403, NEVER 200 with data.
- Bogus bearer → same rejection class.
Full integration coverage (real tenant + agent + inference +
routing_outcomes row, 200 with body shape) belongs in the integration
suite and follows the existing test_routing_v0.py pattern. Follow-up.
Contract test: added ("get", "/v1/inferences/{inference_id}/decision")
to EXPECTED_OPERATIONS. The /v1/messages line is left for PR #67 or
the founder's AIN-226 PR to add — pre-commit stash collapses unstaged
anthropic_compat wiring out of the working tree, so locking it here
would block this branch's pre-commit.
The unrelated AIN-226 Mithril shim WIP in ainfera_api/ + the founder's
untracked test_mithril_alias.py remain unstaged on this branch.
Co-Authored-By: Claude <noreply@anthropic.com>
| task_type=outcome.task_type, | ||
| cell=outcome.cell, | ||
| q_prior_used=outcome.q_prior_used, | ||
| candidates=[DecisionCandidate(**c) for c in outcome.candidates], |
There was a problem hiding this comment.
Candidate JSONB fields not mapped
High Severity
The DecisionCandidate Pydantic model's field names don't align with the keys stored in the routing_outcomes.candidates JSONB. This mismatch causes Pydantic to ignore critical verification details like projected costs and rejection reasons when building the response for /v1/inferences/{inference_id}/decision, leading to their omission from the API output.
Reviewed by Cursor Bugbot for commit 48f6f1d. Configure here.
| raise HTTPException( | ||
| status_code=status.HTTP_404_NOT_FOUND, | ||
| detail="no routing decision recorded for this inference", | ||
| ) |
There was a problem hiding this comment.
Reject receipts need inference id
Medium Severity
The handler requires a tenant-owned inferences row and loads routing_outcomes by inference_id, but floor/veto reject paths complete outcomes with inference_id NULL and never create an inference. Reject-path routing receipts therefore cannot be fetched at this URL despite the handler docstring and PR privacy contract describing them as available.
Reviewed by Cursor Bugbot for commit 48f6f1d. Configure here.
…have one in CI) The bearer-resolution code path hits the tenants table to look up the key hash; in CI (no Postgres on localhost:5432) the test triggered an asyncpg connection error instead of the expected 401/403. Bearer coverage (bogus token → 401/403, cross-tenant → 404, owning → 200) belongs in the integration suite where the DB is wired. The unauthenticated test stays — it covers the auth gate without touching the DB (FastAPI rejects on the missing Header() check before bearer resolution runs). Co-Authored-By: Claude <noreply@anthropic.com>
| await db.execute( | ||
| select(RoutingOutcomeORM).where(RoutingOutcomeORM.inference_id == inference_id) | ||
| ) | ||
| ).scalar_one_or_none() |
There was a problem hiding this comment.
Duplicate outcome rows cause 500
Medium Severity
The handler loads routing_outcomes by inference_id with scalar_one_or_none(). RoutingOutcomeORM.inference_id is indexed but not unique, so more than one linked row raises MultipleResultsFound and returns 500 instead of a controlled error.
Reviewed by Cursor Bugbot for commit 4c401e7. Configure here.
…eystone
The Anthropic-Messages dialect shim that closes the one-router invariant
for the fleet. Until this lands, fleet agents on the Claude Agent SDK
(Aulë most importantly) cannot route through Ainfera and must call
api.anthropic.com directly — which the Tulkas probe in
ainfera-os/.github/workflows/framework-import-smoke.yml is specifically
designed to ALARM on.
What this PR ships:
- `ainfera_api/routers/anthropic_compat.py` (new, 332 lines) —
POST /v1/messages route that translates Anthropic Messages shape into
the internal InferenceRequest and delegates to post_inference. System
prompt translation (top-level `system` → synthetic system message),
finish-reason mapping, content-block round-trip.
- `ainfera_api/main.py` — wire the new router. anthropic_compat.router
added to the include chain.
- `ainfera_api/models/inference.py` — docstring update on the `model`
field: `ainfera-mithril` is now the documented default; `ainfera-auto`
is the legacy alias.
- `ainfera_api/routing/__init__.py` + `ainfera_api/routing/auto.py` —
docstring updates: the v1.0 `auto_route()` is deprecated in favor of
`services/routing_brain.dispatch_with_brain` (AIN-245 brain wiring).
- `ainfera_api/services/routing_brain.py` — two `"router"` audit-payload
lines flip `"ainfera-auto"` → `"ainfera-mithril"` (canonical wire
value per master super-prompt §0 P0 ruling; the request still accepts
both strings — only the audit payload normalizes).
- `MITHRIL_GATEWAY.md` (new, 207 lines) — deliverable doc.
- `scripts/cert-mithril-prod.sh` (new, executable) — post-deploy cert
script that probes /v1/messages with both `ainfera-mithril` and the
silent alias.
- `tests/integration/test_anthropic_compat.py` (new, 515 lines) — full
end-to-end coverage (happy path, alias parity, system translation,
streaming 501, tools 422, vendor passthrough).
- `tests/unit/test_mithril_alias.py` (new, 106 lines) — pure-function
coverage for the routing-target resolver + the Anthropic stop-reason
mapping inversion.
Also bundled (because they share inference.py and would otherwise be
co-dependent):
- AIN-293 / WS4 G6 — `GET /v1/inferences/{id}/decision` (the
prime-brokerage receipt). Documented as a separate ticket; lives
here because PR #68's inference.py edits would conflict with the
AIN-226 work on the same file otherwise. Tests at
`tests/unit/test_decision_endpoint.py`.
Contract test: added ("post", "/v1/messages") AND
("get", "/v1/inferences/{inference_id}/decision") to EXPECTED_OPERATIONS.
Streaming + tool-use surfaces are intentionally 501/422 on /v1/messages
for now (AIN-174 Phase B — separate ticket, the WS2 keystone follow-up).
This unblocks the non-streaming, non-tool path which is the bulk of
fleet traffic.
Master super-prompt 2026-05-23 §WS0 (deploy gate) + §WS2 (keystone) +
§WS4 G6 (decision receipt). Founder-authorized via "go deliver end to
end" 2026-05-23.
Co-Authored-By: Claude <noreply@anthropic.com>
AIN-226 [Foundation] Ainfera dual-dialect gateway — serve OpenAI + Anthropic Messages (one-router for all agents)
The Foundation layer every agent inherits — derived from Aulë's design (AIN-225), which is the gateway's hardest consumer. What it must doThe Ainfera Routing gateway must serve TWO dialects so every fleet agent routes through it:
Why (the one-router rule has a dialect problem)Aulë's Claude Agent SDK speaks Anthropic Messages, not OpenAI. "All inference through Ainfera" is impossible unless Ainfera also speaks Anthropic. Confirmed via Anthropic's own LLM-gateway docs: SDK honors ANTHROPIC_BASE_URL; gateway must expose /v1/messages (+ optional /v1/models for the model picker). Acceptance criteria
BlocksAulë (AIN-225) cannot route through Ainfera until /v1/messages is live. This is the critical-path prerequisite for the whole fleet's one-router invariant. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.
Reviewed by Cursor Bugbot for commit 2bf83a9. Configure here.
| raise HTTPException( | ||
| status_code=status.HTTP_404_NOT_FOUND, | ||
| detail="no routing decision recorded for this inference", | ||
| ) |
There was a problem hiding this comment.
Within-tenant inference existence oracle
Low Severity
For the owning tenant, a missing inference returns "inference not found", while a valid inference without a §16 row returns "no routing decision recorded for this inference". That distinct 404 detail lets an authenticated caller probe UUIDs and learn which ids correspond to real inferences (e.g. vendor passthrough vs routed).
Reviewed by Cursor Bugbot for commit 2bf83a9. Configure here.


Summary — the Mithril keystone
This PR bundles AIN-226 (the Mithril gateway shim) + AIN-293 (the prime-brokerage decision receipt). They co-touch
inference.pyand would conflict if separated.AIN-226 · POST /v1/messages — the Anthropic-Messages dialect shim that closes the one-router invariant for the fleet. Until this lands, fleet agents on the Claude Agent SDK (Aulë most importantly) cannot route through Ainfera and must call
api.anthropic.comdirectly — which the Tulkas probe inainfera-os/.github/workflows/framework-import-smoke.ymlis specifically designed to ALARM on.AIN-293 / WS4 G6 · GET /v1/inferences/{id}/decision — the prime-brokerage verification primitive. The
/routingmarketing page documents this curl; without this PR it 404s. Returns the §16routing_outcomesrow that backed the inference:candidates[],decision_rule,policy_version,ruleset_hash,q_prior_used,m_allowed_set, projected vs actual cost, observed latency, outcome_status, seed.What ships
/v1/messagesrouteainfera_api/routers/anthropic_compat.py(new, 332 lines)ainfera_api/main.pyanthropic_compat.routerto the include chainainfera_api/routers/inference.pyMITHRIL_MODEL/AUTO_MODEL/ROUTING_TARGETSconstants +_is_routedresolver +_log_alias_hitmodels/inference.py,routing/__init__.py,routing/auto.pyainfera-autois the legacy aliasservices/routing_brain.py"router"lines:"ainfera-auto"→"ainfera-mithril"(canonical per master prompt §0 P0; request still accepts both)/v1/inferences/{id}/decisionainfera_api/routers/inference.pytests/unit/test_mithril_alias.py(15) +tests/unit/test_decision_endpoint.py(3) +tests/integration/test_anthropic_compat.py(515 lines)scripts/cert-mithril-prod.shainfera-mithriland the silent aliasMITHRIL_GATEWAY.mdtests/smoke/test_openapi_contract.py("post", "/v1/messages")+("get", "/v1/inferences/{inference_id}/decision")toEXPECTED_OPERATIONSStreaming + tool-use posture
Both intentionally return 501/422 on
/v1/messagesfor now (mirrors the OpenAI-shim posture). AIN-174 Phase B is the separate ticket that wires these throughpost_inference()— that's the WS2 keystone follow-up. This PR unblocks the non-streaming, non-tool path which is the bulk of fleet traffic.Privacy contract for
/decision(locked in tests)chosen_model_slug/cost_actualnull butcandidates+decision_rule+m_allowed_setpresent, so the caller can see WHY no model was picked.Pre-deploy verify (founder)
"router"field is canonical "ainfera-mithril"; theainfera-autorequest string is still accepted at the router boundary).Post-deploy cert
After Railway deploys main → cert script in
scripts/cert-mithril-prod.sh. Smokes:POST /v1/messageswithmodel="ainfera-mithril"→ 200POST /v1/messageswithmodel="ainfera-auto"→ 200 (alias)POST /v1/messageswithmodel="claude-opus-4-7"→ 200 (passthrough)POST /v1/messageswithstream=true→ 501POST /v1/messageswith non-emptytools[]→ 422GET /v1/inferences/{id}/decisionauthed-own → 200GET /v1/inferences/{id}/decisionunauthed → 401/403GET /openapi.jsonadvertisesainfera-mithrilMaster super-prompt references
§WS0 (deploy gate — this IS the deploy) · §WS2 (the streaming/tool-use follow-up sits on top of this) · §WS4 G6 (decision receipt) · founder-authorized via "go deliver end to end" 2026-05-23.
Related
MASTER_LOG.mdRun-1 — the trail of why this is the keystone.368b49507d6c814bb5fbe5b5f641eabf) — the Product entity that this PR materializes on the wire.🤖 Generated with Claude Code
Note
Medium Risk
Medium risk because it introduces new public API surfaces and changes the customer-visible
routervalue in audit/response payloads fromainfera-autoto canonicalainfera-mithril, which can break integrations parsing that field.Overview
Adds an Anthropic-dialect gateway surface by introducing
POST /v1/messagesthat translates Anthropic Messages requests (including top-levelsystem) into the existing inference pipeline and returns Anthropic-shaped responses, with explicit 501/422 behavior for streaming and tool calls.Locks routing nomenclature by making
ainfera-mithrilthe canonical routed target, treatingainfera-autoas a silent alias with telemetry logging, and emitting the canonical router name in audit/log payloads.Exposes a verification primitive via
GET /v1/inferences/{inference_id}/decisionto return the stored §16routing_outcomesdecision record for an inference (tenant-scoped with existence-oracle masking).Adds unit/integration tests, OpenAPI contract updates, and a post-deploy cert script/documentation for the new endpoints and alias behavior.
Reviewed by Cursor Bugbot for commit 2bf83a9. Bugbot is set up for automated code reviews on this repo. Configure here.