Skip to content

feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt#68

Merged
hizrianraz merged 3 commits into
mainfrom
feat/ain-293-decision-endpoint
May 23, 2026
Merged

feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt#68
hizrianraz merged 3 commits into
mainfrom
feat/ain-293-decision-endpoint

Conversation

@hizrianraz
Copy link
Copy Markdown
Contributor

@hizrianraz hizrianraz commented May 23, 2026

Summary — the Mithril keystone

This PR bundles AIN-226 (the Mithril gateway shim) + AIN-293 (the prime-brokerage decision receipt). They co-touch inference.py and would conflict if separated.

AIN-226 · POST /v1/messages — the Anthropic-Messages dialect shim that closes the one-router invariant for the fleet. Until this lands, fleet agents on the Claude Agent SDK (Aulë most importantly) cannot route through Ainfera and must call api.anthropic.com directly — which the Tulkas probe in ainfera-os/.github/workflows/framework-import-smoke.yml is specifically designed to ALARM on.

AIN-293 / WS4 G6 · GET /v1/inferences/{id}/decision — the prime-brokerage verification primitive. The /routing marketing page documents this curl; without this PR it 404s. Returns the §16 routing_outcomes row that backed the inference: candidates[], decision_rule, policy_version, ruleset_hash, q_prior_used, m_allowed_set, projected vs actual cost, observed latency, outcome_status, seed.

What ships

Component File(s) Notes
/v1/messages route ainfera_api/routers/anthropic_compat.py (new, 332 lines) Translates Anthropic Messages shape → InferenceRequest; system-prompt translation; finish-reason mapping; content-block round-trip
Wire it in ainfera_api/main.py adds anthropic_compat.router to the include chain
Routing-target lock ainfera_api/routers/inference.py MITHRIL_MODEL / AUTO_MODEL / ROUTING_TARGETS constants + _is_routed resolver + _log_alias_hit
Docstring refresh models/inference.py, routing/__init__.py, routing/auto.py Mithril is now the documented default; ainfera-auto is the legacy alias
Audit-payload wire value services/routing_brain.py Two "router" lines: "ainfera-auto""ainfera-mithril" (canonical per master prompt §0 P0; request still accepts both)
/v1/inferences/{id}/decision ainfera_api/routers/inference.py Tenant-scoped, NEVER public, returns the §16 routing_outcomes row
Tests tests/unit/test_mithril_alias.py (15) + tests/unit/test_decision_endpoint.py (3) + tests/integration/test_anthropic_compat.py (515 lines) Unit + integration coverage
Cert script scripts/cert-mithril-prod.sh Post-deploy curl probes for both ainfera-mithril and the silent alias
Deliverable doc MITHRIL_GATEWAY.md AIN-226 deliverable
Contract test tests/smoke/test_openapi_contract.py Adds ("post", "/v1/messages") + ("get", "/v1/inferences/{inference_id}/decision") to EXPECTED_OPERATIONS

Streaming + tool-use posture

Both intentionally return 501/422 on /v1/messages for now (mirrors the OpenAI-shim posture). AIN-174 Phase B is the separate ticket that wires these through post_inference() — that's the WS2 keystone follow-up. This PR unblocks the non-streaming, non-tool path which is the bulk of fleet traffic.

Privacy contract for /decision (locked in tests)

  • Tenant-scoped — owning tenant only.
  • Cross-tenant probe → same 404 as "id not found" (no existence oracle).
  • Unauthenticated → 401/403, NEVER 200 with data.
  • Reject-path inferences still have a row — chosen_model_slug/cost_actual null but candidates + decision_rule + m_allowed_set present, so the caller can see WHY no model was picked.

Pre-deploy verify (founder)

  • Diff size: ~1364 lines added, 15 removed. Eyeball the wire format ("router" field is canonical "ainfera-mithril"; the ainfera-auto request string is still accepted at the router boundary).
  • Pre-commit suite passed locally: 483 tests / ruff + mypy clean.
  • CI green (lint-and-typecheck, test, integration).

Post-deploy cert

After Railway deploys main → cert script in scripts/cert-mithril-prod.sh. Smokes:

  • POST /v1/messages with model="ainfera-mithril" → 200
  • POST /v1/messages with model="ainfera-auto" → 200 (alias)
  • POST /v1/messages with model="claude-opus-4-7" → 200 (passthrough)
  • POST /v1/messages with stream=true → 501
  • POST /v1/messages with non-empty tools[] → 422
  • GET /v1/inferences/{id}/decision authed-own → 200
  • GET /v1/inferences/{id}/decision unauthed → 401/403
  • GET /openapi.json advertises ainfera-mithril

Master super-prompt references

§WS0 (deploy gate — this IS the deploy) · §WS2 (the streaming/tool-use follow-up sits on top of this) · §WS4 G6 (decision receipt) · founder-authorized via "go deliver end to end" 2026-05-23.

Related

  • MASTER_LOG.md Run-1 — the trail of why this is the keystone.
  • Ontology v1.3 (Notion 368b49507d6c814bb5fbe5b5f641eabf) — the Product entity that this PR materializes on the wire.

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because it introduces new public API surfaces and changes the customer-visible router value in audit/response payloads from ainfera-auto to canonical ainfera-mithril, which can break integrations parsing that field.

Overview
Adds an Anthropic-dialect gateway surface by introducing POST /v1/messages that translates Anthropic Messages requests (including top-level system) into the existing inference pipeline and returns Anthropic-shaped responses, with explicit 501/422 behavior for streaming and tool calls.

Locks routing nomenclature by making ainfera-mithril the canonical routed target, treating ainfera-auto as a silent alias with telemetry logging, and emitting the canonical router name in audit/log payloads.

Exposes a verification primitive via GET /v1/inferences/{inference_id}/decision to return the stored §16 routing_outcomes decision record for an inference (tenant-scoped with existence-oracle masking).

Adds unit/integration tests, OpenAPI contract updates, and a post-deploy cert script/documentation for the new endpoints and alias behavior.

Reviewed by Cursor Bugbot for commit 2bf83a9. Bugbot is set up for automated code reviews on this repo. Configure here.

Master super-prompt 2026-05-23 §WS4 G6: the /routing marketing page
documents `curl .../v1/inferences/{id}/decision` but the endpoint was
404. This makes the public copy honest.

The receipt is the prime-brokerage verification primitive: an authed
caller can fetch it and confirm Mithril picked the cheapest candidate
that cleared their quality floor. Returns the §16 routing_outcomes row
that backed the inference (candidates[], decision_rule, policy_version,
ruleset_hash, q_prior_used, m_allowed_set, cost_projected, cost_actual,
observed_latency_ms, outcome_status, seed).

Privacy contract (locked):
- Tenant-scoped — only the OWNING tenant can read. Cross-tenant probe
  returns 404 with the same body as "id not found" (no existence
  oracle).
- NEVER public — discoverable only with a bearer that matches the
  inference's owning tenant.
- Reject-path inferences still have a decision row — chosen_model_slug
  and cost_actual will be null, but candidates + decision_rule +
  m_allowed_set are present so the caller can see WHY no model was
  picked.

Tests (4 new unit tests in tests/unit/test_decision_endpoint.py):
- Route exists at the documented path (locks the marketing curl).
- Response shape carries all verification fields.
- Unauthenticated requests → 401/403, NEVER 200 with data.
- Bogus bearer → same rejection class.

Full integration coverage (real tenant + agent + inference +
routing_outcomes row, 200 with body shape) belongs in the integration
suite and follows the existing test_routing_v0.py pattern. Follow-up.

Contract test: added ("get", "/v1/inferences/{inference_id}/decision")
to EXPECTED_OPERATIONS. The /v1/messages line is left for PR #67 or
the founder's AIN-226 PR to add — pre-commit stash collapses unstaged
anthropic_compat wiring out of the working tree, so locking it here
would block this branch's pre-commit.

The unrelated AIN-226 Mithril shim WIP in ainfera_api/ + the founder's
untracked test_mithril_alias.py remain unstaged on this branch.

Co-Authored-By: Claude <noreply@anthropic.com>
task_type=outcome.task_type,
cell=outcome.cell,
q_prior_used=outcome.q_prior_used,
candidates=[DecisionCandidate(**c) for c in outcome.candidates],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Candidate JSONB fields not mapped

High Severity

The DecisionCandidate Pydantic model's field names don't align with the keys stored in the routing_outcomes.candidates JSONB. This mismatch causes Pydantic to ignore critical verification details like projected costs and rejection reasons when building the response for /v1/inferences/{inference_id}/decision, leading to their omission from the API output.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 48f6f1d. Configure here.

raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="no routing decision recorded for this inference",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reject receipts need inference id

Medium Severity

The handler requires a tenant-owned inferences row and loads routing_outcomes by inference_id, but floor/veto reject paths complete outcomes with inference_id NULL and never create an inference. Reject-path routing receipts therefore cannot be fetched at this URL despite the handler docstring and PR privacy contract describing them as available.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 48f6f1d. Configure here.

…have one in CI)

The bearer-resolution code path hits the tenants table to look up the
key hash; in CI (no Postgres on localhost:5432) the test triggered an
asyncpg connection error instead of the expected 401/403. Bearer
coverage (bogus token → 401/403, cross-tenant → 404, owning → 200)
belongs in the integration suite where the DB is wired.

The unauthenticated test stays — it covers the auth gate without
touching the DB (FastAPI rejects on the missing Header() check before
bearer resolution runs).

Co-Authored-By: Claude <noreply@anthropic.com>
await db.execute(
select(RoutingOutcomeORM).where(RoutingOutcomeORM.inference_id == inference_id)
)
).scalar_one_or_none()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate outcome rows cause 500

Medium Severity

The handler loads routing_outcomes by inference_id with scalar_one_or_none(). RoutingOutcomeORM.inference_id is indexed but not unique, so more than one linked row raises MultipleResultsFound and returns 500 instead of a controlled error.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 4c401e7. Configure here.

…eystone

The Anthropic-Messages dialect shim that closes the one-router invariant
for the fleet. Until this lands, fleet agents on the Claude Agent SDK
(Aulë most importantly) cannot route through Ainfera and must call
api.anthropic.com directly — which the Tulkas probe in
ainfera-os/.github/workflows/framework-import-smoke.yml is specifically
designed to ALARM on.

What this PR ships:

- `ainfera_api/routers/anthropic_compat.py` (new, 332 lines) —
  POST /v1/messages route that translates Anthropic Messages shape into
  the internal InferenceRequest and delegates to post_inference. System
  prompt translation (top-level `system` → synthetic system message),
  finish-reason mapping, content-block round-trip.
- `ainfera_api/main.py` — wire the new router. anthropic_compat.router
  added to the include chain.
- `ainfera_api/models/inference.py` — docstring update on the `model`
  field: `ainfera-mithril` is now the documented default; `ainfera-auto`
  is the legacy alias.
- `ainfera_api/routing/__init__.py` + `ainfera_api/routing/auto.py` —
  docstring updates: the v1.0 `auto_route()` is deprecated in favor of
  `services/routing_brain.dispatch_with_brain` (AIN-245 brain wiring).
- `ainfera_api/services/routing_brain.py` — two `"router"` audit-payload
  lines flip `"ainfera-auto"` → `"ainfera-mithril"` (canonical wire
  value per master super-prompt §0 P0 ruling; the request still accepts
  both strings — only the audit payload normalizes).
- `MITHRIL_GATEWAY.md` (new, 207 lines) — deliverable doc.
- `scripts/cert-mithril-prod.sh` (new, executable) — post-deploy cert
  script that probes /v1/messages with both `ainfera-mithril` and the
  silent alias.
- `tests/integration/test_anthropic_compat.py` (new, 515 lines) — full
  end-to-end coverage (happy path, alias parity, system translation,
  streaming 501, tools 422, vendor passthrough).
- `tests/unit/test_mithril_alias.py` (new, 106 lines) — pure-function
  coverage for the routing-target resolver + the Anthropic stop-reason
  mapping inversion.

Also bundled (because they share inference.py and would otherwise be
co-dependent):

- AIN-293 / WS4 G6 — `GET /v1/inferences/{id}/decision` (the
  prime-brokerage receipt). Documented as a separate ticket; lives
  here because PR #68's inference.py edits would conflict with the
  AIN-226 work on the same file otherwise. Tests at
  `tests/unit/test_decision_endpoint.py`.

Contract test: added ("post", "/v1/messages") AND
("get", "/v1/inferences/{inference_id}/decision") to EXPECTED_OPERATIONS.

Streaming + tool-use surfaces are intentionally 501/422 on /v1/messages
for now (AIN-174 Phase B — separate ticket, the WS2 keystone follow-up).
This unblocks the non-streaming, non-tool path which is the bulk of
fleet traffic.

Master super-prompt 2026-05-23 §WS0 (deploy gate) + §WS2 (keystone) +
§WS4 G6 (decision receipt). Founder-authorized via "go deliver end to
end" 2026-05-23.

Co-Authored-By: Claude <noreply@anthropic.com>
@hizrianraz hizrianraz changed the title feat(api): AIN-293 / WS4 G6 · GET /v1/inferences/{id}/decision (the prime-brokerage receipt) feat(api): AIN-226 + AIN-293 keystone · Mithril gateway shim (/v1/messages) + decision receipt May 23, 2026
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 23, 2026

AIN-226 [Foundation] Ainfera dual-dialect gateway — serve OpenAI + Anthropic Messages (one-router for all agents)

The Foundation layer every agent inherits — derived from Aulë's design (AIN-225), which is the gateway's hardest consumer.

What it must do

The Ainfera Routing gateway must serve TWO dialects so every fleet agent routes through it:

  1. OpenAI chat-completions (/v1/...) — Varda (OpenClaw/NemoClaw), Yavanna (LangGraph), Námo (Letta). Already the primary path.
  2. Anthropic Messages (/v1/messages + /v1/models) — Aulë (Claude Agent SDK). The NEW requirement.

Why (the one-router rule has a dialect problem)

Aulë's Claude Agent SDK speaks Anthropic Messages, not OpenAI. "All inference through Ainfera" is impossible unless Ainfera also speaks Anthropic. Confirmed via Anthropic's own LLM-gateway docs: SDK honors ANTHROPIC_BASE_URL; gateway must expose /v1/messages (+ optional /v1/models for the model picker).

Acceptance criteria

  • OpenAI-compatible endpoint (existing) — verified for Varda/Yavanna/Námo
  • Anthropic Messages endpoint /v1/messages — POST same body shape as Anthropic, SSE streaming piped not buffered, tool-use round-trip tested
  • /v1/models endpoint (Anthropic-format) — returns claude-prefixed model IDs for the Claude Code picker
  • Model catalog includes claude-opus-4-7, claude-sonnet-4-6 by exact name (no auto-translation)
  • Single-upstream contract: each framework pins ONE Ainfera provider; in-framework fallback disabled; SPOF fallback = Ainfera + explicit model (never direct provider)
  • Audit + settlement fire identically regardless of dialect (route, settle, audit on both)
  • Bootstrap option: LiteLLM shim hash-pinned (NOT 1.82.7/1.82.8 — malware), Tulkas-scanned
  • Product option: Ainfera-native /v1/messages (no third-party shim)
  • Tulkas probe: 100% of all-agent inference in audit chain; any direct-provider call (api.anthropic.com, api.openai.com, etc.) = ALARM

Blocks

Aulë (AIN-225) cannot route through Ainfera until /v1/messages is live. This is the critical-path prerequisite for the whole fleet's one-router invariant.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

Reviewed by Cursor Bugbot for commit 2bf83a9. Configure here.

raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="no routing decision recorded for this inference",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within-tenant inference existence oracle

Low Severity

For the owning tenant, a missing inference returns "inference not found", while a valid inference without a §16 row returns "no routing decision recorded for this inference". That distinct 404 detail lets an authenticated caller probe UUIDs and learn which ids correspond to real inferences (e.g. vendor passthrough vs routed).

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 2bf83a9. Configure here.

@hizrianraz hizrianraz merged commit cbb7292 into main May 23, 2026
4 checks passed
@hizrianraz hizrianraz deleted the feat/ain-293-decision-endpoint branch May 23, 2026 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant