feat(api): SP-5 PR-C · /metrics + structured JSON logging with secret scrubbing (AIN-238 + AIN-249)#78
Conversation
…e migration (AIN-271b)
Inverts the AIN-244 routing-target lock per the 2026-05-23 founder
decision (Disc#12): `ainfera-inference` becomes the canonical wire
string; `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` are
demoted to silent aliases resolved at the router boundary.
Changes:
- routers/inference.py: INFERENCE_MODEL canonical; ROUTING_ALIASES
frozenset covers all 3 legacy strings; _log_alias_hit fires for each.
Back-compat module constants MITHRIL_MODEL / AUTO_MODEL now alias the
canonical so legacy imports keep working.
- routers/agent_surfaces.py: agent-card.json + llms.txt rewritten with
Ainfera Inference framing; zero dead strings on agent-discovery
surfaces.
- routers/anthropic_compat.py: docstring reframed; 501-on-stream /
422-on-tools surfaces preserved pending the streaming/tool-use lift
(separate follow-up).
- models/inference.py: InferenceRequest field descriptions (which feed
openapi.json) lead with ainfera-inference; aliases not mentioned.
- services/routing_brain.py: §16 audit "router" payload reports
canonical "ainfera-inference" regardless of alias requested.
- routing/{__init__,auto}.py: docstrings reframed.
- inference_gateway.md (renamed from MITHRIL_GATEWAY.md): contract doc
swept clean of product/wire dead strings.
Tests:
- tests/unit/test_inference_alias.py (new; supersedes deleted
test_mithril_alias.py): canonical + 3-alias parametrized coverage.
- tests/unit/test_agent_surfaces.py: asserts ainfera-inference is the
default_model + dead-string regression lock on both /.well-known/
agent-card.json and /llms.txt.
- tests/integration/test_anthropic_compat.py: happy paths use canonical
string; silent-alias test parametrized over all 3 aliases.
- tests/integration/test_routing_v0.py: canonical happy path.
- tests/integration/test_routing_backends_invariants.py: post-migration
invariant — 0 rows with aa_index_source ILIKE '%aamc%'.
Migration:
- 0027_rename_aa_index_source_aamc_to_routing_backend.py — row-rewrite
of the 5 anchor models from 'aamc_v1_lock' to 'routing_backend_v1_lock'.
Branch-verify only via this commit; prod-apply on project
dftfpwzqxoebwzepygzl is in the founder action block.
Linear gate: AIN-271 (P1-WS2 prod deploy of /v1/messages streaming +
tool-use) — this commit lands the rename half. Streaming + tool-use
land in a follow-up because the ProviderAdapter interface does not yet
carry tools/stream signatures across the 5 adapters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AIN-238 [Foundation] Cross-framework observability — one trace ID across all hops (atop L4 audit)
Pressure-test finding #12 + Ainfera OS design principle #6 (LOCKED 2026-05-22). Seven frameworks + the gateway + Letta + Modal + Spark-local models means a single request can cross 4-5 heterogeneous runtimes (e.g. Yavanna→Aulë→gateway→Modal→audit). When something breaks, tracing it across framework boundaries is an "observability black box" — the dominant 2026 multi-agent production failure. The L4 audit chain captures inference calls; it does NOT give end-to-end cross-agent traces. What it must do
Acceptance criteria
Why M1Without cross-framework tracing, debugging a 7-framework fleet on Spark is guesswork. Pairs with AIN-234 (metering rides the same context) + AIN-233 (trace propagates via handoff contracts) + AIN-226 (gateway is the injection point). AIN-249 [Routing] CONDITIONAL M_allowed wiring — brand_verdicts schema + RoutingRequest.context + explicit PASS rows
Code + schema follow-on to AIN-248. Wires the engine to consume CONDITIONAL ⛔ Item #1 — SCHEMA LOCK (Discipline #12, founder ruling required before any code)Same shape of one-shot immutable decision as F4. Recommended: separate Scope (≈150 LOC once schema lock taken)
NOT in scope
Done bar
Blocked by: AIN-248 (needs first verdicts to test against). Related: AIN-245 (engine), AIN-237 (D7). |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.
Reviewed by Cursor Bugbot for commit 284d6eb. Configure here.
| raise HTTPException( | ||
| status_code=status.HTTP_401_UNAUTHORIZED, | ||
| detail="invalid X-Ainfera-Internal-Key", | ||
| ) |
There was a problem hiding this comment.
Metrics opens DB before auth
Medium Severity
GET /metrics declares DBSession as a handler dependency while the internal-key check runs only inside the function body. FastAPI resolves dependencies before the handler runs, so missing or invalid X-Ainfera-Internal-Key requests still acquire a database session even though no query should run.
Reviewed by Cursor Bugbot for commit 284d6eb. Configure here.
…rename
Swap the legacy literal 'aamc_v1_lock' → 'routing_backend_v1_lock' in
scripts/seed_dev.py (5 anchor rows + idempotency-comment update).
The SP-1 rename migration 20260523_0027 row-rewrites
`aa_index_source = 'aamc_v1_lock' → 'routing_backend_v1_lock'`. On a
clean CI database the migrations run BEFORE seeding, so the rename
fires on an empty table; the seed script then inserts the 5 §C
anchors directly with the new literal.
Fix (a) over fix (b) per founder's two-guard authorization: re-running
an already-applied migration after seed is structurally awkward and
violates Alembic's once-per-revision contract. The rename migration
remains independently asserted by
`test_zero_rows_carry_legacy_aamc_source_tag` (integration).
Grep probe confirmed the literal is NOT shared with another test-path
expectation — only test_t9_catalog_migration.py:142 references it, and
that unit test reads the static catalog-migration tuple (frozen
historical data), not live DB state, so it's unaffected.
Unblocks:
tests/integration/test_routing_backends_invariants.py
::test_canonical_5_voters_use_v1_lock_source
::test_zero_rows_carry_legacy_aamc_source_tag
Fixture/packaging only. No engine touch, no routing_outcomes touch, no
methodology change. Disc#12 unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit. Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR. ## Adapter contract lift - `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None — back-compat preserved across all 5 adapters). - New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface. - New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`. - New `ToolsNotSupportedError` — adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation. - `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too. ## Per-adapter native streaming - AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively. - OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events. - OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug. - GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming. ## Streaming dispatch + /v1/messages - `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`). - `routers/anthropic_compat.py`: - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type. - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint. - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream. - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings). - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming. ## Tests - tests/unit/test_streaming_wire_format.py — 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag. - tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with SP-2 coverage: · stream:true → 200 + text/event-stream + ordered Anthropic frames · streaming writes §16 row on close · streaming honors silent-alias resolver (parametrized × 3) · non-empty tools passes through Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests). ## SP-2 v0 honesty caveat Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h secret scrubbing (AIN-238 + AIN-249) Adds the internal-scoped observability surface + a structured JSON log formatter that scrubs secrets before bytes leave the process. Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-5 PR-A (#76 supply-chain) and SP-5 PR-B (#77 resilience). ## AIN-238 — Prometheus /metrics surface Dependency-free registry in `services/metrics.py`. Named series (process-global; NO tenant_id / agent_id / owner_handle): - `ainfera_http_requests_total{method,path,status}` — counter - `ainfera_http_request_duration_seconds{method,path}` — histogram - `ainfera_provider_calls_total{provider,outcome}` — counter - `ainfera_router_alias_hit_total{alias}` — counter - `ainfera_audit_chain_height` + `_freshness_seconds` — gauges - `ainfera_dispatch_without_capture_total` — bridge for SP-4 PR-A - `ainfera_cost_killswitch_{engaged,spent_usd,threshold_usd}` — bridge for SP-5 PR-B - `ainfera_app_info{version}` — constant info gauge `middleware/request_metrics.py` — ASGI middleware that times every request and uses the FastAPI route TEMPLATE for the path label so agent_id etc. never leak. Defensive label-cardinality cap (200 unique paths) blocks probe-spam from blowing up the histogram set. `routers/metrics.py` — `GET /metrics` gated by `X-Ainfera-Internal-Key` (same key the signup proxy uses). Cold-path enrichment reads `max(seq)` + `max(created_at)` from audit_events (read-only — never mutates the immutable chain). Hidden from openapi so it's not advertised to public clients. ## AIN-249 carry-forward — SP-4 PR-A guard scrape series `ainfera_dispatch_without_capture_total` is registered here; SP-4 PR-A's `DispatchCaptureCounter` plugs in via a single `.inc()` call once both PRs merge. ## AIN-238 — structured JSON logging with secret scrubbing `services/structured_log.py` — `StructuredJSONFormatter` emits one JSON object per record + scrubs secrets in two layers: 1. Per-KEY scrubbing for structured `extra` fields (`api_key`, `password`, `secret`, `token`, `authorization`, `cookie`, `prompt`, `messages`, `content`). 2. Regex pass for known secret SHAPES in freeform message text (`ai_infera_*`, `sk-*`, `Bearer *`, JWT `eyJ*.*.*`). Tracebacks also flow through the scrubber. Wired in `main.py` via `logging.basicConfig(handlers=[...], force=True)` BEFORE the routers import so startup log lines are also scrubbed. ## Tests - `tests/unit/test_structured_log.py` — 10 cases (each secret format + structured extra + nested dicts + innocent passthrough + tracebacks). - `tests/unit/test_metrics_registry.py` — 13 cases (primitives, label escaping, cumulative buckets, sorted render, named-series wrappers). Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke = 529 green. ## Privacy guardrails (SP-5 §1) - NO tenant_id, agent_id, owner_handle, or any PII appears as a metrics label. - `/metrics` is internal-key gated; tenant cardinality (if ever needed) lands on a stricter-auth endpoint. - Log lines are scrubbed by both KEY and SHAPE. The `test_extra_field_with_prompt_label_redacted` test locks "prompt content is PII; never log it" into CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5a57625 to
7281e42
Compare
284d6eb to
c153ecc
Compare
1fd4e23 to
0d33b78
Compare


Summary
Internal-scoped Prometheus `/metrics` + a structured JSON log formatter that scrubs secrets before bytes leave the process.
Stacks on SP-2 api#72. Independent of SP-5 PR-A (#76) and SP-5 PR-B (#77).
AIN-238 — /metrics
AIN-238 — structured JSON logging
AIN-249 — SP-4 PR-A guard scrape series
`ainfera_dispatch_without_capture_total` registered; SP-4 PR-A's `DispatchCaptureCounter` plugs in via a 1-line `.inc()` once both PRs merge.
Privacy guardrails (SP-5 §1)
Tests
Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke = 529 green.
Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Touches global logging configuration (
logging.basicConfig(..., force=True)) and adds always-on request instrumentation, which can affect log consumers and add minor runtime overhead;/metricsalso introduces a new internal-authenticated endpoint and DB reads on scrape.Overview
Adds an internal-key–gated
GET /metricsendpoint that exposes a small dependency-free Prometheus registry, including per-request counters/latency histograms and a cold-path refresh of audit-chain gauges viamax(seq)/max(created_at)DB reads.Installs a global
StructuredJSONFormatterinmain.py(forced root handler) to emit JSON logs and scrub secrets/PII from both structuredextrafields and freeform message/traceback text, and addsRequestMetricsMiddlewareto record request counts/durations using route templates with a path-cardinality cap.Includes new unit tests covering the metrics primitives/rendering and the log-scrubbing contract.
Reviewed by Cursor Bugbot for commit c153ecc. Bugbot is set up for automated code reviews on this repo. Configure here.