feat(api): SP-5 PR-C · /metrics + structured JSON logging with secret scrubbing (AIN-238 + AIN-249) by hizrianraz · Pull Request #78 · ainfera-ai/api

hizrianraz · 2026-05-23T17:55:28Z

Summary

Internal-scoped Prometheus `/metrics` + a structured JSON log formatter that scrubs secrets before bytes leave the process.

Stacks on SP-2 api#72. Independent of SP-5 PR-A (#76) and SP-5 PR-B (#77).

AIN-238 — /metrics

Dependency-free registry (`services/metrics.py`); we don't pull `prometheus_client` because every runtime dep is an AIN-237 attack surface
Named series: `ainfera_http_requests_total`, `ainfera_http_request_duration_seconds`, `ainfera_provider_calls_total`, `ainfera_router_alias_hit_total`, `ainfera_audit_chain_height` + `freshness_seconds`, `ainfera_dispatch_without_capture_total` (SP-4 PR-A bridge), `ainfera_cost_killswitch*` (SP-5 PR-B bridge), `ainfera_app_info{version}`
Middleware times every request; uses FastAPI route template for the path label so agent_id never leaks
`GET /metrics` gated by `X-Ainfera-Internal-Key`; hidden from openapi
Cold-path reads `max(seq)` + `max(created_at)` from audit_events — read-only, never mutates the chain

AIN-238 — structured JSON logging

`StructuredJSONFormatter` — one JSON object per record + 2-layer secret scrubbing (per-key for extra fields; regex for freeform text shapes `ai_infera_` / `sk-` / `Bearer` / JWT)
Tracebacks also scrubbed
Wired via `logging.basicConfig(force=True)` BEFORE routers import so startup lines flow through

AIN-249 — SP-4 PR-A guard scrape series

`ainfera_dispatch_without_capture_total` registered; SP-4 PR-A's `DispatchCaptureCounter` plugs in via a 1-line `.inc()` once both PRs merge.

Privacy guardrails (SP-5 §1)

NO tenant_id / agent_id / owner_handle on any metrics label
/metrics is internal-key gated, not public
Log lines scrubbed by KEY (api_key/password/prompt/messages/content/etc.) AND SHAPE (regex)

Tests

`test_structured_log.py` — 10 cases (Ainfera key, OpenAI/Anthropic, Bearer, JWT, structured extra, nested dicts, innocent passthrough, exception tracebacks)
`test_metrics_registry.py` — 13 cases (primitives, label escaping, cumulative buckets, sorted render, named-series wrappers)

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke = 529 green.

Test plan

CI green
Branch / preview: `curl -H "X-Ainfera-Internal-Key: $KEY" /metrics` → 200 with the named series
Without the header: `curl /metrics` → 401
A few real requests then re-scrape — `ainfera_http_requests_total` increments for the seen routes; histogram populates bucket counts
Log line in Railway shows JSON shape `{ts,level,logger,msg,...}` and no secrets appear

🤖 Generated with Claude Code

Note

Medium Risk
Touches global logging configuration (logging.basicConfig(..., force=True)) and adds always-on request instrumentation, which can affect log consumers and add minor runtime overhead; /metrics also introduces a new internal-authenticated endpoint and DB reads on scrape.

Overview
Adds an internal-key–gated GET /metrics endpoint that exposes a small dependency-free Prometheus registry, including per-request counters/latency histograms and a cold-path refresh of audit-chain gauges via max(seq)/max(created_at) DB reads.

Installs a global StructuredJSONFormatter in main.py (forced root handler) to emit JSON logs and scrub secrets/PII from both structured extra fields and freeform message/traceback text, and adds RequestMetricsMiddleware to record request counts/durations using route templates with a path-cardinality cap.

Includes new unit tests covering the metrics primitives/rendering and the log-scrubbing contract.

^{Reviewed by Cursor Bugbot for commit c153ecc. Bugbot is set up for automated code reviews on this repo. Configure here.}

…e migration (AIN-271b) Inverts the AIN-244 routing-target lock per the 2026-05-23 founder decision (Disc#12): `ainfera-inference` becomes the canonical wire string; `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` are demoted to silent aliases resolved at the router boundary. Changes: - routers/inference.py: INFERENCE_MODEL canonical; ROUTING_ALIASES frozenset covers all 3 legacy strings; _log_alias_hit fires for each. Back-compat module constants MITHRIL_MODEL / AUTO_MODEL now alias the canonical so legacy imports keep working. - routers/agent_surfaces.py: agent-card.json + llms.txt rewritten with Ainfera Inference framing; zero dead strings on agent-discovery surfaces. - routers/anthropic_compat.py: docstring reframed; 501-on-stream / 422-on-tools surfaces preserved pending the streaming/tool-use lift (separate follow-up). - models/inference.py: InferenceRequest field descriptions (which feed openapi.json) lead with ainfera-inference; aliases not mentioned. - services/routing_brain.py: §16 audit "router" payload reports canonical "ainfera-inference" regardless of alias requested. - routing/{__init__,auto}.py: docstrings reframed. - inference_gateway.md (renamed from MITHRIL_GATEWAY.md): contract doc swept clean of product/wire dead strings. Tests: - tests/unit/test_inference_alias.py (new; supersedes deleted test_mithril_alias.py): canonical + 3-alias parametrized coverage. - tests/unit/test_agent_surfaces.py: asserts ainfera-inference is the default_model + dead-string regression lock on both /.well-known/ agent-card.json and /llms.txt. - tests/integration/test_anthropic_compat.py: happy paths use canonical string; silent-alias test parametrized over all 3 aliases. - tests/integration/test_routing_v0.py: canonical happy path. - tests/integration/test_routing_backends_invariants.py: post-migration invariant — 0 rows with aa_index_source ILIKE '%aamc%'. Migration: - 0027_rename_aa_index_source_aamc_to_routing_backend.py — row-rewrite of the 5 anchor models from 'aamc_v1_lock' to 'routing_backend_v1_lock'. Branch-verify only via this commit; prod-apply on project dftfpwzqxoebwzepygzl is in the founder action block. Linear gate: AIN-271 (P1-WS2 prod deploy of /v1/messages streaming + tool-use) — this commit lands the rename half. Streaming + tool-use land in a follow-up because the ProviderAdapter interface does not yet carry tools/stream signatures across the 5 adapters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear-code · 2026-05-23T17:55:31Z

AIN-238 [Foundation] Cross-framework observability — one trace ID across all hops (atop L4 audit)

Pressure-test finding #12 + Ainfera OS design principle #6 (LOCKED 2026-05-22). Seven frameworks + the gateway + Letta + Modal + Spark-local models means a single request can cross 4-5 heterogeneous runtimes (e.g. Yavanna→Aulë→gateway→Modal→audit). When something breaks, tracing it across framework boundaries is an "observability black box" — the dominant 2026 multi-agent production failure. The L4 audit chain captures inference calls; it does NOT give end-to-end cross-agent traces.

What it must do

One trace ID per logical request, propagated across every hop — agent → gateway → backend → audit — regardless of framework. OpenTelemetry-style context propagation, injected at the gateway and carried through handoffs (ties to AIN-233 handoff contracts).
Per-hop spans — each agent, each framework call, each backend dispatch emits a span with timing + status, parented to the trace.
Correlate with L4 audit — the audit chain's inference records link to the trace ID so a financial/audit event maps to the full request path.
Failure localization — when a request fails or stalls, the trace shows exactly which hop (which agent, which framework, which backend) — no more guessing across runtimes.
Feeds AIN-234 — per-agent cost/latency metering rides the same trace context.

Acceptance criteria

Trace ID generated at gateway ingress, propagated through all agent + framework + backend hops
Each hop emits a parented span (timing + status); visible in one trace view
Trace ID correlated into L4 audit chain records
Induced failure at any hop is localizable from the trace alone
Works across all 7 frameworks (heterogeneous runtimes) + Modal + Spark-local

Why M1

Without cross-framework tracing, debugging a 7-framework fleet on Spark is guesswork. Pairs with AIN-234 (metering rides the same context) + AIN-233 (trace propagates via handoff contracts) + AIN-226 (gateway is the injection point).

AIN-249 [Routing] CONDITIONAL M_allowed wiring — brand_verdicts schema + RoutingRequest.context + explicit PASS rows

Code + schema follow-on to AIN-248. Wires the engine to consume CONDITIONAL M_allowed verdicts end-to-end. PASS/VETO/None already work today (Candidate.m_allowed 3-state); this ticket adds the context-aware path.

⛔ Item #1 — SCHEMA LOCK (Discipline #12, founder ruling required before any code)

Same shape of one-shot immutable decision as F4. Recommended: separate brand_verdicts table (not brands.verdict cols).
Rationale: (1) a verdict is an audit record (reviewed_by/approved_by/approved_at), not a brand attribute; (2) CONDITIONAL is context-dependent → a brand may need multiple/superseded verdict rows (per-jurisdiction, over time) — columns can't hold history cleanly; (3) keeps catalog clean, matches append-don't-mutate ethos. Brain reads a current-verdict view; table keeps the trail.
→ Do not start items 1–3 until founder rules table-vs-columns.

Scope (≈150 LOC once schema lock taken)

Verdict schema (Alembic). brand_verdicts: {brand_id, verdict: pass|conditional|veto, condition: jsonb, hosting_endpoint: text, dimensions: jsonb, counsel_required: bool, reviewed_by, approved_by, approved_at}. RLS posture matching the catalog family. Engine reads only resolved verdict + condition; dimensions/counsel_required/reviewed_by/approved_by are audit metadata the brain does not read.
Request-context model. RoutingRequest (ainfera_routing.types) gains context: dict (e.g. tenant_jurisdiction, hosted_region). Enrolment gate evaluates the brand's condition against context → resolves m_allowed. Additive — callers that pass no context default to the strictest interpretation (CONDITIONAL without satisfying context → deny).
Explicit PASS rows for the 5 active brands. Replace today's implicit "pass via brand.active=true" (api/ainfera_api/services/routing_brain.py:174) with first-class verdict rows so the audit trail shows why allowed, not just that it is. approved_by: founder from launch.

NOT in scope

Ulmo intake + founder-signed verdicts themselves = AIN-248 (human/counsel-facing, zero code).
D7 SBOM gate (AIN-237), EU Annex IV bundle = operational gates before a verdict lands, not engine inputs.

Done bar

Schema lock ruled + migration applied (Alembic, dev-branch tested)
context field live; CONDITIONAL resolves correctly for a test brand (satisfied context → candidate; unsatisfied → M_ALLOWED_VETO)
5 active brands have explicit PASS verdict rows; routing unchanged for them (regression-clean vs current)
routing_outcomes.m_allowed_set captures the resolved boolean + condition outcome
No-context request still routes (strictest-default proven)

Blocked by: AIN-248 (needs first verdicts to test against). Related: AIN-245 (engine), AIN-237 (D7).

Review in Linear

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.}

^{Reviewed by Cursor Bugbot for commit 284d6eb. Configure here.}

cursor · 2026-05-23T17:56:33Z

+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="invalid X-Ainfera-Internal-Key",
+        )


Metrics opens DB before auth

Medium Severity

GET /metrics declares DBSession as a handler dependency while the internal-key check runs only inside the function body. FastAPI resolves dependencies before the handler runs, so missing or invalid X-Ainfera-Internal-Key requests still acquire a database session even though no query should run.

^{Reviewed by Cursor Bugbot for commit 284d6eb. Configure here.}

…rename Swap the legacy literal 'aamc_v1_lock' → 'routing_backend_v1_lock' in scripts/seed_dev.py (5 anchor rows + idempotency-comment update). The SP-1 rename migration 20260523_0027 row-rewrites `aa_index_source = 'aamc_v1_lock' → 'routing_backend_v1_lock'`. On a clean CI database the migrations run BEFORE seeding, so the rename fires on an empty table; the seed script then inserts the 5 §C anchors directly with the new literal. Fix (a) over fix (b) per founder's two-guard authorization: re-running an already-applied migration after seed is structurally awkward and violates Alembic's once-per-revision contract. The rename migration remains independently asserted by `test_zero_rows_carry_legacy_aamc_source_tag` (integration). Grep probe confirmed the literal is NOT shared with another test-path expectation — only test_t9_catalog_migration.py:142 references it, and that unit test reads the static catalog-migration tuple (frozen historical data), not live DB state, so it's unaffected. Unblocks: tests/integration/test_routing_backends_invariants.py ::test_canonical_5_voters_use_v1_lock_source ::test_zero_rows_carry_legacy_aamc_source_tag Fixture/packaging only. No engine touch, no routing_outcomes touch, no methodology change. Disc#12 unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit. Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR. ## Adapter contract lift - `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None — back-compat preserved across all 5 adapters). - New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface. - New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`. - New `ToolsNotSupportedError` — adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation. - `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too. ## Per-adapter native streaming - AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively. - OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events. - OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug. - GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming. ## Streaming dispatch + /v1/messages - `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`). - `routers/anthropic_compat.py`: - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type. - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint. - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream. - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings). - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming. ## Tests - tests/unit/test_streaming_wire_format.py — 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag. - tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with SP-2 coverage: · stream:true → 200 + text/event-stream + ordered Anthropic frames · streaming writes §16 row on close · streaming honors silent-alias resolver (parametrized × 3) · non-empty tools passes through Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests). ## SP-2 v0 honesty caveat Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…h secret scrubbing (AIN-238 + AIN-249) Adds the internal-scoped observability surface + a structured JSON log formatter that scrubs secrets before bytes leave the process. Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-5 PR-A (#76 supply-chain) and SP-5 PR-B (#77 resilience). ## AIN-238 — Prometheus /metrics surface Dependency-free registry in `services/metrics.py`. Named series (process-global; NO tenant_id / agent_id / owner_handle): - `ainfera_http_requests_total{method,path,status}` — counter - `ainfera_http_request_duration_seconds{method,path}` — histogram - `ainfera_provider_calls_total{provider,outcome}` — counter - `ainfera_router_alias_hit_total{alias}` — counter - `ainfera_audit_chain_height` + `_freshness_seconds` — gauges - `ainfera_dispatch_without_capture_total` — bridge for SP-4 PR-A - `ainfera_cost_killswitch_{engaged,spent_usd,threshold_usd}` — bridge for SP-5 PR-B - `ainfera_app_info{version}` — constant info gauge `middleware/request_metrics.py` — ASGI middleware that times every request and uses the FastAPI route TEMPLATE for the path label so agent_id etc. never leak. Defensive label-cardinality cap (200 unique paths) blocks probe-spam from blowing up the histogram set. `routers/metrics.py` — `GET /metrics` gated by `X-Ainfera-Internal-Key` (same key the signup proxy uses). Cold-path enrichment reads `max(seq)` + `max(created_at)` from audit_events (read-only — never mutates the immutable chain). Hidden from openapi so it's not advertised to public clients. ## AIN-249 carry-forward — SP-4 PR-A guard scrape series `ainfera_dispatch_without_capture_total` is registered here; SP-4 PR-A's `DispatchCaptureCounter` plugs in via a single `.inc()` call once both PRs merge. ## AIN-238 — structured JSON logging with secret scrubbing `services/structured_log.py` — `StructuredJSONFormatter` emits one JSON object per record + scrubs secrets in two layers: 1. Per-KEY scrubbing for structured `extra` fields (`api_key`, `password`, `secret`, `token`, `authorization`, `cookie`, `prompt`, `messages`, `content`). 2. Regex pass for known secret SHAPES in freeform message text (`ai_infera_*`, `sk-*`, `Bearer *`, JWT `eyJ*.*.*`). Tracebacks also flow through the scrubber. Wired in `main.py` via `logging.basicConfig(handlers=[...], force=True)` BEFORE the routers import so startup log lines are also scrubbed. ## Tests - `tests/unit/test_structured_log.py` — 10 cases (each secret format + structured extra + nested dicts + innocent passthrough + tracebacks). - `tests/unit/test_metrics_registry.py` — 13 cases (primitives, label escaping, cumulative buckets, sorted render, named-series wrappers). Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke = 529 green. ## Privacy guardrails (SP-5 §1) - NO tenant_id, agent_id, owner_handle, or any PII appears as a metrics label. - `/metrics` is internal-key gated; tenant cardinality (if ever needed) lands on a stricter-auth endpoint. - Log lines are scrubbed by both KEY and SHAPE. The `test_extra_field_with_prompt_label_redacted` test locks "prompt content is PII; never log it" into CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor Bot reviewed May 23, 2026

View reviewed changes

hizrianraz and others added 3 commits May 24, 2026 05:59

hizrianraz force-pushed the feat/ain271-streaming-tooluse branch from 5a57625 to 7281e42 Compare May 23, 2026 23:00

hizrianraz force-pushed the feat/observability branch from 284d6eb to c153ecc Compare May 23, 2026 23:00

hizrianraz mentioned this pull request May 24, 2026

fix(api): SP-Ω recovery · land #72+#73+#74 (stack-merge orphan rescue) #80

Merged

hizrianraz force-pushed the feat/ain271-streaming-tooluse branch from 1fd4e23 to 0d33b78 Compare May 24, 2026 05:12

hizrianraz deleted the branch feat/ain271-streaming-tooluse May 24, 2026 05:14

hizrianraz closed this May 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(api): SP-5 PR-C · /metrics + structured JSON logging with secret scrubbing (AIN-238 + AIN-249)#78

feat(api): SP-5 PR-C · /metrics + structured JSON logging with secret scrubbing (AIN-238 + AIN-249)#78
hizrianraz wants to merge 4 commits into
feat/ain271-streaming-toolusefrom
feat/observability

hizrianraz commented May 23, 2026 •

edited by cursor Bot

Loading

Uh oh!

linear-code Bot commented May 23, 2026 •

edited

Loading

What it must do

Acceptance criteria

Why M1

⛔ Item #1 — SCHEMA LOCK (Discipline #12, founder ruling required before any code)

Scope (≈150 LOC once schema lock taken)

NOT in scope

Done bar

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hizrianraz commented May 23, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

AIN-238 — /metrics

AIN-238 — structured JSON logging

AIN-249 — SP-4 PR-A guard scrape series

Privacy guardrails (SP-5 §1)

Tests

Test plan

Uh oh!

linear-code Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What it must do

Acceptance criteria

Why M1

⛔ Item #1 — SCHEMA LOCK (Discipline #12, founder ruling required before any code)

Scope (≈150 LOC once schema lock taken)

NOT in scope

Done bar

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 23, 2026

Choose a reason for hiding this comment

Metrics opens DB before auth

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hizrianraz commented May 23, 2026 •

edited by cursor Bot

Loading

linear-code Bot commented May 23, 2026 •

edited

Loading