Skip to content

feat(api): SP-5 PR-C · /metrics + structured JSON logging with secret scrubbing (AIN-238 + AIN-249)#78

Closed
hizrianraz wants to merge 4 commits into
feat/ain271-streaming-toolusefrom
feat/observability
Closed

feat(api): SP-5 PR-C · /metrics + structured JSON logging with secret scrubbing (AIN-238 + AIN-249)#78
hizrianraz wants to merge 4 commits into
feat/ain271-streaming-toolusefrom
feat/observability

Conversation

@hizrianraz
Copy link
Copy Markdown
Contributor

@hizrianraz hizrianraz commented May 23, 2026

Summary

Internal-scoped Prometheus `/metrics` + a structured JSON log formatter that scrubs secrets before bytes leave the process.

Stacks on SP-2 api#72. Independent of SP-5 PR-A (#76) and SP-5 PR-B (#77).

AIN-238 — /metrics

  • Dependency-free registry (`services/metrics.py`); we don't pull `prometheus_client` because every runtime dep is an AIN-237 attack surface
  • Named series: `ainfera_http_requests_total`, `ainfera_http_request_duration_seconds`, `ainfera_provider_calls_total`, `ainfera_router_alias_hit_total`, `ainfera_audit_chain_height` + `freshness_seconds`, `ainfera_dispatch_without_capture_total` (SP-4 PR-A bridge), `ainfera_cost_killswitch*` (SP-5 PR-B bridge), `ainfera_app_info{version}`
  • Middleware times every request; uses FastAPI route template for the path label so agent_id never leaks
  • `GET /metrics` gated by `X-Ainfera-Internal-Key`; hidden from openapi
  • Cold-path reads `max(seq)` + `max(created_at)` from audit_events — read-only, never mutates the chain

AIN-238 — structured JSON logging

  • `StructuredJSONFormatter` — one JSON object per record + 2-layer secret scrubbing (per-key for extra fields; regex for freeform text shapes `ai_infera_` / `sk-` / `Bearer` / JWT)
  • Tracebacks also scrubbed
  • Wired via `logging.basicConfig(force=True)` BEFORE routers import so startup lines flow through

AIN-249 — SP-4 PR-A guard scrape series

`ainfera_dispatch_without_capture_total` registered; SP-4 PR-A's `DispatchCaptureCounter` plugs in via a 1-line `.inc()` once both PRs merge.

Privacy guardrails (SP-5 §1)

  • NO tenant_id / agent_id / owner_handle on any metrics label
  • /metrics is internal-key gated, not public
  • Log lines scrubbed by KEY (api_key/password/prompt/messages/content/etc.) AND SHAPE (regex)

Tests

  • `test_structured_log.py` — 10 cases (Ainfera key, OpenAI/Anthropic, Bearer, JWT, structured extra, nested dicts, innocent passthrough, exception tracebacks)
  • `test_metrics_registry.py` — 13 cases (primitives, label escaping, cumulative buckets, sorted render, named-series wrappers)

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke = 529 green.

Test plan

  • CI green
  • Branch / preview: `curl -H "X-Ainfera-Internal-Key: $KEY" /metrics` → 200 with the named series
  • Without the header: `curl /metrics` → 401
  • A few real requests then re-scrape — `ainfera_http_requests_total` increments for the seen routes; histogram populates bucket counts
  • Log line in Railway shows JSON shape `{ts,level,logger,msg,...}` and no secrets appear

🤖 Generated with Claude Code


Note

Medium Risk
Touches global logging configuration (logging.basicConfig(..., force=True)) and adds always-on request instrumentation, which can affect log consumers and add minor runtime overhead; /metrics also introduces a new internal-authenticated endpoint and DB reads on scrape.

Overview
Adds an internal-key–gated GET /metrics endpoint that exposes a small dependency-free Prometheus registry, including per-request counters/latency histograms and a cold-path refresh of audit-chain gauges via max(seq)/max(created_at) DB reads.

Installs a global StructuredJSONFormatter in main.py (forced root handler) to emit JSON logs and scrub secrets/PII from both structured extra fields and freeform message/traceback text, and adds RequestMetricsMiddleware to record request counts/durations using route templates with a path-cardinality cap.

Includes new unit tests covering the metrics primitives/rendering and the log-scrubbing contract.

Reviewed by Cursor Bugbot for commit c153ecc. Bugbot is set up for automated code reviews on this repo. Configure here.

…e migration (AIN-271b)

Inverts the AIN-244 routing-target lock per the 2026-05-23 founder
decision (Disc#12): `ainfera-inference` becomes the canonical wire
string; `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` are
demoted to silent aliases resolved at the router boundary.

Changes:
- routers/inference.py: INFERENCE_MODEL canonical; ROUTING_ALIASES
  frozenset covers all 3 legacy strings; _log_alias_hit fires for each.
  Back-compat module constants MITHRIL_MODEL / AUTO_MODEL now alias the
  canonical so legacy imports keep working.
- routers/agent_surfaces.py: agent-card.json + llms.txt rewritten with
  Ainfera Inference framing; zero dead strings on agent-discovery
  surfaces.
- routers/anthropic_compat.py: docstring reframed; 501-on-stream /
  422-on-tools surfaces preserved pending the streaming/tool-use lift
  (separate follow-up).
- models/inference.py: InferenceRequest field descriptions (which feed
  openapi.json) lead with ainfera-inference; aliases not mentioned.
- services/routing_brain.py: §16 audit "router" payload reports
  canonical "ainfera-inference" regardless of alias requested.
- routing/{__init__,auto}.py: docstrings reframed.
- inference_gateway.md (renamed from MITHRIL_GATEWAY.md): contract doc
  swept clean of product/wire dead strings.

Tests:
- tests/unit/test_inference_alias.py (new; supersedes deleted
  test_mithril_alias.py): canonical + 3-alias parametrized coverage.
- tests/unit/test_agent_surfaces.py: asserts ainfera-inference is the
  default_model + dead-string regression lock on both /.well-known/
  agent-card.json and /llms.txt.
- tests/integration/test_anthropic_compat.py: happy paths use canonical
  string; silent-alias test parametrized over all 3 aliases.
- tests/integration/test_routing_v0.py: canonical happy path.
- tests/integration/test_routing_backends_invariants.py: post-migration
  invariant — 0 rows with aa_index_source ILIKE '%aamc%'.

Migration:
- 0027_rename_aa_index_source_aamc_to_routing_backend.py — row-rewrite
  of the 5 anchor models from 'aamc_v1_lock' to 'routing_backend_v1_lock'.
  Branch-verify only via this commit; prod-apply on project
  dftfpwzqxoebwzepygzl is in the founder action block.

Linear gate: AIN-271 (P1-WS2 prod deploy of /v1/messages streaming +
tool-use) — this commit lands the rename half. Streaming + tool-use
land in a follow-up because the ProviderAdapter interface does not yet
carry tools/stream signatures across the 5 adapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 23, 2026

AIN-238 [Foundation] Cross-framework observability — one trace ID across all hops (atop L4 audit)

Pressure-test finding #12 + Ainfera OS design principle #6 (LOCKED 2026-05-22). Seven frameworks + the gateway + Letta + Modal + Spark-local models means a single request can cross 4-5 heterogeneous runtimes (e.g. Yavanna→Aulë→gateway→Modal→audit). When something breaks, tracing it across framework boundaries is an "observability black box" — the dominant 2026 multi-agent production failure. The L4 audit chain captures inference calls; it does NOT give end-to-end cross-agent traces.

What it must do

  • One trace ID per logical request, propagated across every hop — agent → gateway → backend → audit — regardless of framework. OpenTelemetry-style context propagation, injected at the gateway and carried through handoffs (ties to AIN-233 handoff contracts).
  • Per-hop spans — each agent, each framework call, each backend dispatch emits a span with timing + status, parented to the trace.
  • Correlate with L4 audit — the audit chain's inference records link to the trace ID so a financial/audit event maps to the full request path.
  • Failure localization — when a request fails or stalls, the trace shows exactly which hop (which agent, which framework, which backend) — no more guessing across runtimes.
  • Feeds AIN-234 — per-agent cost/latency metering rides the same trace context.

Acceptance criteria

  • Trace ID generated at gateway ingress, propagated through all agent + framework + backend hops
  • Each hop emits a parented span (timing + status); visible in one trace view
  • Trace ID correlated into L4 audit chain records
  • Induced failure at any hop is localizable from the trace alone
  • Works across all 7 frameworks (heterogeneous runtimes) + Modal + Spark-local

Why M1

Without cross-framework tracing, debugging a 7-framework fleet on Spark is guesswork. Pairs with AIN-234 (metering rides the same context) + AIN-233 (trace propagates via handoff contracts) + AIN-226 (gateway is the injection point).

AIN-249 [Routing] CONDITIONAL M_allowed wiring — brand_verdicts schema + RoutingRequest.context + explicit PASS rows

Code + schema follow-on to AIN-248. Wires the engine to consume CONDITIONAL M_allowed verdicts end-to-end. PASS/VETO/None already work today (Candidate.m_allowed 3-state); this ticket adds the context-aware path.

⛔ Item #1 — SCHEMA LOCK (Discipline #12, founder ruling required before any code)

Same shape of one-shot immutable decision as F4. Recommended: separate brand_verdicts table (not brands.verdict cols).
Rationale: (1) a verdict is an audit record (reviewed_by/approved_by/approved_at), not a brand attribute; (2) CONDITIONAL is context-dependent → a brand may need multiple/superseded verdict rows (per-jurisdiction, over time) — columns can't hold history cleanly; (3) keeps catalog clean, matches append-don't-mutate ethos. Brain reads a current-verdict view; table keeps the trail.
Do not start items 1–3 until founder rules table-vs-columns.

Scope (≈150 LOC once schema lock taken)

  1. Verdict schema (Alembic). brand_verdicts: {brand_id, verdict: pass|conditional|veto, condition: jsonb, hosting_endpoint: text, dimensions: jsonb, counsel_required: bool, reviewed_by, approved_by, approved_at}. RLS posture matching the catalog family. Engine reads only resolved verdict + condition; dimensions/counsel_required/reviewed_by/approved_by are audit metadata the brain does not read.
  2. Request-context model. RoutingRequest (ainfera_routing.types) gains context: dict (e.g. tenant_jurisdiction, hosted_region). Enrolment gate evaluates the brand's condition against context → resolves m_allowed. Additive — callers that pass no context default to the strictest interpretation (CONDITIONAL without satisfying context → deny).
  3. Explicit PASS rows for the 5 active brands. Replace today's implicit "pass via brand.active=true" (api/ainfera_api/services/routing_brain.py:174) with first-class verdict rows so the audit trail shows why allowed, not just that it is. approved_by: founder from launch.

NOT in scope

  • Ulmo intake + founder-signed verdicts themselves = AIN-248 (human/counsel-facing, zero code).
  • D7 SBOM gate (AIN-237), EU Annex IV bundle = operational gates before a verdict lands, not engine inputs.

Done bar

  • Schema lock ruled + migration applied (Alembic, dev-branch tested)
  • context field live; CONDITIONAL resolves correctly for a test brand (satisfied context → candidate; unsatisfied → M_ALLOWED_VETO)
  • 5 active brands have explicit PASS verdict rows; routing unchanged for them (regression-clean vs current)
  • routing_outcomes.m_allowed_set captures the resolved boolean + condition outcome
  • No-context request still routes (strictest-default proven)

Blocked by: AIN-248 (needs first verdicts to test against). Related: AIN-245 (engine), AIN-237 (D7).

Review in Linear

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

Reviewed by Cursor Bugbot for commit 284d6eb. Configure here.

raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="invalid X-Ainfera-Internal-Key",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics opens DB before auth

Medium Severity

GET /metrics declares DBSession as a handler dependency while the internal-key check runs only inside the function body. FastAPI resolves dependencies before the handler runs, so missing or invalid X-Ainfera-Internal-Key requests still acquire a database session even though no query should run.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 284d6eb. Configure here.

hizrianraz and others added 3 commits May 24, 2026 05:59
…rename

Swap the legacy literal 'aamc_v1_lock' → 'routing_backend_v1_lock' in
scripts/seed_dev.py (5 anchor rows + idempotency-comment update).

The SP-1 rename migration 20260523_0027 row-rewrites
`aa_index_source = 'aamc_v1_lock' → 'routing_backend_v1_lock'`. On a
clean CI database the migrations run BEFORE seeding, so the rename
fires on an empty table; the seed script then inserts the 5 §C
anchors directly with the new literal.

Fix (a) over fix (b) per founder's two-guard authorization: re-running
an already-applied migration after seed is structurally awkward and
violates Alembic's once-per-revision contract. The rename migration
remains independently asserted by
`test_zero_rows_carry_legacy_aamc_source_tag` (integration).

Grep probe confirmed the literal is NOT shared with another test-path
expectation — only test_t9_catalog_migration.py:142 references it, and
that unit test reads the static catalog-migration tuple (frozen
historical data), not live DB state, so it's unaffected.

Unblocks:
  tests/integration/test_routing_backends_invariants.py
    ::test_canonical_5_voters_use_v1_lock_source
    ::test_zero_rows_carry_legacy_aamc_source_tag

Fixture/packaging only. No engine touch, no routing_outcomes touch, no
methodology change. Disc#12 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now
honors `stream:true` (200 + text/event-stream with ordered Anthropic
SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks
in the response). The §16 capture invariant holds: every routed call —
streamed or not — writes exactly one `routing_outcomes` row plus the
matching audit events plus the ledger debit.

Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER
that PR.

## Adapter contract lift

- `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults
  None — back-compat preserved across all 5 adapters).
- New `ProviderAdapter.stream_chat()` async generator yields normalized
  `StreamEvent`s. Default impl wraps `chat()` into one content_delta +
  one message_delta so adapters that don't yet override honor the
  contract surface.
- New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`,
  `tool_use_delta`, `message_delta`.
- New `ToolsNotSupportedError` — adapters that don't yet wire tool
  calling raise this at the adapter boundary; the handler maps it to
  a 422 with backend slug + remediation.
- `AdapterResponse.content_blocks` added so tool_use round-trips
  through the non-streaming path too.

## Per-adapter native streaming

- AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages`
  with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass
  through natively.
- OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real
  native SSE against `/v1/chat/completions` with `stream:true` +
  `stream_options.include_usage`; translates `delta.tool_calls[]` →
  normalized tool_use events.
- OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises
  ToolsNotSupportedError → 422 with backend slug.
- GeminiAdapter / MistralAdapter: signature extended; inherit
  OpenAICompatAdapter native streaming.

## Streaming dispatch + /v1/messages

- `services/streaming.py` runs the dispatcher to completion (full §16
  capture + ledger + audit), then synthesizes Anthropic SSE frames
  from the resulting DispatchResult. v0 posture: `wrapped` (TTFT =
  full inference time); response header `x-ainfera-stream-mode`
  reports the mode so SDK clients can observe it. Adapter-level
  native streaming primitives in this same PR are ready for the
  follow-up that refactors `dispatch_inference` to consume them
  end-to-end (flipping the header to `native`).
- `routers/anthropic_compat.py`:
  - Drops 501-on-stream → returns StreamingResponse with
    text/event-stream content-type.
  - Drops blanket 422-on-tools → tools pass through. Legacy code
    `tool_calling_not_supported_on_shim` retired; backends without
    tools surface `tools_not_supported_by_backend` with hint.
  - `MessagesResponse.content[]` polymorphic (text OR tool_use);
    SDK sees one shape across stream + non-stream.
  - Alias resolver honored on streamed calls (`_log_alias_hit` fires
    for the three SP-1 legacy strings).
- Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`)
  set on streaming responses identical to non-streaming.

## Tests

- tests/unit/test_streaming_wire_format.py — 6 pure tests against
  default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason
  mapping + `supports_native_streaming()` flag.
- tests/integration/test_anthropic_compat.py — replaces SP-1 501/422
  assertions with SP-2 coverage:
    · stream:true → 200 + text/event-stream + ordered Anthropic frames
    · streaming writes §16 row on close
    · streaming honors silent-alias resolver (parametrized × 3)
    · non-empty tools passes through

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke
all green (505 unit+smoke tests).

## SP-2 v0 honesty caveat

Contract surface (200 text/event-stream, ordered Anthropic frames,
§16 capture, tool_use round-trip, alias parity) is real and verified.
TTFT is NOT sub-1s in v0 because the streaming wrapper runs
non-streaming dispatch first and replays its full response as SSE.
The adapter-level native streaming primitives are in place; the
follow-up refactors dispatch_inference to consume them end-to-end.
`x-ainfera-stream-mode: wrapped` today → `native` after the follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h secret scrubbing (AIN-238 + AIN-249)

Adds the internal-scoped observability surface + a structured JSON
log formatter that scrubs secrets before bytes leave the process.

Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent
of SP-5 PR-A (#76 supply-chain) and SP-5 PR-B (#77 resilience).

## AIN-238 — Prometheus /metrics surface

Dependency-free registry in `services/metrics.py`. Named series
(process-global; NO tenant_id / agent_id / owner_handle):
  - `ainfera_http_requests_total{method,path,status}` — counter
  - `ainfera_http_request_duration_seconds{method,path}` — histogram
  - `ainfera_provider_calls_total{provider,outcome}` — counter
  - `ainfera_router_alias_hit_total{alias}` — counter
  - `ainfera_audit_chain_height` + `_freshness_seconds` — gauges
  - `ainfera_dispatch_without_capture_total` — bridge for SP-4 PR-A
  - `ainfera_cost_killswitch_{engaged,spent_usd,threshold_usd}` —
    bridge for SP-5 PR-B
  - `ainfera_app_info{version}` — constant info gauge

`middleware/request_metrics.py` — ASGI middleware that times every
request and uses the FastAPI route TEMPLATE for the path label so
agent_id etc. never leak. Defensive label-cardinality cap (200
unique paths) blocks probe-spam from blowing up the histogram set.

`routers/metrics.py` — `GET /metrics` gated by
`X-Ainfera-Internal-Key` (same key the signup proxy uses). Cold-path
enrichment reads `max(seq)` + `max(created_at)` from audit_events
(read-only — never mutates the immutable chain). Hidden from openapi
so it's not advertised to public clients.

## AIN-249 carry-forward — SP-4 PR-A guard scrape series

`ainfera_dispatch_without_capture_total` is registered here; SP-4
PR-A's `DispatchCaptureCounter` plugs in via a single `.inc()` call
once both PRs merge.

## AIN-238 — structured JSON logging with secret scrubbing

`services/structured_log.py` — `StructuredJSONFormatter` emits one
JSON object per record + scrubs secrets in two layers:
  1. Per-KEY scrubbing for structured `extra` fields (`api_key`,
     `password`, `secret`, `token`, `authorization`, `cookie`,
     `prompt`, `messages`, `content`).
  2. Regex pass for known secret SHAPES in freeform message text
     (`ai_infera_*`, `sk-*`, `Bearer *`, JWT `eyJ*.*.*`).

Tracebacks also flow through the scrubber. Wired in `main.py` via
`logging.basicConfig(handlers=[...], force=True)` BEFORE the routers
import so startup log lines are also scrubbed.

## Tests
- `tests/unit/test_structured_log.py` — 10 cases (each secret format
  + structured extra + nested dicts + innocent passthrough + tracebacks).
- `tests/unit/test_metrics_registry.py` — 13 cases (primitives, label
  escaping, cumulative buckets, sorted render, named-series wrappers).

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke
= 529 green.

## Privacy guardrails (SP-5 §1)

- NO tenant_id, agent_id, owner_handle, or any PII appears as a
  metrics label.
- `/metrics` is internal-key gated; tenant cardinality (if ever
  needed) lands on a stricter-auth endpoint.
- Log lines are scrubbed by both KEY and SHAPE. The
  `test_extra_field_with_prompt_label_redacted` test locks "prompt
  content is PII; never log it" into CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hizrianraz hizrianraz force-pushed the feat/ain271-streaming-tooluse branch from 5a57625 to 7281e42 Compare May 23, 2026 23:00
@hizrianraz hizrianraz force-pushed the feat/observability branch from 284d6eb to c153ecc Compare May 23, 2026 23:00
@hizrianraz hizrianraz force-pushed the feat/ain271-streaming-tooluse branch from 1fd4e23 to 0d33b78 Compare May 24, 2026 05:12
@hizrianraz hizrianraz deleted the branch feat/ain271-streaming-tooluse May 24, 2026 05:14
@hizrianraz hizrianraz closed this May 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant