Phase 1 wrap-up. Adds the observability surface called for in the master plan.
Metrics (module-scope, default registry, auto-scraped by the worker's combined
metrics server):
- `replay_vision_observations_total{status, scanner_type}` — terminal-state counter
- `replay_vision_failure_kinds_total{kind, scanner_type}` — failures broken down by FailureKind
- `replay_vision_ineligible_kinds_total{kind}` — ineligibles broken down by IneligibleSessionKind
- `replay_vision_activity_duration_seconds{activity, status}` — wall time per activity, via @track_activity
- `replay_vision_provider_call_seconds{provider, model, scanner_type, outcome}` — Gemini call latency
Structured success logs emitted alongside the counters:
- `replay_vision.observation.succeeded`
- `replay_vision.observation.failed` (with kind)
- `replay_vision.observation.ineligible` (with kind)
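For illustration, pairing a counter increment with a structured log at a terminal-state hook might look roughly like the sketch below. It is not the module's actual code: `collections.Counter` stands in for the Prometheus counters so the snippet runs without dependencies, and the `record_failed` helper, the `"failed"` status value, the use of stdlib `logging`, and the example argument values are all assumptions.

```python
import logging
from collections import Counter

logger = logging.getLogger("replay_vision")

# Stand-ins for the Prometheus counters, keyed by label tuple (assumption:
# the real module defines prometheus_client counters with these label names).
observations_total: Counter = Counter()   # (status, scanner_type)
failure_kinds_total: Counter = Counter()  # (kind, scanner_type)


def record_failed(observation_id: str, scanner_type: str, kind: str) -> None:
    """Hypothetical helper: bump both counters and emit one structured log."""
    observations_total[("failed", scanner_type)] += 1
    failure_kinds_total[(kind, scanner_type)] += 1
    logger.info(
        "replay_vision.observation.failed",
        extra={
            "observation_id": observation_id,
            "scanner_type": scanner_type,
            "kind": kind,
        },
    )


# Example values are made up for the sketch.
record_failed("obs-1", scanner_type="example_scanner", kind="provider_error")
```

Emitting both signals from the same call site keeps the Prometheus and Loki views consistent with each other.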
Plumbing: `CreateObservationOutput` and the three `MarkObservation*Inputs`
gain a `scanner_type` field so the terminal-state activities can label
their metrics without re-querying. The `@track_activity` decorator captures
the function name at decoration time so direct test invocations work without
an activity context.
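A minimal sketch of a decorator that captures the function name at decoration time, assuming activities are `async` functions. A plain dict stands in for the duration histogram, and the `"success"`/`"error"` status values are assumed label values, not confirmed by the source.

```python
import asyncio
import functools
import time
from collections import defaultdict

# Stand-in for the duration histogram: (activity, status) -> observed seconds.
activity_durations: dict[tuple[str, str], list[float]] = defaultdict(list)


def track_activity(fn):
    """Record wall time per call, labelled by function name and outcome."""
    activity_name = fn.__name__  # captured once, at decoration time

    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"  # assumed label value; overwritten on success
        try:
            result = await fn(*args, **kwargs)
            status = "success"
            return result
        finally:
            elapsed = time.perf_counter() - start
            activity_durations[(activity_name, status)].append(elapsed)

    return wrapper


@track_activity
async def mark_observation_failed_activity(observation_id: str) -> str:
    # Placeholder body for the sketch.
    return observation_id


# Direct invocation, as a unit test would do, with no activity context.
asyncio.run(mark_observation_failed_activity("obs-1"))
```

Because the label comes from a closure variable rather than a runtime lookup, the wrapper behaves the same whether it is called by the worker or directly from a test.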
Problem
Phase 1 of Replay Vision is functionally complete (model + workflow + manual `/observe/` triggering) but has no telemetry surface beyond `logger.exception` calls on the failure paths. Operators can't answer "how many observations are running per scanner type", "what's the failure rate", or "what's p95 latency on the Gemini call": none of it exists in Prometheus or as structured success logs in Loki. The master plan flags this as a Phase 1 wrap-up requirement.
Changes
New metrics module (`temporal/metrics.py`): module-scope `Counter`/`Histogram` from the default registry, auto-scraped by the worker's combined metrics server. Cardinality budget worst-case ~150 series.

| Metric | Labels | Emitted from |
| --- | --- | --- |
| `replay_vision_observations_total` | `status, scanner_type` | |
| `replay_vision_failure_kinds_total` | `kind, scanner_type` | `mark_observation_failed_activity` |
| `replay_vision_ineligible_kinds_total` | `kind` | `mark_observation_ineligible_activity` |
| `replay_vision_activity_duration_seconds` | `activity, status` | `@track_activity` decorator on every activity |
| `replay_vision_provider_call_seconds` | `provider, model, scanner_type, outcome` | `_call_with_retry` |

Structured success logs at the same hook points so Loki can answer the same questions without Prom: `replay_vision.observation.{succeeded,failed,ineligible}` with `observation_id`, `scanner_type`, and `kind` (where applicable).
Plumbing: `CreateObservationOutput` now carries `scanner_type` so the three `MarkObservation*Inputs` can take it as a labeled field; terminal activities label metrics without re-querying.
How did you test this code?
Agent-authored.
`hogli test products/replay_vision/backend/tests/test_temporal.py`: 66 passed (61 existing + 5 new). The new `TestObservationStateMetricsAndLogs` class asserts both the Prometheus increment and the structured log payload for each terminal-state transition, plus the histogram count for `@track_activity`.
Publish to changelog?
no
🤖 Agent context
Tool: Claude Code. Considered emitting metrics directly from workflow code (rejected: workflows must be deterministic, and Prom increments are global side effects that would double-fire on replay). Considered an activity interceptor (rejected: heavier than a decorator and harder to scope to one product). Considered `activity.info().activity_type` for the metric label (rejected: breaks direct test invocations outside an activity context; using `fn.__name__` at decoration time gives the same value with no runtime context dependency).
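The double-fire concern can be illustrated with a toy model of replay. This is not the Temporal SDK: `run_workflow` and the `history` list are hypothetical stand-ins for a workflow body and its recorded event history, and `collections.Counter` stands in for a process-global Prometheus counter.

```python
from collections import Counter

metric: Counter = Counter()  # stand-in for a process-global counter


def run_workflow(history: list) -> str:
    """Toy workflow body; `history` holds recorded activity results."""
    # An increment placed in workflow code re-runs on every replay.
    metric["from_workflow"] += 1
    if not history:
        # First execution: the activity really runs, emits its metric,
        # and its result is recorded in history.
        metric["from_activity"] += 1
        history.append("succeeded")
    # On replay the recorded result is reused and the activity is not re-run.
    return history[0]


history: list = []
run_workflow(history)  # original execution
run_workflow(history)  # replay, e.g. after a worker restart

print(metric["from_workflow"], metric["from_activity"])  # prints "2 1"
```

In this model the workflow-side increment counts one observation twice, while the activity-side increment counts it once.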