feat(replay-vision): prometheus metrics + structured success logs by TueHaulund · Pull Request #60196 · PostHog/posthog

TueHaulund · 2026-05-27T08:10:20Z

Problem

Phase 1 of Replay Vision is functionally complete (model + workflow + manual /observe/ triggering) but has no telemetry surface beyond logger.exception calls on the failure paths. Operators can't answer "how many observations are running per scanner type", "what's the failure rate", "what's p95 latency on the Gemini call" — none of it exists in Prometheus or as structured success logs in Loki. The master plan flags this as a Phase 1 wrap-up requirement.

Changes

New metrics module (temporal/metrics.py) — module-scope Counter/Histogram from the default registry, auto-scraped by the worker's combined metrics server. Cardinality budget worst-case ~150 series.

Metric	Labels	Where emitted
`replay_vision_observations_total`	`status, scanner_type`	terminal-state activities
`replay_vision_failure_kinds_total`	`kind, scanner_type`	`mark_observation_failed_activity`
`replay_vision_ineligible_kinds_total`	`kind`	`mark_observation_ineligible_activity`
`replay_vision_activity_duration_seconds`	`activity, status`	`@track_activity` decorator on every activity
`replay_vision_provider_call_seconds`	`provider, model, scanner_type, outcome`	inside `_call_with_retry`

Structured success logs at the same hook points so Loki can answer the same questions without Prom — replay_vision.observation.{succeeded,failed,ineligible} with observation_id, scanner_type, and kind (where applicable).

Plumbing: CreateObservationOutput now carries scanner_type so the three MarkObservation*Inputs can take it as a labeled field — terminal activities label metrics without re-querying.

How did you test this code?

Agent-authored. hogli test products/replay_vision/backend/tests/test_temporal.py — 66 passed (61 existing + 5 new). New TestObservationStateMetricsAndLogs class asserts both the Prometheus increment and the structured log payload for each terminal-state transition, plus the histogram count for @track_activity.

Publish to changelog?

no

🤖 Agent context

Tool: Claude Code. Considered emitting metrics directly from workflow code (rejected — workflows must be deterministic, Prom increments are global side effects that would double-fire on replay). Considered an activity interceptor (rejected — heavier than a decorator and harder to scope to one product). Considered activity.info().activity_type for the metric label (rejected — breaks direct test invocations outside an activity context; using fn.__name__ at decoration time gives the same value with no runtime context dependency).

Phase 1 wrap-up. Adds the observability surface called for in the master plan. Metrics (module-scope, default registry, auto-scraped by the worker's combined metrics server): - `replay_vision_observations_total{status, scanner_type}` — terminal-state counter - `replay_vision_failure_kinds_total{kind, scanner_type}` — failures broken down by FailureKind - `replay_vision_ineligible_kinds_total{kind}` — ineligibles broken down by IneligibleSessionKind - `replay_vision_activity_duration_seconds{activity, status}` — wall time per activity, via @track_activity - `replay_vision_provider_call_seconds{provider, model, scanner_type, outcome}` — Gemini call latency Structured success logs emitted alongside the counters: - `replay_vision.observation.succeeded` - `replay_vision.observation.failed` (with kind) - `replay_vision.observation.ineligible` (with kind) Plumbing: `CreateObservationOutput` and the three `MarkObservation*Inputs` gain a `scanner_type` field so the terminal-state activities can label their metrics without re-querying. The `@track_activity` decorator captures the function name at decoration time so direct test invocations work without an activity context.

greptile-apps · 2026-05-27T09:09:45Z

Comments Outside Diff (1)

products/replay_vision/backend/tests/test_temporal.py, line 857-880 (link)

Idempotency only tested for mark_observation_failed_activity

test_no_counter_or_log_when_update_affects_zero_rows is parameterised on all three terminal statuses but calls only mark_observation_failed_activity in every case. The idempotency guard (if not updated: return) in mark_observation_ineligible_activity and mark_observation_succeeded_activity is never exercised by this test. Given the stated preference for parameterised tests, extending this case to also drive the matching activity for each status (e.g. parameterising on (terminal_status, activity, inputs) tuples) would close that gap.

Prompt To Fix With AI

This is a comment left during a code review.
Path: products/replay_vision/backend/tests/test_temporal.py
Line: 857-880

Comment:
**Idempotency only tested for `mark_observation_failed_activity`**

`test_no_counter_or_log_when_update_affects_zero_rows` is parameterised on all three terminal statuses but calls only `mark_observation_failed_activity` in every case. The idempotency guard (`if not updated: return`) in `mark_observation_ineligible_activity` and `mark_observation_succeeded_activity` is never exercised by this test. Given the stated preference for parameterised tests, extending this case to also drive the matching activity for each status (e.g. parameterising on `(terminal_status, activity, inputs)` tuples) would close that gap.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
products/replay_vision/backend/tests/test_temporal.py:857-880
**Idempotency only tested for `mark_observation_failed_activity`**

`test_no_counter_or_log_when_update_affects_zero_rows` is parameterised on all three terminal statuses but calls only `mark_observation_failed_activity` in every case. The idempotency guard (`if not updated: return`) in `mark_observation_ineligible_activity` and `mark_observation_succeeded_activity` is never exercised by this test. Given the stated preference for parameterised tests, extending this case to also drive the matching activity for each status (e.g. parameterising on `(terminal_status, activity, inputs)` tuples) would close that gap.

_{Reviews (1): Last reviewed commit: "feat(replay-vision): prometheus metrics ..." | Re-trigger Greptile}

fasyy612

Sweeet 👌

deployment-status-posthog · 2026-05-28T10:57:10Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-28 10:57 UTC	Run
prod-us	✅ Deployed	2026-05-28 11:21 UTC	Run
prod-eu	✅ Deployed	2026-05-28 11:24 UTC	Run

TueHaulund force-pushed the tue/replay-vision-metrics branch from 2c50720 to e6b36bb Compare May 27, 2026 08:25

TueHaulund force-pushed the tue/replay-vision-metrics branch from e6b36bb to 4fc08bb Compare May 27, 2026 08:41

TueHaulund requested review from a team, arnohillen, fasyy612, ksvat and nicowaltz and removed request for a team May 27, 2026 09:05

TueHaulund marked this pull request as ready for review May 27, 2026 09:05

assign-reviewers-posthog Bot requested a review from a team May 27, 2026 09:05

Merge branch 'master' into tue/replay-vision-metrics

c5ed1c6

TueHaulund force-pushed the tue/replay-vision-metrics branch from 5b5016c to c5ed1c6 Compare May 27, 2026 09:27

TueHaulund mentioned this pull request May 27, 2026

feat(replay-vision): daily cleanup-sweep workflow #60215

Open

fasyy612 approved these changes May 28, 2026

View reviewed changes

TueHaulund merged commit ec2a7f5 into master May 28, 2026
219 checks passed

TueHaulund deleted the tue/replay-vision-metrics branch May 28, 2026 10:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(replay-vision): prometheus metrics + structured success logs#60196

feat(replay-vision): prometheus metrics + structured success logs#60196
TueHaulund merged 2 commits into
masterfrom
tue/replay-vision-metrics

TueHaulund commented May 27, 2026

Uh oh!

greptile-apps Bot commented May 27, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

fasyy612 left a comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

TueHaulund commented May 27, 2026

Problem

Changes

How did you test this code?

Publish to changelog?

🤖 Agent context

Uh oh!

greptile-apps Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments Outside Diff (1)

Uh oh!

fasyy612 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 27, 2026 •

edited

Loading

deployment-status-posthog Bot commented May 28, 2026 •

edited

Loading