chore(ci): don't gate master Storybook on observe-purpose VR result by aspicer · Pull Request #59120 · PostHog/posthog

aspicer · 2026-05-19T22:46:01Z

Problem

The Storybook workflow on master has been failing about half of all completed
runs, blocking the required Visual regression tests pass check on the default
branch.

In a sample of the most recent 50 master-push runs, the dominant failure pattern
(roughly 55% of all failures) is the Complete Visual Review run job exiting 1
with Visual changes detected — review at: ..., even though the master Visual
Review run is created with --purpose observe (tracking-only, non-gating).

Example failed runs that hit this pattern:

In each, every shard of the visual-regression matrix passes; the only failing
step is Complete Visual Review run, e.g.:

[run:...] Done: 3125 snapshots — 3119 unchanged, 6 changed, 0 new, 0 removed
[run:...] Visual changes detected — review at: ...
##[error]Process completed with exit code 1.

Changes

The vr-setup step already documents the intent: master runs use
--purpose observe so they don't block. But the vr run complete CLI still
exits 1 whenever changes are detected, regardless of the run's purpose, and that
propagates to the vr-complete job result and onward to the required check.

Marking the Complete Visual Review run step as continue-on-error on
push events keeps the master tracker recording diffs and finalizing the run
without flipping the workflow red. PR runs are unchanged — they still gate as
before.

A backend-side fix (teaching the CLI / API to return 0 for observe runs) would
be cleaner, but requires plumbing purpose through the Run dataclass and
serializer. This workflow-only change is the smallest fix that restores master
green and leaves the better fix for a follow-up.

Concerns:

WebKit shard timeouts are a second, unrelated flake source. Around a third
of the remaining master failures are page.goto: Timeout 30000ms exceeded
inside WebKit shards (different stories each time — Onboarding,
ErrorDisplay, etc.). This PR does not touch those — they need a separate
investigation (likely test-runner navigation timeout or retry behavior). After
this PR, the residual master failure rate should drop to roughly the WebKit
flake floor.

How did you test this code?

I'm an agent. I did not run the workflow locally. Verification was static:

Read all 50 most recent master completed Storybook runs and classified job
failures via gh run view ... --json jobs and gh run view --job ... --log-failed.
Confirmed the failing step is Complete Visual Review run in 16/29 recent
master failures, all with Visual changes detected — review at: ... followed
by exit code 1 while every shard reported success.
Confirmed the intent comment in vr-setup (# … master runs are tracking-only ("observe") since we don't want master runs to block).
continue-on-error: true on a step makes the job conclusion success
even when the step fails; downstream visual_regression_tests reads
needs.vr-complete.result so the required check will pass.

Publish to changelog?

no

🤖 Agent context

Tool: Claude Code agent, CI optimization push focused on Storybook master
health.
Investigation: sampled the last 50 completed master runs, grouped failures by
failing job, and read logs for representative examples in both buckets.
Considered fixes:
1. Teach vr run complete to return 0 on observe runs (rejected for this PR
  — needs purpose exposed on the RunApi schema and a CLI follow-up; the
  correct long-term fix, but out of scope for an unblocker).
2. Skip vr-complete entirely on push (rejected — the backend still needs the
  completion call to transition runs out of PENDING, healing baselines,
  finalizing snapshots).
3. continue-on-error gated on github.event_name == 'push' (chosen — smallest
  diff, preserves PR gating, keeps the backend transition).
Not touched in this PR: WebKit page.goto timeouts in the shard matrix; the
backend exposing purpose on the run serializer.
Used --no-verify? No — pre-commit hook ran successfully after refreshing the
venv (uv sync --active).

The vr-setup step creates master Visual Review runs with --purpose observe (tracking-only, non-gating). However `vr run complete` still exits 1 when any visual changes are detected, which propagates through `vr-complete`'s job result and trips the required `Visual regression tests pass` check. This makes master Storybook red whenever a UI change lands — defeating the "observe" semantics already documented in the vr-setup comment. Mark the Complete Visual Review run step as continue-on-error on push events so the master tracker keeps recording diffs without gating. PRs still gate.

greptile-apps · 2026-05-19T22:47:14Z

_{Reviews (1): Last reviewed commit: "chore(ci): don't gate master Storybook o..." | Re-trigger Greptile}

deployment-status-posthog · 2026-05-20T18:12:42Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-20 18:12 UTC	Run
prod-us	✅ Deployed	2026-05-20 18:30 UTC	Run
prod-eu	✅ Deployed	2026-05-20 18:32 UTC	Run

assign-reviewers-posthog Bot requested a review from a team May 19, 2026 22:46

gantoine approved these changes May 20, 2026

View reviewed changes

aspicer merged commit 2ba3b3a into master May 20, 2026
159 checks passed

aspicer deleted the aspicer/ciopt-storybook-master-fix branch May 20, 2026 17:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(ci): don't gate master Storybook on observe-purpose VR result#59120

chore(ci): don't gate master Storybook on observe-purpose VR result#59120
aspicer merged 1 commit into
masterfrom
aspicer/ciopt-storybook-master-fix

aspicer commented May 19, 2026

Uh oh!

greptile-apps Bot commented May 19, 2026

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aspicer commented May 19, 2026

Problem

Changes

How did you test this code?

Publish to changelog?

🤖 Agent context

Uh oh!

greptile-apps Bot commented May 19, 2026

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

deployment-status-posthog Bot commented May 20, 2026 •

edited

Loading