Visual regression: scale from 5-screen pilot to every *_page.dart #547

@TaprootFreak


Context

PR #541 (feat/visual-regression-pilot, head 60f0e87, base develop, open and ready-for-review) lands a visual-regression pilot using alchemist 0.14.0 + Open Sans baseline font on the dfx01 self-hosted Mac Studio runner. Pilot covers 5 pages with 8 baseline PNGs. The original golden-bootstrap.yaml workflow was removed in commit 60f0e87 ("revert drift probe + remove bootstrap workflow") because the initial baselines were committed and the workflow's purpose was fulfilled.

PR #552 (feat/visual-regression-scale, stacked on #541) already ships the full 57-page scale-out with 59 baselines committed. All CI checks green. This issue tracks acceptance + merge of #552, not greenfield work.

Why the scale-out is the right move (verified against actual data)

A prior version of this issue recommended cutting to ~25 hot-path pages over concerns about maintenance tax and flake risk. That recommendation was overruled by the actual pilot data:

  • Pilot→full scale wall-clock: 57 minutes from first pilot commit (2de7eb1) to 57-page complete (ed19559). That window included bootstrap setup, a deliberate drift probe and its revert (b279168 → 60f0e87, which verified the framework's catch mechanism end-to-end), the scale-out to 57 pages, fixes for 8 deterministic test-wiring issues, and generation of 51 more baselines on dfx01.
  • Marginal cost per page: ~5 minutes once the framework is hardened. Cutting to 25 leaves 32 pages ungated; coverage holes are worse than maintenance cost when the maintenance is small.
  • Flake reality: zero flake reports across pilot + scale-out CI runs. The 8 fix-once test failures were MissingPluginException for no_screenshot (channel stub, permanent fix) and pumpAndSettle hangs on CircularProgressIndicator pages (switched to pumpOnce, permanent fix). Neither is a flake.
  • Re-bake cadence in practice: based on UI-touching commits in lib/screens/ since 2026-04, ~5-8 PRs/year would need baseline updates. Each re-bake on dfx01 takes 3-5 minutes.
  • Visual-class bug history: only 2 commits in repo history (#103 ellipsis-on-overflow, #59 scroll-for-small-screens). But golden tests also implicitly catch render-crashes (page must build to produce a PNG) — the pilot fix commit 9236617 itself caught real wiring bugs in kyc-loading and settings_edit_loading.
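As a back-of-the-envelope check on the maintenance figures above, the arithmetic can be sketched as follows. The inputs are the ranges quoted in the bullets (5-8 re-bake PRs/year, 3-5 minutes per re-bake, ~5 minutes marginal cost per page); the helper name `annual_rebake_minutes` is illustrative and not part of the repo.

```python
# Rough maintenance-cost arithmetic using the ranges quoted in this issue.
# annual_rebake_minutes is a hypothetical helper, not code from the repo.

def annual_rebake_minutes(prs_per_year, minutes_per_rebake):
    """Return (low, high) minutes per year spent re-baking baselines."""
    low_prs, high_prs = prs_per_year
    low_min, high_min = minutes_per_rebake
    return low_prs * low_min, high_prs * high_min


TOTAL_PAGES = 57           # pages covered by the full scale-out
HOT_PATH_PAGES = 25        # pages the earlier recommendation would have kept
MINUTES_PER_PAGE = 5       # approximate marginal cost per page

rebake_low, rebake_high = annual_rebake_minutes((5, 8), (3, 5))
ungated_pages = TOTAL_PAGES - HOT_PATH_PAGES
one_off_cost = ungated_pages * MINUTES_PER_PAGE

print(f"re-bake budget: {rebake_low}-{rebake_high} min/year")
print(f"pages left ungated by the 25-page cut: {ungated_pages}")
print(f"one-off cost to cover them: ~{one_off_cost} min")
```

Under these assumptions the yearly re-bake budget is well under an hour, which is the basis for preferring full coverage over the 25-page cut.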

Scope

PR #552 already contains the full set. This issue tracks:

Acceptance criteria

Open decisions

  1. Dark-mode baselines? App doesn't ship a dark theme today. Defer.
  2. iPad / Android tablet variants? Pilot is iPhone-only (390×844); recommend keeping that.
  3. Golden-update bootstrap mechanism after golden-bootstrap.yaml removal: tracked separately in #555.

Estimated effort

| Sub-task | Days |
| Review + merge #541 and #552 | 0.5 |
| Tighten alchemist pin | 0.1 |
| Bottom-sheet coverage check + allow-list entries if needed | 0.25 |
| docs/visual-regression-tests.md re-bake workflow section (after #555) | 0.25 |
| Total | ~1 engineer-day of finishing work; ~10 days already invested in #552 |

The original 10-12 engineer-day estimate stands for the work done in #552. This issue's remaining work is the wrap-up.

ROI reassessment

| | Original V1 estimate | Verified |
| Maintenance tax | very high | low (~5-8 re-bakes/year, 3-5 min each on dfx01) |
| Flake rate | high (pixel drift even on dfx01) | zero observed in pilot + scale-out |
| Bug-catching | low (visual-only) | medium-to-high (also catches render-crashes implicitly) |
| Recommendation | scale to 25 hot-path pages | stay at full 57 |

Related
