Context
PR #541 (feat/visual-regression-pilot, head 60f0e87, base develop, open and ready-for-review) lands a visual-regression pilot using alchemist 0.14.0 + Open Sans baseline font on the dfx01 self-hosted Mac Studio runner. Pilot covers 5 pages with 8 baseline PNGs. The original golden-bootstrap.yaml workflow was removed in commit 60f0e87 ("revert drift probe + remove bootstrap workflow") because the initial baselines were committed and the workflow's purpose was fulfilled.
PR #552 (feat/visual-regression-scale, stacked on #541) already ships the full 57-page scale-out with 59 baselines committed. All CI checks green. This issue tracks acceptance + merge of #552, not greenfield work.
Why the scale-out is the right move (verified against actual data)
A prior version of this issue recommended cutting to ~25 hot-path pages over concerns about maintenance tax and flake risk. That recommendation was overruled by the actual pilot data:
Pilot→full scale wall-clock: 57 minutes from first pilot commit (2de7eb1) to 57-page complete (ed19559). That figure includes bootstrap setup, a deliberate drift probe and revert (b279168 → 60f0e87, which verified the framework's catch mechanism end-to-end), scaling to 57 pages, fixing 8 deterministic test-wiring issues, and generating 51 more baselines on dfx01.
Marginal cost per page: ~5 minutes once the framework is hardened. Cutting to 25 leaves 32 pages ungated; coverage holes are worse than maintenance cost when the maintenance is small.
Flake reality: zero flake reports across pilot + scale-out CI runs. The 8 fix-once test failures were MissingPluginException for no_screenshot (channel stub, permanent fix) and pumpAndSettle hangs on CircularProgressIndicator pages (an indeterminate spinner never stops scheduling frames, so pumpAndSettle cannot settle; switched to pumpOnce, permanent fix). Both causes are deterministic, so neither is a flake.
Re-bake cadence in practice: based on UI-touching commits in lib/screens/ since 2026-04, ~5-8 PRs/year would need baseline updates. Each re-bake on dfx01 takes 3-5 minutes.
Visual-class bug history: only 2 commits in repo history (#103 ellipsis-on-overflow, #59 scroll-for-small-screens). But golden tests also implicitly catch render-crashes (page must build to produce a PNG) — the pilot fix commit 9236617 itself caught real wiring bugs in kyc-loading and settings_edit_loading.
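As a rough sanity check on the re-bake cadence above, the yearly maintenance cost can be bounded by multiplying the two estimated ranges. The sketch below only restates this issue's own figures (5-8 PRs/year, 3-5 minutes per re-bake); the function name is illustrative and not part of the repo.

```python
def rebake_minutes_per_year(prs_per_year, minutes_per_rebake):
    """Return (low, high) bounds on yearly re-bake time, in minutes.

    Both arguments are (low, high) ranges taken from the estimates above.
    """
    low = prs_per_year[0] * minutes_per_rebake[0]
    high = prs_per_year[1] * minutes_per_rebake[1]
    return low, high


low, high = rebake_minutes_per_year((5, 8), (3, 5))
print(f"Estimated re-bake time: {low}-{high} minutes/year")
```

Under these estimates the yearly re-bake cost stays below one hour of dfx01 runner time.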
Scope
PR #552 already contains the full set. This issue tracks:
Tighten alchemist pin: ^0.14.0 (caret) → 0.14.0 (exact) in pubspec.yaml. Pixel determinism is the goal; even though only one 0.14.x release exists today, a future 0.14.1 would silently re-bake all baselines on the next pub upgrade.
docs/visual-regression-tests.md (depends on #555, golden-update bootstrap mechanism after golden-bootstrap.yaml removal)
lib/screens/pin/widgets/enable_biometric_bottom_sheet.dart, lib/screens/pin/widgets/forgot_pin_bottom_sheet.dart, lib/screens/hardware_connect_bitbox/show_bitbox_reconnect_sheet.dart: verify these are either in #552's scope or explicitly listed in .test-coverage-allowlist (#551 guard, the CI file-pair existence check: every page/widget needs a test or an allow-list entry)
Acceptance criteria
*_page.dart files have at least one matching *_golden_test.dart after #552 merges
[self-hosted, macOS, ARM64, m3-ultra, realunit-app] (verified configured)
alchemist: 0.14.0 exact pin in pubspec.yaml
docs/visual-regression-tests.md documents the re-bake workflow once #555 lands
Open decisions
Golden-update bootstrap mechanism after golden-bootstrap.yaml removal (#555): separate issue.
Estimated effort
docs/visual-regression-tests.md re-bake workflow section (after #555)
The original 10-12 engineer-day estimate stands for the work done in #552. This issue's remaining work is the wrap-up.
ROI reassessment
Related
docs/visual-regression-tests.md: to be added by PR #541
lib/screens/**/*_page.dart: source of truth for the page list
#544 (*_page.dart, 12 untested + 5 special-handling): sister issue for testWidgets page tests
#555 (golden-bootstrap.yaml removal): golden-update bootstrap mechanism (blocking the docs/ workflow section)
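Two of the acceptance criteria (an exact alchemist pin, and a golden test for every *_page.dart) lend themselves to a scripted check. The following is a minimal sketch, not the #551 guard itself. It assumes golden tests are named after their page (foo_page.dart pairs with foo_page_golden_test.dart) and that the pin is a plain `name: version` line in pubspec.yaml. Both function names and the test directory layout are hypothetical.

```python
import re
from pathlib import Path


def is_exact_pin(pubspec_text, package):
    """True if `package` is declared with a bare version such as 0.14.0.

    Assumption: the dependency is a simple `name: version` entry.
    A caret constraint like ^0.14.0 (pub: >=0.14.0 <0.15.0) is rejected.
    """
    pattern = (
        rf"^[ \t]+{re.escape(package)}:[ \t]*"
        r"['\"]?([^\s'\"]+)['\"]?[ \t]*$"
    )
    match = re.search(pattern, pubspec_text, flags=re.MULTILINE)
    if match is None:
        return False
    return re.fullmatch(r"\d+\.\d+\.\d+", match.group(1)) is not None


def pages_without_golden(screens_root, tests_root):
    """List *_page.dart files that have no matching *_golden_test.dart.

    Assumption: a page `foo_page.dart` is covered by a test file named
    `foo_page_golden_test.dart` anywhere under `tests_root`.
    """
    suffix = "_golden_test.dart"
    covered = {
        path.name[: -len(suffix)]
        for path in Path(tests_root).rglob("*" + suffix)
    }
    return sorted(
        page.relative_to(screens_root).as_posix()
        for page in Path(screens_root).rglob("*_page.dart")
        if page.stem not in covered
    )


# Inline sample only; a real check would read pubspec.yaml from disk.
SAMPLE = "dev_dependencies:\n  alchemist: ^0.14.0\n"
print(is_exact_pin(SAMPLE, "alchemist"))  # False: the caret is not an exact pin
```

A CI step could fail the build when `is_exact_pin` returns False or when `pages_without_golden` returns a non-empty list; pages that are deliberately excluded would still need to go through .test-coverage-allowlist.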