feat(tests): scale visual regression to all 57 screens (stacked on #541) #552

Merged
TaprootFreak merged 3 commits into feat/visual-regression-pilot from feat/visual-regression-scale
May 23, 2026

Conversation

@TaprootFreak
Contributor

Summary

Bucket breakdown

  • Onboarding + wallet (11): create, restore, verify_seed, home, onboarding_completed, pin x2, hw bitbox, legal x2, debug_auth
  • Settings subpages (17): languages, currencies, network, seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7
  • KYC (15): 2fa, email x2, financial_data x4, ident, nationality, registration, subpages x5
  • Trading + support + misc (9): receive, tx_history, sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped)

Test plan

  • Bootstrap workflow runs on push, generates 56 baseline PNGs (57 minus web_view)
  • Baselines reviewed visually, committed to test/goldens/screens/**/goldens/macos/
  • golden-tests CI job green on the same commit
  • Bootstrap workflow removed before ready-for-review

State-variant goldens (loaded/error/loading per screen) are out of scope here — they follow as separate per-feature PRs.

Adds 52 new golden tests on top of the pilot 5, covering every
page file under lib/screens/**:

- Bucket 1: onboarding + wallet lifecycle (11 tests: create, restore,
  verify_seed, home, onboarding_completed, pin x2, hw bitbox, legal x2,
  debug_auth)
- Bucket 2: settings subpages (17 tests: languages, currencies, network,
  seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7)
- Bucket 3: KYC (15 tests: 2fa, email x2, financial_data x4, ident,
  nationality, registration, subpages x5)
- Bucket 4: trading + support + misc (9 tests: receive, tx_history,
  sell_bitbox, sell_bank_account_selection, support x4, web_view skipped)

One Page → one golden in default/initial state. State-variant goldens
follow as a separate PR per feature when the team finds them necessary.

web_view_page.dart is the only skipped target — InAppWebView is a
platform-view that has no headless rendering in flutter_test. The
test file is committed with skip: true so it activates the moment a
stub is introduced.

Bootstrap workflow is restored temporarily so the dfx01 runner can
regenerate the baseline set on push; removed in a follow-up commit
once the artifact is committed.

Two failure classes turned up on the first dfx01 bootstrap run:

1. MissingPluginException for no_screenshot's
   com.flutterplaza.no_screenshot_methods channel — create_wallet,
   settings_seed. Stubbed via TestDefaultBinaryMessengerBinding in the
   per-file setUpAll so the call returns true and the test continues.

2. pumpAndSettle hangs on CircularProgressIndicator/CupertinoActivityIndicator
   in loading-state pages — kyc_loading, kyc_financial_data,
   kyc_financial_data_loading, settings_edit_loading, support_chat,
   sell_bitbox. Switched pumpBeforeTest to alchemist's pumpOnce so the
   first frame is captured instead of waiting for animation completion.
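
The two fixes above can be sketched as follows. This is a minimal illustration, not the committed test code: the channel name comes from the exception quoted above, while the test description, `fileName`, and the widget under test are hypothetical stand-ins for the real pages.

```dart
import 'package:alchemist/alchemist.dart';
import 'package:flutter/material.dart';
import 'package:flutter/services.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  // Channel name taken from the MissingPluginException reported above.
  const channel = MethodChannel('com.flutterplaza.no_screenshot_methods');

  setUpAll(() {
    TestWidgetsFlutterBinding.ensureInitialized();
    // Fix 1: answer every call on the plugin channel with `true`, so the
    // page under test no longer throws MissingPluginException.
    TestDefaultBinaryMessengerBinding.instance.defaultBinaryMessenger
        .setMockMethodCallHandler(channel, (MethodCall call) async => true);
  });

  tearDownAll(() {
    // Remove the stub so it does not leak into other test files.
    TestDefaultBinaryMessengerBinding.instance.defaultBinaryMessenger
        .setMockMethodCallHandler(channel, null);
  });

  goldenTest(
    'loading page captures its first frame', // hypothetical description
    fileName: 'loading_page_default', // hypothetical file name
    // Fix 2: pump a single frame. The default pump-and-settle action does
    // not finish while an indeterminate progress indicator keeps animating.
    pumpBeforeTest: pumpOnce,
    builder: () => const SizedBox(
      width: 200,
      height: 200,
      // Stand-in for a real loading-state page.
      child: Center(child: CircularProgressIndicator()),
    ),
  );
}
```

Because only the first frame is captured, the golden shows the indicator at its initial animation position on every run.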

Generated by golden-bootstrap run 26341918780 on dfx01. The
authoritative baseline set now stands at 59 PNGs covering 56 of 57 page
files (web_view stays skipped).

Bootstrap workflow removed — the standard golden-tests CI job is the
permanent validation entry point from here.
@TaprootFreak TaprootFreak marked this pull request as ready for review May 23, 2026 19:52
@TaprootFreak TaprootFreak merged commit 5613899 into feat/visual-regression-pilot May 23, 2026
4 checks passed
@TaprootFreak TaprootFreak deleted the feat/visual-regression-scale branch May 23, 2026 20:56
TaprootFreak added a commit that referenced this pull request May 23, 2026
…) (#552)

## Summary

- Stacked on top of #541 (visual-regression pilot). **Do not merge until
#541 is merged**, then this PR rebases against develop.
- Adds 52 new golden tests covering every `lib/screens/**/*_page.dart`
(1 per page, default/initial state) on top of the pilot 5, for 57 tests
total
- Bootstrap workflow is reintroduced temporarily to regenerate all
baselines on dfx01; removed in a follow-up commit before
ready-for-review
- 1 page (`web_view_page.dart`) is intentionally marked `skip: true`:
InAppWebView is a platform view that has no headless render
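
A skipped golden test of the kind described in the last bullet might look roughly like this. It is a sketch under assumptions: the description, `fileName`, and placeholder body are hypothetical, and the real test would build the actual web view page once a stub for the platform view exists.

```dart
import 'package:alchemist/alchemist.dart';
import 'package:flutter/material.dart';

void main() {
  goldenTest(
    'web_view_page renders in default state', // hypothetical description
    fileName: 'web_view_page_default', // hypothetical file name
    // Skipped because InAppWebView is a platform view with no headless
    // render. Setting this to false activates the test once a stub exists.
    skip: true,
    builder: () => const SizedBox(
      width: 390,
      height: 844,
      // Placeholder standing in for the real page widget.
      child: Placeholder(),
    ),
  );
}
```

Keeping the file in the tree with `skip: true` means the page still counts toward the 57 test files while producing no baseline PNG.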

## Bucket breakdown
- **Onboarding + wallet** (11): create, restore, verify_seed, home,
onboarding_completed, pin x2, hw bitbox, legal x2, debug_auth
- **Settings subpages** (17): languages, currencies, network, seed,
tax_report, contact, wallet_address, legal_docs x3, user_data x7
- **KYC** (15): 2fa, email x2, financial_data x4, ident, nationality,
registration, subpages x5
- **Trading + support + misc** (9): receive, tx_history, sell_bitbox,
sell_bank_account_selection, support x4, web_view (skipped)

## Test plan
- [ ] Bootstrap workflow runs on push, generates 56 baseline PNGs (57
minus web_view)
- [ ] Baselines reviewed visually, committed to
test/goldens/screens/**/goldens/macos/
- [ ] golden-tests CI job green on the same commit
- [ ] Bootstrap workflow removed before ready-for-review

State-variant goldens (loaded/error/loading per screen) are out of scope
here — they follow as separate per-feature PRs.
TaprootFreak added a commit that referenced this pull request May 25, 2026
…tacked on #541) (#562)

## Summary

Stacked on #541 (which is itself the merge bus for #552). Two changes
that together unblock unifying the Maestro handbook screenshots with the
Golden baselines (see plan at
`~/Documents/Claude/realunit-handbook-unification-plan.md`):

1. **Locale switch en → de**: `wrapForGolden` defaulted to
`Locale('en')`, so all 59 current Goldens render in English. The Maestro
handbook pins the simulator to `de_CH` and captures German UI — the two
pipelines cannot share images while they speak different languages.
2. **Handbook gap coverage**: three new Goldens for handbook pages that
had no Golden equivalent:
- `create_wallet_page_revealed` — handbook 05-seed-revealed (state
variant of `state.hideSeed=false`)
- `settings_seed_page_revealed` — handbook 19-settings-seed-revealed
(`showSeed=true`)
- `settings_confirm_logout_wallet_sheet_default` — handbook
24-settings-delete-wallet (modal in initial unchecked state)
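
To illustrate the locale switch in item 1, a wrapper with a German default could look like the sketch below. The body and signature of `wrapForGolden` are assumptions, since the real helper is not shown in this PR; the app's own generated localization delegate is omitted and would need to be added for app strings to render in German.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_localizations/flutter_localizations.dart';

/// Hypothetical reconstruction of a `wrapForGolden` helper. Only the default
/// locale (previously `Locale('en')`, now `Locale('de')`) is taken from the
/// PR text; everything else is an assumption.
Widget wrapForGolden(Widget child, {Locale locale = const Locale('de')}) {
  return MaterialApp(
    debugShowCheckedModeBanner: false,
    locale: locale,
    supportedLocales: const [Locale('de'), Locale('en')],
    // Framework delegates only; the app-specific delegate is not shown here.
    localizationsDelegates: GlobalMaterialLocalizations.delegates,
    home: child,
  );
}
```

A test that still needs English output could then pass `locale: const Locale('en')` explicitly instead of relying on the default.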

## Mapping audit (Phase 0)

Verified against `.maestro/handbook/*.yaml`:

| Handbook page | Golden | Status |
|---|---|---|
| 01–09, 11–16, 18, 20–23, 25 | existing | ✅ |
| 05-seed-revealed | new | ✅ this PR |
| 17-settings-backup-pin | none | ⚠️ deferred (state variant of `verify_pin_page`, needs context-aware test setup) |
| 19-settings-seed-revealed | new | ✅ this PR |
| 24-settings-delete-wallet | new | ✅ this PR |
| 26-terms | `legal_document_page_default` | ⚠️ to verify visually whether the bound content matches |
| **10-biometric-prompt** | none | ❌ **out of scope**: iOS system bottom sheet from `LocalAuthentication`, not rendered by Flutter, so Skia cannot reproduce it. Will be discussed before Phase 1 (Dockerfile.handbook switch). |

## BackdropFilter validation

The existing `settings_seed_page_default` Golden already proves that
Flutter's headless Skia renders `BackdropFilter` correctly (the blur is
visible, not the historic XCUITest-black-PNG issue). Same applies to the
new revealed/hidden state variants and the `create_wallet_view`'s
`SeedBlurCard`.

## Bootstrap workflow

`.github/workflows/golden-bootstrap.yaml` is re-introduced temporarily,
triggered by push to this branch. It runs `flutter test test/goldens
--update-goldens` on the `realunit-app` self-hosted dfx01 runner and
uploads the regenerated PNGs as `golden-baselines`. I download the
artifact, commit the baselines into
`test/goldens/screens/**/goldens/macos/`, then delete the bootstrap
workflow file in a follow-up commit — same pattern as the pilot PR.

## Test plan

- [ ] `golden-bootstrap` workflow run completes green on dfx01
- [ ] Baselines downloaded + committed
- [ ] `golden-bootstrap.yaml` removed
- [ ] `Visual Regression` job in pull-request.yaml green on final commit
- [ ] Spot-check sample DE Goldens visually match the handbook
screenshots
- [ ] Decide on `10-biometric-prompt` and `17-settings-backup-pin`
before promoting to ready-for-review

## Out of scope

- `Dockerfile.handbook` switch from `docs/handbook/screenshots/` to
`test/goldens/` (Phase 1 of the unification plan)
- Maestro pipeline retirement / nightly-only mode (Phase 2)
TaprootFreak added a commit that referenced this pull request May 25, 2026
…unner (#541)

## Summary

- Introduces visual-regression goldens for **every
`lib/screens/**/*_page.dart`** in the repo (56 of 57 rendered, 1
explicit `skip: true`)
- Render host is the dfx01 self-hosted runner (Mac Studio M3 Ultra,
labels `self-hosted, macOS, ARM64, m3-ultra, realunit-app`). Pinning the
hardware keeps Skia/CoreText state identical between baseline
generation and validation
- Stack: [alchemist](https://pub.dev/packages/alchemist) 0.14.0, Open
Sans (SIL OFL 1.1) committed as an asset (the previous system-font
fallback wasn't deterministic across hosts)
- New CI job `golden-tests` in `.github/workflows/pull-request.yaml`
runs in parallel to `build`; `build` passes `--exclude-tags golden`
so the visual-regression tests stay confined to the dfx01 runner

## Coverage

- **Onboarding + wallet lifecycle (11):** create_wallet, restore_wallet,
verify_seed, home, onboarding_completed, setup_pin, verify_pin, hw
bitbox, legal x2, debug_auth
- **Settings + subpages (17):** languages, currencies, network, seed,
tax_report, contact, wallet_address, legal_docs x3, user_data x7
- **KYC (15):** 2fa, email x2, financial_data x4, ident, nationality,
registration, subpages x5 (account_merge, completed, failure, loading,
pending)
- **Trading + support + misc (9):** receive, transaction_history,
sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped
— InAppWebView is a platform-view with no headless render)
- **Pilot 5:** welcome (iOS + Android theme variants), dashboard,
settings, buy (initial + payment-info-loaded), sell (no-account +
with-balance)

**Total: 57 test files, 59 baseline PNGs.**
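
The totals can be cross-checked with a small tally. This sketch assumes one test file per pilot screen and one baseline PNG per listed pilot variant; the bucket counts are copied from the coverage list above.

```dart
/// New tests per bucket, as listed in the coverage section.
const newTestsPerBucket = {
  'onboarding + wallet lifecycle': 11,
  'settings + subpages': 17,
  'kyc': 15,
  'trading + support + misc': 9, // includes the skipped web_view
};

/// Baseline PNGs per pilot screen, assuming one PNG per listed variant.
const pilotPngsPerScreen = {
  'welcome': 2, // iOS + Android theme variants
  'dashboard': 1,
  'settings': 1,
  'buy': 2, // initial + payment-info-loaded
  'sell': 2, // no-account + with-balance
};

int sum(Iterable<int> xs) => xs.fold(0, (a, b) => a + b);

/// 52 new test files plus one file per pilot screen.
int totalTestFiles() =>
    sum(newTestsPerBucket.values) + pilotPngsPerScreen.length;

/// Every new test yields one PNG except the skipped web_view,
/// plus the pilot variants.
int totalBaselinePngs() =>
    sum(newTestsPerBucket.values) - 1 + sum(pilotPngsPerScreen.values);

void main() {
  print('test files: ${totalTestFiles()}');
  print('baseline PNGs: ${totalBaselinePngs()}');
}
```

Under these assumptions the tally reproduces both figures: 52 + 5 = 57 test files and 51 + 8 = 59 baseline PNGs.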

## Verified

- Baselines generated on dfx01 via the (now removed)
`golden-bootstrap.yaml` workflow, downloaded and committed
- `golden-tests` CI ran green on the stacked PR #552 ([run
26342855405](https://github.com/DFXswiss/realunit-app/actions/runs/26342855405)),
proving the committed baselines match a fresh render on dfx01
- Drift detection verified during the pilot phase: a probe pixel change
in `realUnitBlue` flipped CI red and uploaded
master/test/maskedDiff/isolatedDiff PNGs as a `golden-diffs` artifact
- Local `flutter analyze` clean, `flutter test --exclude-tags golden`
passes (2148/2148)

## Documentation

- [`docs/visual-regression-tests.md`](https://github.com/DFXswiss/realunit-app/blob/feat/visual-regression-pilot/docs/visual-regression-tests.md):
bootstrap pattern, drift workflow, Flutter-bump regeneration,
dfx01-outage fallback
- Runner setup + tooling docs in
[DFXServer/server@develop](https://github.com/DFXServer/server/blob/develop/infrastructure/dfx01/actions-runners/realunit-app-tooling.md)

State-variant goldens (loaded/error/loading per screen) are out of scope
here — they follow as separate per-feature PRs.

## Follow-ups after merge

- Set `golden-tests` as a required status check on develop branch
protection