feat(tests): scale visual regression to all 57 screens (stacked on #541) by TaprootFreak · Pull Request #552 · RealUnitCH/app

TaprootFreak · 2026-05-23T19:40:09Z

Summary

- Stacked on top of #541 (visual-regression pilot). Do not merge until #541 is merged; this PR then rebases against develop.
- Adds 52 new golden tests on top of the 5 pilot tests, one per `lib/screens/**/*_page.dart` file in its default/initial state, for 57 tests total.
- The bootstrap workflow is temporarily reintroduced to regenerate all baselines on dfx01; it is removed in a follow-up commit before the PR is marked ready for review.
- One page (`web_view_page.dart`) is intentionally marked `skip: true`: InAppWebView is a platform view and cannot be rendered headlessly.

Bucket breakdown

- Onboarding + wallet (11): create, restore, verify_seed, home, onboarding_completed, pin x2, hw bitbox, legal x2, debug_auth
- Settings subpages (17): languages, currencies, network, seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7
- KYC (15): 2fa, email x2, financial_data x4, ident, nationality, registration, subpages x5
- Trading + support + misc (9): receive, tx_history, sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped)

Test plan

- [ ] Bootstrap workflow runs on push and generates baselines for the 56 rendered page files (57 minus web_view)
- [ ] Baselines reviewed visually and committed to test/goldens/screens/**/goldens/macos/
- [ ] golden-tests CI job green on the same commit
- [ ] Bootstrap workflow removed before ready-for-review

State-variant goldens (loaded/error/loading per screen) are out of scope here — they follow as separate per-feature PRs.

Adds 52 new golden tests on top of the pilot 5, covering every page file under lib/screens/**:

- Bucket 1: onboarding + wallet lifecycle (11 tests: create, restore, verify_seed, home, onboarding_completed, pin x2, hw bitbox, legal x2, debug_auth)
- Bucket 2: settings subpages (17 tests: languages, currencies, network, seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7)
- Bucket 3: KYC (15 tests: 2fa, email x2, financial_data x4, ident, nationality, registration, subpages x5)
- Bucket 4: trading + support + misc (9 tests: receive, tx_history, sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped))

One page, one golden, captured in its default/initial state. State-variant goldens follow as a separate PR per feature when the team finds them necessary.

web_view_page.dart is the only skipped target: InAppWebView is a platform view with no headless rendering in flutter_test. The test file is committed with skip: true so it can be enabled as soon as a stub is introduced.

The bootstrap workflow is temporarily restored so the dfx01 runner can regenerate the baseline set on push; it is removed in a follow-up commit once the artifact is committed.
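A per-page golden file in this layout might look roughly like the sketch below, assuming alchemist's `goldenTest` API. `WebViewPage` is a hypothetical placeholder widget standing in for the real page (which embeds InAppWebView), and the description and `fileName` values are illustrative rather than the ones in the repo. The sketch spells out the `golden` tag so that `flutter test --exclude-tags golden` in the `build` job leaves it alone.

```dart
import 'package:alchemist/alchemist.dart';
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical stand-in for lib/screens/.../web_view_page.dart. The real page
// embeds InAppWebView, a platform view flutter_test cannot render headlessly.
class WebViewPage extends StatelessWidget {
  const WebViewPage({super.key});

  @override
  Widget build(BuildContext context) => const Placeholder();
}

void main() {
  goldenTest(
    'web_view_page renders in its default state',
    fileName: 'web_view_page_default',
    // Registered but not executed until a headless web-view stub exists.
    skip: true,
    // Tagged so `flutter test --exclude-tags golden` keeps it off the build job.
    tags: const ['golden'],
    builder: () => const WebViewPage(),
  );
}
```

Enabling it later would mean removing `skip: true`, swapping the placeholder for the real page plus a stub, and generating its baseline in a bootstrap run like the others.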

Two failure classes turned up on the first dfx01 bootstrap run:

1. MissingPluginException for no_screenshot's com.flutterplaza.no_screenshot_methods channel (create_wallet, settings_seed). Stubbed via TestDefaultBinaryMessengerBinding in the per-file setUpAll so the call returns true and the test continues.
2. pumpAndSettle never settles on loading-state pages (kyc_loading, kyc_financial_data, kyc_financial_data_loading, settings_edit_loading, support_chat, sell_bitbox), because their CircularProgressIndicator/CupertinoActivityIndicator animates indefinitely. Switched pumpBeforeTest to alchemist's pumpOnce so the first frame is captured instead of waiting for the animation to finish.
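Both workarounds can be sketched in one test file, assuming flutter_test's `TestDefaultBinaryMessengerBinding` mock API and alchemist's `pumpOnce` helper. `LoadingPage` is a hypothetical stand-in for the real loading-state pages; only the channel name comes from the failure above. Answering every call with `true` is the blunt stub the description mentions, not a faithful model of the plugin.

```dart
import 'package:alchemist/alchemist.dart';
import 'package:flutter/material.dart';
import 'package:flutter/services.dart';
import 'package:flutter_test/flutter_test.dart';

// Channel name taken from the MissingPluginException seen on dfx01.
const noScreenshotChannel =
    MethodChannel('com.flutterplaza.no_screenshot_methods');

// Hypothetical loading-state page: the indeterminate indicator animates
// forever, so pumpAndSettle never observes an idle frame.
class LoadingPage extends StatelessWidget {
  const LoadingPage({super.key});

  @override
  Widget build(BuildContext context) =>
      const Center(child: CircularProgressIndicator());
}

void main() {
  TestWidgetsFlutterBinding.ensureInitialized();

  setUpAll(() {
    // Answer every call on the plugin channel with `true` instead of
    // throwing MissingPluginException in the headless test environment.
    TestDefaultBinaryMessengerBinding.instance.defaultBinaryMessenger
        .setMockMethodCallHandler(noScreenshotChannel, (call) async => true);
  });

  tearDownAll(() {
    // Remove the stub so it cannot leak into other suites.
    TestDefaultBinaryMessengerBinding.instance.defaultBinaryMessenger
        .setMockMethodCallHandler(noScreenshotChannel, null);
  });

  goldenTest(
    'loading page captures its first frame',
    fileName: 'loading_page_default',
    // pumpOnce pumps a single frame; settling would never complete here.
    pumpBeforeTest: pumpOnce,
    builder: () => const LoadingPage(),
  );
}
```

The trade-off is deliberate: the golden shows the indicator's first animation frame, which is stable from run to run, whereas waiting for the page to settle cannot terminate.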

Generated by golden-bootstrap run 26341918780 on dfx01. The authoritative baseline set now totals 59 PNGs covering 56 of 57 page files (web_view stays skipped). The bootstrap workflow is removed; from here on, the standard golden-tests CI job is the permanent validation entry point.
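The totals fit together once the pilot's variants are counted. The sketch below is plain Dart arithmetic over figures from this PR and the pilot coverage list in #541; it assumes each pilot variant is stored as its own PNG (which is what makes the numbers match), and the map keys are illustrative labels.

```dart
/// Recomputes the coverage totals from the bucket sizes in this PR and the
/// pilot breakdown in #541. Assumption: every pilot variant is a separate PNG.
Map<String, int> baselineCounts() {
  const newTestsPerBucket = {
    'onboarding + wallet': 11,
    'settings subpages': 17,
    'kyc': 15,
    'trading + support + misc': 9, // includes the skipped web_view
  };
  const pilotPngsPerFile = {
    'welcome': 2, // iOS + Android theme variants
    'dashboard': 1,
    'settings': 1,
    'buy': 2, // initial + payment-info-loaded
    'sell': 2, // no-account + with-balance
  };
  const skipped = 1; // web_view_page.dart

  final newTests = newTestsPerBucket.values.fold<int>(0, (a, b) => a + b);
  final testFiles = newTests + pilotPngsPerFile.length;
  // New tests store one default-state PNG each; pilot files one per variant.
  final pngs = (newTests - skipped) +
      pilotPngsPerFile.values.fold<int>(0, (a, b) => a + b);

  return {
    'newTests': newTests,
    'testFiles': testFiles,
    'renderedFiles': testFiles - skipped,
    'pngs': pngs,
  };
}

void main() {
  // prints {newTests: 52, testFiles: 57, renderedFiles: 56, pngs: 59}
  print(baselineCounts());
}
```

So 56 rendered page files yield 59 PNGs: 51 from the new default-state tests plus 8 from the pilot variants.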


…tacked on #541) (#562)

## Summary

Stacked on #541 (which is itself the merge bus for #552). Two changes that together unblock unifying the Maestro handbook screenshots with the Golden baselines (see plan at `~/Documents/Claude/realunit-handbook-unification-plan.md`):

1. **Locale switch en → de**: `wrapForGolden` defaulted to `Locale('en')`, so all 59 current Goldens render in English. The Maestro handbook pins the simulator to `de_CH` and captures German UI; the two pipelines cannot share images while they render different languages.
2. **Handbook gap coverage**: three new Goldens for handbook pages that had no Golden equivalent:
   - `create_wallet_page_revealed`: handbook 05-seed-revealed (state variant with `state.hideSeed=false`)
   - `settings_seed_page_revealed`: handbook 19-settings-seed-revealed (`showSeed=true`)
   - `settings_confirm_logout_wallet_sheet_default`: handbook 24-settings-delete-wallet (modal in its initial unchecked state)

## Mapping audit (Phase 0)

Verified against `.maestro/handbook/*.yaml`:

| Handbook page | Golden | Status |
|---|---|---|
| 01–09, 11–16, 18, 20–23, 25 | existing | ✅ |
| 05-seed-revealed | new | ✅ this PR |
| 17-settings-backup-pin | (none) | ⚠️ deferred (state variant of `verify_pin_page`, needs context-aware test setup) |
| 19-settings-seed-revealed | new | ✅ this PR |
| 24-settings-delete-wallet | new | ✅ this PR |
| 26-terms | `legal_document_page_default` | ⚠️ needs a visual check that the bound content matches |
| **10-biometric-prompt** | (none) | ❌ **out of scope**: iOS system bottom sheet from `LocalAuthentication`, not rendered by Flutter, so Skia cannot reproduce it. To be discussed before Phase 1 (Dockerfile.handbook switch). |

## BackdropFilter validation

The existing `settings_seed_page_default` Golden already shows that Flutter's headless Skia renders `BackdropFilter` correctly: the blur is visible, with none of the historic XCUITest black-PNG issue. The same applies to the new revealed/hidden state variants and to `create_wallet_view`'s `SeedBlurCard`.

## Bootstrap workflow

`.github/workflows/golden-bootstrap.yaml` is temporarily re-introduced, triggered by pushes to this branch. It runs `flutter test test/goldens --update-goldens` on the `realunit-app` self-hosted dfx01 runner and uploads the regenerated PNGs as the `golden-baselines` artifact. I download the artifact, commit the baselines into `test/goldens/screens/**/goldens/macos/`, then delete the bootstrap workflow file in a follow-up commit, following the same pattern as the pilot PR.

## Test plan

- [ ] `golden-bootstrap` workflow run completes green on dfx01
- [ ] Baselines downloaded and committed
- [ ] `golden-bootstrap.yaml` removed
- [ ] `Visual Regression` job in pull-request.yaml green on the final commit
- [ ] Spot-check that sample DE Goldens visually match the handbook screenshots
- [ ] Decide on `10-biometric-prompt` and `17-settings-backup-pin` before promoting to ready-for-review

## Out of scope

- `Dockerfile.handbook` switch from `docs/handbook/screenshots/` to `test/goldens/` (Phase 1 of the unification plan)
- Maestro pipeline retirement / nightly-only mode (Phase 2)
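The locale switch could look roughly like the sketch below. `wrapForGolden` is named in the description but its body is not shown, so this signature (a `locale` parameter defaulting to `Locale('de')`) is a hypothetical reconstruction, and it assumes the `flutter_localizations` package is a dependency. The real helper presumably also registers the app's own generated localizations and theme.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_localizations/flutter_localizations.dart';

/// Hypothetical sketch of the golden wrapper. Defaulting to German makes
/// every golden match the de_CH UI that the Maestro handbook captures.
Widget wrapForGolden(
  Widget child, {
  Locale locale = const Locale('de'),
}) {
  return MaterialApp(
    debugShowCheckedModeBanner: false,
    locale: locale,
    supportedLocales: const [Locale('de'), Locale('en')],
    // Material/Cupertino/Widgets delegates; the app's generated
    // localizations delegate would be appended here as well.
    localizationsDelegates: GlobalMaterialLocalizations.delegates,
    home: child,
  );
}
```

Keeping the locale a parameter means a test that still needs English output can pass `locale: const Locale('en')` explicitly, while everything else inherits the handbook language.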

…unner (#541)

## Summary

- Introduces visual-regression goldens for **every `lib/screens/**/*_page.dart`** in the repo (56 of 57 rendered, 1 explicit `skip: true`)
- Render host is the dfx01 self-hosted runner (Mac Studio M3 Ultra, labels `self-hosted, macOS, ARM64, m3-ultra, realunit-app`). Pinning the hardware keeps Skia/CoreText state identical between baseline generation and validation
- Stack: [alchemist](https://pub.dev/packages/alchemist) 0.14.0, with Open Sans (SIL OFL 1.1) committed as an asset (the previous system-font fallback was not deterministic across hosts)
- New CI job `golden-tests` in `.github/workflows/pull-request.yaml` runs in parallel to `build`; `build` passes `--exclude-tags golden` so the visual-regression tests stay confined to the dfx01 runner

## Coverage

- **Onboarding + wallet lifecycle (11):** create_wallet, restore_wallet, verify_seed, home, onboarding_completed, setup_pin, verify_pin, hw bitbox, legal x2, debug_auth
- **Settings + subpages (17):** languages, currencies, network, seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7
- **KYC (15):** 2fa, email x2, financial_data x4, ident, nationality, registration, subpages x5 (account_merge, completed, failure, loading, pending)
- **Trading + support + misc (9):** receive, transaction_history, sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped: InAppWebView is a platform view with no headless render)
- **Pilot 5:** welcome (iOS + Android theme variants), dashboard, settings, buy (initial + payment-info-loaded), sell (no-account + with-balance)

**Total: 57 test files, 59 baseline PNGs.**

## Verified

- Baselines generated on dfx01 via the (now removed) `golden-bootstrap.yaml` workflow, downloaded and committed
- `golden-tests` CI ran green on the stacked PR #552 ([run 26342855405](https://github.com/DFXswiss/realunit-app/actions/runs/26342855405)), confirming that the committed baselines match a fresh render on dfx01
- Drift detection verified during the pilot phase: a probe pixel change in `realUnitBlue` flipped CI red and uploaded master/test/maskedDiff/isolatedDiff PNGs as a `golden-diffs` artifact
- Local `flutter analyze` clean; `flutter test --exclude-tags golden` passes (2148/2148)

## Documentation

- [`docs/visual-regression-tests.md`](https://github.com/DFXswiss/realunit-app/blob/feat/visual-regression-pilot/docs/visual-regression-tests.md): bootstrap pattern, drift workflow, Flutter-bump regeneration, dfx01-outage fallback
- Runner setup + tooling docs in [DFXServer/server@develop](https://github.com/DFXServer/server/blob/develop/infrastructure/dfx01/actions-runners/realunit-app-tooling.md)

State-variant goldens (loaded/error/loading per screen) are out of scope here; they follow as separate per-feature PRs.

## Follow-ups after merge

- Set `golden-tests` as a required status check on develop branch protection
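One way the committed `goldens/macos/` layout could arise is an alchemist config that keeps platform goldens (which alchemist stores under a per-host-OS directory, so `macos/` on dfx01) and turns off its CI goldens mode. The PR does not show its `flutter_test_config.dart`, so the file below is a sketch under that assumption, and `goldenConfig` is a hypothetical name.

```dart
// test/flutter_test_config.dart: `flutter test` wraps every test file in
// this directory tree with testExecutable.
import 'dart:async';

import 'package:alchemist/alchemist.dart';

// Hypothetical name. CI goldens (text obscured as blocks, stored under
// goldens/ci/) are disabled; platform goldens keep their default and produce
// font-rendered PNGs under goldens/<host os>/, i.e. goldens/macos/ on dfx01.
const goldenConfig = AlchemistConfig(
  ciGoldensConfig: CiGoldensConfig(enabled: false),
);

Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  return AlchemistConfig.runWithConfig(
    config: goldenConfig,
    run: testMain,
  );
}
```

Since platform goldens depend on the host's font rendering, this setup only stays deterministic because the `golden-tests` job is pinned to the same dfx01 hardware that produced the baselines.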

TaprootFreak added 3 commits May 23, 2026 21:39

TaprootFreak marked this pull request as ready for review May 23, 2026 19:52

This was referenced May 23, 2026:

- Visual regression: scale from 5-screen pilot to every *_page.dart #547 (Open)
- Meta: complete test coverage across every tier and surface #550 (Open)

TaprootFreak merged commit 5613899 into feat/visual-regression-pilot May 23, 2026
4 checks passed

TaprootFreak deleted the feat/visual-regression-scale branch May 23, 2026 20:56

This was referenced May 23, 2026:

- feat(tests): visual-regression coverage for all 57 screens on dfx01 runner #541 (Merged)
- test(goldens): switch default locale to de + handbook gap coverage (stacked on #541) #562 (Merged)

TaprootFreak merged 3 commits into feat/visual-regression-pilot from feat/visual-regression-scale
