feat(tests): visual-regression coverage for all 57 screens on dfx01 runner by TaprootFreak · Pull Request #541 · RealUnitCH/app

TaprootFreak · 2026-05-23T18:56:16Z

Summary

Introduces visual-regression goldens for every `lib/screens/**/*_page.dart` in the repo (56 of 57 rendered, 1 explicit `skip: true`)
Render host is the dfx01 self-hosted runner (Mac Studio M3 Ultra, labels `self-hosted, macOS, ARM64, m3-ultra, realunit-app`); pinning the hardware keeps Skia/CoreText state identical between baseline generation and validation
Stack: alchemist 0.14.0, Open Sans (SIL OFL 1.1) committed as an asset (the previous system-font fallback wasn't deterministic across hosts)
New CI job `golden-tests` in `.github/workflows/pull-request.yaml` runs in parallel to `build`; `build` passes `--exclude-tags golden` so the visual-regression tests stay confined to the dfx01 runner

Coverage

Onboarding + wallet lifecycle (11): create_wallet, restore_wallet, verify_seed, home, onboarding_completed, setup_pin, verify_pin, hw bitbox, legal x2, debug_auth
Settings + subpages (17): languages, currencies, network, seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7
KYC (15): 2fa, email x2, financial_data x4, ident, nationality, registration, subpages x5 (account_merge, completed, failure, loading, pending)
Trading + support + misc (9): receive, transaction_history, sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped — InAppWebView is a platform-view with no headless render)
Pilot 5: welcome (iOS + Android theme variants), dashboard, settings, buy (initial + payment-info-loaded), sell (no-account + with-balance)

Total: 57 test files, 59 baseline PNGs.
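
The two totals can be cross-checked from the bucket sizes above. The sketch below is only a worked tally: the dictionary keys are made-up labels, and it assumes one baseline PNG per non-pilot page (the stacked PR describes "1 per page, default/initial state") plus the pilot variants listed under "Pilot 5".

```python
# Test files per bucket, as listed in the Coverage section.
buckets = {
    "onboarding_wallet": 11,
    "settings": 17,
    "kyc": 15,
    "trading_support_misc": 9,  # includes the skipped web_view test
    "pilot": 5,
}

# Pilot baselines: welcome has two theme variants, buy and sell have
# two states each, dashboard and settings have one.
pilot_baselines = {"welcome": 2, "dashboard": 1, "settings": 1, "buy": 2, "sell": 2}

SKIPPED = 1  # web_view_page.dart is `skip: true` and renders no PNG

test_files = sum(buckets.values())
rendered_pages = test_files - SKIPPED

# Assumption: every non-pilot page contributes exactly one PNG.
non_pilot_baselines = test_files - buckets["pilot"] - SKIPPED
baseline_pngs = non_pilot_baselines + sum(pilot_baselines.values())

print(test_files, rendered_pages, baseline_pngs)
```

Under these assumptions the tally reproduces the 57 test files, 56 rendered pages and 59 baseline PNGs stated above.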

Verified

Baselines generated on dfx01 via the (now removed) `golden-bootstrap.yaml` workflow, downloaded and committed
`golden-tests` CI ran green on the stacked PR #552, "feat(tests): scale visual regression to all 57 screens (stacked on #541)" (run 26342855405), proving that the committed baselines match a fresh render on dfx01
Drift detection verified during the pilot phase: a probe pixel change in `realUnitBlue` flipped CI red and uploaded master/test/maskedDiff/isolatedDiff PNGs as a `golden-diffs` artifact
Local `flutter analyze` clean, `flutter test --exclude-tags golden` passes (2148/2148)

Documentation

`docs/visual-regression-tests.md` — bootstrap pattern, drift workflow, Flutter-bump regeneration, dfx01-outage fallback
Runner setup + tooling docs in DFXServer/server@develop

State-variant goldens (loaded/error/loading per screen) are out of scope here — they follow as separate per-feature PRs.

Follow-ups after merge

Set `golden-tests` as a required status check on develop branch protection

…) (#552)

## Summary

- Stacked on top of #541 (visual-regression pilot). **Do not merge until #541 is merged**, then this PR rebases against develop.
- Adds 52 new golden tests covering every `lib/screens/**/*_page.dart` (1 per page, default/initial state) on top of the pilot 5, for 57 tests total
- Bootstrap workflow is reintroduced temporarily to regenerate all baselines on dfx01; removed in a follow-up commit before ready-for-review
- 1 page (`web_view_page.dart`) is intentionally `skip: true`: InAppWebView is a platform-view that has no headless render

## Bucket breakdown

- **Onboarding + wallet** (11): create, restore, verify_seed, home, onboarding_completed, pin x2, hw bitbox, legal x2, debug_auth
- **Settings subpages** (17): languages, currencies, network, seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7
- **KYC** (15): 2fa, email x2, financial_data x4, ident, nationality, registration, subpages x5
- **Trading + support + misc** (9): receive, tx_history, sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped)

## Test plan

- [ ] Bootstrap workflow runs on push, generates 56 baseline PNGs (57 minus web_view)
- [ ] Baselines reviewed visually, committed to `test/goldens/screens/**/goldens/macos/`
- [ ] golden-tests CI job green on the same commit
- [ ] Bootstrap workflow removed before ready-for-review

State-variant goldens (loaded/error/loading per screen) are out of scope here; they follow as separate per-feature PRs.

Introduces pixel-exact baseline tests for Welcome, Dashboard, Settings, Buy, and Sell (8 baselines total), rendered on the dfx01 self-hosted runner (Mac Studio M3 Ultra).

Stack:

- alchemist 0.14.0 (Betterment) as the goldens framework
- Open Sans (SIL OFL 1.1) committed as an asset; previously fell back to the system font, which broke determinism across hosts
- New CI job 'golden-tests' parallel to 'build', runs on the realunit-app self-hosted runner so Skia/CoreText state is identical between baseline generation and validation
- 'build' job now passes --exclude-tags golden so goldens stay confined to the dedicated runner and don't false-fail on macos-latest

Baselines are bootstrapped in this PR via the temporary golden-bootstrap.yaml workflow (removed in the same PR before ready-for-review). Day-to-day workflow, drift handling, Flutter-bump process, and dfx01-outage fallback are documented in docs/visual-regression-tests.md.

Scaling to all 57 page-files is the explicit follow-up after this pilot validates the pipeline end-to-end.

… can self-bootstrap

Generated by the golden-bootstrap workflow run 26340878734. These 8 PNGs are the authoritative pixel baselines for the pilot — every PR's golden-tests job validates against them.

…gressions

- Revert lib/styles/colors.dart drift commit (25→50 R-channel) that was used to verify CI catches pixel regressions. Drift caught by the golden-tests job as expected, diff artifact uploaded with master/test/maskedDiff/isolatedDiff PNGs.
- Remove .github/workflows/golden-bootstrap.yaml: initial baselines have been committed (run 26340878734), so the bootstrap workflow's purpose is fulfilled. From here, baseline regeneration is documented in docs/visual-regression-tests.md, sections "Adding a new golden test" and "Reacting to a CI drift".
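
To illustrate why a single colour-constant change flips a pixel-exact comparison, here is a minimal sketch of a pixel diff ratio. It is not alchemist's comparator. The image is a hypothetical flat list of RGB tuples, and only the R-channel values 25 and 50 come from the probe commit; the G and B values and the image size are made up.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def diff_ratio(baseline: List[Pixel], candidate: List[Pixel]) -> float:
    """Return the fraction of pixels that differ between two same-sized images."""
    if len(baseline) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    if not baseline:
        return 0.0
    changed = sum(1 for a, b in zip(baseline, candidate) if a != b)
    return changed / len(baseline)


# Hypothetical 100-pixel image: 20 pixels use the brand colour, the rest are white.
OLD_BLUE: Pixel = (25, 60, 120)  # R=25 before the probe commit (G/B are made up)
NEW_BLUE: Pixel = (50, 60, 120)  # R=50 after the probe commit
WHITE: Pixel = (255, 255, 255)

baseline = [OLD_BLUE] * 20 + [WHITE] * 80
candidate = [NEW_BLUE] * 20 + [WHITE] * 80

ratio = diff_ratio(baseline, candidate)
print(f"{ratio:.2%} of pixels differ")
```

A pixel-exact gate fails for any ratio above zero, which is why even a one-channel change in a shared colour is enough to turn the job red.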

Stacked on top of #541. Pure refactor: no functional change, no baseline touched.

## What changed

Five duplicated patterns across 35+ golden test files moved behind 3 new helper files re-exported from `test/helper/helper.dart`:

| Helper | Symbol(s) | Was inline in |
|---|---|---|
| `golden_constants.dart` | `phoneConstraints` | 33 files |
| `golden_mocks.dart` | `MockHomeBloc` | 5 |
| | `MockSettingsBloc` | 7 |
| | `MockAppStore` | 5 |
| | `MockDfxCountryService` | 4 |
| | `MockSoftwareWallet` | 3 |
| `golden_plugin_stubs.dart` | `stubNoScreenshotChannel()` | 2 |

Plus 32 unused `package:flutter/material.dart` imports removed (collateral of the phoneConstraints extraction).

**Net diff:** 44 insertions / 182 deletions across 44 files. Every new golden test from here saves ~10-15 lines of boilerplate.

## Verified

- `flutter analyze` clean
- `flutter test --exclude-tags golden`: 2148/2148 green
- `golden-tests` CI will run on this branch once #541 lands on develop (stacked PR limitation: `pull-request.yaml` triggers only on PRs targeting develop)

## After merge follow-up

After #541 lands on develop, this PR's base auto-rebases to develop and the `golden-tests` job validates that the refactor didn't break the committed baselines on dfx01.

…k stub

Investigation outcome after attempting to activate web_view_golden_test: flutter_inappwebview asserts InAppWebViewPlatform.instance is set on the very first build, and the abstract interface defines five platform-bound factory methods (controller, widget, cookie manager, etc.). Stubbing the method channel alone is not enough: a full InAppWebViewPlatform mock subclass tree is required, and even then the webview body renders as a blank rectangle (the native content can't exist headless). For a one-page edge case the cost doesn't justify it. Keeping the skip:true with a documented exit criterion is the honest call.

Also a tiny consistency fix in the same test file: use the shared phoneConstraints const from test/helper/ instead of an inline literal. The helper refactor missed this file because it was skip:true and the subagent skipped it.

## Summary

All three primary CI workflows (`pull-request.yaml`, `tier3-handbook.yaml`, `bitbox-simulator.yml`) were filtered with `pull_request.branches: [develop]`. That meant stacked PRs (e.g. `feature → integration → develop`) never triggered CI on the lower stack levels, so every regression was only caught once the stack collapsed to a develop PR. Concretely visible on PR #561 (`ci/coverage-floor-100` → `feat/visual-regression-pilot`): zero check-runs.

This PR drops the `branches:` filter on all three so every PR fires CI regardless of target branch. The actual cost controls (draft-gate, `tier3:full` label, `paths:` filter on bitbox-simulator) are unchanged.

## What changes

- **`pull-request.yaml`**: `branches: [develop]` removed from `pull_request:`. Block comment rewritten to explain why the branch filter is gone and what the remaining gates are (draft-gate, concurrency). `push: develop` kept as the authoritative post-merge run.
- **`tier3-handbook.yaml`**: `branches: [develop]` removed from `pull_request:`. Block comment updated: "label gate is the cost control, not the branch filter". `push: develop` kept.
- **`bitbox-simulator.yml`**: `branches: [develop]` removed from `pull_request:`. Comment rewritten: "`paths:` filter is the real cost control". No `push:` trigger here, unchanged.

## What stays the same

- Draft PRs still skip every job via `if: github.event.pull_request.draft == false` guards.
- Tier 3 still opt-in via `tier3:full` label.
- BitBox simulator still path-gated to hardware_wallet / wallet / bitbox files.
- `push: develop` post-merge verification on the two workflows that had it.
- Concurrency groups unchanged: stacked-PR runs land in different groups by PR number, so they don't cancel each other.

## Test plan

- [x] `python3 -c "import yaml; …"`: all three YAML files parse
- [ ] CI green on this PR itself (which now means it triggers on `pull_request` against develop AS WELL as on any future stacked PR)
- [ ] After merge: PR #561 (currently stacked on `feat/visual-regression-pilot`) should trigger CI on the next push or synchronize event
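
The effect of the removed filter can be modelled in a few lines. This is a simplified sketch, not GitHub's implementation: `pull_request_fires` is a hypothetical helper, and `fnmatch` only approximates GitHub's branch glob syntax. It relies on the documented behaviour that a `pull_request` `branches` filter is matched against the PR's base (target) branch.

```python
from fnmatch import fnmatch
from typing import List, Optional


def pull_request_fires(base_branch: str, branches_filter: Optional[List[str]]) -> bool:
    """Simplified model of a `pull_request` trigger.

    With a `branches` filter, the workflow runs only when the PR's base
    branch matches one of the patterns. Without a filter it runs for
    every base branch.
    """
    if branches_filter is None:
        return True
    return any(fnmatch(base_branch, pattern) for pattern in branches_filter)


# PR #561 targets feat/visual-regression-pilot rather than develop.
stacked_base = "feat/visual-regression-pilot"

before = pull_request_fires(stacked_base, ["develop"])  # old config: filtered
after = pull_request_fires(stacked_base, None)          # new config: no filter

print(f"before={before} after={after}")
```

In this model the stacked PR gets no run under the old filter and one run once the filter is gone, while PRs that target develop directly fire in both configurations.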

## Summary

After PR #539 closed wave 1-3 and brought scoped line coverage to **100.0 %** (4 751 / 4 751), the 3 pp safety buffer below the measured value no longer matches the "100 % rule" stated in `README.md`. Pin the floor at **100** so any regression, including a single accidentally-uncovered line, fails CI immediately.

When a future change genuinely needs to drop below 100 % (e.g. a Flutter SDK update that re-counts a defensive branch), use the `coverage:lower-floor` PR label so the regression is visible in the PR list rather than silently smuggled in.

Functions floor stays at 50 (placeholder, `flutter test --coverage` still emits no FN records).

## Stacked on `feat/visual-regression-pilot`

Stacking lets this PR land together with the visual-regression scale-out work. When that branch merges, this PR retargets to `develop` automatically (or `gh pr edit 560 --base develop`).

## Test plan

- [ ] `Coverage Floor Gate` passes at 100.0 % == 100 floor
- [ ] No other CI changes needed
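
A floor check of this kind can be sketched from the `LF` (lines found) and `LH` (lines hit) records of an LCOV tracefile. The repository's actual `Coverage Floor Gate` implementation is not shown here, so the function names and the sample tracefile below are assumptions; only the 4 751 line total comes from the description above.

```python
def line_coverage_percent(lcov_text: str) -> float:
    """Compute line coverage from LCOV tracefile text using its LF/LH records."""
    found = hit = 0
    for raw in lcov_text.splitlines():
        line = raw.strip()
        if line.startswith("LF:"):    # lines found (instrumented) in one source file
            found += int(line[3:])
        elif line.startswith("LH:"):  # lines hit at least once in that file
            hit += int(line[3:])
    if found == 0:
        raise ValueError("no LF records found")
    return 100.0 * hit / found


def passes_floor(lcov_text: str, floor: float) -> bool:
    """Compare the unrounded percentage against the floor."""
    return line_coverage_percent(lcov_text) >= floor


# Hypothetical tracefiles: fully covered, and with one uncovered line.
full = "SF:lib/a.dart\nLF:4751\nLH:4751\nend_of_record\n"
one_missing = "SF:lib/a.dart\nLF:4751\nLH:4750\nend_of_record\n"

print(passes_floor(full, 100), passes_floor(one_missing, 100))
```

Comparing the unrounded value matters here: 4 750 of 4 751 lines is about 99.98 %, which a one-decimal display would round to 100.0 % but which should still fail a floor of 100.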

…able (#565)

## Summary

`docs/testing.md:280` claimed `FakeBitboxBehavior.disconnect` throws `SigningCancelledException`. The fake actually throws `BitboxNotConnectedException` at `test/helper/fake_bitbox_credentials.dart:89`. Both are distinct exception classes serving distinct intents:

| Class | Defined at | Used for |
|---|---|---|
| `SigningCancelledException` | `lib/packages/wallet/exceptions/signing_cancelled_exception.dart` | User-cancel mid-sign |
| `BitboxNotConnectedException` | `lib/packages/service/dfx/exceptions/bitbox_exception.dart` | Disconnect / not-paired |

The cancel-flow code example immediately below the table (lines 284-295) is correct and stays unchanged: it tests the `cancel` behavior, which legitimately surfaces as `SigningCancelledException` through `Eip712Signer.signRegistration` when the fake returns `'0x'`.

## Discovered during

Deep audit of issue #542 (Tier 1 integration tests), verifying the cited Tier-1 doc references against the actual fake implementation.

## Test plan

- [x] `grep -n SigningCancelledException docs/testing.md` and `BitboxNotConnectedException`: confirmed only the one drifted line touched
- [x] `dart analyze`: markdown change, no Dart impact
- [ ] CI green

Stacked on top of #541. README-only: no code, no baseline touched.

## What changed

The repo's Feature Matrix (`README.md` Sections "Onboarding & authentication" / "Wallet actions" / "DFX backend integration" / "Settings" / "Support") didn't reflect the 57 golden tests added in #541. This PR closes that gap.

- **Legend:** added `golden` as a test type with a pointer to `docs/visual-regression-tests.md`
- **Every existing page row** gets its golden test path appended to the existing `widget`/`cubit`/`unit` entries
- **Two new rows** for pages that exist in the app but weren't in the previous matrix: Settings root, Support root
- **Five previously-empty rows** flipped from `—` to `golden (...)`: Sell to BitBox, Currencies/Languages/Network, Tax report, Contact, KYC AccountMergeRequested
- **Triage gaps:** new "Visual regression (goldens)" bullet documenting the gate, the `web_view` `skip: true` exception, and the cross-link to `docs/visual-regression-tests.md`
- **CI/CD table:** `pull-request.yaml` row now mentions `--exclude-tags golden` on the existing test step plus the parallel `Visual Regression` job on the dfx01 runner

## Verified

- `flutter analyze` clean
- Diff: 38 insertions / 35 deletions across one file (README.md)

## After merge

#541 stays as before; this PR can land before or after #541. If it lands first, GitHub auto-rebases #541 against the updated develop on next push.

Yesterday's bootstrap baked DateTime.now() into the rendered date fields, then today's CI run rendered with today's date and diffed. Fix that without skipping the tests:

- App code: use clock.now() from package:clock (already a runtime dep, pattern is already established in pin_auth_cubit, dfx_fiat_service, dfx_language_service). Touches:
  - lib/screens/settings_tax_report/settings_tax_report_page.dart
  - lib/screens/settings_tax_report/cubit/settings_tax_report_cubit.dart
  - lib/screens/transaction_history/transaction_history_page.dart
  - lib/screens/transaction_history/cubits/filter/transaction_history_filter_cubit.dart
  - lib/screens/transaction_history/cubits/filter/transaction_history_filter_state.dart
- transaction_history_page._todaysDate flips from `static final` to `static DateTime get` so cached lazy-init can't survive into a fixed-clock test zone.
- Tests: wrap the two affected goldenTest calls in withClock(Clock.fixed(DateTime.utc(2026, 5, 23)), () { ... }). Verified locally: --update-goldens green, plain re-run also green, pixel-deterministic across runs.
- Bootstrap workflow reintroduced temporarily; will be removed once the regenerated PNGs are committed.
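
The difference between a cached value and a getter under a scoped clock override can be shown with a small Python analogue. This is not the Dart code and not the `package:clock` API: `now`, `with_fixed_clock`, `CachedPage` and `GetterPage` are hypothetical names, and `contextvars` stands in for Dart zones. Python also evaluates the class attribute eagerly where Dart's `static final` is lazy, but the caching effect is the same.

```python
import contextvars
from datetime import datetime, timezone
from typing import Callable, TypeVar

T = TypeVar("T")

# Zone-like override: holds the clock that is active in the current context.
_clock = contextvars.ContextVar("clock", default=lambda: datetime.now(timezone.utc))


def now() -> datetime:
    """Read the time through the injectable clock instead of datetime.now()."""
    return _clock.get()()


def with_fixed_clock(fixed: datetime, body: Callable[[], T]) -> T:
    """Run body with now() pinned to `fixed`, then restore the previous clock."""
    token = _clock.set(lambda: fixed)
    try:
        return body()
    finally:
        _clock.reset(token)


class CachedPage:
    # Evaluated once, like a cached static field: a later fixed clock
    # cannot change the stored value.
    todays_date = now()


class GetterPage:
    @staticmethod
    def todays_date() -> datetime:
        # Evaluated on every call, so it sees whichever clock is active.
        return now()


FIXED = datetime(2026, 5, 23, tzinfo=timezone.utc)
cached, fresh = with_fixed_clock(
    FIXED, lambda: (CachedPage.todays_date, GetterPage.todays_date())
)
print(cached == FIXED, fresh == FIXED)
```

Only the getter variant observes the pinned date, which is the behaviour the golden tests need to render the same date on every run.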

…oses #556 follow-up) (#567)

## Summary

Drops the `Build-time feature-flag mechanism (analogous to EXPO_PUBLIC_ENABLE_* in dfx-wallet)` row from the Coverage infrastructure roadmap. Issue #556 closed this as not applicable after an audit; keeping the unchecked row would advertise a roadmap item we've decided not to build.

## Why

Three independent reasons (full writeup in the #556 close comment):

1. **Coverage-gating already solved**: `lcov --remove` + inline `// coverage:ignore-*` + Port-pattern refactors. The floor ratcheted from 51 → 94 (#538 / #539) without any env flag.
2. **The dfx-wallet precedent is itself an engineering tool**, not product/region/beta. Dart's `String.fromEnvironment` has no tree-shake equivalent.
3. **No realunit-app surface benefits**: no beta/experimental/staging screen exists, `/debugAuth` is already `kDebugMode`-gated, all 12 `defer` features intentionally ship.

## Re-add condition

If a future ship-but-hide feature appears (regulator-gated jurisdiction, TestFlight-only experiment, compile-time endpoint switch), re-add this row and open a fresh implementation issue.

## Test plan

- [x] Visual diff against the surrounding bullets (one line removed, no formatting change)
- [ ] CI green

…ove bootstrap Bootstrap run 26344939432 produced byte-identical PNGs for 57 of 59 goldens and new versions for the two that previously baked DateTime.now() (settings_tax_report, transaction_history). The latter now render with Clock.fixed(DateTime.utc(2026, 5, 23)) and are pixel-stable across runs.

…#569)

## Summary

Before this PR: `test/integration/` had exactly **1 file** (`kyc_sign_flow_test.dart`, 4 tests). Tier 1 (cubit + service + signer cross-layer, driven by `FakeBitboxCredentials` / `SimulatedBitboxPlatform`) was the documented thin spot of the test pyramid. This PR triples the breadth: **6 new specs, 23 new tests**, 100 % green, no production-code touched.

## What's in this PR

Three review-gated bundles, all merged into `test/tier1-expansion`:

### Bundle A: Sell + EIP-7702

- `sell_bitbox_flow_test.dart` (5 tests): real `SellBitboxCubit` + real `RealUnitSellPaymentInfoService` + `MockClient` HTTP boundary + `FakeBitboxCredentials`. Pins happy / cancel / disconnect / malformed-sig / **deposit-retry** (SellBitboxDepositRetry state with signedSwap + signedDeposit preserved across post-swap broadcast failure).
- `eip7702_delegation_bitbox_test.dart` (4 tests): `Eip712Signer.signDelegation` through `FakeBitboxCredentials`. Happy + cancel + disconnect + **chainId-wiring** across mainnet / sepolia / arbitrum.

### Bundle B: Connect + Wallet-Creation

- `connect_bitbox_flow_test.dart` (4 tests): real `ConnectBitboxCubit` + real `BitboxService` + real `BitboxManager` through `installSimulatedBitboxPlatform()` at the `BitboxUsbPlatform.instance` seam. Happy / pair-rejected / observer-disconnect / **re-pair-after-disconnect** (P461 #1 contract: `init()` re-attaches the SAME credentials instance, so `isConnected` flips `true → false → true` over a single reference).
- `wallet_creation_bitbox_test.dart` (3 tests): real `WalletService.createBitboxWallet` → real `WalletRepository` → Drift in-memory → real `BitboxService` + simulator. Happy (schema + `verifyNever(getOrCreateMnemonicKey)`) / hardware-failure (rollback) / **persistence-round-trip** (fresh `WalletService` on the same DB+Prefs cold-loads without touching the AES-key store).

### Bundle C: Auth + Reconnect

- `dfx_auth_sign_ceremony_bitbox_test.dart` (4 tests): real `DFXAuthService` + real `SessionCache` through the BitBox account boundary. Happy (cold-cache full ceremony) / cancel (no POST, `lockCurrentWallet` fires) / **timeout via fake_async** (3-min `_signMessageTimeout` cap) / 403 country-blocked propagation.
- `bitbox_reconnect_recovery_test.dart` (3 tests): real `BitboxService` observer + `SimulatedBitboxPlatform` + real `SellBitboxCubit.retryAfterConnection`. Disconnect-mid-sign-retry / observer-device-loss (credentials slot preserved) / reattach-after-init (P461 surface).

## Coverage impact

Tier 1 tests are **cross-layer behaviour pins**, not line-coverage drivers: every file the new tests touch is already at 100 % via the wave 1-3 sweep. The value is catching regressions that pass per-layer unit tests but break the seam between them (e.g. cubit catches the wrong exception type, sign-ceremony skips lock on cancel, re-pair returns a fresh credentials instance breaking the P461 contract).

## Test plan

- [x] `flutter analyze`: no issues
- [x] `flutter test test/integration/`: **27 tests green** (4 existing + 23 new)
- [ ] CI green (3 jobs in `pull-request.yaml`)

…ed on #568) (#570)

## Summary

Stacked on #568. Phase 1 of the Maestro → Goldens handbook unification plan: the 26 PNGs at `handbook.realunit.app/screenshots/` now come from the same Golden baselines the visual-regression CI verifies on every PR.

## Why

- **Single source of truth**: one pixel-checked baseline per handbook page. A UI regression that breaks a Golden also breaks the handbook image before either ships.
- **Determinism**: dfx01's headless Skia/Open Sans rendering is byte-stable across CI runs. Maestro's iOS-simulator screenshots drifted on Apple Silicon + iOS 26 driver hangs (mobile-dev-inc/Maestro#3137).
- **Cycle time**: the handbook image rebuilds when Goldens change, in seconds, with no 30-minute Maestro suite to refresh a page.

## What changed

- **`scripts/assemble-handbook-screenshots.sh`**: explicit `handbook-name → golden-path` mapping for all 26 pages (see PR #568 for the audit that built this table). Copies and renames Goldens into a flat output directory the Dockerfile then consumes. Fails loudly if a Golden goes missing.
- **`Dockerfile.handbook`**: multi-stage build. Stage 1 (alpine + bash) runs the assembly script. Stage 2 (nginx) copies the assembled screenshots over the legacy `docs/handbook/screenshots/` tree, preserving output paths so the handbook HTML `<img>` links work unchanged.
- **`.github/workflows/handbook-build-check.yaml`**: new PR-only validation. Runs the assembly script independently (cheap check before docker spins up), builds the image, starts the container, hits `/healthz`, verifies the auth gate returns 401 on `/de/`, and probes `/screenshots/{01,11,26}.png` to prove the assembled files land on disk. Does **not** push to Docker Hub and does **not** deploy. The develop-push `handbook-deploy.yaml` retains sole ownership of DEV → PRD rollout.

## Out of scope

- The legacy `docs/handbook/screenshots/` tree is left in place; the COPY in Dockerfile.handbook overlays it with the assembled output, so the legacy PNGs are harmless. **Phase 2** (Maestro pipeline retirement) decides whether to delete the directory + flow YAMLs.
- DEV deploy verification: happens after merge to develop (the deploy pipeline does that automatically).

## Test plan

- [ ] `Handbook Build Check` CI green on the PR (docker build + container smoke)
- [ ] `Visual Regression` job green on the PR (parallel; no Golden changes here, should pass trivially)
- [ ] After merge to develop: `handbook-deploy.yaml` builds the image, DEV-deploy succeeds, `https://dev-handbook.realunit.app/de/` shows the Goldens-sourced screenshots (spot-check 3 pages: welcome, dashboard, terms)
- [ ] PRD deploy follows DEV-green, same spot-check on prod URL

## Pre-conditions for merge

The stack below this PR must merge in order: #541 → #562 → #568 → this. Each subsequent merge re-targets the PR base automatically.
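
The assembly step described above (explicit mapping, copy and rename into a flat directory, loud failure on a missing Golden) can be sketched as follows. This is a Python illustration rather than the contents of `scripts/assemble-handbook-screenshots.sh`; the function name, the golden path in the mapping and the temporary directories are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def assemble_screenshots(mapping: Dict[str, str], goldens_root: Path, out_dir: Path) -> int:
    """Copy each mapped golden into out_dir under its handbook name.

    Raises FileNotFoundError naming every missing golden instead of
    producing a partial output directory.
    """
    missing = [src for src in mapping.values() if not (goldens_root / src).is_file()]
    if missing:
        raise FileNotFoundError("missing goldens: " + ", ".join(sorted(missing)))
    out_dir.mkdir(parents=True, exist_ok=True)
    for handbook_name, src in mapping.items():
        shutil.copyfile(goldens_root / src, out_dir / handbook_name)
    return len(mapping)


# Demo in a throwaway directory with one hypothetical mapping entry.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "goldens"
    (root / "welcome").mkdir(parents=True)
    (root / "welcome" / "welcome_page_default.png").write_bytes(b"fake-png")

    mapping = {"01-welcome.png": "welcome/welcome_page_default.png"}
    copied = assemble_screenshots(mapping, root, Path(tmp) / "out")
    names = sorted(p.name for p in (Path(tmp) / "out").iterdir())

    try:
        assemble_screenshots({"02-x.png": "nope.png"}, root, Path(tmp) / "out2")
        failed_loudly = False
    except FileNotFoundError:
        failed_loudly = True

print(copied, names, failed_loudly)
```

Checking for missing sources before copying anything keeps the output directory all-or-nothing, so a container build never ships a handbook with a silently absent image.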

…tacked on #541) (#562)

## Summary

Stacked on #541 (which is itself the merge bus for #552). Two changes that together unblock unifying the Maestro handbook screenshots with the Golden baselines (see plan at `~/Documents/Claude/realunit-handbook-unification-plan.md`):

1. **Locale switch en → de**: `wrapForGolden` defaulted to `Locale('en')`, so all 59 current Goldens render in English. The Maestro handbook pins the simulator to `de_CH` and captures German UI, so the two pipelines cannot share images while they speak different languages.
2. **Handbook gap coverage**: three new Goldens for handbook pages that had no Golden equivalent:
   - `create_wallet_page_revealed`: handbook 05-seed-revealed (state variant of `state.hideSeed=false`)
   - `settings_seed_page_revealed`: handbook 19-settings-seed-revealed (`showSeed=true`)
   - `settings_confirm_logout_wallet_sheet_default`: handbook 24-settings-delete-wallet (modal in initial unchecked state)

## Mapping audit (Phase 0)

Verified against `.maestro/handbook/*.yaml`:

| Handbook page | Golden | Status |
|---|---|---|
| 01–09, 11–16, 18, 20–23, 25 | existing | ✅ |
| 05-seed-revealed | new | ✅ this PR |
| 17-settings-backup-pin | none | ⚠️ deferred (state variant of `verify_pin_page`, needs context-aware test setup) |
| 19-settings-seed-revealed | new | ✅ this PR |
| 24-settings-delete-wallet | new | ✅ this PR |
| 26-terms | `legal_document_page_default` | ⚠️ to verify visually whether the bound content matches |
| **10-biometric-prompt** | none | ❌ **out of scope**: iOS system bottom sheet from `LocalAuthentication`, not rendered by Flutter, so Skia cannot reproduce it. Will be discussed before Phase 1 (Dockerfile.handbook switch). |

## BackdropFilter validation

The existing `settings_seed_page_default` Golden already proves that Flutter's headless Skia renders `BackdropFilter` correctly (the blur is visible, not the historic XCUITest-black-PNG issue). Same applies to the new revealed/hidden state variants and the `create_wallet_view`'s `SeedBlurCard`.

## Bootstrap workflow

`.github/workflows/golden-bootstrap.yaml` is re-introduced temporarily, triggered by push to this branch. It runs `flutter test test/goldens --update-goldens` on the `realunit-app` self-hosted dfx01 runner and uploads the regenerated PNGs as `golden-baselines`. I download the artifact, commit the baselines into `test/goldens/screens/**/goldens/macos/`, then delete the bootstrap workflow file in a follow-up commit (same pattern as the pilot PR).

## Test plan

- [ ] `golden-bootstrap` workflow run completes green on dfx01
- [ ] Baselines downloaded + committed
- [ ] `golden-bootstrap.yaml` removed
- [ ] `Visual Regression` job in pull-request.yaml green on final commit
- [ ] Spot-check sample DE Goldens visually match the handbook screenshots
- [ ] Decide on `10-biometric-prompt` and `17-settings-backup-pin` before promoting to ready-for-review

## Out of scope

- `Dockerfile.handbook` switch from `docs/handbook/screenshots/` to `test/goldens/` (Phase 1 of the unification plan)
- Maestro pipeline retirement / nightly-only mode (Phase 2)

…in de Both PNGs were inherited from pilot's pre-#562 en-locale snapshots during the #562 conflict merge — they were the only goldens not re-rendered in the locale switch, so Visual Regression flagged them (1.94% / 0.40% diff) once the screens started rendering in de. Regenerated on dfx01 against the current de-locale + deterministic-clock setup so both pipelines (de-locale switch + determinism fix) line up.

- docs/handbook/de/index.html:603: golden count 57 → 68 (the 57 was the rendered-page count from PR #541's description; the actual PNG count under test/goldens/screens/ is 68, verified via `find`). Renders on handbook.realunit.app, same risk class as the round-1 typo fix.
- docs/handbook/de/index.html:638: rephrase Tier-3 / Goldens linking. The previous wording read as if Tier-3 itself were documented in visual-regression-tests.md. Now: Tier-3 catches the smoke part inline, visual-regression-tests.md is the Goldens reference.
- .github/workflows/tier3-handbook.yaml:3-5: stale top-of-file header sentence updated. "uploads the captured PNGs as a build artifact" is now "uploads the per-flow diagnostic captures … for forensic inspection", with forward-reference to the "Why no pixel-diff here" section that owns the full explanation.
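
The PNG count mentioned in the first bullet can be reproduced with a recursive file count. The exact `find` invocation used is not given, so the sketch below assumes something like `find <root> -type f -name '*.png' | wc -l`; the demo directory layout is made up.

```python
import tempfile
from pathlib import Path


def count_pngs(root: Path) -> int:
    """Count regular files ending in .png anywhere below root."""
    return sum(1 for p in root.rglob("*.png") if p.is_file())


# Demo on a throwaway tree with a hypothetical layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "screens"
    for rel in ("welcome/goldens/macos/a.png", "buy/goldens/macos/b.png", "buy/notes.txt"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(b"x")
    total = count_pngs(root)

print(total)
```

Run against `test/goldens/screens/` in a checkout, a count like this is what distinguishes the number of baseline PNGs from the number of rendered pages.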

…#573)

## Summary

Makes the visual-regression Goldens the **sole** source of the `handbook.realunit.app` screenshots and cleans up the Maestro Tier 3 pipeline so that no two pipelines write to the same path any more.

State of affairs before this PR:

- [#541](#541) introduced the 57 Goldens
- [#568](#568) established `scripts/assemble-handbook-screenshots.sh` + `handbook-build-check.yaml`; since then `Dockerfile.handbook` has assembled the 26 slots from `test/goldens/screens/`
- **But:** the 26 old Maestro PNGs still lived under `docs/handbook/screenshots/`, had drifted from the Goldens (md5 mismatch on every file), and Tier 3 wrote its captures into exactly this directory: two pipelines, one path, ambiguous meaning

## What changes

**Commit 1: `git rm` of the 26 duplicated Maestro PNGs under `docs/handbook/screenshots/`**

**Commit 2: wire up the Goldens as the canonical source**

- `.gitignore`: `docs/handbook/screenshots/` (now a pure local-preview directory, populated by the assemble script)
- `.github/workflows/handbook-deploy.yaml`: path trigger extended by `test/goldens/screens/**` + `scripts/assemble-handbook-screenshots.sh`, so Golden bumps now trigger a deploy (this was a latent gap: previously a Golden update never reached the handbook deploy)
- `Dockerfile.handbook`: header cleaned up
- `docs/handbook/README.md`: top statement, "Lokal lesen" (read locally), "Screenshots regenerieren" (regenerate screenshots) and "Einen neuen Handbook-Eintrag hinzufügen" (add a new handbook entry) rewritten for the golden-driven workflow

**Commit 3: consistency audit (after user feedback "I want everything professional and consistent")**

- Tier 3 diagnostic captures move to `build/handbook-captures/` (clearly separated from the local-preview path):
  - `scripts/run-handbook-flows.sh`: header rewritten + `SCREENS_DIR` → `CAPTURES_DIR`
  - `.github/workflows/tier3-handbook.yaml`: artifact path updated to match
  - `.maestro/handbook/*.yaml` (26 flows): incorrect `# Captures docs/handbook/screenshots/NN-*.png` headers replaced with "Tier-3 navigation smoke; handbook screenshot is the Golden mapped in ..." headers
- `README.md` workflow table: `tier3-handbook.yaml` described as "navigation/tap-routing smoke", with an explicit note that pixel drift is the responsibility of `Visual Regression`
- `docs/testing.md` Tier 3 section: old statement "uploads docs/handbook/screenshots/ as a build artifact so reviewers can spot visual drift" removed; drift belongs to the Goldens, Tier 3 catches tap routing, navigation, locale and iOS build problems
- `docs/screens.md`: the "Handbook" column now explains the Golden mapping; the "Handbook numbering" note leads with the Goldens and names Tier 3 as a parallel smoke test
- `docs/handbook/de/index.html`: hero lede + `#architecture` section completely rewritten, pipeline diagram is now page-edit → `--update-goldens` on dfx01 → commit → `handbook-deploy.yaml`; meta description updated to match
- `scripts/assemble-handbook-screenshots.sh` header: the word "legacy" removed; `docs/handbook/screenshots/` is now explicitly the local-preview target

**Commit 4: subagent code-review findings addressed**

- `docs/handbook/de/index.html`: typo "committee Repo" → "eingecheckte Repo" (renders on `handbook.realunit.app`)
- `.github/workflows/tier3-handbook.yaml`: "Why no pixel-diff" + "Why iPhone 17" header comments reworded; pixel drift is now explicitly assigned to the Visual Regression job, and the iPhone 17 rationale now rests on tap coordinates + safe-area assertions instead of the stale screenshot-capture rationale
- `scripts/assemble-handbook-screenshots.sh`: example path in the header made precise (`../screenshots/...` instead of `screenshots/...`)
- `docs/handbook/README.md`: anglicism "Page-Render" → "Seitenrendering"

## Deliberately accepted trade-off

The `handbook-deploy.yaml` path trigger fires on every change under `test/goldens/screens/**`, including the 9 feature directories that do NOT map into the 26 handbook slots (`buy/`, `sell/`, `kyc/`, `hardware_connect_bitbox/`, `receive/`, `sell_bitbox/`, `debug_auth/`, …). Over-triggering rather than missing triggers is a deliberate choice here: duplicating the mapping table from `assemble-handbook-screenshots.sh` would add maintenance work on every handbook extension; a `concurrency: handbook-deploy, cancel-in-progress: false` block serializes the deploys, and the image build takes <10 s + ~30 s rollout. Worst case is 1-2 unnecessary deploys per sprint, which is cheaper than silent drift when someone extends the mapping and forgets the trigger.

## Verified

- `bash scripts/assemble-handbook-screenshots.sh /tmp/x` → 26/26 PNGs assembled cleanly from the committed Goldens (no missing sources)
- `python3 yaml.safe_load(...)` on all touched workflows → OK
- `bash -n` on both scripts → OK
- `grep "Captures docs/handbook/screenshots"` → zero hits in the whole repo
- Mapping ↔ Golden existence: 26/26 ✅ · Mapping ↔ Maestro flow existence: 26/26 ✅ · semantic cross-check Maestro assertion ↔ Golden content: matched for all 26
- Subagent code review (`general-purpose` agent, model: opus): no BLOCKER/MAJOR; 3 MINORs + 2 NITs found, all addressed in Commit 4
- CI on commits 2 + 3: Analyze & Test ✅ · Coverage Floor Gate ✅ · Visual Regression ✅ · BitBox quirks audit ✅ · Build handbook image + container smoke ✅
- Tier 3 (Maestro handbook flows): run [26397845244](https://github.com/DFXswiss/realunit-app/actions/runs/26397845244) on Commit 3 is in progress (label `tier3:full` set); Commit 4 triggers a new round

## Out of scope

- Reducing or abolishing `tier3-handbook.yaml` itself. Tier 3 retains unique value as a navigation/tap-routing smoke test + iOS build/install smoke test + `de_CH` locale check. Separate follow-up once the screenshot decoupling has run in PRD for one release cycle.

## Manual test plan

- [ ] After merge: `handbook-deploy.yaml` triggers on the merge commit
- [ ] On `dev-handbook.realunit.app`: 3 representative screenshots (`01-welcome.png`, `12-settings.png`, `26-terms.png`) are the Golden versions, not the old Maestro captures
- [ ] After a later Golden bump on one of the mapped pages: the handbook deploy actually fires (was silent before)
- [ ] After a later Tier 3 run: artifact `handbook-captures` contains PNGs from `build/handbook-captures/`, not from `docs/handbook/screenshots/`
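The md5 drift and the mapping-existence checks mentioned in the commit message above could be reproduced with something like the following. This is only a sketch: `audit_slots`, the directory arguments and the golden path in the usage example are hypothetical names chosen for illustration, and the real slot mapping lives in `scripts/assemble-handbook-screenshots.sh`, which is not reproduced here.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex md5 digest of a file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def audit_slots(mapping, goldens_root, screenshots_dir):
    """Classify every handbook slot against its mapped Golden.

    mapping: {slot file name: Golden path relative to goldens_root}
             (hypothetical shape, not the script's actual format)
    Returns {slot: "match" | "drifted" | "missing-golden" | "missing-screenshot"}.
    """
    report = {}
    for slot, golden_rel in mapping.items():
        golden = Path(goldens_root) / golden_rel
        screenshot = Path(screenshots_dir) / slot
        if not golden.is_file():
            report[slot] = "missing-golden"
        elif not screenshot.is_file():
            report[slot] = "missing-screenshot"
        elif md5_of(golden) != md5_of(screenshot):
            report[slot] = "drifted"
        else:
            report[slot] = "match"
    return report
```

A call such as `audit_slots({"01-welcome.png": "welcome/goldens/welcome_page.png"}, "test/goldens/screens", "docs/handbook/screenshots")` would then report one status per slot; the golden file name in that call is an assumption, not a path taken from the repository.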

- golden-regenerate.yaml: narrow `git add` to `test/goldens/screens/*/goldens/` so alchemist's transient `failures/` dirs never accidentally land in a bot-pushed commit (MINOR 3 from review).
- golden-regenerate.yaml: tighten fallback artifact `if-no-files-found` from `warn` to `error`. An empty fallback artifact is worse than no artifact at all: the user expects the artifact precisely because the push failed, so a silent empty one is a footgun (MINOR 5).
- docs/visual-regression-tests.md: rewrite the stale intro that still claimed "5 screens, 8 baseline PNGs" (left over from PR #541's pilot description) → 57 page files / 68 Golden PNGs, validated by the required `Visual Regression` check (MINOR 4).
- docs/visual-regression-tests.md: extend the "On a protected ref the push fails by design" paragraph to also cover the parallel-human-push / non-fast-forward race: same artifact-fallback path, same recovery (MAJOR 1).
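The intent of the narrowed `git add` can be illustrated with a small path filter. This is a sketch of the selection rule only, not the workflow's actual implementation: `is_committable_golden` and `select_goldens` are hypothetical helpers, and the example paths are made up to show a baseline next to a transient `failures/` file.

```python
from pathlib import PurePosixPath

# Every committable baseline lives under test/goldens/screens/<feature>/goldens/.
GOLDEN_ROOT = ("test", "goldens", "screens")


def is_committable_golden(path: str) -> bool:
    """True only for files below test/goldens/screens/<feature>/goldens/."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) >= 6             # root (3) + feature + "goldens" + file
        and parts[:3] == GOLDEN_ROOT
        and parts[4] == "goldens"   # rejects sibling dirs such as failures/
    )


def select_goldens(changed_paths):
    """Keep only the paths a regenerate job should stage."""
    return [p for p in changed_paths if is_committable_golden(p)]


changed = [
    "test/goldens/screens/home/goldens/home_page.png",             # baseline
    "test/goldens/screens/home/failures/home_page_testImage.png",  # transient
    "lib/main.dart",                                               # unrelated
]
print(select_goldens(changed))
```

Matching on path components rather than on a string glob keeps a `failures/` directory from slipping through just because a later component happens to be named `goldens`.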

Empty seed commit so a draft PR can exist before the actual follow-ups land. Real changes (themes.dart fontFamily root-cause, BitBox test quality, golden pipeline determinism investigation, etc.) get pushed on top of this branch and are visible incrementally in the PR diff.

Collection PR for follow-ups identified during the #541 review session. Held as a **Draft** while commits accumulate; flipped to ready + merged at the end.

### Likely contents (subject to scope decisions)

- `lib/styles/text_styles.dart`: hardcode `fontFamily` directly in `_Body.base/.sm/.xs` + `_Header.h1/.h2/.h4` (root-cause for the latent `appBarTheme.titleTextStyle` Ahem-rendering bug; supersedes the per-theme `copyWith(fontFamily: …)` pin added in #562)
- `test/integration/wallet_creation_bitbox_test.dart:194-195`: drop the tautological `appStore.wallet = created; verify(...).called(1);` or replace with a real contract pin
- `test/integration/connect_bitbox_flow_test.dart:249,285`: align wall-clock `Future.delayed(fastObserverInterval * 4)` with the `fakeAsync`+`async.elapse` pattern used in `bitbox_reconnect_recovery_test.dart`
- `test/goldens/screens/legal/legal_document_golden_test.dart`: replace inline `_termsMarkdownStub` (1:1 copy of production terms) with a synthetic markdown fixture under `test/fixtures/`
- Investigate why the dfx01 runner produces ±10–50 byte encoder drift on unrelated golden PNGs during baseline refresh (Skia version pin, build cache, font cache)

### Out of scope

- Anything that should go to a dedicated PR (e.g. a real new feature)
- #325 release develop → main

### Notes

Empty seed commit kept in history so the PR existed before the first real change landed.

---------

Co-authored-by: Blume1977 <jana.ruettimann@dfx.swiss>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
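For the encoder-drift investigation listed above, one possible first step is to check whether two versions of a golden PNG differ only in their compressed bytes. The sketch below is an assumption about how that could be done and uses only the standard library: it compares the decompressed IDAT streams of both files. `idat_payload`, `classify_drift` and `make_png` are hypothetical helpers, and `make_png` exists only to build tiny grayscale test images.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def idat_payload(png_bytes: bytes) -> bytes:
    """Concatenate all IDAT chunks and return the decompressed scanline data."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    compressed = bytearray()
    while offset + 8 <= len(png_bytes):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", png_bytes[offset:offset + 8])
        data_start = offset + 8
        if chunk_type == b"IDAT":
            compressed += png_bytes[data_start:data_start + length]
        offset = data_start + length + 4  # skip the data and its CRC
    return zlib.decompress(bytes(compressed))


def classify_drift(old: bytes, new: bytes) -> str:
    """Distinguish byte-identical, compression-only, and deeper differences."""
    if old == new:
        return "identical"
    if idat_payload(old) == idat_payload(new):
        return "encoder-only"
    # Filter choices or actual pixels differ; this check cannot tell which.
    return "filtered-data-differs"


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_png(width: int, height: int, raw: bytes, level: int) -> bytes:
    """Build a minimal 8-bit grayscale PNG from pre-filtered scanlines."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw, level))
        + _chunk(b"IEND", b"")
    )
```

An "encoder-only" result would point at compression settings or library versions rather than rendering. A "filtered-data-differs" result is inconclusive, because two encoders may pick different per-scanline filters for identical pixels; deciding that case needs a full pixel decode.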

TaprootFreak marked this pull request as ready for review May 23, 2026 19:01

This was referenced May 23, 2026

Visual regression: scale from 5-screen pilot to every *_page.dart #547

Open

Meta: complete test coverage across every tier and surface #550

Open

TaprootFreak added the `tier3:full` label (Opt-in: run Tier 3 Maestro handbook flows on this PR) May 23, 2026

This was referenced May 23, 2026

feat(tests): scale visual regression to all 57 screens (stacked on #541) #552

Merged

Golden-update bootstrap mechanism after golden-bootstrap.yaml removal #555

Open

TaprootFreak added 6 commits May 23, 2026 22:56

chore(ci): add push trigger to golden-bootstrap so the feature branch can self-bootstrap

3b73f26

test(goldens): commit initial baselines generated on dfx01

0ea11b6

Generated by the golden-bootstrap workflow run 26340878734. These 8 PNGs are the authoritative pixel baselines for the pilot — every PR's golden-tests job validates against them.

test(temp): intentional pixel drift to verify golden-tests catches regressions

8e1f5e5

TaprootFreak force-pushed the feat/visual-regression-pilot branch from 5613899 to e76118f Compare May 23, 2026 20:58

TaprootFreak changed the title ~~feat(tests): visual-regression pilot — 5 screens on dfx01 runner~~ feat(tests): visual-regression coverage for all 57 screens on dfx01 runner May 23, 2026

This was referenced May 23, 2026

test(goldens): switch default locale to de + handbook gap coverage (stacked on #541) #562

Merged

refactor(goldens): shared mocks, constants and plugin stubs #563

Merged

TaprootFreak added 5 commits May 23, 2026 23:32

TaprootFreak mentioned this pull request May 23, 2026

docs(readme): reflect golden tests in the feature matrix #566

Merged

TaprootFreak added 4 commits May 24, 2026 00:02

TaprootFreak mentioned this pull request May 23, 2026

feat(handbook): assemble screenshots from Goldens, not Maestro (stacked on #568) #570

Merged

4 tasks

TaprootFreak added 2 commits May 25, 2026 11:09

TaprootFreak merged commit 42f4cc2 into develop May 25, 2026
7 checks passed

TaprootFreak deleted the feat/visual-regression-pilot branch May 25, 2026 09:44

This was referenced May 25, 2026

chore: post-#541 follow-ups (rolling collection) #571

Merged

fix(deps): bump sqlite3 3.3.0 → 3.3.1 (libsqlite3mc hash mismatch) #572

Closed

chore(handbook): drop legacy Maestro screenshots, source from goldens #573

Merged

TaprootFreak mentioned this pull request May 26, 2026

docs(handbook): clarify Adresse + Signatur is debug-build only #581

Merged

2 tasks

TaprootFreak mentioned this pull request May 28, 2026

test(goldens): regenerate home_page_loaded baseline after dfx01 render drift #602

Closed

TaprootFreak merged 18 commits into develop from feat/visual-regression-pilot

1 participant
