feat(tests): visual-regression coverage for all 57 screens on dfx01 runner #541

Merged
TaprootFreak merged 18 commits into develop from feat/visual-regression-pilot
May 25, 2026

@TaprootFreak TaprootFreak commented May 23, 2026

Summary

  • Introduces visual-regression goldens for every `lib/screens/**/*_page.dart` in the repo (56 of 57 rendered, 1 explicit `skip: true`)
  • Render host is the dfx01 self-hosted runner (Mac Studio M3 Ultra, labels `self-hosted, macOS, ARM64, m3-ultra, realunit-app`); pinning the hardware keeps Skia/CoreText state identical between baseline generation and validation
  • Stack: alchemist 0.14.0, Open Sans (SIL OFL 1.1) committed as an asset (the previous system-font fallback wasn't deterministic across hosts)
  • New CI job `golden-tests` in `.github/workflows/pull-request.yaml` runs in parallel to `build`; `build` passes `--exclude-tags golden` so the visual-regression tests stay confined to the dfx01 runner

Coverage

  • Onboarding + wallet lifecycle (11): create_wallet, restore_wallet, verify_seed, home, onboarding_completed, setup_pin, verify_pin, hw bitbox, legal x2, debug_auth
  • Settings + subpages (17): languages, currencies, network, seed, tax_report, contact, wallet_address, legal_docs x3, user_data x7
  • KYC (15): 2fa, email x2, financial_data x4, ident, nationality, registration, subpages x5 (account_merge, completed, failure, loading, pending)
  • Trading + support + misc (9): receive, transaction_history, sell_bitbox, sell_bank_account_selection, support x4, web_view (skipped — InAppWebView is a platform-view with no headless render)
  • Pilot 5: welcome (iOS + Android theme variants), dashboard, settings, buy (initial + payment-info-loaded), sell (no-account + with-balance)

Total: 57 test files, 59 baseline PNGs.

Verified

  • Baselines generated on dfx01 via the (now removed) `golden-bootstrap.yaml` workflow, downloaded and committed
  • `golden-tests` CI ran green on the stacked PR #552, "feat(tests): scale visual regression to all 57 screens (stacked on #541)" (run 26342855405), which shows that the committed baselines match a fresh render on dfx01
  • Drift detection verified during the pilot phase: a probe pixel change in `realUnitBlue` flipped CI red and uploaded master/test/maskedDiff/isolatedDiff PNGs as a `golden-diffs` artifact
  • Local `flutter analyze` clean, `flutter test --exclude-tags golden` passes (2148/2148)
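
To illustrate what the drift probe exercises, here is a minimal Python sketch of a pixel-exact comparison. It is not alchemist's or Flutter's comparator; the function name `diff_ratio` and the colour values are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA, 0-255 per channel


def diff_ratio(master: Sequence[Pixel], test: Sequence[Pixel]) -> float:
    """Return the fraction of pixels that differ between two same-sized images."""
    if len(master) != len(test):
        raise ValueError("images must have the same number of pixels")
    if not master:
        return 0.0
    differing = sum(1 for a, b in zip(master, test) if a != b)
    return differing / len(master)


# Hypothetical 2x2 image: one pixel's red channel is nudged, as a probe would do.
baseline = [(25, 80, 200, 255)] * 4
probe = [(50, 80, 200, 255)] + baseline[1:]
```

A pixel-exact gate fails on any ratio above zero, which is why a change to a single colour channel is enough to turn the job red.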

Documentation

State-variant goldens (loaded/error/loading per screen) are out of scope here — they follow as separate per-feature PRs.

Follow-ups after merge

  • Set `golden-tests` as a required status check on develop branch protection

@TaprootFreak marked this pull request as ready for review May 23, 2026 19:01
@TaprootFreak added the `tier3:full` label (Opt-in: run Tier 3 Maestro handbook flows on this PR) May 23, 2026
TaprootFreak added a commit that referenced this pull request May 23, 2026
…) (#552)

## Summary

- Stacked on top of #541 (visual-regression pilot). **Do not merge until
#541 is merged**, then this PR rebases against develop.
- Adds 52 new golden tests covering every `lib/screens/**/*_page.dart`
(1 per page, default/initial state) on top of the pilot 5, for 57 tests
total
- Bootstrap workflow is reintroduced temporarily to regenerate all
baselines on dfx01; removed in a follow-up commit before
ready-for-review
- 1 page (`web_view_page.dart`) is intentionally `skip: true`:
InAppWebView is a platform-view that has no headless render

## Bucket breakdown
- **Onboarding + wallet** (11): create, restore, verify_seed, home,
onboarding_completed, pin x2, hw bitbox, legal x2, debug_auth
- **Settings subpages** (17): languages, currencies, network, seed,
tax_report, contact, wallet_address, legal_docs x3, user_data x7
- **KYC** (15): 2fa, email x2, financial_data x4, ident, nationality,
registration, subpages x5
- **Trading + support + misc** (9): receive, tx_history, sell_bitbox,
sell_bank_account_selection, support x4, web_view (skipped)

## Test plan
- [ ] Bootstrap workflow runs on push, generates 56 baseline PNGs (57
minus web_view)
- [ ] Baselines reviewed visually, committed to
test/goldens/screens/**/goldens/macos/
- [ ] golden-tests CI job green on the same commit
- [ ] Bootstrap workflow removed before ready-for-review

State-variant goldens (loaded/error/loading per screen) are out of scope
here — they follow as separate per-feature PRs.
Introduces pixel-exact baseline tests for Welcome, Dashboard, Settings,
Buy, and Sell — 8 baselines total — rendered on the dfx01 self-hosted
runner (Mac Studio M3 Ultra).

Stack:
- alchemist 0.14.0 (Betterment) as the goldens framework
- Open Sans (SIL OFL 1.1) committed as an asset; previously fell back to
  the system font, which broke determinism across hosts
- New CI job 'golden-tests' parallel to 'build', runs on the realunit-app
  self-hosted runner so Skia/CoreText state is identical between baseline
  generation and validation
- 'build' job now passes --exclude-tags golden so goldens stay confined
  to the dedicated runner and don't false-fail on macos-latest

Baselines are bootstrapped in this PR via the temporary
golden-bootstrap.yaml workflow (removed in the same PR before
ready-for-review). Day-to-day workflow, drift handling, Flutter-bump
process, and dfx01-outage fallback are documented in
docs/visual-regression-tests.md.

Scaling to all 57 page-files is the explicit follow-up after this pilot
validates the pipeline end-to-end.
Generated by the golden-bootstrap workflow run 26340878734. These 8 PNGs
are the authoritative pixel baselines for the pilot — every PR's
golden-tests job validates against them.
- Revert lib/styles/colors.dart drift commit (25→50 R-channel) that was
  used to verify CI catches pixel regressions. Drift caught by the
  golden-tests job as expected, diff artifact uploaded with master/test/
  maskedDiff/isolatedDiff PNGs.
- Remove .github/workflows/golden-bootstrap.yaml — initial baselines have
  been committed (run 26340878734), the bootstrap workflow's purpose is
  fulfilled. From here, baseline regeneration is documented in
  docs/visual-regression-tests.md "Adding a new golden test" / "Reacting
  to a CI drift" sections.
@TaprootFreak force-pushed the feat/visual-regression-pilot branch from 5613899 to e76118f May 23, 2026 20:58
@TaprootFreak changed the title from "feat(tests): visual-regression pilot — 5 screens on dfx01 runner" to "feat(tests): visual-regression coverage for all 57 screens on dfx01 runner" May 23, 2026
Stacked on top of #541. Pure refactor — no functional change, no
baseline touched.

## What changed

Five duplicated patterns across 35+ golden test files moved behind 3 new
helper files re-exported from `test/helper/helper.dart`:

| Helper | Symbol(s) | Was inline in |
|---|---|---|
| `golden_constants.dart` | `phoneConstraints` | 33 files |
| `golden_mocks.dart` | `MockHomeBloc` | 5 |
| | `MockSettingsBloc` | 7 |
| | `MockAppStore` | 5 |
| | `MockDfxCountryService` | 4 |
| | `MockSoftwareWallet` | 3 |
| `golden_plugin_stubs.dart` | `stubNoScreenshotChannel()` | 2 |

Plus 32 unused `package:flutter/material.dart` imports removed
(collateral of the phoneConstraints extraction).

**Net diff:** 44 insertions / 182 deletions across 44 files. Every new
golden test from here saves ~10-15 lines of boilerplate.

## Verified

- `flutter analyze` clean
- `flutter test --exclude-tags golden`: 2148/2148 green
- `golden-tests` CI will run on this branch once #541 lands on develop
(stacked PR limitation: `pull-request.yaml` triggers only on PRs
targeting develop)

## After merge follow-up

After #541 lands on develop, this PR's base is retargeted to develop
automatically and the `golden-tests` job validates that the refactor
didn't break the committed baselines on dfx01.
…k stub

Investigation outcome after attempting to activate web_view_golden_test:
flutter_inappwebview asserts InAppWebViewPlatform.instance is set on
the very first build, and the abstract interface defines five
platform-bound factory methods (controller, widget, cookie manager,
etc.). Stubbing the method channel alone is not enough — a full
InAppWebViewPlatform mock subclass tree is required, and even then the
webview body renders as a blank rectangle (the native content can't
exist headless).

For a one-page edge case the cost doesn't justify it. Keeping the
skip:true with a documented exit criterion is the honest call.

Also a tiny consistency fix in the same test file: use the shared
phoneConstraints const from test/helper/ instead of an inline literal.
The helper refactor missed this file because it was skip:true and the
subagent skipped it.
## Summary

All three primary CI workflows (`pull-request.yaml`,
`tier3-handbook.yaml`, `bitbox-simulator.yml`) were filtered with
`pull_request.branches: [develop]`. That meant stacked PRs (e.g.
`feature → integration → develop`) never triggered CI on the lower stack
levels — every regression was only caught once the stack collapsed to a
develop PR. Concretely visible on PR #561 (`ci/coverage-floor-100` →
`feat/visual-regression-pilot`): zero check-runs.

This PR drops the `branches:` filter on all three so every PR fires CI
regardless of target branch. The actual cost controls (draft-gate,
`tier3:full` label, `paths:` filter on bitbox-simulator) are unchanged.

## What changes

- **`pull-request.yaml`** — `branches: [develop]` removed from
`pull_request:`. Block comment rewritten to explain why the branch
filter is gone and what the remaining gates are (draft-gate,
concurrency). `push: develop` kept as the authoritative post-merge run.
- **`tier3-handbook.yaml`** — `branches: [develop]` removed from
`pull_request:`. Block comment updated: "label gate is the cost control,
not the branch filter". `push: develop` kept.
- **`bitbox-simulator.yml`** — `branches: [develop]` removed from
`pull_request:`. Comment rewritten: "`paths:` filter is the real cost
control". No `push:` trigger here, unchanged.

## What stays the same

- Draft PRs still skip every job via `if:
github.event.pull_request.draft == false` guards.
- Tier 3 still opt-in via `tier3:full` label.
- BitBox simulator still path-gated to hardware_wallet / wallet / bitbox
files.
- `push: develop` post-merge verification on the two workflows that had
it.
- Concurrency groups unchanged — stacked-PR runs land in different
groups by PR number, so they don't cancel each other.

## Test plan

- [x] `python3 -c "import yaml; …"` — all three YAML files parse
- [ ] CI green on this PR itself (which now means it triggers on
`pull_request` against develop AS WELL as on any future stacked PR)
- [ ] After merge: PR #561 (currently stacked on
`feat/visual-regression-pilot`) should trigger CI on the next push or
synchronize event
## Summary

After PR #539 closed wave 1-3 and brought scoped line coverage to
**100.0 %** (4 751 / 4 751), the 3 pp safety buffer below the measured
value no longer matches the "100 % rule" stated in `README.md`. Pin the
floor at **100** so any regression — including a single
accidentally-uncovered line — fails CI immediately.

When a future change genuinely needs to drop below 100 % (e.g. a Flutter
SDK update that re-counts a defensive branch), use the
`coverage:lower-floor` PR label so the regression is visible in the PR
list rather than silently smuggled in.

Functions floor stays at 50 (placeholder, `flutter test --coverage`
still emits no FN records).
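
As a rough illustration of how such a line-coverage floor can be checked, the Python sketch below sums the `LF` (lines found) and `LH` (lines hit) records of an lcov tracefile and compares the result with a floor. The function names are hypothetical and the repository's actual `Coverage Floor Gate` implementation is not shown in this PR, so treat this as a sketch under those assumptions.

```python
from typing import Iterable, Tuple


def line_totals(lcov_lines: Iterable[str]) -> Tuple[int, int]:
    """Sum LH (lines hit) and LF (lines found) across all records of a tracefile."""
    hit = found = 0
    for raw in lcov_lines:
        line = raw.strip()
        if line.startswith("LH:"):
            hit += int(line[3:])
        elif line.startswith("LF:"):
            found += int(line[3:])
    return hit, found


def passes_floor(hit: int, found: int, floor_percent: float) -> bool:
    """True when hit/found is at or above the floor.

    Cross-multiplying avoids a float division, so a floor of 100 only
    passes when every found line was hit.
    """
    if found == 0:
        return False  # no data should never count as full coverage
    return hit * 100 >= floor_percent * found
```

With the numbers quoted above, `passes_floor(4751, 4751, 100)` holds, while a single uncovered line (`4750` of `4751`) fails a floor of 100 but would still have passed a floor of 97.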

## Stacked on

`feat/visual-regression-pilot` so it lands together with the
visual-regression scale-out work. When that branch merges, this PR
retargets to `develop` automatically (or `gh pr edit 560 --base
develop`).

## Test plan

- [ ] `Coverage Floor Gate` passes at 100.0 % == 100 floor
- [ ] No other CI changes needed
…able (#565)

## Summary

`docs/testing.md:280` claimed `FakeBitboxBehavior.disconnect` throws
`SigningCancelledException`. The fake actually throws
`BitboxNotConnectedException` at
`test/helper/fake_bitbox_credentials.dart:89`.

Both are distinct exception classes serving distinct intents:

| Class | Defined at | Used for |
|---|---|---|
| `SigningCancelledException` | `lib/packages/wallet/exceptions/signing_cancelled_exception.dart` | User-cancel mid-sign |
| `BitboxNotConnectedException` | `lib/packages/service/dfx/exceptions/bitbox_exception.dart` | Disconnect / not-paired |

The cancel-flow code example immediately below the table (lines 284-295)
is correct and stays unchanged — it tests the `cancel` behavior, which
legitimately surfaces as `SigningCancelledException` through
`Eip712Signer.signRegistration` when the fake returns `'0x'`.

## Discovered during

Deep audit of issue #542 (Tier 1 integration tests) — verifying the
cited Tier-1 doc references against the actual fake implementation.

## Test plan

- [x] `grep -n SigningCancelledException docs/testing.md` and
`BitboxNotConnectedException` — confirmed only the one drifted line
touched
- [x] `dart analyze` — markdown change, no Dart impact
- [ ] CI green
Stacked on top of #541. README-only — no code, no baseline touched.

## What changed

The repo's Feature Matrix (`README.md` Sections "Onboarding &
authentication" / "Wallet actions" / "DFX backend integration" /
"Settings" / "Support") didn't reflect the 57 golden tests added in
#541. This PR closes that gap.

- **Legend:** added `golden` as a test type with a pointer to
`docs/visual-regression-tests.md`
- **Every existing page row** gets its golden test path appended to the
existing `widget`/`cubit`/`unit` entries
- **Two new rows** for pages that exist in the app but weren't in the
previous matrix: Settings root, Support root
- **Five previously-empty rows** flipped from `—` to `golden (...)`:
Sell to BitBox, Currencies/Languages/Network, Tax report, Contact, KYC
AccountMergeRequested
- **Triage gaps:** new "Visual regression (goldens)" bullet documenting
the gate, the `web_view` `skip: true` exception, and the cross-link
to `docs/visual-regression-tests.md`
- **CI/CD table:** `pull-request.yaml` row now mentions
`--exclude-tags golden` on the existing test step plus the parallel
`Visual Regression` job on the dfx01 runner

## Verified

- `flutter analyze` clean
- Diff: 38 insertions / 35 deletions across one file (README.md)

## After merge

#541 stays as before; this PR can land before or after #541. If it lands
first, GitHub auto-rebases #541 against the updated develop on next
push.
Yesterday's bootstrap baked DateTime.now() into the rendered date
fields, then today's CI run rendered with today's date and diffed.
Fix that without skipping the tests:

- App code: use clock.now() from package:clock (already a runtime dep,
  pattern is already established in pin_auth_cubit, dfx_fiat_service,
  dfx_language_service). Touches:
    lib/screens/settings_tax_report/settings_tax_report_page.dart
    lib/screens/settings_tax_report/cubit/settings_tax_report_cubit.dart
    lib/screens/transaction_history/transaction_history_page.dart
    lib/screens/transaction_history/cubits/filter/transaction_history_filter_cubit.dart
    lib/screens/transaction_history/cubits/filter/transaction_history_filter_state.dart
- transaction_history_page._todaysDate flips from `static final` to
  `static DateTime get` so cached lazy-init can't survive into a
  fixed-clock test zone.
- Tests: wrap the two affected goldenTest calls in
  withClock(Clock.fixed(DateTime.utc(2026, 5, 23)), () { ... }).
  Verified locally: --update-goldens green, plain re-run also green,
  pixel-deterministic across runs.
- Bootstrap workflow reintroduced temporarily; will be removed once
  the regenerated PNGs are committed.
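
The caching problem described above is not specific to Dart. The following Python sketch is only an analogy: `fixed_clock`, `now`, and `HistoryPage` are hypothetical names standing in for `withClock`, `clock.now()`, and the page's `_todaysDate`, and it does not reproduce the app code. It shows why a lazily cached "today" ignores a later fixed clock while a getter that reads the clock on every access does not.

```python
from contextlib import contextmanager
from datetime import date, datetime, timezone

# Replaceable time source; defaults to the real wall clock.
_clock = lambda: datetime.now(timezone.utc)


def now() -> datetime:
    """Read the current time through the replaceable clock."""
    return _clock()


@contextmanager
def fixed_clock(instant: datetime):
    """Temporarily pin now() to a fixed instant, restoring the old clock afterwards."""
    global _clock
    previous = _clock
    _clock = lambda: instant
    try:
        yield
    finally:
        _clock = previous


class HistoryPage:
    _cached_today = None

    @classmethod
    def today_cached(cls) -> date:
        # Lazy one-time initialisation: the first caller's clock wins forever.
        if cls._cached_today is None:
            cls._cached_today = now().date()
        return cls._cached_today

    @staticmethod
    def today_live() -> date:
        # Re-reads the clock on every call, so a fixed clock is respected.
        return now().date()
```

If `today_cached()` is first touched outside the fixed-clock block, the stale date is what ends up in the rendered output, which matches the failure mode the commit fixes by switching to a getter.
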
…oses #556 follow-up) (#567)

## Summary

Drops the `Build-time feature-flag mechanism (analogous to
EXPO_PUBLIC_ENABLE_* in dfx-wallet)` row from the Coverage
infrastructure roadmap. Issue #556 closed this as not applicable after
an audit — keeping the unchecked row would advertise a roadmap item
we've decided not to build.

## Why

Three independent reasons (full writeup in the #556 close comment):

1. **Coverage-gating already solved** — `lcov --remove` + inline `//
coverage:ignore-*` + Port-pattern refactors. The floor ratcheted from 51
→ 94 (#538 / #539) without any env flag.
2. **The dfx-wallet precedent is itself an engineering tool**, not
product/region/beta. Dart's `String.fromEnvironment` has no tree-shake
equivalent.
3. **No realunit-app surface benefits** — no beta/experimental/staging
screen exists, `/debugAuth` is already `kDebugMode`-gated, all 12
`defer` features intentionally ship.

## Re-add condition

If a future ship-but-hide feature appears (regulator-gated jurisdiction,
TestFlight-only experiment, compile-time endpoint switch), re-add this
row and open a fresh implementation issue.

## Test plan

- [x] Visual diff against the surrounding bullets (one line removed, no
formatting change)
- [ ] CI green
…ove bootstrap

Bootstrap run 26344939432 produced byte-identical PNGs for 57 of 59
goldens and new versions for the two that previously baked
DateTime.now() (settings_tax_report, transaction_history). The latter
now render with Clock.fixed(DateTime.utc(2026, 5, 23)) and are
pixel-stable across runs.
…#569)

## Summary

Before this PR: `test/integration/` had exactly **1 file**
(`kyc_sign_flow_test.dart`, 4 tests). Tier 1 — cubit + service + signer
cross-layer driven by `FakeBitboxCredentials` /
`SimulatedBitboxPlatform` — was the documented thin spot of the test
pyramid.

This PR expands the breadth from 1 spec to 7: **6 new specs, 23 new
tests**, 100 % green, no production code touched.

## What's in this PR

Three review-gated bundles, all merged into `test/tier1-expansion`:

### Bundle A — Sell + EIP-7702
- `sell_bitbox_flow_test.dart` (5 tests): real `SellBitboxCubit` + real
`RealUnitSellPaymentInfoService` + `MockClient` HTTP boundary +
`FakeBitboxCredentials`. Pins happy / cancel / disconnect /
malformed-sig / **deposit-retry** (SellBitboxDepositRetry state with
signedSwap + signedDeposit preserved across post-swap broadcast
failure).
- `eip7702_delegation_bitbox_test.dart` (4 tests):
`Eip712Signer.signDelegation` through `FakeBitboxCredentials`. Happy +
cancel + disconnect + **chainId-wiring** across mainnet / sepolia /
arbitrum.

### Bundle B — Connect + Wallet-Creation
- `connect_bitbox_flow_test.dart` (4 tests): real `ConnectBitboxCubit` +
real `BitboxService` + real `BitboxManager` through
`installSimulatedBitboxPlatform()` at the `BitboxUsbPlatform.instance`
seam. Happy / pair-rejected / observer-disconnect /
**re-pair-after-disconnect** (P461 #1 contract — `init()` re-attaches
the SAME credentials instance, so `isConnected` flips `true → false →
true` over a single reference).
- `wallet_creation_bitbox_test.dart` (3 tests): real
`WalletService.createBitboxWallet` → real `WalletRepository` → Drift
in-memory → real `BitboxService` + simulator. Happy (schema +
`verifyNever(getOrCreateMnemonicKey)`) / hardware-failure (rollback) /
**persistence-round-trip** (fresh `WalletService` on the same DB+Prefs
cold-loads without touching the AES-key store).

### Bundle C — Auth + Reconnect
- `dfx_auth_sign_ceremony_bitbox_test.dart` (4 tests): real
`DFXAuthService` + real `SessionCache` through the BitBox account
boundary. Happy (cold-cache full ceremony) / cancel (no POST,
`lockCurrentWallet` fires) / **timeout via fake_async** (3-min
`_signMessageTimeout` cap) / 403 country-blocked propagation.
- `bitbox_reconnect_recovery_test.dart` (3 tests): real `BitboxService`
observer + `SimulatedBitboxPlatform` + real
`SellBitboxCubit.retryAfterConnection`. Disconnect-mid-sign-retry /
observer-device-loss (credentials slot preserved) / reattach-after-init
(P461 surface).

## Coverage impact

Tier 1 tests are **cross-layer behaviour pins**, not line-coverage
drivers — every file the new tests touch is already at 100 % via the
wave 1-3 sweep. The value is catching regressions that pass per-layer
unit tests but break the seam between them (e.g. cubit catches the wrong
exception type, sign-ceremony skips lock on cancel, re-pair returns a
fresh credentials instance breaking the P461 contract).

## Test plan

- [x] `flutter analyze` — no issues
- [x] `flutter test test/integration/` — **27 tests green** (4 existing
+ 23 new)
- [ ] CI green (3 jobs in `pull-request.yaml`)
TaprootFreak added a commit that referenced this pull request May 25, 2026
…ed on #568) (#570)

## Summary

Stacked on #568. Phase 1 of the Maestro → Goldens handbook unification
plan: the 26 PNGs at `handbook.realunit.app/screenshots/` now come from
the same Golden baselines the visual-regression CI verifies on every PR.

## Why

- **Single source of truth**: one pixel-checked baseline per handbook
page. A UI regression that breaks a Golden also breaks the handbook
image before either ships.
- **Determinism**: dfx01's headless Skia/Open Sans rendering is
byte-stable across CI runs. Maestro's iOS-simulator screenshots drifted
on Apple Silicon + iOS 26 driver hangs (mobile-dev-inc/Maestro#3137).
- **Cycle time**: the handbook image rebuilds when Goldens change, in
seconds — no 30-minute Maestro suite to refresh a page.

## What changed

- **`scripts/assemble-handbook-screenshots.sh`** — explicit
`handbook-name → golden-path` mapping for all 26 pages (see PR #568 for
the audit that built this table). Copies and renames Goldens into a flat
output directory the Dockerfile then consumes. Fails loudly if a Golden
goes missing.
- **`Dockerfile.handbook`** — multi-stage build. Stage 1 (alpine + bash)
runs the assembly script. Stage 2 (nginx) copies the assembled
screenshots over the legacy `docs/handbook/screenshots/` tree,
preserving output paths so the handbook HTML `<img>` links work
unchanged.
- **`.github/workflows/handbook-build-check.yaml`** — new PR-only
validation. Runs the assembly script independently (cheap check before
docker spins up), builds the image, starts the container, hits
`/healthz`, verifies the auth gate returns 401 on `/de/`, and probes
`/screenshots/{01,11,26}.png` to prove the assembled files land on disk.
Does **not** push to Docker Hub and does **not** deploy. The
develop-push `handbook-deploy.yaml` retains sole ownership of DEV → PRD
rollout.
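
The assembly step described above (explicit mapping, copy and rename into a flat directory, fail loudly on a missing Golden) can be sketched as follows. The real script is Bash and is not reproduced in this PR; the Python below is an illustration only, and the two mapping entries, their directory layout, and the function name `assemble` are assumptions.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Hypothetical excerpt of a handbook-name -> golden-path table.
HANDBOOK_TO_GOLDEN: Dict[str, str] = {
    "05-seed-revealed.png": "create_wallet/goldens/macos/create_wallet_page_revealed.png",
    "26-terms.png": "legal/goldens/macos/legal_document_page_default.png",
}


def assemble(goldens_root: Path, out_dir: Path, mapping: Dict[str, str]) -> List[str]:
    """Copy each mapped Golden into out_dir under its handbook name.

    Every source is checked before anything is copied, so a missing
    Golden aborts the run instead of producing a partial handbook.
    """
    missing = sorted(src for src in mapping.values() if not (goldens_root / src).is_file())
    if missing:
        raise FileNotFoundError("missing goldens: " + ", ".join(missing))
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, src in sorted(mapping.items()):
        shutil.copyfile(goldens_root / src, out_dir / name)
        written.append(name)
    return written
```

Keeping the mapping explicit rather than globbing means a renamed or deleted Golden surfaces as a build failure rather than as a silently missing handbook image.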

## Out of scope

- The legacy `docs/handbook/screenshots/` tree is left in place; the
COPY in Dockerfile.handbook overlays it with the assembled output, so
the legacy PNGs are harmless. **Phase 2** (Maestro pipeline retirement)
decides whether to delete the directory + flow YAMLs.
- DEV deploy verification — happens after merge to develop (the deploy
pipeline does that automatically).

## Test plan

- [ ] `Handbook Build Check` CI green on the PR (docker build +
container smoke)
- [ ] `Visual Regression` job green on the PR (parallel — no Golden
changes here, should pass trivially)
- [ ] After merge to develop: `handbook-deploy.yaml` builds the image,
DEV-deploy succeeds, `https://dev-handbook.realunit.app/de/` shows the
Goldens-sourced screenshots (spot-check 3 pages: welcome, dashboard,
terms)
- [ ] PRD deploy follows DEV-green, same spot-check on prod URL

## Pre-conditions for merge

The stack below this PR must merge in order: #541 → #562 → #568 → this.
Each subsequent merge re-targets the PR base automatically.
…tacked on #541) (#562)

## Summary

Stacked on #541 (which is itself the merge bus for #552). Two changes
that together unblock unifying the Maestro handbook screenshots with the
Golden baselines (see plan at
`~/Documents/Claude/realunit-handbook-unification-plan.md`):

1. **Locale switch en → de**: `wrapForGolden` defaulted to
`Locale('en')`, so all 59 current Goldens render in English. The Maestro
handbook pins the simulator to `de_CH` and captures German UI — the two
pipelines cannot share images while they speak different languages.
2. **Handbook gap coverage**: three new Goldens for handbook pages that
had no Golden equivalent:
- `create_wallet_page_revealed` — handbook 05-seed-revealed (state
variant of `state.hideSeed=false`)
- `settings_seed_page_revealed` — handbook 19-settings-seed-revealed
(`showSeed=true`)
- `settings_confirm_logout_wallet_sheet_default` — handbook
24-settings-delete-wallet (modal in initial unchecked state)

## Mapping audit (Phase 0)

Verified against `.maestro/handbook/*.yaml`:

| Handbook page | Golden | Status |
|---|---|---|
| 01–09, 11–16, 18, 20–23, 25 | existing | ✅ |
| 05-seed-revealed | new | ✅ this PR |
| 17-settings-backup-pin | — | ⚠️ deferred (state variant of `verify_pin_page`, needs context-aware test setup) |
| 19-settings-seed-revealed | new | ✅ this PR |
| 24-settings-delete-wallet | new | ✅ this PR |
| 26-terms | `legal_document_page_default` | ⚠️ to verify visually whether the bound content matches |
| **10-biometric-prompt** | — | ❌ **out of scope**: iOS system bottom sheet from `LocalAuthentication`, not rendered by Flutter; Skia cannot reproduce it. Will be discussed before Phase 1 (Dockerfile.handbook switch). |

## BackdropFilter validation

The existing `settings_seed_page_default` Golden already proves that
Flutter's headless Skia renders `BackdropFilter` correctly (the blur is
visible, not the historic XCUITest-black-PNG issue). Same applies to the
new revealed/hidden state variants and the `create_wallet_view`'s
`SeedBlurCard`.

## Bootstrap workflow

`.github/workflows/golden-bootstrap.yaml` is re-introduced temporarily,
triggered by push to this branch. It runs `flutter test test/goldens
--update-goldens` on the `realunit-app` self-hosted dfx01 runner and
uploads the regenerated PNGs as `golden-baselines`. I download the
artifact, commit the baselines into
`test/goldens/screens/**/goldens/macos/`, then delete the bootstrap
workflow file in a follow-up commit — same pattern as the pilot PR.

## Test plan

- [ ] `golden-bootstrap` workflow run completes green on dfx01
- [ ] Baselines downloaded + committed
- [ ] `golden-bootstrap.yaml` removed
- [ ] `Visual Regression` job in pull-request.yaml green on final commit
- [ ] Spot-check sample DE Goldens visually match the handbook
screenshots
- [ ] Decide on `10-biometric-prompt` and `17-settings-backup-pin`
before promoting to ready-for-review

## Out of scope

- `Dockerfile.handbook` switch from `docs/handbook/screenshots/` to
`test/goldens/` (Phase 1 of the unification plan)
- Maestro pipeline retirement / nightly-only mode (Phase 2)
…in de

Both PNGs were inherited from pilot's pre-#562 en-locale snapshots during
the #562 conflict merge — they were the only goldens not re-rendered in the
locale switch, so Visual Regression flagged them (1.94% / 0.40% diff) once
the screens started rendering in de.

Regenerated on dfx01 against the current de-locale + deterministic-clock
setup so both pipelines (de-locale switch + determinism fix) line up.
@TaprootFreak TaprootFreak merged commit 42f4cc2 into develop May 25, 2026
7 checks passed
@TaprootFreak TaprootFreak deleted the feat/visual-regression-pilot branch May 25, 2026 09:44
TaprootFreak added a commit that referenced this pull request May 25, 2026
- docs/handbook/de/index.html:603: golden count 57 → 68 (the 57 was the
  rendered-page count from PR #541's description; the actual PNG count
  under test/goldens/screens/ is 68 — verified via `find`). Renders on
  handbook.realunit.app, same risk class as the round-1 typo fix.
- docs/handbook/de/index.html:638: rephrase Tier-3 / Goldens linking —
  previous wording read as if Tier-3 itself were documented in
  visual-regression-tests.md. Now: Tier-3 catches the smoke part inline,
  visual-regression-tests.md is the Goldens reference.
- .github/workflows/tier3-handbook.yaml:3-5: stale top-of-file header
  sentence updated — "uploads the captured PNGs as a build artifact"
  is now "uploads the per-flow diagnostic captures … for forensic
  inspection", with forward-reference to the "Why no pixel-diff here"
  section that owns the full explanation.
TaprootFreak added a commit that referenced this pull request May 25, 2026
…#573)

## Summary

Makes the visual-regression Goldens the **sole** source of the
`handbook.realunit.app` screenshots and cleans up the Maestro Tier-3
pipeline so that no two pipelines write to the same path any more.
State of play before this PR:

- #541 introduced the 57 Goldens
- #568 established `scripts/assemble-handbook-screenshots.sh` +
`handbook-build-check.yaml`; since then `Dockerfile.handbook` has
assembled the 26 slots from `test/goldens/screens/`
- **But:** the 26 old Maestro PNGs still lived under
`docs/handbook/screenshots/`, had drifted from the Goldens (md5
mismatch on every file), and Tier-3 wrote its captures into exactly
that directory: two pipelines, one path, ambiguous meaning

## What changes

**Commit 1: `git rm` of the 26 duplicated Maestro PNGs under
`docs/handbook/screenshots/`**

**Commit 2: wire up the Goldens as the canonical source**
- `.gitignore`: `docs/handbook/screenshots/` (now purely a
local-preview directory, populated by the assemble script)
- `.github/workflows/handbook-deploy.yaml`: path trigger extended by
`test/goldens/screens/**` + `scripts/assemble-handbook-screenshots.sh`,
so Golden bumps now trigger a deploy (this was a latent gap: previously
a Golden update did not reach the handbook deploy)
- `Dockerfile.handbook` header cleaned up
- `docs/handbook/README.md`: top statement, "Lokal lesen" (read
locally), "Screenshots regenerieren" (regenerate screenshots) and "Einen
neuen Handbook-Eintrag hinzufügen" (add a new handbook entry) rewritten
for the golden-driven workflow

**Commit 3: consistency audit (after the user feedback "I want
everything professional and consistent")**
- Tier-3 diagnostic captures move to `build/handbook-captures/`
(clearly separated from the local-preview path):
  - `scripts/run-handbook-flows.sh` header rewritten + `SCREENS_DIR`
→ `CAPTURES_DIR`
  - `.github/workflows/tier3-handbook.yaml` artifact path updated to match
- `.maestro/handbook/*.yaml` (26 flows): incorrect `# Captures
docs/handbook/screenshots/NN-*.png` headers replaced with "Tier-3
navigation smoke; handbook screenshot is the Golden mapped in ..."
headers
- `README.md` workflow table: `tier3-handbook.yaml` described as
"navigation/tap-routing smoke" with an explicit note that pixel drift
is the responsibility of `Visual Regression`
- `docs/testing.md` Tier-3 section: old statement "uploads
docs/handbook/screenshots/ as a build artifact so reviewers can spot
visual drift" removed; drift belongs to the Goldens, Tier-3 catches
tap-routing/navigation/locale/iOS-build issues
- `docs/screens.md`: the "Handbook" column now explains the Golden
mapping; the "Handbook numbering" note leads with the Goldens and names
Tier 3 as the parallel smoke
- `docs/handbook/de/index.html`: hero lede + `#architecture` section
completely rewritten, pipeline diagram is now page-edit →
`--update-goldens` on dfx01 → commit → `handbook-deploy.yaml`;
meta description updated to match
- `scripts/assemble-handbook-screenshots.sh` header: the word "legacy"
removed; `docs/handbook/screenshots/` is now explicitly the
local-preview target

**Commit 4: subagent code-review findings addressed**
- `docs/handbook/de/index.html`: typo "committee Repo" → "eingecheckte
Repo" (renders on `handbook.realunit.app`)
- `.github/workflows/tier3-handbook.yaml`: "Why no pixel-diff" + "Why
iPhone 17" header comments reworded: pixel drift is now explicitly
assigned to the Visual Regression job; the iPhone 17 rationale now
rests on tap coordinates + safe-area assertions instead of the stale
screenshot-capture rationale
- `scripts/assemble-handbook-screenshots.sh`: example path in the header
made precise (`../screenshots/...` instead of `screenshots/...`)
- `docs/handbook/README.md`: anglicism "Page-Render" →
"Seitenrendering"

## Deliberately accepted trade-off

The `handbook-deploy.yaml` path trigger fires on every change under
`test/goldens/screens/**`, including the 9 feature directories that do
NOT map into the 26 handbook slots (`buy/`, `sell/`, `kyc/`,
`hardware_connect_bitbox/`, `receive/`, `sell_bitbox/`, `debug_auth/`,
…). Over-triggering rather than missing triggers is a deliberate choice
here: duplicating the mapping table from
`assemble-handbook-screenshots.sh` would add maintenance work on every
handbook extension; a `concurrency: handbook-deploy,
cancel-in-progress: false` block serializes the deploys, and the image
build takes <10 s plus ~30 s rollout. Worst case is 1-2 unnecessary
deploys per sprint, which is cheaper than silent drift when someone
extends the mapping and forgets the trigger.
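
To make the trade-off concrete, here is a rough sketch of which change sets count as an over-trigger. It models only the two paths this PR adds to the trigger, uses plain prefix matching rather than GitHub's glob semantics, and `MAPPED_FEATURES` is a hypothetical stand-in for the real mapping table, which lives in the assemble script and is not reproduced here.

```python
TRIGGER_PREFIX = "test/goldens/screens/"
TRIGGER_FILES = {"scripts/assemble-handbook-screenshots.sh"}

# Hypothetical subset of feature directories that feed handbook slots.
# The directory names are assumptions for illustration only.
MAPPED_FEATURES = {"welcome", "settings"}


def triggers_deploy(changed: list[str]) -> bool:
    """True if any changed path falls under the (modelled) path trigger."""
    return any(p.startswith(TRIGGER_PREFIX) or p in TRIGGER_FILES for p in changed)


def touches_handbook(changed: list[str]) -> bool:
    """True if any changed path can actually alter the assembled handbook."""
    for p in changed:
        if p in TRIGGER_FILES:
            return True
        if p.startswith(TRIGGER_PREFIX):
            feature = p[len(TRIGGER_PREFIX):].split("/", 1)[0]
            if feature in MAPPED_FEATURES:
                return True
    return False


def is_over_trigger(changed: list[str]) -> bool:
    """A deploy that fires although no handbook input changed."""
    return triggers_deploy(changed) and not touches_handbook(changed)
```

Avoiding the over-trigger would require keeping something like `MAPPED_FEATURES` in sync with the assemble script, which is exactly the duplication the paragraph above declines to take on.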

## Verified

- `bash scripts/assemble-handbook-screenshots.sh /tmp/x` → 26/26 PNGs
assembled cleanly from the committed goldens (no missing sources)
- `python3 yaml.safe_load(...)` on all touched workflows → OK
- `bash -n` on both scripts → OK
- `grep "Captures docs/handbook/screenshots"` → zero hits in the entire
repo
- Mapping ↔ golden existence: 26/26 ✅ · Mapping ↔ Maestro flow existence:
26/26 ✅ · Semantic cross-check Maestro assertion ↔ golden content:
matched for all 26
- Subagent code review (`general-purpose` agent, model: opus): no
BLOCKER/MAJOR; 3 MINORs + 2 NITs found, all addressed in commit 4
- CI on commits 2 + 3: Analyze & Test ✅ · Coverage Floor Gate ✅ ·
Visual Regression ✅ · BitBox quirks audit ✅ · Build handbook image +
container smoke ✅
- Tier 3 (Maestro handbook flows): run
[26397845244](https://github.com/DFXswiss/realunit-app/actions/runs/26397845244)
on commit 3 is in progress (label `tier3:full` set); commit 4 triggers a
new round
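
The "mapping ↔ golden existence" check in the list above could look roughly like this. It is a sketch under assumptions: the real mapping is defined inside `assemble-handbook-screenshots.sh`, so the `mapping` dictionary and the function name `missing_sources` are hypothetical.

```python
from pathlib import Path


def missing_sources(mapping: dict[str, str], goldens_root: Path) -> list[str]:
    """Return handbook slot names whose golden source file does not exist.

    mapping:      slot file name -> golden path relative to goldens_root
    goldens_root: e.g. Path("test/goldens/screens")
    """
    return sorted(
        slot for slot, rel in mapping.items() if not (goldens_root / rel).is_file()
    )
```

An empty result corresponds to the 26/26 outcome reported above; any returned slot name points at a mapping entry without a committed golden.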

## Out of scope

- Reducing or removing `tier3-handbook.yaml` itself: Tier 3 retains
unique value as a navigation/tap-routing smoke +
iOS build/install smoke + `de_CH` locale check. Separate follow-up
once the screenshot decoupling has run in PRD for one release cycle.

## Manual test plan

- [ ] After merge: `handbook-deploy.yaml` triggers on the merge commit
- [ ] On `dev-handbook.realunit.app`: 3 representative screenshots
(`01-welcome.png`, `12-settings.png`, `26-terms.png`) are the
golden versions, not the old Maestro captures
- [ ] After a later golden bump on one of the mapped pages: the
handbook deploy actually fires (previously it silently did not)
- [ ] After a later Tier 3 run: the `handbook-captures` artifact
contains PNGs from `build/handbook-captures/`, not from
`docs/handbook/screenshots/`
TaprootFreak added a commit that referenced this pull request May 25, 2026
- golden-regenerate.yaml: narrow `git add` to `test/goldens/screens/*/goldens/`
  so alchemist's transient `failures/` dirs never accidentally land in a
  bot-pushed commit (MINOR 3 from review).
- golden-regenerate.yaml: tighten fallback artifact `if-no-files-found`
  from `warn` to `error`. An empty fallback artifact is worse than no
  artifact at all: the user expects the artifact precisely because the
  push failed, so a silently empty artifact is a footgun (MINOR 5).
- docs/visual-regression-tests.md: rewrite the stale intro that still
  claimed "5 screens, 8 baseline PNGs" (left over from PR #541's pilot
  description) → 57 page files / 68 Golden PNGs, validated by the
  required `Visual Regression` check (MINOR 4).
- docs/visual-regression-tests.md: extend the "On a protected ref the
  push fails by design" paragraph to also cover the parallel-human-push
  / non-fast-forward race — same artifact-fallback path, same recovery
  (MAJOR 1).
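
The narrowed `git add` in the first bullet amounts to a path filter. Below is a minimal sketch of the intended effect, assuming `*` stands for exactly one feature directory; the function name `is_stageable_golden` and the example file names are hypothetical, and this is not the workflow's actual implementation.

```python
from pathlib import PurePosixPath


def is_stageable_golden(path: str) -> bool:
    """True for files under test/goldens/screens/<feature>/goldens/ only."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) >= 6  # prefix (3) + feature + "goldens" + at least a file
        and parts[:3] == ("test", "goldens", "screens")
        and parts[4] == "goldens"
    )


changed = [
    "test/goldens/screens/buy/goldens/buy_initial.png",
    "test/goldens/screens/buy/failures/buy_initial_masterImage.png",
]
staged = [p for p in changed if is_stageable_golden(p)]
```

With this filter, only the file under `goldens/` ends up in `staged`; the sibling `failures/` output is left out of the bot-pushed commit.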
TaprootFreak added a commit that referenced this pull request May 25, 2026
Empty seed commit so a draft PR can exist before the actual follow-ups
land. Real changes (themes.dart fontFamily root-cause, BitBox test
quality, golden pipeline determinism investigation, etc.) get pushed
on top of this branch and are visible incrementally in the PR diff.
TaprootFreak added a commit that referenced this pull request May 26, 2026
Collection PR for follow-ups identified during the #541 review session.
Held as a **Draft** while commits accumulate; flipped to ready + merged
at the end.

### Likely contents (subject to scope decisions)
- `lib/styles/text_styles.dart` — hardcode `fontFamily` directly in
`_Body.base/.sm/.xs` + `_Header.h1/.h2/.h4` (root-cause for the latent
`appBarTheme.titleTextStyle` Ahem-rendering bug; supersedes the
per-theme `copyWith(fontFamily: …)` pin added in #562)
- `test/integration/wallet_creation_bitbox_test.dart:194-195` — drop the
tautological `appStore.wallet = created; verify(...).called(1);` or
replace with a real contract pin
- `test/integration/connect_bitbox_flow_test.dart:249,285` — align
wall-clock `Future.delayed(fastObserverInterval * 4)` with the
`fakeAsync`+`async.elapse` pattern used in
`bitbox_reconnect_recovery_test.dart`
- `test/goldens/screens/legal/legal_document_golden_test.dart` — replace
inline `_termsMarkdownStub` (1:1 copy of production terms) with a
synthetic markdown fixture under `test/fixtures/`
- Investigate why the dfx01 runner produces ±10–50 byte encoder drift on
unrelated golden PNGs during baseline refresh (Skia version pin, build
cache, font cache)

### Out of scope
- Anything that should go to a dedicated PR (e.g. a real new feature)
- #325 release develop → main

### Notes
Empty seed commit kept in history so the PR existed before the first
real change landed.

---------

Co-authored-by: Blume1977 <jana.ruettimann@dfx.swiss>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>