The four-tier model is documented as the living source of truth in docs/testing.md — this issue tracks the rollout; the doc owns the tier definitions, conventions, and onboarding examples.
Motivation
PR #312 ("gate sensitive KYC steps behind BitBox EIP-712 sign") shipped with zero automated tests for the new Cubit logic. The seven manual scenarios in that PR description were the only validation, and most were unchecked at merge time. Every change to BitBox-gated code carried the same regression risk.
Concrete incidents that standardised infrastructure has caught (or would have caught):
bitbox_flutter PR Bitbox integration #6 removed sign dedup → broke long EIP-712 sign flows → discovered only when running the 13-page sign on real hardware
bitbox_flutter PR feat: Create Welcome Page #11 fixed a BLE frame-desync regression — exactly the class of bug FakeBitboxBehavior.malformed now reproduces deterministically
A Future.timeout race in KycCubit (commit 5c12676) was only discovered post-merge
The pattern: BitBox-related bugs surface late, manual tests don't get re-run consistently, review depth varies. Standardised, tiered test infrastructure breaks the cycle.
Goals
Cubit-level regression coverage for every state transition in KycCubit, KycRegistrationSubmitCubit, Eip712Signer, DFXAuthService
Full sign-flow automation without physical hardware via an SDK-boundary fake
Firmware-level protocol validation in CI using the official BitBox02 simulator
Reproducible hardware tests as executable YAML for the pre-release gate
Reusable infrastructure that future DFX BitBox-integrated apps (e.g. dfx-wallet) can inherit
Non-goals
Replacing manual security-critical validation entirely — Tier 3 stays the gold standard for production release readiness
Testing BitBox firmware itself — upstream concern; we treat firmware as a black box
A hosted, always-on hardware CI farm
The four-tier model
| Tier | Scope | Hardware | Where it runs |
| --- | --- | --- | --- |
| Tier 0 | Pure Dart logic (cubits, services, signers) | no device | CI |
| Tier 1 | Cubit / widget + SDK-boundary fake | no device | CI |
| Tier 2 | Real firmware simulator (USB-style framing) | no device | CI (firmware side, via testkit) |
| Tier 3 | Real BitBox02 hardware | device | manual / on-demand |
| Tier 4 | BLE capture / replay | hybrid | stretch |
Canonical definitions and "when to use which tier" guidance live in docs/testing.md. Do not re-derive them here.
Stack is flutter_test + bloc_test + mocktail (already in dev_dependencies).
Long-term enforcement is the coverage gate (see Cross-cutting: coverage gating). Once the threshold check lands, the gate will automatically require every BitBox-touching PR to add Tier 0 cases for new logic branches.
Deviation 1: location. FakeBitboxCredentials lives at lib/packages/hardware_wallet/fake_bitbox_credentials.dart in this repo, not in bitbox_flutter as the original plan proposed. It sits under lib/ (with test/packages/hardware_wallet/fake_bitbox_credentials_test.dart exercising it directly) so the type can be imported cleanly from outside the test/ tree, notably from test/integration/kyc_sign_flow_test.dart. The rationale is twofold: BitboxCredentials in bitbox_flutter is a concrete class, so the fake extends it (rather than implementing an interface), and no consumer outside realunit-app would benefit today from having the fake in the SDK package. If dfx-wallet or another consumer ever adopts the same pattern, promoting the fake to bitbox_flutter is a mechanical refactor.
Deviation 2 — CI shape. The original plan called for a separate integration-test: job in pull-request.yaml driving the iOS Simulator. In practice the cross-layer tests under test/integration/ run headless and are picked up by the regular flutter test --coverage step, so no dedicated job was needed. If/when a future Tier 1 test requires a real integration_test/ binding (full app boot, platform channels), the dedicated job becomes worthwhile.
What shipped:
FakeBitboxCredentials with the FakeBitboxBehavior enum: success / cancel / disconnect / timeout / malformed. Each mode mirrors a real-world ceremony outcome — see the table in docs/testing.md.
Deterministic test private key, derived address 0x9F5713DEacB8e9CAB6c2D3FaE1AFc2715F8D2D71, shared with code paths that need a non-BitBox credential.
Cross-layer test suite at test/integration/kyc_sign_flow_test.dart exercising FakeBitboxCredentials → Eip712Signer.signRegistration → SigningCancelledException.
Convention documented in docs/testing.md: test/integration/ for ≥ 2-layer scenarios using the fake; test/ for single-layer with mocked deps.
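The behaviour-enum pattern can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and is not the repo's Dart code: every class and function name below is hypothetical, and how the real FakeBitboxCredentials models each mode (in particular malformed) is an assumption.

```python
import enum


class FakeBehavior(enum.Enum):
    """Hypothetical mirror of the five FakeBitboxBehavior modes."""
    SUCCESS = "success"
    CANCEL = "cancel"
    DISCONNECT = "disconnect"
    TIMEOUT = "timeout"
    MALFORMED = "malformed"


class SigningCancelled(Exception):
    """The user rejected the request on the device."""


class DeviceDisconnected(Exception):
    """The transport dropped mid-ceremony."""


class FakeCredentials:
    """Scriptable stand-in for a hardware signer (illustrative only)."""

    def __init__(self, behavior: FakeBehavior):
        self.behavior = behavior

    def sign(self, payload: bytes) -> bytes:
        if self.behavior is FakeBehavior.CANCEL:
            raise SigningCancelled()
        if self.behavior is FakeBehavior.DISCONNECT:
            raise DeviceDisconnected()
        if self.behavior is FakeBehavior.TIMEOUT:
            raise TimeoutError("device did not answer")
        if self.behavior is FakeBehavior.MALFORMED:
            # Assumption: a malformed reply is modelled as a truncated signature.
            return b"\x00"
        return bytes(65)  # placeholder 65-byte signature


def sign_registration(credentials: FakeCredentials, payload: bytes) -> str:
    """Layer under test: maps each signer outcome to a caller-facing result."""
    try:
        signature = credentials.sign(payload)
    except SigningCancelled:
        return "cancelled"
    except (DeviceDisconnected, TimeoutError):
        return "retry"
    if len(signature) != 65:
        return "invalid-signature"
    return "signed"
```

Because the outcome is selected by a constructor argument, a cross-layer test can walk all five modes in a loop without touching a device.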
Phase 2 — Firmware simulator (REFRAMED)
The original plan called for hosting a Docker BitBox02 simulator on a DFX server and adding a TcpBitboxTransport to bitbox_flutter so the app could speak to it. That plan has been superseded by an upstream-aligned approach centred on DFXswiss/bitbox-testkit (an Apache-2.0 mirror with hardening).
What bitbox-testkit delivers
30 documented BitBox02 firmware quirks (the engineering knowledge base the original plan would have had to rediscover)
A bitbox-audit CLI for protocol-level validation
Scriptable fakes for firmware.Communication (Go) and PairedBitBox (TS) — i.e. the same idea as FakeBitboxCredentials, one layer down the stack
A reusable bitbox-simulator GitHub Action
Current pin: v0.5.0 (commit 45a1253d23b545d801cf5a1f42c040b85e389c7d).
How realunit-app wires it in
Two workflows, both pinned to the same testkit SHA:
pull_request with path filter on lib/packages/hardware_wallet/**, lib/packages/wallet/**, lib/screens/hardware_connect_bitbox/**, mirrored test dirs, and pubspec.yaml
Automatic firmware-side validation on BitBox-touching PRs
ETH-address derivation on chainId=1 AND chainId=137 (the multi-byte-v boundary that historically breaks EIP-155 consumers)
ETH personal-message signing at the firmware-doc 1024-byte upper boundary
EIP-1559 sign happy path
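Why chainId=137 is a boundary can be seen from the EIP-155 formula for the replay-protected v value, v = recovery_id + chain_id * 2 + 35. The sketch below is a standalone Python illustration, not code from the testkit; the helper names are made up for this example.

```python
def eip155_v(chain_id: int, recovery_id: int) -> int:
    """EIP-155 replay-protected v: recovery_id + chain_id * 2 + 35."""
    if recovery_id not in (0, 1):
        raise ValueError("recovery_id must be 0 or 1")
    return recovery_id + chain_id * 2 + 35


def byte_length(value: int) -> int:
    """Minimal number of bytes needed to encode a non-negative integer."""
    return max(1, (value.bit_length() + 7) // 8)


if __name__ == "__main__":
    for chain_id in (1, 137):
        v = eip155_v(chain_id, 0)
        print(f"chainId={chain_id}: v={v}, bytes={byte_length(v)}")
```

On chainId=1 the value still fits in a single byte, while on chainId=137 it exceeds 255 and needs two, which is where consumers that assume a one-byte v go wrong.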
What it explicitly does NOT validate
realunit-app's Dart code talking to the BitBox via bitbox_flutter against the simulator. The testkit covers the FIRMWARE side of the protocol; the consumer side still needs Tier 3 (real hardware) for the BLE / USB transport.
iOS BLE-specific framing — same gap as the original Phase 2 plan (the simulator speaks U2F-HID over a TCP/USB-style channel, not fragmented BLE).
Sub-tier: in-process SimulatedBitboxPlatform
bitbox_flutter v0.0.7 exposes an in-process SimulatedBitboxPlatform (lib/testing/bitbox_testkit.dart) that stubs the USB platform-interface. It sits between Tier 1 (FakeBitboxCredentials at the credentials boundary) and Tier 2 (firmware simulator over the wire) — useful when a test needs to exercise bitbox_flutter internals without a real transport. Call it Tier 1.5 in informal discussion; not yet a separate row in docs/testing.md.
Status of the original sub-plan (preserved for reference)
Marked superseded unless a concrete future need re-opens them:
Docker BitBox02 firmware simulator hosted on a DFX server — not needed: GitHub Actions runs the simulator ephemerally per PR, no persistent infra to maintain.
TcpBitboxTransport in bitbox_flutter + --dart-define=BITBOX_HOST=... build flavour — not built. Whether this is still worth doing (to drive the Dart code path end-to-end against the simulator instead of only the upstream firmware side) is an open sub-question; a tracking issue is the right place if/when someone wants to pick that up.
Phase 3 — Maestro hardware-checkpoint flows (OPEN — next major work)
For scenarios that need real BitBox hardware validation before each production release. Maestro YAML makes the test plan executable documentation.
Existing context: .maestro/handbook/
.maestro/handbook/ already contains 19 YAML flows (01-welcome.yaml … 19-settings-seed-revealed.yaml) used by the handbook screenshot pipeline — landed in #441 and deployed via handbook-dev.yaml / handbook-prd.yaml. These flows drive the app to specific UI states and capture screenshots; they are NOT hardware tests.
To avoid filename collision and reader confusion, the hardware-checkpoint flows below should be namespaced under .maestro/kyc/ — not at the .maestro/ root.
Proposed inventory (one YAML per scenario, under .maestro/kyc/)
.maestro/kyc/01_fresh_wallet_full_flow.yaml — blank BitBox, fresh email, full 13-page sign, lands on ident
.maestro/kyc/README.md documenting BitBox state prerequisites (blank, attached-to-fresh-user, attached-to-other-user-level-25, attached-to-other-user-level-35) and the DFX DEV backend test-user reset procedure.
A reset entry point for DFX DEV KYC state (see Q4 in Open questions). Until that exists, scenario 3 (account merge) requires manual pre-staging.
tools/generate_test_report.dart to assemble a release dry-run report from the screenshot output.
CI shape
Manual / on-demand only. The natural mechanism is a label-gated workflow following the ci:full pattern documented in the follow-up comment — apply the label → registered runner on the tester's machine picks it up → seven Maestro flows run. Keeps the hardware-dependent path out of every speculative push.
Acceptance
All 7 scenarios documented as YAML under .maestro/kyc/
README with setup, BitBox state matrix, reset procedure
One full release dry-run completed with all scenarios green
Tester onboarding ≤ 30 min from zero
Phase 4 — VCR / replay (stretch)
A "tape recorder" for BLE traffic — macOS proxy app sitting between iPhone and BitBox, capturing U2F-HID frames in both directions, replayable through a RecordedBitboxTransport. Use case: capture once on real hardware after a firmware upgrade, replay deterministically in CI thereafter to detect transport-layer regressions.
Stretch only — Tier 2 (firmware logic) + Tier 3 (real device, on demand) covers most of this need together. Document for later; do not start until everything else has landed.
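For later reference, the replay half of the idea can be sketched in a few lines. This is a hypothetical Python illustration of the concept only: the tape format, the class name ReplayTransport and the mismatch behaviour are all assumptions, and nothing here describes an existing RecordedBitboxTransport.

```python
class ReplayMismatch(Exception):
    """The live exchange diverged from the recorded tape."""


class ReplayTransport:
    """Replays a recorded exchange frame by frame (illustrative only)."""

    def __init__(self, tape):
        # tape: ordered (direction, frame) pairs; direction is "out" or "in".
        self._tape = list(tape)
        self._pos = 0

    def _next(self):
        if self._pos >= len(self._tape):
            raise ReplayMismatch("tape exhausted")
        entry = self._tape[self._pos]
        self._pos += 1
        return entry

    def write(self, frame: bytes) -> None:
        """Check an outgoing frame against the recording."""
        direction, expected = self._next()
        if direction != "out" or frame != expected:
            raise ReplayMismatch(
                f"step {self._pos - 1}: unexpected write {frame.hex()}"
            )

    def read(self) -> bytes:
        """Return the next recorded incoming frame."""
        direction, frame = self._next()
        if direction != "in":
            raise ReplayMismatch(
                f"step {self._pos - 1}: recording expects a write here"
            )
        return frame
```

A strict, order-sensitive tape like this fails on the first divergent frame, which is the property that would surface a transport-layer regression in CI.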
Cross-cutting: coverage gating
The original issue's acceptance criterion ("every new BitBox-touching PR adds tests at the appropriate tier") needs an enforcement mechanism. That mechanism is the coverage gate.
What has landed:
docs: add features matrix + 100% test-coverage rule #322 — README rule: "new PRs may only merge into develop if test coverage is 100% on the activated surface", with a Coverage scope definition (lib/packages/**, lib/screens/<feature>/cubits|bloc/**) and an inline // coverage:ignore-* escape hatch for genuinely unreachable code.
ci: measure test coverage and upload as artifact #323 — flutter test --coverage runs in pull-request.yaml, lcov strips generated / main.dart, the filtered coverage/lcov.info is uploaded as a workflow artifact. This is the measurement baseline.
What is still pending (tracked in the README "Coverage infrastructure roadmap" section):
An lcov threshold check that fails the build below the configured percentage
GitHub branch protection on develop requiring the coverage check
Build-time feature-flag mechanism so non-MVP surface can be excluded from the gate
Until those land, the 100% rule is aspirational. Once they do, Tier 0 + Tier 1 enforcement becomes automatic.
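A threshold check of the kind still pending could look roughly like the following. It is a sketch in Python under stated assumptions, not the planned implementation: it relies only on the standard LF (lines found) and LH (lines hit) records of the lcov tracefile format, and the function names and the way the threshold is passed are hypothetical.

```python
def coverage_percent(lcov_text: str) -> float:
    """Sum the LF/LH records of an lcov tracefile into a line-coverage percentage."""
    found = 0
    hit = 0
    for line in lcov_text.splitlines():
        if line.startswith("LF:"):
            found += int(line[3:])
        elif line.startswith("LH:"):
            hit += int(line[3:])
    # An empty tracefile has nothing uncovered, so treat it as fully covered.
    return 100.0 if found == 0 else 100.0 * hit / found


def meets_threshold(lcov_text: str, minimum: float) -> bool:
    """True when total line coverage is at or above the configured minimum."""
    return coverage_percent(lcov_text) >= minimum
```

In CI, a script would read the filtered coverage/lcov.info, call meets_threshold with the configured percentage, and exit non-zero when it returns False so that branch protection can require the check.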
The canonical seven scenarios
These map to PR #312's manual test plan and remain the test backbone across tiers.
"indirect" in the Tier 2 column means the testkit validates the underlying ETH-sign primitives the scenario depends on, not the scenario as a whole — the scenario itself is a UI / backend interaction that the firmware does not observe.
Open questions
Simulator firmware-version pinning — ✅ ANSWERED. bitbox-testkit is pinned to v0.5.0 at SHA 45a1253d in both workflows; the slash-command variant accepts ref=... for ad-hoc overrides. Quarterly bump cadence is the working convention.
iOS-specific BLE quirks — 🟡 STILL OPEN. The simulator emulates U2F-HID over a USB-style channel, not BLE-fragmented. docs/testing.md documents this gap explicitly. Tier 3 stays the only validation for the BLE transport — that's the entire point of Phase 3.
DFX DEV backend coupling — 🟡 STILL OPEN. Phase 3 hardware flows write real KYC state to the DEV backend; a stable test-user pool and a reset endpoint need coordination with the API team. Today, scenarios 2 / 3 / 4 require manual pre-staging of a test user at the correct KYC level.
Test data hygiene: 🟡 STILL OPEN. There is no tools/reset_test_users.sh and no DFX DEV admin endpoint for KYC reset. This blocks Phase 3 acceptance; until a reset mechanism lands, manual cleanup is the only option.
Hosting cost — ✅ MOOT. The testkit runs ephemerally per GitHub Actions run; no persistent simulator host to budget for. The Phase 3 runner runs on the tester's existing machine.
DFXswiss/bitbox-testkit — current pin v0.5.0 (Apache-2.0 mirror with hardening; firmware-side Phase 2 engine)
DFXswiss/bitbox_flutter — current pin in pubspec.yaml is v0.0.5, latest tag is v0.0.7 (2026-05-18). v0.0.7 introduces the in-process SimulatedBitboxPlatform.
~1–2 days for threshold gate + branch protection once the activated-surface scope is finalised
Phases 0–2 cover the regression-risk surface that originally motivated the issue. Phase 3 is the next pre-release-readiness step. Phase 4 stays speculative.
Comprehensive BitBox testing infrastructure (Tier 0–4)
Status (2026-05-19)
DFXswiss/bitbox-testkit (firmware side); Dart-side TCP transport not built (likely no longer needed)
.github/workflows/bitbox-simulator.yml, .github/workflows/bitbox-simulator-slash.yml
Phase 0: Cubit unit tests (DONE)
Landed in #319 (merged 2026-05-15).
Test files shipped:
test/screens/kyc/cubits/kyc/kyc_cubit_test.dart
test/screens/kyc/steps/registration/cubits/registration_submit/kyc_registration_submit_cubit_test.dart
test/packages/wallet/eip712_signer_test.dart
test/packages/wallet/eip712_signer_bitbox_test.dart
test/packages/service/dfx/dfx_auth_service_test.dart
Phase 1 — SDK-boundary fake + cross-layer tests (DONE, with deviations)
Landed in #320 (merged 2026-05-15).
Two workflows, both pinned to the same testkit SHA:
.github/workflows/bitbox-simulator.yml: pull_request with path filter on lib/packages/hardware_wallet/**, lib/packages/wallet/**, lib/screens/hardware_connect_bitbox/**, mirrored test dirs, and pubspec.yaml
.github/workflows/bitbox-simulator-slash.yml: /bitbox-simulator PR comment, optional ref=... arg, member-or-above only
What this validates today
bitbox-api ↔ BitBox02 firmware Noise handshake round-trip
See the follow-up comment on ci:full patterning for how a future e2e-simulator.yaml would label-gate its heavy run.
Proposed inventory (one YAML per scenario, under .maestro/kyc/)
.maestro/kyc/01_fresh_wallet_full_flow.yaml: blank BitBox, fresh email, full 13-page sign, lands on ident
.maestro/kyc/02_existing_user_low_level.yaml: wallet already linked, level < 30
.maestro/kyc/03_different_dfx_user_merge.yaml: expects KycAccountMergePage
.maestro/kyc/04_returning_level_30_plus.yaml: must STILL sign (security gate)
.maestro/kyc/05_cancel_mid_sign.yaml: expects Signature was empty SnackBar
.maestro/kyc/06_bitbox_disconnected.yaml: modal sheet → reconnect → retry
.maestro/kyc/07_tfa_required.yaml: backend returns TFA_REQUIRED
Required tooling
.maestro/kyc/README.md documenting BitBox state prerequisites (blank, attached-to-fresh-user, attached-to-other-user-level-25, attached-to-other-user-level-35) and the DFX DEV backend test-user reset procedure.
tools/generate_test_report.dart to assemble a release dry-run report from the screenshot output.
The canonical seven scenarios
[Scenario-by-tier table lost in extraction. Surviving cell fragments: level < 30; level >= 30, MUST still sign; cancel; disconnect → success; TFA_REQUIRED → 2FA page.]
References
PRs that have landed (this repo):
FakeBitboxCredentials + cross-layer sign-flow integration tests
v0.0.3 doc-comment in bitbox-simulator.yml.
Related repos:
DFXswiss/bitbox-testkit: current pin v0.5.0 (Apache-2.0 mirror with hardening; firmware-side Phase 2 engine)
DFXswiss/bitbox_flutter: current pin in pubspec.yaml is v0.0.5, latest tag is v0.0.7 (2026-05-18). v0.0.7 introduces the in-process SimulatedBitboxPlatform.
Living doc: docs/testing.md (canonical four-tier definitions, examples, conventions)
Effort summary
bitbox_flutter is optional, ~1 day if a second consumer materialises
TcpBitboxTransport only if a concrete need surfaces