Stale state leak in KycCubit on retry after timeout
TL;DR
The _timedOut flag introduced in commit 5c12676 (#312) is reset at the start of every checkKyc() call, which means a late HTTP response from an earlier call can override a fresh response from a subsequent call. This is not just a UI flicker — the final user-visible state can be permanently stale until the cubit is recreated or another checkKyc() is triggered.
Empirically reproduced; fix proposal below.
Problem
After PR #312 merged, KycCubit.checkKyc() wraps the underlying work in a 30s timeout (commit 5c12676):
Future<void> checkKyc() async {
_timedOut = false; // ← reset every call
try {
await _runCheckKyc().timeout(_checkKycTimeout);
} on TimeoutException {
_timedOut = true;
if (!isClosed) emit(const KycFailure('...'));
}
}
Future<void> _runCheckKyc() async {
emit(const KycLoading());
final results = await Future.wait([...]);
if (isClosed || _timedOut) return; // ← guard
// ... emit state
}
Future.timeout does not cancel the underlying work. The original future continues running in the background even after the timeout fires. The _timedOut flag was intended to stop these late responses from emitting state — and it does, for the same call. But the flag is shared across all checkKyc() invocations and gets reset at the start of every new call.
Timeline of the leak
t=0: User triggers checkKyc() → _timedOut = false, Call 1 starts, Future.wait awaited
t=30s: Call 1's outer .timeout() fires TimeoutException → _timedOut = true, KycFailure emitted
t=35s: User taps Retry on the failure page → checkKyc() called again → _timedOut = false (reset!), Call 2 starts
t=36s: Call 2's Future.wait completes (backend recovered), guard _timedOut == false passes, fresh state emitted (e.g. KycSuccess(ident))
t=45s: Call 1's late HTTP response arrives, Call 1's body resumes, guard _timedOut == false (was reset by Call 2), Call 1 emits its stale state (e.g. KycSuccess(registration))
- User sees the stale state from
t=0 instead of the fresh state from t=35. State stays stale until the next checkKyc().
Empirical proof
Reproducer (pure Dart, no Flutter deps):
import 'dart:async';
class FakeCubit {
String _state = 'initial';
bool _timedOut = false;
bool get isClosed => false;
void emit(String s, {required String from}) {
_state = s;
print(' [emit from $from] $s');
}
Future<void> checkKyc(String id, Duration delay, String result) async {
_timedOut = false;
try {
await _run(id, delay, result).timeout(const Duration(milliseconds: 100));
} on TimeoutException {
_timedOut = true;
emit('Failure: timeout', from: id);
}
}
Future<void> _run(String id, Duration delay, String result) async {
emit('Loading', from: id);
await Future.delayed(delay);
if (_timedOut) return;
emit(result, from: id);
}
}
Future<void> main() async {
final c = FakeCubit();
c.checkKyc('CALL1', const Duration(milliseconds: 300), 'Call1-STALE');
await Future.delayed(const Duration(milliseconds: 150)); // Call 1 timeout fires
c.checkKyc('CALL2', const Duration(milliseconds: 50), 'Call2-FRESH');
await Future.delayed(const Duration(milliseconds: 400)); // Wait for everything
print('Final state: ${c._state}');
}
Output:
[emit from CALL1] Loading
[emit from CALL1] Failure: timeout
[emit from CALL2] Loading
[emit from CALL2] Call2-FRESH
[emit from CALL1] Call1-STALE ← stale leak overrides fresh state
Final state: Call1-STALE ← BUG: final state is from the timed-out call
Impact
- Severity: medium. The bug only triggers when the backend takes longer than 30s AND the user retries within the late-response window. In a healthy backend this never fires.
- Correctness, not just UX. Earlier review notes called this a "UI flicker" — that's wrong. The stale emit is the last state change, so it persists. The user can land permanently on the wrong KYC step until they navigate away (cubit recreated) or trigger another
checkKyc().
- Concrete user-visible failure modes:
- Call 2 says "go to ident", Call 1 (stale) overrides with "go to registration" → user signs again unnecessarily
- Call 2 says "you need 2FA", Call 1 (stale) overrides with "completed" → user thinks they're done, isn't
- Call 2 says "backend error", Call 1 (stale) overrides with success → user proceeds with broken backend state
The KYC + hardware-sign flow is exactly the place where this kind of subtle state corruption hurts most — it's a multi-minute ceremony with high friction to repeat.
Proposed fix: generation counter
Replace the boolean _timedOut flag with a monotonically-incrementing generation counter. Each checkKyc() call captures its own generation, and the _runCheckKyc body bails if the current generation no longer matches:
int _runGeneration = 0;
Future<void> checkKyc() async {
final generation = ++_runGeneration;
try {
await _runCheckKyc(generation).timeout(_checkKycTimeout);
} on TimeoutException {
if (generation == _runGeneration && !isClosed) {
emit(const KycFailure('KYC backend did not respond in time'));
}
} catch (e) {
if (generation == _runGeneration && !isClosed) {
emit(KycFailure(e.toString()));
}
}
}
Future<void> _runCheckKyc(int generation) async {
if (isClosed || generation != _runGeneration) return;
emit(const KycLoading());
final results = await Future.wait([
_kycService.getKycStatus(),
_kycService.getUser(),
]);
if (isClosed || generation != _runGeneration) return;
// ... rest of body, threading `generation` through to recursive calls and `_continueKyc`
}
Future<void> _continueKyc(int generation) async {
final kycStatus = await _kycService.continueKyc();
if (isClosed || generation != _runGeneration) return;
// ... emit
}
Verified with the same reproducer harness, now passing:
[CALL1] checkKyc starting, generation=1
[emit from CALL1] Loading
[emit from CALL1] Failure: timeout
[CALL2] checkKyc starting, generation=2
[emit from CALL2] Loading
[emit from CALL2] Call2-FRESH
[CALL1] bailed: gen 1 != current 2 ← stale response correctly rejected
Final state: Call2-FRESH ← FIXED
This is the canonical "cancellation token" pattern for non-cancellable async work — generation numbers are the smallest possible identifier that survives the lack of a real cancellation mechanism in Dart Future.
Alternative considered: CancelableOperation from package:async
More idiomatic Dart but requires the underlying HTTP layer to support cancellation. http.Client does not propagate CancelableOperation's cancel signal. Either we accept the underlying HTTP work keeps running (same as today, just with cleaner cancellation API) or rewrite the HTTP layer (out of scope). Generation counter is the pragmatic choice.
Alternative considered: per-call local state
Pass a _Run object into _runCheckKyc and stash all per-call state there. Cleaner separation but heavier refactor for a single boolean's worth of state. Revisit if more per-call state accumulates.
Acceptance criteria
Files affected
lib/screens/kyc/cubits/kyc/kyc_cubit.dart (the only file)
References
Stale state leak in
KycCubiton retry after timeoutTL;DR
The
_timedOutflag introduced in commit5c12676(#312) is reset at the start of everycheckKyc()call, which means a late HTTP response from an earlier call can override a fresh response from a subsequent call. This is not just a UI flicker — the final user-visible state can be permanently stale until the cubit is recreated or anothercheckKyc()is triggered.Empirically reproduced; fix proposal below.
Problem
After PR #312 merged,
KycCubit.checkKyc()wraps the underlying work in a 30s timeout (commit5c12676):Future.timeoutdoes not cancel the underlying work. The original future continues running in the background even after the timeout fires. The_timedOutflag was intended to stop these late responses from emitting state — and it does, for the same call. But the flag is shared across allcheckKyc()invocations and gets reset at the start of every new call.Timeline of the leak
t=0: User triggerscheckKyc()→_timedOut = false, Call 1 starts,Future.waitawaitedt=30s: Call 1's outer.timeout()firesTimeoutException→_timedOut = true,KycFailureemittedt=35s: User taps Retry on the failure page →checkKyc()called again →_timedOut = false(reset!), Call 2 startst=36s: Call 2'sFuture.waitcompletes (backend recovered), guard_timedOut == falsepasses, fresh state emitted (e.g.KycSuccess(ident))t=45s: Call 1's late HTTP response arrives, Call 1's body resumes, guard_timedOut == false(was reset by Call 2), Call 1 emits its stale state (e.g.KycSuccess(registration))t=0instead of the fresh state fromt=35. State stays stale until the nextcheckKyc().Empirical proof
Reproducer (pure Dart, no Flutter deps):
Output:
Impact
checkKyc().The KYC + hardware-sign flow is exactly the place where this kind of subtle state corruption hurts most — it's a multi-minute ceremony with high friction to repeat.
Proposed fix: generation counter
Replace the boolean
_timedOutflag with a monotonically-incrementing generation counter. EachcheckKyc()call captures its own generation, and the_runCheckKycbody bails if the current generation no longer matches:Verified with the same reproducer harness, now passing:
This is the canonical "cancellation token" pattern for non-cancellable async work — generation numbers are the smallest possible identifier that survives the lack of a real cancellation mechanism in Dart
Future.Alternative considered:
CancelableOperationfrompackage:asyncMore idiomatic Dart but requires the underlying HTTP layer to support cancellation.
http.Clientdoes not propagateCancelableOperation's cancel signal. Either we accept the underlying HTTP work keeps running (same as today, just with cleaner cancellation API) or rewrite the HTTP layer (out of scope). Generation counter is the pragmatic choice.Alternative considered: per-call local state
Pass a
_Runobject into_runCheckKycand stash all per-call state there. Cleaner separation but heavier refactor for a single boolean's worth of state. Revisit if more per-call state accumulates.Acceptance criteria
_timedOutinlib/screens/kyc/cubits/kyc/kyc_cubit.dartgenerationthreaded through_runCheckKyc(including the recursive call in the email-registration branch) and_continueKyctest/screens/kyc/cubits/kyc/kyc_cubit_test.dart(will be created as part of Comprehensive BitBox testing infrastructure (Tier 0–4) #314 Phase 0) covers the retry-race scenario explicitly:checkKyc()calls, first one times out, second one is fast; assert final state matches the second call's resultflutter analyzeclean,flutter testpassesFiles affected
lib/screens/kyc/cubits/kyc/kyc_cubit.dart(the only file)References
5c12676in PR fix: gate sensitive KYC steps behind BitBox EIP-712 sign #312Future.timeoutsemantics: https://api.dart.dev/stable/dart-async/Future/timeout.html — "the source future is not cancelled"