fix(platform-wallet): auto-release a stranded shielded-spend reservation on sync#3982
Conversation
…ion on sync
A shielded spend that returns ShieldedSpendUnconfirmed (broadcast accepted, but
the result-wait failed ambiguously) keeps the spent notes' reservation
(pending_nullifiers) so a retry can't double-spend. That reservation was
released only when the tx lands (a later sync marks the note spent) or on app
restart. A broadcast-accepted-but-never-lands spend therefore stranded its
notes for the whole session — funds looked stuck with no in-session resolution.
Fix: remember the Platform-recorded anchor each spend was built against (the
anchor-selection fix guarantees it's a recorded root), then on each sync —
after the normal spent-note reconcile — release any still-pending reservation
whose anchor is no longer in Platform's recorded set. Once the anchor is pruned
(shielded_anchor_retention_blocks = 1000) the transition can never execute, so
with the nullifier still unspent (which "still pending after the reconcile"
already implies) the spend is provably dead and its notes can be freed; the
linked activity row flips Pending -> Failed so the UI shows a clear, retryable
failure. Reuses the GetShieldedAnchors query — no protocol change.
Fund-safety: releases ONLY when the anchor is absent from the freshly-fetched
recorded set (never a still-recorded, slow-but-landing spend); the on-chain
nullifier set stays the authoritative double-spend guard. The spent-note
reconcile runs first, so a landed tx's reservation is already cleared before
the release pass — no race. pending_nullifiers stays in-memory (a restart still
frees everything); the anchor fetch is best-effort (a hiccup never fails sync).
Data model: SubwalletState.pending_nullifiers BTreeSet -> BTreeMap<[u8;32],
PendingSpend { anchor, activity_id }>; arm_pending_release fills it in across
all four spend flows before broadcast. Wallet suite 312 tests pass; fmt +
clippy (--all-features) clean.
Stacks on the shielded anchor-selection fix (PR #3977) — the stored anchor must
be a recorded one, which that PR guarantees.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
✅ Review complete (commit 553fabd) |
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The auto-release design (arm-with-anchor, sync-time prune-check, fund-safe fallback to on-chain nullifier set) is sound. Main residual issues are a real-but-narrow TOCTOU/discarded-bool combination in the release loop (log misinformation + edge-case wrong activity flip), a non-idempotent mark_pending that today is only unreachable due to a cross-module invariant, plus minor efficiency and type-safety nits. No blocking issues survive verification.
🟡 2 suggestion(s) | 💬 4 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs:810-846: Release loop discards `clear_pending`'s bool and never re-validates the snapshot against the current entry
`release_stranded_spends` snapshots `(nullifier, anchor, activity_id)` under a read lock at line 772-787, drops it, awaits a network anchor fetch at line 796, and only then per-entry write-locks to call `clear_pending`. The result is not re-tied to the snapshot:
1. `clear_pending` returns `Result<bool, _>` where `Ok(false)` means "nullifier was no longer pending" (e.g. a concurrent `mark_spent`/`finalize_pending`/`cancel_pending` already cleared it). This code only branches on `Err`, so `Ok(false)` falls through: it still logs the misleading `"shielded reservation released...freeing the notes"` info line and still calls `record_activity_status_by_id(..., Failed)`.
2. `clear_pending(id, &nullifier)` matches by nullifier only. If a concurrent flow cancelled the snapshotted reservation and re-armed a new one at the same nullifier with a fresh anchor (e.g. definite-failure path → retry), this loop iteration would clear the brand-new reservation based on the stale anchor's pruned status.
Fund-safety is preserved by other guards (the on-chain nullifier set is authoritative, and `record_activity_status`'s `Confirmed && block_height.is_some()` guard blocks the worst scan-derived overwrite), so the impact is bounded to a misleading log line and a narrow-window wrong Pending→Failed / Confirmed(no-height)→Failed activity flip. Still, the fix is cheap: consult the returned bool, and prefer re-checking that the current pending entry still matches the snapshot before clearing.
In `packages/rs-platform-wallet/src/wallet/shielded/store.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/wallet/shielded/store.rs:484-488: `mark_pending` uses `BTreeMap::insert`, which silently resets an already-armed entry
The rewritten `mark_pending` calls `self.pending_nullifiers.insert(*nullifier, PendingSpend::default())`, which **replaces** any existing value. If it is ever invoked on an already-armed reservation, both `anchor` and `activity_id` get silently reset to `None`, wiping the release pass's only handle on the stranded spend. The doc comment even labels this behavior "idempotent", but it isn't — it's only unreachable-because-`unspent_notes()`-excludes-pending-nullifiers-from-selection.
That invariant is load-bearing across two modules; any future selector that bypasses the unspent filter, or a subtle race between select and reserve, would silently disarm every affected reservation. Making the operation actually idempotent via the entry API costs nothing and localizes the invariant.
| for (id, (nullifier, anchor, activity_id)) in stale { | ||
| // Fund-safety invariant: release ONLY when the anchor is absent | ||
| // from the recorded set. A still-recorded anchor may yet land. | ||
| if recorded.contains(&anchor) { | ||
| continue; | ||
| } | ||
| if let Err(e) = self.store.write().await.clear_pending(id, &nullifier) { | ||
| tracing::warn!( | ||
| wallet_id = %hex::encode(id.wallet_id), | ||
| account = id.account_index, | ||
| error = %e, | ||
| "shielded reservation release: clear_pending failed" | ||
| ); | ||
| continue; | ||
| } | ||
| tracing::info!( | ||
| wallet_id = %hex::encode(id.wallet_id), | ||
| account = id.account_index, | ||
| nullifier = %hex::encode(nullifier), | ||
| anchor = %hex::encode(anchor), | ||
| "shielded reservation released: its recorded anchor was pruned, so the stranded \ | ||
| spend can never execute; freeing the notes" | ||
| ); | ||
| // Flip the linked activity row to Failed. Only queue to the | ||
| // wallet's own persister (cloned out — `WalletPersister: Clone`). | ||
| if let Some(entry_id) = activity_id { | ||
| let persister = self.persisters.read().await.get(&id.wallet_id).cloned(); | ||
| super::operations::record_activity_status_by_id( | ||
| &self.store, | ||
| persister.as_ref(), | ||
| id.wallet_id, | ||
| id, | ||
| &entry_id, | ||
| super::activity::ShieldedActivityStatus::Failed, | ||
| ) | ||
| .await; | ||
| } |
There was a problem hiding this comment.
🟡 Suggestion: Release loop discards clear_pending's bool and never re-validates the snapshot against the current entry
release_stranded_spends snapshots (nullifier, anchor, activity_id) under a read lock at line 772-787, drops it, awaits a network anchor fetch at line 796, and only then per-entry write-locks to call clear_pending. The result is not re-tied to the snapshot:
clear_pendingreturnsResult<bool, _>whereOk(false)means "nullifier was no longer pending" (e.g. a concurrentmark_spent/finalize_pending/cancel_pendingalready cleared it). This code only branches onErr, soOk(false)falls through: it still logs the misleading"shielded reservation released...freeing the notes"info line and still callsrecord_activity_status_by_id(..., Failed).clear_pending(id, &nullifier)matches by nullifier only. If a concurrent flow cancelled the snapshotted reservation and re-armed a new one at the same nullifier with a fresh anchor (e.g. definite-failure path → retry), this loop iteration would clear the brand-new reservation based on the stale anchor's pruned status.
Fund-safety is preserved by other guards (the on-chain nullifier set is authoritative, and record_activity_status's Confirmed && block_height.is_some() guard blocks the worst scan-derived overwrite), so the impact is bounded to a misleading log line and a narrow-window wrong Pending→Failed / Confirmed(no-height)→Failed activity flip. Still, the fix is cheap: consult the returned bool, and prefer re-checking that the current pending entry still matches the snapshot before clearing.
| for (id, (nullifier, anchor, activity_id)) in stale { | |
| // Fund-safety invariant: release ONLY when the anchor is absent | |
| // from the recorded set. A still-recorded anchor may yet land. | |
| if recorded.contains(&anchor) { | |
| continue; | |
| } | |
| if let Err(e) = self.store.write().await.clear_pending(id, &nullifier) { | |
| tracing::warn!( | |
| wallet_id = %hex::encode(id.wallet_id), | |
| account = id.account_index, | |
| error = %e, | |
| "shielded reservation release: clear_pending failed" | |
| ); | |
| continue; | |
| } | |
| tracing::info!( | |
| wallet_id = %hex::encode(id.wallet_id), | |
| account = id.account_index, | |
| nullifier = %hex::encode(nullifier), | |
| anchor = %hex::encode(anchor), | |
| "shielded reservation released: its recorded anchor was pruned, so the stranded \ | |
| spend can never execute; freeing the notes" | |
| ); | |
| // Flip the linked activity row to Failed. Only queue to the | |
| // wallet's own persister (cloned out — `WalletPersister: Clone`). | |
| if let Some(entry_id) = activity_id { | |
| let persister = self.persisters.read().await.get(&id.wallet_id).cloned(); | |
| super::operations::record_activity_status_by_id( | |
| &self.store, | |
| persister.as_ref(), | |
| id.wallet_id, | |
| id, | |
| &entry_id, | |
| super::activity::ShieldedActivityStatus::Failed, | |
| ) | |
| .await; | |
| } | |
| let cleared = { | |
| let mut store = self.store.write().await; | |
| let still_matches = match store.stale_pending_spends(id) { | |
| Ok(entries) => entries.iter().any(|(n, a, aid)| { | |
| n == &nullifier && a == &anchor && *aid == activity_id | |
| }), | |
| Err(e) => { | |
| tracing::warn!( | |
| wallet_id = %hex::encode(id.wallet_id), | |
| account = id.account_index, | |
| error = %e, | |
| "shielded reservation release: stale_pending_spends recheck failed" | |
| ); | |
| continue; | |
| } | |
| }; | |
| if !still_matches { | |
| continue; | |
| } | |
| match store.clear_pending(id, &nullifier) { | |
| Ok(cleared) => cleared, | |
| Err(e) => { | |
| tracing::warn!( | |
| wallet_id = %hex::encode(id.wallet_id), | |
| account = id.account_index, | |
| error = %e, | |
| "shielded reservation release: clear_pending failed" | |
| ); | |
| continue; | |
| } | |
| } | |
| }; | |
| if !cleared { | |
| continue; | |
| } |
source: ['claude', 'codex']
| pub(super) fn mark_pending(&mut self, nullifier: &[u8; 32]) -> bool { | ||
| self.pending_nullifiers.insert(*nullifier) | ||
| self.pending_nullifiers | ||
| .insert(*nullifier, PendingSpend::default()) | ||
| .is_none() | ||
| } |
There was a problem hiding this comment.
🟡 Suggestion: mark_pending uses BTreeMap::insert, which silently resets an already-armed entry
The rewritten mark_pending calls self.pending_nullifiers.insert(*nullifier, PendingSpend::default()), which replaces any existing value. If it is ever invoked on an already-armed reservation, both anchor and activity_id get silently reset to None, wiping the release pass's only handle on the stranded spend. The doc comment even labels this behavior "idempotent", but it isn't — it's only unreachable-because-unspent_notes()-excludes-pending-nullifiers-from-selection.
That invariant is load-bearing across two modules; any future selector that bypasses the unspent filter, or a subtle race between select and reserve, would silently disarm every affected reservation. Making the operation actually idempotent via the entry API costs nothing and localizes the invariant.
| pub(super) fn mark_pending(&mut self, nullifier: &[u8; 32]) -> bool { | |
| self.pending_nullifiers.insert(*nullifier) | |
| self.pending_nullifiers | |
| .insert(*nullifier, PendingSpend::default()) | |
| .is_none() | |
| } | |
| pub(super) fn mark_pending(&mut self, nullifier: &[u8; 32]) -> bool { | |
| use std::collections::btree_map::Entry; | |
| match self.pending_nullifiers.entry(*nullifier) { | |
| Entry::Vacant(v) => { | |
| v.insert(PendingSpend::default()); | |
| true | |
| } | |
| Entry::Occupied(_) => false, | |
| } | |
| } |
source: ['claude', 'codex']
| for (id, (nullifier, anchor, activity_id)) in stale { | ||
| // Fund-safety invariant: release ONLY when the anchor is absent | ||
| // from the recorded set. A still-recorded anchor may yet land. | ||
| if recorded.contains(&anchor) { | ||
| continue; | ||
| } | ||
| if let Err(e) = self.store.write().await.clear_pending(id, &nullifier) { | ||
| tracing::warn!( | ||
| wallet_id = %hex::encode(id.wallet_id), | ||
| account = id.account_index, | ||
| error = %e, | ||
| "shielded reservation release: clear_pending failed" | ||
| ); | ||
| continue; | ||
| } | ||
| tracing::info!( | ||
| wallet_id = %hex::encode(id.wallet_id), | ||
| account = id.account_index, | ||
| nullifier = %hex::encode(nullifier), | ||
| anchor = %hex::encode(anchor), | ||
| "shielded reservation released: its recorded anchor was pruned, so the stranded \ | ||
| spend can never execute; freeing the notes" | ||
| ); | ||
| // Flip the linked activity row to Failed. Only queue to the | ||
| // wallet's own persister (cloned out — `WalletPersister: Clone`). | ||
| if let Some(entry_id) = activity_id { | ||
| let persister = self.persisters.read().await.get(&id.wallet_id).cloned(); | ||
| super::operations::record_activity_status_by_id( | ||
| &self.store, | ||
| persister.as_ref(), | ||
| id.wallet_id, | ||
| id, | ||
| &entry_id, | ||
| super::activity::ShieldedActivityStatus::Failed, | ||
| ) | ||
| .await; | ||
| } | ||
| } |
There was a problem hiding this comment.
💬 Nitpick: Multi-note stranded spend flips the same activity row N times
arm_pending_release stamps every reserved note with the same activity_id, so a multi-note stranded spend yields N tuples from stale_pending_spends that all share one activity_id. This loop then calls record_activity_status_by_id(..., Failed) N times for the same row, each with a read lock + write lock cycle on self.store and a queued persister write. Correctness holds (the second+ Failed→Failed writes are no-ops), but it produces N-1 redundant lock cycles and persister queues per stranded transaction. Dedupe activity_id before the flip (e.g. collect a HashSet<[u8; 32]> during the release pass and flip once per unique id).
source: ['claude']
| /// A pending reservation that carries a recorded anchor: | ||
| /// `(nullifier, anchor, activity_id)`. Yielded by | ||
| /// [`ShieldedStore::stale_pending_spends`] for the sync reconcile, | ||
| /// which releases the reservation once `anchor` is no longer in | ||
| /// Platform's recorded set (the spend can then never execute). | ||
| pub type StalePendingSpend = ([u8; 32], [u8; 32], Option<[u8; 32]>); |
There was a problem hiding this comment.
💬 Nitpick: StalePendingSpend tuple alias exposes two adjacent [u8; 32] fields with only positional semantics
pub type StalePendingSpend = ([u8; 32], [u8; 32], Option<[u8; 32]>); is re-exported through the public ShieldedStore trait. nullifier and anchor are both raw 32-byte arrays in adjacent positions, so a transposition — either at a call site or in a future third-party ShieldedStore impl — would compile but silently corrupt the release decision (the two identifiers drive different fund-safety checks). Also, by construction (set_pending_spend sets both fields together and stale_pending_spends filters on anchor.is_some()), the returned activity_id is always Some; the Option forces every caller to handle an impossible None branch. A named struct with named fields would encode both invariants in the type and make an order swap a compile error.
| /// A pending reservation that carries a recorded anchor: | |
| /// `(nullifier, anchor, activity_id)`. Yielded by | |
| /// [`ShieldedStore::stale_pending_spends`] for the sync reconcile, | |
| /// which releases the reservation once `anchor` is no longer in | |
| /// Platform's recorded set (the spend can then never execute). | |
| pub type StalePendingSpend = ([u8; 32], [u8; 32], Option<[u8; 32]>); | |
| /// A pending reservation that carries a recorded anchor. Yielded by | |
| /// [`ShieldedStore::stale_pending_spends`] for the sync reconcile, which | |
| /// releases the reservation once `anchor` is no longer in Platform's | |
| /// recorded set (the spend can then never execute). | |
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | |
| pub struct StalePendingSpend { | |
| pub nullifier: [u8; 32], | |
| pub anchor: [u8; 32], | |
| pub activity_id: [u8; 32], | |
| } |
source: ['claude']
| #[tokio::test] | ||
| async fn release_stranded_spends_no_op_without_anchored_reservation() { | ||
| let dir = temp_dir("release_noop"); | ||
| let coordinator = coordinator_with_one_wallet(&dir).await; | ||
|
|
||
| let subwallets = vec![( | ||
| SubwalletId::new([0x11; 32], 0), | ||
| OrchardKeySet::from_seed(&[0x42u8; 64], dashcore::Network::Testnet, 0) | ||
| .expect("derive viewing keys") | ||
| .viewing_keys(), | ||
| )]; | ||
|
|
||
| // No reservation was armed, so `stale_pending_spends` is empty and | ||
| // the pass returns before fetching the recorded anchor set. | ||
| coordinator.release_stranded_spends(&subwallets).await; | ||
|
|
||
| let _ = std::fs::remove_dir_all(&dir); | ||
| } |
There was a problem hiding this comment.
💬 Nitpick: Coordinator release path only has a fast-path test
release_stranded_spends_no_op_without_anchored_reservation pins only the short-circuit before the network round-trip. The value-adding branch — armed reservation + anchor absent from recorded set → clear_pending runs and the linked activity row flips to Failed — has no coordinator-level test. All positive/negative release-decision coverage lives at the store.rs unit level. That's exactly the layer where the TOCTOU/clear_pending-bool issue above lives, so the current suite would not catch a regression there. A small impl-level helper that takes an already-fetched HashSet<[u8;32]> and runs the per-entry loop would let the full release path be pinned without introducing a live GetShieldedAnchors network dependency (the codebase's stated no-mock convention). Not required for this PR, but would materially reduce residual risk on the release loop.
source: ['claude']
| let persister = self.persisters.read().await.get(&id.wallet_id).cloned(); | ||
| super::operations::record_activity_status_by_id( | ||
| &self.store, | ||
| persister.as_ref(), | ||
| id.wallet_id, | ||
| id, | ||
| &entry_id, | ||
| super::activity::ShieldedActivityStatus::Failed, | ||
| ) | ||
| .await; | ||
| } |
There was a problem hiding this comment.
💬 Nitpick: Persister map read-lock is re-acquired per released entry
self.persisters.read().await.get(&id.wallet_id).cloned() runs once per released reservation. WalletPersister: Clone and the map is stable across the loop, so N released entries cause N read-lock acquisitions. Release volume is bounded and typically zero so cost is trivial, but hoisting the read-lock — or snapshotting the relevant WalletId -> WalletPersister pairs into a local map once — would keep lock ordering with the store write above more obviously acyclic and avoid replicating a shape that turns hot-loop antipattern elsewhere.
source: ['claude']
Issue being fixed or feature implemented
Second half of the TestFlight shielded-withdrawal investigation (bug A). A shielded spend that returns
ShieldedSpendUnconfirmed— broadcast accepted, but the result-wait failed ambiguously — intentionally keeps the spent notes' reservation (pending_nullifiers) so a retry can't double-spend. That reservation was released only when the tx lands (a later sync marks the note spent) or on app restart. So a spend that is broadcast-accepted but never lands (lost ACK / mempool eviction) stranded its notes for the whole session — the funds looked stuck, with the "may have gone through — do not retry" message and no in-session resolution.(#3977 removed the dominant trigger — anchor-mismatch spends are now refused pre-broadcast with the notes freed — so this is the residual case.)
What was done?
Remember the Platform-recorded anchor each spend was built against, then on each sync — after the normal spent-note reconcile — release any still-pending reservation whose anchor is no longer in Platform's recorded set (
GetShieldedAnchors). Once the anchor is pruned (shielded_anchor_retention_blocks = 1000) the transition can never execute, so with the nullifier still unspent (which "still pending after the reconcile" already implies) the spend is provably dead and its notes are freed; the linked activity flipsPending → Failed.SubwalletState.pending_nullifiers:BTreeSet<[u8;32]>→BTreeMap<[u8;32], PendingSpend { anchor, activity_id }>(in-memory, as before — a restart still frees everything).arm_pending_releasefills in the anchor + activity id across all four spend flows (unshield/transfer/withdraw/identity_create_from_shielded_pool), afterextract_spends_and_anchor+record_pending_activity, before broadcast.coordinator::release_stranded_spendsruns the release pass insync(); it skips the network call entirely when no armed reservations exist, and a fetch hiccup only warns (never fails the sync).Fund-safety (please review): release fires only when the anchor is absent from the freshly-fetched recorded set — never a still-recorded (slow-but-landing) spend. The on-chain nullifier set stays the authoritative double-spend guard. The spent-note reconcile runs before the release pass, so a landed tx's reservation is already cleared → no race. Designed + spec'd in
docs/shielded/SHIELDED_SPEND_RESERVATION_AUTORELEASE_SPEC.md.How Has This Been Tested?
unspent_notesexcludes a pending nullifier and re-includes it after clear; an unarmed (None-anchor) reservation is never surfaced.record_activity_status_by_idflipsPending → Failed(the exact call the release pass makes); missing-entry is a no-op.cargo test -p platform-wallet --lib --features shielded→ 312 passed (incl. the fix(platform-wallet): build shielded spends against a Platform-recorded anchor #3977 tests);fmt+clippy --all-featuresclean.Not automated (network-gated): the full fetch→release→flip seam against a live
GetShieldedAnchors(mocking it is deliberately avoided in this codebase); its parts are each unit-tested.Breaking Changes
None externally. Two new
ShieldedStoretrait methods (set_pending_spend,stale_pending_spends); both in-tree impls updated. In-memory data-model change only.Checklist:
🤖 Generated with Claude Code