Skip to content

fix(platform-wallet): auto-release a stranded shielded-spend reservation on sync#3982

Open
shumkov wants to merge 1 commit into
fix/shielded-withdrawal-stranded-reservationfrom
fix/shielded-spend-reservation-auto-release
Open

fix(platform-wallet): auto-release a stranded shielded-spend reservation on sync#3982
shumkov wants to merge 1 commit into
fix/shielded-withdrawal-stranded-reservationfrom
fix/shielded-spend-reservation-auto-release

Conversation

@shumkov

@shumkov shumkov commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Stacked on #3977 (the shielded anchor-selection fix). Base is that branch so the diff shows only this change; it will auto-retarget to v4.0-dev when #3977 merges. The stored anchor must be a Platform-recorded one, which #3977 guarantees.

Issue being fixed or feature implemented

Second half of the TestFlight shielded-withdrawal investigation (bug A). A shielded spend that returns ShieldedSpendUnconfirmed — broadcast accepted, but the result-wait failed ambiguously — intentionally keeps the spent notes' reservation (pending_nullifiers) so a retry can't double-spend. That reservation was released only when the tx lands (a later sync marks the note spent) or on app restart. So a spend that is broadcast-accepted but never lands (lost ACK / mempool eviction) stranded its notes for the whole session — the funds looked stuck, with the "may have gone through — do not retry" message and no in-session resolution.

(#3977 removed the dominant trigger — anchor-mismatch spends are now refused pre-broadcast with the notes freed — so this is the residual case.)

What was done?

Remember the Platform-recorded anchor each spend was built against, then on each sync — after the normal spent-note reconcile — release any still-pending reservation whose anchor is no longer in Platform's recorded set (GetShieldedAnchors). Once the anchor is pruned (shielded_anchor_retention_blocks = 1000) the transition can never execute, so with the nullifier still unspent (which "still pending after the reconcile" already implies) the spend is provably dead and its notes are freed; the linked activity flips Pending → Failed.

  • SubwalletState.pending_nullifiers: BTreeSet<[u8;32]>BTreeMap<[u8;32], PendingSpend { anchor, activity_id }> (in-memory, as before — a restart still frees everything).
  • arm_pending_release fills in the anchor + activity id across all four spend flows (unshield / transfer / withdraw / identity_create_from_shielded_pool), after extract_spends_and_anchor + record_pending_activity, before broadcast.
  • coordinator::release_stranded_spends runs the release pass in sync(); it skips the network call entirely when no armed reservations exist, and a fetch hiccup only warns (never fails the sync).

Fund-safety (please review): release fires only when the anchor is absent from the freshly-fetched recorded set — never a still-recorded (slow-but-landing) spend. The on-chain nullifier set stays the authoritative double-spend guard. The spent-note reconcile runs before the release pass, so a landed tx's reservation is already cleared → no race. Designed + spec'd in docs/shielded/SHIELDED_SPEND_RESERVATION_AUTORELEASE_SPEC.md.

How Has This Been Tested?

  • Store units: a reservation with a still-recorded anchor is retained, with a pruned anchor is released; unspent_notes excludes a pending nullifier and re-includes it after clear; an unarmed (None-anchor) reservation is never surfaced.
  • Activity: record_activity_status_by_id flips Pending → Failed (the exact call the release pass makes); missing-entry is a no-op.
  • Coordinator: the release pass no-ops (no fetch, no panic) when there are no armed reservations.
  • cargo test -p platform-wallet --lib --features shielded312 passed (incl. the fix(platform-wallet): build shielded spends against a Platform-recorded anchor #3977 tests); fmt + clippy --all-features clean.

Not automated (network-gated): the full fetch→release→flip seam against a live GetShieldedAnchors (mocking it is deliberately avoided in this codebase); its parts are each unit-tested.

Breaking Changes

None externally. Two new ShieldedStore trait methods (set_pending_spend, stale_pending_spends); both in-tree impls updated. In-memory data-model change only.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

🤖 Generated with Claude Code

…ion on sync

A shielded spend that returns ShieldedSpendUnconfirmed (broadcast accepted, but
the result-wait failed ambiguously) keeps the spent notes' reservation
(pending_nullifiers) so a retry can't double-spend. That reservation was
released only when the tx lands (a later sync marks the note spent) or on app
restart. A broadcast-accepted-but-never-lands spend therefore stranded its
notes for the whole session — funds looked stuck with no in-session resolution.

Fix: remember the Platform-recorded anchor each spend was built against (the
anchor-selection fix guarantees it's a recorded root), then on each sync —
after the normal spent-note reconcile — release any still-pending reservation
whose anchor is no longer in Platform's recorded set. Once the anchor is pruned
(shielded_anchor_retention_blocks = 1000) the transition can never execute, so
with the nullifier still unspent (which "still pending after the reconcile"
already implies) the spend is provably dead and its notes can be freed; the
linked activity row flips Pending -> Failed so the UI shows a clear, retryable
failure. Reuses the GetShieldedAnchors query — no protocol change.

Fund-safety: releases ONLY when the anchor is absent from the freshly-fetched
recorded set (never a still-recorded, slow-but-landing spend); the on-chain
nullifier set stays the authoritative double-spend guard. The spent-note
reconcile runs first, so a landed tx's reservation is already cleared before
the release pass — no race. pending_nullifiers stays in-memory (a restart still
frees everything); the anchor fetch is best-effort (a hiccup never fails sync).

Data model: SubwalletState.pending_nullifiers BTreeSet -> BTreeMap<[u8;32],
PendingSpend { anchor, activity_id }>; arm_pending_release fills it in across
all four spend flows before broadcast. Wallet suite 312 tests pass; fmt +
clippy (--all-features) clean.

Stacks on the shielded anchor-selection fix (PR #3977) — the stored anchor must
be a recorded one, which that PR guarantees.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@thepastaclaw

thepastaclaw commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

✅ Review complete (commit 553fabd)

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The auto-release design (arm-with-anchor, sync-time prune-check, fund-safe fallback to on-chain nullifier set) is sound. Main residual issues are a real-but-narrow TOCTOU/discarded-bool combination in the release loop (log misinformation + edge-case wrong activity flip), a non-idempotent mark_pending that today is only unreachable due to a cross-module invariant, plus minor efficiency and type-safety nits. No blocking issues survive verification.

🟡 2 suggestion(s) | 💬 4 nitpick(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs:810-846: Release loop discards `clear_pending`'s bool and never re-validates the snapshot against the current entry
  `release_stranded_spends` snapshots `(nullifier, anchor, activity_id)` under a read lock at line 772-787, drops it, awaits a network anchor fetch at line 796, and only then per-entry write-locks to call `clear_pending`. The result is not re-tied to the snapshot:

  1. `clear_pending` returns `Result<bool, _>` where `Ok(false)` means "nullifier was no longer pending" (e.g. a concurrent `mark_spent`/`finalize_pending`/`cancel_pending` already cleared it). This code only branches on `Err`, so `Ok(false)` falls through: it still logs the misleading `"shielded reservation released...freeing the notes"` info line and still calls `record_activity_status_by_id(..., Failed)`.
  2. `clear_pending(id, &nullifier)` matches by nullifier only. If a concurrent flow cancelled the snapshotted reservation and re-armed a new one at the same nullifier with a fresh anchor (e.g. definite-failure path → retry), this loop iteration would clear the brand-new reservation based on the stale anchor's pruned status.

  Fund-safety is preserved by other guards (the on-chain nullifier set is authoritative, and `record_activity_status`'s `Confirmed && block_height.is_some()` guard blocks the worst scan-derived overwrite), so the impact is bounded to a misleading log line and a narrow-window wrong Pending→Failed / Confirmed(no-height)→Failed activity flip. Still, the fix is cheap: consult the returned bool, and prefer re-checking that the current pending entry still matches the snapshot before clearing.

In `packages/rs-platform-wallet/src/wallet/shielded/store.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/wallet/shielded/store.rs:484-488: `mark_pending` uses `BTreeMap::insert`, which silently resets an already-armed entry
  The rewritten `mark_pending` calls `self.pending_nullifiers.insert(*nullifier, PendingSpend::default())`, which **replaces** any existing value. If it is ever invoked on an already-armed reservation, both `anchor` and `activity_id` get silently reset to `None`, wiping the release pass's only handle on the stranded spend. The doc comment even labels this behavior "idempotent", but it isn't — it's only unreachable-because-`unspent_notes()`-excludes-pending-nullifiers-from-selection.

  That invariant is load-bearing across two modules; any future selector that bypasses the unspent filter, or a subtle race between select and reserve, would silently disarm every affected reservation. Making the operation actually idempotent via the entry API costs nothing and localizes the invariant.

Comment on lines +810 to +846
for (id, (nullifier, anchor, activity_id)) in stale {
// Fund-safety invariant: release ONLY when the anchor is absent
// from the recorded set. A still-recorded anchor may yet land.
if recorded.contains(&anchor) {
continue;
}
if let Err(e) = self.store.write().await.clear_pending(id, &nullifier) {
tracing::warn!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
error = %e,
"shielded reservation release: clear_pending failed"
);
continue;
}
tracing::info!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
nullifier = %hex::encode(nullifier),
anchor = %hex::encode(anchor),
"shielded reservation released: its recorded anchor was pruned, so the stranded \
spend can never execute; freeing the notes"
);
// Flip the linked activity row to Failed. Only queue to the
// wallet's own persister (cloned out — `WalletPersister: Clone`).
if let Some(entry_id) = activity_id {
let persister = self.persisters.read().await.get(&id.wallet_id).cloned();
super::operations::record_activity_status_by_id(
&self.store,
persister.as_ref(),
id.wallet_id,
id,
&entry_id,
super::activity::ShieldedActivityStatus::Failed,
)
.await;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Release loop discards clear_pending's bool and never re-validates the snapshot against the current entry

release_stranded_spends snapshots (nullifier, anchor, activity_id) under a read lock at line 772-787, drops it, awaits a network anchor fetch at line 796, and only then per-entry write-locks to call clear_pending. The result is not re-tied to the snapshot:

  1. clear_pending returns Result<bool, _> where Ok(false) means "nullifier was no longer pending" (e.g. a concurrent mark_spent/finalize_pending/cancel_pending already cleared it). This code only branches on Err, so Ok(false) falls through: it still logs the misleading "shielded reservation released...freeing the notes" info line and still calls record_activity_status_by_id(..., Failed).
  2. clear_pending(id, &nullifier) matches by nullifier only. If a concurrent flow cancelled the snapshotted reservation and re-armed a new one at the same nullifier with a fresh anchor (e.g. definite-failure path → retry), this loop iteration would clear the brand-new reservation based on the stale anchor's pruned status.

Fund-safety is preserved by other guards (the on-chain nullifier set is authoritative, and record_activity_status's Confirmed && block_height.is_some() guard blocks the worst scan-derived overwrite), so the impact is bounded to a misleading log line and a narrow-window wrong Pending→Failed / Confirmed(no-height)→Failed activity flip. Still, the fix is cheap: consult the returned bool, and prefer re-checking that the current pending entry still matches the snapshot before clearing.

Suggested change
for (id, (nullifier, anchor, activity_id)) in stale {
// Fund-safety invariant: release ONLY when the anchor is absent
// from the recorded set. A still-recorded anchor may yet land.
if recorded.contains(&anchor) {
continue;
}
if let Err(e) = self.store.write().await.clear_pending(id, &nullifier) {
tracing::warn!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
error = %e,
"shielded reservation release: clear_pending failed"
);
continue;
}
tracing::info!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
nullifier = %hex::encode(nullifier),
anchor = %hex::encode(anchor),
"shielded reservation released: its recorded anchor was pruned, so the stranded \
spend can never execute; freeing the notes"
);
// Flip the linked activity row to Failed. Only queue to the
// wallet's own persister (cloned out — `WalletPersister: Clone`).
if let Some(entry_id) = activity_id {
let persister = self.persisters.read().await.get(&id.wallet_id).cloned();
super::operations::record_activity_status_by_id(
&self.store,
persister.as_ref(),
id.wallet_id,
id,
&entry_id,
super::activity::ShieldedActivityStatus::Failed,
)
.await;
}
let cleared = {
let mut store = self.store.write().await;
let still_matches = match store.stale_pending_spends(id) {
Ok(entries) => entries.iter().any(|(n, a, aid)| {
n == &nullifier && a == &anchor && *aid == activity_id
}),
Err(e) => {
tracing::warn!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
error = %e,
"shielded reservation release: stale_pending_spends recheck failed"
);
continue;
}
};
if !still_matches {
continue;
}
match store.clear_pending(id, &nullifier) {
Ok(cleared) => cleared,
Err(e) => {
tracing::warn!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
error = %e,
"shielded reservation release: clear_pending failed"
);
continue;
}
}
};
if !cleared {
continue;
}

source: ['claude', 'codex']

Comment on lines 484 to +488
pub(super) fn mark_pending(&mut self, nullifier: &[u8; 32]) -> bool {
self.pending_nullifiers.insert(*nullifier)
self.pending_nullifiers
.insert(*nullifier, PendingSpend::default())
.is_none()
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: mark_pending uses BTreeMap::insert, which silently resets an already-armed entry

The rewritten mark_pending calls self.pending_nullifiers.insert(*nullifier, PendingSpend::default()), which replaces any existing value. If it is ever invoked on an already-armed reservation, both anchor and activity_id get silently reset to None, wiping the release pass's only handle on the stranded spend. The doc comment even labels this behavior "idempotent", but it isn't — it's only unreachable-because-unspent_notes()-excludes-pending-nullifiers-from-selection.

That invariant is load-bearing across two modules; any future selector that bypasses the unspent filter, or a subtle race between select and reserve, would silently disarm every affected reservation. Making the operation actually idempotent via the entry API costs nothing and localizes the invariant.

Suggested change
pub(super) fn mark_pending(&mut self, nullifier: &[u8; 32]) -> bool {
self.pending_nullifiers.insert(*nullifier)
self.pending_nullifiers
.insert(*nullifier, PendingSpend::default())
.is_none()
}
pub(super) fn mark_pending(&mut self, nullifier: &[u8; 32]) -> bool {
use std::collections::btree_map::Entry;
match self.pending_nullifiers.entry(*nullifier) {
Entry::Vacant(v) => {
v.insert(PendingSpend::default());
true
}
Entry::Occupied(_) => false,
}
}

source: ['claude', 'codex']

Comment on lines +810 to +847
for (id, (nullifier, anchor, activity_id)) in stale {
// Fund-safety invariant: release ONLY when the anchor is absent
// from the recorded set. A still-recorded anchor may yet land.
if recorded.contains(&anchor) {
continue;
}
if let Err(e) = self.store.write().await.clear_pending(id, &nullifier) {
tracing::warn!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
error = %e,
"shielded reservation release: clear_pending failed"
);
continue;
}
tracing::info!(
wallet_id = %hex::encode(id.wallet_id),
account = id.account_index,
nullifier = %hex::encode(nullifier),
anchor = %hex::encode(anchor),
"shielded reservation released: its recorded anchor was pruned, so the stranded \
spend can never execute; freeing the notes"
);
// Flip the linked activity row to Failed. Only queue to the
// wallet's own persister (cloned out — `WalletPersister: Clone`).
if let Some(entry_id) = activity_id {
let persister = self.persisters.read().await.get(&id.wallet_id).cloned();
super::operations::record_activity_status_by_id(
&self.store,
persister.as_ref(),
id.wallet_id,
id,
&entry_id,
super::activity::ShieldedActivityStatus::Failed,
)
.await;
}
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 Nitpick: Multi-note stranded spend flips the same activity row N times

arm_pending_release stamps every reserved note with the same activity_id, so a multi-note stranded spend yields N tuples from stale_pending_spends that all share one activity_id. This loop then calls record_activity_status_by_id(..., Failed) N times for the same row, each with a read lock + write lock cycle on self.store and a queued persister write. Correctness holds (the second+ Failed→Failed writes are no-ops), but it produces N-1 redundant lock cycles and persister queues per stranded transaction. Dedupe activity_id before the flip (e.g. collect a HashSet<[u8; 32]> during the release pass and flip once per unique id).

source: ['claude']

Comment on lines +119 to +124
/// A pending reservation that carries a recorded anchor:
/// `(nullifier, anchor, activity_id)`. Yielded by
/// [`ShieldedStore::stale_pending_spends`] for the sync reconcile,
/// which releases the reservation once `anchor` is no longer in
/// Platform's recorded set (the spend can then never execute).
pub type StalePendingSpend = ([u8; 32], [u8; 32], Option<[u8; 32]>);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 Nitpick: StalePendingSpend tuple alias exposes two adjacent [u8; 32] fields with only positional semantics

pub type StalePendingSpend = ([u8; 32], [u8; 32], Option<[u8; 32]>); is re-exported through the public ShieldedStore trait. nullifier and anchor are both raw 32-byte arrays in adjacent positions, so a transposition — either at a call site or in a future third-party ShieldedStore impl — would compile but silently corrupt the release decision (the two identifiers drive different fund-safety checks). Also, by construction (set_pending_spend sets both fields together and stale_pending_spends filters on anchor.is_some()), the returned activity_id is always Some; the Option forces every caller to handle an impossible None branch. A named struct with named fields would encode both invariants in the type and make an order swap a compile error.

Suggested change
/// A pending reservation that carries a recorded anchor:
/// `(nullifier, anchor, activity_id)`. Yielded by
/// [`ShieldedStore::stale_pending_spends`] for the sync reconcile,
/// which releases the reservation once `anchor` is no longer in
/// Platform's recorded set (the spend can then never execute).
pub type StalePendingSpend = ([u8; 32], [u8; 32], Option<[u8; 32]>);
/// A pending reservation that carries a recorded anchor. Yielded by
/// [`ShieldedStore::stale_pending_spends`] for the sync reconcile, which
/// releases the reservation once `anchor` is no longer in Platform's
/// recorded set (the spend can then never execute).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StalePendingSpend {
pub nullifier: [u8; 32],
pub anchor: [u8; 32],
pub activity_id: [u8; 32],
}

source: ['claude']

Comment on lines +1250 to +1267
#[tokio::test]
async fn release_stranded_spends_no_op_without_anchored_reservation() {
let dir = temp_dir("release_noop");
let coordinator = coordinator_with_one_wallet(&dir).await;

let subwallets = vec![(
SubwalletId::new([0x11; 32], 0),
OrchardKeySet::from_seed(&[0x42u8; 64], dashcore::Network::Testnet, 0)
.expect("derive viewing keys")
.viewing_keys(),
)];

// No reservation was armed, so `stale_pending_spends` is empty and
// the pass returns before fetching the recorded anchor set.
coordinator.release_stranded_spends(&subwallets).await;

let _ = std::fs::remove_dir_all(&dir);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 Nitpick: Coordinator release path only has a fast-path test

release_stranded_spends_no_op_without_anchored_reservation pins only the short-circuit before the network round-trip. The value-adding branch — armed reservation + anchor absent from recorded set → clear_pending runs and the linked activity row flips to Failed — has no coordinator-level test. All positive/negative release-decision coverage lives at the store.rs unit level. That's exactly the layer where the TOCTOU/clear_pending-bool issue above lives, so the current suite would not catch a regression there. A small impl-level helper that takes an already-fetched HashSet<[u8;32]> and runs the per-entry loop would let the full release path be pinned without introducing a live GetShieldedAnchors network dependency (the codebase's stated no-mock convention). Not required for this PR, but would materially reduce residual risk on the release loop.

source: ['claude']

Comment on lines +836 to +846
let persister = self.persisters.read().await.get(&id.wallet_id).cloned();
super::operations::record_activity_status_by_id(
&self.store,
persister.as_ref(),
id.wallet_id,
id,
&entry_id,
super::activity::ShieldedActivityStatus::Failed,
)
.await;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 Nitpick: Persister map read-lock is re-acquired per released entry

self.persisters.read().await.get(&id.wallet_id).cloned() runs once per released reservation. WalletPersister: Clone and the map is stable across the loop, so N released entries cause N read-lock acquisitions. Release volume is bounded and typically zero so cost is trivial, but hoisting the read-lock — or snapshotting the relevant WalletId -> WalletPersister pairs into a local map once — would keep lock ordering with the store write above more obviously acyclic and avoid replicating a shape that turns hot-loop antipattern elsewhere.

source: ['claude']

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants