fix: replace abort() with cooperative wait in wait_for_run_task #576

Merged: xdustinface merged 3 commits into v0.42-dev from fix/replace-abort on Apr 1, 2026

@xdustinface (Collaborator) commented Mar 23, 2026

abort() can interrupt the cleanup sequence in DashSpvClient::run() (the monitor_shutdown.cancel() + tokio::join!), leaving monitor tasks running after FFI callback pointers are freed. Use cooperative wait with a timeout fallback instead.

Based on:

Summary by CodeRabbit

  • Bug Fixes
    • Improved client shutdown: waits up to 5 seconds for background tasks to finish cooperatively, logs warnings on errors or timeouts, and falls back to forceful termination only after timeout — resulting in more stable shutdown and cleaner resource cleanup.

@coderabbitai (bot, Contributor) commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7bc6fd07-48cc-4d4a-b914-e8564720c02e

📥 Commits

Reviewing files that changed from the base of the PR and between 3292274 and 8be09a3.

📒 Files selected for processing (1)
  • dash-spv-ffi/src/client.rs

📝 Walkthrough

Replaces unconditional abort of the client's run task with a cooperative shutdown: introduces RUN_TASK_SHUTDOWN_TIMEOUT (5s) and wait_for_run_task() which awaits the run task with a timeout, logs on join errors, and aborts only if the timeout elapses. stop_client_internal and dash_spv_ffi_client_destroy use this flow.
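
For readers unfamiliar with the pattern, here is a minimal sketch of a cooperative wait with a timeout fallback. It is not the code from dash-spv-ffi/src/client.rs: it swaps tokio tasks for std threads and a channel so that it runs without dependencies, and `spawn_worker`, `shutdown`, and the `Shutdown` enum are hypothetical names. A std thread cannot be aborted, so the timeout branch only reports the outcome at the point where the real code would fall back to abort().

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// How a shutdown attempt ended (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Shutdown {
    /// The worker finished its own cleanup within the timeout.
    Cooperative,
    /// The timeout elapsed; real code would force termination here.
    TimedOut,
    /// The worker ended without signalling (for example, it panicked).
    WorkerGone,
}

/// Spawn a worker that polls `cancel`, then spends `cleanup` on its own
/// cleanup before signalling completion on the returned channel.
fn spawn_worker(cancel: Arc<AtomicBool>, cleanup: Duration) -> mpsc::Receiver<()> {
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        while !cancel.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(5));
        }
        thread::sleep(cleanup); // stands in for the run loop's cleanup sequence
        let _ = done_tx.send(());
    });
    done_rx
}

/// Signal cancellation, then wait up to `timeout` for the worker to finish.
fn shutdown(cancel: &AtomicBool, done: &mpsc::Receiver<()>, timeout: Duration) -> Shutdown {
    cancel.store(true, Ordering::SeqCst);
    match done.recv_timeout(timeout) {
        Ok(()) => Shutdown::Cooperative,
        Err(RecvTimeoutError::Timeout) => Shutdown::TimedOut,
        Err(RecvTimeoutError::Disconnected) => Shutdown::WorkerGone,
    }
}

fn main() {
    let cancel = Arc::new(AtomicBool::new(false));
    let done = spawn_worker(Arc::clone(&cancel), Duration::from_millis(10));
    println!("{:?}", shutdown(&cancel, &done, Duration::from_secs(5)));
}
```

The point of the design is that the worker is always given a chance to run its cleanup to completion; forced termination is reserved for a worker that ignores the cancellation signal for longer than the timeout.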

Changes

| Cohort / File(s) | Summary |
|---|---|
| Graceful Task Shutdown: dash-spv-ffi/src/client.rs | Added RUN_TASK_SHUTDOWN_TIMEOUT (5s). Replaced direct abort() usage with wait_for_run_task() which uses tokio::time::timeout to await the run task, logs join errors, and aborts only after timeout. Updated stop_client_internal and dash_spv_ffi_client_destroy to call the new wait-based shutdown. |

Sequence Diagram(s)

sequenceDiagram
    participant API as Client API
    participant Client as FFIDashSpvClient
    participant Task as RunTask
    participant Timer as Timeout

    API->>Client: request stop / destroy
    Client->>Client: cancel shutdown_token
    Client->>Task: wait_for_run_task()
    alt run finishes before timeout
        Task-->>Client: finishes
        Client-->>API: return (done)
    else timeout elapsed
        Timer-->>Client: timeout
        Client->>Task: task.abort()
        Task-->>Client: aborted join (await)
        Client-->>API: return (aborted)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I waited five seconds as the task took its time,
Gentle as clover, no abrupt end to the rhyme.
If it lingers too long, I give a polite tap—
Then tidy the threads and fold up my lap. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: replacing abort() with a cooperative wait mechanism in wait_for_run_task, which is the core fix in this pull request. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@codecov (bot) commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.00%. Comparing base (2f22194) to head (8be09a3).
⚠️ Report is 6 commits behind head on v0.42-dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dash-spv-ffi/src/client.rs | 61.53% | 5 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff              @@
##           v0.42-dev     #576      +/-   ##
=============================================
- Coverage      67.03%   67.00%   -0.04%     
=============================================
  Files            320      320              
  Lines          67249    67255       +6     
=============================================
- Hits           45083    45062      -21     
- Misses         22166    22193      +27     

| Flag | Coverage Δ |
|---|---|
| core | 75.21% <ø> (ø) |
| ffi | 35.47% <61.53%> (-0.01%) ⬇️ |
| rpc | 19.92% <ø> (ø) |
| spv | 83.61% <ø> (-0.17%) ⬇️ |
| wallet | 66.63% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| dash-spv-ffi/src/client.rs | 57.81% <61.53%> (-0.79%) ⬇️ |

... and 3 files with indirect coverage changes

@xdustinface xdustinface force-pushed the refactor/move-callbacks-to-crates branch from aaddece to 23accc0 Compare March 23, 2026 06:04
@github-actions

This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.

@github-actions github-actions bot added the merge-conflict The PR conflicts with the target branch. label Mar 23, 2026
Base automatically changed from refactor/move-callbacks-to-crates to v0.42-dev March 25, 2026 02:13
`abort()` can interrupt the cleanup sequence in `DashSpvClient::run()`
(the `monitor_shutdown.cancel()` + `tokio::join!`), leaving monitor
tasks running after FFI callback pointers are freed. Use cooperative
wait with a timeout fallback instead.
@github-actions github-actions bot removed the merge-conflict The PR conflicts with the target branch. label Mar 31, 2026
@xdustinface xdustinface marked this pull request as ready for review March 31, 2026 00:45
@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dash-spv-ffi/src/client.rs`:
- Around line 117-122: Add unit tests covering both cooperative shutdown and
timeout fallback for the new wait_for_run_task behavior: write one test that
starts DashSpvClient::run() (or a test helper that spawns run task), signals the
shutdown_token and asserts the run task exits without being aborted (cooperative
path), and a second test that simulates a non-responsive run loop (e.g., by
holding the run future) so wait_for_run_task hits its timeout and aborts the
task, asserting the abort path was used; target the wait_for_run_task method and
any test-helper hooks around DashSpvClient::run(), using
tokio::time::pause/advance or similar to deterministically drive the timeout.
- Around line 125-133: The timeout branch in the shutdown sequence (where
self.runtime.block_on(async { tokio::time::timeout(RUN_TASK_SHUTDOWN_TIMEOUT,
task).await })) only logs that it will "abort" but never actually cancels the
run task; update the branch that checks finished.is_err() to call task.abort()
(or the appropriate Cancel/abort method on the JoinHandle) after logging, and
optionally await or poll the handle to ensure it terminated; reference the
variables/functions runtime, block_on, tokio::time::timeout,
RUN_TASK_SHUTDOWN_TIMEOUT and the task/JoinHandle when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a39adbd9-16da-4e92-9b56-cd8ac16888fb

📥 Commits

Reviewing files that changed from the base of the PR and between 2f22194 and b334da6.

📒 Files selected for processing (1)
  • dash-spv-ffi/src/client.rs

@coderabbitai (bot, Contributor) left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dash-spv-ffi/src/client.rs (1)

351-368: ⚠️ Potential issue | 🟠 Major

Shutdown ordering in dash_spv_ffi_client_destroy conflicts with stop_client_internal and may undermine the cooperative shutdown fix.

The destroy function calls stop() before wait_for_run_task(), while stop_client_internal does the opposite—wait_for_run_task() then stop(). This ordering difference contradicts the PR's intent to ensure run() completes its cleanup before other operations interfere.

Since stop() initiates shutdown of the sync coordinator, network, and storage while the run task may still be in its cleanup phase (joining tasks), the current destroy ordering could still cause the race condition this PR aims to prevent.

Proposed fix
 pub unsafe extern "C" fn dash_spv_ffi_client_destroy(client: *mut FFIDashSpvClient) {
     if !client.is_null() {
         let client = Box::from_raw(client);

         // Cancel shutdown token to stop all tasks
         client.shutdown_token.cancel();

+        // Wait for the run task to finish (cooperative, with timeout fallback)
+        client.wait_for_run_task();
+
         // Stop the SPV client
         client.runtime.block_on(async {
             let _ = client.inner.stop().await;
         });

-        // Wait for the run task to finish (cooperative, with timeout fallback)
-        client.wait_for_run_task();
-
         tracing::info!("FFI client destroyed and all tasks cleaned up");
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/client.rs` around lines 351 - 368,
dash_spv_ffi_client_destroy currently calls client.inner.stop() before
client.wait_for_run_task(), which contradicts stop_client_internal's ordering
and can reintroduce the race; change the shutdown sequence in
dash_spv_ffi_client_destroy so after turning the Box from raw and cancelling the
shutdown_token you first call client.wait_for_run_task() (using the existing
method) and only then call client.runtime.block_on(async { let _ =
client.inner.stop().await; }); keep the shutdown_token cancellation and final
tracing::info call as-is so run() can finish cleanup before stop() tears down
coordinator/network/storage.
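
To make the ordering concern concrete, the following is a small model of the race, written under stated assumptions rather than taken from the PR. It uses std threads in place of tokio tasks, and `teardown`, `cancel`, and `resources_alive` are hypothetical names: the spawned thread stands in for the run task's cleanup phase, and flipping `resources_alive` to false stands in for stop() tearing down the coordinator, network, and storage.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Runs one teardown and returns whether the run task's cleanup still saw
/// live resources when it finished. `wait_before_stop` selects the ordering.
fn teardown(wait_before_stop: bool) -> bool {
    let cancel = Arc::new(AtomicBool::new(false));
    let resources_alive = Arc::new(AtomicBool::new(true));

    let run_task = {
        let cancel = Arc::clone(&cancel);
        let alive = Arc::clone(&resources_alive);
        thread::spawn(move || {
            // Main loop: run until cancellation is requested.
            while !cancel.load(Ordering::SeqCst) {
                thread::sleep(Duration::from_millis(1));
            }
            // Cleanup phase takes a while and needs the resources.
            thread::sleep(Duration::from_millis(50));
            alive.load(Ordering::SeqCst)
        })
    };

    // Step 1 in both orderings: request cancellation.
    cancel.store(true, Ordering::SeqCst);

    if wait_before_stop {
        // Wait for the run task first, then tear down resources.
        let cleanup_ok = run_task.join().expect("run task panicked");
        resources_alive.store(false, Ordering::SeqCst);
        cleanup_ok
    } else {
        // Tear down resources while the run task is still cleaning up.
        resources_alive.store(false, Ordering::SeqCst);
        run_task.join().expect("run task panicked")
    }
}

fn main() {
    println!("wait then stop: cleanup saw live resources = {}", teardown(true));
    println!("stop then wait: cleanup saw live resources = {}", teardown(false));
}
```

In this model, waiting before stopping lets cleanup observe live resources, while stopping first pulls them away mid-cleanup, which is the interference the comment above describes.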
🧹 Nitpick comments (1)
dash-spv-ffi/src/client.rs (1)

122-123: Consider handling poisoned mutex for defensive robustness.

The unwrap() on the mutex lock is acceptable here since a poisoned mutex would indicate a panic during run_task manipulation—an unrecoverable state. However, for defensive coding, you could use lock().ok().and_then(|mut g| g.take()) to silently treat a poisoned mutex as "no task to wait for" rather than panicking during teardown.

♻️ Optional defensive alternative
     fn wait_for_run_task(&self) {
-        let task = self.run_task.lock().unwrap().take();
+        let task = self.run_task.lock().ok().and_then(|mut guard| guard.take());
         if let Some(mut task) = task {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/client.rs` around lines 122 - 123, The current
wait_for_run_task method uses self.run_task.lock().unwrap().take() which will
panic on a poisoned mutex; change it to handle poisoning defensively by
replacing the unwrap() call with lock().ok().and_then(|mut guard| guard.take())
(or an equivalent match that treats a poisoned lock as None) so
wait_for_run_task returns/acts as if there is no task instead of panicking;
update any subsequent code that assumes a task was obtained to handle the None
case gracefully.
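
A standalone illustration of the suggestion, using only the standard library. `take_lenient` mirrors the expression proposed above; `take_recovering` is an alternative that is not part of this review, included as an assumption-labelled comparison because it still retrieves the stored value from a poisoned lock via `PoisonError::into_inner`. The helper `poisoned_slot` and the `u32` payload are hypothetical stand-ins for the real `run_task` handle.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Take the stored value, treating a poisoned lock as "nothing to take".
fn take_lenient<T>(slot: &Mutex<Option<T>>) -> Option<T> {
    slot.lock().ok().and_then(|mut guard| guard.take())
}

/// Alternative (not from the review): recover the guard from a poisoned
/// lock so the stored value can still be taken.
fn take_recovering<T>(slot: &Mutex<Option<T>>) -> Option<T> {
    slot.lock().unwrap_or_else(|e| e.into_inner()).take()
}

/// Build a mutex holding `Some(value)` and poison it by panicking in
/// another thread while the lock is held.
fn poisoned_slot(value: u32) -> Arc<Mutex<Option<u32>>> {
    let slot = Arc::new(Mutex::new(Some(value)));
    let clone = Arc::clone(&slot);
    let _ = thread::spawn(move || {
        let _guard = clone.lock().unwrap();
        panic!("poison the mutex");
    })
    .join();
    slot
}

fn main() {
    let slot = poisoned_slot(7);
    println!("lenient: {:?}", take_lenient(&slot));
    println!("recovering: {:?}", take_recovering(&slot));
}
```

One trade-off to weigh: with the lenient form, a poisoned lock means the stored task is never waited on, so teardown skips the cooperative wait entirely. Whether that is acceptable depends on how likely poisoning is in practice.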

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9588d8e5-ae37-4f9b-9b11-bf1746a99088

📥 Commits

Reviewing files that changed from the base of the PR and between b334da6 and 3292274.

📒 Files selected for processing (1)
  • dash-spv-ffi/src/client.rs

coderabbitai bot previously approved these changes Mar 31, 2026
@github-actions github-actions bot added the ready-for-review CodeRabbit has approved this PR label Mar 31, 2026
Wait for the run task before calling `stop()` so `run()` can finish
its own cleanup. Matches the ordering in `stop_client_internal`.
@github-actions github-actions bot removed the ready-for-review CodeRabbit has approved this PR label Mar 31, 2026
@github-actions github-actions bot added the ready-for-review CodeRabbit has approved this PR label Mar 31, 2026
@xdustinface xdustinface requested a review from ZocoLini March 31, 2026 03:23
@xdustinface xdustinface merged commit 518fdb2 into v0.42-dev Apr 1, 2026
42 checks passed
@xdustinface xdustinface deleted the fix/replace-abort branch April 1, 2026 02:23