fix: replace abort() with cooperative wait in wait_for_run_task #576

xdustinface merged 3 commits into v0.42-dev

Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration: Path: .coderabbit.yaml · Review profile: CHILL · Plan: Pro · Run ID:
📒 Files selected for processing (1)
📝 Walkthrough

Replaces unconditional abort of the client's run task with a cooperative shutdown: introduces a cooperative wait on the run task (`wait_for_run_task`, with a `RUN_TASK_SHUTDOWN_TIMEOUT` fallback to `abort()`).

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant API as Client API
    participant Client as FFIDashSpvClient
    participant Task as RunTask
    participant Timer as Timeout
    API->>Client: request stop / destroy
    Client->>Client: cancel shutdown_token
    Client->>Task: wait_for_run_task()
    par run finishes before timeout
        Task-->>Client: finishes
        Client-->>API: return (done)
    and run does NOT finish before timeout
        alt timeout elapsed
            Timer-->>Client: timeout
            Client->>Task: task.abort()
            Task-->>Client: aborted join (await)
            Client-->>API: return (aborted)
        end
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##          v0.42-dev     #576      +/- ##
=============================================
- Coverage     67.03%   67.00%    -0.04%
=============================================
  Files           320      320
  Lines         67249    67255        +6
=============================================
- Hits          45083    45062       -21
- Misses        22166    22193       +27
```
Force-pushed from aaddece to 23accc0 (compare)
This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.
`abort()` can interrupt the cleanup sequence in `DashSpvClient::run()` (the `monitor_shutdown.cancel()` + `tokio::join!`), leaving monitor tasks running after FFI callback pointers are freed. Use cooperative wait with a timeout fallback instead.
Force-pushed from fabbdad to b334da6 (compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dash-spv-ffi/src/client.rs`:
- Around line 117-122: Add unit tests covering both cooperative shutdown and
timeout fallback for the new wait_for_run_task behavior: write one test that
starts DashSpvClient::run() (or a test helper that spawns run task), signals the
shutdown_token and asserts the run task exits without being aborted (cooperative
path), and a second test that simulates a non-responsive run loop (e.g., by
holding the run future) so wait_for_run_task hits its timeout and aborts the
task, asserting the abort path was used; target the wait_for_run_task method and
any test-helper hooks around DashSpvClient::run(), using
tokio::time::pause/advance or similar to deterministically drive the timeout.
- Around line 125-133: The timeout branch in the shutdown sequence (where
self.runtime.block_on(async { tokio::time::timeout(RUN_TASK_SHUTDOWN_TIMEOUT,
task).await })) only logs that it will "abort" but never actually cancels the
run task; update the branch that checks finished.is_err() to call task.abort()
(or the appropriate Cancel/abort method on the JoinHandle) after logging, and
optionally await or poll the handle to ensure it terminated; reference the
variables/functions runtime, block_on, tokio::time::timeout,
RUN_TASK_SHUTDOWN_TIMEOUT and the task/JoinHandle when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a39adbd9-16da-4e92-9b56-cd8ac16888fb
📒 Files selected for processing (1)
dash-spv-ffi/src/client.rs
Addresses CodeRabbit review comment on PR #576 (comment)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dash-spv-ffi/src/client.rs (1)
351-368: ⚠️ Potential issue | 🟠 Major

**Shutdown ordering in `dash_spv_ffi_client_destroy` conflicts with `stop_client_internal` and may undermine the cooperative shutdown fix.**

The destroy function calls `stop()` before `wait_for_run_task()`, while `stop_client_internal` does the opposite: `wait_for_run_task()` then `stop()`. This ordering difference contradicts the PR's intent to ensure `run()` completes its cleanup before other operations interfere.

Since `stop()` initiates shutdown of the sync coordinator, network, and storage while the run task may still be in its cleanup phase (joining tasks), the current destroy ordering could still cause the race condition this PR aims to prevent.

Proposed fix

```diff
 pub unsafe extern "C" fn dash_spv_ffi_client_destroy(client: *mut FFIDashSpvClient) {
     if !client.is_null() {
         let client = Box::from_raw(client);
         // Cancel shutdown token to stop all tasks
         client.shutdown_token.cancel();
+        // Wait for the run task to finish (cooperative, with timeout fallback)
+        client.wait_for_run_task();
         // Stop the SPV client
         client.runtime.block_on(async {
             let _ = client.inner.stop().await;
         });
-        // Wait for the run task to finish (cooperative, with timeout fallback)
-        client.wait_for_run_task();
         tracing::info!("FFI client destroyed and all tasks cleaned up");
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dash-spv-ffi/src/client.rs` around lines 351 - 368, dash_spv_ffi_client_destroy currently calls client.inner.stop() before client.wait_for_run_task(), which contradicts stop_client_internal's ordering and can reintroduce the race; change the shutdown sequence in dash_spv_ffi_client_destroy so after turning the Box from raw and cancelling the shutdown_token you first call client.wait_for_run_task() (using the existing method) and only then call client.runtime.block_on(async { let _ = client.inner.stop().await; }); keep the shutdown_token cancellation and final tracing::info call as-is so run() can finish cleanup before stop() tears down coordinator/network/storage.
🧹 Nitpick comments (1)
dash-spv-ffi/src/client.rs (1)
122-123: Consider handling poisoned mutex for defensive robustness.

The `unwrap()` on the mutex lock is acceptable here, since a poisoned mutex would indicate a panic during run_task manipulation, an unrecoverable state. However, for defensive coding, you could use `lock().ok().and_then(|mut g| g.take())` to silently treat a poisoned mutex as "no task to wait for" rather than panicking during teardown.

♻️ Optional defensive alternative

```diff
 fn wait_for_run_task(&self) {
-    let task = self.run_task.lock().unwrap().take();
+    let task = self.run_task.lock().ok().and_then(|mut guard| guard.take());
     if let Some(mut task) = task {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dash-spv-ffi/src/client.rs` around lines 122 - 123, The current wait_for_run_task method uses self.run_task.lock().unwrap().take() which will panic on a poisoned mutex; change it to handle poisoning defensively by replacing the unwrap() call with lock().ok().and_then(|mut guard| guard.take()) (or an equivalent match that treats a poisoned lock as None) so wait_for_run_task returns/acts as if there is no task instead of panicking; update any subsequent code that assumes a task was obtained to handle the None case gracefully.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@dash-spv-ffi/src/client.rs`:
- Around line 351-368: dash_spv_ffi_client_destroy currently calls
client.inner.stop() before client.wait_for_run_task(), which contradicts
stop_client_internal's ordering and can reintroduce the race; change the
shutdown sequence in dash_spv_ffi_client_destroy so after turning the Box from
raw and cancelling the shutdown_token you first call client.wait_for_run_task()
(using the existing method) and only then call client.runtime.block_on(async {
let _ = client.inner.stop().await; }); keep the shutdown_token cancellation and
final tracing::info call as-is so run() can finish cleanup before stop() tears
down coordinator/network/storage.
---
Nitpick comments:
In `@dash-spv-ffi/src/client.rs`:
- Around line 122-123: The current wait_for_run_task method uses
self.run_task.lock().unwrap().take() which will panic on a poisoned mutex;
change it to handle poisoning defensively by replacing the unwrap() call with
lock().ok().and_then(|mut guard| guard.take()) (or an equivalent match that
treats a poisoned lock as None) so wait_for_run_task returns/acts as if there is
no task instead of panicking; update any subsequent code that assumes a task was
obtained to handle the None case gracefully.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9588d8e5-ae37-4f9b-9b11-bf1746a99088
📒 Files selected for processing (1)
dash-spv-ffi/src/client.rs
Wait for the run task before calling `stop()` so `run()` can finish its own cleanup. Matches the ordering in `stop_client_internal`.
`abort()` can interrupt the cleanup sequence in `DashSpvClient::run()` (the `monitor_shutdown.cancel()` + `tokio::join!`), leaving monitor tasks running after FFI callback pointers are freed. Use cooperative wait with a timeout fallback instead.

Based on: `DashSpvClient` #572

Summary by CodeRabbit