Skip to content

fix(tools): replace cross-await RwLock with Semaphore to prevent deadlock#1856

Open
Fire-dtx wants to merge 1 commit into
Hmbown:mainfrom
Fire-dtx:main
Open

fix(tools): replace cross-await RwLock with Semaphore to prevent deadlock#1856
Fire-dtx wants to merge 1 commit into
Hmbown:mainfrom
Fire-dtx:main

Conversation

@Fire-dtx
Copy link
Copy Markdown

@Fire-dtx Fire-dtx commented May 21, 2026

Replace Arc<RwLock<()>> in ToolCallRuntime with Arc<Semaphore> to eliminate the risk of a tool re-entering and deadlocking on the same RwLock, and to prevent serial tools from blocking parallel tools' read-lock acquisition.

  • Serial tools acquire and hold the permit for the full execution duration, ensuring mutual exclusion.
  • Parallel tools acquire then immediately drop the permit, only blocking until any in-flight serial tool completes, then run concurrently.

Summary

Testing

  • cargo test --all-features
  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant
  • Verified TUI behavior manually if UI changes

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the RwLock synchronization mechanism with a Semaphore in ToolCallRuntime to manage tool execution. The review identifies a critical race condition where parallel tools drop their permit immediately, allowing serial tools to run concurrently and potentially compromise state consistency. Feedback also notes a breaking API change due to the removal of a public field and manual Default implementation, as well as a remaining deadlock risk for serial tools in re-entrant scenarios.

Comment thread crates/tools/src/lib.rs
Comment on lines 392 to 398
if configured.supports_parallel_tool_calls {
let _guard = self.runtime.parallel_execution.read().await;
// Parallel tools wait for any in-flight serial tool to finish,
// but do not hold the permit so other parallel tools may run concurrently.
drop(self.runtime.serial_semaphore.acquire().await
.map_err(|_| FunctionCallError::Cancelled { name: call.name })?);
self.execute_with_timeout(handler, configured.spec.timeout_ms, invocation)
.await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This implementation introduces a potential race condition between parallel and serial tools. By dropping the semaphore permit immediately at line 395 before executing the tool, parallel tools no longer prevent serial (mutating) tools from starting and running concurrently. The previous RwLock::read() implementation ensured mutual exclusion between readers and writers. If parallel tools (which are typically read-only) require a consistent state, this change breaks that guarantee as a serial tool can now acquire the permit and modify state while parallel tools are still active.

Comment thread crates/tools/src/lib.rs
Comment on lines +312 to +317
#[derive(Debug)]
pub struct ToolCallRuntime {
pub parallel_execution: Arc<RwLock<()>>,
/// Serialise non-parallel tool executions. Capacity 1 ensures at most one
/// serial tool runs at a time, and blocks parallel tools while it runs.
serial_semaphore: Arc<Semaphore>,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The removal of the pub parallel_execution field and its replacement with a private serial_semaphore field is a breaking change for the ToolCallRuntime API. If external access to the synchronization primitive is required by consumers of this crate, the new field should be marked pub. Additionally, the ToolCallRuntime struct no longer derives Default, which may affect users relying on that trait being derived rather than manually implemented.

Comment thread crates/tools/src/lib.rs
Comment on lines +402 to 405
let _permit = self.runtime.serial_semaphore.acquire().await
.map_err(|_| FunctionCallError::Cancelled { name: call.name })?;
self.execute_with_timeout(handler, configured.spec.timeout_ms, invocation)
.await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While this change addresses deadlocks for parallel tools (by not holding the lock during execution), serial tools still hold the semaphore permit across the await point at line 404. If a serial tool execution triggers a re-entrant call to another serial tool, it will still result in a deadlock when attempting to acquire the semaphore at line 402. If re-entrancy is a known risk in this tool system, the semaphore-based approach does not fully resolve the deadlock issue for the serial execution path.

…lock

Replace `Arc<RwLock<()>>` in `ToolCallRuntime` with `Arc<Semaphore>` to
eliminate the risk of a tool re-entering and deadlocking on the same
RwLock, and to prevent serial tools from blocking parallel tools'
read-lock acquisition.

- Serial tools acquire and hold the permit for the full execution
  duration, ensuring mutual exclusion.
- Parallel tools acquire then immediately drop the permit, only blocking
  until any in-flight serial tool completes, then run concurrently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants