fix(tools): replace cross-await RwLock with Semaphore to prevent deadlock#1856
fix(tools): replace cross-await RwLock with Semaphore to prevent deadlock#1856Fire-dtx wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request replaces the RwLock synchronization mechanism with a Semaphore in ToolCallRuntime to manage tool execution. The review identifies a critical race condition where parallel tools drop their permit immediately, allowing serial tools to run concurrently and potentially compromise state consistency. Feedback also notes a breaking API change due to the removal of a public field and manual Default implementation, as well as a remaining deadlock risk for serial tools in re-entrant scenarios.
| if configured.supports_parallel_tool_calls { | ||
| let _guard = self.runtime.parallel_execution.read().await; | ||
| // Parallel tools wait for any in-flight serial tool to finish, | ||
| // but do not hold the permit so other parallel tools may run concurrently. | ||
| drop(self.runtime.serial_semaphore.acquire().await | ||
| .map_err(|_| FunctionCallError::Cancelled { name: call.name })?); | ||
| self.execute_with_timeout(handler, configured.spec.timeout_ms, invocation) | ||
| .await |
There was a problem hiding this comment.
This implementation introduces a potential race condition between parallel and serial tools. By dropping the semaphore permit immediately at line 395 before executing the tool, parallel tools no longer prevent serial (mutating) tools from starting and running concurrently. The previous RwLock::read() implementation ensured mutual exclusion between readers and writers. If parallel tools (which are typically read-only) require a consistent state, this change breaks that guarantee as a serial tool can now acquire the permit and modify state while parallel tools are still active.
| #[derive(Debug)] | ||
| pub struct ToolCallRuntime { | ||
| pub parallel_execution: Arc<RwLock<()>>, | ||
| /// Serialise non-parallel tool executions. Capacity 1 ensures at most one | ||
| /// serial tool runs at a time, and blocks parallel tools while it runs. | ||
| serial_semaphore: Arc<Semaphore>, | ||
| } |
There was a problem hiding this comment.
The removal of the pub parallel_execution field and its replacement with a private serial_semaphore field is a breaking change for the ToolCallRuntime API. If external access to the synchronization primitive is required by consumers of this crate, the new field should be marked pub. Additionally, the ToolCallRuntime struct no longer derives Default, which may affect users relying on that trait being derived rather than manually implemented.
| let _permit = self.runtime.serial_semaphore.acquire().await | ||
| .map_err(|_| FunctionCallError::Cancelled { name: call.name })?; | ||
| self.execute_with_timeout(handler, configured.spec.timeout_ms, invocation) | ||
| .await |
There was a problem hiding this comment.
While this change addresses deadlocks for parallel tools (by not holding the lock during execution), serial tools still hold the semaphore permit across the await point at line 404. If a serial tool execution triggers a re-entrant call to another serial tool, it will still result in a deadlock when attempting to acquire the semaphore at line 402. If re-entrancy is a known risk in this tool system, the semaphore-based approach does not fully resolve the deadlock issue for the serial execution path.
…lock Replace `Arc<RwLock<()>>` in `ToolCallRuntime` with `Arc<Semaphore>` to eliminate the risk of a tool re-entering and deadlocking on the same RwLock, and to prevent serial tools from blocking parallel tools' read-lock acquisition. - Serial tools acquire and hold the permit for the full execution duration, ensuring mutual exclusion. - Parallel tools acquire then immediately drop the permit, only blocking until any in-flight serial tool completes, then run concurrently.
Replace
Arc<RwLock<()>>inToolCallRuntimewithArc<Semaphore>to eliminate the risk of a tool re-entering and deadlocking on the same RwLock, and to prevent serial tools from blocking parallel tools' read-lock acquisition.Summary
Testing
cargo test --all-featurescargo fmt --all -- --checkcargo clippy --all-targets --all-featuresChecklist