Skip to content

Route DebuggerController operations through a per-controller worker queue#1087

Closed
xusheng6 wants to merge 3 commits into
devfrom
refactor-worker-queue
Closed

Route DebuggerController operations through a per-controller worker queue#1087
xusheng6 wants to merge 3 commits into
devfrom
refactor-worker-queue

Conversation

@xusheng6
Copy link
Copy Markdown
Member

Summary

Replaces the previous "spawn one std::thread per operation and detach it" pattern (17 sites across Launch, Attach, Connect, Go, GoReverse, Step{Into,Over,Return}{,Reverse}, RunTo{,Reverse}, Restart, Detach, Quit, Pause) with a single persistent worker thread per DebuggerController plus a FIFO work queue. Every public Xxx() / XxxAndWait() method now routes through Submit().

The worker thread is owned by the controller, started in the constructor and joined in the destructor (after draining the queue). Async work can no longer outlive the controller it touches — the lifetime guarantee is structural rather than reliant on per-call DbgRef captures, which this PR removes (those were added in #1086 as a stopgap; the queue model makes them unnecessary).

Builds on #1086.

Design notes

Submit (templated, header)

  • Detects re-entrant calls from the worker itself (std::this_thread::get_id() == m_workerThreadId) and runs them inline. This is what lets RestartAndWaitOnWorker call QuitAndWaitOnWorker + LaunchAndWaitOnWorker without deadlocking on its own queue.
  • Returns a std::future so sync callers can wait.

SubmitAndWait (templated, header)

  • Submits a task and blocks the caller until the future resolves.
  • Takes an optional std::chrono::milliseconds timeout. milliseconds::max() ("wait forever") is the default and bypasses wait_for entirely (avoids overflow inside the stdlib).
  • On non-infinite timeout: signals m_adapter->BreakInto() and still waits for the in-flight op to settle before returning.

Public/Internal/OnWorker split

  • The existing public/internal split (XxxAndWaitInternal does the actual work; the old XxxAndWait was the lock-Internal-notify wrapper) is preserved.
  • The old public XxxAndWait is renamed XxxAndWaitOnWorker (private) — same body, now invoked from the worker thread.
  • New public XxxAndWait(timeout) is a one-line SubmitAndWait wrapper.

Public API compatibility

  • Every public XxxAndWait keeps its current signature with an appended optional timeout parameter defaulted to infinite. Existing callers in core/ffi.cpp and ui/uinotification.cpp keep their current behavior.

Pause / BreakInto (special-cased)

  • Both Pause() and PauseAndWait() bypass the queue and call m_adapter->BreakInto() directly on the caller's thread. The worker is presumed blocked inside ExecuteAdapterAndWait for whatever resume op is in flight; BreakInto unsticks it.
  • PauseAndWait then waits for the worker to drain past the in-flight task by submitting a no-op probe.

IL stepping fits without changes

  • Step operations on HLIL/MLIL etc. are still implemented as loops inside StepIntoIL/StepOverIL which call XxxAndWaitInternal multiple times. The loop runs as a single queued task on the worker — no change to event-filtering semantics or stop-reason logic.

Out of scope (follow-ups)

  • Adapter-internal threads: DbgEngAdapter::Attach's spawned thread (BINARYNINJA-1A), DbgEngAdapter::EngineLoop, LLDB EventListener, RSP send/receive loops. These need adapter destructors to act as join barriers; that's a separate concern from the controller's queue.
  • m_targetControlMutex cleanup. With single-worker serialization it's largely redundant, but I left it in place to minimize behavioral risk in this PR.
  • BinaryDataNotification and settings-callback unregistration in Destroy (BINARYNINJA-75 / -2H — the RebaseToAddress family of crashes).
  • m_userRequestedBreak should arguably be std::atomic<bool> now that Pause writes it from outside the worker. Left as-is for this PR.

Test plan

  • Builds cleanly (ninja — both libdebuggercore.dylib and libdebuggerui.dylib)
  • Manual: launch, step, hit breakpoint, pause while running, detach mid-step, close tab mid-operation
  • Python regression tests in test/debugger_test.py

🤖 Generated with Claude Code

xusheng6 and others added 2 commits May 20, 2026 15:02
The 17 `Launch`/`Attach`/`Connect`/`Go`/`Step*`/`RunTo*`/`Restart`/`Detach`/
`Quit`/`Pause` methods each spawn a detached `std::thread` that runs the
corresponding `*AndWait` work, capturing `this` by reference. Nothing keeps
the controller alive for the duration of the detached call, so if the
controller's refcount drops to zero before the worker exits (e.g. the user
closes the tab), the worker dereferences a dangling pointer.

Capture a `DbgRef<DebuggerController>` by value into each lambda so the
controller stays alive until the worker thread finishes.

Fixes #1083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er queue

Replaces the previous "spawn one std::thread per operation and detach it"
pattern with a single persistent worker thread per controller plus a FIFO
work queue. Every public Xxx()/XxxAndWait() method now goes through Submit().
The worker thread is owned by the controller, started in the constructor
and joined in the destructor (after draining), so any task can no longer
outlive the controller it touches -- the lifetime guarantee is structural
rather than reliant on per-call DbgRef captures (which this PR removes).

Highlights:
* Submit() detects re-entrant calls from the worker itself and runs them
  inline, so e.g. RestartAndWaitOnWorker can call QuitAndWaitOnWorker and
  LaunchAndWaitOnWorker without deadlocking on the queue.
* SubmitAndWait() submits a task and waits on the resulting future. Every
  XxxAndWait now takes an optional std::chrono::milliseconds timeout,
  defaulting to milliseconds::max() ("wait forever") so existing callers
  in ffi.cpp and uinotification.cpp keep their current behavior. If a
  non-infinite timeout elapses, the engine is signaled to break and we
  still wait for the in-flight op to settle before returning.
* Pause()/PauseAndWait() are special-cased: they bypass the queue and call
  m_adapter->BreakInto() directly on the caller's thread. The worker is
  blocked inside ExecuteAdapterAndWait for whatever resume op is in
  flight; BreakInto unsticks it. PauseAndWait then waits for the worker
  to drain past the running task by submitting a no-op probe.

The existing public/internal split (XxxAndWaitInternal does the actual
work; the old XxxAndWait was the lock-Internal-notify wrapper) is
preserved. The old wrapper is renamed to XxxAndWaitOnWorker (private) and
the new public XxxAndWait(timeout) is a thin SubmitAndWait wrapper around
it. This keeps the public API surface identical to today for any caller
that doesn't pass a timeout.

Out of scope for this PR: adapter-internal threads (DbgEngAdapter::Attach
spawned thread, EngineLoop, LLDB EventListener, RSP send/receive loops).
Those have their own lifetime concerns and will be handled separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stop machinery

Adapter stops are now an internal signal from the adapter thread to the
worker, not events on the public dispatcher queue. PostDebuggerEvent
intercepts AdapterStoppedEventType and delivers the stop reason to a
new mutex+condvar pair (m_adapterStopMutex / m_adapterStopCv /
m_adapterStopPending). The worker consumes them via two paths:

1. In-flight ops: ExecuteAdapterAndWait now claims the channel for its
   entire duration (m_inAdapterWait = true), kicks off the adapter
   operation, then loops on WaitForAdapterStop. The conditional-
   breakpoint check moves here -- on a false condition we silently
   resume via m_adapter->Go() and wait again, without releasing the
   channel. This eliminates the old "drive m_adapter->Go() from the
   dispatcher thread" hack and the m_suppressResumeEvent flag that
   papered over it.

2. Spontaneous stops: when the user types e.g. `si` directly into the
   LLDB REPL while no controller op is in flight, m_inAdapterWait is
   false, so PostDebuggerEvent queues HandleSpontaneousAdapterStop on
   the worker -- which updates caches and calls NotifyStopped, the
   same effect as the dispatcher synthesis the old code did at
   debuggercontroller.cpp:2279.

Target-exit / detach events continue through the dispatcher queue
(they're user-visible) but ALSO signal the adapter-stop channel so an
in-flight WaitForAdapterStop unblocks when the target dies.

The dispatcher (DebuggerMainThread) no longer has any
AdapterStoppedEventType-specific code paths. m_lastAdapterStopEventConsumed
and m_suppressResumeEvent are removed; the temporary
RegisterEventCallback/Semaphore pattern inside ExecuteAdapterAndWait is
replaced by the direct WaitForAdapterStop wait.

AdapterStoppedEventType remains in the public enum so adapters keep
posting it the same way -- the change is purely in how the controller
routes it once it arrives at PostDebuggerEvent. Adapter implementations
need no changes.

Stacked on the worker-queue refactor (this PR / #1087). Implements
#1089.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant