Route DebuggerController operations through a per-controller worker queue#1090
Open
xusheng6 wants to merge 4 commits into
Open
Route DebuggerController operations through a per-controller worker queue#1090xusheng6 wants to merge 4 commits into
xusheng6 wants to merge 4 commits into
Conversation
The 17 `Launch`/`Attach`/`Connect`/`Go`/`Step*`/`RunTo*`/`Restart`/`Detach`/ `Quit`/`Pause` methods each spawn a detached `std::thread` that runs the corresponding `*AndWait` work, capturing `this` by reference. Nothing keeps the controller alive for the duration of the detached call, so if the controller's refcount drops to zero before the worker exits (e.g. the user closes the tab), the worker dereferences a dangling pointer. Capture a `DbgRef<DebuggerController>` by value into each lambda so the controller stays alive until the worker thread finishes. Fixes #1083. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er queue
Replaces the previous "spawn one std::thread per operation and detach it"
pattern with a single persistent worker thread per controller plus a FIFO
work queue. Every public Xxx()/XxxAndWait() method now goes through Submit().
The worker thread is owned by the controller, started in the constructor
and joined in the destructor (after draining), so any task can no longer
outlive the controller it touches -- the lifetime guarantee is structural
rather than reliant on per-call DbgRef captures (which this PR removes).
Highlights:
* Submit() detects re-entrant calls from the worker itself and runs them
inline, so e.g. RestartAndWaitOnWorker can call QuitAndWaitOnWorker and
LaunchAndWaitOnWorker without deadlocking on the queue.
* SubmitAndWait() submits a task and waits on the resulting future. Every
XxxAndWait now takes an optional std::chrono::milliseconds timeout,
defaulting to milliseconds::max() ("wait forever") so existing callers
in ffi.cpp and uinotification.cpp keep their current behavior. If a
non-infinite timeout elapses, the engine is signaled to break and we
still wait for the in-flight op to settle before returning.
* Pause()/PauseAndWait() are special-cased: they bypass the queue and call
m_adapter->BreakInto() directly on the caller's thread. The worker is
blocked inside ExecuteAdapterAndWait for whatever resume op is in
flight; BreakInto unsticks it. PauseAndWait then waits for the worker
to drain past the running task by submitting a no-op probe.
The existing public/internal split (XxxAndWaitInternal does the actual
work; the old XxxAndWait was the lock-Internal-notify wrapper) is
preserved. The old wrapper is renamed to XxxAndWaitOnWorker (private) and
the new public XxxAndWait(timeout) is a thin SubmitAndWait wrapper around
it. This keeps the public API surface identical to today for any caller
that doesn't pass a timeout.
Out of scope for this PR: adapter-internal threads (DbgEngAdapter::Attach
spawned thread, EngineLoop, LLDB EventListener, RSP send/receive loops).
Those have their own lifetime concerns and will be handled separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stop machinery Adapter stops are now an internal signal from the adapter thread to the worker, not events on the public dispatcher queue. PostDebuggerEvent intercepts AdapterStoppedEventType and delivers the stop reason to a new mutex+condvar pair (m_adapterStopMutex / m_adapterStopCv / m_adapterStopPending). The worker consumes them via two paths: 1. In-flight ops: ExecuteAdapterAndWait now claims the channel for its entire duration (m_inAdapterWait = true), kicks off the adapter operation, then loops on WaitForAdapterStop. The conditional- breakpoint check moves here -- on a false condition we silently resume via m_adapter->Go() and wait again, without releasing the channel. This eliminates the old "drive m_adapter->Go() from the dispatcher thread" hack and the m_suppressResumeEvent flag that papered over it. 2. Spontaneous stops: when the user types e.g. `si` directly into the LLDB REPL while no controller op is in flight, m_inAdapterWait is false, so PostDebuggerEvent queues HandleSpontaneousAdapterStop on the worker -- which updates caches and calls NotifyStopped, the same effect as the dispatcher synthesis the old code did at debuggercontroller.cpp:2279. Target-exit / detach events continue through the dispatcher queue (they're user-visible) but ALSO signal the adapter-stop channel so an in-flight WaitForAdapterStop unblocks when the target dies. The dispatcher (DebuggerMainThread) no longer has any AdapterStoppedEventType-specific code paths. m_lastAdapterStopEventConsumed and m_suppressResumeEvent are removed; the temporary RegisterEventCallback/Semaphore pattern inside ExecuteAdapterAndWait is replaced by the direct WaitForAdapterStop wait. AdapterStoppedEventType remains in the public enum so adapters keep posting it the same way -- the change is purely in how the controller routes it once it arrives at PostDebuggerEvent. Adapter implementations need no changes. Stacked on the worker-queue refactor (this PR / #1087). Implements #1089. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in how interrupting ops (Restart/Quit/Detach) interact with the
worker queue:
1. The public Restart/Quit/Detach methods just Submit'd their work to
the queue. If a resume op (Go/Step/RunTo) was in flight, the worker
was blocked inside WaitForAdapterStop -- the queued Restart/Quit/Detach
would sit indefinitely with nothing to interrupt the running target.
To the user, clicking Restart while running looked like a no-op.
Fix: these methods now call RequestInterrupt() (a small helper that
sets m_userRequestedBreak and invokes m_adapter->BreakInto()) on the
caller's thread before queueing the task -- same pattern as Pause().
The BreakInto unsticks the in-flight op's WaitForAdapterStop, that
op drains, and the queued Restart/Quit/Detach runs next. Applied to
both the async public methods and their *AndWait(timeout) variants.
2. QuitAndWaitOnWorker, when it observed IsRunning() inside a re-entrant
call (e.g. from Restart), called the public PauseAndWait() -- which
does BreakInto + Submit([]{}).wait(). On the worker thread Submit
detects re-entry and runs the empty lambda inline, so the wait was
a no-op. We then issued Quit on a still-running target, which DbgEng
in particular doesn't handle well.
Fix: switch that call to PauseAndWaitInternal(), which routes through
ExecuteAdapterAndWait(Pause) and actually waits for the engine via
the adapter-stop channel. With (1) in place this branch should rarely
fire, but it's the correct defensive behavior.
Also pulled the BreakInto + m_userRequestedBreak pair out of Pause and
PauseAndWait into the shared RequestInterrupt() helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two stacked commits:
std::thread(...).detach()pattern (17 sites inLaunch/Attach/Connect/Go/GoReverse/Step…/RunTo…/Restart/Detach/Quit/Pause) with one persistent worker thread per controller plus a FIFO work queue. Every public op routes throughSubmit()/SubmitAndWait(). The worker is owned by the controller and joined in the destructor (after draining), so async work cannot outlive the controller — lifetime is structural rather than reliant on per-callDbgRefcaptures.m_adapterStopMutex/m_adapterStopCv/m_adapterStopPending), not events on the public dispatcher queue. The conditional-breakpoint silent-resume logic moves intoExecuteAdapterAndWait, eliminating the previous "drivem_adapter->Go()from the dispatcher thread" hack and them_suppressResumeEventflag that papered over it. Spontaneous adapter stops (user typessidirectly into the LLDB REPL while no controller op is in flight) are detected viam_inAdapterWaitand routed through the worker queue.Implements #1088, #1089.
Design notes (worker queue)
Submitdetects re-entrant calls from the worker itself (std::this_thread::get_id() == m_workerThreadId) and runs them inline — letsRestartAndWaitOnWorkercallQuitAndWaitOnWorker+LaunchAndWaitOnWorkerwithout deadlocking.SubmitAndWaitsubmits a task and blocks on the future. EveryXxxAndWaitnow takes an optionalstd::chrono::milliseconds timeout(defaultmilliseconds::max()= wait forever, bypasseswait_forto avoid stdlib overflow). On non-infinite timeout: signalsm_adapter->BreakInto()and still waits for the in-flight op to settle.XxxAndWaitInternaldoes the actual work; the oldXxxAndWait(lock + Internal + notify) is renamedXxxAndWaitOnWorker(private) and invoked from the worker; the new publicXxxAndWait(timeout)is a one-lineSubmitAndWaitwrapper.Pause()/PauseAndWait()bypass the queue and callm_adapter->BreakInto()directly on the caller's thread — the worker is blocked insideExecuteAdapterAndWaitfor whatever resume op is in flight;BreakIntounsticks it.Design notes (adapter-stop channel)
PostDebuggerEventinterceptsAdapterStoppedEventTypeand routes the reason to the internal channel. Ifm_inAdapterWaitis true the in-flightExecuteAdapterAndWaitconsumes it; otherwise the stop is queued on the worker as a spontaneous handler.ExecuteAdapterAndWaitclaims the channel for its entire duration (including the silent-resume loop), so a stop between iterations is still consumed by us, not synthesized as spontaneous.TargetExitedEventType/DetachedEventTypestill go through the dispatcher queue (user-visible) but also signal the adapter-stop channel so any in-flightWaitForAdapterStopunblocks when the target dies.m_lastAdapterStopEventConsumed,m_suppressResumeEvent, the temporaryRegisterEventCallback/Semaphorepattern insideExecuteAdapterAndWait, the dispatcher's entireAdapterStoppedEventType-specific block, and theGo-from-dispatcher hack.AdapterStoppedEventTypestays in the public enum so adapters keep posting it the same way. Zero source changes across the 8 adapters.Public API compatibility
XxxAndWaitonly gains an appended optionaltimeoutparameter defaulted to infinite. Existing callers incore/ffi.cppandui/uinotification.cppdon't change.Spontaneous-stop case (e.g.
sityped in the LLDB REPL)When the user issues a command directly to the underlying adapter REPL while no controller op is queued, the engine runs and stops.
m_inAdapterWaitis false at that point, soPostDebuggerEventSubmitsHandleSpontaneousAdapterStoponto the worker, which updates caches and callsNotifyStopped— same observable behavior as the dispatcher synthesis the old code did atdebuggercontroller.cpp:2279, just on the worker. Only one engine op can run at a time, so an adapter stop is unambiguously either "the response to whatever the worker last asked for" (consumed byWaitForAdapterStop) or "the user did this themselves while we were idle" (queued as spontaneous).Out of scope (follow-ups)
DbgEngAdapter::Attach's spawned thread (BINARYNINJA-1A),DbgEngAdapter::EngineLoop, LLDBEventListener, RSP send/receive. These need adapter destructors to act as join barriers — separate from controller-side scheduling.m_targetControlMutexcleanup. Redundant once the queue serializes work, but left in place to minimize behavioral risk.BinaryDataNotification/ settings notification unregister inDestroy(BINARYNINJA-75, -2H — theRebaseToAddressfamily).m_userRequestedBreakshould becomestd::atomic<bool>now thatPausewrites it from a non-worker thread.Test plan
ninja— bothlibdebuggercore.dylibandlibdebuggerui.dylib)sidirectly into LLDB while idle, confirm UI updatestest/debugger_test.py🤖 Generated with Claude Code