You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The 17 public operations on DebuggerController (Launch, Attach, Connect, Go, GoReverse, Step{Into,Over,Return}{,Reverse}, RunTo{,Reverse}, Restart, Detach, Quit, Pause) each spawn a fresh std::thread and detach it. Every site captures this raw (or via DbgRef after #1086 lands), then calls back into the controller. This has caused a recurring family of crashes — the controller can be destroyed while one of these detached threads is still running, leading to use-after-free on its members. Several recent Sentry reports trace to exactly this pattern:
BINARYNINJA-3X — RspConnector::SendRaw via CorelliumAdapter::BreakInto (26 events)
The per-call DbgRef capture (#1086) is a stopgap that keeps the controller alive for the duration of the detached call. The structural fix is to stop spawning per-op threads in the first place.
Approach
Replace the per-operation detach pattern with:
One persistent worker thread per controller, owned by the controller and joined in the destructor (after draining).
A FIFO work queue that every public operation submits to. The queue serializes ops naturally — no concurrent adapter calls, no mutex dance to prevent them.
Submit() / SubmitAndWait() templates that handle queueing and re-entrant calls (when the worker invokes another operation on itself, run inline instead of deadlocking).
Pause() and PauseAndWait() bypass the queue and call m_adapter->BreakInto() directly on the caller's thread — the worker is blocked inside ExecuteAdapterAndWait for the in-flight op and needs an out-of-band signal to unblock.
Timeout parameter on every XxxAndWait, default std::chrono::milliseconds::max() so existing callers don't change behavior. Closes the long-standing "sync ops can hang forever" complaint without an API break.
The public API surface is unchanged — every existing Launch() / LaunchAndWait() call site keeps working. Internally:
The old XxxAndWait() body (lock + Internal + unlock + NotifyStopped) is renamed XxxAndWaitOnWorker() (private) and invoked by the worker.
New public XxxAndWait(timeout) is a one-liner around SubmitAndWait.
Adapter-internal threads (DbgEngAdapter::Attach's spawned thread, EngineLoop, LLDB EventListener, RSP send/receive). These have their own lifetime concerns — adapter destructors need to act as join barriers. Separate from controller-side scheduling.
m_targetControlMutex cleanup. Redundant once the queue serializes work, but left in place to minimize behavioral risk.
BinaryDataNotification / settings notification unregister in Destroy (BINARYNINJA-75, -2H — the RebaseToAddress family).
m_userRequestedBreak should become std::atomic<bool> now that Pause writes it from a non-worker thread.
Simplification of the adapter-vs-controller stop event plumbing (separate issue).
Motivation
The 17 public operations on
DebuggerController(Launch,Attach,Connect,Go,GoReverse,Step{Into,Over,Return}{,Reverse},RunTo{,Reverse},Restart,Detach,Quit,Pause) each spawn a freshstd::threadand detach it. Every site capturesthisraw (or viaDbgRefafter #1086 lands), then calls back into the controller. This has caused a recurring family of crashes — the controller can be destroyed while one of these detached threads is still running, leading to use-after-free on its members. Several recent Sentry reports trace to exactly this pattern:RspConnector::SendRawviaCorelliumAdapter::BreakInto(26 events)DbgEngTTDAdapter::Reset→EngineLoopDbgEngAdapter::ReadMemoryDbgEngAdapter::Attachworker threadDetachAndWaitcrashThe per-call
DbgRefcapture (#1086) is a stopgap that keeps the controller alive for the duration of the detached call. The structural fix is to stop spawning per-op threads in the first place.Approach
Replace the per-operation detach pattern with:
Submit()/SubmitAndWait()templates that handle queueing and re-entrant calls (when the worker invokes another operation on itself, run inline instead of deadlocking).Pause()andPauseAndWait()bypass the queue and callm_adapter->BreakInto()directly on the caller's thread — the worker is blocked insideExecuteAdapterAndWaitfor the in-flight op and needs an out-of-band signal to unblock.XxxAndWait, defaultstd::chrono::milliseconds::max()so existing callers don't change behavior. Closes the long-standing "sync ops can hang forever" complaint without an API break.The public API surface is unchanged — every existing
Launch()/LaunchAndWait()call site keeps working. Internally:XxxAndWait()body (lock + Internal + unlock + NotifyStopped) is renamedXxxAndWaitOnWorker()(private) and invoked by the worker.XxxAndWait(timeout)is a one-liner aroundSubmitAndWait.Implementation
See #1087 (open).
Out of scope (follow-ups)
DbgEngAdapter::Attach's spawned thread,EngineLoop, LLDBEventListener, RSP send/receive). These have their own lifetime concerns — adapter destructors need to act as join barriers. Separate from controller-side scheduling.m_targetControlMutexcleanup. Redundant once the queue serializes work, but left in place to minimize behavioral risk.BinaryDataNotification/ settings notification unregister inDestroy(BINARYNINJA-75, -2H — theRebaseToAddressfamily).m_userRequestedBreakshould becomestd::atomic<bool>now thatPausewrites it from a non-worker thread.