Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb][debugserver] Interrupt should reset outstanding SIGSTOP #132128

Conversation

jasonmolenda
Copy link
Collaborator

@jasonmolenda jasonmolenda commented Mar 20, 2025

This fixes an uncommon bug with debugserver controlling an inferior process that is hitting an internal breakpoint & continuing when multiple interrupts are sent by SB API to lldb -- with the result being that lldb never stops the inferior process, ignoring the interrupt/stops being sent by the driver layer (Xcode, in this case).

In the reproducing setup (which required a machine with unique timing characteristics), lldb is sent SBProcess::Stop and then shortly after, SBProcess::SendAsyncInterrupt. The driver process only sees that the inferior is publicly running at this point, even though it's hitting an internal breakpoint (new dylib being loaded), disabling the bp, step instructioning, re-enabling the breakpoint, then continuing.

The packet sequence lldb sends to debugserver looks like

  1. vCont;s // instruction step
  2. ^c // async interrupt
  3. Z.... // re-enable breakpoint
  4. c // resume inferior execution
  5. ^c // async interrupt

When debugserver needs to interrupt a running process (MachProcess::Interrupt), the main thread in debugserver sends a SIGSTOP posix signal to the inferior process, and notes that it has sent this signal by setting m_sent_interrupt_signo.

When we send the first async interrupt while instruction stepping, the signal is sent (probably after the inferior has already stopped) but lldb can only receive the mach exception that includes the SIGSTOP when the process is running. So at the point of step (3), we have a SIGSTOP outstanding in the kernel, and
m_sent_interrupt_signo is set to SIGSTOP.

When we resume the inferior (c in step 4), debugserver sees that m_sent_interrupt_signo is still set for an outstanding SIGSTOP, but at this point we've already stopped so it's an unnecessary stop. It records that (1) we've got a SIGSTOP still coming that debugserver sent and (2) we should ignore it by also setting m_auto_resume_signo to the same signal value.

Once we've resumed the process, the mach exception thread (MachTask::ExceptionThread) receives the outstanding mach exception, adds it to a queue to be processed
(MachProcess::ExceptionMessageReceived) and when we've collected all outstanding mach exceptions, it calls
MachProcess::ExceptionMessageBundleComplete top evaluate them.

MachProcess::ExceptionMessageBundleComplete halts the process (without updating the MachProcess m_state) while evaluating them. It sees that this incoming SIGSTOP was meant to be ignored (m_auto_resume_signo is set), so it MachProcess::PrivateResume's the process again.

At the same time MachTask::ExceptionThread is receiving and processing the ME, MachProcess::Interrupt is called with another interrupt that debugserver received. This method checks that we're still eStateRunning (we are) but then sees that we have an outstanding SIGSTOP already (m_sent_interrupt_signo) and does nothing, assuming that we will stop shortly from that one. It then returns to call RNBRemote::HandlePacket_last_signal to print the status -- but because the process is still eStateRunning, this does nothing.

So the first ^c (resulting in a pending SIGSTOP) is received and we resume the process silently. And the second ^c is ignored because we've got one interrupt already being processed.

The fix was very simple. In MachProcess::Interrupt when we detect that we have a SIGSTOP out in the wild (m_sent_interrupt_signo), we need to clear m_auto_resume_signo which is used to indicate that this SIGSTOP is meant to be ignored, because it was from before our most recent resume.

MachProcess::Interrupt holds the m_exception_and_signal_mutex mutex already (after Jonas's commit last week), and all of MachProcess::ExceptionMessageBundleComplete holds that same mutex, so we know we can modify m_auto_resume_signo here and it will be handled correctly when the outstanding mach exception is finally processed.

rdar://145872120

This fixes an uncommon bug with debugserver controlling an inferior
process that is hitting an internal breakpoint & continuing when
multiple interrupts are sent by SB API to lldb.  In the reproducing
setup (which required a machine with unique timing characteristics),
lldb is sent SBProcess::Stop and then shortly after,
SBProcess::SendAsyncSignal.  The driver process only sees that the
inferior is publicly running at this point, even though it's hitting
an internal breakpoint, disabling it, step instructioning, re-enabling
the breakpoint, then continuing.

The packet sequence lldb sends to debugserver looks like

1. vCont;s   // instruction step
2. ^c        // async interrupt
3. Z....     // re-enable breakpoint
4. c         // resume inferior execution
5. ^c        // async interrupt

When debugserver needs to interrupt a running process
(`MachProcess::Interrupt`), the main thread in debugserver sends a
SIGSTOP posix signal to the inferior process, and notes that it has
sent this signal by setting `m_sent_interrupt_signo`.

When we send the first async interrupt while instruction stepping,
the signal is sent (probably after the inferior has already stopped)
but lldb can only *receive* the mach exception that includes the
SIGSTOP when the process is running.  So at the point of step (3),
we have a SIGSTOP outstanding in the kernel, and
`m_sent_interrupt_signo` is set to SIGSTOP.

When we resume the inferior (`c` in step 4), debugserver sees that
`m_sent_interrupt_signo` is still set for an outstanding SIGSTOP,
but at this point we've already stopped so it's an unnecessary stop.
It records that (1) we've got a SIGSTOP still coming that debugserver
sent and (2) we should ignore it by also setting `m_auto_resume_signo`
to the same signal value.

Once we've resumed the process, the mach exception thread
(`MachTask::ExceptionThread`) receives the outstanding mach exception,
adds it to a queue to be processed
(`MachProcess::ExceptionMessageReceived`) and when we've collected
all outstanding mach exceptions, it calls
`MachProcess::ExceptionMessageBundleComplete` top evaluate them.

`MachProcess::ExceptionMessageBundleComplete` halts the process
(without updating the MachProcess `m_state`) while evaluating them.
It sees that this incoming SIGSTOP was meant to be ignored
(`m_auto_resume_signo` is set), so it `MachProcess::PrivateResume`'s
the process again.

At the same time `MachTask::ExceptionThread` is receiving and
processing the ME, `MachProcess::Interrupt` is called with another
interrupt that debugserver received.  This method checks that we're
still eStateRunning (we are) but then sees that we have an outstanding
SIGSTOP already (`m_sent_interrupt_signo`) and does nothing, assuming
that we will stop shortly from that one.  It then returns to call
`RNBRemote::HandlePacket_last_signal` to print the status -- but
because the process is still `eStateRunning`, this does nothing.

So the first ^c (resulting in a pending SIGSTOP) is received and
we resume the process silently.  And the second ^c is ignored because
we've got one interrupt already being processed.

The fix was very simple.  In `MachProcess::Interrupt` when we detect
that we have a SIGSTOP out in the wild (`m_sent_interrupt_signo`),
we need to clear `m_auto_resume_signo` which is used to indicate
that this SIGSTOP is meant to be ignored, because it was from before
our most recent resume.

MachProcess::Interrupt holds the `m_exception_and_signal_mutex`
mutex already (after Jonas's commit last week), and all of
`MachProcess::ExceptionMessageBundleComplete` holds that same mutex,
so we know we can modify `m_auto_resume_signo` here and it will be
handled correctly when the outstanding mach exception is finally
processed.

rdar://145872120
@llvmbot
Copy link
Member

llvmbot commented Mar 20, 2025

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

This fixes an uncommon bug with debugserver controlling an inferior process that is hitting an internal breakpoint & continuing when multiple interrupts are sent by SB API to lldb. In the reproducing setup (which required a machine with unique timing characteristics), lldb is sent SBProcess::Stop and then shortly after, SBProcess::SendAsyncSignal. The driver process only sees that the inferior is publicly running at this point, even though it's hitting an internal breakpoint, disabling it, step instructioning, re-enabling the breakpoint, then continuing.

The packet sequence lldb sends to debugserver looks like

  1. vCont;s // instruction step
  2. ^c // async interrupt
  3. Z.... // re-enable breakpoint
  4. c // resume inferior execution
  5. ^c // async interrupt

When debugserver needs to interrupt a running process (MachProcess::Interrupt), the main thread in debugserver sends a SIGSTOP posix signal to the inferior process, and notes that it has sent this signal by setting m_sent_interrupt_signo.

When we send the first async interrupt while instruction stepping, the signal is sent (probably after the inferior has already stopped) but lldb can only receive the mach exception that includes the SIGSTOP when the process is running. So at the point of step (3), we have a SIGSTOP outstanding in the kernel, and
m_sent_interrupt_signo is set to SIGSTOP.

When we resume the inferior (c in step 4), debugserver sees that m_sent_interrupt_signo is still set for an outstanding SIGSTOP, but at this point we've already stopped so it's an unnecessary stop. It records that (1) we've got a SIGSTOP still coming that debugserver sent and (2) we should ignore it by also setting m_auto_resume_signo to the same signal value.

Once we've resumed the process, the mach exception thread (MachTask::ExceptionThread) receives the outstanding mach exception, adds it to a queue to be processed
(MachProcess::ExceptionMessageReceived) and when we've collected all outstanding mach exceptions, it calls
MachProcess::ExceptionMessageBundleComplete top evaluate them.

MachProcess::ExceptionMessageBundleComplete halts the process (without updating the MachProcess m_state) while evaluating them. It sees that this incoming SIGSTOP was meant to be ignored (m_auto_resume_signo is set), so it MachProcess::PrivateResume's the process again.

At the same time MachTask::ExceptionThread is receiving and processing the ME, MachProcess::Interrupt is called with another interrupt that debugserver received. This method checks that we're still eStateRunning (we are) but then sees that we have an outstanding SIGSTOP already (m_sent_interrupt_signo) and does nothing, assuming that we will stop shortly from that one. It then returns to call RNBRemote::HandlePacket_last_signal to print the status -- but because the process is still eStateRunning, this does nothing.

So the first ^c (resulting in a pending SIGSTOP) is received and we resume the process silently. And the second ^c is ignored because we've got one interrupt already being processed.

The fix was very simple. In MachProcess::Interrupt when we detect that we have a SIGSTOP out in the wild (m_sent_interrupt_signo), we need to clear m_auto_resume_signo which is used to indicate that this SIGSTOP is meant to be ignored, because it was from before our most recent resume.

MachProcess::Interrupt holds the m_exception_and_signal_mutex mutex already (after Jonas's commit last week), and all of MachProcess::ExceptionMessageBundleComplete holds that same mutex, so we know we can modify m_auto_resume_signo here and it will be handled correctly when the outstanding mach exception is finally processed.

rdar://145872120


Full diff: https://github.com/llvm/llvm-project/pull/132128.diff

1 Files Affected:

  • (modified) lldb/tools/debugserver/source/MacOSX/MachProcess.mm (+4)
diff --git a/lldb/tools/debugserver/source/MacOSX/MachProcess.mm b/lldb/tools/debugserver/source/MacOSX/MachProcess.mm
index 4742beeb6a4a3..447ebbe7fb9e5 100644
--- a/lldb/tools/debugserver/source/MacOSX/MachProcess.mm
+++ b/lldb/tools/debugserver/source/MacOSX/MachProcess.mm
@@ -1594,6 +1594,10 @@ static uint64_t bits(uint64_t value, uint32_t msbit, uint32_t lsbit) {
                          m_sent_interrupt_signo);
       }
     } else {
+      // We've requested that the process stop anew; if we had recorded this
+      // requested stop as being in place when we resumed (& therefore would
+      // throw it away), clear that.
+      m_auto_resume_signo = 0;
       DNBLogThreadedIf(LOG_PROCESS, "MachProcess::Interrupt() - previously "
                                     "sent an interrupt signal %i that hasn't "
                                     "been received yet, interrupt aborted",

// We've requested that the process stop anew; if we had recorded this
// requested stop as being in place when we resumed (& therefore would
// throw it away), clear that.
m_auto_resume_signo = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and this is protected by the m_exception_and_signal_mutex. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you added that lock when you were addressing TSAN issues, it's definitely required.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jasonmolenda jasonmolenda merged commit e60e064 into llvm:main Mar 20, 2025
12 checks passed
@jasonmolenda jasonmolenda deleted the handle-double-interrupt-of-private-stop-state-process-correctly2 branch March 20, 2025 20:31
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Mar 20, 2025
…132128)

This fixes an uncommon bug with debugserver controlling an inferior
process that is hitting an internal breakpoint & continuing when
multiple interrupts are sent by SB API to lldb -- with the result being
that lldb never stops the inferior process, ignoring the interrupt/stops
being sent by the driver layer (Xcode, in this case).

In the reproducing setup (which required a machine with unique timing
characteristics), lldb is sent SBProcess::Stop and then shortly after,
SBProcess::SendAsyncInterrupt. The driver process only sees that the
inferior is publicly running at this point, even though it's hitting an
internal breakpoint (new dylib being loaded), disabling the bp, step
instructioning, re-enabling the breakpoint, then continuing.

The packet sequence lldb sends to debugserver looks like

1. vCont;s   // instruction step
2. ^c        // async interrupt
3. Z....     // re-enable breakpoint
4. c         // resume inferior execution
5. ^c        // async interrupt

When debugserver needs to interrupt a running process
(`MachProcess::Interrupt`), the main thread in debugserver sends a
SIGSTOP posix signal to the inferior process, and notes that it has sent
this signal by setting `m_sent_interrupt_signo`.

When we send the first async interrupt while instruction stepping, the
signal is sent (probably after the inferior has already stopped) but
lldb can only *receive* the mach exception that includes the SIGSTOP
when the process is running. So at the point of step (3), we have a
SIGSTOP outstanding in the kernel, and
`m_sent_interrupt_signo` is set to SIGSTOP.

When we resume the inferior (`c` in step 4), debugserver sees that
`m_sent_interrupt_signo` is still set for an outstanding SIGSTOP, but at
this point we've already stopped so it's an unnecessary stop. It records
that (1) we've got a SIGSTOP still coming that debugserver sent and (2)
we should ignore it by also setting `m_auto_resume_signo` to the same
signal value.

Once we've resumed the process, the mach exception thread
(`MachTask::ExceptionThread`) receives the outstanding mach exception,
adds it to a queue to be processed
(`MachProcess::ExceptionMessageReceived`) and when we've collected all
outstanding mach exceptions, it calls
`MachProcess::ExceptionMessageBundleComplete` top evaluate them.

`MachProcess::ExceptionMessageBundleComplete` halts the process (without
updating the MachProcess `m_state`) while evaluating them. It sees that
this incoming SIGSTOP was meant to be ignored (`m_auto_resume_signo` is
set), so it `MachProcess::PrivateResume`'s the process again.

At the same time `MachTask::ExceptionThread` is receiving and processing
the ME, `MachProcess::Interrupt` is called with another interrupt that
debugserver received. This method checks that we're still eStateRunning
(we are) but then sees that we have an outstanding SIGSTOP already
(`m_sent_interrupt_signo`) and does nothing, assuming that we will stop
shortly from that one. It then returns to call
`RNBRemote::HandlePacket_last_signal` to print the status -- but because
the process is still `eStateRunning`, this does nothing.

So the first ^c (resulting in a pending SIGSTOP) is received and we
resume the process silently. And the second ^c is ignored because we've
got one interrupt already being processed.

The fix was very simple. In `MachProcess::Interrupt` when we detect that
we have a SIGSTOP out in the wild (`m_sent_interrupt_signo`), we need to
clear `m_auto_resume_signo` which is used to indicate that this SIGSTOP
is meant to be ignored, because it was from before our most recent
resume.

MachProcess::Interrupt holds the `m_exception_and_signal_mutex` mutex
already (after Jonas's commit last week), and all of
`MachProcess::ExceptionMessageBundleComplete` holds that same mutex, so
we know we can modify `m_auto_resume_signo` here and it will be handled
correctly when the outstanding mach exception is finally processed.

rdar://145872120
(cherry picked from commit e60e064)
jasonmolenda added a commit to swiftlang/llvm-project that referenced this pull request Mar 21, 2025
…132128) (#10309)

This fixes an uncommon bug with debugserver controlling an inferior
process that is hitting an internal breakpoint & continuing when
multiple interrupts are sent by SB API to lldb -- with the result being
that lldb never stops the inferior process, ignoring the interrupt/stops
being sent by the driver layer (Xcode, in this case).

In the reproducing setup (which required a machine with unique timing
characteristics), lldb is sent SBProcess::Stop and then shortly after,
SBProcess::SendAsyncInterrupt. The driver process only sees that the
inferior is publicly running at this point, even though it's hitting an
internal breakpoint (new dylib being loaded), disabling the bp, step
instructioning, re-enabling the breakpoint, then continuing.

The packet sequence lldb sends to debugserver looks like

1. vCont;s   // instruction step
2. ^c        // async interrupt
3. Z....     // re-enable breakpoint
4. c         // resume inferior execution
5. ^c        // async interrupt

When debugserver needs to interrupt a running process
(`MachProcess::Interrupt`), the main thread in debugserver sends a
SIGSTOP posix signal to the inferior process, and notes that it has sent
this signal by setting `m_sent_interrupt_signo`.

When we send the first async interrupt while instruction stepping, the
signal is sent (probably after the inferior has already stopped) but
lldb can only *receive* the mach exception that includes the SIGSTOP
when the process is running. So at the point of step (3), we have a
SIGSTOP outstanding in the kernel, and
`m_sent_interrupt_signo` is set to SIGSTOP.

When we resume the inferior (`c` in step 4), debugserver sees that
`m_sent_interrupt_signo` is still set for an outstanding SIGSTOP, but at
this point we've already stopped so it's an unnecessary stop. It records
that (1) we've got a SIGSTOP still coming that debugserver sent and (2)
we should ignore it by also setting `m_auto_resume_signo` to the same
signal value.

Once we've resumed the process, the mach exception thread
(`MachTask::ExceptionThread`) receives the outstanding mach exception,
adds it to a queue to be processed
(`MachProcess::ExceptionMessageReceived`) and when we've collected all
outstanding mach exceptions, it calls
`MachProcess::ExceptionMessageBundleComplete` top evaluate them.

`MachProcess::ExceptionMessageBundleComplete` halts the process (without
updating the MachProcess `m_state`) while evaluating them. It sees that
this incoming SIGSTOP was meant to be ignored (`m_auto_resume_signo` is
set), so it `MachProcess::PrivateResume`'s the process again.

At the same time `MachTask::ExceptionThread` is receiving and processing
the ME, `MachProcess::Interrupt` is called with another interrupt that
debugserver received. This method checks that we're still eStateRunning
(we are) but then sees that we have an outstanding SIGSTOP already
(`m_sent_interrupt_signo`) and does nothing, assuming that we will stop
shortly from that one. It then returns to call
`RNBRemote::HandlePacket_last_signal` to print the status -- but because
the process is still `eStateRunning`, this does nothing.

So the first ^c (resulting in a pending SIGSTOP) is received and we
resume the process silently. And the second ^c is ignored because we've
got one interrupt already being processed.

The fix was very simple. In `MachProcess::Interrupt` when we detect that
we have a SIGSTOP out in the wild (`m_sent_interrupt_signo`), we need to
clear `m_auto_resume_signo` which is used to indicate that this SIGSTOP
is meant to be ignored, because it was from before our most recent
resume.

MachProcess::Interrupt holds the `m_exception_and_signal_mutex` mutex
already (after Jonas's commit last week), and all of
`MachProcess::ExceptionMessageBundleComplete` holds that same mutex, so
we know we can modify `m_auto_resume_signo` here and it will be handled
correctly when the outstanding mach exception is finally processed.

rdar://145872120
(cherry picked from commit e60e064)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants