Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply "[lldb] Implement basic support for reverse-continue (#125242)" (again) #128156

Merged
merged 3 commits into from
Mar 17, 2025

Conversation

labath
Copy link
Collaborator

@labath labath commented Feb 21, 2025

This reverts commit 87b7f63, reapplying
7e66cf7 with a small (and probably temporary)
change to generate more debug info to help with diagnosing buildbot issues.

…5242)" (again)

This reverts commit 87b7f63, reapplying
7e66cf7 with a small (and probably temporary)
change to generate more debug info to help with diagnosing buildbot issues.
@labath labath requested a review from JDevlieghere as a code owner February 21, 2025 09:53
@labath labath requested review from adrian-prantl and jasonmolenda and removed request for JDevlieghere February 21, 2025 09:53
@llvmbot
Copy link
Member

llvmbot commented Feb 21, 2025

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

This reverts commit 87b7f63, reapplying
7e66cf7 with a small (and probably temporary)
change to generate more debug info to help with diagnosing buildbot issues.


Patch is 86.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128156.diff

40 Files Affected:

  • (modified) lldb/include/lldb/API/SBProcess.h (+1)
  • (modified) lldb/include/lldb/Target/Process.h (+26-2)
  • (modified) lldb/include/lldb/Target/StopInfo.h (+7)
  • (modified) lldb/include/lldb/Target/Thread.h (+4-5)
  • (modified) lldb/include/lldb/Target/ThreadList.h (+5-1)
  • (modified) lldb/include/lldb/Target/ThreadPlan.h (+13)
  • (modified) lldb/include/lldb/Target/ThreadPlanBase.h (+2)
  • (modified) lldb/include/lldb/lldb-enumerations.h (+6)
  • (modified) lldb/packages/Python/lldbsuite/test/gdbclientutils.py (+3-2)
  • (added) lldb/packages/Python/lldbsuite/test/lldbgdbproxy.py (+177)
  • (added) lldb/packages/Python/lldbsuite/test/lldbreverse.py (+541)
  • (modified) lldb/packages/Python/lldbsuite/test/lldbtest.py (+2)
  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py (+9-5)
  • (modified) lldb/source/API/SBProcess.cpp (+12)
  • (modified) lldb/source/API/SBThread.cpp (+2)
  • (modified) lldb/source/Interpreter/CommandInterpreter.cpp (+2-1)
  • (modified) lldb/source/Plugins/Process/Linux/NativeThreadLinux.cpp (+3)
  • (modified) lldb/source/Plugins/Process/MacOSX-Kernel/ProcessKDP.cpp (+7-1)
  • (modified) lldb/source/Plugins/Process/MacOSX-Kernel/ProcessKDP.h (+1-1)
  • (modified) lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp (+8-1)
  • (modified) lldb/source/Plugins/Process/Windows/Common/ProcessWindows.h (+1-1)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp (+20)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h (+6)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp (+1)
  • (modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+85-13)
  • (modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h (+3-1)
  • (modified) lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp (+7-2)
  • (modified) lldb/source/Plugins/Process/scripted/ScriptedProcess.h (+1-1)
  • (modified) lldb/source/Target/Process.cpp (+20-4)
  • (modified) lldb/source/Target/StopInfo.cpp (+28)
  • (modified) lldb/source/Target/Thread.cpp (+6-3)
  • (modified) lldb/source/Target/ThreadList.cpp (+29-3)
  • (modified) lldb/source/Target/ThreadPlanBase.cpp (+4)
  • (added) lldb/test/API/functionalities/reverse-execution/Makefile (+3)
  • (added) lldb/test/API/functionalities/reverse-execution/TestReverseContinueBreakpoints.py (+159)
  • (added) lldb/test/API/functionalities/reverse-execution/TestReverseContinueNotSupported.py (+35)
  • (added) lldb/test/API/functionalities/reverse-execution/TestReverseContinueWatchpoints.py (+136)
  • (added) lldb/test/API/functionalities/reverse-execution/main.c (+25)
  • (modified) lldb/tools/lldb-dap/JSONUtils.cpp (+3)
  • (modified) lldb/tools/lldb-dap/LLDBUtils.cpp (+1)
diff --git a/lldb/include/lldb/API/SBProcess.h b/lldb/include/lldb/API/SBProcess.h
index 1624e02070b1b..882b8bd837131 100644
--- a/lldb/include/lldb/API/SBProcess.h
+++ b/lldb/include/lldb/API/SBProcess.h
@@ -159,6 +159,7 @@ class LLDB_API SBProcess {
   lldb::SBError Destroy();
 
   lldb::SBError Continue();
+  lldb::SBError ContinueInDirection(lldb::RunDirection direction);
 
   lldb::SBError Stop();
 
diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index c3622a29bc772..2e827d4c5cb74 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -1089,6 +1089,13 @@ class Process : public std::enable_shared_from_this<Process>,
   ///     Returns an error object.
   virtual Status WillResume() { return Status(); }
 
+  /// Reports whether this process supports reverse execution.
+  ///
+  /// \return
+  ///     Returns true if the process supports reverse execution (at least
+  /// under some circumstances).
+  virtual bool SupportsReverseDirection() { return false; }
+
   /// Resumes all of a process's threads as configured using the Thread run
   /// control functions.
   ///
@@ -1104,9 +1111,13 @@ class Process : public std::enable_shared_from_this<Process>,
   /// \see Thread:Resume()
   /// \see Thread:Step()
   /// \see Thread:Suspend()
-  virtual Status DoResume() {
+  virtual Status DoResume(lldb::RunDirection direction) {
+    if (direction == lldb::RunDirection::eRunForward)
+      return Status::FromErrorStringWithFormatv(
+          "error: {0} does not support resuming processes", GetPluginName());
     return Status::FromErrorStringWithFormatv(
-        "error: {0} does not support resuming processes", GetPluginName());
+        "error: {0} does not support reverse execution of processes",
+        GetPluginName());
   }
 
   /// Called after resuming a process.
@@ -2677,6 +2688,18 @@ void PruneThreadPlans();
                             const AddressRange &range, size_t alignment,
                             Status &error);
 
+  /// Get the base run direction for the process.
+  /// The base direction is the direction the process will execute in
+  /// (forward or backward) if no thread plan overrides the direction.
+  lldb::RunDirection GetBaseDirection() const { return m_base_direction; }
+  /// Set the base run direction for the process.
+  /// As a side-effect, if this changes the base direction, then we
+  /// discard all non-base thread plans to ensure that when execution resumes
+  /// we definitely execute in the requested direction.
+  /// FIXME: this is overkill. In some situations ensuring the latter
+  /// would not require discarding all non-base thread plans.
+  void SetBaseDirection(lldb::RunDirection direction);
+
 protected:
   friend class Trace;
 
@@ -3076,6 +3099,7 @@ void PruneThreadPlans();
   ThreadList
       m_extended_thread_list; ///< Constituent for extended threads that may be
                               /// generated, cleared on natural stops
+  lldb::RunDirection m_base_direction; ///< ThreadPlanBase run direction
   uint32_t m_extended_thread_stop_id; ///< The natural stop id when
                                       ///extended_thread_list was last updated
   QueueList
diff --git a/lldb/include/lldb/Target/StopInfo.h b/lldb/include/lldb/Target/StopInfo.h
index 45beac129e86f..9a13371708be5 100644
--- a/lldb/include/lldb/Target/StopInfo.h
+++ b/lldb/include/lldb/Target/StopInfo.h
@@ -20,6 +20,7 @@ namespace lldb_private {
 class StopInfo : public std::enable_shared_from_this<StopInfo> {
   friend class Process::ProcessEventData;
   friend class ThreadPlanBase;
+  friend class ThreadPlanReverseContinue;
 
 public:
   // Constructors and Destructors
@@ -154,6 +155,12 @@ class StopInfo : public std::enable_shared_from_this<StopInfo> {
   static lldb::StopInfoSP
   CreateStopReasonProcessorTrace(Thread &thread, const char *description);
 
+  // This creates a StopInfo indicating that execution stopped because
+  // it was replaying some recorded execution history, and execution reached
+  // the end of that recorded history.
+  static lldb::StopInfoSP
+  CreateStopReasonHistoryBoundary(Thread &thread, const char *description);
+
   static lldb::StopInfoSP CreateStopReasonFork(Thread &thread,
                                                lldb::pid_t child_pid,
                                                lldb::tid_t child_tid);
diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index 1d1e3dcfc1dc6..6ede7fa301a82 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -201,14 +201,13 @@ class Thread : public std::enable_shared_from_this<Thread>,
   ///    The User resume state for this thread.
   lldb::StateType GetResumeState() const { return m_resume_state; }
 
-  /// This function is called on all the threads before "ShouldResume" and
-  /// "WillResume" in case a thread needs to change its state before the
-  /// ThreadList polls all the threads to figure out which ones actually will
-  /// get to run and how.
+  // This function is called to determine whether the thread needs to
+  // step over a breakpoint and if so, push a step-over-breakpoint thread
+  // plan.
   ///
   /// \return
   ///    True if we pushed a ThreadPlanStepOverBreakpoint
-  bool SetupForResume();
+  bool SetupToStepOverBreakpointIfNeeded(lldb::RunDirection direction);
 
   // Do not override this function, it is for thread plan logic only
   bool ShouldResume(lldb::StateType resume_state);
diff --git a/lldb/include/lldb/Target/ThreadList.h b/lldb/include/lldb/Target/ThreadList.h
index bca01f5fe2083..c108962003598 100644
--- a/lldb/include/lldb/Target/ThreadList.h
+++ b/lldb/include/lldb/Target/ThreadList.h
@@ -113,6 +113,10 @@ class ThreadList : public ThreadCollection {
   /// If a thread can "resume" without having to resume the target, it
   /// will return false for WillResume, and then the process will not be
   /// restarted.
+  /// Sets *direction to the run direction of the thread(s) that will
+  /// be resumed. If threads that we want to run disagree about the
+  /// direction, we execute forwards and pop any of the thread plans
+  /// that requested reverse execution.
   ///
   /// \return
   ///    \b true instructs the process to resume normally,
@@ -120,7 +124,7 @@ class ThreadList : public ThreadCollection {
   ///    the process will not actually run.  The thread must then return
   ///    the correct StopInfo when asked.
   ///
-  bool WillResume();
+  bool WillResume(lldb::RunDirection &direction);
 
   void DidResume();
 
diff --git a/lldb/include/lldb/Target/ThreadPlan.h b/lldb/include/lldb/Target/ThreadPlan.h
index d6da484f4fc13..a7bac8cc5ecf6 100644
--- a/lldb/include/lldb/Target/ThreadPlan.h
+++ b/lldb/include/lldb/Target/ThreadPlan.h
@@ -283,6 +283,15 @@ namespace lldb_private {
 //  report_run_vote argument to the constructor works like report_stop_vote, and
 //  is a way for a plan to instruct a sub-plan on how to respond to
 //  ShouldReportStop.
+//
+//  Reverse execution:
+//
+//  Every thread plan has an associated RunDirection (forward or backward).
+//  For ThreadPlanBase, this direction is the Process's base direction.
+//  Whenever we resume the target, we need to ensure that the topmost thread
+//  plans for each runnable thread all agree on their direction. This is
+//  ensured in ThreadList::WillResume(), which chooses a direction and then
+//  discards thread plans incompatible with that direction.
 
 class ThreadPlan : public std::enable_shared_from_this<ThreadPlan>,
                    public UserID {
@@ -497,6 +506,10 @@ class ThreadPlan : public std::enable_shared_from_this<ThreadPlan>,
 
   virtual lldb::StateType GetPlanRunState() = 0;
 
+  virtual lldb::RunDirection GetDirection() const {
+    return lldb::RunDirection::eRunForward;
+  }
+
 protected:
   // Constructors and Destructors
   ThreadPlan(ThreadPlanKind kind, const char *name, Thread &thread,
diff --git a/lldb/include/lldb/Target/ThreadPlanBase.h b/lldb/include/lldb/Target/ThreadPlanBase.h
index 5c44b9fb17b27..f4418d779a4da 100644
--- a/lldb/include/lldb/Target/ThreadPlanBase.h
+++ b/lldb/include/lldb/Target/ThreadPlanBase.h
@@ -38,6 +38,8 @@ class ThreadPlanBase : public ThreadPlan {
 
   bool IsBasePlan() override { return true; }
 
+  lldb::RunDirection GetDirection() const override;
+
 protected:
   bool DoWillResume(lldb::StateType resume_state, bool current_plan) override;
   bool DoPlanExplainsStop(Event *event_ptr) override;
diff --git a/lldb/include/lldb/lldb-enumerations.h b/lldb/include/lldb/lldb-enumerations.h
index fecf9cbb765f7..551f28be9585c 100644
--- a/lldb/include/lldb/lldb-enumerations.h
+++ b/lldb/include/lldb/lldb-enumerations.h
@@ -135,6 +135,9 @@ FLAGS_ENUM(LaunchFlags){
 /// Thread Run Modes.
 enum RunMode { eOnlyThisThread, eAllThreads, eOnlyDuringStepping };
 
+/// Execution directions
+enum RunDirection { eRunForward, eRunReverse };
+
 /// Byte ordering definitions.
 enum ByteOrder {
   eByteOrderInvalid = 0,
@@ -254,6 +257,9 @@ enum StopReason {
   eStopReasonVFork,
   eStopReasonVForkDone,
   eStopReasonInterrupt, ///< Thread requested interrupt
+  // Indicates that execution stopped because the debugger backend relies
+  // on recorded data and we reached the end of that data.
+  eStopReasonHistoryBoundary,
 };
 
 /// Command Return Status Types.
diff --git a/lldb/packages/Python/lldbsuite/test/gdbclientutils.py b/lldb/packages/Python/lldbsuite/test/gdbclientutils.py
index 4b782b3b470fe..753de22b9cfee 100644
--- a/lldb/packages/Python/lldbsuite/test/gdbclientutils.py
+++ b/lldb/packages/Python/lldbsuite/test/gdbclientutils.py
@@ -516,8 +516,9 @@ def start(self):
         self._thread.start()
 
     def stop(self):
-        self._thread.join()
-        self._thread = None
+        if self._thread is not None:
+            self._thread.join()
+            self._thread = None
 
     def get_connect_address(self):
         return self._socket.get_connect_address()
diff --git a/lldb/packages/Python/lldbsuite/test/lldbgdbproxy.py b/lldb/packages/Python/lldbsuite/test/lldbgdbproxy.py
new file mode 100644
index 0000000000000..e41f89e2443a4
--- /dev/null
+++ b/lldb/packages/Python/lldbsuite/test/lldbgdbproxy.py
@@ -0,0 +1,177 @@
+import logging
+import os
+import os.path
+import random
+
+import lldb
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test.gdbclientutils import *
+import lldbgdbserverutils
+from lldbsuite.support import seven
+
+
+class GDBProxyTestBase(TestBase):
+    """
+    Base class for gdbserver proxy tests.
+
+    This class will setup and start a mock GDB server for the test to use.
+    It pases through requests to a regular lldb-server/debugserver and
+    forwards replies back to the LLDB under test.
+    """
+
+    """The gdbserver that we implement."""
+    server = None
+    """The inner lldb-server/debugserver process that we proxy requests into."""
+    monitor_server = None
+    monitor_sock = None
+
+    server_socket_class = TCPServerSocket
+
+    DEFAULT_TIMEOUT = 20 * (10 if ("ASAN_OPTIONS" in os.environ) else 1)
+
+    _verbose_log_handler = None
+    _log_formatter = logging.Formatter(fmt="%(asctime)-15s %(levelname)-8s %(message)s")
+
+    def setUpBaseLogging(self):
+        self.logger = logging.getLogger(__name__)
+
+        self.logger.propagate = False
+        self.logger.setLevel(logging.DEBUG)
+
+        # log all warnings to stderr
+        self._stderr_log_handler = logging.StreamHandler()
+        self._stderr_log_handler.setLevel(logging.DEBUG if self.TraceOn() else logging.WARNING)
+        self._stderr_log_handler.setFormatter(self._log_formatter)
+        self.logger.addHandler(self._stderr_log_handler)
+
+    def setUp(self):
+        TestBase.setUp(self)
+
+        self.setUpBaseLogging()
+
+        if self.isVerboseLoggingRequested():
+            # If requested, full logs go to a log file
+            log_file_name = self.getLogBasenameForCurrentTest() + "-proxy.log"
+            self._verbose_log_handler = logging.FileHandler(log_file_name)
+            self._verbose_log_handler.setFormatter(self._log_formatter)
+            self._verbose_log_handler.setLevel(logging.DEBUG)
+            self.logger.addHandler(self._verbose_log_handler)
+
+        if lldbplatformutil.getPlatform() == "macosx":
+            self.debug_monitor_exe = lldbgdbserverutils.get_debugserver_exe()
+            self.debug_monitor_extra_args = []
+        else:
+            self.debug_monitor_exe = lldbgdbserverutils.get_lldb_server_exe()
+            self.debug_monitor_extra_args = ["gdbserver"]
+        self.assertIsNotNone(self.debug_monitor_exe)
+
+        self.server = MockGDBServer(self.server_socket_class())
+        self.server.responder = self
+
+    def tearDown(self):
+        # TestBase.tearDown will kill the process, but we need to kill it early
+        # so its client connection closes and we can stop the server before
+        # finally calling the base tearDown.
+        if self.process() is not None:
+            self.process().Kill()
+        self.server.stop()
+
+        self.logger.removeHandler(self._verbose_log_handler)
+        self._verbose_log_handler = None
+        self.logger.removeHandler(self._stderr_log_handler)
+        self._stderr_log_handler = None
+
+        TestBase.tearDown(self)
+
+    def isVerboseLoggingRequested(self):
+        # We will report our detailed logs if the user requested that the "gdb-remote" channel is
+        # logged.
+        return any(("gdb-remote" in channel) for channel in lldbtest_config.channels)
+
+    def connect(self, target):
+        """
+        Create a process by connecting to the mock GDB server.
+        """
+        self.prep_debug_monitor_and_inferior()
+        self.server.start()
+
+        listener = self.dbg.GetListener()
+        error = lldb.SBError()
+        process = target.ConnectRemote(
+            listener, self.server.get_connect_url(), "gdb-remote", error
+        )
+        self.assertTrue(error.Success(), error.description)
+        self.assertTrue(process, PROCESS_IS_VALID)
+        return process
+
+    def prep_debug_monitor_and_inferior(self):
+        inferior_exe_path = self.getBuildArtifact("a.out")
+        self.connect_to_debug_monitor([inferior_exe_path])
+        self.assertIsNotNone(self.monitor_server)
+        self.initial_handshake()
+
+    def initial_handshake(self):
+        self.monitor_server.send_packet(seven.bitcast_to_bytes("+"))
+        reply = seven.bitcast_to_string(self.monitor_server.get_normal_packet())
+        self.assertEqual(reply, "+")
+        self.monitor_server.send_packet(seven.bitcast_to_bytes("QStartNoAckMode"))
+        reply = seven.bitcast_to_string(self.monitor_server.get_normal_packet())
+        self.assertEqual(reply, "+")
+        reply = seven.bitcast_to_string(self.monitor_server.get_normal_packet())
+        self.assertEqual(reply, "OK")
+        self.monitor_server.set_validate_checksums(False)
+        self.monitor_server.send_packet(seven.bitcast_to_bytes("+"))
+        reply = seven.bitcast_to_string(self.monitor_server.get_normal_packet())
+        self.assertEqual(reply, "+")
+
+    def get_debug_monitor_command_line_args(self, connect_address, launch_args):
+        return (
+            self.debug_monitor_extra_args
+            + ["--reverse-connect", connect_address]
+            + launch_args
+        )
+
+    def launch_debug_monitor(self, launch_args):
+        family, type, proto, _, addr = socket.getaddrinfo(
+            "localhost", 0, proto=socket.IPPROTO_TCP
+        )[0]
+        sock = socket.socket(family, type, proto)
+        sock.settimeout(self.DEFAULT_TIMEOUT)
+        sock.bind(addr)
+        sock.listen(1)
+        addr = sock.getsockname()
+        connect_address = "[{}]:{}".format(*addr)
+
+        commandline_args = self.get_debug_monitor_command_line_args(
+            connect_address, launch_args
+        )
+
+        # Start the server.
+        self.logger.info(f"Spawning monitor {commandline_args}")
+        monitor_process = self.spawnSubprocess(
+            self.debug_monitor_exe, commandline_args, install_remote=False
+        )
+        self.assertIsNotNone(monitor_process)
+
+        self.monitor_sock = sock.accept()[0]
+        self.monitor_sock.settimeout(self.DEFAULT_TIMEOUT)
+        return monitor_process
+
+    def connect_to_debug_monitor(self, launch_args):
+        monitor_process = self.launch_debug_monitor(launch_args)
+        # Turn off checksum validation because debugserver does not produce
+        # correct checksums.
+        self.monitor_server = lldbgdbserverutils.Server(
+            self.monitor_sock, monitor_process
+        )
+
+    def respond(self, packet):
+        """Subclasses can override this to change how packets are handled."""
+        return self.pass_through(packet)
+
+    def pass_through(self, packet):
+        self.logger.info(f"Sending packet {packet}")
+        self.monitor_server.send_packet(seven.bitcast_to_bytes(packet))
+        reply = seven.bitcast_to_string(self.monitor_server.get_normal_packet())
+        self.logger.info(f"Received reply {reply}")
+        return reply
diff --git a/lldb/packages/Python/lldbsuite/test/lldbreverse.py b/lldb/packages/Python/lldbsuite/test/lldbreverse.py
new file mode 100644
index 0000000000000..a42cc7cac15d3
--- /dev/null
+++ b/lldb/packages/Python/lldbsuite/test/lldbreverse.py
@@ -0,0 +1,541 @@
+import os
+import os.path
+import lldb
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test.gdbclientutils import *
+from lldbsuite.test.lldbgdbproxy import *
+import lldbgdbserverutils
+import re
+
+
+class ThreadSnapshot:
+    def __init__(self, thread_id, registers):
+        self.thread_id = thread_id
+        self.registers = registers
+
+
+class MemoryBlockSnapshot:
+    def __init__(self, address, data):
+        self.address = address
+        self.data = data
+
+
+class StateSnapshot:
+    def __init__(self, thread_snapshots, memory):
+        self.thread_snapshots = thread_snapshots
+        self.memory = memory
+        self.thread_id = None
+
+
+class RegisterInfo:
+    def __init__(self, lldb_index, name, bitsize, little_endian):
+        self.lldb_index = lldb_index
+        self.name = name
+        self.bitsize = bitsize
+        self.little_endian = little_endian
+
+
+BELOW_STACK_POINTER = 16384
+ABOVE_STACK_POINTER = 4096
+
+BLOCK_SIZE = 1024
+
+SOFTWARE_BREAKPOINTS = 0
+HARDWARE_BREAKPOINTS = 1
+WRITE_WATCHPOINTS = 2
+
+
+class ReverseTestBase(GDBProxyTestBase):
+    """
+    Base class for tests that need reverse execution.
+
+    This class uses a gdbserver proxy to add very limited reverse-
+    execution capability to lldb-server/debugserver for testing
+    purposes only.
+
+    To use this class, run the inferior forward until some stopping point.
+    Then call `start_recording()` and execute forward again until reaching
+    a software breakpoint; this class records the state before each execution executes.
+    At that point, the server will accept "bc" and "bs" packets to step
+    backwards through the state.
+    When executing during recording, we only allow single-step and continue without
+    delivering a signal, and only software breakpoint stops are allowed.
+
+    We assume that while recording is enabled, the only effects of instructions
+    are on general-purpose registers (read/written by the 'g' and 'G' packets)
+    and on memory bytes between [SP - BELOW_STACK_POINTER, SP + ABOVE_STACK_POINTER).
+    """
+
+    NO_DEBUG_INFO_TESTCASE = True
+
+    """
+    A list of StateSnapshots in time order.
+
+    There is one snapshot per single-stepped instruction,
+    representing the state before that instruction was
+    executed. The last snapshot in the list is the
+    snapshot before the last instruction was executed.
+    This is an undo log; we snapshot a superset of the state that may have
+    been changed by the instruction's execution.
+    """
+    snapshots = None
+    recording_enabled = False
+
+    breakpoints = None
+
+    pc_register_info = None
+    sp_register_info = None
+    general_purpose_register_info = None
+
+    def __init__(self, *args, **kwargs):
+        GDBProxyTestBase.__init__(self, *args, **kwargs)
+        self.breakpoints = [set(), set(), set(), set(), set()]
+
+    def respond(self, packet):
+        if not packet:
+          ...
[truncated]

Copy link

github-actions bot commented Feb 21, 2025

✅ With the latest revision this PR passed the Python code formatter.


# log all warnings to stderr
self._stderr_log_handler = logging.StreamHandler()
self._stderr_log_handler.setLevel(logging.DEBUG if self.TraceOn() else logging.WARNING)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply this if you want to dump traces unconditionally.

Suggested change
self._stderr_log_handler.setLevel(logging.DEBUG if self.TraceOn() else logging.WARNING)
self._stderr_log_handler.setLevel(logging.DEBUG)

@@ -635,10 +635,11 @@ bool Thread::SetupForResume() {
// what the current plan is.

lldb::RegisterContextSP reg_ctx_sp(GetRegisterContext());
if (reg_ctx_sp) {
ProcessSP process_sp(GetProcess());
if (reg_ctx_sp && process_sp && direction == eRunForward) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge conflict around here

@labath
Copy link
Collaborator Author

labath commented Mar 11, 2025

Ping. Jason, Adrian, I was waiting for you to submit this when you're ready to deal with the fallout, but maybe I should just submit that now?

@labath
Copy link
Collaborator Author

labath commented Mar 17, 2025

Submitting per #123945 (comment)

@labath labath merged commit 1b23719 into llvm:main Mar 17, 2025
10 checks passed
@labath labath deleted the rr branch March 17, 2025 15:06
@Michael137
Copy link
Member

Can confirm that the arm64 bots are failing again: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/as-lldb-cmake/22265

TestReverseContinueWatchpoints.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.99git (https://github.com/llvm/llvm-project.git revision 52e7ca9279b4cbe30cacca67548347ef5f96b120)
  clang revision 52e7ca9279b4cbe30cacca67548347ef5f96b120
  llvm revision 52e7ca9279b4cbe30cacca67548347ef5f96b120

Watchpoint 1 hit:
old value: 2
new value: 1

Watchpoint 1 hit:
old value: 2
new value: 1

Attached the log file: test-failure.txt

Let me know what other info might be useful

@rocallahan
Copy link
Contributor

The logs show the problem here:

AssertionError: Ran command:
"thread list"

Got output:
Process 79916 stopped
* thread #1: tid = 0x4870aa, 0x0000000100e0ff94 a.out`trigger_watchpoint at main.c:7:60, stop reason = instruction step into

Expecting sub string: "stopped" (was found)
Expecting sub string: "trigger_watchpoint" (was found)
Expecting sub string: "stop reason = watchpoint 1" (was not found)

Basically we reverse-continue, hit a watchpoint, and that is correctly reported to the user as "stop reason = watchpoint 1".
Then we singlestep forward, which on x86(-64) triggers the watchpoint, but on Arm here LLDB is reporting it to the user as "instruction step into", i.e. not reporting the watchpoint being hit.

Maybe this is an x86 vs ARM issue and previous testing on ARM did not show it because we didn't run the watchpoint tests on ARM or for some other reason?

@rocallahan
Copy link
Contributor

My reverse-execution proxy code implements watchpoints during reverse execution more or less by reversing the x86 behavior: if we execute a sequence of instructions and instruction #N in the sequence modifies a watched memory location, then when reverse-executing, we stop when the PC is at the address of instruction N, i.e. with instruction N being the next instruction to be executed, and report that that watchpoint fired. (On x86 executing in the forward direction, normal debugger behavior is that execution stops with the PC set to the address of instruction N+1, i.e. after instruction N has been executed, because this is what the hardware does). This all works the same whether you're using continue or a sequence of single-steps (in either direction).

As I understand it, when ARM hardware hits a watchpoint, it stops with the PC at the address of instruction #N that did the memory access, i.e. the next instruction to be executed is #N again. So that's different from x86 and maybe that's the root of our problems here. I will start an ARM64 VM and do some experiments ... maybe tomorrow.

@rocallahan
Copy link
Contributor

On Linux ARM64 both gdb and lldb seem to adjust the hardware behavior so that single-stepping an instruction that fires a watchpoint leaves the PC pointing to the next instruction.

Maybe this is not true on Mac? I can't test that myself.

@rocallahan
Copy link
Contributor

These are the test log packets between the reverse-exec proxy to the debugserver for the forward singlestep operation:

2025-03-17 15:49:10,842 INFO     Sending packet vCont;s:4870aa
2025-03-17 15:49:10,843 INFO     Received reply T05thread:4870aa;threads:4870aa;thread-pcs:100e0ff94;00:0100000000000000;01:1037ff6e01000000;02:2037ff6e01000000;03:b037ff6e01000000;04:74444e8101000000;05:d87b80db01000000;06:636570745f746162;07:a006000000000000;08:0200000000000000;09:b834ff6e01000000;0a:000000003e196aff;0b:c002000000000000;0c:0080000000000000;0d:0000000000000010;0e:0400000000000000;0f:0080000000000000;10:e034ff6e01000000;11:e034ff6e01000000;12:0000000000000000;13:401c220101000000;14:34ffe00001000000;15:e034ff6e01000000;16:1019220101000000;17:6035ff6e01000000;18:a035ff6e01000000;19:bb85538101000000;1a:0000000000000000;1b:0000000000000000;1c:0000000000000000;1d:c034ff6e01000000;1e:68ffe00001000000;1f:b034ff6e01000000;20:94ffe00001000000;21:00100080;a1:0000000000000000;a2:220000cb;a3:00000000;metype:6;mecount:2;medata:1;medata:0;memory:0x16eff34c0=f036ff6e01000000e0904b8101000000;memory:0x16eff36f0=000000000000000000000000008045ff;

That reports metype:6, i.e. EXC_BREAKPOINT. I'm not sure if that is the right code for a watchpoint firing on this system. Whether it is or not, LLDB is definitely interpreting this as "instruction step into" and is not reporting to the user that a watchpoint fired. The proxy code should not be modifying this T packet so it would be surprising if the reverse-exec test code is causing this problem.
So I wonder whether, in normal operation on the test system, singlestepping an instruction which triggers a watchpoint actually reports the watchpoint being hit.

@rocallahan
Copy link
Contributor

@jasonmolenda Pavel Labath thought you might have some insight here. Thanks.

@Michael137
Copy link
Member

FYI, these are also failing on x86_64 on macOS: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake/10678/

Failed Tests (3):
  lldb-api :: functionalities/reverse-execution/TestReverseContinueBreakpoints.py
  lldb-api :: functionalities/reverse-execution/TestReverseContinueWatchpoints.py

Log file: x86-failures.txt

@Michael137
Copy link
Member

Seems to be something else though?

An exception happened when receiving the response from the gdb server. Closing the client...
An exception happened when receiving the response from the gdb server. Closing the client...
An exception happened when receiving the response from the gdb server. Closing the client...
An exception happened when receiving the response from the gdb server. Closing the client...
An exception happened when receiving the response from the gdb server. Closing the client...
An exception happened when receiving the response from the gdb server. Closing the client...
An exception happened when receiving the response from the gdb server. Closing the client...
An exception happened when receiving the response from the gdb server. Closing the client...

@labath
Copy link
Collaborator Author

labath commented Mar 18, 2025

Let me know what other info might be useful

It would be useful to know if you can reproduce this problem locally (because I can't -- the test passes on my mac). If you can't then it would be useful to get as much information about this buildbot as possible (what kind of debugserver it uses, OS versions and such), as it means there's something different about it.

On Linux ARM64 both gdb and lldb seem to adjust the hardware behavior so that single-stepping an instruction that fires a watchpoint leaves the PC pointing to the next instruction.

Maybe this is not true on Mac? I can't test that myself.

It works the same way on a mac. And the behavior is implemented on the client (it disables the watchpoint, single-steps, and then re-enables the watchpoint). I'm not entirely sure what happens (or what should happen) when running backwards to the watchpoint.

One more (possibly irrelevant) note: On lldb, this behavior is actually controlled by the watchpoint_exceptions_received:before/after response to the qHostInfo packet, which allows to to work with an "x86" machine which has the "arm" exception behavior (and vice versa). While somewhat weird, this is very handy when working with emulators, as it avoids the need to simulate the arm exception behavior on x86 (which is quite hard).

So I wonder whether, in normal operation on the test system, singlestepping an instruction which triggers a watchpoint actually reports the watchpoint being hit.

This definitely works (on my machine). Here's the relevant part of the log:

<lldb.process.gdb-remote.async>  <  18> send packet: $vCont;s:2d3138#57
<lldb.process.gdb-remote.async>  <1093> read packet: $T00thread:2d3138;threads:2d3138;thread-pcs:100003f88;00:0100000000000000;01:20f8df6f01000000;02:30f8df6f01000000;03:18f9df6f01000000;04:0100000000000000;05:0000000000000000;06:0000000000000000;07:200d000000000000;08:0600000000000000;09:50c149f701000000;0a:10f5de6f01000000;0b:60f5df6f01000000;0c:b011000000000000;0d:0040000001000000;0e:0040000000000000;0f:55af9a9b01000000;10:a4bfdf8d01000000;11:e0afd7ff01000000;12:0000000000000000;13:50c049f701000000;14:a0c049f701000000;15:50c049f701000000;16:b8f6df6f01000000;17:b8f6df6f01000000;18:0020a48d01000000;19:0000000000000000;1a:0000000000000000;1b:0000000000000000;1c:0000000000000000;1d:00f8df6f01000000;1e:7482a48d01000000;1f:c0f5df6f01000000;20:883f000001000000;21:00102060;a1:c8f5df6f01000000;a2:620000d2;a3:00000000;reason:watchpoint;description:3631373139313537323020302036313731393135373230;watch_addr:16fdff5c8;me_watch_addr:16fdff5c8;wp_hw_idx:0;wp_esr_iss:62;wp_esr_wpt:0;wp_esr_wptv:0;wp_esr_wpf:0;wp_esr_fnp:0;wp_esr_vncr:0;wp_esr_fnv:0;wp_esr_cm:1;wp_esr_wnr:1;wp_esr_dfsc:22;memory:0x16fdff800=00000000000000000000000000000000;#00
<lldb.process.internal-state(pid=24579)>         <  18> send packet: $z2,16fdff5c8,4#05
<lldb.process.internal-state(pid=24579)>         <   6> read packet: $OK#00
<lldb.process.gdb-remote.async>  <  18> send packet: $vCont;s:2d3138#57
<lldb.process.gdb-remote.async>  < 862> read packet: $T05thread:2d3138;threads:2d3138;thread-pcs:100003f8c;00:0100000000000000;01:20f8df6f01000000;02:30f8df6f01000000;03:18f9df6f01000000;04:0100000000000000;05:0000000000000000;06:0000000000000000;07:200d000000000000;08:0600000000000000;09:50c149f701000000;0a:10f5de6f01000000;0b:60f5df6f01000000;0c:b011000000000000;0d:0040000001000000;0e:0040000000000000;0f:55af9a9b01000000;10:a4bfdf8d01000000;11:e0afd7ff01000000;12:0000000000000000;13:50c049f701000000;14:a0c049f701000000;15:50c049f701000000;16:b8f6df6f01000000;17:b8f6df6f01000000;18:0020a48d01000000;19:0000000000000000;1a:0000000000000000;1b:0000000000000000;1c:0000000000000000;1d:00f8df6f01000000;1e:7482a48d01000000;1f:c0f5df6f01000000;20:8c3f000001000000;21:00100060;a1:0000000000000000;a2:220000cb;a3:00000000;metype:6;mecount:2;medata:1;medata:0;memory:0x16fdff800=00000000000000000000000000000000;#00
<lldb.process.internal-state(pid=24579)>         <  18> send packet: $Z2,16fdff5c8,4#e5
<lldb.process.internal-state(pid=24579)>         <   6> read packet: $OK#00
lldb.debugger.event-handler      <  16> send packet: $jThreadsInfo#c1
lldb.debugger.event-handler      <1111> read packet: $[{"tid":2961720,"metype":6,"medata":[1,0],"reason":"exception","qaddr":8446511552,"associated_with_dispatch_queue":true,"dispatch_queue_t":8446526400,"qname":"com.apple.main-thread","qkind":"serial","qserialnum":1,"registers":{"0":"0100000000000000","1":"20f8df6f01000000","2":"30f8df6f01000000","3":"18f9df6f01000000","4":"0100000000000000","5":"0000000000000000","6":"0000000000000000","7":"200d000000000000","8":"0600000000000000","9":"50c149f701000000","10":"10f5de6f01000000","11":"60f5df6f01000000","12":"b011000000000000","13":"0040000001000000","14":"0040000000000000","15":"55af9a9b01000000","16":"a4bfdf8d01000000","17":"e0afd7ff01000000","18":"0000000000000000","19":"50c049f701000000","20":"a0c049f701000000","21":"50c049f701000000","22":"b8f6df6f01000000","23":"b8f6df6f01000000","24":"0020a48d01000000","25":"0000000000000000","26":"0000000000000000","27":"0000000000000000","28":"0000000000000000","29":"00f8df6f01000000","30":"7482a48d01000000","31":"c0f5df6f01000000","32":"8c3f000001000000","33":"00100060"}],"memory":[{"address":6171916288,"bytes":"00000000000000000000000000000000"}]]}]]#00
lldb.debugger.event-handler      <  18> send packet: $z2,16fdff5c8,4#05
lldb.debugger.event-handler      <   6> read packet: $OK#00
lldb.debugger.event-handler      <  18> send packet: $x16fdff400,200#c7
lldb.debugger.event-handler      < 516> read packet: $000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000c0970d00000000000000000000000000c0978d01000000010000000000000050c049f7010000000100000000000000c0f5df6f010000006000a78d0100000000000000000000000000000000000000120000011a000000c41fe5000000000080c349f7010000000000000001000000000000000000000000000000000000000020a48d01000000b8f6df6f01000000c0f5df6f010000003cfca68d01000000a0c049f70100000050c049f70100000000f8df6f010000000600000000000000000000000000000000000000000000000000000000000000a0c049f70100000028f6df6f0100000030f6df6f01000000#00

Watchpoint 1 hit:
old value: 5
new value: 6
lldb.debugger.event-handler      <  18> send packet: $Z2,16fdff5c8,4#e5
lldb.debugger.event-handler      <   6> read packet: $OK#00
lldb.debugger.event-handler      < 216> send packet: $jThreadExtendedInfo:{"dti_qos_class_index":4,"dti_queue_index":20,"dti_voucher_index":28,"plo_pthread_tsd_base_address_offset":0,"plo_pthread_tsd_base_offset":224,"plo_pthread_tsd_entry_size":8,"thread":2961720}]#fd
lldb.debugger.event-handler      < 200> read packet: ${"tsd_address":8446511392,"requested_qos":{"enum_value":33,"constant_name":"QOS_CLASS_USER_INTERACTIVE","printable_name":"User Interactive"}],"pthread_t":8446511168,"dispatch_queue_t":8446526400}]#00

Interestingly, the first step operation is reported as reason:watchpoint and only the second step (when the watchpoint is disabled) returns mach exception data. I don't exactly know what this means, but it feels relevant.

I'm doing a fresh build of lldb now and I'm going to post the results of my test run for comparison.

@labath
Copy link
Collaborator Author

labath commented Mar 18, 2025

This is what I've got at 23743f5:

success.log

@Michael137
Copy link
Member

It would be useful to know if you can reproduce this problem locally (because I can't -- the test passes on my mac). If you can't then it would be useful to get as much information about this buildbot as possible (what kind of debugserver it uses, OS versions and such), as it means there's something different about it.

Yup can confirm it doesn't reproduce locally for me either. I don't have access to the buildbot machine but @adrian-prantl might?

Perhaps we can just dump this info as part of the test?

@adrian-prantl
Copy link
Collaborator

It would be useful to know if you can reproduce this problem locally (because I can't -- the test passes on my mac). If you can't then it would be useful to get as much information about this buildbot as possible (what kind of debugserver it uses, OS versions and such), as it means there's something different about it.

We are now hosting them on the same infrastructure that also hosts ci.swift.org, so I no longer have direct access to them either. What anyone can always do is land a temporary patch that adds more logging to the test.

Couple of questions I can answer:

  • green dragon uses the just-built debugserver (according to the make log output and my memory)
  • based on the CMake output, I think the machines run macOS 14.1
-- LLVM host triple: arm64-apple-darwin23.1.0
...
-- LLDB tests use just-built debug server

@adrian-prantl
Copy link
Collaborator

May I recommend we add a skip for macOS < 15 to get the bot green again?

@labath
Copy link
Collaborator Author

labath commented Mar 18, 2025

I've skipped them (I hope) with 6d38dbf. @rocallahan, it seems the only way to extract information from these bots is to commit PRs. Let me know what kind of information would help you and I'll try to find a way to get the test to do that.

@rocallahan
Copy link
Contributor

Maybe disabling tests for macOS < 15 would be OK even in the long term? There isn't any looming plan to actually use this code on macOS < 15. The main reason to run these tests on non-Linux is so to help developers catch it early if they break these tests. I guess most developers on Mac will be doing most of their testing on 15 or later?

@Michael137
Copy link
Member

Btw, the x86_64 macOS bots are still failing: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake/10698/execution/node/102/log/?consoleFull

Failed Tests (2):
  lldb-api :: functionalities/reverse-execution/TestReverseContinueBreakpoints.py
  lldb-api :: functionalities/reverse-execution/TestReverseContinueWatchpoints.py

In the logs I see:

An exception happened when receiving the response from the gdb server. Closing the client...          
An exception happened when receiving the response from the gdb server. Closing the client...          
An exception happened when receiving the response from the gdb server. Closing the client...          
An exception happened when receiving the response from the gdb server. Closing the client...          
An exception happened when receiving the response from the gdb server. Closing the client...          
An exception happened when receiving the response from the gdb server. Closing the client...          
An exception happened when receiving the response from the gdb server. Closing the client...          
An exception happened when receiving the response from the gdb server. Closing the client...

I also see this on my local x86_64 machine

Should we perhaps skip this on intel+macOS?

@labath
Copy link
Collaborator Author

labath commented Mar 19, 2025

Should we perhaps skip this on intel+macOS?

861efd4

Maybe disabling tests for macOS < 15 would be OK even in the long term? There isn't any looming plan to actually use this code on macOS < 15. The main reason to run these tests on non-Linux is so to help developers catch it early if they break these tests. I guess most developers on Mac will be doing most of their testing on 15 or later?

Maybe? I think it'd be nice to fix the x86 problem at least, which appears to be because debugserver reports some registers that its unable to read (ds at least). It'd be better if it didn't do that, but I think it'd make sense for the proxy to not fail on these errors (it could e.g. record the error it gets when reading the register, and forward it if lldb asks for it, but then skip restoring the state of the register).

As for the macOS 14 case, while I think it be nice to support that as well (I don't quite like the fact that the test is running on any of the bots, and it feels like we're very close to making that work), I think you've already done a lot more to make your tests run on platforms you aren't targetting than what other patches do, so it wouldn't be fair to expect you do to that.

@rocallahan
Copy link
Contributor

Btw, the x86_64 macOS bots are still failing: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake/10698/execution/node/102/log/?consoleFull

Where is this reported? I did not see it in any of the commit annotations in Github.

I think it'd make sense for the proxy to not fail on these errors (it could e.g. record the error it gets when reading the register, and forward it if lldb asks for it, but then skip restoring the state of the register).

Sure, I'll do that.

@rocallahan
Copy link
Contributor

(it could e.g. record the error it gets when reading the register, and forward it if lldb asks for it, but then skip restoring the state of the register).

If LLDB asks for it then we'll just forward that request to the debugserver which will (presumably) return an error which we forward back to LLDB. So we don't need to do anything there. All we need to do is, if there is an error reading the register, ignore it and don't include it in the snapshot, so we don't try to restore it.

@rocallahan
Copy link
Contributor

Created PR #132122.

@jasonmolenda
Copy link
Collaborator

Sorry for not commenting last week when this issue was live, but with regards to Apple silicon macs, there was a kernel behavior where watchpoints and hardware breakpoints were disabled by the kernel when it was single instruction stepping. I believe this was fixed in macOS 14.4, released March 2024. The fix was definitely in place by macOS 15.

@labath
Copy link
Collaborator Author

labath commented Mar 27, 2025

Thanks for chiming in Jason. It sounds like skipping for macos<15.0 was the right choice then. However, looking at the test decorators, I see they're currently set too broadly. I've created #133240 to tighten them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants