@qxy11 qxy11 commented Aug 25, 2025

Summary

Right now, when the native process exits, the GPU target reports a lost connection:

(lldb) target select 0
Current targets:
* target #0: /home/qxy11/llvm/Debug/a.out ( arch=x86_64-unknown-linux-gnu, platform=host, pid=242142, state=stopped )
  target #1: <none> ( arch=x86_64-unknown-linux-gnu, platform=host, pid=1234, state=running )
(lldb) c
Process 3805000 resuming
Process 3805000 exited with status = 0 (0x00000000) 
Process 1234 exited with status = -1 (0xffffffff) lost connection
(lldb) q

The desired behavior is for the GPU connection to report an exit status by sending a $WXX packet when the native process exits. With this change, the GPU plugin is notified when the native process is exiting so that the GPU process exits as well.

This is currently implemented only in the Mock GPU plugin, which sets the GPU process's exit status to that of the native process. We can extend this and follow up for AMD once this is approved.

Tests

We can follow up with unit tests once the basic unit tests from other PRs have landed.

Basic test, running until the native process completes:

(lldb) c
Process 1234 resuming
(lldb) target select 0
Current targets:
* target #0: /home/qxy11/llvm/Debug/a.out ( arch=x86_64-unknown-linux-gnu, platform=host, pid=3805000, state=stopped )
  target #1: <none> ( arch=x86_64-unknown-linux-gnu, platform=host, pid=1234, state=running )
(lldb) c
Process 3805000 resuming
gpu_shlib_load
gpu_third_stop
gpu_shlib_load
gpu_kernel
Process 3805000 exited with status = 0 (0x00000000) 
Process 1234 exited with status = 0 (0x00000000) 
(lldb)

Check server logs:

1756162713.459808350 [3383979/3383979] gdb-server <  22> read packet: $vCont;c:p33a2ae.-1#9d
1756162713.459902287 [3383979/3383979] gdb-server <  61> send packet: $O6770755f73686c69625f6c6f61640d0a6770755f6b65726e656c0d0a#43
1756162713.460208416 [3383979/3383979] ProcessMockGPU::HandleNativeProcessExit() native process exited with status=(Exited with status 0)
1756162713.460271358 [3383979/3383979] mock-gpu.server <   7> send packet: $W00#b7
1756162713.460320950 [3383979/3383979] gdb-server <  22> send packet: $W00;process:33a2ae#ea
lldb-server exiting...
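
For readers checking the log by hand: the payload of the $O packet above is hex-encoded process output, two hex digits per byte. The following is a minimal sketch of decoding it. DecodeHexPayload is a hypothetical helper written for this illustration, not a function from lldb-server.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration (not part of lldb-server):
// turn a hex-encoded payload, two hex digits per byte, back into raw bytes.
std::string DecodeHexPayload(const std::string &hex) {
  assert(hex.size() % 2 == 0 && "hex payload must have an even length");
  std::string bytes;
  bytes.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2)
    bytes.push_back(
        static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
  return bytes;
}
```

Passing the payload from the log (everything between "$O" and "#43") yields "gpu_shlib_load\r\ngpu_kernel\r\n", which matches two of the stdout lines shown in the lldb session.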

As expected, both processes now send back $W00 packets. The mock-gpu.server packet doesn't include the process ID because it doesn't have multi-process support enabled.
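
The two exit packets in the log can be reproduced by hand. In the GDB remote protocol, a packet is framed as $payload#checksum, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. A W payload carries the exit status as two hex digits, optionally followed by ";process:" and the pid in hex when multi-process support is on. The sketch below illustrates this. FramePacket and MakeExitPacket are hypothetical names for this example, not the lldb-server implementation.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: frame a payload as $<payload>#<checksum>, where the
// checksum is the modulo-256 sum of the payload bytes in lowercase hex.
std::string FramePacket(const std::string &payload) {
  unsigned sum = 0;
  for (unsigned char c : payload)
    sum = (sum + c) & 0xff;
  char checksum[3];
  std::snprintf(checksum, sizeof(checksum), "%02x", sum);
  return "$" + payload + "#" + checksum;
}

// Hypothetical helper: build a W (process exited) reply. pid_hex is only
// appended when non-empty, mirroring the multi-process form seen in the log.
std::string MakeExitPacket(unsigned char status,
                           const std::string &pid_hex = "") {
  char status_hex[3];
  std::snprintf(status_hex, sizeof(status_hex), "%02x",
                static_cast<unsigned>(status));
  std::string payload = std::string("W") + status_hex;
  if (!pid_hex.empty())
    payload += ";process:" + pid_hex;
  return FramePacket(payload);
}
```

MakeExitPacket(0) gives "$W00#b7" and MakeExitPacket(0, "33a2ae") gives "$W00;process:33a2ae#ea", the same bytes as the mock-gpu.server and gdb-server lines in the log above.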

Test killing the process:

(lldb) target select 0
Current targets:
* target #0: /home/qxy11/llvm/Debug/a.out ( arch=x86_64-unknown-linux-gnu, platform=host, pid=3879593, state=stopped )
  target #1: <none> ( arch=x86_64-unknown-linux-gnu, platform=host, pid=1234, state=running )
(lldb) process kill
Process 1234 exited with status = 9 (0x00000009) 
Process 3879593 exited with status = 9 (0x00000009) killed
(lldb)  

Test where the native process segfaults and exits:

(lldb) intern-state     pid = 2581667, SyncState::SetStateStopped(stop_id=4) m_stop_id = 4, m_state = stopped 
intern-state     pid = 2581667, SyncState::DidResume() m_stop_id = 4, m_state = running
intern-state     pid = 2581667, SyncState::SetStateStopped(stop_id=5) m_stop_id = 5, m_state = stopped 
Process 2581667 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSEGV: address not mapped to object (fault address=0x0)
    frame #0: 0x00005555555551e7 a.out`main(argc=1, argv=0x00007fffffffd6a8) at memory-space-main.c:24:6
   21     gpu_initialize();
   22     // CPU BREAKPOINT - BEFORE LAUNCH
   23     int *p = NULL;
-> 24     *p = 42;
   25     gpu_shlib_load();
   26     gpu_third_stop();
   27     gpu_shlib_load();
Likely cause: p accessed 0x0
(lldb) c
lldb             pid = 2581667, SyncState::DidResume() m_stop_id = 5, m_state = running
Process 2581667 resuming
Process 2581667 exited with status = 11 (0x0000000b) 
Process 1234 exited with status = 11 (0x0000000b) 
(lldb) 

@dmpots dmpots requested review from clayborg and walter-erquinigo and removed request for clayborg August 26, 2025 18:30

@dmpots dmpots left a comment

LGTM

void SetLaunchInfo(ProcessLaunchInfo &launch_info);

/// Called when the native process exits to set the GPU process exit status
void HandleNativeProcessExit(const WaitStatus &exit_status);

We should mark this as override

this method is not an override. The only mandatory interface in this patch is at the plugin level.

@walter-erquinigo walter-erquinigo left a comment

Just minimal changes left. Thank you for doing this :)

Comment on lines 1144 to 1149
auto exit_status = process->GetExitStatus();
if (exit_status.has_value()) {
  for (auto &plugin_up : m_plugins) {
    plugin_up->NativeProcessDidExit(*exit_status);
  }
}

don't use auto unless the type is unreadable. In this case it should be very readable

also, the if/for statements are very simple, so you should remove braces. See https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

/// GPU plugins to perform proper termination
///
/// \param[in] exit_status The exit status of the native process.
virtual void NativeProcessDidExit(const WaitStatus &exit_status) {};

make this pure virtual. I think we shouldn't have a default implementation for this because termination must be handled by each plugin properly
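
The difference between the two declarations can be sketched in isolation. With an empty default body, a plugin that forgets to handle the exit event still compiles. With a pure virtual method, the plugin stays abstract until it overrides the hook. All class names below are simplified stand-ins invented for this example, and WaitStatus is reduced to a single integer. This is not the actual LLDBServerPlugin interface.

```cpp
#include <cassert>
#include <type_traits>

// Simplified stand-in for lldb's WaitStatus; only a numeric status is kept.
struct WaitStatus {
  int status;
};

// Variant A: empty default body, as in the snippet under review.
class PluginWithDefault {
public:
  virtual ~PluginWithDefault() = default;
  virtual void NativeProcessDidExit(const WaitStatus &) {}
};

// Variant B: pure virtual, as the review comment requests.
class PluginPureVirtual {
public:
  virtual ~PluginPureVirtual() = default;
  virtual void NativeProcessDidExit(const WaitStatus &exit_status) = 0;
};

// Compiles and can be instantiated even though it never handles the event.
class ForgetfulPlugin : public PluginWithDefault {};

// Has to override the hook; without the override it would remain abstract.
class MockPlugin final : public PluginPureVirtual {
public:
  void NativeProcessDidExit(const WaitStatus &exit_status) override {
    assert(!exited && "exit should be reported only once");
    exited = true;
    gpu_status = exit_status.status; // mirror the native exit status
  }
  bool exited = false;
  int gpu_status = 0;
};

static_assert(!std::is_abstract<ForgetfulPlugin>::value,
              "a default body lets a plugin silently ignore the event");
static_assert(std::is_abstract<PluginPureVirtual>::value,
              "the pure virtual base cannot be instantiated");
static_assert(!std::is_abstract<MockPlugin>::value,
              "overriding the hook makes the plugin concrete");
```

The static_asserts make the trade-off visible at compile time: only the pure virtual variant forces every plugin author to decide how termination is handled.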

/// Get the GPU plug-in notified when the native process exits.
///
/// This function will get called when the native process exits. This allows
/// GPU plugins to perform proper termination

missing period

Comment on lines 171 to 173
if (auto *mock_gpu_process = static_cast<ProcessMockGPU *>(gpu_process)) {
  mock_gpu_process->HandleNativeProcessExit(exit_status);
}

remove braces

@qxy11 qxy11 requested a review from walter-erquinigo August 28, 2025 17:06
// Notify GPU plugins that the native process has exited
std::optional<WaitStatus> exit_status = process->GetExitStatus();
if (exit_status.has_value())
  for (auto &plugin_up : m_plugins) {

don't use auto here

The code should look like this

  // Notify server plugins that the native process has exited
  std::optional<WaitStatus> exit_status = process->GetExitStatus();
  if (exit_status.has_value())
    for (std::unique_ptr<lldb_server::LLDBServerPlugin> &plugin_up : m_plugins)
      plugin_up->NativeProcessDidExit(*exit_status);

Comment on lines 304 to 305
// Handle exiting the GPU process when a native process exits.
virtual void HandleNativeProcessExit(const WaitStatus &exit_status) {};

This function doesn't make sense for CPU processes.
Could you extend NativeProcessProtocol as a new class GPUProcessProtocol that has this additional method?
That would leave CPU process clean.
Then, make this function pure virtual

I've changed my mind. You don't need this function at all. It's enough to add NativeProcessDidExit to LLDBServerPlugin.h. Each plugin should decide how they want to manage this high level event.
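
A self-contained sketch of that design, assuming C++17 for std::optional: the server forwards the native exit status to every plugin through one plugin-level hook, and each plugin chooses its own reaction. The classes here are simplified stand-ins written for this example (the real LLDBServerPlugin has a much larger interface), and MirroringPlugin, CountingPlugin and NotifyPlugins are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Simplified stand-in for lldb's WaitStatus.
struct WaitStatus {
  int status;
};

// Simplified stand-in for the plugin interface: one mandatory hook.
class LLDBServerPlugin {
public:
  virtual ~LLDBServerPlugin() = default;
  virtual void NativeProcessDidExit(const WaitStatus &exit_status) = 0;
};

// One possible policy: the GPU process exits with the native exit status.
class MirroringPlugin : public LLDBServerPlugin {
public:
  void NativeProcessDidExit(const WaitStatus &exit_status) override {
    assert(!gpu_exit_status.has_value() && "exit already reported");
    gpu_exit_status = exit_status;
  }
  std::optional<WaitStatus> gpu_exit_status;
};

// Another possible policy: only record that the event happened.
class CountingPlugin : public LLDBServerPlugin {
public:
  void NativeProcessDidExit(const WaitStatus &) override { ++notifications; }
  int notifications = 0;
};

// Forward the exit status to every plugin, but only if the native process
// actually has one.
void NotifyPlugins(
    const std::optional<WaitStatus> &exit_status,
    const std::vector<std::unique_ptr<LLDBServerPlugin>> &plugins) {
  if (exit_status.has_value())
    for (const std::unique_ptr<LLDBServerPlugin> &plugin_up : plugins)
      plugin_up->NativeProcessDidExit(*exit_status);
}
```

Calling NotifyPlugins with an empty optional leaves every plugin untouched. Calling it with a status of 11 makes the mirroring plugin report 11, as in the segfault test in the PR description, while the counting plugin simply increments its counter.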

@walter-erquinigo

@qxy11, I've copied most of your changes from this PR onto my own nvidia branch and it works for me :)

Summary:
Have each plugin process decide how they want to handle native process exit.
walter-erquinigo added a commit that referenced this pull request Sep 30, 2025
This applies the WIP PR #38 to our branch, which ensures that whenever the CPU exits, the GPU also reports its exit with the same exit code.
walter-erquinigo added a commit that referenced this pull request Oct 9, 2025
This applies the WIP PR #38 to our branch, which ensures that whenever the CPU exits, the GPU also reports its exit with the same exit code.
@dmpots dmpots merged commit 08b5c9a into clayborg:llvm-server-plugins Oct 14, 2025
5 checks passed
walter-erquinigo added a commit that referenced this pull request Oct 24, 2025
This applies the WIP PR #38 to our branch, which ensures that whenever the CPU exits, the GPU also reports its exit with the same exit code.