Segfault when attaching profiler #80683

@ogxd

Description

See initial thread: dotnet/diagnostics#3600

While developing CLR profilers, I encountered a segfault that occurs consistently when attaching for the second time to a dotnet process on Linux (Debian 11). I don't know whether running in Docker plays a role.

I managed to take a core dump that you can get here: segfault_coredump.zip

Using lldb + sos, I was able to load the symbols. Here is the stack trace for the thread that ends in the segfault:

* thread #1, name = 'dotnet', stop reason = signal SIGSEGV
  * frame #0: 0x00007f8c755eb06c libcoreclr.so`EESocketCleanupHelper(bool) [inlined] InterlockedOr(Destination=0x0000000000000008, Value=64) at pal.h:3748:1
    frame #1: 0x00007f8c755eb06c libcoreclr.so`EESocketCleanupHelper(bool) [inlined] Thread::SetThreadState(this=0x0000000000000000, ts=TS_ExecutingOnAltStack) at threads.h:1060
    frame #2: 0x00007f8c755eb06c libcoreclr.so`EESocketCleanupHelper(bool) [inlined] Thread::SetExecutingOnAltStack(this=0x0000000000000000) at threads.h:1257
    frame #3: 0x00007f8c755eb06c libcoreclr.so`EESocketCleanupHelper(isExecutingOnAltStack=<unavailable>) at ceemain.cpp:560
    frame #4: 0x00007f8c756020b8 libcoreclr.so`sigsegv_handler(int, siginfo_t*, void*) [inlined] invoke_previous_action(action=<unavailable>, code=11, siginfo=0x00007f8c73fd1bf0, context=0x00007f8c73fd1ac0, signalRestarts=true) at signal.cpp:430:5
    frame #5: 0x00007f8c7560204e libcoreclr.so`sigsegv_handler(code=11, siginfo=0x00007f8c73fd1bf0, context=0x00007f8c73fd1ac0) at signal.cpp:639
    frame #6: 0x00007f8c75cf1140 libpthread.so.0`__restore_rt
    frame #7: 0x00007f8c75d0de7a ld-linux-x86-64.so.2`___lldb_unnamed_symbol38$$ld-linux-x86-64.so.2 + 10
    frame #8: 0x00007f8c75d0e3a4 ld-linux-x86-64.so.2`___lldb_unnamed_symbol39$$ld-linux-x86-64.so.2 + 932
    frame #9: 0x00007f8c75d0ece1 ld-linux-x86-64.so.2`___lldb_unnamed_symbol40$$ld-linux-x86-64.so.2 + 289
    frame #10: 0x00007f8c7590e3a4 libc.so.6`___lldb_unnamed_symbol1115$$libc.so.6 + 116
    frame #11: 0x00007f8c75cd93b4 libdl.so.2`___lldb_unnamed_symbol6$$libdl.so.2 + 20
    frame #12: 0x00007f8c7590ea90 libc.so.6`_dl_catch_exception + 128
    frame #13: 0x00007f8c7590eb4f libc.so.6`_dl_catch_error + 47
    frame #14: 0x00007f8c75cd9a65 libdl.so.2`___lldb_unnamed_symbol11$$libdl.so.2 + 101
    frame #15: 0x00007f8c75cd941c libdl.so.2`dlsym + 92
    frame #16: 0x00007f8c7560e44c libcoreclr.so`::GetProcAddress(hModule=0x00007f8be8002790, lpProcName="") at module.cpp:333:33
    frame #17: 0x00007f8c754d56de libcoreclr.so`FakeCoCreateInstanceEx(_GUID const&, char16_t const*, _GUID const&, void**, void**) [inlined] (anonymous namespace)::FakeCoCallDllGetClassObject(rclsid=<unavailable>, wszDllPath=u"/tmp/dr-dotnet/libprofilers.so", riid=<unavailable>, ppv=0x00007f8c747d13c8, phmodDll=<unavailable>) at util.cpp:224:68
    frame #18: 0x00007f8c754d5639 libcoreclr.so`FakeCoCreateInstanceEx(rclsid=0x00007f8be800794c, wszDllPath=u"/tmp/dr-dotnet/libprofilers.so", riid=<unavailable>, ppv=0x00007f8c747d1678, phmodDll=0x00007f8c747d1658) at util.cpp:292
    frame #19: 0x00007f8c75319bea libcoreclr.so`EEToProfInterfaceImpl::CreateProfiler(_GUID const*, char const*, char16_t const*) [inlined] CoCreateProfiler(pClsid=<unavailable>, szClsid=<unavailable>, wszProfileDLL=<unavailable>, ppCallback=<unavailable>, phmodProfilerDLL=0x00007f8c747d1658) at eetoprofinterfaceimpl.cpp:293:10
    frame #20: 0x00007f8c75319b9e libcoreclr.so`EEToProfInterfaceImpl::CreateProfiler(this=0x00007f8be8007590, pClsid=<unavailable>, szClsid="{805a308b-061c-47f3-9b30-f785c3186e82}", wszProfileDLL=<unavailable>) at eetoprofinterfaceimpl.cpp:667
    frame #21: 0x00007f8c75319908 libcoreclr.so`EEToProfInterfaceImpl::Init(this=0x00007f8be8007590, pProfToEE=0x00007f8be8007380, pClsid=0x00007f8be800794c, szClsid="{805a308b-061c-47f3-9b30-f785c3186e82}", wszProfileDLL=u"/tmp/dr-dotnet/libprofilers.so", fLoadedViaAttach=<unavailable>, dwConcurrentGCWaitTimeoutInMs=10) at eetoprofinterfaceimpl.cpp:581:14
    frame #22: 0x00007f8c75388d21 libcoreclr.so`ProfilingAPIUtility::DoPreInitialization(pEEProf=0x00007f8be8007590, pClsid=0x00007f8be800794c, szClsid="{805a308b-061c-47f3-9b30-f785c3186e82}", wszProfilerDLL=u"/tmp/dr-dotnet/libprofilers.so", loadType=kAttachLoad, dwConcurrentGCWaitTimeoutInMs=10) at profilinghelper.cpp:989:19
    frame #23: 0x00007f8c75388779 libcoreclr.so`ProfilingAPIUtility::LoadProfiler(loadType=kAttachLoad, pClsid=0x00007f8be800794c, szClsid="{805a308b-061c-47f3-9b30-f785c3186e82}", wszProfilerDLL=u"/tmp/dr-dotnet/libprofilers.so", pvClientData=0x00007f8be80014aa, cbClientData=<unavailable>, dwConcurrentGCWaitTimeoutInMs=10) at profilinghelper.cpp:1133:10
    frame #24: 0x00007f8c755e0a74 libcoreclr.so`ds_profiler_protocol_helper_handle_ipc_message(_DiagnosticsIpcMessage*, _DiagnosticsIpcStream*) [inlined] ProfilingAPIUtility::LoadProfilerForAttach(pClsid=<unavailable>, wszProfilerDLL=u"/tmp/dr-dotnet/libprofilers.so", pvClientData=0x00007f8be80014aa, cbClientData=37, dwConcurrentGCWaitTimeoutInMs=10) at profilinghelper.inl:183:12
    frame #25: 0x00007f8c755e0a47 libcoreclr.so`ds_profiler_protocol_helper_handle_ipc_message(_DiagnosticsIpcMessage*, _DiagnosticsIpcStream*) at ds-rt-coreclr.h:294
    frame #26: 0x00007f8c755e0a47 libcoreclr.so`ds_profiler_protocol_helper_handle_ipc_message(_DiagnosticsIpcMessage*, _DiagnosticsIpcStream*) [inlined] profiler_protocol_helper_attach_profiler(message=<unavailable>, stream=0x00007f8be8007570) at ds-profiler-protocol.c:129
    frame #27: 0x00007f8c755e0938 libcoreclr.so`ds_profiler_protocol_helper_handle_ipc_message(message=<unavailable>, stream=0x00007f8be8007570) at ds-profiler-protocol.c:269
    frame #28: 0x00007f8c755dbfb1 libcoreclr.so`server_thread(data=<unavailable>) at ds-server.c:167:4
    frame #29: 0x00007f8c7563bfee libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x000055ec7ccbb580) at thread.cpp:1829:16
    frame #30: 0x00007f8c75ce5ea7 libpthread.so.0`start_thread + 215
    frame #31: 0x00007f8c758d4a2f libc.so.6`__clone + 63

Reproduction Steps

Reproduction is currently a little complex, but I believe the crash can be reproduced by attaching twice on Linux (e.g. by tweaking the CLR GC profiler test to attach, detach, and reattach).
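As a possible first step toward a smaller repro, the load, lookup, unload, reload cycle can be exercised outside the runtime. The stack trace shows the crash inside `dlsym`, reached from `GetProcAddress` during the attach. The sketch below is not the runtime's code: the helper names `load_lookup_unload` and `attach_twice` are hypothetical, and it assumes that a detach ends with the profiler library being unloaded (`dlclose`), which I have not confirmed.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared library, resolve one symbol, and
   unload it again. Returns 0 on success, -1 on any failure. */
int load_lookup_unload(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();
    if (sym == NULL || err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* Assumption: detaching the profiler unloads the library. */
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
        return -1;
    }
    return 0;
}

/* Hypothetical helper: run the cycle twice, mimicking attach, detach,
   reattach. Returns 0 if both rounds succeed, otherwise the number of
   the round (1 or 2) that failed. */
int attach_twice(const char *path, const char *symbol)
{
    for (int round = 1; round <= 2; ++round) {
        if (load_lookup_unload(path, symbol) != 0) {
            return round;
        }
    }
    return 0;
}
```

Calling `attach_twice("/tmp/dr-dotnet/libprofilers.so", "DllGetClassObject")` from a small `main` (linking with `-ldl` on older glibc) would show whether the library itself survives being reloaded. `DllGetClassObject` is my guess at the symbol being resolved, since the trace only shows `lpProcName=""`. If this standalone cycle also crashes, the problem is likely in the profiler library's load/unload behavior rather than in the runtime.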

Expected behavior

The second attach succeeds without a segfault.

Actual behavior

The attach triggers a segfault that crashes the profiler application.

Regression?

No response

Known Workarounds

No response

Configuration

The bug was observed under .NET 6.0 and .NET 7.0. I haven't tested earlier versions.
The Linux version is Debian 11 (the one shipped with the CLR images in Microsoft's Docker repository).

Other information

No response

Metadata

Labels

area-Diagnostics-coreclr
question (Answer questions and provide assistance, not an issue with source code or documentation.)
