WKS::gc_heap::make_unused_array seg fault #711

Closed
dlewis-arcontech opened this issue Feb 19, 2021 · 40 comments
Labels
area-NativeAOT-coreclr .NET runtime optimized for ahead of time compilation

Comments

@dlewis-arcontech

dlewis-arcontech commented Feb 19, 2021

Hello, I am building a C POD dll/so for C# code using Native AOT. The C# code in turn uses a C++/C dll/so. On Windows everything works fine under Native AOT, .NET Core, and .NET Framework. On Linux it works fine under .NET Core, but I get a seg fault with Native AOT. I've built as debug and got a backtrace with GDB. I can reproduce it fairly quickly (within a few minutes of running). The backtrace is:

#0 0x00007ffff61604b7 in WKS::gc_heap::make_unused_array (
x=0x8a3e30 "\340\021\376\366\377\177", size=140737332173216, clearp=0,
resetp=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:27805
#1 0x00007ffff61927f1 in fix_allocation_context (acontext=0x8a3ce0,
for_gc_p=<optimized out>, record_ac_p=1)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:6913
#2 WKS::GCHeap::FixAllocContext (this=<optimized out>, context=0x8a3ce0,
arg=0x1, heap=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:40955
#3 0x00007ffff615224d in GCToEEInterface::GcEnumAllocContexts (
fn=0x7ffff61606c0 <WKS::fix_alloc_context(gc_alloc_context*, void*)>,
param=0x7fffbdf8e630)
at /__w/1/s/src/coreclr/nativeaot/Runtime/gcrhscan.cpp:104
#4 0x00007ffff6178a57 in fix_allocation_contexts (for_gc_p=1)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:6994
#5 WKS::gc_heap::garbage_collect (n=0)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:20036
#6 0x00007ffff6169c70 in WKS::GCHeap::GarbageCollectGeneration (
this=<optimized out>, gen=0, reason=reason_alloc_soh)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:41936
#7 0x00007ffff616be37 in WKS::gc_heap::try_allocate_more_space (
acontext=<optimized out>, size=<optimized out>, flags=<optimized out>,
gen_number=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:15841
#8 0x00007ffff61924c0 in allocate_more_space (acontext=0x7fffb8000c10,
flags=0, alloc_generation_number=0, size=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:16343
#9 allocate (jsize=48, acontext=0x7fffb8000c10, flags=0)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:16374
#10 WKS::GCHeap::Alloc (this=<optimized out>, context=0x7fffb8000c10, size=48,
flags=0) at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:40912
#11 0x00007ffff61aff28 in RhpNewObject ()
at /__w/1/s/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc:435
#12 0x00007ffff626a8fd in arcontech_capi_Arcontech_DotNetAPI_RecordWatcher__OnUpdate (this=..., databook=..., databookStatus=DatabookUnavailable,
replaceAll=false, fields=...)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/RecordWatcher.cs:445
#13 0x00007ffff626c8b4 in arcontech_capi_Arcontech_DotNetAPI_RecordWatcher__Arcontech_DotNetAPI_FeedApi_IFeedInstrumentWatcher_Update (this=..., iItem=...)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/RecordWatcher.cs:919
#14 0x00007ffff62a5975 in arcontech_capi_Arcontech_DotNetAPI_FeedApi_FeedHandlerInstrumentWatcher__ProcessBatch (this=..., receptionCache=...)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/FeedApi/FeedHandlerInstrumentWatcher.cs:112
#15 0x00007ffff6612179 in __Arcontech_DotNetAPI_FeedApi_IFeedQueueItem_DispatchMessage (this=..., receptionCache=...)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/FeedApi/FeedHandlerBase.cs:924
#16 0x00007ffff62d926c in arcontech_capi_Arcontech_DotNetAPI_FeedApi_FeedHandlerQueueDispatcher_QueueItem__Dispatch (this=...)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/FeedApi/FeedHandlerQueueDispatcher.cs:38
#17 0x00007ffff62ab048 in arcontech_capi_Arcontech_DotNetAPI_FeedApi_FeedHandlerQueueDispatcher__ProcessQueue (this=...)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/FeedApi/FeedHandlerQueueDispatcher.cs:147
#18 0x00007ffff645bef7 in S_P_CoreLib_System_Threading_Thread_StartHelper__RunWorker (this=...)
at /_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs:68
#19 0x00007ffff645be78 in S_P_CoreLib_System_Threading_Thread_StartHelper__Run
(this=...)
at /_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs:54
#20 0x00007ffff6370947 in S_P_CoreLib_System_Threading_Thread__StartThread (
parameter=140737352128784)
at /_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.cs:430
#21 0x00007ffff6370e80 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint
(parameter=140737352128784)
at /_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.CoreRT.Unix.cs:111
#22 0x00007ffff58b5e65 in start_thread () from /lib64/libpthread.so.0
#23 0x00007ffff70e888d in clone () from /lib64/libc.so.6

@jkotas
Member

jkotas commented Feb 19, 2021

This crash is GC heap corruption. It can be caused either by a bug in the runtime or by a bug in your interop code.

size=140737332173216 passed to gc_heap::make_unused_array looks wrong. Tracing down where it came from would be the first step.
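One way to see why this size looks suspect is to print it in hexadecimal. It then falls in the same 0x7fff... range as the code addresses in the backtrace, which suggests (but does not prove) that a pointer-like value ended up where a byte count was expected. The following is only a minimal C++ sketch: `to_hex`, `resembles_user_space_pointer`, and `check_reported_size` are hypothetical helpers, and the top-byte test is a rough heuristic that assumes a typical x86-64 Linux address layout.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Format a 64-bit value as lowercase hex with a 0x prefix.
std::string to_hex(std::uint64_t value) {
    std::ostringstream out;
    out << "0x" << std::hex << value;
    return out.str();
}

// Heuristic (assumption): on typical x86-64 Linux, shared-library and
// stack addresses have 0x7f in bits 40..47. A plausible object size
// would be far smaller than that.
bool resembles_user_space_pointer(std::uint64_t value) {
    return (value >> 40) == 0x7f;
}

// Self-check using the size value reported in the backtrace.
void check_reported_size() {
    const std::uint64_t reported = 140737332173216ULL;
    assert(to_hex(reported) == "0x7ffff6b0d9a0");
    assert(resembles_user_space_pointer(reported));
}
```

Inside gdb the same conversion is available with `print /x 140737332173216`.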

Any chance you can share the crash dump with symbols for us to take a look? If it would be OK to share it with just me, my email is in my GitHub profile.

@jkotas jkotas added the area-NativeAOT-coreclr .NET runtime optimized for ahead of time compilation label Feb 19, 2021
@dlewis-arcontech
Author

Hello Jan. Thanks, I've emailed you; happy to share the crash dump and symbols.

@jkotas
Copy link
Member

jkotas commented Feb 24, 2021

Thank you for sharing the crash dumps offline. The crash is caused by a dead thread registered in ThreadStore::m_ThreadList. This list is meant to contain actively running threads only. Threads are supposed to remove themselves from the list as they are dying, using ThreadStore::DetachCurrentThread. The offending thread did not remove itself for some reason.

I am still trying to find out what might have caused this thread to not remove itself from this list.

It would be useful to know whether the missing call to ThreadStore::DetachCurrentThread is an intermittent or a consistent condition. Could you please add code that creates a thread that immediately exits somewhere early in your app (e.g. new Thread(() => { }).Start();), set a breakpoint at ThreadStore::DetachCurrentThread in gdb, and see whether it is ever called in your app before it crashes?

@dlewis-arcontech
Author

Hello Jan, I've added the code you mentioned right at the start of the run and tried to set a breakpoint in gdb. I'm not as familiar with gdb as I am with debugging on Windows, so I might have made a mistake. My first attempt gets a segmentation fault straight away, before the code really gets going:

(gdb) break ThreadStore::DetachCurrentThread
Function "ThreadStore::DetachCurrentThread" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 1 (ThreadStore::DetachCurrentThread) pending.
(gdb) run
Starting program: /opt/ARCONTECH/svn/cpp_net_core/build/samples-debug/./samples
*** record_watcher_sample ***
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffd58ad700 (LWP 5189)]
[New Thread 0x7fffd50ac700 (LWP 5190)]
[New Thread 0x7fffd48ab700 (LWP 5191)]
[Thread 0x7fffd50ac700 (LWP 5190) exited]
[New Thread 0x7fffd50ac700 (LWP 5192)]
[New Thread 0x7fffc7fff700 (LWP 5193)]
[New Thread 0x7fffbf74a700 (LWP 5194)]
[New Thread 0x7fffbef49700 (LWP 5195)]

Program received signal SIGSEGV, Segmentation fault.
GetNext (this=0x7fffffffd418)
at /__w/1/s/src/coreclr/nativeaot/Runtime/threadstore.cpp:57
57 /__w/1/s/src/coreclr/nativeaot/Runtime/threadstore.cpp: No such file or directory.

The backtrace then shows:

#0 GetNext (this=0x7fffffffd418)
at /__w/1/s/src/coreclr/nativeaot/Runtime/threadstore.cpp:57
#1 ThreadStore::SuspendAllThreads (this=<optimized out>,
waitForGCEvent=<optimized out>, fireDebugEvent=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/threadstore.cpp:253
#2 0x00007ffff6169c87 in WKS::GCHeap::GarbageCollectGeneration (
this=<optimized out>, gen=0, reason=reason_alloc_soh)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:41900
#3 0x00007ffff616bf07 in WKS::gc_heap::try_allocate_more_space (
acontext=<optimized out>, size=<optimized out>, flags=<optimized out>,
gen_number=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:15841
#4 0x00007ffff6192590 in allocate_more_space (acontext=0x60ff40, flags=0,
alloc_generation_number=0, size=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:16343
#5 allocate (jsize=32, acontext=0x60ff40, flags=0)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:16374
#6 WKS::GCHeap::Alloc (this=<optimized out>, context=0x60ff40, size=32,
flags=0) at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:40912
#7 0x00007ffff61afff8 in RhpNewObject ()
at /__w/1/s/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc:435
#8 0x00007ffff6460397 in Flt2LngOvf ()
from /opt/ARCONTECH/svn/cpp_net_core/build/samples-debug/arcontech_capi.so
#9 0x00007ffff64604ed in Flt2LngOvf ()
from /opt/ARCONTECH/svn/cpp_net_core/build/samples-debug/arcontech_capi.so
#10 0x00007ffff64601f4 in Flt2LngOvf ()
from /opt/ARCONTECH/svn/cpp_net_core/build/samples-debug/arcontech_capi.so
#11 0x00007ffff6387c94 in S_P_CoreLib_System_Runtime_InteropServices_PInvokeMarshal__GetDelegateForFunctionPointer (ptr=4214936, delegateType=...)
at /_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/PInvokeMarshal.cs:248
#12 0x00007ffff6388628 in S_P_CoreLib_System_Runtime_InteropServices_Marshal__GetDelegateForFunctionPointerInternal (ptr=4214936, t=...)
at /_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshal.CoreRT.cs:202
#13 0x00007ffff6729b43 in S_P_CoreLib_System_Runtime_InteropServices_Marshal__GetDelegateForFunctionPointer_0<System___Canon> (ptr=4214936)
at /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshal.cs:943
#14 0x00007ffff672ba3b in arcontech_capi_Arcontech_DotNetAPI_FeedApi_GenericMarshall__LoadEntryPoint<System___Canon> (addr=4214936)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/FeedApi/Feed32.cs:21
#15 0x00007ffff65293e7 in arcontech_capi_Arcontech_DotNetAPI_NetcoreCppApi_NetcoreObjectInterface__CreateRecordWatcherNative (instrumentName=7232424,
statusChanged=4214899, finished=4214936, update=4214862,
updateAction=4214828, updateStart=4214794, context=7232480,
conflate=false, datasetName=6348840)
at /opt/ARCONTECH/svn/AKB-2675/Arcontech.DotNetAPI/NetcoreCppApi/NetcoreObjectInterface.cs:200
#16 0x0000000000407fa1 in capi_interop::invoke(char const*, char const*, bool, void const*, void (*)(void const*, int), void (*)(void const*, int), void (*)(void const*, CAPI_RECORD_UPDATE const*), void (*)(void const*), void (*)(void const*, CAPI_STATUS_UPDATE const*)) ()
#17 0x0000000000404f43 in arcontech::city_vision::cpp_api::record_watcher::impl::open() ()
#18 0x0000000000404bc8 in arcontech::city_vision::cpp_api::record_watcher::open() ()
#19 0x0000000000402651 in record_watcher_multiple_sample(std::string const&, std::string const&, int, int) ()
#20 0x0000000000401fdb in main ()

@jkotas
Member

jkotas commented Feb 24, 2021

This crash is a different symptom of the same problem (a dead thread in the active thread list).

Could you please set a breakpoint at TlsDestructionMonitor::~TlsDestructionMonitor to see whether it is getting executed?

If TlsDestructionMonitor::~TlsDestructionMonitor is not getting hit either, could you please set a breakpoint at __nptl_deallocate_tsd to see whether it is getting executed?

@jkotas
Member

jkotas commented Feb 24, 2021

To provide more context - the path to ThreadStore::DetachCurrentThread looks like this:

#0  ThreadStore::DetachCurrentThread () at /__w/1/s/src/coreclr/nativeaot/Runtime/threadstore.cpp:160
#1  0x00007f3b600b5dc1 in TlsDestructionMonitor::~TlsDestructionMonitor (this=<optimized out>)
    at /__w/1/s/src/coreclr/nativeaot/Runtime/unix/PalRedhawkUnix.cpp:409
#2  0x00007f3b600d67b9 in ?? () from ./bin/Debug/net5.0/linux-x64/native/NativeLibrary.so
#3  0x00007f3b5f649ca2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#4  0x00007f3b5f649eb3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3b60a1c9fd in clone () from /lib64/libc.so.6

We need to find out where things are getting derailed on this path.
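The backtrace above shows the detach being reached from a destructor that glibc invokes while the thread is being torn down. As a rough illustration of that pattern only, here is a minimal C++ sketch in which a per-thread object's destructor runs at thread exit and performs the "detach". `ExitMonitor`, `g_detached`, and `run_and_count_detaches` are hypothetical names, and nothing here is taken from the runtime's actual implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many threads ran their exit-time cleanup.
std::atomic<int> g_detached{0};

// Hypothetical stand-in for a per-thread monitor: its destructor runs
// when the owning thread exits, and it only "detaches" if attach() was
// called on that thread beforehand.
struct ExitMonitor {
    bool attached = false;
    void attach() { attached = true; }
    ~ExitMonitor() {
        if (attached) {
            // Stands in for "remove this thread from the active list".
            g_detached.fetch_add(1);
        }
    }
};

thread_local ExitMonitor t_monitor;

// Start `count` threads that attach and then exit immediately. Returns
// how many exit-time detaches were observed after joining all of them.
int run_and_count_detaches(int count) {
    assert(count >= 0);
    g_detached.store(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i) {
        threads.emplace_back([] { t_monitor.attach(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return g_detached.load();
}
```

If the exit-time destructor is never invoked for some thread, the count returned here would fall short of the number of threads started, which is the kind of mismatch the breakpoints are meant to expose.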

@dlewis-arcontech
Author

Thanks, I'm running it now with gdb set to:

break TlsDestructionMonitor::~TlsDestructionMonitor

I'm guessing this is the correct command syntax?

@jkotas
Member

jkotas commented Feb 24, 2021

Yes, that's the right way to set the breakpoint.

@jkotas
Member

jkotas commented Feb 24, 2021

If you are able to stop at __nptl_deallocate_tsd, there is an indirect call within the method that looks like this: 0x7f3b5f649ca0 <__nptl_deallocate_tsd+144>: callq *%rdx. You can set a breakpoint at this instruction using break *0x7f3b5f649ca0, and then use si to single step into the call.

@dlewis-arcontech
Author

OK, thanks, I'll try it and send an update. I've been on another project most of the day, and getting my machine back into a state where I can rerun the test is taking a while. Hopefully I'll get there soon!

@dlewis-arcontech
Author

The breakpoint at TlsDestructionMonitor::~TlsDestructionMonitor was not hit before the segmentation fault. Trying __nptl_deallocate_tsd next.

@dlewis-arcontech
Author

The breakpoint at __nptl_deallocate_tsd was hit; the backtrace is:

#0 0x00007ffff58b5bd0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#1 0x00007ffff58b5e73 in start_thread () from /lib64/libpthread.so.0
#2 0x00007ffff70e888d in clone () from /lib64/libc.so.6

@dlewis-arcontech
Author

I think I'm hitting the limits of my gdb knowledge. After stopping at __nptl_deallocate_tsd I tried break *0x7f3b5f649ca0, expecting to continue until <__nptl_deallocate_tsd+144>: callq *%rdx and then single-step from that second breakpoint, but I'm getting this:

(gdb) break *0x7f3b5f649ca0
Breakpoint 2 at 0x7f3b5f649ca0
(gdb) continue
Continuing.
Warning:
Cannot insert breakpoint 2.
Error accessing memory address 0x7f3b5f649ca0: Input/output error.

0x00007ffff58b5bd2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0

@jkotas
Member

jkotas commented Feb 24, 2021

The address to set the breakpoint on will be different on your machine. You can find it from disassembly e.g. by running x /50i $pc when stopped at the start of __nptl_deallocate_tsd.

Here is the transcript of what I executed on my test app that just creates threads. The transcript shows that the call always goes into NativeLibrary.so in my test app:

(gdb) break __nptl_deallocate_tsd
Function "__nptl_deallocate_tsd" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (__nptl_deallocate_tsd) pending.
(gdb) r
Starting program: /repro/NativeLibrary/./a.out
Breakpoint 1, 0x00007fa4597e0c10 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0

Missing separate debuginfos, use: debuginfo-install glibc-2.17-323.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64

(gdb) x /50i $pc
=> 0x7fa4597e0c10 <__nptl_deallocate_tsd>:      xor    %eax,%eax
   0x7fa4597e0c12 <__nptl_deallocate_tsd+2>:    mov    %fs:0x610,%al
   0x7fa4597e0c1a <__nptl_deallocate_tsd+10>:   test   %al,%al
   0x7fa4597e0c1c <__nptl_deallocate_tsd+12>:   je     0x7fa4597e0d71 <__nptl_deallocate_tsd+353>
   0x7fa4597e0c22 <__nptl_deallocate_tsd+18>:   push   %r15
   0x7fa4597e0c24 <__nptl_deallocate_tsd+20>:   push   %r14
   0x7fa4597e0c26 <__nptl_deallocate_tsd+22>:   mov    $0x4,%r14d
   0x7fa4597e0c2c <__nptl_deallocate_tsd+28>:   push   %r13
   0x7fa4597e0c2e <__nptl_deallocate_tsd+30>:   push   %r12
   0x7fa4597e0c30 <__nptl_deallocate_tsd+32>:   push   %rbp
   0x7fa4597e0c31 <__nptl_deallocate_tsd+33>:   push   %rbx
   0x7fa4597e0c32 <__nptl_deallocate_tsd+34>:   sub    $0x8,%rsp
   0x7fa4597e0c36 <__nptl_deallocate_tsd+38>:   movb   $0x0,%fs:0x610
   0x7fa4597e0c3f <__nptl_deallocate_tsd+47>:   lea    0x20f6c2(%rip),%r13        # 0x7fa4599f0308 <__pthread_keys+8>
   0x7fa4597e0c46 <__nptl_deallocate_tsd+54>:   xor    %r12d,%r12d
   0x7fa4597e0c49 <__nptl_deallocate_tsd+57>:   nopl   0x0(%rax)
   0x7fa4597e0c50 <__nptl_deallocate_tsd+64>:   mov    %fs:0x510(,%r12,8),%rbx
   0x7fa4597e0c59 <__nptl_deallocate_tsd+73>:   test   %rbx,%rbx
   0x7fa4597e0c5c <__nptl_deallocate_tsd+76>:   je     0x7fa4597e0cb0 <__nptl_deallocate_tsd+160>
   0x7fa4597e0c5e <__nptl_deallocate_tsd+78>:   add    $0x8,%rbx
   0x7fa4597e0c62 <__nptl_deallocate_tsd+82>:   mov    %r13,%rbp
   0x7fa4597e0c65 <__nptl_deallocate_tsd+85>:   mov    $0x20,%r15d
   0x7fa4597e0c6b <__nptl_deallocate_tsd+91>:   jmp    0x7fa4597e0c7e <__nptl_deallocate_tsd+110>
   0x7fa4597e0c6d <__nptl_deallocate_tsd+93>:   nopl   (%rax)
   0x7fa4597e0c70 <__nptl_deallocate_tsd+96>:   add    $0x10,%rbx
   0x7fa4597e0c74 <__nptl_deallocate_tsd+100>:  add    $0x10,%rbp
   0x7fa4597e0c78 <__nptl_deallocate_tsd+104>:  sub    $0x1,%r15
   0x7fa4597e0c7c <__nptl_deallocate_tsd+108>:  je     0x7fa4597e0cb0 <__nptl_deallocate_tsd+160>
   0x7fa4597e0c7e <__nptl_deallocate_tsd+110>:  mov    (%rbx),%rdi
   0x7fa4597e0c81 <__nptl_deallocate_tsd+113>:  test   %rdi,%rdi
   0x7fa4597e0c84 <__nptl_deallocate_tsd+116>:  je     0x7fa4597e0c70 <__nptl_deallocate_tsd+96>
   0x7fa4597e0c86 <__nptl_deallocate_tsd+118>:  mov    -0x8(%rbp),%rax
   0x7fa4597e0c8a <__nptl_deallocate_tsd+122>:  cmp    %rax,-0x8(%rbx)
   0x7fa4597e0c8e <__nptl_deallocate_tsd+126>:  movq   $0x0,(%rbx)
   0x7fa4597e0c95 <__nptl_deallocate_tsd+133>:  jne    0x7fa4597e0c70 <__nptl_deallocate_tsd+96>
   0x7fa4597e0c97 <__nptl_deallocate_tsd+135>:  mov    0x0(%rbp),%rdx
   0x7fa4597e0c9b <__nptl_deallocate_tsd+139>:  test   %rdx,%rdx
   0x7fa4597e0c9e <__nptl_deallocate_tsd+142>:  je     0x7fa4597e0c70 <__nptl_deallocate_tsd+96>
   0x7fa4597e0ca0 <__nptl_deallocate_tsd+144>:  callq  *%rdx
   0x7fa4597e0ca2 <__nptl_deallocate_tsd+146>:  add    $0x10,%rbx
   0x7fa4597e0ca6 <__nptl_deallocate_tsd+150>:  add    $0x10,%rbp
   0x7fa4597e0caa <__nptl_deallocate_tsd+154>:  sub    $0x1,%r15
   0x7fa4597e0cae <__nptl_deallocate_tsd+158>:  jne    0x7fa4597e0c7e <__nptl_deallocate_tsd+110>
   0x7fa4597e0cb0 <__nptl_deallocate_tsd+160>:  add    $0x1,%r12
   0x7fa4597e0cb4 <__nptl_deallocate_tsd+164>:  add    $0x200,%r13
   0x7fa4597e0cbb <__nptl_deallocate_tsd+171>:  cmp    $0x20,%r12
   0x7fa4597e0cbf <__nptl_deallocate_tsd+175>:  jne    0x7fa4597e0c50 <__nptl_deallocate_tsd+64>
   0x7fa4597e0cc1 <__nptl_deallocate_tsd+177>:  xor    %eax,%eax
   0x7fa4597e0cc3 <__nptl_deallocate_tsd+179>:  mov    %fs:0x610,%al
   0x7fa4597e0ccb <__nptl_deallocate_tsd+187>:  test   %al,%al
(gdb) break *0x7fa4597e0ca0   <- This is the address of `callq  *%rdx` above
Breakpoint 2 at 0x7fa4597e0ca0
(gdb) c
Continuing.

Breakpoint 2, 0x00007fa4597e0ca0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007fa45a26d7a0 in ?? () from ./bin/Debug/net5.0/linux-x64/native/NativeLibrary.so <- NativeLibrary.so is the interesting piece of information 
(gdb) c
Continuing.
[New Thread 0x7fa431ffb700 (LWP 48)]
[Thread 0x7fa433fff700 (LWP 44) exited]
[Switching to Thread 0x7fa4337fe700 (LWP 45)]

Breakpoint 1, 0x00007fa4597e0c10 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007fa4597e0c12 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) c
Continuing.

Breakpoint 2, 0x00007fa4597e0ca0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007fa45a26d7a0 in ?? () from ./bin/Debug/net5.0/linux-x64/native/NativeLibrary.so <- NativeLibrary.so is the interesting piece of information 

@dlewis-arcontech
Author

Breakpoint 2 does not seem to get hit:

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd ()
from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install cityvision-api-5.3.18.37989_Release-335.x86_64 glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libicu-50.2-3.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) x /50i $pc
=> 0x7ffff58b5bd0 <__nptl_deallocate_tsd>: xor %eax,%eax
0x7ffff58b5bd2 <__nptl_deallocate_tsd+2>: mov %fs:0x610,%al
0x7ffff58b5bda <__nptl_deallocate_tsd+10>: test %al,%al
0x7ffff58b5bdc <__nptl_deallocate_tsd+12>: je 0x7ffff58b5d31 <__nptl_deallocate_tsd+353>
0x7ffff58b5be2 <__nptl_deallocate_tsd+18>: push %r15
0x7ffff58b5be4 <__nptl_deallocate_tsd+20>: push %r14
0x7ffff58b5be6 <__nptl_deallocate_tsd+22>: mov $0x4,%r14d
0x7ffff58b5bec <__nptl_deallocate_tsd+28>: push %r13
0x7ffff58b5bee <__nptl_deallocate_tsd+30>: push %r12
0x7ffff58b5bf0 <__nptl_deallocate_tsd+32>: push %rbp
0x7ffff58b5bf1 <__nptl_deallocate_tsd+33>: push %rbx
0x7ffff58b5bf2 <__nptl_deallocate_tsd+34>: sub $0x8,%rsp
0x7ffff58b5bf6 <__nptl_deallocate_tsd+38>: movb $0x0,%fs:0x610
0x7ffff58b5bff <__nptl_deallocate_tsd+47>: lea 0x20f702(%rip),%r13 # 0x7ffff5ac5308 <__pthread_keys+8>
0x7ffff58b5c06 <__nptl_deallocate_tsd+54>: xor %r12d,%r12d
0x7ffff58b5c09 <__nptl_deallocate_tsd+57>: nopl 0x0(%rax)
0x7ffff58b5c10 <__nptl_deallocate_tsd+64>: mov %fs:0x510(,%r12,8),%rbx
0x7ffff58b5c19 <__nptl_deallocate_tsd+73>: test %rbx,%rbx
0x7ffff58b5c1c <__nptl_deallocate_tsd+76>: je 0x7ffff58b5c70 <__nptl_deallocate_tsd+160>
0x7ffff58b5c1e <__nptl_deallocate_tsd+78>: add $0x8,%rbx
0x7ffff58b5c22 <__nptl_deallocate_tsd+82>: mov %r13,%rbp
0x7ffff58b5c25 <__nptl_deallocate_tsd+85>: mov $0x20,%r15d
0x7ffff58b5c2b <__nptl_deallocate_tsd+91>: jmp 0x7ffff58b5c3e <__nptl_deallocate_tsd+110>
0x7ffff58b5c2d <__nptl_deallocate_tsd+93>: nopl (%rax)
0x7ffff58b5c30 <__nptl_deallocate_tsd+96>: add $0x10,%rbx
0x7ffff58b5c34 <__nptl_deallocate_tsd+100>: add $0x10,%rbp
0x7ffff58b5c38 <__nptl_deallocate_tsd+104>: sub $0x1,%r15
0x7ffff58b5c3c <__nptl_deallocate_tsd+108>: je 0x7ffff58b5c70 <__nptl_deallocate_tsd+160>
0x7ffff58b5c3e <__nptl_deallocate_tsd+110>: mov (%rbx),%rdi
0x7ffff58b5c41 <__nptl_deallocate_tsd+113>: test %rdi,%rdi
0x7ffff58b5c44 <__nptl_deallocate_tsd+116>: je 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c46 <__nptl_deallocate_tsd+118>: mov -0x8(%rbp),%rax
0x7ffff58b5c4a <__nptl_deallocate_tsd+122>: cmp %rax,-0x8(%rbx)
0x7ffff58b5c4e <__nptl_deallocate_tsd+126>: movq $0x0,(%rbx)
0x7ffff58b5c55 <__nptl_deallocate_tsd+133>: jne 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c57 <__nptl_deallocate_tsd+135>: mov 0x0(%rbp),%rdx
0x7ffff58b5c5b <__nptl_deallocate_tsd+139>: test %rdx,%rdx
0x7ffff58b5c5e <__nptl_deallocate_tsd+142>: je 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c60 <__nptl_deallocate_tsd+144>: callq *%rdx
0x7ffff58b5c62 <__nptl_deallocate_tsd+146>: add $0x10,%rbx
0x7ffff58b5c66 <__nptl_deallocate_tsd+150>: add $0x10,%rbp
0x7ffff58b5c6a <__nptl_deallocate_tsd+154>: sub $0x1,%r15
0x7ffff58b5c6e <__nptl_deallocate_tsd+158>: jne 0x7ffff58b5c3e <__nptl_deallocate_tsd+110>
0x7ffff58b5c70 <__nptl_deallocate_tsd+160>: add $0x1,%r12
0x7ffff58b5c74 <__nptl_deallocate_tsd+164>: add $0x200,%r13
0x7ffff58b5c7b <__nptl_deallocate_tsd+171>: cmp $0x20,%r12
0x7ffff58b5c7f <__nptl_deallocate_tsd+175>: jne 0x7ffff58b5c10 <__nptl_deallocate_tsd+64>
0x7ffff58b5c81 <__nptl_deallocate_tsd+177>: xor %eax,%eax
0x7ffff58b5c83 <__nptl_deallocate_tsd+179>: mov %fs:0x610,%al
0x7ffff58b5c8b <__nptl_deallocate_tsd+187>: test %al,%al
(gdb) break *0x7ffff58b5c60
Breakpoint 2 at 0x7ffff58b5c60
(gdb) c
Continuing.
[Switching to Thread 0x7fff73fff700 (LWP 6227)]

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd ()
from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bd2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7fffbceef700 (LWP 6225) exited]
[New Thread 0x7fff737fe700 (LWP 6307)]
[Thread 0x7fff73fff700 (LWP 6227) exited]
[Switching to Thread 0x7fffbe5dd700 (LWP 6147)]

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd ()
from /lib64/libpthread.so.0
(gdb) c
Continuing.
[Switching to Thread 0x7fffbe0b5700 (LWP 6188)]

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd ()
from /lib64/libpthread.so.0
(gdb) c
Continuing.
[Switching to Thread 0x7fffbdfce700 (LWP 6195)]

@dlewis-arcontech
Author

I ran the test again from the start with the same result. After breaking on __nptl_deallocate_tsd I find the address of <__nptl_deallocate_tsd+144>: callq *%rdx, then do break *address and continue, and it never hits the second breakpoint at __nptl_deallocate_tsd+144. I've hit continue over 10 times, right up to the segmentation fault, and breakpoint 2 is never hit.

@dlewis-arcontech
Author

A pretty clean run; I only had to continue 3 times before the segmentation fault:

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install cityvision-api-5.3.18.37989_Release-335.x86_64 glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libicu-50.2-3.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) x /50i $pc
=> 0x7ffff58b5bd0 <__nptl_deallocate_tsd>: xor %eax,%eax
0x7ffff58b5bd2 <__nptl_deallocate_tsd+2>: mov %fs:0x610,%al
0x7ffff58b5bda <__nptl_deallocate_tsd+10>: test %al,%al
0x7ffff58b5bdc <__nptl_deallocate_tsd+12>: je 0x7ffff58b5d31 <__nptl_deallocate_tsd+353>
0x7ffff58b5be2 <__nptl_deallocate_tsd+18>: push %r15
0x7ffff58b5be4 <__nptl_deallocate_tsd+20>: push %r14
0x7ffff58b5be6 <__nptl_deallocate_tsd+22>: mov $0x4,%r14d
0x7ffff58b5bec <__nptl_deallocate_tsd+28>: push %r13
0x7ffff58b5bee <__nptl_deallocate_tsd+30>: push %r12
0x7ffff58b5bf0 <__nptl_deallocate_tsd+32>: push %rbp
0x7ffff58b5bf1 <__nptl_deallocate_tsd+33>: push %rbx
0x7ffff58b5bf2 <__nptl_deallocate_tsd+34>: sub $0x8,%rsp
0x7ffff58b5bf6 <__nptl_deallocate_tsd+38>: movb $0x0,%fs:0x610
0x7ffff58b5bff <__nptl_deallocate_tsd+47>: lea 0x20f702(%rip),%r13 # 0x7ffff5ac5308 <__pthread_keys+8>
0x7ffff58b5c06 <__nptl_deallocate_tsd+54>: xor %r12d,%r12d
0x7ffff58b5c09 <__nptl_deallocate_tsd+57>: nopl 0x0(%rax)
0x7ffff58b5c10 <__nptl_deallocate_tsd+64>: mov %fs:0x510(,%r12,8),%rbx
0x7ffff58b5c19 <__nptl_deallocate_tsd+73>: test %rbx,%rbx
0x7ffff58b5c1c <__nptl_deallocate_tsd+76>: je 0x7ffff58b5c70 <__nptl_deallocate_tsd+160>
0x7ffff58b5c1e <__nptl_deallocate_tsd+78>: add $0x8,%rbx
0x7ffff58b5c22 <__nptl_deallocate_tsd+82>: mov %r13,%rbp
0x7ffff58b5c25 <__nptl_deallocate_tsd+85>: mov $0x20,%r15d
0x7ffff58b5c2b <__nptl_deallocate_tsd+91>: jmp 0x7ffff58b5c3e <__nptl_deallocate_tsd+110>
0x7ffff58b5c2d <__nptl_deallocate_tsd+93>: nopl (%rax)
0x7ffff58b5c30 <__nptl_deallocate_tsd+96>: add $0x10,%rbx
0x7ffff58b5c34 <__nptl_deallocate_tsd+100>: add $0x10,%rbp
0x7ffff58b5c38 <__nptl_deallocate_tsd+104>: sub $0x1,%r15
0x7ffff58b5c3c <__nptl_deallocate_tsd+108>: je 0x7ffff58b5c70 <__nptl_deallocate_tsd+160>
0x7ffff58b5c3e <__nptl_deallocate_tsd+110>: mov (%rbx),%rdi
0x7ffff58b5c41 <__nptl_deallocate_tsd+113>: test %rdi,%rdi
0x7ffff58b5c44 <__nptl_deallocate_tsd+116>: je 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c46 <__nptl_deallocate_tsd+118>: mov -0x8(%rbp),%rax
0x7ffff58b5c4a <__nptl_deallocate_tsd+122>: cmp %rax,-0x8(%rbx)
0x7ffff58b5c4e <__nptl_deallocate_tsd+126>: movq $0x0,(%rbx)
0x7ffff58b5c55 <__nptl_deallocate_tsd+133>: jne 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c57 <__nptl_deallocate_tsd+135>: mov 0x0(%rbp),%rdx
0x7ffff58b5c5b <__nptl_deallocate_tsd+139>: test %rdx,%rdx
0x7ffff58b5c5e <__nptl_deallocate_tsd+142>: je 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c60 <__nptl_deallocate_tsd+144>: callq *%rdx
0x7ffff58b5c62 <__nptl_deallocate_tsd+146>: add $0x10,%rbx
0x7ffff58b5c66 <__nptl_deallocate_tsd+150>: add $0x10,%rbp
0x7ffff58b5c6a <__nptl_deallocate_tsd+154>: sub $0x1,%r15
0x7ffff58b5c6e <__nptl_deallocate_tsd+158>: jne 0x7ffff58b5c3e <__nptl_deallocate_tsd+110>
0x7ffff58b5c70 <__nptl_deallocate_tsd+160>: add $0x1,%r12
0x7ffff58b5c74 <__nptl_deallocate_tsd+164>: add $0x200,%r13
0x7ffff58b5c7b <__nptl_deallocate_tsd+171>: cmp $0x20,%r12
0x7ffff58b5c7f <__nptl_deallocate_tsd+175>: jne 0x7ffff58b5c10 <__nptl_deallocate_tsd+64>
0x7ffff58b5c81 <__nptl_deallocate_tsd+177>: xor %eax,%eax
0x7ffff58b5c83 <__nptl_deallocate_tsd+179>: mov %fs:0x610,%al
0x7ffff58b5c8b <__nptl_deallocate_tsd+187>: test %al,%al
(gdb) break *0x7ffff58b5c60
Breakpoint 2 at 0x7ffff58b5c60
(gdb) c
Continuing.
[New Thread 0x7fff737fe700 (LWP 7052)]
[Thread 0x7fff73fff700 (LWP 7008) exited]
[Switching to Thread 0x7fff737fe700 (LWP 7052)]

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7fff737fe700 (LWP 7052) exited]
[Switching to Thread 0x7fffbceef700 (LWP 7007)]

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7fffbceef700 (LWP 7007) exited]
[New Thread 0x7fffbceef700 (LWP 7091)]
[New Thread 0x7fff737fe700 (LWP 7092)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbdf8f700 (LWP 6971)]
0x00007ffff61604b7 in WKS::gc_heap::make_unused_array (x=0x7fffc806b460 "\340\021\376\366\377\177", size=18446744073709112368, clearp=0, resetp=<optimized out>)
at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:27805
27805 /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp: No such file or directory.
(gdb)

@jkotas
Member

jkotas commented Feb 24, 2021

Could you please restart and try:

  • Set a breakpoint at __nptl_deallocate_tsd.
  • Once at __nptl_deallocate_tsd, keep doing si until you step beyond the offset +144, and share the transcript. I would like to know which one of the conditional branches is taken to skip the indirect call.
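For background on those conditional branches: POSIX specifies that a thread-specific data destructor is called at thread exit only when the key has a non-NULL destructor and the exiting thread holds a non-NULL value for that key. The tests in front of the indirect call presumably correspond to checks of this kind. The C++ sketch below exercises that rule with plain pthread calls; `on_thread_exit`, `worker`, `WorkerArgs`, and `count_destructor_calls` are hypothetical names, and the sketch says nothing about how the runtime registers its own key.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

std::atomic<int> g_destructor_calls{0};

// Destructor registered with the key. At thread exit it receives the
// exiting thread's non-null value for that key.
extern "C" void on_thread_exit(void* /*value*/) {
    g_destructor_calls.fetch_add(1);
}

struct WorkerArgs {
    pthread_key_t key;
    bool set_value;
};

// Thread body: optionally stores a non-null value under the key, then exits.
extern "C" void* worker(void* raw) {
    auto* args = static_cast<WorkerArgs*>(raw);
    if (args->set_value) {
        static int marker = 0;
        pthread_setspecific(args->key, &marker);
    }
    return nullptr;
}

// Runs one short-lived thread and reports how many times the key's
// destructor was invoked for it.
int count_destructor_calls(bool set_value, bool with_destructor) {
    g_destructor_calls.store(0);
    void (*destructor)(void*) = nullptr;
    if (with_destructor) {
        destructor = on_thread_exit;
    }
    WorkerArgs args{};
    args.set_value = set_value;
    int rc = pthread_key_create(&args.key, destructor);
    assert(rc == 0);
    pthread_t thread;
    rc = pthread_create(&thread, nullptr, worker, &args);
    assert(rc == 0);
    rc = pthread_join(thread, nullptr);
    assert(rc == 0);
    (void)rc;
    pthread_key_delete(args.key);
    return g_destructor_calls.load();
}
```

Calling `count_destructor_calls(true, true)` reports one destructor call, while leaving the value unset or registering no destructor reports none, so either condition is enough for the call to be skipped.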

@dlewis-arcontech
Author

It looks like the jump at <__nptl_deallocate_tsd+142>: je 0x7ffff58b5c30 is being taken; execution seems to loop back to <__nptl_deallocate_tsd+96>:

Breakpoint 1, 0x00007ffff58b5bd0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install cityvision-api-5.3.18.37989_Release-335.x86_64 glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libicu-50.2-3.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) x /50i $pc
=> 0x7ffff58b5bd0 <__nptl_deallocate_tsd>: xor %eax,%eax
0x7ffff58b5bd2 <__nptl_deallocate_tsd+2>: mov %fs:0x610,%al
0x7ffff58b5bda <__nptl_deallocate_tsd+10>: test %al,%al
0x7ffff58b5bdc <__nptl_deallocate_tsd+12>: je 0x7ffff58b5d31 <__nptl_deallocate_tsd+353>
0x7ffff58b5be2 <__nptl_deallocate_tsd+18>: push %r15
0x7ffff58b5be4 <__nptl_deallocate_tsd+20>: push %r14
0x7ffff58b5be6 <__nptl_deallocate_tsd+22>: mov $0x4,%r14d
0x7ffff58b5bec <__nptl_deallocate_tsd+28>: push %r13
0x7ffff58b5bee <__nptl_deallocate_tsd+30>: push %r12
0x7ffff58b5bf0 <__nptl_deallocate_tsd+32>: push %rbp
0x7ffff58b5bf1 <__nptl_deallocate_tsd+33>: push %rbx
0x7ffff58b5bf2 <__nptl_deallocate_tsd+34>: sub $0x8,%rsp
0x7ffff58b5bf6 <__nptl_deallocate_tsd+38>: movb $0x0,%fs:0x610
0x7ffff58b5bff <__nptl_deallocate_tsd+47>: lea 0x20f702(%rip),%r13 # 0x7ffff5ac5308 <__pthread_keys+8>
0x7ffff58b5c06 <__nptl_deallocate_tsd+54>: xor %r12d,%r12d
0x7ffff58b5c09 <__nptl_deallocate_tsd+57>: nopl 0x0(%rax)
0x7ffff58b5c10 <__nptl_deallocate_tsd+64>: mov %fs:0x510(,%r12,8),%rbx
0x7ffff58b5c19 <__nptl_deallocate_tsd+73>: test %rbx,%rbx
0x7ffff58b5c1c <__nptl_deallocate_tsd+76>: je 0x7ffff58b5c70 <__nptl_deallocate_tsd+160>
0x7ffff58b5c1e <__nptl_deallocate_tsd+78>: add $0x8,%rbx
0x7ffff58b5c22 <__nptl_deallocate_tsd+82>: mov %r13,%rbp
0x7ffff58b5c25 <__nptl_deallocate_tsd+85>: mov $0x20,%r15d
0x7ffff58b5c2b <__nptl_deallocate_tsd+91>: jmp 0x7ffff58b5c3e <__nptl_deallocate_tsd+110>
0x7ffff58b5c2d <__nptl_deallocate_tsd+93>: nopl (%rax)
0x7ffff58b5c30 <__nptl_deallocate_tsd+96>: add $0x10,%rbx
0x7ffff58b5c34 <__nptl_deallocate_tsd+100>: add $0x10,%rbp
0x7ffff58b5c38 <__nptl_deallocate_tsd+104>: sub $0x1,%r15
0x7ffff58b5c3c <__nptl_deallocate_tsd+108>: je 0x7ffff58b5c70 <__nptl_deallocate_tsd+160>
0x7ffff58b5c3e <__nptl_deallocate_tsd+110>: mov (%rbx),%rdi
0x7ffff58b5c41 <__nptl_deallocate_tsd+113>: test %rdi,%rdi
0x7ffff58b5c44 <__nptl_deallocate_tsd+116>: je 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c46 <__nptl_deallocate_tsd+118>: mov -0x8(%rbp),%rax
0x7ffff58b5c4a <__nptl_deallocate_tsd+122>: cmp %rax,-0x8(%rbx)
0x7ffff58b5c4e <__nptl_deallocate_tsd+126>: movq $0x0,(%rbx)
0x7ffff58b5c55 <__nptl_deallocate_tsd+133>: jne 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c57 <__nptl_deallocate_tsd+135>: mov 0x0(%rbp),%rdx
0x7ffff58b5c5b <__nptl_deallocate_tsd+139>: test %rdx,%rdx
0x7ffff58b5c5e <__nptl_deallocate_tsd+142>: je 0x7ffff58b5c30 <__nptl_deallocate_tsd+96>
0x7ffff58b5c60 <__nptl_deallocate_tsd+144>: callq *%rdx
0x7ffff58b5c62 <__nptl_deallocate_tsd+146>: add $0x10,%rbx
0x7ffff58b5c66 <__nptl_deallocate_tsd+150>: add $0x10,%rbp
0x7ffff58b5c6a <__nptl_deallocate_tsd+154>: sub $0x1,%r15
0x7ffff58b5c6e <__nptl_deallocate_tsd+158>: jne 0x7ffff58b5c3e <__nptl_deallocate_tsd+110>
0x7ffff58b5c70 <__nptl_deallocate_tsd+160>: add $0x1,%r12
0x7ffff58b5c74 <__nptl_deallocate_tsd+164>: add $0x200,%r13
0x7ffff58b5c7b <__nptl_deallocate_tsd+171>: cmp $0x20,%r12
---Type <return> to continue, or q <return> to quit---
0x7ffff58b5c7f <__nptl_deallocate_tsd+175>: jne 0x7ffff58b5c10 <__nptl_deallocate_tsd+64>
0x7ffff58b5c81 <__nptl_deallocate_tsd+177>: xor %eax,%eax
0x7ffff58b5c83 <__nptl_deallocate_tsd+179>: mov %fs:0x610,%al
0x7ffff58b5c8b <__nptl_deallocate_tsd+187>: test %al,%al
(gdb) si
0x00007ffff58b5bd2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bda in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bdc in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5be2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5be4 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5be6 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bec in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bee in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bf0 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bf1 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bf2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bf6 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5bff in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c06 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c09 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c10 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c19 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c1c in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c1e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c22 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c25 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c2b in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c3e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c41 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c44 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c46 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c4a in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c4e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c55 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c57 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c5b in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c5e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c30 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c34 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c38 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c3c in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c3e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c41 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c44 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c30 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c34 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c38 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c3c in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c3e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c41 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c44 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c30 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c34 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c38 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c3c in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c3e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c41 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c44 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb) si
0x00007ffff58b5c30 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
(gdb)
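A transcript like the one above is easier to read once each stepped address is converted to an offset from the start of the function. The following is a minimal sketch, not part of the original exchange: the helper name trace_offsets is hypothetical, and it assumes the si output format shown above, with the address in the first field.

```shell
#!/bin/sh
# Hypothetical helper: reads a gdb `si` transcript on stdin and prints the
# decimal offset of each stepped address relative to the function start ($1).
trace_offsets() {
    base=$1
    while read -r addr _rest; do
        case $addr in
            0x*) echo $(( $addr - $base )) ;;   # prompt lines such as "(gdb) si" are skipped
        esac
    done
}

# Two steps taken from the transcript above: the je at +142 jumps back to +96,
# so the indirect call at +144 is never reached.
printf '%s\n' \
    '(gdb) si' \
    '0x00007ffff58b5c5e in __nptl_deallocate_tsd () from /lib64/libpthread.so.0' \
    '(gdb) si' \
    '0x00007ffff58b5c30 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0' |
    trace_offsets 0x7ffff58b5bd0
```

Run as written, the demo prints 142 and then 96. Offset 144 never appears in the full transcript, which is what shows that the callq *%rdx was skipped.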

@jkotas
Member

jkotas commented Feb 25, 2021

Do you have custom steps to link arcontech_capi.so, or are you building with just dotnet publish /p:NativeLib=Shared ... as shown in the NativeLibrary sample?

PalAttachThread in my test app (https://github.com/dotnet/runtimelab/tree/feature/NativeAOT/samples/NativeLibrary with minor modifications):

   0x7f5f61db6180 <PalAttachThread(void*)>:     push   %rbp
   0x7f5f61db6181 <PalAttachThread(void*)+1>:   mov    %rsp,%rbp
   0x7f5f61db6184 <PalAttachThread(void*)+4>:   push   %r14
   0x7f5f61db6186 <PalAttachThread(void*)+6>:   push   %rbx
   0x7f5f61db6187 <PalAttachThread(void*)+7>:   mov    %rdi,%r14
   0x7f5f61db618a <PalAttachThread(void*)+10>:  lea    0x76ad7f(%rip),%rdi        # 0x7f5f62520f10
   0x7f5f61db6191 <PalAttachThread(void*)+17>:  callq  0x7f5f61d5c0a0 <__tls_get_addr@plt>
   0x7f5f61db6196 <PalAttachThread(void*)+22>:  mov    %rax,%rbx
   0x7f5f61db6199 <PalAttachThread(void*)+25>:  cmpb   $0x0,0x110(%rax)
   0x7f5f61db61a0 <PalAttachThread(void*)+32>:  je     0x7f5f61db61ae <PalAttachThread(void*)+46>
   0x7f5f61db61a2 <PalAttachThread(void*)+34>:  mov    %r14,0xf0(%rbx)
   0x7f5f61db61a9 <PalAttachThread(void*)+41>:  pop    %rbx
   0x7f5f61db61aa <PalAttachThread(void*)+42>:  pop    %r14
   0x7f5f61db61ac <PalAttachThread(void*)+44>:  pop    %rbp
   0x7f5f61db61ad <PalAttachThread(void*)+45>:  retq
   0x7f5f61db61ae <PalAttachThread(void*)+46>:  movb   $0x1,0x110(%rbx)
   0x7f5f61db61b5 <PalAttachThread(void*)+53>:  lea    0xf0(%rbx),%rsi
   0x7f5f61db61bc <PalAttachThread(void*)+60>:  lea    0xbed(%rip),%rdi        # 0x7f5f61db6db0 <TlsDestructionMonitor::~TlsDestructionMonitor()>
   0x7f5f61db61c3 <PalAttachThread(void*)+67>:  lea    0x769c96(%rip),%rdx        # 0x7f5f6251fe60
   0x7f5f61db61ca <PalAttachThread(void*)+74>:  callq  0x7f5f61dd78b0 <__cxa_thread_atexit>
   0x7f5f61db61cf <PalAttachThread(void*)+79>:  jmp    0x7f5f61db61a2 <PalAttachThread(void*)+34>

PalAttachThread from the crashdumps:

   0x7fc5d1151ad0 <PalAttachThread(void*)>:     push   %rbp
   0x7fc5d1151ad1 <PalAttachThread(void*)+1>:   mov    %rsp,%rbp
   0x7fc5d1151ad4 <PalAttachThread(void*)+4>:   push   %r14
   0x7fc5d1151ad6 <PalAttachThread(void*)+6>:   push   %rbx
   0x7fc5d1151ad7 <PalAttachThread(void*)+7>:   mov    %rdi,%r14
   0x7fc5d1151ada <PalAttachThread(void*)+10>:  lea    0xbb843f(%rip),%rdi        # 0x7fc5d1d09f20
   0x7fc5d1151ae1 <PalAttachThread(void*)+17>:  callq  0x7fc5d10f0e80 <__tls_get_addr@plt>
   0x7fc5d1151ae6 <PalAttachThread(void*)+22>:  mov    %rax,%rbx
   0x7fc5d1151ae9 <PalAttachThread(void*)+25>:  cmpb   $0x0,0x110(%rax)
   0x7fc5d1151af0 <PalAttachThread(void*)+32>:  je     0x7fc5d1151afe <PalAttachThread(void*)+46>
   0x7fc5d1151af2 <PalAttachThread(void*)+34>:  mov    %r14,0xf0(%rbx)
   0x7fc5d1151af9 <PalAttachThread(void*)+41>:  pop    %rbx
   0x7fc5d1151afa <PalAttachThread(void*)+42>:  pop    %r14
   0x7fc5d1151afc <PalAttachThread(void*)+44>:  pop    %rbp
   0x7fc5d1151afd <PalAttachThread(void*)+45>:  retq
   0x7fc5d1151afe <PalAttachThread(void*)+46>:  movb   $0x1,0x110(%rbx)
   0x7fc5d1151b05 <PalAttachThread(void*)+53>:  lea    0xf0(%rbx),%rsi
   0x7fc5d1151b0c <PalAttachThread(void*)+60>:  lea    0xbed(%rip),%rdi        # 0x7fc5d1152700 <TlsDestructionMonitor::~TlsDestructionMonitor()>
   0x7fc5d1151b13 <PalAttachThread(void*)+67>:  lea    0xbb7766(%rip),%rdx        # 0x7fc5d1d09280
   0x7fc5d1151b1a <PalAttachThread(void*)+74>:  callq  0x7fc5d10f0bb0 <__cxa_thread_atexit@plt>
   0x7fc5d1151b1f <PalAttachThread(void*)+79>:  jmp    0x7fc5d1151af2 <PalAttachThread(void*)+34>

One calls __cxa_thread_atexit and the other one calls __cxa_thread_atexit@plt. This seems to be causing, or at least related to, the problem.

@jkotas
Member

jkotas commented Feb 25, 2021

Also, my test app does not have a dependency on libstdc++.so.6, but arcontech_capi.so does have this dependency.

ldd NativeLibrary.so
        linux-vdso.so.1 =>  (0x00007ffebffd2000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f91ea672000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f91ea370000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f91ea168000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f91e9f52000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f91e9d36000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f91e9968000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f91eb216000)
ldd arcontech_capi.so
        linux-vdso.so.1 =>  (0x00007ffdd21bc000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1ca3bf4000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1ca39f0000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1ca36ee000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f1ca34e6000)
        libanl.so.1 => /lib64/libanl.so.1 (0x00007f1ca32e2000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1ca30cc000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1ca2eb0000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1ca2ae2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1ca5010000)

I would like to know where the dependency on libstdc++.so.6 for arcontech_capi.so is coming from, or how to replicate it in my sample app.
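The comparison of the two ldd listings can be scripted when several builds need checking. This is a minimal sketch under assumptions: the function name is hypothetical, and it only looks for the library name in ldd-style text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when ldd-style output on stdin lists libstdc++,
# i.e. the C++ runtime was linked dynamically rather than statically.
links_libstdcxx_dynamically() {
    grep -qF 'libstdc++.so'
}

# Demo on one line of the arcontech_capi.so listing quoted above.
sample='libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1ca3bf4000)'
if printf '%s\n' "$sample" | links_libstdcxx_dynamically; then
    echo "libstdc++ is linked dynamically"
else
    echo "no dynamic libstdc++ dependency"
fi
```

Against a real binary the same check would be ldd arcontech_capi.so | links_libstdcxx_dynamically.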

@dlewis-arcontech
Author

dlewis-arcontech commented Feb 25, 2021

Hi Jan,

No custom steps, it's built with:

dotnet publish Arcontech.DotNetAPI.NetStandard.Native.csproj /p:NativeLib=Shared -r rhel-x64 -c release /p:SelfContained=true

where the csproj is pretty standard: it has some company details in the copyright and is otherwise just a normal project (added spaces before and after < > otherwise the editor was flattening the text):

< Project Sdk="Microsoft.NET.Sdk" >

< PropertyGroup >
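< TargetFramework>net5.0< /TargetFramework>
< AssemblyName>arcontech_capi< /AssemblyName>
< RootNamespace>Arcontech.DotNetAPI< /RootNamespace>
< Platforms>AnyCPU;x64;x86< /Platforms>
< Authors / >
< PackageId>Arcontech Dot Net API< /PackageId>
< GenerateAssemblyInfo>false< /GenerateAssemblyInfo>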
< /PropertyGroup >

< PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|AnyCPU'" >
< AllowUnsafeBlocks>true< /AllowUnsafeBlocks>
< /PropertyGroup >

< PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|AnyCPU'" >
< AllowUnsafeBlocks>true< /AllowUnsafeBlocks>
< /PropertyGroup >

< ItemGroup >
< RdXmlFile Include="rd.xml" / >
< /ItemGroup >

< ItemGroup >
< PackageReference Include="Microsoft.DotNet.ILCompiler" Version="6.0.0-*" / >
< /ItemGroup >

< /Project >

@dlewis-arcontech
Author

The natively compiled AOT .so is used by some C/C++ code calling into the .so entry points; the natively compiled AOT .so then also loads a different natively compiled C++/C .so that we use for TCP/IP communication. So the stack is:

xxx executable (C/C++ classes that use the arcontech_capi.so)
arcontech_capi.so
feed64.so (C/C++ tcpip code that is used via C entry points by arcontech_capi.so)

I'll see if I can work out how the reference to libstdc++.so.6 is getting in there.

@dlewis-arcontech
Author

Hi Jan, would you know what namespaces to look for that'll cause the Native AOT to reference libstdc++.so over libc.so? I'm trying to figure it out but I can't see anything obvious yet. It might need breaking my project into smaller parts to eliminate but that might not be a quick job. Would you normally not expect libstdc++.so to be referenced?

@jkotas
Member

jkotas commented Feb 25, 2021

I would expect libstdc++ to be statically linked, like it is in the NativeLibrary.so sample. Could you please build the https://github.com/dotnet/runtimelab/tree/feature/NativeAOT/samples/NativeLibrary sample using the same steps and in the same VM you are building arcontech_capi.so, and then run ldd on the resulting NativeLibrary.so to see whether it has a dependency on libstdc++.so.6?

@jkotas
Member

jkotas commented Feb 25, 2021

would you know what namespaces to look for that'll cause the Native AOT to reference libstdc++.so

I do not think that the managed code makes a difference. I think it will be an environment problem, such as the version of clang installed on the machine. Speaking of which, what version of clang do you have?

The version that I got by running yum install clang is:

clang --version
    clang version 3.4.2 (tags/RELEASE_34/dot2-final)

@dlewis-arcontech
Author

dlewis-arcontech commented Feb 25, 2021

Hello Jan,

I've built the NativeLibrary as a Shared .so and got the same result:

$ ldd NativeLibrary.so
linux-vdso.so.1 => (0x00007ffee5f93000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1bd059d000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f1bd0399000)
libm.so.6 => /lib64/libm.so.6 (0x00007f1bd0097000)
librt.so.1 => /lib64/librt.so.1 (0x00007f1bcfe8f000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1bcfc79000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1bcfa5d000)
libc.so.6 => /lib64/libc.so.6 (0x00007f1bcf68f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1bd1032000)

I've added the build output here showing the llvm version:

$ /opt/ARCONTECH/.dotnet/dotnet publish /p:NativeLib=Shared -r rhel-x64 -c release /p:SelfContained=true
Microsoft (R) Build Engine version 16.8.3+39993bd9d for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

Determining projects to restore...
All projects are up-to-date for restore.
NativeLibrary -> /opt/ARCONTECH/git/runtimelab-feature-NativeAOT/samples/NativeLibrary/bin/release/net5.0/rhel-x64/NativeLibrary.dll
/opt/llvm-3.9.0/bin/clang
Generating compatible native code. To optimize for size or speed, visit https://aka.ms/OptimizeCoreRT
NativeLibrary -> /opt/ARCONTECH/git/runtimelab-feature-NativeAOT/samples/NativeLibrary/bin/release/net5.0/rhel-x64/publish/

@jkotas
Copy link
Member

jkotas commented Feb 25, 2021

Could you please try clang 3.4.2, which comes with CentOS 7.7.1908 by default, to see whether it makes a difference?

@dlewis-arcontech
Author

Hello Jan,

Reverted to 3.4.2, but still the same issue:

[arcon@Darren NativeLibrary]$ /opt/ARCONTECH/.dotnet/dotnet publish /p:NativeLib=Shared -r rhel-x64 -c release /p:SelfContained=true
Microsoft (R) Build Engine version 16.8.3+39993bd9d for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

Determining projects to restore...
All projects are up-to-date for restore.
NativeLibrary -> /opt/ARCONTECH/git/runtimelab-feature-NativeAOT/samples/NativeLibrary/bin/release/net5.0/rhel-x64/NativeLibrary.dll
/usr/bin/clang
Generating compatible native code. To optimize for size or speed, visit https://aka.ms/OptimizeCoreRT
NativeLibrary -> /opt/ARCONTECH/git/runtimelab-feature-NativeAOT/samples/NativeLibrary/bin/release/net5.0/rhel-x64/publish/
[arcon@Darren NativeLibrary]$ clang --version
clang version 3.4.2 (tags/RELEASE_34/dot2-final)
Target: x86_64-redhat-linux-gnu
Thread model: posix
[arcon@Darren NativeLibrary]$ cd ./bin/release/net5.0/rhel-x64/publish/
[arcon@Darren publish]$ ldd NativeLibrary.so
linux-vdso.so.1 => (0x00007fff0f9f6000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f9fb5078000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f9fb4e74000)
libm.so.6 => /lib64/libm.so.6 (0x00007f9fb4b72000)
librt.so.1 => /lib64/librt.so.1 (0x00007f9fb496a000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f9fb4754000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9fb4538000)
libc.so.6 => /lib64/libc.so.6 (0x00007f9fb416a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9fb5b0d000)

@dlewis-arcontech
Author

I've done a -v diag build of NativeLibrary to see what is going on, and I can see that

-lstdc++

is being included; I'm just not sure how it gets there.

clang "obj/release/net5.0/rhel-x64/native/NativeLibrary.o" -o "bin/release/net5.0/rhel-x64/native/NativeLibrary.so" -Wl,--version-script=obj/release/net5.0/rhel-x64/native/NativeLibrary.exports /opt/ARCONTECH/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/6.0.0-preview.2.21125.1/sdk/libbootstrapperdll.a /opt/ARCONTECH/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/6.0.0-preview.2.21125.1/sdk/libRuntime.a /opt/ARCONTECH/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/6.0.0-preview.2.21125.1/framework/libSystem.Native.a /opt/ARCONTECH/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/6.0.0-preview.2.21125.1/framework/libSystem.Globalization.Native.a /opt/ARCONTECH/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/6.0.0-preview.2.21125.1/framework/libSystem.IO.Compression.Native.a /opt/ARCONTECH/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/6.0.0-preview.2.21125.1/framework/libSystem.Net.Security.Native.a /opt/ARCONTECH/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/6.0.0-preview.2.21125.1/framework/libSystem.Security.Cryptography.Native.OpenSsl.a -g -Wl,-rpath,'$ORIGIN' -Wl,--as-needed -pthread -lstdc++ -ldl -lm -lz -lgssapi_krb5 -lrt -lanl -shared -Wl,--require-defined,CoreRT_StaticInitialization -Wl,--discard-all -Wl,--gc-sections (TaskId:119)

@jkotas
Member

jkotas commented Feb 25, 2021

-lstdc++ is coming from https://github.com/dotnet/runtimelab/blob/feature/NativeAOT/src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Unix.props#L69.

For some reason, it is causing stdc++ to be linked dynamically on your machine (leaving the reference to libstdc++.so.6), but statically on my machine (no reference to the libstdc++.so.6).

To make sure that we are following the right trail, could you please try the following?

  • Change the Add method in the NativeLibrary sample to:
public static int Add(int a, int b)
{
    for (;;) { new Thread(() => Console.WriteLine("Hello World!")).Start(); Thread.Sleep(1); }
}
  • Compile the sample using dotnet publish /p:NativeLib=Shared /p:SelfContained=true -r linux-x64
  • Compile the main executable that is part of the sample: clang LoadLibrary.c -ldl
  • Run it: ./a.out

Does it run fine printing "Hello World" forever, or does it also crash?

@jkotas
Member

jkotas commented Feb 25, 2021

Another experiment to try: compile an empty C program with verbose linker output using clang dummy.c -lstdc++ -Wl,-verbose. The verbose linker output should show where it finds libstdc++. In my case, it finds it here:

attempt to open /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so failed
attempt to open /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.a succeeded
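The verbose linker output is long, so it can help to filter it down to the libstdc++ file that was actually opened. A minimal sketch, assuming the "attempt to open ... succeeded" message format quoted above; the function name resolved_libstdcxx is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `-Wl,-verbose` linker output on stdin and prints
# each libstdc++ file that the linker opened successfully.
resolved_libstdcxx() {
    sed -n 's/^attempt to open \(.*libstdc++[^ ]*\) succeeded$/\1/p'
}

# Demo on the two lines quoted above; only the successful attempt is printed.
printf '%s\n' \
    'attempt to open /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so failed' \
    'attempt to open /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.a succeeded' |
    resolved_libstdcxx
```

On a real machine the input would come from clang dummy.c -lstdc++ -Wl,-verbose 2>&1 | resolved_libstdcxx. A path ending in .a indicates static linking, and one ending in .so indicates dynamic linking.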

@dlewis-arcontech
Author

Hello Jan, I've tried changing the Add method as you suggested and a.out just prints Hello World! forever. It's been running for a few minutes with no crash yet.

@dlewis-arcontech
Author

The dummy.c verbose compile produces the following matches for libstdc++:

attempt to open /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so succeeded
-lstdc++ (/usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so)
libm.so.6 needed by /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so
ld-linux-x86-64.so.2 needed by /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so
libgcc_s.so.1 needed by /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so

It seems to succeed with the .so and not look for the .a. The /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/ directory (no .a):

[arcon@Darren lib64]$ cd /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/
[arcon@Darren 4.8.5]$ ls -l *++*
lrwxrwxrwx. 1 root root 37 Nov 14 2019 libstdc++.so -> ../../../../lib64/libstdc++.so.6.0.19

And then the directory the symlink points to:

[arcon@Darren 4.8.5]$ cd ../../../../lib64/
[arcon@Darren lib64]$ ls -l *++*
lrwxrwxrwx. 1 root root 18 Sep 4 2019 libcdio++.so.0 -> libcdio++.so.0.0.2
-rwxr-xr-x. 1 root root 15472 Oct 30 2018 libcdio++.so.0.0.2
lrwxrwxrwx. 1 root root 20 Sep 4 2019 libconfig++.so.9 -> libconfig++.so.9.1.3
-rwxr-xr-x. 1 root root 95288 Jun 10 2014 libconfig++.so.9.1.3
lrwxrwxrwx. 1 root root 18 Sep 4 2019 libFLAC++.so.6 -> libFLAC++.so.6.3.0
-rwxr-xr-x. 1 root root 107384 Apr 1 2015 libFLAC++.so.6.3.0
lrwxrwxrwx. 1 root root 21 Sep 4 2019 libiso9660++.so.0 -> libiso9660++.so.0.0.0
-rwxr-xr-x. 1 root root 15368 Oct 30 2018 libiso9660++.so.0.0.0
lrwxrwxrwx. 1 root root 19 Sep 4 2019 libncurses++.so.5 -> libncurses++.so.5.9
-rwxr-xr-x. 1 root root 78520 Sep 6 2017 libncurses++.so.5.9
lrwxrwxrwx. 1 root root 20 Sep 4 2019 libncurses++w.so.5 -> libncurses++w.so.5.9
-rwxr-xr-x. 1 root root 78520 Sep 6 2017 libncurses++w.so.5.9
lrwxrwxrwx. 1 root root 19 Sep 4 2019 libplist++.so.3 -> libplist++.so.3.0.0
-rwxr-xr-x. 1 root root 61784 Aug 3 2017 libplist++.so.3.0.0
lrwxrwxrwx. 1 root root 19 Nov 14 2019 libstdc++.so.6 -> libstdc++.so.6.0.19
-rwxr-xr-x. 1 root root 991616 Aug 6 2019 libstdc++.so.6.0.19

@jkotas
Member

jkotas commented Feb 25, 2021

Ok, it means that the dynamic linking against libstdc++.so.6 alone is not the root cause of the problem. I will keep digging in the crash dumps...

@dlewis-arcontech
Author

Thanks. I'll see if I can work out what areas of the code might be provoking the issue. I can also try building on different Linux versions to see if there's any different behaviour.

@dlewis-arcontech
Author

Hi Jan, we've tried the code as a fresh checkout on a clean CentOS 7 VM, then rebuilt, and it still behaves the same. If we run the code using .NET Core it runs fine, but as a Native AOT build we get the same error as above. We've also retested Native AOT on Windows and that is stable. It seems to be a Linux-only Native AOT issue.

@jkotas
Member

jkotas commented Feb 27, 2021

Ok, I think I have figured it out:

  1. Your main program is compiled without threading support (the -pthread command line option for the compiler).
  2. This causes libstdc++.so to be initialized with the assumption that there will only ever be one thread.
  3. When your C# library (which uses threading) gets loaded later, it is bound against the already-initialized libstdc++.so. libstdc++.so is not re-initialized to assume multi-threading.
  4. Crash

I think either of these options will fix the problem - can you give it a try?

  • Compile your main program with threading support by adding the -pthread option for the compiler
    or
  • Compile your NativeAOT binary with a statically linked C++ runtime. The quick and dirty way is to rm /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.5/libstdc++.so in your VM. There are probably better ways to do that.

The standard .NET Core is compiled with a statically linked C++ runtime (the second option above), which is why it works fine.
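One rough way to confirm that the first option took effect is to check that the host executable itself now pulls in libpthread. This is only a sketch under assumptions: glibc older than 2.34 (where libpthread is still a separate library), a linker that does not drop the entry through --as-needed, and a made-up output name host. The function name is hypothetical as well.

```shell
#!/bin/sh
# Assumed build line for the host program (source file name taken from the
# NativeLibrary sample; the output name "host" is made up for this sketch):
#   clang -pthread LoadLibrary.c -ldl -o host

# Hypothetical helper: succeeds when ldd-style output on stdin lists libpthread.
links_libpthread() {
    grep -qF 'libpthread.so'
}

# Demo on one line of ldd output quoted earlier in this thread.
if printf '%s\n' 'libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f91e9d36000)' | links_libpthread; then
    echo "libpthread is listed"
else
    echo "libpthread is not listed: try rebuilding the host with -pthread"
fi
```

Against a real host binary the check would be ldd ./host | links_libpthread.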

jkotas added a commit to jkotas/runtimelab that referenced this issue Feb 27, 2021
- Add note about -pthread option on Unix (see dotnet#711 for details)
- Fix warnings
MichalStrehovsky pushed a commit that referenced this issue Mar 1, 2021
- Add note about -pthread option on Unix (see #711 for details)
- Fix warnings
@dlewis-arcontech
Author

Hello Jan, I've tried the -pthread option and can confirm it's working. Thanks for looking into this issue, sorry it turned out to be a config issue but it wasn't an obvious problem to track down. I'll carry on testing Native AOT with the -pthread build.

@jkotas
Member

jkotas commented Mar 1, 2021

Thank you for your cooperation with tracking this down! We will know what the problem is next time somebody hits this.

@jkotas jkotas closed this as completed Mar 1, 2021