Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoreCLR freezes with no output on FreeBSD #4172

Closed
janhenke opened this issue Apr 24, 2015 · 24 comments · Fixed by dotnet/coreclr#838
Closed

CoreCLR freezes with no output on FreeBSD #4172

janhenke opened this issue Apr 24, 2015 · 24 comments · Fixed by dotnet/coreclr#838
Labels
os-freebsd FreeBSD OS

Comments

@janhenke
Copy link
Member

Now that we have a successful build, the runtime issues start. Testing our binaries on FreeBSD shows that we are not yet able to really run an application.

Running any assembly currently results in the corerun process simply freezing. It creates one line of empty output and does nothing more. CPU usage remains 0% and the process ignores SIGINT. You need to SIGKILL it to stop it.

We have to investigate the cause of it. I made a system call trace with truss, available here: https://gist.github.com/janhenke/618a1f0e297850694b4a
We have to find our if the problem comes from CoreCLR (PAL?) or CoreFX.

@janhenke
Copy link
Member Author

/cc @josteink @akoeplinger

@janvorli
Copy link
Member

There are two things you can try now.

  1. Run PAL test suite to see if there are any failing tests.
  2. Enable PAL tracing logs by setting the following env variables in the shell before launching the corerun:
export PAL_API_LEVELS=1
export PAL_DBG_CHANNELS=+all.all:-VIRTUAL.TRACE

This should give us some idea on how far it got before the hang.

@kangaroo
Copy link
Contributor

@janvorli I had already asked him to run with PAL_DBG_CHANNELS and evidently there was no output. I think the next step (since it also hangs in gdb due to the sigprocmask) is to run with a dtrace entry probe.

@kangaroo
Copy link
Contributor

I had some time to run the PAL test suite:

Finished running PAL tests.

The following test(s) failed:
threading/SleepEx/test2/paltest_sleepex_test2. Exit code: 1

PAL Test Results:
  Passed: 820
  Failed: 1

Didn't have time to build up the managed bits before bed. I'll check out the hang from the airport tomorrow.

@richlander
Copy link
Member

Thanks for looking into that @kangaroo. Appreciated.

@kangaroo
Copy link
Contributor

Here's the PAL trace: https://gist.github.com/kangaroo/2d1e1640a781eb499cf1

@RussellHaley
Copy link

I eagerly await your fix.

@josteink
Copy link
Member

@janhenke Would you consider posting the compiled exe and depending DLLs/so-files for others to test as well?

I don't have a Windows machine availble to build my own mscoree/mscorwks and I suspect that applies to thers who may be able to help too.

Edit: I worked around this by sneakily coopying your files off the VM, but I guess this still applies to others.

@janhenke
Copy link
Member Author

That exe was not meant to be distributed. For everyone please follow the freebsd instructions linked from the README.md in the project root. It contains instructions to download and compile a HelloWorld example.

@josteink
Copy link
Member

I'm able to attach using LLDB, but it seems I do have a lot to learn about it.

 lldb ./corerun
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set target.disable-stdio true
(lldb) process launch --stop-at-entry -- -c . HelloWorld.exe
Process 55773 launching
Process 55773 launched: './corerun' (x86_64)
Process 55773 stopped
(lldb) c
Process 55773 resuming
# hangs until CTRL-Z is pressed

I thought if I disabled stdio, I should be able to command the debugger even while the program is ruunning (or in our case) acting up?

@josteink
Copy link
Member

@kangaroo I'm not sure if its a coincidence or not, but I see your PAL traces hangs after two things have happened:

  1. Lots of shared-memory operations going in.
  2. PAL creating a thread.

Regarding shared memory, pthreads and FreeBSD, there was a comment on the CI-issue , also asking if there had been any changes on this front. I'm still curious when the original shmemory.cpp file was created, and under what conditions the comments there were written.

It might be a coincidence. It might be our PAL pthread-code being setup incorrectly. But I also think it's worth double-checking.

Anyone inside Microsoft with access to the original TFS-source able to check out when this file and comment was originally written? If for nothing else, just to leave that theory dead. @jkotas? Can you help with this?

@jkotas
Copy link
Member

jkotas commented Apr 26, 2015

The original shmemory.cpp was created in 2001 and this comment was included in the initial version of the file. It was written in the context of state of the art FreeBSD at that time.

@josteink
Copy link
Member

@jkotas Thanks for the input. I guess that leaves that issue dead for now, which is good.

Using some n00bed up LLDB-attempts, I managed to reproduce about the same things @kangaroo did in his PAL trace-logs:

$ lldb ./corerun -- HelloWorld.exe
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set -- target.run-args  "HelloWorld.exe"
(lldb) process launch
Process 97457 launching
Process 97457 launched: './corerun' (x86_64)
<CTRL-C>
Process 97457 stopped
* thread dotnet/coreclr#1: tid = 100516, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP
    frame #0: 0x00000008014f78ca libthr.so.3
->  0x8014f78ca: syscall 
    0x8014f78cc: retq   
    0x8014f78cd: nop    
    0x8014f78ce: nop    
(lldb) l
   406  {
   407      const char* clrFilesPath;
   408      const char* managedAssemblyPath;
   409      const char** managedAssemblyArgv;
   410      int managedAssemblyArgc;
   411  
   412      if (!ParseArguments(
   413              argc,
   414              argv,
   415              &clrFilesPath,
(lldb) thread backtrace
* thread dotnet/coreclr#1: tid = 100516, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP
  * frame #0: 0x00000008014f78ca libthr.so.3
    frame dotnet/coreclr#1: 0x00000008014f240d libthr.so.3`??? + 109
    frame dotnet/coreclr#2: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
    frame dotnet/coreclr#3: 0x00000008011e0969 libc.so.7`??? + 57
    frame dotnet/coreclr#4: 0x00000008011cb593 libc.so.7`??? + 179
    frame dotnet/coreclr#5: 0x00000008011ea736 libc.so.7`__calloc + 3734
    frame dotnet/coreclr#6: 0x00000008014f24de libthr.so.3`??? + 318
    frame dotnet/coreclr#7: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
    frame dotnet/coreclr#8: 0x00000008011e0969 libc.so.7`??? + 57
    frame dotnet/coreclr#9: 0x00000008011cb593 libc.so.7`??? + 179
    frame dotnet/coreclr#10: 0x00000008011ea736 libc.so.7`__calloc + 3734
    frame dotnet/coreclr#11: 0x00000008014f3bda libthr.so.3`??? + 394
    frame dotnet/coreclr#12: 0x00000008014ebe3e libthr.so.3`_pthread_create + 110
    frame dotnet/coreclr#13: 0x00000008021462e9 libcoreclrpal.so`CorUnix::InternalCreateThread(pThread=0x0000000801c20800, lpThreadAttributes=0x0000000000000000, dwStackSize=262144, lpStartAddress=0x0000000802125e00, lpParameter=0x0000000801c49280, dwCreationFlags=0, eThreadType=PalWorkerThread, lpThreadId=0x0000000801c49288, phThread=0x00007fffffffd0f0)(void*), void*, unsigned int, CorUnix::PalThreadType, unsigned int*, void**) + 3801 at thread.cpp:706
    frame dotnet/coreclr#14: 0x000000080211fdca libcoreclrpal.so`CorUnix::CPalSynchronizationManager::StartWorker(pthrCurrent=0x0000000801c20800) + 346 at synchmanager.cpp:1592
    frame dotnet/coreclr#15: 0x000000080211fc43 libcoreclrpal.so`CorUnix::CPalSynchMgrController::StartWorker(pthrCurrent=0x0000000801c20800) + 35 at synchmanager.cpp:112
    frame dotnet/coreclr#16: 0x00000008020990da libcoreclrpal.so`Initialize(argc=1, argv=0x00007fffffffd4e0, flags=255) + 4090 at pal.cpp:449
    frame dotnet/coreclr#17: 0x00000008020980ae libcoreclrpal.so`::PAL_Initialize(argc=1, argv=0x00007fffffffd4e0) + 46 at pal.cpp:143
    frame dotnet/coreclr#18: 0x000000080209a263 libcoreclrpal.so`::PAL_InitializeCoreCLR(szExePath=0x0000000801c06040, szCoreCLRPath=0x0000000801c080e0, fStayInPAL=1) + 835 at pal.cpp:658
    frame dotnet/coreclr#19: 0x000000080300dd6f libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c06040, coreClrPath=0x0000000801c080e0, appDomainFriendlyName=0x000000000040a832, propertyCount=5, propertyKeys=0x00007fffffffdad0, propertyValues=0x00007fffffffdaa0, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c060c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdbac) + 143 at unixinterface.cpp:129
    frame dotnet/coreclr#20: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c06040, clrFilesAbsolutePath=0x0000000801c07040, managedAssemblyAbsolutePath=0x0000000801c060c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
    frame dotnet/coreclr#21: 0x0000000000404a71 corerun`main(argc=2, argv=0x00007fffffffe660) + 2817 at corerun.cpp:479
    frame dotnet/coreclr#22: 0x00000000004019af corerun`_start + 367
(lldb) 

Looking at that stack-trace, we have something to go by, including line-numbers in thread.cpp.

The thread-attribute sent in (as far as I can see) is PRIVATE:

#define PTHREAD_PROCESS_PRIVATE     0
#define PTHREAD_PROCESS_SHARED      1   

Which also eliminates the issue mentioned in the CI-thread as a source of possible concern.

We can see there's some thread-mutex action going on, which given the lack of CPU-usage nor other resource consumption may suggest a dead-lock, but how on earth can starting a simple managed assembly cause that?

Anyway, trying to reproduce once again, I end up at the same place (two thread mutexes on top of each other), so for now that is my top candidate.

So I guess this isn't much, but I think more than nothing,

@josteink
Copy link
Member

Looking up the code at the address of the thread being started, I got a bit further, but just barely:

$ lldb ./corerun -- HelloWorld.exe
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set -- target.run-args  "HelloWorld.exe"
(lldb) process launch
...
<CTRL-C>
    frame dotnet/coreclr#13: 0x00000008021462e9 libcoreclrpal.so`CorUnix::InternalCreateThread(pThread=0x0000000801c20800, lpThreadAttributes=0x0000000000000000, dwStackSize=262144, lpStartAddress=0x0000000802125e00, lpParameter=0x0000000801c49280, dwCreationFlags=0, eThreadType=PalWorkerThread, lpThreadId=0x0000000801c49288, phThread=0x00007fffffffd0e0)(void*), void*, unsigned int, CorUnix::PalThreadType, unsigned int*, void**) + 3801 at thread.cpp:706
    frame dotnet/coreclr#14: 0x000000080211fdca libcoreclrpal.so`CorUnix::CPalSynchronizationManager::StartWorker(pthrCurrent=0x0000000801c20800) + 346 at synchmanager.cpp:1592
    frame dotnet/coreclr#15: 0x000000080211fc43
...
(lldb) image lookup -a 0x0000000802125e00
      Address: libcoreclrpal.so[0x0000000000125e00] (libcoreclrpal.so..text + 1015536)
      Summary: libcoreclrpal.so`CorUnix::CPalSynchronizationManager::WorkerThread(void*) at synchmanager.cpp:1886

This is the WorkedThread-method found in synchmanager.cpp, but from what I can tell, the thread haven't really been started yet, so I doubt that its contents is of too much interest...

So we're pretty much stuck/deadlocked in synchmanager.cpp's StartWorker-method (but the previous stack-trace really told us that already).

The strange thing IMO is that this is before we should have any threads to synchronize at all. I honestly find that pretty odd.

(lldb) thread list
Process 2245 stopped
* thread dotnet/coreclr#1: tid = 100481, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP

@janvorli
Copy link
Member

Could you also print backtrace of all threads? Using "bt all" command in the lldb. At that point, there should be just one thread, but I'd like to rule out an issue that by some weird way we end up creating many threads.

@josteink
Copy link
Member

Yup. One thread only:

(lldb) bt all
* thread dotnet/coreclr#1: tid = 101525, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP
  * frame #0: 0x00000008014f78ca libthr.so.3
    frame dotnet/coreclr#1: 0x00000008014f240d libthr.so.3`??? + 109
    frame dotnet/coreclr#2: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
    frame dotnet/coreclr#3: 0x00000008011e0969 libc.so.7`??? + 57
    frame dotnet/coreclr#4: 0x00000008011cb593 libc.so.7`??? + 179
    frame dotnet/coreclr#5: 0x00000008011ea736 libc.so.7`__calloc + 3734
    frame dotnet/coreclr#6: 0x00000008014f24de libthr.so.3`??? + 318
    frame dotnet/coreclr#7: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
    frame dotnet/coreclr#8: 0x00000008011e0969 libc.so.7`??? + 57
    frame dotnet/coreclr#9: 0x00000008011cb593 libc.so.7`??? + 179
    frame dotnet/coreclr#10: 0x00000008011ea736 libc.so.7`__calloc + 3734
    frame dotnet/coreclr#11: 0x00000008014f3bda libthr.so.3`??? + 394
    frame dotnet/coreclr#12: 0x00000008014ebe3e libthr.so.3`_pthread_create + 110
    frame dotnet/coreclr#13: 0x00000008021462e9 libcoreclrpal.so`CorUnix::InternalCreateThread(pThread=0x0000000801c20800, lpThreadAttributes=0x0000000000000000, dwStackSize=262144, lpStartAddress=0x0000000802125e00, lpParameter=0x0000000801c49280, dwCreationFlags=0, eThreadType=PalWorkerThread, lpThreadId=0x0000000801c49288, phThread=0x00007fffffffd0e0)(void*), void*, unsigned int, CorUnix::PalThreadType, unsigned int*, void**) + 3801 at thread.cpp:706
    frame dotnet/coreclr#14: 0x000000080211fdca libcoreclrpal.so`CorUnix::CPalSynchronizationManager::StartWorker(pthrCurrent=0x0000000801c20800) + 346 at synchmanager.cpp:1592
    frame dotnet/coreclr#15: 0x000000080211fc43 libcoreclrpal.so`CorUnix::CPalSynchMgrController::StartWorker(pthrCurrent=0x0000000801c20800) + 35 at synchmanager.cpp:112
    frame dotnet/coreclr#16: 0x00000008020990da libcoreclrpal.so`Initialize(argc=1, argv=0x00007fffffffd4d0, flags=255) + 4090 at pal.cpp:449
    frame dotnet/coreclr#17: 0x00000008020980ae libcoreclrpal.so`::PAL_Initialize(argc=1, argv=0x00007fffffffd4d0) + 46 at pal.cpp:143
    frame dotnet/coreclr#18: 0x000000080209a263 libcoreclrpal.so`::PAL_InitializeCoreCLR(szExePath=0x0000000801c06040, szCoreCLRPath=0x0000000801c080e0, fStayInPAL=1) + 835 at pal.cpp:658
    frame dotnet/coreclr#19: 0x000000080300dd6f libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c06040, coreClrPath=0x0000000801c080e0, appDomainFriendlyName=0x000000000040a832, propertyCount=5, propertyKeys=0x00007fffffffdac0, propertyValues=0x00007fffffffda90, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c060c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdb9c) + 143 at unixinterface.cpp:129
    frame dotnet/coreclr#20: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c06040, clrFilesAbsolutePath=0x0000000801c07040, managedAssemblyAbsolutePath=0x0000000801c060c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
    frame dotnet/coreclr#21: 0x0000000000404a71 corerun`main(argc=2, argv=0x00007fffffffe658) + 2817 at corerun.cpp:479
    frame dotnet/coreclr#22: 0x00000000004019af corerun`_start + 367
(lldb) 

After trying this X number of times now, it consistently freezes at this point of code though, so there's definitely something weird going on in (or prior to) StartWorker in synchmanager.cpp.

@jkotas
Copy link
Member

jkotas commented Apr 26, 2015

As an experiment - you can try passing in NULL into pthread_create. One of the possible reasons for crash like this one is that there is something wrong with the attributes used to create the thread - if it works, we just need to find out which attribute is source of the crash.

kangaroo referenced this issue in kangaroo/coreclr Apr 26, 2015
Fixes issue #801 which was caused by the CoreCLR and PAL being
dlopened, but pthread not being initialized by the runtime linker yet.
FreeBSD requires that the main executable load pthread, rather than a
downstream dependency.  Also fixed a few missing calls to pthread_attr_init
which was causing asserts() on FBSD.
@josteink
Copy link
Member

That's fantastic @kangaroo.

I can confirm this fixes the initial threading issue. I still have problems running HelloWorld.exe, but now it gets way past the WorkerThread spin-up in syncmanager, and we have lots of threads playing around by the time we break it in the debugger.

Giving this another spin in LLDB gives us actual threads and from what I can see managed exceptions:

lldb ./corerun -- -c . HelloWorld.exe 
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set -- target.run-args  "-c" "." "HelloWorld.exe"
(lldb) process launch
Process 54350 launching
Process 54350 launched: './corerun' (x86_64)
(lldb) c
error: Process is running.  Use 'process interrupt' to pause execution.
Process 54350 stopped
* thread dotnet/coreclr#4: tid = 101566, 0x00000008013cc6ca libc.so.7`__sys_thr_kill + 10, stop reason = signal SIGUSR1
    frame #0: 0x00000008013cc6ca libc.so.7`__sys_thr_kill + 10
libc.so.7`__sys_thr_kill:
->  0x8013cc6ca <+10>: jb     0x8014a432c
    0x8013cc6d0 <+16>: retq   
    0x8013cc6d1:       nop    
    0x8013cc6d2:       nop    
(lldb) bt all
  thread dotnet/coreclr#6: tid = 101567, 0x000000080083c8cc libthr.so.3
    frame #0: 0x000000080083c8cc libthr.so.3
    frame dotnet/coreclr#1: 0x000000080083ad8e libthr.so.3`??? + 606
    frame dotnet/coreclr#2: 0x00000008021222f3 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(ptnwdNativeWaitData=0x0000000801c270f8, dwTimeout=4294967295, ptwrWakeupReason=0x00007fffff3f9494, pdwSignaledObject=0x00007fffff3f9490) + 691 at synchmanager.cpp:479
    frame dotnet/coreclr#3: 0x00000008021214e6 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::BlockThread(this=0x0000000801c4d280, pthrCurrent=0x0000000801c27000, dwTimeout=4294967295, fAlertable=false, fIsSleep=false, ptwrWakeupReason=0x00007fffff3f96d8, pdwSignaledObject=0x00007fffff3f96ec) + 1686 at synchmanager.cpp:299
    frame dotnet/coreclr#4: 0x0000000802135341 libcoreclrpal.so`CorUnix::InternalWaitForMultipleObjectsEx(pThread=0x0000000801c27000, nCount=1, lpHandles=0x0000000803f64928, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 4161 at wait.cpp:493
    frame dotnet/coreclr#5: 0x00000008021361d1 libcoreclrpal.so`::WaitForMultipleObjectsEx(nCount=1, lpHandles=0x0000000803f64928, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 273 at wait.cpp:159
    frame dotnet/coreclr#6: 0x000000080324d953 libcoreclr.so`FinalizerThread::WaitForFinalizerEvent(event=0x0000000801ce5418) + 387 at finalizerthread.cpp:509
    frame dotnet/coreclr#7: 0x000000080324dd95 libcoreclr.so`FinalizerThread::FinalizerThreadWorker(args=0x00007fffff3f9d38) + 277 at finalizerthread.cpp:640
    frame dotnet/coreclr#8: 0x0000000803194fa9 libcoreclr.so`ManagedThreadBase_DispatchInner(pCallState=0x00007fffff3f9d38) + 201 at threads.cpp:10043
    frame dotnet/coreclr#9: 0x000000080319adf2 libcoreclr.so`ManagedThreadBase_DispatchMiddle(pCallState=0x00007fffff3f9d38) + 130 at threads.cpp:10094
    frame dotnet/coreclr#10: 0x000000080319aa9d libcoreclr.so`ManagedThreadBase_DispatchOuter(this=0x00007fffff3f9c30, pParam=0x00007fffff3f9ce0)::$_5::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) const::TryArgs*) const::'lambda'(Param*)::operator()(Param*) const + 29 at threads.cpp:10348
    frame dotnet/coreclr#11: 0x000000080319a8f8 libcoreclr.so`ManagedThreadBase_DispatchOuter(this=0x00007fffff3f9c98, pArgs=0x00007fffff3f9cb0)::$_5::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const + 40 at threads.cpp:10350
    frame dotnet/coreclr#12: 0x0000000803194cd1 libcoreclr.so`ManagedThreadBase_DispatchOuter(pCallState=0x00007fffff3f9d38) + 225 at threads.cpp:10391
    frame dotnet/coreclr#13: 0x0000000803194be0 libcoreclr.so`ManagedThreadBase_NoADTransition(pTarget=0x000000080324dc80, filterType=FinalizerThread)(void*), UnhandledExceptionLocation) + 80 at threads.cpp:10436
    frame dotnet/coreclr#14: 0x0000000803194e6a libcoreclr.so`ManagedThreadBase::FinalizerBase(pTarget=0x000000080324dc80)(void*)) + 26 at threads.cpp:10462
    frame dotnet/coreclr#15: 0x000000080324e5ff libcoreclr.so`FinalizerThread::FinalizerThreadStart(this=0x00007fffff3f9e60, __pEEParam=0x00007fffff3f9e70)::$_0::operator()(FinalizerThread::FinalizerThreadStart(void*)::__EEParam*) const + 95 at finalizerthread.cpp:834
    frame dotnet/coreclr#16: 0x000000080324e3d8 libcoreclr.so`FinalizerThread::FinalizerThreadStart(args=0x0000000000000000) + 488 at finalizerthread.cpp:914
    frame dotnet/coreclr#17: 0x0000000803187158 libcoreclr.so`Thread::intermediateThreadProc(arg=0x0000000801ce54a8) + 152 at threads.cpp:2861
    frame dotnet/coreclr#18: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c27000) + 1401 at thread.cpp:1532
    frame dotnet/coreclr#19: 0x00000008008314f5 libthr.so.3`??? + 277

  thread dotnet/coreclr#4: tid = 101566, 0x000000080083c8cc libthr.so.3
    frame #0: 0x000000080083c8cc libthr.so.3
    frame dotnet/coreclr#1: 0x000000080083ad8e libthr.so.3`??? + 606
    frame dotnet/coreclr#2: 0x00000008021222f3 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(ptnwdNativeWaitData=0x0000000801c268f8, dwTimeout=4294967295, ptwrWakeupReason=0x00007fffff5fa884, pdwSignaledObject=0x00007fffff5fa880) + 691 at synchmanager.cpp:479
    frame dotnet/coreclr#3: 0x00000008021214e6 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::BlockThread(this=0x0000000801c4d280, pthrCurrent=0x0000000801c26800, dwTimeout=4294967295, fAlertable=false, fIsSleep=false, ptwrWakeupReason=0x00007fffff5faac8, pdwSignaledObject=0x00007fffff5faadc) + 1686 at synchmanager.cpp:299
    frame dotnet/coreclr#4: 0x0000000802135341 libcoreclrpal.so`CorUnix::InternalWaitForMultipleObjectsEx(pThread=0x0000000801c26800, nCount=3, lpHandles=0x00007fffff5fadd0, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 4161 at wait.cpp:493
    frame dotnet/coreclr#5: 0x00000008021361d1 libcoreclrpal.so`::WaitForMultipleObjectsEx(nCount=3, lpHandles=0x00007fffff5fadd0, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 273 at wait.cpp:159
    frame dotnet/coreclr#6: 0x00000008036d6b53 libcoreclr.so`DebuggerRCThread::MainLoop(this=0x0000000801d30018) + 419 at rcthread.cpp:1240
    frame dotnet/coreclr#7: 0x00000008036d6952 libcoreclr.so`DebuggerRCThread::ThreadProc(this=0x0000000801d30018) + 1570 at rcthread.cpp:1043
    frame dotnet/coreclr#8: 0x00000008036d5f82 libcoreclr.so`DebuggerRCThread::ThreadProcStatic((null)=0x0000000000000000) + 306 at rcthread.cpp:1639
    frame dotnet/coreclr#9: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c26800) + 1401 at thread.cpp:1532
    frame dotnet/coreclr#10: 0x00000008008314f5 libthr.so.3`??? + 277

  thread dotnet/coreclr#5: tid = 101565, 0x000000080149b90a libc.so.7`_open + 10
    frame #0: 0x000000080149b90a libc.so.7`_open + 10
    frame dotnet/coreclr#1: 0x0000000800833745 libthr.so.3`__open + 213
    frame dotnet/coreclr#2: 0x00000008036edfb3 libcoreclr.so`TwoWayPipe::WaitForConnection(this=0x0000000801ccb138) + 179 at twowaypipe.cpp:106
    frame dotnet/coreclr#3: 0x00000008036dd9cd libcoreclr.so`DbgTransportSession::TransportWorker(this=0x0000000801ccafd8) + 349 at dbgtransportsession.cpp:1271
    frame dotnet/coreclr#4: 0x00000008036db4b5 libcoreclr.so`DbgTransportSession::TransportWorkerStatic(pvContext=0x0000000801ccafd8) + 21 at dbgtransportsession.cpp:1188
    frame dotnet/coreclr#5: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c26000) + 1401 at thread.cpp:1532
    frame dotnet/coreclr#6: 0x00000008008314f5 libthr.so.3`??? + 277

  thread dotnet/coreclr#2: tid = 101564, 0x00000008013cba8a libc.so.7`_sigwait + 10
    frame #0: 0x00000008013cba8a libc.so.7`_sigwait + 10
    frame dotnet/coreclr#1: 0x00000008008362ba libthr.so.3`sigwait + 90
    frame dotnet/coreclr#2: 0x000000080206bd03 libcoreclrpal.so`ExternalSignalHandlerThreadRoutine((null)=0x0000000000000000) + 531 at signal.cpp:689
    frame dotnet/coreclr#3: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c25800) + 1401 at thread.cpp:1532
    frame dotnet/coreclr#4: 0x00000008008314f5 libthr.so.3`??? + 277

  thread dotnet/coreclr#3: tid = 101563, 0x000000080148304a libc.so.7`__sys_kevent + 10
    frame #0: 0x000000080148304a libc.so.7`__sys_kevent + 10
    frame dotnet/coreclr#1: 0x0000000802129542 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ReadBytesFromProcessPipe(this=0x0000000801c4d280, iTimeout=-1, pRecvBuf=0x00007fffffbfdb57, iBytes=1) + 994 at synchmanager.cpp:2386
    frame dotnet/coreclr#2: 0x000000080212811b libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ReadCmdFromProcessPipe(this=0x0000000801c4d280, iPollTimeout=-1, pswcWorkerCmd=0x00007fffffbfdeac, pshridMarshaledData=0x00007fffffbfdea0, pdwData=0x00007fffffbfde9c) + 1195 at synchmanager.cpp:2199
    frame dotnet/coreclr#3: 0x0000000802125c04 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::WorkerThread(pArg=0x0000000801c4d280) + 132 at synchmanager.cpp:1903
    frame dotnet/coreclr#4: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c25000) + 1401 at thread.cpp:1532
    frame dotnet/coreclr#5: 0x00000008008314f5 libthr.so.3`??? + 277

* thread dotnet/coreclr#1: tid = 100641, 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10
    frame dotnet/coreclr#1: 0x0000000800835c46 libthr.so.3`_sigprocmask + 38
    frame dotnet/coreclr#2: 0x000000080259cfe8 libunwind.so.8`??? + 72
    frame dotnet/coreclr#3: 0x000000080259e9be libunwind.so.8`??? + 14
    frame dotnet/coreclr#4: 0x000000080259a9e6 libunwind.so.8`_ULx86_64_step + 54
    frame dotnet/coreclr#5: 0x0000000802069a51 libcoreclrpal.so`PAL_VirtualUnwind(context=0x00007fffffffcde0, contextPointers=0x0000000000000000) + 129 at seh-unwind.cpp:171
    frame dotnet/coreclr#6: 0x0000000803164fce libcoreclr.so`Thread::VirtualUnwindToFirstManagedCallFrame(pContext=0x00007fffffffcde0) + 78 at stackwalk.cpp:782
    frame dotnet/coreclr#7: 0x0000000803246be9 libcoreclr.so`UnwindManagedExceptionPass1(ex=0x0000000801e97a78) + 89 at exceptionhandling.cpp:4481
    frame dotnet/coreclr#8: 0x00000008032470b1 libcoreclr.so`DispatchManagedException(ex=0x0000000801e97a78) + 17 at exceptionhandling.cpp:4613
    frame dotnet/coreclr#9: 0x0000000803060785 libcoreclr.so`CorHost2::ExecuteAssembly(this=0x0000000801c17528, dwAppDomainId=1, pwzAssemblyPath=0x0000000801eb3718, argc=0, argv=0x0000000000000000, pReturnValue=0x00007fffffffdb6c) + 1637 at corhost.cpp:1304
    frame dotnet/coreclr#10: 0x000000080300e3b5 libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c16080, coreClrPath=0x0000000801c1a0e0, appDomainFriendlyName=0x000000000040a858, propertyCount=5, propertyKeys=0x00007fffffffda90, propertyValues=0x00007fffffffda60, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c160c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdb6c) + 1829 at unixinterface.cpp:196
    frame dotnet/coreclr#11: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c16080, clrFilesAbsolutePath=0x0000000801c19040, managedAssemblyAbsolutePath=0x0000000801c160c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
    frame dotnet/coreclr#12: 0x0000000000404a67 corerun`corerun(argc=4, argv=0x00007fffffffe648) + 2807 at corerun.cpp:479
    frame dotnet/coreclr#13: 0x0000000000404b42 corerun`main(argc=4, argv=0x00007fffffffe648) + 34 at corerun.cpp:490
    frame dotnet/coreclr#14: 0x00000000004019af corerun`_start + 367
(lldb) thread backtrace 
* thread dotnet/coreclr#1: tid = 100641, 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10
    frame dotnet/coreclr#1: 0x0000000800835c46 libthr.so.3`_sigprocmask + 38
    frame dotnet/coreclr#2: 0x000000080259cfe8 libunwind.so.8`??? + 72
    frame dotnet/coreclr#3: 0x000000080259e9be libunwind.so.8`??? + 14
    frame dotnet/coreclr#4: 0x000000080259a9e6 libunwind.so.8`_ULx86_64_step + 54
    frame dotnet/coreclr#5: 0x0000000802069a51 libcoreclrpal.so`PAL_VirtualUnwind(context=0x00007fffffffcde0, contextPointers=0x0000000000000000) + 129 at seh-unwind.cpp:171
    frame dotnet/coreclr#6: 0x0000000803164fce libcoreclr.so`Thread::VirtualUnwindToFirstManagedCallFrame(pContext=0x00007fffffffcde0) + 78 at stackwalk.cpp:782
    frame dotnet/coreclr#7: 0x0000000803246be9 libcoreclr.so`UnwindManagedExceptionPass1(ex=0x0000000801e97a78) + 89 at exceptionhandling.cpp:4481
    frame dotnet/coreclr#8: 0x00000008032470b1 libcoreclr.so`DispatchManagedException(ex=0x0000000801e97a78) + 17 at exceptionhandling.cpp:4613
    frame dotnet/coreclr#9: 0x0000000803060785 libcoreclr.so`CorHost2::ExecuteAssembly(this=0x0000000801c17528, dwAppDomainId=1, pwzAssemblyPath=0x0000000801eb3718, argc=0, argv=0x0000000000000000, pReturnValue=0x00007fffffffdb6c) + 1637 at corhost.cpp:1304
    frame dotnet/coreclr#10: 0x000000080300e3b5 libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c16080, coreClrPath=0x0000000801c1a0e0, appDomainFriendlyName=0x000000000040a858, propertyCount=5, propertyKeys=0x00007fffffffda90, propertyValues=0x00007fffffffda60, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c160c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdb6c) + 1829 at unixinterface.cpp:196
    frame dotnet/coreclr#11: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c16080, clrFilesAbsolutePath=0x0000000801c19040, managedAssemblyAbsolutePath=0x0000000801c160c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
    frame dotnet/coreclr#12: 0x0000000000404a67 corerun`corerun(argc=4, argv=0x00007fffffffe648) + 2807 at corerun.cpp:479
    frame dotnet/coreclr#13: 0x0000000000404b42 corerun`main(argc=4, argv=0x00007fffffffe648) + 34 at corerun.cpp:490
    frame dotnet/coreclr#14: 0x00000000004019af corerun`_start + 367
(lldb) thread list 
Process 54350 stopped
  thread dotnet/coreclr#6: tid = 101567, 0x000000080083c8cc libthr.so.3
  thread dotnet/coreclr#4: tid = 101566, 0x000000080083c8cc libthr.so.3
  thread dotnet/coreclr#5: tid = 101565, 0x000000080149b90a libc.so.7`_open + 10
  thread dotnet/coreclr#2: tid = 101564, 0x00000008013cba8a libc.so.7`_sigwait + 10
  thread dotnet/coreclr#3: tid = 101563, 0x000000080148304a libc.so.7`__sys_kevent + 10
* thread dotnet/coreclr#1: tid = 100641, 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10, stop reason = signal SIGSTOP
(lldb) 

Does this mean we're now actually at a point where we're running managed code? That would be great, but I'm not sure what in HelloWorld.exe is capable of creating managed exceptions :)

Edit: Also worth mentioning that it responds to SIGINT now, and don't need a SIGKILL to be aborted.

@kangaroo
Copy link
Contributor

@josteink Yes, there are issues with exceptions and unwinding. I didn't bother looking into them yet. You might want to try linking libunwind into unixcorerun as well and seeing if that helps.

@josteink
Copy link
Member

@kangaroo That didn't seem to help. In the debugger it seemed to stop at the same places.

That said I've noticed startup is slow. Too slow (I think) for something like coreclr on this beefy hardware.

Using @janvorli's trace-tricks, I got the following log. It's LONG so I've uploaded it as a separate file: http://invalid.ed.ntnu.no/~jostein/pallog.txt

What I notice is that there's a million invocations of COMPlus code. That can't possibly be correct?

Edit: The trace contains some really beefy meat. Look at the following entries:

{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.573: Allocated 0 TCHARs. Don't forget to call LocalFree to free the memory when done.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Growing the buffer.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding U to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Growing the buffer.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding n to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding a to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Growing the buffer.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding b to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding l to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding e to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding   to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Growing the buffer.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding t to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding o to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding   to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding l to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding o to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding a to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding d to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding   to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Growing the buffer.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding D to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding L to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding L to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding   to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding ' to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding l to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding i to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding b to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding c to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding ' to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding : to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.876: Adding   to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding E to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding r to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding r to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding o to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Growing the buffer.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding r to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding   to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding 1 to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding 2 to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding 6 to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding   to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding ( to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding E to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding x to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding c to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding e to the string.
{0x802406400-0x8028bffb0} TRACE [MISC   ] at FormatMessageW.809: Adding p to the string.

So we got corerun trying to access COMPlus on FreeBSD combined with "Unable to load libc". That seems like quite significant show-stoppers.

@janvorli
Copy link
Member

@josteink

What I notice is that there's a million invocations of COMPlus code. That can't possibly be correct?

The COMPlusxxx messages mean that we were checking a config value from an environment variable.
COMPlus is a historical name for .NET.

@janvorli
Copy link
Member

Unable to load libc

I can see that there was this change made for FreeBSD:

#if defined(__APPLE__)
        ShortAsciiName = "libc.dylib";
#elif defined(__FreeBSD__)
        ShortAsciiName = "libc.so";
#else
        ShortAsciiName = LIBC_SO;
#endif

I wonder if the issue is that the libc.so is not the actual shared library, but a LD script pointing to the real one, e.g. libc.so.6. That would cause the problem and that's the reason why Linux uses the LIBC_SO that contains the real name.
On Ubuntu, the libc.so file contains this:

/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/libc_nonshared.a  AS_NEEDED ( /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ) )

@josteink
Copy link
Member

@janvorli Bingo!

$ find / -name 'libc.so*'
/lib/libc.so.7
/usr/lib/libc.so
$

I'll just try replacing the PAL-code with libc.so.7 and see what happens. I'm curious if that's enough to get us going, or if there are more stuff waiting :)

As for COMPlus... I didn't know that. I'll have to start bad-mouthing DCOM instead from now on :)

Edit: YES!!!

$ ./corerun HelloWorld.exe
04/27/2015 17:50:28
$ ./corerun HelloWorld.exe
04/27/2015 17:51:43
$  

@RussellHaley
Copy link

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
os-freebsd FreeBSD OS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants