-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CoreCLR freezes with no output on FreeBSD #4172
Comments
There are two things you can try now.
export PAL_API_LEVELS=1
export PAL_DBG_CHANNELS=+all.all:-VIRTUAL.TRACE This should give us some idea on how far it got before the hang. |
@janvorli I had already asked him to run with PAL_DBG_CHANNELS and evidently there was no output. I think the next step (since it also hangs in gdb due to the sigprocmask) is to run with a dtrace entry probe. |
I had some time to run the PAL test suite:
Didn't have time to build up the managed bits before bed. I'll check out the hang from the airport tomorrow. |
Thanks for looking into that @kangaroo. Appreciated. |
Here's the PAL trace: https://gist.github.com/kangaroo/2d1e1640a781eb499cf1 |
I eagerly await your fix. |
@janhenke Would you consider posting the compiled exe and depending DLLs/so-files for others to test as well? I don't have a Windows machine availble to build my own mscoree/mscorwks and I suspect that applies to thers who may be able to help too. Edit: I worked around this by sneakily coopying your files off the VM, but I guess this still applies to others. |
That exe was not meant to be distributed. For everyone please follow the freebsd instructions linked from the README.md in the project root. It contains instructions to download and compile a HelloWorld example. |
I'm able to attach using LLDB, but it seems I do have a lot to learn about it. lldb ./corerun
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set target.disable-stdio true
(lldb) process launch --stop-at-entry -- -c . HelloWorld.exe
Process 55773 launching
Process 55773 launched: './corerun' (x86_64)
Process 55773 stopped
(lldb) c
Process 55773 resuming
# hangs until CTRL-Z is pressed I thought if I disabled stdio, I should be able to command the debugger even while the program is ruunning (or in our case) acting up? |
@kangaroo I'm not sure if its a coincidence or not, but I see your PAL traces hangs after two things have happened:
Regarding shared memory, pthreads and FreeBSD, there was a comment on the CI-issue , also asking if there had been any changes on this front. I'm still curious when the original shmemory.cpp file was created, and under what conditions the comments there were written. It might be a coincidence. It might be our PAL pthread-code being setup incorrectly. But I also think it's worth double-checking. Anyone inside Microsoft with access to the original TFS-source able to check out when this file and comment was originally written? If for nothing else, just to leave that theory dead. @jkotas? Can you help with this? |
The original shmemory.cpp was created in 2001 and this comment was included in the initial version of the file. It was written in the context of state of the art FreeBSD at that time. |
@jkotas Thanks for the input. I guess that leaves that issue dead for now, which is good. Using some n00bed up LLDB-attempts, I managed to reproduce about the same things @kangaroo did in his PAL trace-logs: $ lldb ./corerun -- HelloWorld.exe
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set -- target.run-args "HelloWorld.exe"
(lldb) process launch
Process 97457 launching
Process 97457 launched: './corerun' (x86_64)
<CTRL-C>
Process 97457 stopped
* thread dotnet/coreclr#1: tid = 100516, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP
frame #0: 0x00000008014f78ca libthr.so.3
-> 0x8014f78ca: syscall
0x8014f78cc: retq
0x8014f78cd: nop
0x8014f78ce: nop
(lldb) l
406 {
407 const char* clrFilesPath;
408 const char* managedAssemblyPath;
409 const char** managedAssemblyArgv;
410 int managedAssemblyArgc;
411
412 if (!ParseArguments(
413 argc,
414 argv,
415 &clrFilesPath,
(lldb) thread backtrace
* thread dotnet/coreclr#1: tid = 100516, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP
* frame #0: 0x00000008014f78ca libthr.so.3
frame dotnet/coreclr#1: 0x00000008014f240d libthr.so.3`??? + 109
frame dotnet/coreclr#2: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
frame dotnet/coreclr#3: 0x00000008011e0969 libc.so.7`??? + 57
frame dotnet/coreclr#4: 0x00000008011cb593 libc.so.7`??? + 179
frame dotnet/coreclr#5: 0x00000008011ea736 libc.so.7`__calloc + 3734
frame dotnet/coreclr#6: 0x00000008014f24de libthr.so.3`??? + 318
frame dotnet/coreclr#7: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
frame dotnet/coreclr#8: 0x00000008011e0969 libc.so.7`??? + 57
frame dotnet/coreclr#9: 0x00000008011cb593 libc.so.7`??? + 179
frame dotnet/coreclr#10: 0x00000008011ea736 libc.so.7`__calloc + 3734
frame dotnet/coreclr#11: 0x00000008014f3bda libthr.so.3`??? + 394
frame dotnet/coreclr#12: 0x00000008014ebe3e libthr.so.3`_pthread_create + 110
frame dotnet/coreclr#13: 0x00000008021462e9 libcoreclrpal.so`CorUnix::InternalCreateThread(pThread=0x0000000801c20800, lpThreadAttributes=0x0000000000000000, dwStackSize=262144, lpStartAddress=0x0000000802125e00, lpParameter=0x0000000801c49280, dwCreationFlags=0, eThreadType=PalWorkerThread, lpThreadId=0x0000000801c49288, phThread=0x00007fffffffd0f0)(void*), void*, unsigned int, CorUnix::PalThreadType, unsigned int*, void**) + 3801 at thread.cpp:706
frame dotnet/coreclr#14: 0x000000080211fdca libcoreclrpal.so`CorUnix::CPalSynchronizationManager::StartWorker(pthrCurrent=0x0000000801c20800) + 346 at synchmanager.cpp:1592
frame dotnet/coreclr#15: 0x000000080211fc43 libcoreclrpal.so`CorUnix::CPalSynchMgrController::StartWorker(pthrCurrent=0x0000000801c20800) + 35 at synchmanager.cpp:112
frame dotnet/coreclr#16: 0x00000008020990da libcoreclrpal.so`Initialize(argc=1, argv=0x00007fffffffd4e0, flags=255) + 4090 at pal.cpp:449
frame dotnet/coreclr#17: 0x00000008020980ae libcoreclrpal.so`::PAL_Initialize(argc=1, argv=0x00007fffffffd4e0) + 46 at pal.cpp:143
frame dotnet/coreclr#18: 0x000000080209a263 libcoreclrpal.so`::PAL_InitializeCoreCLR(szExePath=0x0000000801c06040, szCoreCLRPath=0x0000000801c080e0, fStayInPAL=1) + 835 at pal.cpp:658
frame dotnet/coreclr#19: 0x000000080300dd6f libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c06040, coreClrPath=0x0000000801c080e0, appDomainFriendlyName=0x000000000040a832, propertyCount=5, propertyKeys=0x00007fffffffdad0, propertyValues=0x00007fffffffdaa0, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c060c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdbac) + 143 at unixinterface.cpp:129
frame dotnet/coreclr#20: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c06040, clrFilesAbsolutePath=0x0000000801c07040, managedAssemblyAbsolutePath=0x0000000801c060c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
frame dotnet/coreclr#21: 0x0000000000404a71 corerun`main(argc=2, argv=0x00007fffffffe660) + 2817 at corerun.cpp:479
frame dotnet/coreclr#22: 0x00000000004019af corerun`_start + 367
(lldb) Looking at that stack-trace, we have something to go by, including line-numbers in thread.cpp. The thread-attribute sent in (as far as I can see) is PRIVATE: #define PTHREAD_PROCESS_PRIVATE 0
#define PTHREAD_PROCESS_SHARED 1 Which also eliminates the issue mentioned in the CI-thread as a source of possible concern. We can see there's some thread-mutex action going on, which given the lack of CPU-usage nor other resource consumption may suggest a dead-lock, but how on earth can starting a simple managed assembly cause that? Anyway, trying to reproduce once again, I end up at the same place (two thread mutexes on top of each other), so for now that is my top candidate. So I guess this isn't much, but I think more than nothing, |
Looking up the code at the address of the thread being started, I got a bit further, but just barely: $ lldb ./corerun -- HelloWorld.exe
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set -- target.run-args "HelloWorld.exe"
(lldb) process launch
...
<CTRL-C>
frame dotnet/coreclr#13: 0x00000008021462e9 libcoreclrpal.so`CorUnix::InternalCreateThread(pThread=0x0000000801c20800, lpThreadAttributes=0x0000000000000000, dwStackSize=262144, lpStartAddress=0x0000000802125e00, lpParameter=0x0000000801c49280, dwCreationFlags=0, eThreadType=PalWorkerThread, lpThreadId=0x0000000801c49288, phThread=0x00007fffffffd0e0)(void*), void*, unsigned int, CorUnix::PalThreadType, unsigned int*, void**) + 3801 at thread.cpp:706
frame dotnet/coreclr#14: 0x000000080211fdca libcoreclrpal.so`CorUnix::CPalSynchronizationManager::StartWorker(pthrCurrent=0x0000000801c20800) + 346 at synchmanager.cpp:1592
frame dotnet/coreclr#15: 0x000000080211fc43
...
(lldb) image lookup -a 0x0000000802125e00
Address: libcoreclrpal.so[0x0000000000125e00] (libcoreclrpal.so..text + 1015536)
Summary: libcoreclrpal.so`CorUnix::CPalSynchronizationManager::WorkerThread(void*) at synchmanager.cpp:1886 This is the WorkedThread-method found in synchmanager.cpp, but from what I can tell, the thread haven't really been started yet, so I doubt that its contents is of too much interest... So we're pretty much stuck/deadlocked in synchmanager.cpp's StartWorker-method (but the previous stack-trace really told us that already). The strange thing IMO is that this is before we should have any threads to synchronize at all. I honestly find that pretty odd. (lldb) thread list
Process 2245 stopped
* thread dotnet/coreclr#1: tid = 100481, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP |
Could you also print backtrace of all threads? Using "bt all" command in the lldb. At that point, there should be just one thread, but I'd like to rule out an issue that by some weird way we end up creating many threads. |
Yup. One thread only: (lldb) bt all
* thread dotnet/coreclr#1: tid = 101525, 0x00000008014f78ca libthr.so.3, stop reason = signal SIGSTOP
* frame #0: 0x00000008014f78ca libthr.so.3
frame dotnet/coreclr#1: 0x00000008014f240d libthr.so.3`??? + 109
frame dotnet/coreclr#2: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
frame dotnet/coreclr#3: 0x00000008011e0969 libc.so.7`??? + 57
frame dotnet/coreclr#4: 0x00000008011cb593 libc.so.7`??? + 179
frame dotnet/coreclr#5: 0x00000008011ea736 libc.so.7`__calloc + 3734
frame dotnet/coreclr#6: 0x00000008014f24de libthr.so.3`??? + 318
frame dotnet/coreclr#7: 0x00000008014f1fca libthr.so.3`pthread_mutex_lock + 74
frame dotnet/coreclr#8: 0x00000008011e0969 libc.so.7`??? + 57
frame dotnet/coreclr#9: 0x00000008011cb593 libc.so.7`??? + 179
frame dotnet/coreclr#10: 0x00000008011ea736 libc.so.7`__calloc + 3734
frame dotnet/coreclr#11: 0x00000008014f3bda libthr.so.3`??? + 394
frame dotnet/coreclr#12: 0x00000008014ebe3e libthr.so.3`_pthread_create + 110
frame dotnet/coreclr#13: 0x00000008021462e9 libcoreclrpal.so`CorUnix::InternalCreateThread(pThread=0x0000000801c20800, lpThreadAttributes=0x0000000000000000, dwStackSize=262144, lpStartAddress=0x0000000802125e00, lpParameter=0x0000000801c49280, dwCreationFlags=0, eThreadType=PalWorkerThread, lpThreadId=0x0000000801c49288, phThread=0x00007fffffffd0e0)(void*), void*, unsigned int, CorUnix::PalThreadType, unsigned int*, void**) + 3801 at thread.cpp:706
frame dotnet/coreclr#14: 0x000000080211fdca libcoreclrpal.so`CorUnix::CPalSynchronizationManager::StartWorker(pthrCurrent=0x0000000801c20800) + 346 at synchmanager.cpp:1592
frame dotnet/coreclr#15: 0x000000080211fc43 libcoreclrpal.so`CorUnix::CPalSynchMgrController::StartWorker(pthrCurrent=0x0000000801c20800) + 35 at synchmanager.cpp:112
frame dotnet/coreclr#16: 0x00000008020990da libcoreclrpal.so`Initialize(argc=1, argv=0x00007fffffffd4d0, flags=255) + 4090 at pal.cpp:449
frame dotnet/coreclr#17: 0x00000008020980ae libcoreclrpal.so`::PAL_Initialize(argc=1, argv=0x00007fffffffd4d0) + 46 at pal.cpp:143
frame dotnet/coreclr#18: 0x000000080209a263 libcoreclrpal.so`::PAL_InitializeCoreCLR(szExePath=0x0000000801c06040, szCoreCLRPath=0x0000000801c080e0, fStayInPAL=1) + 835 at pal.cpp:658
frame dotnet/coreclr#19: 0x000000080300dd6f libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c06040, coreClrPath=0x0000000801c080e0, appDomainFriendlyName=0x000000000040a832, propertyCount=5, propertyKeys=0x00007fffffffdac0, propertyValues=0x00007fffffffda90, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c060c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdb9c) + 143 at unixinterface.cpp:129
frame dotnet/coreclr#20: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c06040, clrFilesAbsolutePath=0x0000000801c07040, managedAssemblyAbsolutePath=0x0000000801c060c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
frame dotnet/coreclr#21: 0x0000000000404a71 corerun`main(argc=2, argv=0x00007fffffffe658) + 2817 at corerun.cpp:479
frame dotnet/coreclr#22: 0x00000000004019af corerun`_start + 367
(lldb) After trying this X number of times now, it consistently freezes at this point of code though, so there's definitely something weird going on in (or prior to) |
As an experiment - you can try passing in NULL into pthread_create. One of the possible reasons for crash like this one is that there is something wrong with the attributes used to create the thread - if it works, we just need to find out which attribute is source of the crash. |
Fixes issue #801 which was caused by the CoreCLR and PAL being dlopened, but pthread not being initialized by the runtime linker yet. FreeBSD requires that the main executable load pthread, rather than a downstream dependency. Also fixed a few missing calls to pthread_attr_init which was causing asserts() on FBSD.
That's fantastic @kangaroo. I can confirm this fixes the initial threading issue. I still have problems running Giving this another spin in LLDB gives us actual threads and from what I can see managed exceptions: lldb ./corerun -- -c . HelloWorld.exe
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set -- target.run-args "-c" "." "HelloWorld.exe"
(lldb) process launch
Process 54350 launching
Process 54350 launched: './corerun' (x86_64)
(lldb) c
error: Process is running. Use 'process interrupt' to pause execution.
Process 54350 stopped
* thread dotnet/coreclr#4: tid = 101566, 0x00000008013cc6ca libc.so.7`__sys_thr_kill + 10, stop reason = signal SIGUSR1
frame #0: 0x00000008013cc6ca libc.so.7`__sys_thr_kill + 10
libc.so.7`__sys_thr_kill:
-> 0x8013cc6ca <+10>: jb 0x8014a432c
0x8013cc6d0 <+16>: retq
0x8013cc6d1: nop
0x8013cc6d2: nop
(lldb) bt all
thread dotnet/coreclr#6: tid = 101567, 0x000000080083c8cc libthr.so.3
frame #0: 0x000000080083c8cc libthr.so.3
frame dotnet/coreclr#1: 0x000000080083ad8e libthr.so.3`??? + 606
frame dotnet/coreclr#2: 0x00000008021222f3 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(ptnwdNativeWaitData=0x0000000801c270f8, dwTimeout=4294967295, ptwrWakeupReason=0x00007fffff3f9494, pdwSignaledObject=0x00007fffff3f9490) + 691 at synchmanager.cpp:479
frame dotnet/coreclr#3: 0x00000008021214e6 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::BlockThread(this=0x0000000801c4d280, pthrCurrent=0x0000000801c27000, dwTimeout=4294967295, fAlertable=false, fIsSleep=false, ptwrWakeupReason=0x00007fffff3f96d8, pdwSignaledObject=0x00007fffff3f96ec) + 1686 at synchmanager.cpp:299
frame dotnet/coreclr#4: 0x0000000802135341 libcoreclrpal.so`CorUnix::InternalWaitForMultipleObjectsEx(pThread=0x0000000801c27000, nCount=1, lpHandles=0x0000000803f64928, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 4161 at wait.cpp:493
frame dotnet/coreclr#5: 0x00000008021361d1 libcoreclrpal.so`::WaitForMultipleObjectsEx(nCount=1, lpHandles=0x0000000803f64928, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 273 at wait.cpp:159
frame dotnet/coreclr#6: 0x000000080324d953 libcoreclr.so`FinalizerThread::WaitForFinalizerEvent(event=0x0000000801ce5418) + 387 at finalizerthread.cpp:509
frame dotnet/coreclr#7: 0x000000080324dd95 libcoreclr.so`FinalizerThread::FinalizerThreadWorker(args=0x00007fffff3f9d38) + 277 at finalizerthread.cpp:640
frame dotnet/coreclr#8: 0x0000000803194fa9 libcoreclr.so`ManagedThreadBase_DispatchInner(pCallState=0x00007fffff3f9d38) + 201 at threads.cpp:10043
frame dotnet/coreclr#9: 0x000000080319adf2 libcoreclr.so`ManagedThreadBase_DispatchMiddle(pCallState=0x00007fffff3f9d38) + 130 at threads.cpp:10094
frame dotnet/coreclr#10: 0x000000080319aa9d libcoreclr.so`ManagedThreadBase_DispatchOuter(this=0x00007fffff3f9c30, pParam=0x00007fffff3f9ce0)::$_5::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) const::TryArgs*) const::'lambda'(Param*)::operator()(Param*) const + 29 at threads.cpp:10348
frame dotnet/coreclr#11: 0x000000080319a8f8 libcoreclr.so`ManagedThreadBase_DispatchOuter(this=0x00007fffff3f9c98, pArgs=0x00007fffff3f9cb0)::$_5::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const + 40 at threads.cpp:10350
frame dotnet/coreclr#12: 0x0000000803194cd1 libcoreclr.so`ManagedThreadBase_DispatchOuter(pCallState=0x00007fffff3f9d38) + 225 at threads.cpp:10391
frame dotnet/coreclr#13: 0x0000000803194be0 libcoreclr.so`ManagedThreadBase_NoADTransition(pTarget=0x000000080324dc80, filterType=FinalizerThread)(void*), UnhandledExceptionLocation) + 80 at threads.cpp:10436
frame dotnet/coreclr#14: 0x0000000803194e6a libcoreclr.so`ManagedThreadBase::FinalizerBase(pTarget=0x000000080324dc80)(void*)) + 26 at threads.cpp:10462
frame dotnet/coreclr#15: 0x000000080324e5ff libcoreclr.so`FinalizerThread::FinalizerThreadStart(this=0x00007fffff3f9e60, __pEEParam=0x00007fffff3f9e70)::$_0::operator()(FinalizerThread::FinalizerThreadStart(void*)::__EEParam*) const + 95 at finalizerthread.cpp:834
frame dotnet/coreclr#16: 0x000000080324e3d8 libcoreclr.so`FinalizerThread::FinalizerThreadStart(args=0x0000000000000000) + 488 at finalizerthread.cpp:914
frame dotnet/coreclr#17: 0x0000000803187158 libcoreclr.so`Thread::intermediateThreadProc(arg=0x0000000801ce54a8) + 152 at threads.cpp:2861
frame dotnet/coreclr#18: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c27000) + 1401 at thread.cpp:1532
frame dotnet/coreclr#19: 0x00000008008314f5 libthr.so.3`??? + 277
thread dotnet/coreclr#4: tid = 101566, 0x000000080083c8cc libthr.so.3
frame #0: 0x000000080083c8cc libthr.so.3
frame dotnet/coreclr#1: 0x000000080083ad8e libthr.so.3`??? + 606
frame dotnet/coreclr#2: 0x00000008021222f3 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(ptnwdNativeWaitData=0x0000000801c268f8, dwTimeout=4294967295, ptwrWakeupReason=0x00007fffff5fa884, pdwSignaledObject=0x00007fffff5fa880) + 691 at synchmanager.cpp:479
frame dotnet/coreclr#3: 0x00000008021214e6 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::BlockThread(this=0x0000000801c4d280, pthrCurrent=0x0000000801c26800, dwTimeout=4294967295, fAlertable=false, fIsSleep=false, ptwrWakeupReason=0x00007fffff5faac8, pdwSignaledObject=0x00007fffff5faadc) + 1686 at synchmanager.cpp:299
frame dotnet/coreclr#4: 0x0000000802135341 libcoreclrpal.so`CorUnix::InternalWaitForMultipleObjectsEx(pThread=0x0000000801c26800, nCount=3, lpHandles=0x00007fffff5fadd0, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 4161 at wait.cpp:493
frame dotnet/coreclr#5: 0x00000008021361d1 libcoreclrpal.so`::WaitForMultipleObjectsEx(nCount=3, lpHandles=0x00007fffff5fadd0, bWaitAll=0, dwMilliseconds=4294967295, bAlertable=0) + 273 at wait.cpp:159
frame dotnet/coreclr#6: 0x00000008036d6b53 libcoreclr.so`DebuggerRCThread::MainLoop(this=0x0000000801d30018) + 419 at rcthread.cpp:1240
frame dotnet/coreclr#7: 0x00000008036d6952 libcoreclr.so`DebuggerRCThread::ThreadProc(this=0x0000000801d30018) + 1570 at rcthread.cpp:1043
frame dotnet/coreclr#8: 0x00000008036d5f82 libcoreclr.so`DebuggerRCThread::ThreadProcStatic((null)=0x0000000000000000) + 306 at rcthread.cpp:1639
frame dotnet/coreclr#9: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c26800) + 1401 at thread.cpp:1532
frame dotnet/coreclr#10: 0x00000008008314f5 libthr.so.3`??? + 277
thread dotnet/coreclr#5: tid = 101565, 0x000000080149b90a libc.so.7`_open + 10
frame #0: 0x000000080149b90a libc.so.7`_open + 10
frame dotnet/coreclr#1: 0x0000000800833745 libthr.so.3`__open + 213
frame dotnet/coreclr#2: 0x00000008036edfb3 libcoreclr.so`TwoWayPipe::WaitForConnection(this=0x0000000801ccb138) + 179 at twowaypipe.cpp:106
frame dotnet/coreclr#3: 0x00000008036dd9cd libcoreclr.so`DbgTransportSession::TransportWorker(this=0x0000000801ccafd8) + 349 at dbgtransportsession.cpp:1271
frame dotnet/coreclr#4: 0x00000008036db4b5 libcoreclr.so`DbgTransportSession::TransportWorkerStatic(pvContext=0x0000000801ccafd8) + 21 at dbgtransportsession.cpp:1188
frame dotnet/coreclr#5: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c26000) + 1401 at thread.cpp:1532
frame dotnet/coreclr#6: 0x00000008008314f5 libthr.so.3`??? + 277
thread dotnet/coreclr#2: tid = 101564, 0x00000008013cba8a libc.so.7`_sigwait + 10
frame #0: 0x00000008013cba8a libc.so.7`_sigwait + 10
frame dotnet/coreclr#1: 0x00000008008362ba libthr.so.3`sigwait + 90
frame dotnet/coreclr#2: 0x000000080206bd03 libcoreclrpal.so`ExternalSignalHandlerThreadRoutine((null)=0x0000000000000000) + 531 at signal.cpp:689
frame dotnet/coreclr#3: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c25800) + 1401 at thread.cpp:1532
frame dotnet/coreclr#4: 0x00000008008314f5 libthr.so.3`??? + 277
thread dotnet/coreclr#3: tid = 101563, 0x000000080148304a libc.so.7`__sys_kevent + 10
frame #0: 0x000000080148304a libc.so.7`__sys_kevent + 10
frame dotnet/coreclr#1: 0x0000000802129542 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ReadBytesFromProcessPipe(this=0x0000000801c4d280, iTimeout=-1, pRecvBuf=0x00007fffffbfdb57, iBytes=1) + 994 at synchmanager.cpp:2386
frame dotnet/coreclr#2: 0x000000080212811b libcoreclrpal.so`CorUnix::CPalSynchronizationManager::ReadCmdFromProcessPipe(this=0x0000000801c4d280, iPollTimeout=-1, pswcWorkerCmd=0x00007fffffbfdeac, pshridMarshaledData=0x00007fffffbfdea0, pdwData=0x00007fffffbfde9c) + 1195 at synchmanager.cpp:2199
frame dotnet/coreclr#3: 0x0000000802125c04 libcoreclrpal.so`CorUnix::CPalSynchronizationManager::WorkerThread(pArg=0x0000000801c4d280) + 132 at synchmanager.cpp:1903
frame dotnet/coreclr#4: 0x0000000802147039 libcoreclrpal.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000801c25000) + 1401 at thread.cpp:1532
frame dotnet/coreclr#5: 0x00000008008314f5 libthr.so.3`??? + 277
* thread dotnet/coreclr#1: tid = 100641, 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10, stop reason = signal SIGSTOP
* frame #0: 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10
frame dotnet/coreclr#1: 0x0000000800835c46 libthr.so.3`_sigprocmask + 38
frame dotnet/coreclr#2: 0x000000080259cfe8 libunwind.so.8`??? + 72
frame dotnet/coreclr#3: 0x000000080259e9be libunwind.so.8`??? + 14
frame dotnet/coreclr#4: 0x000000080259a9e6 libunwind.so.8`_ULx86_64_step + 54
frame dotnet/coreclr#5: 0x0000000802069a51 libcoreclrpal.so`PAL_VirtualUnwind(context=0x00007fffffffcde0, contextPointers=0x0000000000000000) + 129 at seh-unwind.cpp:171
frame dotnet/coreclr#6: 0x0000000803164fce libcoreclr.so`Thread::VirtualUnwindToFirstManagedCallFrame(pContext=0x00007fffffffcde0) + 78 at stackwalk.cpp:782
frame dotnet/coreclr#7: 0x0000000803246be9 libcoreclr.so`UnwindManagedExceptionPass1(ex=0x0000000801e97a78) + 89 at exceptionhandling.cpp:4481
frame dotnet/coreclr#8: 0x00000008032470b1 libcoreclr.so`DispatchManagedException(ex=0x0000000801e97a78) + 17 at exceptionhandling.cpp:4613
frame dotnet/coreclr#9: 0x0000000803060785 libcoreclr.so`CorHost2::ExecuteAssembly(this=0x0000000801c17528, dwAppDomainId=1, pwzAssemblyPath=0x0000000801eb3718, argc=0, argv=0x0000000000000000, pReturnValue=0x00007fffffffdb6c) + 1637 at corhost.cpp:1304
frame dotnet/coreclr#10: 0x000000080300e3b5 libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c16080, coreClrPath=0x0000000801c1a0e0, appDomainFriendlyName=0x000000000040a858, propertyCount=5, propertyKeys=0x00007fffffffda90, propertyValues=0x00007fffffffda60, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c160c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdb6c) + 1829 at unixinterface.cpp:196
frame dotnet/coreclr#11: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c16080, clrFilesAbsolutePath=0x0000000801c19040, managedAssemblyAbsolutePath=0x0000000801c160c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
frame dotnet/coreclr#12: 0x0000000000404a67 corerun`corerun(argc=4, argv=0x00007fffffffe648) + 2807 at corerun.cpp:479
frame dotnet/coreclr#13: 0x0000000000404b42 corerun`main(argc=4, argv=0x00007fffffffe648) + 34 at corerun.cpp:490
frame dotnet/coreclr#14: 0x00000000004019af corerun`_start + 367
(lldb) thread backtrace
* thread dotnet/coreclr#1: tid = 100641, 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10, stop reason = signal SIGSTOP
* frame #0: 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10
frame dotnet/coreclr#1: 0x0000000800835c46 libthr.so.3`_sigprocmask + 38
frame dotnet/coreclr#2: 0x000000080259cfe8 libunwind.so.8`??? + 72
frame dotnet/coreclr#3: 0x000000080259e9be libunwind.so.8`??? + 14
frame dotnet/coreclr#4: 0x000000080259a9e6 libunwind.so.8`_ULx86_64_step + 54
frame dotnet/coreclr#5: 0x0000000802069a51 libcoreclrpal.so`PAL_VirtualUnwind(context=0x00007fffffffcde0, contextPointers=0x0000000000000000) + 129 at seh-unwind.cpp:171
frame dotnet/coreclr#6: 0x0000000803164fce libcoreclr.so`Thread::VirtualUnwindToFirstManagedCallFrame(pContext=0x00007fffffffcde0) + 78 at stackwalk.cpp:782
frame dotnet/coreclr#7: 0x0000000803246be9 libcoreclr.so`UnwindManagedExceptionPass1(ex=0x0000000801e97a78) + 89 at exceptionhandling.cpp:4481
frame dotnet/coreclr#8: 0x00000008032470b1 libcoreclr.so`DispatchManagedException(ex=0x0000000801e97a78) + 17 at exceptionhandling.cpp:4613
frame dotnet/coreclr#9: 0x0000000803060785 libcoreclr.so`CorHost2::ExecuteAssembly(this=0x0000000801c17528, dwAppDomainId=1, pwzAssemblyPath=0x0000000801eb3718, argc=0, argv=0x0000000000000000, pReturnValue=0x00007fffffffdb6c) + 1637 at corhost.cpp:1304
frame dotnet/coreclr#10: 0x000000080300e3b5 libcoreclr.so`::ExecuteAssembly(exePath=0x0000000801c16080, coreClrPath=0x0000000801c1a0e0, appDomainFriendlyName=0x000000000040a858, propertyCount=5, propertyKeys=0x00007fffffffda90, propertyValues=0x00007fffffffda60, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x0000000801c160c0, entryPointAssemblyName=0x0000000000000000, entryPointTypeName=0x0000000000000000, entryPointMethodName=0x0000000000000000, exitCode=0x00007fffffffdb6c) + 1829 at unixinterface.cpp:196
frame dotnet/coreclr#11: 0x0000000000403d97 corerun`ExecuteManagedAssembly(currentExeAbsolutePath=0x0000000801c16080, clrFilesAbsolutePath=0x0000000801c19040, managedAssemblyAbsolutePath=0x0000000801c160c0, managedAssemblyArgc=0, managedAssemblyArgv=0x0000000000000000) + 4359 at corerun.cpp:366
frame dotnet/coreclr#12: 0x0000000000404a67 corerun`corerun(argc=4, argv=0x00007fffffffe648) + 2807 at corerun.cpp:479
frame dotnet/coreclr#13: 0x0000000000404b42 corerun`main(argc=4, argv=0x00007fffffffe648) + 34 at corerun.cpp:490
frame dotnet/coreclr#14: 0x00000000004019af corerun`_start + 367
(lldb) thread list
Process 54350 stopped
thread dotnet/coreclr#6: tid = 101567, 0x000000080083c8cc libthr.so.3
thread dotnet/coreclr#4: tid = 101566, 0x000000080083c8cc libthr.so.3
thread dotnet/coreclr#5: tid = 101565, 0x000000080149b90a libc.so.7`_open + 10
thread dotnet/coreclr#2: tid = 101564, 0x00000008013cba8a libc.so.7`_sigwait + 10
thread dotnet/coreclr#3: tid = 101563, 0x000000080148304a libc.so.7`__sys_kevent + 10
* thread dotnet/coreclr#1: tid = 100641, 0x00000008014a29ba libc.so.7`__sys_sigprocmask + 10, stop reason = signal SIGSTOP
(lldb) Does this mean we're now actually at a point where we're running managed code? That would be great, but I'm not sure what in HelloWorld.exe is capable of creating managed exceptions :) Edit: Also worth mentioning that it responds to SIGINT now, and don't need a SIGKILL to be aborted. |
@josteink Yes, there are issues with exceptions and unwinding. I didn't bother looking into them yet. You might want to try linking libunwind into unixcorerun as well and seeing if that helps. |
@kangaroo That didn't seem to help. In the debugger it seemed to stop at the same places. That said I've noticed startup is slow. Too slow (I think) for something like coreclr on this beefy hardware. Using @janvorli's trace-tricks, I got the following log. It's LONG so I've uploaded it as a separate file: http://invalid.ed.ntnu.no/~jostein/pallog.txt What I notice is that there's a million invocations of COMPlus code. That can't possibly be correct? Edit: The trace contains some really beefy meat. Look at the following entries:
So we got corerun trying to access COMPlus on FreeBSD combined with "Unable to load libc". That seems like quite significant show-stoppers. |
The COMPlusxxx messages mean that we were checking a config value from an environment variable. |
I can see that there was this change made for FreeBSD: #if defined(__APPLE__)
ShortAsciiName = "libc.dylib";
#elif defined(__FreeBSD__)
ShortAsciiName = "libc.so";
#else
ShortAsciiName = LIBC_SO;
#endif I wonder if the issue is that the libc.so is not the actual shared library, but a LD script pointing to the real one, e.g. libc.so.6. That would cause the problem and that's the reason why Linux uses the LIBC_SO that contains the real name.
|
@janvorli Bingo! $ find / -name 'libc.so*'
/lib/libc.so.7
/usr/lib/libc.so
$ I'll just try replacing the PAL-code with As for COMPlus... I didn't know that. I'll have to start bad-mouthing DCOM instead from now on :) Edit: YES!!! $ ./corerun HelloWorld.exe
04/27/2015 17:50:28
$ ./corerun HelloWorld.exe
04/27/2015 17:51:43
$ |
Now that we have a successful build, the runtime issues start. Testing our binaries on FreeBSD shows that we are not yet able to really run an application.
Running any assembly currently results in the corerun process simply freezing. It creates one line of empty output and does nothing more. CPU usage remains 0% and the process ignores SIGINT. You need to SIGKILL it to stop it.
We have to investigate the cause of it. I made a system call trace with truss, available here: https://gist.github.com/janhenke/618a1f0e297850694b4a
We have to find our if the problem comes from CoreCLR (PAL?) or CoreFX.
The text was updated successfully, but these errors were encountered: