Apparent GC deadlock in .NET Core 3.1 self-contained Kestrel app #51973

@loop-evgeny

Description

We have a .NET Core 3.1 server app that uses Kestrel to serve web requests. It froze suddenly, for no apparent reason, on 2021-04-23, and the native stack traces I captured with LLDB look like a GC deadlock. This is possibly related to #41958 (we also have some Grpc threads) or #47700 (we are also running on Ubuntu 18.04). This is the only time we have seen it happen, and the process is still running in its frozen state.

Configuration

Self-contained app built using .NET SDK 5.0.201, runtime 3.1.13.
Running on Ubuntu 18.04.5, kernel 4.15.0-65-generic, x64 architecture.

Other information

I can provide full stack traces and even a full dump privately, but the stack traces that make me think it's a GC deadlock are below:

  thread #67: tid = 3357, 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579, name = '...', stop reason = signal SIGSTOP
    frame #0: 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579
    frame #1: 0x00007fafea674392 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) + 530
    frame #2: 0x00007fafea2d01c5 libcoreclr.so`CrstBase::Enter() + 213
    frame #3: 0x00007fafea4519e2 libcoreclr.so`AppDomain::AssemblyIterator::Next(CollectibleAssemblyHolder<DomainAssembly*>*) + 34
    frame #4: 0x00007fafea45319b libcoreclr.so`AppDomain::EnumStaticGCRefs(void (*)(Object**, ScanContext*, unsigned int), ScanContext*) + 123
    frame #5: 0x00007fafea56ed4e libcoreclr.so`GCToEEInterface::GcScanRoots(void (*)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) + 126
    frame #6: 0x00007fafea515358 libcoreclr.so`SVR::gc_heap::background_mark_phase() + 280
    frame #7: 0x00007fafea5144e6 libcoreclr.so`SVR::gc_heap::gc1() + 438
    frame #8: 0x00007fafea52c76f libcoreclr.so`SVR::gc_heap::bgc_thread_function() + 191
    frame #9: 0x00007fafea5712b2 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) + 210
    frame #10: 0x00007fafea6833c8 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(void*) + 520
    frame #11: 0x00007fafebd7e6db libpthread.so.0`start_thread + 219
    frame #12: 0x00007fafeaf6471f libc.so.6`__GI___clone + 63 at clone.S:95
    ...

  thread #74: tid = 13634, 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579, name = '...', stop reason = signal SIGSTOP
    frame #0: 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579
    frame #1: 0x00007fafea679225 libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 309
    frame #2: 0x00007fafea678e34 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 388
    frame #3: 0x00007fafea67d44f libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) + 1743
    frame #4: 0x00007fafea67d6f9 libcoreclr.so`WaitForSingleObjectEx + 89
    frame #5: 0x00007fafea419724 libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 228
    frame #6: 0x00007fafea511dce libcoreclr.so`SVR::gc_heap::try_allocate_more_space(alloc_context*, unsigned long, unsigned int, int) + 110
    frame #7: 0x00007fafea534641 libcoreclr.so`SVR::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 177
    frame #8: 0x00007fafea3b1abb libcoreclr.so`AllocateObject(MethodTable*) + 203
    frame #9: 0x00007fafea3bd766 libcoreclr.so`JIT_New(CORINFO_CLASS_STRUCT_*) + 150
    ...

  thread #75: tid = 13637, 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579, name = '...', stop reason = signal SIGSTOP
    frame #0: 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579
    frame #1: 0x00007fafea679225 libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 309
    frame #2: 0x00007fafea678e34 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 388
    frame #3: 0x00007fafea67d44f libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) + 1743
    frame #4: 0x00007fafea67d6f9 libcoreclr.so`WaitForSingleObjectEx + 89
    frame #5: 0x00007fafea419724 libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 228
    frame #6: 0x00007fafea50fe6e libcoreclr.so`SVR::GCHeap::GarbageCollectGeneration(unsigned int, gc_reason) + 638
    frame #7: 0x00007fafea510e1d libcoreclr.so`SVR::gc_heap::trigger_gc_for_alloc(int, gc_reason, SVR::GCDebugSpinLock*, bool, SVR::msl_take_state) + 45
    frame #8: 0x00007fafea511fe7 libcoreclr.so`SVR::gc_heap::try_allocate_more_space(alloc_context*, unsigned long, unsigned int, int) + 647
    frame #9: 0x00007fafea5126dc libcoreclr.so`SVR::gc_heap::allocate_more_space(alloc_context*, unsigned long, unsigned int, int) + 300
    frame #10: 0x00007fafea531508 libcoreclr.so`SVR::gc_heap::allocate_large_object(unsigned long, unsigned int, long&) + 104
    frame #11: 0x00007fafea5343bd libcoreclr.so`SVR::GCHeap::AllocLHeap(unsigned long, unsigned int) + 45
    frame #12: 0x00007fafea3b0819 libcoreclr.so`AllocateSzArray(MethodTable*, int, GC_ALLOC_FLAGS, int) + 409
    frame #13: 0x00007fafea3be03b libcoreclr.so`JIT_NewArr1(CORINFO_CLASS_STRUCT_*, long) + 187
    ...

  thread #76: tid = 14196, 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579, name = '...', stop reason = signal SIGSTOP
    frame #0: 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579
    frame #1: 0x00007fafea679225 libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 309
    frame #2: 0x00007fafea678e34 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 388
    frame #3: 0x00007fafea67d44f libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) + 1743
    frame #4: 0x00007fafea67d6f9 libcoreclr.so`WaitForSingleObjectEx + 89
    frame #5: 0x00007fafea419724 libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 228
    frame #6: 0x00007fafea5014ac libcoreclr.so`SVR::gc_heap::wait_for_gc_done(int) + 92
    frame #7: 0x00007fafea56c5b1 libcoreclr.so`SVR::GCHeap::WaitUntilGCComplete(bool) + 33
    frame #8: 0x00007fafea370767 libcoreclr.so`ThreadpoolMgr::WorkerThreadStart(void*) + 743
    frame #9: 0x00007fafea6833c8 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(void*) + 520
    frame #10: 0x00007fafebd7e6db libpthread.so.0`start_thread + 219
    frame #11: 0x00007fafeaf6471f libc.so.6`__GI___clone + 63 at clone.S:95

  thread #77: tid = 14225, 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579, name = '...', stop reason = signal SIGSTOP
    frame #0: 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579
    frame #1: 0x00007fafea679225 libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 309
    frame #2: 0x00007fafea678e34 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 388
    frame #3: 0x00007fafea67d44f libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) + 1743
    frame #4: 0x00007fafea67d6f9 libcoreclr.so`WaitForSingleObjectEx + 89
    frame #5: 0x00007fafea419724 libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 228
    frame #6: 0x00007fafea41cb0b libcoreclr.so`Thread::RareDisablePreemptiveGC() + 427
    frame #7: 0x00007fafea41f517 libcoreclr.so`HandleGCSuspensionForInterruptedThread(_CONTEXT*) + 295
    frame #8: 0x00007fafea64f110 libcoreclr.so`inject_activation_handler(int, siginfo_t*, void*) + 128
    frame #9: 0x00007fafebd89980 libpthread.so.0`___lldb_unnamed_symbol1$$libpthread.so.0 + 1
    ...

(there are more threads with similar stack traces)
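To make the pattern in the traces above easier to see across many threads, here is a minimal Python sketch that reads LLDB backtrace text and reports, for each thread, the first frame above the low-level wait plumbing. The function name `blocking_frames`, the `WAIT_PLUMBING` prefix list, and the abbreviated `SAMPLE` excerpt are assumptions made for illustration; they are not part of the runtime or of LLDB, and the prefix list would likely need adjusting for other dumps.

```python
import re

# Matches lines such as "thread #67: tid = 3357, 0x... libpthread.so.0`..."
THREAD_RE = re.compile(r"^\s*thread #(\d+): tid = (\d+)")
# Matches lines such as "frame #2: 0x... libcoreclr.so`CrstBase::Enter() + 213"
FRAME_RE = re.compile(r"^\s*frame #\d+: 0x[0-9a-f]+ (\S+?)`(.+?)(?: \+ \d+.*)?$")

# Assumption: frames with these prefixes only implement the wait itself,
# so the first frame that is NOT one of them shows what the thread waits in.
WAIT_PLUMBING = (
    "__pthread_cond_wait",
    "CorUnix::",
    "WaitForSingleObjectEx",
    "CLREventBase::WaitEx",
    "CrstBase::Enter",
)


def blocking_frames(backtrace_text):
    """Map each LLDB thread number to its first non-plumbing function name."""
    result = {}
    current = None
    for line in backtrace_text.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and current not in result:
            symbol = frame.group(2)
            if not symbol.startswith(WAIT_PLUMBING):
                # Drop the argument list; fall back to the full symbol if empty.
                result[current] = symbol.split("(")[0] or symbol
    return result


# Abbreviated excerpt of two of the threads shown in this issue.
SAMPLE = r"""
  thread #67: tid = 3357, 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579, name = '...', stop reason = signal SIGSTOP
    frame #0: 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579
    frame #1: 0x00007fafea674392 libcoreclr.so`CorUnix::InternalEnterCriticalSection(CorUnix::CPalThread*, _CRITICAL_SECTION*) + 530
    frame #2: 0x00007fafea2d01c5 libcoreclr.so`CrstBase::Enter() + 213
    frame #3: 0x00007fafea4519e2 libcoreclr.so`AppDomain::AssemblyIterator::Next(CollectibleAssemblyHolder<DomainAssembly*>*) + 34
  thread #76: tid = 14196, 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579, name = '...', stop reason = signal SIGSTOP
    frame #0: 0x00007fafebd84ad3 libpthread.so.0`__pthread_cond_wait + 579
    frame #4: 0x00007fafea67d6f9 libcoreclr.so`WaitForSingleObjectEx + 89
    frame #5: 0x00007fafea419724 libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 228
    frame #6: 0x00007fafea5014ac libcoreclr.so`SVR::gc_heap::wait_for_gc_done(int) + 92
"""

if __name__ == "__main__":
    for thread_no, function in sorted(blocking_frames(SAMPLE).items()):
        print(f"thread #{thread_no}: {function}")
```

Run over the five traces above, this would single out thread #67 (the background GC thread, per its `bgc_thread_function` frame) as the one waiting to enter a critical section in `AppDomain::AssemblyIterator::Next`, while the other four wait on GC events. It does not identify which thread holds that critical section.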

Labels: area-GC-coreclr, untriaged (New issue has not been triaged by the area owner)
