GC deadlock under Linux (works in Windows) #47700

Closed
slav opened this issue Feb 1, 2021 · 5 comments
Labels
area-GC-coreclr tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly


slav commented Feb 1, 2021

Description

We're porting our application from Windows to Linux, and we're consistently running into what seems to be a deadlock in the GC.

I see a thread-pool thread trying to allocate more space, entering the GC, and then getting stuck.

 thread #12, name = '.NET ThreadPool', stop reason = signal SIGSTOP
    frame #0: 0x00007fca1aca600c libpthread.so.0`__pthread_cond_wait + 508
    frame #1: 0x00007fca1a0cf42b libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(ptnwdNativeWaitData=0x00007fc2a0085a60, dwTimeout=4294967295, ptwrWakeupReason=0x00007fc32fffdf04, pdwSignaledObject=0x00007fc32fffdf00) at synchmanager.cpp:478
    frame #2: 0x00007fca1a0cf0e1 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(this=0x000055fadcf4cbc0, pthrCurrent=0x00007fc2a00858c0, dwTimeout=4294967295, fAlertable=false, fIsSleep=<unavailable>, ptwrWakeupReason=0x00007fc32fffdf98, pdwSignaledObject=0x00007fc32fffdf9c) at synchmanager.cpp:301
    frame #3: 0x00007fca1a0d3ac2 libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(pThread=0x00007fc2a00858c0, nCount=<unavailable>, lpHandles=<unavailable>, bWaitAll=<unavailable>, dwMilliseconds=<unavailable>, bAlertable=NO, bPrioritize=NO) at wait.cpp:637
    frame #4: 0x00007fca1a0d3cf9 libcoreclr.so`::WaitForSingleObjectEx(hHandle=0x00000000000000f4, dwMilliseconds=4294967295, bAlertable=NO) at wait.cpp:138
    frame #5: 0x00007fca19e8a07a libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) [inlined] CLREventWaitHelper2(handle=<unavailable>, dwMilliseconds=<unavailable>, alertable=<unavailable>) at synch.cpp:376
    frame #6: 0x00007fca19e8a075 libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) [inlined] CLREventWaitHelper(this=<unavailable>, pParam=0x00007fc32fffe170)::$_1::operator()(CLREventWaitHelper(void*, unsigned int, int)::Param*) const at synch.cpp:401
    frame #7: 0x00007fca19e8a06c libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) at synch.cpp:403
    frame #8: 0x00007fca19e8a002 libcoreclr.so`CLREventBase::WaitEx(this=<unavailable>, dwMilliseconds=4294967295, mode=<unavailable>, syncState=0x0000000000000000) at synch.cpp:470
    frame #9: 0x00007fca19ed1622 libcoreclr.so`SVR::gc_heap::wait_for_gc_done(timeOut=-1) at gc.cpp:10860
    frame #10: 0x00007fca19ee4c0a libcoreclr.so`SVR::GCHeap::GarbageCollectGeneration(this=<unavailable>, gen=0, reason=reason_alloc_soh) at gc.cpp:37740
    frame #11: 0x00007fca19ee61a9 libcoreclr.so`SVR::gc_heap::trigger_gc_for_alloc(this=<unavailable>, gen_number=<unavailable>, gr=<unavailable>, msl=0x000055fadd0a00a0, loh_p=<unavailable>, take_state=<unavailable>) at gc.cpp:13862
    frame #12: 0x00007fca19ee7338 libcoreclr.so`SVR::gc_heap::try_allocate_more_space(this=<unavailable>, acontext=<unavailable>, size=<unavailable>, flags=<unavailable>, gen_number=<unavailable>) at gc.cpp:0
    frame #13: 0x00007fca19f142f1 libcoreclr.so`SVR::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] SVR::gc_heap::allocate_more_space(acontext=0x00007fc2a0090608, size=104, flags=0, alloc_generation_number=0) at gc.cpp:14461
    frame #14: 0x00007fca19f142c5 libcoreclr.so`SVR::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:14517
    frame #15: 0x00007fca19f142aa libcoreclr.so`SVR::GCHeap::Alloc(this=<unavailable>, context=0x00007fc2a0090608, size=98, flags=0) at gc.cpp:36745
    frame #16: 0x00007fca19df3ee6 libcoreclr.so`AllocateSzArray(MethodTable*, int, GC_ALLOC_FLAGS) at gchelpers.cpp:228
    frame #17: 0x00007fca19df3e7e libcoreclr.so`AllocateSzArray(pArrayMT=<unavailable>, cElements=74, flags=GC_ALLOC_NO_FLAGS) at gchelpers.cpp:0
    frame #18: 0x00007fca19e11f5f libcoreclr.so`JIT_NewArr1(arrayMT=0x00007fc9a0ba1458, size=74) at jithelpers.cpp:2723

Meanwhile, the Finalizer thread and all GC threads are also stuck in a similar manner:

thread #37, name = '.NET Server GC', stop reason = signal SIGSTOP
    frame #0: 0x00007fca1aca600c libpthread.so.0`__pthread_cond_wait + 508
    frame #1: 0x00007fca1a0cf42b libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(ptnwdNativeWaitData=0x000055fadd017ea0, dwTimeout=4294967295, ptwrWakeupReason=0x00007fca15a098b4, pdwSignaledObject=0x00007fca15a098b0) at synchmanager.cpp:478
    frame #2: 0x00007fca1a0cf0e1 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(this=0x000055fadcf4cbc0, pthrCurrent=0x000055fadd017d00, dwTimeout=4294967295, fAlertable=false, fIsSleep=<unavailable>, ptwrWakeupReason=0x00007fca15a09948, pdwSignaledObject=0x00007fca15a0994c) at synchmanager.cpp:301
    frame #3: 0x00007fca1a0d3ac2 libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(pThread=0x000055fadd017d00, nCount=<unavailable>, lpHandles=<unavailable>, bWaitAll=<unavailable>, dwMilliseconds=<unavailable>, bAlertable=NO, bPrioritize=NO) at wait.cpp:637
    frame #4: 0x00007fca1a0d3cf9 libcoreclr.so`::WaitForSingleObjectEx(hHandle=0x000000000000007c, dwMilliseconds=4294967295, bAlertable=NO) at wait.cpp:138
    frame #5: 0x00007fca19e8a07a libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) [inlined] CLREventWaitHelper2(handle=<unavailable>, dwMilliseconds=<unavailable>, alertable=<unavailable>) at synch.cpp:376
    frame #6: 0x00007fca19e8a075 libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) [inlined] CLREventWaitHelper(this=<unavailable>, pParam=0x00007fca15a09b20)::$_1::operator()(CLREventWaitHelper(void*, unsigned int, int)::Param*) const at synch.cpp:401
    frame #7: 0x00007fca19e8a06c libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) at synch.cpp:403
    frame #8: 0x00007fca19e8a002 libcoreclr.so`CLREventBase::WaitEx(this=<unavailable>, dwMilliseconds=4294967295, mode=<unavailable>, syncState=0x0000000000000000) at synch.cpp:470
    frame #9: 0x00007fca19f18fb6 libcoreclr.so`SVR::t_join::join(this=0x00007fca1a2b9100, gch=0x000055fadcfee150, join_id=4) at gc.cpp:830
    frame #10: 0x00007fca19efc1b6 libcoreclr.so`SVR::gc_heap::scan_dependent_handles(this=0x000055fadcfee150, condemned_gen_number=1, sc=0x00007fca15a09c10, initial_scan_p=NO) at gc.cpp:20493
    frame #11: 0x00007fca19eef7bb libcoreclr.so`SVR::gc_heap::mark_phase(this=0x000055fadcfee150, condemned_gen_number=1, mark_only_p=NO) at gc.cpp:20992
    frame #12: 0x00007fca19eeb2eb libcoreclr.so`SVR::gc_heap::gc1(this=0x000055fadcfee150) at gc.cpp:16692
    frame #13: 0x00007fca19ed87ec libcoreclr.so`SVR::gc_heap::garbage_collect(this=<unavailable>, n=<unavailable>) at gc.cpp:0
    frame #14: 0x00007fca19ed7500 libcoreclr.so`SVR::gc_heap::gc_thread_function(this=0x000055fadcfee150) at gc.cpp:5730
    frame #15: 0x00007fca19ed7066 libcoreclr.so`SVR::gc_heap::gc_thread_stub(arg=<unavailable>) at gc.cpp:26149
    frame #16: 0x00007fca19df28fe libcoreclr.so`(anonymous namespace)::CreateNonSuspendableThread(void (*)(void*), void*, char16_t const*)::$_1::__invoke(void*) [inlined] (anonymous namespace)::CreateNonSuspendableThread(this=<unavailable>, argument=<unavailable>)(void*), void*, char16_t const*)::$_1::operator()(void*) const at gcenv.ee.cpp:1444
    frame #17: 0x00007fca19df28bd libcoreclr.so`(anonymous namespace)::CreateNonSuspendableThread(argument=<unavailable>)(void*), void*, char16_t const*)::$_1::__invoke(void*) at gcenv.ee.cpp:1429
    frame #18: 0x00007fca1a0da7ae libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x000055fadd017d00) at thread.cpp:1845
    frame #19: 0x00007fca1ac9ffa3 libpthread.so.0`start_thread + 243
    frame #20: 0x00007fca1a8aa4cf libc.so.6`clone + 63

This happens pretty consistently for us.

Configuration

  • .NET 5.0.2
  • Debian 10
  • 16 core 128 GB ram Azure (L16)
  • We are running 2 instances of the app, both with GC in server mode, so there are 16 GC threads per process. This hasn't been an issue on Windows, though.

Regression?

This was already broken on .NET Core 3.1. We wanted to see whether .NET 5 fixed it, but there doesn't appear to be any difference.

Other information

I captured full stack traces and dumps. I understand those are needed to diagnose the issue, and I can send all of them privately.

My guess is that it might be a situation where GC is trying to suspend all threads for marking (all GC threads are in mark phase), but some thread is waiting on something else and is unable to get suspended? Not sure.
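
For readers unfamiliar with the pattern: the Server GC stack above is parked in SVR::t_join::join, a rendezvous point where every GC thread waits until the last one arrives. The sketch below is a conceptual Python illustration of such a rendezvous built on a condition variable. It is not the CLR's implementation; the names JoinBarrier and run_workers are hypothetical. It shows why a wake-up that is never delivered leaves a waiter blocked forever unless a timeout is used.

```python
import threading


class JoinBarrier:
    """Conceptual rendezvous: each caller blocks until all parties have arrived."""

    def __init__(self, parties):
        self._parties = parties
        self._arrived = 0
        self._generation = 0
        self._cond = threading.Condition()

    def join(self, timeout=None):
        """Return True once everyone has arrived, False if the wait timed out."""
        with self._cond:
            generation = self._generation
            self._arrived += 1
            if self._arrived == self._parties:
                # The last arriver resets the barrier and wakes all waiters.
                self._arrived = 0
                self._generation += 1
                self._cond.notify_all()
                return True
            # wait_for re-checks the predicate, so spurious wake-ups are harmless.
            # A wake-up that never arrives keeps the caller parked here; with
            # timeout=None that wait is unbounded.
            return self._cond.wait_for(
                lambda: self._generation != generation, timeout
            )


def run_workers(parties, timeout=5.0):
    """Start `parties` threads that meet at one barrier; return each join result."""
    barrier = JoinBarrier(parties)
    results = [None] * parties

    def worker(index):
        results[index] = barrier.join(timeout)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parties)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With all parties present, every join call returns True. If one party never shows up (or its wake-up is lost), the remaining callers only return because of the timeout, which is the situation the dwTimeout=4294967295 (infinite) waits in the traces cannot recover from.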

@dotnet-issue-labeler dotnet-issue-labeler bot added area-GC-coreclr untriaged New issue has not been triaged by the area owner labels Feb 1, 2021

ghost commented Feb 1, 2021

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Feb 1, 2021
@mangod9 mangod9 added this to the 6.0.0 milestone Feb 1, 2021
@janvorli janvorli self-assigned this Feb 3, 2021
Member

janvorli commented Feb 3, 2021

So I’ve found that it is a glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=25847. The reporter of the issue analyzed the problem very diligently and proposed a patch to glibc that fixes it. They hit it in a C# application too, and other people have reported hitting it in the Occam runtime and in Python. The proposed fix resolved it for all of them.
Ubuntu is also tracking this bug (https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1899800). It was fixed using the above-mentioned patch in Groovy (20.10), but not yet in Bionic (18.04) or Focal (20.04). Xenial (16.04) is not affected because it has an older glibc with a different implementation of condition variables.
glibc itself has not been fixed yet.

So I believe that if you could use a distro with older glibc (the bug was introduced in glibc 2.27) that has a different implementation of the cond variables (Ubuntu 16.04, CentOS 7, Debian 9, …), this problem should not occur.
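
To check whether a given machine falls into the range described above, a small script can read the glibc version and compare it against 2.27. The following is a minimal sketch; the helper names are hypothetical, and the only threshold used is the 2.27 figure from this comment. A version number alone cannot tell whether a distro has backported the patch, so a positive result means "possibly affected", not "affected".

```python
import os
import re

# From the discussion above: the bug was introduced in glibc 2.27.
FIRST_AFFECTED = (2, 27)


def parse_glibc_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.28'; None if absent."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def may_be_affected(version):
    """True when the version is at or above the first affected release."""
    return version is not None and version >= FIRST_AFFECTED


def detect_glibc_version():
    """Return the running glibc version, or None on non-glibc systems (e.g. musl)."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION") or "")
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    version = detect_glibc_version()
    print("glibc version:", version, "possibly affected:", may_be_affected(version))
```

On the distros listed above with an older glibc, may_be_affected should return False; on a musl-based system detect_glibc_version returns None, which is also treated as not affected.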

Author

slav commented Feb 8, 2021

So far the issue hasn't manifested under Ubuntu 16.04 or Debian 9. It seems the issue is outside of control of .NET dev teams and can be closed.

Member

janvorli commented Feb 8, 2021

@slav thank you for confirming that!

@janvorli janvorli closed this as completed Feb 8, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 11, 2021
@hoyosjs hoyosjs added the tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly label Apr 12, 2021
Member

hoyosjs commented Apr 13, 2021

This is a known issue in glibc 2.27+ in the work-stealing logic of the pthread condvars, tracked on their side under https://sourceware.org/bugzilla/show_bug.cgi?id=25847. As of now, no shipping OS has a patched glibc.

Possible workarounds:

  • Use Alpine, which uses musl libc instead of glibc.
  • Use a distro that ships a glibc version prior to 2.27. This includes Debian 9, CentOS 7, and Ubuntu 16.04, as janvorli pointed out.
  • If you are using Docker containers, this Dockerfile by @mthalman shows how to patch glibc for a 5.0 SDK container, which is based on Debian 10:
# Sample ASP.NET Core Dockerfile that builds glibc with the patch for https://sourceware.org/bugzilla/show_bug.cgi?id=25847
# The critical lines to use here are 6-22, 37.

FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build

RUN echo "deb-src http://deb.debian.org/debian buster main" >> /etc/apt/sources.list \
    && echo "deb-src http://security.debian.org/debian-security buster/updates main" >> /etc/apt/sources.list \
    && echo "deb-src http://deb.debian.org/debian buster-updates main" >> /etc/apt/sources.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends \
    dpkg-dev devscripts \
    && apt-get source glibc \
    && apt-get build-dep -y glibc \
    && cd /glibc-* \
    # Apply patch for https://sourceware.org/bugzilla/show_bug.cgi?id=25847
    && curl "https://sourceware.org/bugzilla/attachment.cgi?id=12484&action=diff&collapsed=&headers=1&format=raw" | \
        patch nptl/pthread_cond_wait.c \
    # Disable tests (some fail when run in a container)
    && sed -i 's/\(RUN_TESTSUITE = \)yes/\1no/' debian/rules \
    # Build glibc
    && debuild -b -uc -us \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /source

COPY *.sln .
COPY aspnetapp/*.csproj ./aspnetapp/
RUN dotnet restore

COPY aspnetapp/. ./aspnetapp/
WORKDIR /source/aspnetapp
RUN dotnet publish -c release -o /app --no-restore

FROM mcr.microsoft.com/dotnet/aspnet:5.0
WORKDIR /app
COPY --from=build /app ./
COPY --from=build /glibc-2.28/build-tree/amd64-libc/libc.so /lib/x86_64-linux-gnu/libc-2.28.so
ENTRYPOINT ["dotnet", "aspnetapp.dll"]
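
After the final COPY step, it can be useful to confirm which libc file the running process actually mapped. The sketch below is one possible way to do that on Linux by parsing the text of /proc/<pid>/maps; the function name is hypothetical, and the check only shows which file is loaded, not whether that file contains the patch.

```python
import os


def mapped_libc_paths(maps_text):
    """Return the distinct libc file paths listed in /proc/<pid>/maps content."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        if name == "libc.so.6" or (name.startswith("libc-") and name.endswith(".so")):
            paths.add(path)
    return sorted(paths)
```

Pass it the contents of /proc/self/maps, or of /proc/<pid>/maps for the dotnet process. With the Dockerfile above, the expected entry would be /lib/x86_64-linux-gnu/libc-2.28.so.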
