Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gen0_bricks_cleared flag needs to be propagated when we change heap count #90457

Merged
merged 2 commits into from
Aug 15, 2023

Conversation

Maoni0
Copy link
Member

@Maoni0 Maoni0 commented Aug 12, 2023

Fixes #90452

@cshung discovered that we are hitting the debug break in check_gen0_bricks -

00 (Inline Function) --------`--------     KERNELBASE!DebugBreak [minkernel\kernelbase\debug.c @ 143] 
01 00000015`38c7ef18 00007ff8`4080baa3     KERNELBASE!wil::details::DebugBreak+0x2 [onecore\internal\sdk\inc\wil\opensource\wil\result_macros.h @ 1737] 
02 (Inline Function) --------`--------     coreclr!GCToOSInterface::DebugBreak+0x6 [C:\Dev\runtime\src\coreclr\gc\windows\gcenv.windows.cpp @ 652] 
03 00000015`38c7ef20 00007ff8`4087cb33     coreclr!SVR::gc_heap::check_gen0_bricks+0x8f [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 12475] 
04 00000015`38c7ef50 00007ff8`4087bd00     coreclr!SVR::gc_heap::garbage_collect+0x57 [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 23687] 
05 00000015`38c7f290 00007ff8`4087ba1c     coreclr!SVR::gc_heap::gc_thread_function+0x2e0 [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 7119] 
06 00000015`38c7f6e0 00007ff8`408cc5c4     coreclr!SVR::gc_heap::gc_thread_stub+0x14c [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 36813] 
07 (Inline Function) --------`--------     coreclr!`anonymous-namespace'::CreateNonSuspendableThread::__l2::<lambda_510aaaafbe1c92975f9e29788eb5579d>::operator()+0x65 [C:\Dev\runtime\src\coreclr\vm\gcenv.ee.cpp @ 1490] 
08 00000015`38c7f8d0 00007ff8`f5f17614     coreclr!<lambda_510aaaafbe1c92975f9e29788eb5579d>::<lambda_invoker_cdecl>+0x74 [C:\Dev\runtime\src\coreclr\vm\gcenv.ee.cpp @ 1492] 
09 00000015`38c7f900 00007ff8`f68826b1     KERNEL32!BaseThreadInitThunk+0x14 [clientcore\base\win32\client\thread.c @ 64] 
0a 00000015`38c7f930 00000000`00000000     ntdll!RtlUserThreadStart+0x21 [minkernel\ntdll\rtlstrt.c @ 1153]

What happened was when we change the heap count, in heap X we get a region from heap Y and the gen0_bricks_cleared flag from Y says false but heap X says true. So when we check the bricks on heap X, we assume it’s true but it’s not. I can see this because the region check_gen0_bricks complains about is

0:013> .frame 3
03 00000070`6597f430 00007ffc`d8026aee     clrgc!SVR::gc_heap::check_gen0_bricks+0x65 [D:\runtime-datas\src\coreclr\gc\gc.cpp @ 12632] 
0:013> ?? gen0_region
class SVR::heap_segment * 0x000001b1`5340bed8

I added some dprintfs to print the gen0 regions on the old heaps with this flag as false and this is a region coming from heap#2 whose gen0_bricks_cleared flag was false -

[68756]trying to change heap count 24 -> 8
[68756]h2 g0 1b15340bed8 // print out gen0 regions from heaps whose gen0_bricks_cleared was false
[68756]g0: total budget: 62898032 budget per heap: 2620752
[68756]g1: total budget: 6283176 budget per heap: 261800
[68756]g2: total budget: 6283224 budget per heap: 261808
[68756]g3: total budget: 75497472 budget per heap: 3145728
[68756]g4: total budget: 75497472 budget per heap: 3145728

the fix is to detect if any heap has this flag as false and if so make all heaps’ flag false (tracking which region is moved from which other heap is something we need additional recording for and it’s not really worth doing just for this)

@ghost
Copy link

ghost commented Aug 12, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

@cshung discovered that we are hitting the debug break in check_gen0_bricks -

00 (Inline Function) --------`--------     KERNELBASE!DebugBreak [minkernel\kernelbase\debug.c @ 143] 
01 00000015`38c7ef18 00007ff8`4080baa3     KERNELBASE!wil::details::DebugBreak+0x2 [onecore\internal\sdk\inc\wil\opensource\wil\result_macros.h @ 1737] 
02 (Inline Function) --------`--------     coreclr!GCToOSInterface::DebugBreak+0x6 [C:\Dev\runtime\src\coreclr\gc\windows\gcenv.windows.cpp @ 652] 
03 00000015`38c7ef20 00007ff8`4087cb33     coreclr!SVR::gc_heap::check_gen0_bricks+0x8f [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 12475] 
04 00000015`38c7ef50 00007ff8`4087bd00     coreclr!SVR::gc_heap::garbage_collect+0x57 [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 23687] 
05 00000015`38c7f290 00007ff8`4087ba1c     coreclr!SVR::gc_heap::gc_thread_function+0x2e0 [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 7119] 
06 00000015`38c7f6e0 00007ff8`408cc5c4     coreclr!SVR::gc_heap::gc_thread_stub+0x14c [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 36813] 
07 (Inline Function) --------`--------     coreclr!`anonymous-namespace'::CreateNonSuspendableThread::__l2::<lambda_510aaaafbe1c92975f9e29788eb5579d>::operator()+0x65 [C:\Dev\runtime\src\coreclr\vm\gcenv.ee.cpp @ 1490] 
08 00000015`38c7f8d0 00007ff8`f5f17614     coreclr!<lambda_510aaaafbe1c92975f9e29788eb5579d>::<lambda_invoker_cdecl>+0x74 [C:\Dev\runtime\src\coreclr\vm\gcenv.ee.cpp @ 1492] 
09 00000015`38c7f900 00007ff8`f68826b1     KERNEL32!BaseThreadInitThunk+0x14 [clientcore\base\win32\client\thread.c @ 64] 
0a 00000015`38c7f930 00000000`00000000     ntdll!RtlUserThreadStart+0x21 [minkernel\ntdll\rtlstrt.c @ 1153]

What happened was when we change the heap count, in heap X we get a region from heap Y and the gen0_bricks_cleared flag from Y says false but heap X says true. So when we check the bricks on heap X, we assume it’s true but it’s not. I can see this because the region check_gen0_bricks complains about is

0:013> .frame 3
03 00000070`6597f430 00007ffc`d8026aee     clrgc!SVR::gc_heap::check_gen0_bricks+0x65 [D:\runtime-datas\src\coreclr\gc\gc.cpp @ 12632] 
0:013> ?? gen0_region
class SVR::heap_segment * 0x000001b1`5340bed8

I added some dprintfs to print the gen0 regions on the old heaps with this flag as false and this is a region coming from heap#2 whose gen0_bricks_cleared flag was false -

[68756]trying to change heap count 24 -> 8
[68756]h2 g0 1b15340bed8 // print out gen0 regions from heaps whose gen0_bricks_cleared was false
[68756]g0: total budget: 62898032 budget per heap: 2620752
[68756]g1: total budget: 6283176 budget per heap: 261800
[68756]g2: total budget: 6283224 budget per heap: 261808
[68756]g3: total budget: 75497472 budget per heap: 3145728
[68756]g4: total budget: 75497472 budget per heap: 3145728

the fix is to detect if any heap has this flag as false and if so make all heaps’ flag false (tracking which region is moved from which other heap is something we need additional recording for and it’s not really worth doing just for this)

Author: Maoni0
Assignees: Maoni0
Labels:

area-GC-coreclr

Milestone: -

Copy link
Member

@cshung cshung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the assessment. Since this happens only during heap count changes (which shouldn't happen frequently), the increased latency due to the clearing of the bricks is probably acceptable.

src/coreclr/gc/gc.cpp Show resolved Hide resolved
src/coreclr/gc/gc.cpp Show resolved Hide resolved
@Maoni0 Maoni0 merged commit 55c896f into dotnet:main Aug 15, 2023
109 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Sep 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assert failure: brick_table[b] != 0
2 participants