Improve dynamic heap count #87619

PeterSolMS · 2023-06-15T14:51:39Z

Change the logic around check_heap_count so it's called on a single thread - this eliminates the cost of the join call from check_heap_count. When a change in heap count is found to be necessary, set the gc_start_event to wake up the other GC threads so they can participate in rethreading free list items.

Stress testing this with STRESS_DYNAMIC_HEAP_COUNT showed some issues (mostly assert failures) which are addressed as well.

- Fix hang bug caused by race condition in change_heap_count.
- Change the way we store dynamic heap input metrics to make it easier to surface them via ETW events.
- Refactor enter_spin_lock_msl into an inlineable part and a slower, more complex out-of-line part.
- Subtract time spent in safe_switch_to_thread and WaitLongerNoInstru from msl wait time - this makes this metric much less noisy.
- Add more diagnostic output to check_heap_count and change_heap_count.
- Add more spinning to EnterFinalizeLock to address slow suspensions in some ASP.NET benchmarks.
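The inlineable/out-of-line split mentioned in the list above is a common locking pattern: the uncontended case is a single atomic operation that is cheap to inline, and everything else lives in a separate function. The following is a minimal sketch of that pattern only. The names `spin_lock_model`, `enter_spin_lock`, `enter_spin_lock_slow` and `leave_spin_lock` are hypothetical, and the real enter_spin_lock_msl does considerably more (wait-time accounting, thread switching).

```cpp
#include <atomic>
#include <thread>

// Hypothetical lock model; not the more space lock from gc.cpp.
struct spin_lock_model
{
    std::atomic<bool> held { false };
};

// Slow path: kept as a separate function so the fast path stays small.
// It retries the acquire and yields between attempts.
void enter_spin_lock_slow (spin_lock_model& lock)
{
    while (lock.held.exchange (true, std::memory_order_acquire))
    {
        std::this_thread::yield ();
    }
}

// Fast path: one atomic attempt; falls back to the slow path on contention.
inline void enter_spin_lock (spin_lock_model& lock)
{
    if (!lock.held.exchange (true, std::memory_order_acquire))
        return;
    enter_spin_lock_slow (lock);
}

inline void leave_spin_lock (spin_lock_model& lock)
{
    lock.held.store (false, std::memory_order_release);
}
```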

…applies to the total budget, not the budget per heap. This is equivalent, as long as the heap count stays the same. If the heap count increases drastically, this change should reduce the amount of overshoot in total gen 0 budget that we see.

- fix assert in gc_thread_function that fired for inactive GC threads.
- fix build error in gc_thread_function for builds without dynamic heap count
- in trigger_gc_for_alloc, leave soh msl before triggering a collection
- fix STRESS_DYNAMIC_HEAP_COUNT issue where we tried to set n_heaps to 0.
- fixed initialization issue with dynamic_heap_count_data.new_n_heaps
- fixed assert failure for allocation of finalizable objects while heap count is changing.
- wrapped gc_lock around the body of GCHeap::GetTotalBytesInUse to ensure n_heaps wouldn't change while we iterate.
- removed the gc_lock around the body of GCHeap::ApproxTotalBytesInUse

ghost · 2023-06-15T14:51:58Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	PeterSolMS
Assignees:	PeterSolMS
Labels:	`area-GC-coreclr`
Milestone:	-

Maoni0 · 2023-06-16T03:10:01Z

src/coreclr/gc/gc.cpp

@@ -6991,7 +6991,8 @@ void gc_heap::gc_thread_function ()

    while (1)
    {
-        assert (!gc_t_join.joined());
+        // inactive GC threads may observe gc_t_join.joined() being true here
+        assert ((n_heaps <= heap_number) || !gc_t_join.joined());


how can joined be true for the inactive GC threads?

gc_t_join.joined() just returns the value of a static field that will flicker based on what the active GC threads are doing, so occasionally this will fire for the inactive threads.
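The relaxed assert can be read as a predicate over three values. The sketch below is a standalone illustration, not the runtime's code: `is_inactive_gc_thread` and `assert_condition_holds` are hypothetical helpers, and the `joined` parameter stands in for whatever `gc_t_join.joined()` happens to return at that instant.

```cpp
// Hypothetical standalone model of the relaxed assert; not taken from gc.cpp.
// A thread counts as inactive when its heap_number is not below n_heaps.
inline bool is_inactive_gc_thread (int n_heaps, int heap_number)
{
    return n_heaps <= heap_number;
}

// Mirrors the shape of:
//   assert ((n_heaps <= heap_number) || !gc_t_join.joined());
// Inactive threads pass regardless of the flag; active threads must see it clear.
inline bool assert_condition_holds (int n_heaps, int heap_number, bool joined)
{
    return is_inactive_gc_thread (n_heaps, heap_number) || !joined;
}
```

With 4 active heaps, the thread for heap 5 passes even when the flag reads true, while the thread for heap 2 would still trip the assert.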

Maoni0 · 2023-06-16T03:18:28Z

src/coreclr/gc/gc.cpp

@@ -24910,6 +24929,8 @@ void gc_heap::recommission_heap()

 void gc_heap::check_heap_count ()
 {
+    dynamic_heap_count_data.new_n_heaps = n_heaps;


since you are setting this here, you don't need to set it again in the background GC branch below, since it doesn't change in between

            if (gc_heap::background_running_p())
            {
                // can't have background gc running while we change the number of heaps
                // so it's useless to compute a new number of heaps here
                dynamic_heap_count_data.new_n_heaps = n_heaps;
            }
            else

You're right. Thanks for spotting!

Maoni0 · 2023-06-16T03:21:05Z

src/coreclr/gc/gc.cpp

+            new_n_heaps = min (dynamic_heap_count_data.lowest_heap_with_msl_uoh, new_n_heaps);
+
+            // but not down to zero, obviously...
+            new_n_heaps = max (new_n_heaps, 1);


Suggested change

-            new_n_heaps = min (dynamic_heap_count_data.lowest_heap_with_msl_uoh, new_n_heaps);
-
-            // but not down to zero, obviously...
-            new_n_heaps = max (new_n_heaps, 1);
+            new_n_heaps = min ((dynamic_heap_count_data.lowest_heap_with_msl_uoh + 1), new_n_heaps);

since dynamic_heap_count_data.lowest_heap_with_msl_uoh is a heap index...

Actually, no - the intention of this code is specifically to stress the situation where we are excluding a heap with a taken lock. But when the lowest heap is heap 0, we can't exclude that.

ahh you are right, if it's heap#3, we'd want to exclude heap#3.
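The outcome of this thread (keep the original clamp) can be checked with a small standalone function. This is only a sketch of the arithmetic: `clamp_stress_heap_count` is a hypothetical helper, and its first parameter plays the role of `dynamic_heap_count_data.lowest_heap_with_msl_uoh`.

```cpp
#include <algorithm>

// Hypothetical helper modelling the stress-mode clamp discussed above.
// lowest_heap_with_msl is a heap index; heaps [0, result) stay active.
inline int clamp_stress_heap_count (int lowest_heap_with_msl, int new_n_heaps)
{
    // shrink so that the heap holding the lock falls outside the active range
    new_n_heaps = std::min (lowest_heap_with_msl, new_n_heaps);
    // heap 0 can never be excluded, so keep at least one heap
    return std::max (new_n_heaps, 1);
}
```

If the lowest heap with a taken lock is heap 3, the result is at most 3, so heaps 0 to 2 remain and heap 3 is excluded. If it is heap 0, the result is 1 and heap 0 stays in service.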

Maoni0 · 2023-06-16T03:46:34Z

src/coreclr/gc/gc.cpp

            // to register the object for finalization on the heap it was allocated on
-            hp = acontext->get_alloc_heap()->pGenGCHeap;
-            assert ((newAlloc == nullptr) || (hp == gc_heap::heap_of ((uint8_t*)newAlloc)));
+            hp = (newAlloc == nullptr) ? acontext->get_alloc_heap()->pGenGCHeap : gc_heap::heap_of ((uint8_t*)newAlloc);


looks like you've simply gotten rid of the assert. probably meant something like

            // the heap may have changed due to heap balancing - it's important
            // to register the object for finalization on the heap it was allocated on
            // but if that heap went out of service it wouldn't be the same
            hp = acontext->get_alloc_heap()->pGenGCHeap;
            assert ((newAlloc == nullptr)
#ifndef DYNAMIC_HEAP_COUNT
                    || (hp == gc_heap::heap_of ((uint8_t*)newAlloc))
#endif //!DYNAMIC_HEAP_COUNT
                   );

I have observed the assert firing, and I think it was because acontext->get_alloc_heap()->pGenGCHeap went out of service. It wouldn't be good to try to register an object for finalization on that heap.

Thus I think this part: hp = ... gc_heap::heap_of ((uint8_t*)newAlloc) is actually important.

I have #ifdef'd the code so we use the new version (without the assert) in the DYNAMIC_HEAP_COUNT case and the previous code otherwise.

ahh indeed we need to set the correct hp. I did not notice we have the finalization part below this code.
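The selection rule settled on here (use the heap that actually contains the new object, and fall back to the allocation context's heap only when the allocation failed) can be illustrated with a toy model. Everything below is hypothetical: `heap_model`, `heap_of_model` and `heap_for_finalization` are illustrative names, and mapping addresses through fixed ranges is an assumption made for the sketch, not a description of how `gc_heap::heap_of` works.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model: each heap owns a half-open address range [start, end).
struct heap_model
{
    int       number;
    uintptr_t start;
    uintptr_t end;
};

// Stand-in for gc_heap::heap_of: find the heap whose range contains addr.
inline const heap_model* heap_of_model (const std::vector<heap_model>& heaps, uintptr_t addr)
{
    for (const heap_model& h : heaps)
    {
        if ((addr >= h.start) && (addr < h.end))
            return &h;
    }
    return nullptr;
}

// Pick the heap to register a finalizable object on: the heap that contains
// the object, or the allocation context's heap if nothing was allocated.
inline const heap_model* heap_for_finalization (const std::vector<heap_model>& heaps,
                                                const heap_model* context_heap,
                                                uintptr_t new_alloc)
{
    return (new_alloc == 0) ? context_heap : heap_of_model (heaps, new_alloc);
}
```

The point of the model is that the allocation context's heap and the object's heap can differ, and the object's heap wins whenever an object exists.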

Maoni0 · 2023-06-16T03:49:41Z

now that the synchronization around changing the heap count has become non-trivial, it would be great if you could write a comment for gc_thread_function that explains when the n_heaps update is supposed to be observed by GC threads, when the decommissioned threads are supposed to wake up, and so on.

PeterSolMS · 2023-06-16T09:26:55Z

I agree with non-trivial, will add a comment as suggested.

PeterSolMS · 2023-06-21T13:36:21Z

src/coreclr/gc/gc.cpp

@@ -7123,6 +7145,10 @@ void gc_heap::gc_thread_function ()
            {
                gradual_decommit_in_progress_p = decommit_step (DECOMMIT_TIME_STEP_MILLISECONDS);
            }
+#ifdef DYNAMIC_HEAP_COUNT
+            // check if we should adjust the number of heaps
+            check_heap_count();


Note that user threads can actually start executing during the call to check_heap_count. This implies that the more space lock can be taken again.

Maoni0

LGTM!

PeterSolMS added 6 commits June 7, 2023 17:08

Restructure the code so that check_heap_count is executed on one thre…

4662d83

…ad and change_heap_count on n_heap threads. This substantially reduces the time spent in check_heap_count because we can get rid of the join in that method.

Remove budget related changes, which go into a separate PR, undo erro…

32091c0

…neous changes.

Merge with main

9bf3210

PeterSolMS requested review from cshung, Maoni0, mangod9 and mrsharm June 15, 2023 14:51

dotnet-issue-labeler bot added the area-GC-coreclr label Jun 15, 2023

ghost assigned PeterSolMS Jun 15, 2023

Fix build issues - I hadn't removed the budget related changes comple…

eb37008

…tely.

Maoni0 reviewed Jun 16, 2023

Address code review feedback.

188dd07

PeterSolMS commented Jun 21, 2023

Maoni0 approved these changes Jun 27, 2023

PeterSolMS merged commit 1ffb77d into dotnet:main Jun 27, 2023
105 of 108 checks passed

dotnet locked as resolved and limited conversation to collaborators Jul 27, 2023
