New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve dynamic heap count #87619
Improve dynamic heap count #87619
Conversation
- Fix hang bug caused by race condition in change_heap_count - Change way we store dynamic heap input metrics to make it easier to surface them via ETW events. - Refactor enter_spin_lock_msl into an inlineable part and a slower, more complex out-of-line part. - Subtract time spent in safe_switch_to_thread and WiatLongerNoInstru from msl wait time - this makes this metric much less noisy. - add more diagnostic output to check_heap_count and change_heap_count. - add more spinning to EnterFinalizeLock to address slow suspensions in some ASP.NET benchmarks.
…ad and change_heap_count on n_heap threads. This substantially reduces the time spent in check_heap_count because we can get rid of the join in that method.
…applies to the total budget, not the budget per heap. This is equivalent, as long the heap count stays the same. If the heap count increases drastically, this change should reduce the amount of overshoot in total gen 0 budget that we see.
- fix assert in gc_thread_function that fired for inactive GC threads. - fix build error in gc_thread_function for builds without dynamic heap count - in trigger_gc_for_alloc, leave soh msl before triggering a collection - fix STRESS_DYNAMIC_HEAP_COUNT issue where we tried to set n_heaps to 0. - fixed initialization issue with dynamic_heap_count_data.new_n_heaps - fixed assert failure for allocation of finalizable objects while heap count is changing. - wrapped gc_lock around the body of GCHeap::GetTotalBytesInUse to ensure n_heaps wouldn't change while we iterate. - removed the gc_lock around the body of GCHeap::ApproxTotalBytesInUse
Tagging subscribers to this area: @dotnet/gc Issue DetailsChange the logic around check_heap_count so it's called on a single thread - this eliminates the cost of the join call from check_heap_count. When a change in heap count is found to be necessary, set the gc_start_event to wake up the other GC threads so they can participate in rethreading free list items. Stress testing this with STRESS_DYNAMIC_HEAP_COUNT showed some issues (mostly assert failures) which are addressed as well.
|
@@ -6991,7 +6991,8 @@ void gc_heap::gc_thread_function () | |||
|
|||
while (1) | |||
{ | |||
assert (!gc_t_join.joined()); | |||
// inactive GC threads may observe gc_t_join.joined() being true here | |||
assert ((n_heaps <= heap_number) || !gc_t_join.joined()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can joined be true for the inactive GC threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gc_t_join.joined()
just returns the value of a static field that will flicker based on what the active GC threads are doing, so occasionally this will fire for the inactive threads.
src/coreclr/gc/gc.cpp
Outdated
@@ -24910,6 +24929,8 @@ void gc_heap::recommission_heap() | |||
|
|||
void gc_heap::check_heap_count () | |||
{ | |||
dynamic_heap_count_data.new_n_heaps = n_heaps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you are doing this here, you don't need to do it again here since it doesn't change inbetween
if (gc_heap::background_running_p())
{
// can't have background gc running while we change the number of heaps
// so it's useless to compute a new number of heaps here
dynamic_heap_count_data.new_n_heaps = n_heaps;
}
else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Thanks for spotting!
src/coreclr/gc/gc.cpp
Outdated
new_n_heaps = min (dynamic_heap_count_data.lowest_heap_with_msl_uoh, new_n_heaps); | ||
|
||
// but not down to zero, obviously... | ||
new_n_heaps = max (new_n_heaps, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_n_heaps = min (dynamic_heap_count_data.lowest_heap_with_msl_uoh, new_n_heaps); | |
// but not down to zero, obviously... | |
new_n_heaps = max (new_n_heaps, 1); | |
new_n_heaps = min ((dynamic_heap_count_data.lowest_heap_with_msl_uoh + 1), new_n_heaps); |
since dynamic_heap_count_data.lowest_heap_with_msl_uoh
is a heap index...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, no - the intention of this code is specifically to stress the situation where we are excluding a heap with a taken lock. But when the lowest heap is heap 0, we can't exclude that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh you are right, if it's heap#3, we'd want to exclude heap#3.
src/coreclr/gc/gc.cpp
Outdated
// to register the object for finalization on the heap it was allocated on | ||
hp = acontext->get_alloc_heap()->pGenGCHeap; | ||
assert ((newAlloc == nullptr) || (hp == gc_heap::heap_of ((uint8_t*)newAlloc))); | ||
hp = (newAlloc == nullptr) ? acontext->get_alloc_heap()->pGenGCHeap : gc_heap::heap_of ((uint8_t*)newAlloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like you've simply gotten rid of the assert. probably meant something like
// the heap may have changed due to heap balancing - it's important
// to register the object for finalization on the heap it was allocated on
// but if that heap went out of service it wouldn't be the same
hp = acontext->get_alloc_heap()->pGenGCHeap;
assert ((newAlloc == nullptr)
#ifndef DYNAMIC_HEAP_COUNT
|| (hp == gc_heap::heap_of ((uint8_t*)newAlloc))
#endif //!DYNAMIC_HEAP_COUNT
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have observed the assert firing, and I think it was because acontext->get_alloc_heap()->pGenGCHeap
went out of service. It wouldn't be good to try to register an object for finalization on that heap.
Thus I think this part: hp = ... gc_heap::heap_of ((uint8_t*)newAlloc)
is actually important.
I have #ifdef'd the code so we use the new version (without the assert) in the DYNAMIC_HEAP_COUNT case and the previous code otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh indeed we need to set the correct hp. I did not notice we have the finalization part below this code.
now that the synchronization around changing the heap count becomes non trivial, it would be great if you could write a comment for |
I agree with non-trivial, will add a comment as suggested. |
@@ -7123,6 +7145,10 @@ void gc_heap::gc_thread_function () | |||
{ | |||
gradual_decommit_in_progress_p = decommit_step (DECOMMIT_TIME_STEP_MILLISECONDS); | |||
} | |||
#ifdef DYNAMIC_HEAP_COUNT | |||
// check if we should adjust the number of heaps | |||
check_heap_count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that user threads can actually start executing during the call to check_heap_count. This implies that the more space lock can be taken again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Change the logic around check_heap_count so it's called on a single thread - this eliminates the cost of the join call from check_heap_count. When a change in heap count is found to be necessary, set the gc_start_event to wake up the other GC threads so they can participate in rethreading free list items.
Stress testing this with STRESS_DYNAMIC_HEAP_COUNT showed some issues (mostly assert failures) which are addressed as well.