Fix budgets for dynamic heap count and add smoothing to overhead computation #87618
Conversation
… the total budget remains the same when increasing heap count, but the budget per heap is kept when decreasing heap count. We still respect the minimum size per generation, and we now also do the exponential smoothing for the budget based on the totals per generation, not the per-heap values.
…used to reduce the number of heap count changes where the benefit is small. Fixed an issue where we set dd_desired_allocation to 0, which caused issues with heap balancing.
Tagging subscribers to this area: @dotnet/gc
src/coreclr/gc/gc.cpp
Outdated
@@ -25019,7 +25032,22 @@ void gc_heap::check_heap_count ()
// the middle element is the median overhead percentage
float median_percent_overhead = percent_overhead[1];
dprintf (6666, ("median overhead: %d%%", median_percent_overhead));

// apply exponential smoothing over 3 median_percent_overhead readings
should say "apply exponential smoothing and use 1/3 for the smoothing factor"? since it's not just over 3 readings.
Ok.
src/coreclr/gc/gc.cpp
Outdated
const float smoothing_factor = 0.333f;
float smoothed_median_percent_overhead = dynamic_heap_count_data.smoothed_median_percent_overhead;
if (smoothed_median_percent_overhead != 0.0f)
{
    // average it with the previous value
    smoothed_median_percent_overhead = median_percent_overhead*smoothing_factor + smoothed_median_percent_overhead*(1.0f - smoothing_factor);
}
instead of 0.333f, it seems a bit better to just do median_percent_overhead / smoothing + (smoothed_median_percent_overhead / smoothing) * (smoothing - 1), like what we do in the exponential_smooth method. it's consistent and more elegant. or you could refactor the exponential_smooth method and call it here.
Ok, guess I was trying to avoid the floating point divisions... but once per GC it hardly matters.
I see; makes sense. another option is to change the exponential_smoothing implementation, if you want to avoid division... my point was simply that there's an inconsistency.
LGTM. just a couple of comments above. waiting to see results of rerunning benchmarks with this...
size_t gen_size = hp->generation_size (gen_idx);
dd_fragmentation (dd) = generation_free_list_space (gen);
assert (gen_size >= dd_fragmentation (dd));
dd_current_size (dd) = gen_size;
sorry I missed this one - dd_current_size does not include fragmentation, so this should be dd_current_size (dd) = gen_size - dd_fragmentation (dd);
src/coreclr/gc/gc.cpp
Outdated
int new_n_heaps = n_heaps;
if (median_percent_overhead > 5.0f)
if (median_percent_overhead > 10.0f)
this should also be smoothed_median_percent_overhead
Peter explained to me this was actually intentional.
When changing heap counts, we used to keep the budgets per heap constant - the heaps coming into service would just inherit the budgets from heap 0. Testing shows this to be inappropriate, as it causes short term peaks in memory consumption when heap count increases quickly.
It seems more appropriate therefore to keep total budget (over all heaps) constant, and, similarly, apply exponential smoothing to the total budgets, not the per-heap-budgets.
During investigation, it was found that a few more fields in the dynamic_data_table need to be initialized or recomputed when heaps are coming into service.
We also found that sometimes heap counts are changed due to small temporary fluctuations in measured GC overhead. The fix is to use a smoothed value to make decisions in situations where the estimated performance difference is small, but keep the median-of-three estimate where it shows a big difference, so we can still react fast in that situation.