Fix budgets for dynamic heap count and add smoothing to overhead computation #87618

Merged
merged 11 commits into dotnet:main on Jun 28, 2023

Conversation

PeterSolMS (Contributor)

When changing heap counts, we used to keep the per-heap budgets constant - the heaps coming into service would simply inherit the budgets from heap 0. Testing shows this to be inappropriate, as it causes short-term peaks in memory consumption when the heap count increases quickly.

It therefore seems more appropriate to keep the total budget (over all heaps) constant, and, similarly, to apply exponential smoothing to the total budgets, not the per-heap budgets.
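
To illustrate, here is a minimal sketch of recomputing per-heap budgets from a preserved total when the heap count grows. This is hypothetical code, not the PR's actual implementation; the helper name and the minimum-budget parameter are assumptions.

#include <algorithm>
#include <cstddef>

// Hypothetical sketch: when the heap count grows, divide the preserved
// total budget across the new heaps instead of copying heap 0's
// per-heap budget. min_per_heap_budget stands in for the minimum size
// per generation that the PR still respects. (Per the commit notes,
// the decreasing case keeps the per-heap budget instead.)
static size_t per_heap_budget_after_growth (size_t old_per_heap_budget,
                                            int old_n_heaps, int new_n_heaps,
                                            size_t min_per_heap_budget)
{
    // the total budget over all heaps stays constant...
    size_t total_budget = old_per_heap_budget * (size_t)old_n_heaps;

    // ...so each heap's share shrinks as the heap count increases,
    // avoiding the short-term memory peaks described above
    return std::max (total_budget / (size_t)new_n_heaps, min_per_heap_budget);
}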

During the investigation, we found that a few more fields in the dynamic_data_table need to be initialized or recomputed when heaps come into service.

We also found that heap counts are sometimes changed due to small, temporary fluctuations in measured GC overhead. The fix is to use a smoothed value to make decisions in situations where the estimated performance difference is small, but to keep the median-of-three estimate where it shows a big difference, so we can still react fast in that situation.

… the total budget remains the same when increasing heap count, but the budget per heap is kept when decreasing heap count.

We still respect the minimum size per generation, and we now also do the exponential smoothing for the budget based on the totals per generation, not the per-heap values.
…used to reduce the number of heap count changes where the benefit is small.

Fixed an issue where we set dd_desired_allocation to 0, which caused problems with heap balancing.
ghost commented Jun 15, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


Author: PeterSolMS
Assignees: PeterSolMS
Labels: area-GC-coreclr

Milestone: -

@@ -25019,7 +25032,22 @@ void gc_heap::check_heap_count ()

// the middle element is the median overhead percentage
float median_percent_overhead = percent_overhead[1];
dprintf (6666, ("median overhead: %d%%", median_percent_overhead));

// apply exponential smoothing over 3 median_percent_overhead readings
Member:
Should this say "apply exponential smoothing and use 1/3 for the smoothing factor"? Since it's not just over 3 readings.

Contributor Author (PeterSolMS):
Ok.

Comment on lines 25037 to 25043
const float smoothing_factor = 0.333f;
float smoothed_median_percent_overhead = dynamic_heap_count_data.smoothed_median_percent_overhead;
if (smoothed_median_percent_overhead != 0.0f)
{
// average it with the previous value
smoothed_median_percent_overhead = median_percent_overhead*smoothing_factor + smoothed_median_percent_overhead*(1.0f - smoothing_factor);
}
Member:

Instead of 0.333f, it seems a bit better to just do median_percent_overhead / smoothing + (smoothed_median_percent_overhead / smoothing) * (smoothing - 1), like what we do in the exponential_smooth method - it's consistent and more elegant. Or you could refactor the exponential_smooth method and call it here.

Contributor Author (PeterSolMS):

OK - I guess I was trying to avoid the floating-point divisions... but once per GC it hardly matters.

Member:

I see; makes sense. Another option is to change the exponential_smooth implementation if you want to avoid the division... my point was simply that there's an inconsistency.
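
For reference, a minimal sketch of the shared helper being discussed, using the division-based form. This is an illustration of the suggestion only, not necessarily the actual exponential_smooth implementation in gc.cpp.

// Illustrative sketch of the suggested form, with smoothing == 3
// corresponding to a smoothing factor of 1/3.
static float exponential_smooth (float smoothed, float sample, float smoothing)
{
    // equivalent to sample * (1/smoothing) + smoothed * (1 - 1/smoothing)
    return (sample / smoothing) + (smoothed / smoothing) * (smoothing - 1.0f);
}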


Maoni0 commented Jun 16, 2023

LGTM, just a couple of comments above. Waiting to see the results of rerunning the benchmarks with this.

size_t gen_size = hp->generation_size (gen_idx);
dd_fragmentation (dd) = generation_free_list_space (gen);
assert (gen_size >= dd_fragmentation (dd));
dd_current_size (dd) = gen_size;
Member:

Sorry I missed this one - dd_current_size does not include fragmentation, so this should be dd_current_size (dd) = gen_size - dd_fragmentation (dd);
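
Applying that correction to the quoted snippet gives (sketch):

size_t gen_size = hp->generation_size (gen_idx);
dd_fragmentation (dd) = generation_free_list_space (gen);
assert (gen_size >= dd_fragmentation (dd));
// dd_current_size excludes fragmentation, so subtract it out
dd_current_size (dd) = gen_size - dd_fragmentation (dd);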

  int new_n_heaps = n_heaps;
- if (median_percent_overhead > 5.0f)
+ if (median_percent_overhead > 10.0f)
Member:

This should also be smoothed_median_percent_overhead.

Member:

Peter explained to me this was actually intentional.
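
For context, here is a hedged sketch of the two-tier decision the PR summary describes - trust the raw median-of-three when it shows a big difference, otherwise require the smoothed value to agree. The thresholds echo the quoted diff, but the helper, its parameters, and the heap count adjustments are assumptions for illustration, not the exact logic in check_heap_count.

#include <algorithm>

// Hypothetical sketch of the decision described in the PR summary.
static int choose_new_heap_count (int n_heaps, int max_heaps,
                                  float median_percent_overhead,
                                  float smoothed_median_percent_overhead)
{
    if (median_percent_overhead > 10.0f)
    {
        // big measured difference - trust the raw median-of-three
        // so we can still react fast
        return std::min (n_heaps * 2, max_heaps);
    }
    if (smoothed_median_percent_overhead > 5.0f)
    {
        // small difference - act only when the smoothed value agrees,
        // so short-lived fluctuations don't cause heap count churn
        return std::min (n_heaps + 1, max_heaps);
    }
    return n_heaps; // overhead is low - no change
}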

@PeterSolMS PeterSolMS merged commit a355d5f into dotnet:main Jun 28, 2023
107 checks passed
dotnet locked this conversation as resolved and limited it to collaborators on Jul 28, 2023