Fix budgets for dynamic heap count and add smoothing to overhead computation #87618

Merged
merged 11 commits into dotnet:main on Jun 28, 2023

Conversation

PeterSolMS (Contributor)

When changing heap counts, we used to keep the per-heap budgets constant - the heaps coming into service would simply inherit the budgets from heap 0. Testing shows this to be inappropriate, as it causes short-term peaks in memory consumption when the heap count increases quickly.

It therefore seems more appropriate to keep the total budget (over all heaps) constant, and, similarly, to apply exponential smoothing to the total budgets, not the per-heap budgets.
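
To illustrate, here is a minimal sketch of recomputing per-heap budgets from a preserved total when the heap count grows. This is hypothetical code, not the PR's actual implementation; the helper name and the minimum-budget parameter are assumptions.

#include <algorithm>
#include <cstddef>

// Hypothetical sketch: when the heap count grows, divide the preserved
// total budget across the new heaps instead of copying heap 0's
// per-heap budget. min_per_heap_budget stands in for the minimum size
// per generation that the PR still respects. (Per the commit notes,
// the decreasing case keeps the per-heap budget instead.)
static size_t per_heap_budget_after_growth (size_t old_per_heap_budget,
                                            int old_n_heaps, int new_n_heaps,
                                            size_t min_per_heap_budget)
{
    // the total budget over all heaps stays constant...
    size_t total_budget = old_per_heap_budget * (size_t)old_n_heaps;

    // ...so each heap's share shrinks as the heap count increases,
    // avoiding the short-term memory peaks described above
    return std::max (total_budget / (size_t)new_n_heaps, min_per_heap_budget);
}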

During the investigation, we found that a few more fields in the dynamic_data_table need to be initialized or recomputed when heaps come into service.

We also found that heap counts are sometimes changed due to small, temporary fluctuations in measured GC overhead. The fix is to use a smoothed value to make decisions in situations where the estimated performance difference is small, but to keep the median-of-three estimate where it shows a big difference, so we can still react fast in that situation.

… the total budget remains the same when increasing heap count, but the budget per heap is kept when decreasing heap count.

We still respect the minimum size per generation, and we now also do the exponential smoothing for the budget based on the totals per generation, not the per-heap values.
…used to reduce the number of heap count changes where the benefit is small.

Fixed an issue where we set dd_desired_allocation to 0, which caused problems with heap balancing.
ghost commented Jun 15, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


Author: PeterSolMS
Assignees: PeterSolMS
Labels: area-GC-coreclr

Milestone: -

@@ -25019,7 +25032,22 @@ void gc_heap::check_heap_count ()

// the middle element is the median overhead percentage
float median_percent_overhead = percent_overhead[1];
dprintf (6666, ("median overhead: %d%%", median_percent_overhead));

// apply exponential smoothing over 3 median_percent_overhead readings
Member:
Should this say "apply exponential smoothing and use 1/3 for the smoothing factor"? Since it's not just over 3 readings.

Contributor Author (PeterSolMS):
Ok.

Comment on lines 25037 to 25043
const float smoothing_factor = 0.333f;
float smoothed_median_percent_overhead = dynamic_heap_count_data.smoothed_median_percent_overhead;
if (smoothed_median_percent_overhead != 0.0f)
{
// average it with the previous value
smoothed_median_percent_overhead = median_percent_overhead*smoothing_factor + smoothed_median_percent_overhead*(1.0f - smoothing_factor);
}
Member:

Instead of 0.333f, it seems a bit better to just do median_percent_overhead / smoothing + (smoothed_median_percent_overhead / smoothing) * (smoothing - 1), like what we do in the exponential_smooth method - it's consistent and more elegant. Or you could refactor the exponential_smooth method and call it here.

Contributor Author (PeterSolMS):

OK - I guess I was trying to avoid the floating-point divisions... but once per GC it hardly matters.

Member:

I see; makes sense. Another option is to change the exponential_smooth implementation if you want to avoid the division... my point was simply that there's an inconsistency.
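
For reference, a minimal sketch of the shared helper being discussed, using the division-based form. This is an illustration of the suggestion only, not necessarily the actual exponential_smooth implementation in gc.cpp.

// Illustrative sketch of the suggested form, with smoothing == 3
// corresponding to a smoothing factor of 1/3.
static float exponential_smooth (float smoothed, float sample, float smoothing)
{
    // equivalent to sample * (1/smoothing) + smoothed * (1 - 1/smoothing)
    return (sample / smoothing) + (smoothed / smoothing) * (smoothing - 1.0f);
}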


Maoni0 commented Jun 16, 2023

LGTM, just a couple of comments above. Waiting to see the results of rerunning the benchmarks with this.

size_t gen_size = hp->generation_size (gen_idx);
dd_fragmentation (dd) = generation_free_list_space (gen);
assert (gen_size >= dd_fragmentation (dd));
dd_current_size (dd) = gen_size;
Member:

Sorry I missed this one - dd_current_size does not include fragmentation, so this should be dd_current_size (dd) = gen_size - dd_fragmentation (dd);
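
Applying that correction to the quoted snippet gives (sketch):

size_t gen_size = hp->generation_size (gen_idx);
dd_fragmentation (dd) = generation_free_list_space (gen);
assert (gen_size >= dd_fragmentation (dd));
// dd_current_size excludes fragmentation, so subtract it out
dd_current_size (dd) = gen_size - dd_fragmentation (dd);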

  int new_n_heaps = n_heaps;
- if (median_percent_overhead > 5.0f)
+ if (median_percent_overhead > 10.0f)
Member:

This should also be smoothed_median_percent_overhead.

Member:

Peter explained to me this was actually intentional.
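
For context, here is a hedged sketch of the two-tier decision the PR summary describes - trust the raw median-of-three when it shows a big difference, otherwise require the smoothed value to agree. The thresholds echo the quoted diff, but the helper, its parameters, and the heap count adjustments are assumptions for illustration, not the exact logic in check_heap_count.

#include <algorithm>

// Hypothetical sketch of the decision described in the PR summary.
static int choose_new_heap_count (int n_heaps, int max_heaps,
                                  float median_percent_overhead,
                                  float smoothed_median_percent_overhead)
{
    if (median_percent_overhead > 10.0f)
    {
        // big measured difference - trust the raw median-of-three
        // so we can still react fast
        return std::min (n_heaps * 2, max_heaps);
    }
    if (smoothed_median_percent_overhead > 5.0f)
    {
        // small difference - act only when the smoothed value agrees,
        // so short-lived fluctuations don't cause heap count churn
        return std::min (n_heaps + 1, max_heaps);
    }
    return n_heaps; // overhead is low - no change
}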

@PeterSolMS PeterSolMS merged commit a355d5f into dotnet:main Jun 28, 2023
107 checks passed
dotnet locked this conversation as resolved and limited it to collaborators on Jul 28, 2023