
Add a time-based decay to the linear allocation model. #55174

Merged
3 commits merged into dotnet:main on Jul 16, 2021

Conversation

PeterSolMS
Contributor

The real-world scenario that motivated this change allocated a huge amount of data once a day, leading to high gen 0 and gen 1 budgets (several GB).

After that, there was relatively little activity, causing the gen 1 budget to take several hours to normalize.

The new logic discounts the previous desired allocation over a period of 5 minutes.

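
As a rough illustration of the mechanism (a simplified sketch, not the actual gc.cpp change; the function and variable names here are made up for the example), the weight given to the previous desired allocation ramps linearly from 1 down to 0 as the time since the previous collection approaches 5 minutes:

// Illustrative sketch only - simplified names, not the actual gc.cpp code.
// The previous desired allocation loses its weight linearly over 5 minutes.
static float compute_decay_factor (float time_since_previous_collection_secs)
{
    const float decay_time = 5 * 60.0f;   // previous desired allocation expires over 5 minutes
    if (time_since_previous_collection_secs >= decay_time)
        return 0.0f;                      // previous budget has fully expired
    return (decay_time - time_since_previous_collection_secs) / decay_time;
}
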
@ghost

ghost commented Jul 5, 2021

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: PeterSolMS
Assignees: -
Labels: area-GC-coreclr
Milestone: -

@PeterSolMS
Contributor Author

This addresses issue #52592.

@@ -36767,6 +36765,9 @@ size_t gc_heap::desired_new_allocation (dynamic_data* dd,
    float f = 0;
    size_t max_size = dd_max_size (dd);
    size_t new_allocation = 0;
    float time_since_previous_collection_secs = (dd_time_clock (dd) - dd_previous_time_clock (dd))*1e-6f; // elapsed time since the previous collection, converted from microseconds to seconds


Member

Nit: remove one empty line?

Contributor Author

Looking at the code, I think I should remove both of the empty lines - no reason to separate the declarations here, I think.

{
    if ((allocation_fraction < 0.95) && (allocation_fraction > 0.0))
    {
        const float decay_time = 5*60.0f; // previous desired allocation expires over 5 minutes
        float decay_factor = (decay_time <= time_since_previous_collection_secs) ?
Member

Will we build a test scenario in GCPerfSim to validate that the decay has the desired effect?

Contributor Author

I will try, but it may be a bit involved.

@Maoni0
Member

Maoni0 commented Jul 7, 2021

This seems to be along the same lines as what we are doing for WKS GC in generation_to_condemn, where we check against dd_time_clock_interval, so I'm wondering if we should take advantage of that, i.e., use dd_time_clock_interval as decay_time so it's generation aware (but we should specify different values for Server GC - I would suggest much bigger values there).

Also, the change as it stands modifies the existing behavior. It seems like it would be better not to modify the existing behavior when a GC happens soon enough, and to start decaying only when the elapsed time between this GC and the last GC (of that generation) is longer than decay_time (and the longer it has been, the less effect the previous budget should have). What do you think?
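
As a rough sketch of that alternative (illustrative only, not the merged change; the falloff shape beyond decay_time is just one possible choice):

// Rough sketch of the suggested alternative - not the merged change.
// Behavior is unchanged while GCs of this generation happen within decay_time;
// only once the gap grows beyond decay_time does the previous budget lose weight,
// and the longer the gap, the less it counts (the exact falloff here is arbitrary).
static float alternative_decay_factor (float elapsed_secs, float decay_time)
{
    if (elapsed_secs <= decay_time)
        return 1.0f;                      // recent enough: keep existing behavior
    return decay_time / elapsed_secs;     // decreasing weight for longer gaps
}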

@PeterSolMS
Contributor Author

I thought about making the decay_time generation aware along the lines you suggested, but didn't see a good reason to do it. Gen 1 is the only generation where we collect routinely without having exhausted the budget.

Regarding your remark about modifying existing behavior, I did this on purpose so a case where gen 1 GCs happen often enough, but very early (i.e. only exhausting a tiny fraction of the gen 1 budget), wouldn't lead to us taking a long time to ramp down the gen 1 budget. As it happens, a decay time of 5 minutes is short enough to fix the test case (which does about 6 gen 1 GCs in the first hour after loading the big chunk of data), but it seems unwise to rely on this - with more load, of course the interval between gen 1 GCs may drop below the decay_time.

@Maoni0
Member

Maoni0 commented Jul 8, 2021

> I thought about making the decay_time generation aware along the lines you suggested, but didn't see a good reason to do it. Gen 1 is the only generation where we collect routinely without having exhausted the budget.

In high memory load situations we could also collect gen 2 often without having exhausted the budget, but you could argue that the next gen 2 will also happen without having exhausted the budget for the same reason, and gen 2's survival rate is usually very high anyway.

> Regarding your remark about modifying existing behavior, I did this on purpose so a case where gen 1 GCs happen often enough, but very early (i.e. only exhausting a tiny fraction of the gen 1 budget), wouldn't lead to us taking a long time to ramp down the gen 1 budget. As it happens, a decay time of 5 minutes is short enough to fix the test case (which does about 6 gen 1 GCs in the first hour after loading the big chunk of data), but it seems unwise to rely on this - with more load, of course the interval between gen 1 GCs may drop below the decay_time.

It's always a worry that when you change existing perf behavior you will regress someone, hence the concern.

Also, the idea of the fraction is that we take some percentage from the current budget and some from the previous budget, but now with decay_factor we are simply discounting some of the budget. Shouldn't this be

float prev_allocation_factor = (1.0 - allocation_fraction) * decay_factor;

new_allocation = (size_t)((1.0 - prev_allocation_factor) * new_allocation + 
                          prev_allocation_factor * previous_desired_allocation);

if we want to make the previous budget count for less based on the time factor?

@PeterSolMS
Contributor Author

> Also, the idea of the fraction is that we take some percentage from the current budget and some from the previous budget, but now with decay_factor we are simply discounting some of the budget. Shouldn't this be
>
> float prev_allocation_factor = (1.0 - allocation_fraction) * decay_factor;
>
> new_allocation = (size_t)((1.0 - prev_allocation_factor) * new_allocation +
>                           prev_allocation_factor * previous_desired_allocation);

I think you are absolutely right. Will change the code accordingly.
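
For concreteness, a self-contained sketch of the combined update along these lines (simplified, illustrative names rather than the exact gc.cpp code; decay_factor uses the 5-minute linear ramp from the diff above):

// Sketch of the blended budget update discussed above (illustrative names only).
static size_t blend_budgets (size_t new_allocation,
                             size_t previous_desired_allocation,
                             float  allocation_fraction,
                             float  elapsed_secs)
{
    const float decay_time = 5 * 60.0f;   // previous budget expires over 5 minutes
    float decay_factor = (elapsed_secs >= decay_time)
                             ? 0.0f
                             : (decay_time - elapsed_secs) / decay_time;
    float prev_allocation_factor = (1.0f - allocation_fraction) * decay_factor;
    return (size_t)((1.0f - prev_allocation_factor) * new_allocation +
                    prev_allocation_factor * previous_desired_allocation);
}

With elapsed_secs == 0 this reduces to the original blend (prev_allocation_factor == 1 - allocation_fraction); at 5 minutes or more the previous budget drops out entirely and new_allocation is used as-is. For example, with allocation_fraction = 0.2 and a 2.5-minute gap, decay_factor = 0.5, prev_allocation_factor = 0.4, and the budget becomes 0.6 * new_allocation + 0.4 * previous_desired_allocation.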

@Maoni0 Maoni0 (Member) left a comment

LGTM!

@PeterSolMS PeterSolMS merged commit 0ac3a5b into dotnet:main Jul 16, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 15, 2021