
[API Proposal]: Let the application author tell us to be idle. #66037

Closed
cshung opened this issue Mar 1, 2022 · 10 comments · Fixed by #69695
Labels
api-approved API was approved in API review, it can be implemented area-GC-coreclr
Milestone
7.0.0

Comments

@cshung
Member

cshung commented Mar 1, 2022

Background and motivation

Some of our components consume resources that could be freed. For example, the GC keeps some committed memory around to serve allocation requests. This is reasonable if the application is actively using memory, but it is wasteful if the application is going to be idle for a long period of time. In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

API Proposal

I am proposing to add a new enum value to GCCollectionMode as follows:

public enum GCCollectionMode
{
  // ...
  Maximal
}

When the GC receives this value, it will try to decommit as much memory as it can.

API Usage

GC.Collect(GCCollectionMode.Maximal);

Alternative Designs

Initially, I was proposing an API like this:

bool System.AppContext.Idle { get; set; }

event System.EventHandler System.AppContext.IdleChanged;

This would allow the individual subsystems (e.g. libraries) to subscribe to the event and release resources as needed. The GC could then go last: after all the other resources had been released, it would finally clean up the memory.

The advantage of such an API would be ease of use for an operator, who wouldn't need to figure out the individual pieces required to release resources.

The disadvantage of such an API is that library authors would have to make many policy decisions, such as to what extent to release resources. A given policy might fit some customers but not others; it is difficult to build a one-size-fits-all solution.

We could easily regain the advantage by maintaining a consolidated document of best practices for releasing resources and entering the idle state.
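For illustration, a minimal sketch of how a library might have participated in this alternative design; the Idle property and the IdleChanged event are hypothetical members that were never added to AppContext:

// Hypothetical sketch only: AppContext.Idle and AppContext.IdleChanged
// do not exist; they stand in for the alternative design described above.
public static class LibraryCacheTrimming
{
    public static void Register()
    {
        AppContext.IdleChanged += (object? sender, EventArgs args) =>
        {
            if (AppContext.Idle)
            {
                // Each subsystem picks its own trimming policy here,
                // which is exactly the policy burden noted above.
                TrimCaches();
            }
        };
    }

    private static void TrimCaches()
    {
        // Release pooled buffers, drop caches, etc.
    }
}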

Risks

Misuse of this API may lead to frequent commit/decommit of memory, and that is expensive.

@cshung cshung added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 1, 2022
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 1, 2022
@ghost

ghost commented Mar 2, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Some of our components consume resources that could be freed. For example, the GC keeps some committed memory around to serve allocation requests. This is reasonable if the application is actively using memory, but it is wasteful if the application is going to be idle for a long period of time. In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

API Proposal

I am proposing to add a couple of APIs on System.AppContext as follows:

bool System.AppContext.Idle { get; set; }

event System.EventHandler System.AppContext.IdleChanged;

The exact implementation is TBD. My idea would be to invoke the managed callbacks first (so that any cache trimming or deallocation is handled first), and then call back into the native code to start decommitting memory or performing whatever other go-idle operations we would like to do natively.

API Usage

AppContext.Idle = true;
// ...
// and then later, when we want to wake up
// ...
AppContext.Idle = false;

Alternative Designs

No response

Risks

No response

Author: cshung
Assignees: -
Labels:

api-suggestion, area-System.Runtime, untriaged

Milestone: -

@jkotas
Member

jkotas commented Mar 2, 2022

In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

There is a trade-off between how much to relinquish and how long it is going to take to warm up the process again.

To relinquish as many resources as possible in the container scenario, it is best to shut down the container and start it up again once there is work to do again. It is how serverless containers typically operate.

I think the name and design for this API should be about cache trimming of various levels, not about going idle.

@cshung
Member Author

cshung commented Mar 3, 2022

@zackliu
Hopefully, this proposal is going to address your needs.

@zackliu

zackliu commented Mar 3, 2022

For our scenario, this exactly meets our requirements. In a cloud service, most resources see very low traffic. We can't shut down the containers because they're still in use, but they have committed too much memory (the Gen0 part) even though they're not really in use, even as the whole machine approaches OOM. So we want an API that lets us tell the process to free memory aggressively when we consider it "idle".

@swgriffith

In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

There is a trade-off between how much to relinquish and how long it is going to take to warm up the process again.

To relinquish as many resources as possible in the container scenario, it is best to shut down the container and start it up again once there is work to do again. It is how serverless containers typically operate.

I think the name and design for this API should be about cache trimming of various levels, not about going idle.

Terminating the container works, but if you're running in an orchestrator with autoscaling, like Kubernetes and the Horizontal Pod Autoscaler, the orchestrator is ultimately responsible for terminating the pod. If you try to use memory as the scaling metric then your calculation gets skewed and you may never fully scale down.

For example, I'm running a .NET 6 app in AKS with a 100Mi request and 100Mi limit on the pod, and the HPA configured to scale when memory exceeds 70% utilization (i.e. 70Mi). Scale-up works fine when I send load to the application. When the load stops completely, the pods remain as follows and the HPA never scales down below 8 pods:

Pods:
clippy-655576985-2x9b5 96Mi
clippy-655576985-c45sd 51Mi
clippy-655576985-ggtkl 53Mi
clippy-655576985-l5gnw 52Mi
clippy-655576985-mfxr8 83Mi
clippy-655576985-sd9mb 51Mi
clippy-655576985-w4qfm 55Mi
clippy-655576985-xrgb4 56Mi

Kubernetes HPA calculation:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
8 = ceil[8 * ((average pod memory = 62) / (100 * 0.70))]
(62 / 70 ≈ 0.89, so ceil(8 * 0.89) = 8 and the replica count never drops.)

So at some point, not being able to free up memory will cause the HPA to stop scaling down. Ultimately, for this scenario, scaling based on a memory metric isn't a viable option, but it would be nice to have that option, and I think this API would make it possible.

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Mar 14, 2022
@cshung
Member Author

cshung commented Mar 17, 2022

@swgriffith, just to confirm: am I correct to assume that, if this API is implemented, you will use it to help with your scale-down issue?

@cshung cshung added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Apr 13, 2022
@cshung cshung added this to the 7.0.0 milestone Apr 13, 2022
@ghost

ghost commented Apr 13, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details


Author: cshung
Assignees: -
Labels:

api-suggestion, area-GC-coreclr, api-ready-for-review

Milestone: 7.0.0

@terrajobst
Member

terrajobst commented Apr 21, 2022

Video

  • Makes sense
    • We should rename it because Maximal makes it a bit difficult to add more modes later
    • Aggressive seems like a good name
  • What is currently framed as the alternative design seems mostly complementary and helpful for libraries to participate in an overall decommission of resources when an application goes idle
    • @GrabYourPitchforks mentioned that we did a similar API in .NET Framework 4.5 for ASP.NET.
    • This might be especially helpful for containers
namespace System;

public enum GCCollectionMode
{
    // Existing:
    // Default = 0,
    // Forced = 1,
    // Optimized = 2,
    Aggressive = 3
}
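As a usage sketch, here is how the approved value might be requested, assuming it is combined with the existing GC.Collect(int, GCCollectionMode, bool, bool) overload as a full, blocking, compacting collection:

// Sketch: request the most aggressive decommit the GC can perform,
// assuming Aggressive is used with a full, blocking, compacting collection.
GC.Collect(GC.MaxGeneration, GCCollectionMode.Aggressive,
           blocking: true, compacting: true);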

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 21, 2022
@tannergooding tannergooding removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 21, 2022
@plaisted

To relinquish as many resources as possible in the container scenario, it is best to shut down the container and start it up again once there is work to do again. It is how serverless containers typically operate.

This is possible for some scenarios but comes with downsides (cold start time, etc.). This proposal would really help with workers we have running in Kubernetes that read messages off a queue and sit idle for significant periods of time. After processing a message they hold on to a significant amount of memory (an extra 500+ MB in some cases) that doesn't get released. We may be able to work around this with an HPA that ignores memory and just checks CPU, but it would be nice not to have to add that complexity, since the workers only actually require 1 millicpu and ~25 MB of memory to stay warm and listening to the queue.
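For illustration only, a minimal sketch of the worker pattern described above using the approved Aggressive mode; the queue client, Message type, timeout value, and Process method are hypothetical placeholders, not a real API:

// Hypothetical worker loop; `queue`, `Message`, and `Process` are
// placeholders for whatever queue client the worker actually uses.
while (true)
{
    Message? message = queue.Receive(timeout: TimeSpan.FromMinutes(5));
    if (message is null)
    {
        // No work arrived for a while: ask the GC to return as much
        // memory as it can to the OS before going back to waiting.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Aggressive,
                   blocking: true, compacting: true);
        continue;
    }
    Process(message);
}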

@cshung cshung mentioned this issue May 23, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 23, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 1, 2022