
[API Proposal]: Let the application author tell us to be idle. #66037

Closed
cshung opened this issue Mar 1, 2022 · 10 comments · Fixed by #69695
Labels
api-approved API was approved in API review, it can be implemented area-GC-coreclr
Milestone
7.0.0

Comments

@cshung
Member

cshung commented Mar 1, 2022

Background and motivation

Some of our components consume resources that could be freed. For example, the GC keeps some committed memory around to serve allocation requests. This is reasonable if the application is actively using memory, but it is wasteful if the application is going to be idle for a long period of time. In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

API Proposal

I am proposing to add a new enum value to GCCollectionMode as follows:

public enum GCCollectionMode
{
  // ...
  Maximal
}

When the GC receives this value, it will try to decommit as much memory as it can.

API Usage

GC.Collect(GCCollectionMode.Maximal);

Alternative Designs

Initially, I was proposing an API like this:

bool System.AppContext.Idle { get; set; }

event System.EventHandler System.AppContext.IdleChanged;

This would allow the individual subsystems (e.g. libraries) to subscribe to the event and release resources as needed. The GC could then go last: after all the other resources had been released, it would finally clean up the memory.

The advantage of such an API would be ease of use for an operator, who wouldn't need to figure out the individual pieces required to release resources.

The disadvantage of such an API is that library authors would have to make many policy decisions, such as to what extent to release resources. A given policy might fit some customers but not others; it is difficult to build a one-size-fits-all solution.

We could easily regain the advantage by maintaining a consolidated document of best practices for releasing resources and entering the idle state.
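For illustration, a minimal sketch of how a library might have participated in this alternative design; the Idle property and the IdleChanged event are hypothetical members that were never added to AppContext:

// Hypothetical sketch only: AppContext.Idle and AppContext.IdleChanged
// do not exist; they stand in for the alternative design described above.
public static class LibraryCacheTrimming
{
    public static void Register()
    {
        AppContext.IdleChanged += (object? sender, EventArgs args) =>
        {
            if (AppContext.Idle)
            {
                // Each subsystem picks its own trimming policy here,
                // which is exactly the policy burden noted above.
                TrimCaches();
            }
        };
    }

    private static void TrimCaches()
    {
        // Release pooled buffers, drop caches, etc.
    }
}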

Risks

Misuse of this API may lead to frequent commit/decommit of memory, and that is expensive.

@cshung cshung added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 1, 2022
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 1, 2022
@ghost

ghost commented Mar 2, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Some of our components consume resources that could be freed. For example, the GC keeps some committed memory around to serve allocation requests. This is reasonable if the application is actively using memory, but it is wasteful if the application is going to be idle for a long period of time. In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

API Proposal

I am proposing to add a couple of APIs on System.AppContext as follows:

bool System.AppContext.Idle { get; set; }

event System.EventHandler System.AppContext.IdleChanged;

The exact implementation is TBD. My idea would be to invoke the managed callbacks first (so that any cache trimming or deallocation is handled first), and then call back into the native code to start decommitting memory or performing whatever other go-idle operations we would like to do natively.

API Usage

AppContext.Idle = true;
// ...
// and then later, when we want to wake up
// ...
AppContext.Idle = false;

Alternative Designs

No response

Risks

No response

Author: cshung
Assignees: -
Labels:

api-suggestion, area-System.Runtime, untriaged

Milestone: -

@jkotas
Member

jkotas commented Mar 2, 2022

In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

There is a trade-off between how much to relinquish and how long it is going to take to warm up the process again.

To relinquish as many resources as possible in the container scenario, it is best to shut down the container and start it up again once there is work to do again. It is how serverless containers typically operate.

I think the name and design for this API should be about cache trimming of various levels, not about going idle.

@cshung
Member Author

cshung commented Mar 3, 2022

@zackliu
Hopefully, this proposal is going to address your needs.

@zackliu

zackliu commented Mar 3, 2022

For our scenario, this exactly meets our requirements. In a cloud service, most resources see very low traffic. We can't shut down the containers because they're still in use, but they have committed too much memory (the Gen0 part) even though they're not really in use, even as the whole machine approaches OOM. So we want an API that lets us tell the process to free memory aggressively when we consider it "idle".

@swgriffith

In the container scenario, it would be advantageous for the system as a whole if an idle process relinquished as many resources as possible to increase deployment density.

There is a trade-off between how much to relinquish and how long it is going to take to warm up the process again.

To relinquish as many resources as possible in the container scenario, it is best to shut down the container and start it up again once there is work to do again. It is how serverless containers typically operate.

I think the name and design for this API should be about cache trimming of various levels, not about going idle.

Terminating the container works, but if you're running in an orchestrator with autoscaling, like Kubernetes and the Horizontal Pod Autoscaler, the orchestrator is ultimately responsible for terminating the pod. If you try to use memory as the scaling metric then your calculation gets skewed and you may never fully scale down.

For example, I'm running a .NET 6 app in AKS with a 100Mi request and 100Mi limit on the pod, and the HPA configured to scale when memory exceeds 70% utilization (i.e. 70Mi). Scale-up works fine when I send load to the application. When the load stops completely, the pods remain as follows and the HPA never scales down below 8 pods:

Pods:
clippy-655576985-2x9b5 96Mi
clippy-655576985-c45sd 51Mi
clippy-655576985-ggtkl 53Mi
clippy-655576985-l5gnw 52Mi
clippy-655576985-mfxr8 83Mi
clippy-655576985-sd9mb 51Mi
clippy-655576985-w4qfm 55Mi
clippy-655576985-xrgb4 56Mi

Kubernetes HPA calculation:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
8 = ceil[8 * ((average pod memory = 62) / (100 * 0.70))]
(62 / 70 ≈ 0.89, so ceil(8 * 0.89) = 8 and the replica count never drops.)

So at some point, not being able to free up memory will cause the HPA to stop scaling down. Ultimately, for this scenario, scaling based on a memory metric isn't a viable option, but it would be nice to have that option, and I think this API would make it possible.

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Mar 14, 2022
@cshung
Member Author

cshung commented Mar 17, 2022

@swgriffith, just to confirm: am I correct to assume that, if this API is implemented, you will use it to help with your scale-down issue?

@cshung cshung added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Apr 13, 2022
@cshung cshung added this to the 7.0.0 milestone Apr 13, 2022
@ghost

ghost commented Apr 13, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details


Author: cshung
Assignees: -
Labels:

api-suggestion, area-GC-coreclr, api-ready-for-review

Milestone: 7.0.0

@terrajobst
Member

terrajobst commented Apr 21, 2022

Video

  • Makes sense
    • We should rename it because Maximal makes it a bit difficult to add more modes later
    • Aggressive seems like a good name
  • What is currently framed as the alternative design seems mostly complementary and helpful for libraries to participate in an overall decommission of resources when an application goes idle
    • @GrabYourPitchforks mentioned that we did a similar API in .NET Framework 4.5 for ASP.NET.
    • This might be especially helpful for containers
namespace System;

public enum GCCollectionMode
{
    // Existing:
    // Default = 0,
    // Forced = 1,
    // Optimized = 2,
    Aggressive = 3
}
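As a usage sketch, here is how the approved value might be requested, assuming it is combined with the existing GC.Collect(int, GCCollectionMode, bool, bool) overload as a full, blocking, compacting collection:

// Sketch: request the most aggressive decommit the GC can perform,
// assuming Aggressive is used with a full, blocking, compacting collection.
GC.Collect(GC.MaxGeneration, GCCollectionMode.Aggressive,
           blocking: true, compacting: true);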

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 21, 2022
@tannergooding tannergooding removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 21, 2022
@plaisted

To relinquish as many resources as possible in the container scenario, it is best to shut down the container and start it up again once there is work to do again. It is how serverless containers typically operate.

This is possible for some scenarios but comes with downsides (cold start time, etc.). This proposal would really help with workers we have running in Kubernetes that read messages off a queue and sit idle for significant periods of time. After processing a message they hold on to a significant amount of memory (an extra 500+ MB in some cases) that doesn't get released. We may be able to work around this with an HPA that ignores memory and just checks CPU, but it would be nice not to have to add that complexity, since the workers only actually require 1 millicpu and ~25 MB of memory to stay warm and listening to the queue.
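For illustration only, a minimal sketch of the worker pattern described above using the approved Aggressive mode; the queue client, Message type, timeout value, and Process method are hypothetical placeholders, not a real API:

// Hypothetical worker loop; `queue`, `Message`, and `Process` are
// placeholders for whatever queue client the worker actually uses.
while (true)
{
    Message? message = queue.Receive(timeout: TimeSpan.FromMinutes(5));
    if (message is null)
    {
        // No work arrived for a while: ask the GC to return as much
        // memory as it can to the OS before going back to waiting.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Aggressive,
                   blocking: true, compacting: true);
        continue;
    }
    Process(message);
}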

@cshung cshung mentioned this issue May 23, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 23, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 1, 2022