
[API Proposal]: GC.RefreshMemoryLimit #70601

Closed
cshung opened this issue Jun 10, 2022 · 29 comments
Labels
api-approved API was approved in API review, it can be implemented api-suggestion Early API idea and discussion, it is NOT ready for implementation area-GC-coreclr
Milestone

Comments

@cshung
Member

cshung commented Jun 10, 2022

Background and motivation

In a cloud service scenario, demand comes and goes. To be cost-effective, it is important that our services can scale up and scale down on resource consumption as the demand fluctuates.

While the GC is capable of de-committing unused memory (using an aggressive GC manually or waiting until the automatic aging de-commit kicks in), there is no guarantee that the application will not allocate more as requests come back. It is much safer to allow the container to shrink its memory limit after a scale-down happens, so that the operator can put more instances on the box without worrying about running out of memory just because the GC heuristics believe there is more memory than there actually is.

As of now, we can change the container limit as we wish, but the GC won't notice. Here I propose an API to notify the GC when that happens.

API Proposal

namespace System;
public static class GC
{
        /// <summary>
        ///
        /// Instructs the Garbage Collector to reconfigure itself by detecting the various memory limits on the system.
        ///
        /// In addition to actual physical memory limit and container limit settings, these configuration settings can be overwritten:
        ///
        /// - GCHeapHardLimit
        /// - GCHeapHardLimitPercent
        /// - GCHeapHardLimitSOH
        /// - GCHeapHardLimitLOH
        /// - GCHeapHardLimitPOH
        /// - GCHeapHardLimitSOHPercent
        /// - GCHeapHardLimitLOHPercent
        /// - GCHeapHardLimitPOHPercent
        ///
        /// Instead of updating the environment variables (which will not be re-read), these are overridden by setting a ulong value in the AppContext.
        ///
        /// For example, you can use AppContext.SetData("GCHeapHardLimit", (ulong)(100 * 1024 * 1024)) to override the GCHeapHardLimit to 100 MB.
        ///
        /// As a limitation, on 32-bit systems (e.g. x86, ARM) or macOS, if a heap hard limit is not already set, we will not establish a new one.
        ///
        /// As of now, this API is feature preview only and subject to changes as necessary.
        ///
        /// </summary>
        ///
        /// <returns> A status code indicating the operation's result:
        /// 0 - The refresh operation succeeded
        /// 1 - The operation failed because the new memory limit is too low
        /// 2 - The new hard limit configuration settings are invalid
        ///
        /// We reserve the right to add more status codes as the implementation evolves.
        ///
        /// </returns>
        [System.Runtime.Versioning.RequiresPreviewFeaturesAttribute("RefreshMemoryLimit is in preview.")]
        public static int RefreshMemoryLimit();
}

API Usage

GC.RefreshMemoryLimit();
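A fuller usage sketch, combining the AppContext override described above with the refresh call (the 100 MB value is illustrative, and the status handling assumes the code mapping from the doc comment):

```csharp
// Sketch: shrink the heap hard limit at runtime, then ask the GC to re-read limits.
// Note the value must be boxed as a ulong, or the override is ignored.
AppContext.SetData("GCHeapHardLimit", (ulong)(100 * 1024 * 1024));

int status = GC.RefreshMemoryLimit();
switch (status)
{
    case 0:
        break; // refresh succeeded
    case 1:
        throw new InvalidOperationException("New memory limit is too low.");
    case 2:
        throw new InvalidOperationException("Hard limit configuration is invalid.");
}
```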

Alternative Designs

An alternative design is to have some system mechanism that does this automatically when the container memory limit changes. This would involve:

  1. Non-trivial work with various containers that we support, and
  2. No way to communicate if the container size reduction is infeasible (*).

The API is meant to be used by people who host services. Therefore it makes sense to make sure this API is CLR Hosting API friendly.

Risks

(*) We reserve the right to refuse the container size reduction when it is infeasible (it might be too tight, leaving the GC no room to maneuver).

But the operator might have already reduced the container size - hmm - we need to do something about that.

Certain things can't easily change - it is easy to say that we want to reduce the number of cores as well, but changing the number of heaps would be hard to implement.

Here is a proof-of-concept prototype, feel free to try it out and leave us feedback.

@cshung cshung added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-GC-coreclr labels Jun 10, 2022
@cshung cshung added this to the Future milestone Jun 10, 2022
@ghost

ghost commented Jun 10, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


@zackliu

zackliu commented Jun 13, 2022

Reloading memory is awesome. I support the manual API instead of auto refreshing.

@davidni
Contributor

davidni commented Jul 18, 2022

Related: Environment.ProcessorCount suffers from a similar issue. The result is cached on first access and the runtime assumes the number of processors cannot change, which is false. Relevant source (I think): util.cpp:GetCurrentProcessCpuCount().

See also: Environment.ProcessorCount behavior on Windows

@normj

normj commented Mar 17, 2023

This API would be important for us at AWS. For .NET AWS Lambda functions, the .NET runtime does not understand the memory limits put in place by the Lambda runtime. If we had this method, our .NET Lambda startup code could translate the AWS_LAMBDA_FUNCTION_MEMORY_SIZE environment variable set by the Lambda service into a value for DOTNET_GCHeapHardLimit and then call GC.RefreshMemoryLimit().

@cshung
Member Author

cshung commented Apr 10, 2023

@normj, as I tried to implement the logic to read DOTNET_GCHeapHardLimit during RefreshMemoryLimit, I hit issues with CLRConfig. In particular, the current config subsystem is optimized for loading environment variables only once during process startup; fixing that would require ugly hacks and impact the whole subsystem.

After discussing with @AaronRobinsonMSFT, we believe the better approach is to communicate these values through AppContext.SetData API here, will that work for you?

@normj

normj commented Apr 13, 2023

@cshung If I understand correctly I could detect what the actual memory limit is based on the Lambda runtime environment and then call something like the following

long lambdaMemoryLimit = determineLambdaMemory();
AppContext.SetData("GCHeapHardLimit", lambdaMemoryLimit);

That seems cleaner than setting an in-process environment variable and then pinging the .NET runtime to go take another look at the environment variables.

@normj

normj commented Apr 13, 2023

Out of curiosity, assuming the memory value being set is still significantly greater than what is currently being consumed, will there be any performance cost in setting a new memory limit?

@cshung
Member Author

cshung commented Apr 15, 2023

Out of curiosity assuming the memory value being set is still significantly greater than what is currently being consumed will there be any performance cost in setting a new memory limit?

In the short term - the act of setting the limit itself won't cost much, all it does is walk the heap data structures and set some integer fields.

In the long term - obviously - how the application performs after switching the limits depends on what the application does.

One way to think about the problem is to think of this API as switching gears. How fast the car runs depends on a lot of things: how much gas you give (i.e. allocations), whether you are running up a slope (i.e. survivals), and what gear you are in. Your car probably won't slow down right away when you shift down, but hopefully, eventually, it will as the car travels farther and the pedal is off.

Ideally, after refreshing the memory limit and allowing a reasonable amount of time for the situation to settle, the application should behave as if the memory limit had been specified during startup.

While that's the ideal situation, the reality may or may not match it. The best way to tell is to actually experiment with it.

@normj

normj commented Apr 17, 2023

Thanks for the info @cshung. In AWS Lambda we are most sensitive to anything that affects startup, which also includes bootstrapping the Lambda runtime. When we have this API we will make the call very early in our startup process, when there shouldn't be much memory usage yet. So I'm hoping that telling the .NET runtime the true memory limit early in the startup phase will have a negligible effect on startup performance.

@cshung
Member Author

cshung commented Apr 19, 2023

@normj, @davidni, @zackliu: A private implementation of the API has been merged into the main branch as #83707 and will be available as part of .NET 8 Preview 4.

This is currently implemented as a private API to allow trying it out without committing to the API shape just yet. Please try it out and let me know if you hit any issues.

@normj

normj commented Apr 19, 2023

@cshung My use cases for needing this API are on Linux, so the current limitation of the PR being Windows-only will make it hard to meaningfully try this out.

@cshung
Member Author

cshung commented Apr 19, 2023

@cshung My use cases for needing this API are on Linux so the current limitation for the PR being for Windows only will make it hard to meaningfully try this out.

The code should work on all platforms since the change is not platform-specific. In fact, it compiles and passes all the existing tests we have on all platforms we support. The PR comment was inaccurate and I have updated it; it really meant that I haven't got a setup to test on Linux/macOS yet. If you could help verify that it actually works on Linux for your scenario, that would be great.

@normj

normj commented Apr 19, 2023

@cshung Great! I'll give it a try when preview 4 is released

@cshung cshung modified the milestones: Future, 8.0.0 May 1, 2023
@cshung cshung added the api-ready-for-review API is ready for review, it is NOT ready for implementation label May 1, 2023
@bartonjs
Member

bartonjs commented May 9, 2023

Video

  • We generally believe that the API should return an enum, instead of having the docs describe integer returns. But, given the proposal that it should actually use HRESULT on the basis that it's expected to be called by native hosts (and HRESULT is the host interface response, even on non-Windows), int seems OK.
namespace System;
public static partial class GC
{
    [System.Runtime.Versioning.RequiresPreviewFeaturesAttribute("RefreshMemoryLimit is in preview.")]
    public static int RefreshMemoryLimit();
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 9, 2023
@jkotas
Member

jkotas commented May 9, 2023

it should actually be using HRESULT on the basis that it's expected to be called by native hosts (and HRESULT is the host interface response, even on non-Windows), int seems OK.

This is not how we design hosting APIs these days. Anything people need for hosting, we design as a first-class managed API first and then tell people to wrap it in an interop wrapper for hosting.

If we add this API in the current shape, it creates a bad precedent for introducing unnatural-looking APIs any time somebody wants to call something from an unmanaged host.

@bartonjs
Member

bartonjs commented May 9, 2023

@jkotas So you think the enum approach is better, then?

@jkotas
Member

jkotas commented May 9, 2023

The return value indicates error conditions. The API should return void, and appropriate exception (e.g. InvalidOperationException) should be thrown for error conditions. We just had an offline discussion about it. I assume that you have no issues with changing the return value to void and indicate errors by throwing exceptions, correct?
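A sketch of the void-returning shape being discussed (hypothetical; the approved shape above returns int):

```csharp
namespace System;
public static partial class GC
{
    // Hypothetical alternative: no status code; errors surface as exceptions,
    // e.g. InvalidOperationException when the new limit is too low or the
    // hard-limit configuration is invalid.
    public static void RefreshMemoryLimit();
}
```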

@stephentoub
Member

I assume that you have no issues with changing the return value to void and indicate errors by throwing exceptions, correct?

That was the first thing I asked for in the API review meeting :) but was told we couldn't do that because common usage would be from native code via hosting APIs and exception throwing would be problematic for that. If that's not the case, please do make it void and throw for errors.

@cshung
Member Author

cshung commented May 9, 2023

We are still planning to call this API through the hosting API, but we are working toward a scheme that would allow the caller to catch the exception via some sort of injected managed wrapper function.

Once we have this sorted out, we can move toward using exceptions instead of error codes.

@normj

normj commented May 18, 2023

I'm trying to test this out as part of Preview 4 but I'm not seeing any changes. I suspect I'm calling AppContext.SetData incorrectly.

At the start of my app I run this method, which uses reflection to call the current private _RefreshMemoryLimit. I can see that method.Invoke is being called:

    private static void AdjustMemorySettings()
    {
        AppContext.SetData("System.GC.HeapHardLimit", 209715200);

        var gcType = typeof(GC);
        var method = gcType.GetMethod("_RefreshMemoryLimit", BindingFlags.NonPublic | BindingFlags.Static);
        if (method == null)
            throw new Exception("Failed to find _RefreshMemoryLimit method through reflection");

        var result = method.Invoke(null, new object[] { });
        Console.WriteLine("Result of Refresh: " + result);
    }

But then later when I call GC.GetGCMemoryInfo().TotalAvailableMemoryBytes I'm not seeing any change in the returned available memory. I have tried both System.GC.HeapHardLimit and GCHeapHardLimit as the key for SetData. The method.Invoke for _RefreshMemoryLimit returns 0. This is running in AWS Lambda, which means Linux and no containers for this example.

@cshung
Member Author

cshung commented May 19, 2023

Thanks for trying it out!

I think you need AppContext.SetData("System.GC.HeapHardLimit", (ulong)209715200);

@normj

normj commented May 19, 2023

@cshung No luck still. I used your SetData call and even added a GC.Collect for good measure, but GC.GetGCMemoryInfo().TotalAvailableMemoryBytes is still not using the value I passed to SetData. I have tested this now on both Linux and Windows with the same behavior: the SetData call is ignored.

My current version of adjusting memory

    private static void AdjustMemorySettings()
    {
        AppContext.SetData("System.GC.HeapHardLimit", (ulong)209715200);

        var gcType = typeof(GC);
        var method = gcType.GetMethod("_RefreshMemoryLimit", BindingFlags.NonPublic | BindingFlags.Static);
        if (method == null)
            throw new Exception("Failed to find _RefreshMemoryLimit method through reflection");

        var result = method.Invoke(null, new object[] { });
        GC.Collect();
        Console.WriteLine("Result of Refresh: " + result);
    }

@cshung
Member Author

cshung commented May 19, 2023

Sorry for the confusion, I think I missed a couple of things.
The key should be GCHeapHardLimit instead of System.GC.HeapHardLimit.

When I ran it locally, I hit a bad bug that crashes on the first GC. The problem is that I accidentally swapped an assignment statement. I will get it fixed as soon as I can.

@normj

normj commented May 20, 2023

Using GCHeapHardLimit did the trick, and I see GC.GetGCMemoryInfo().TotalAvailableMemoryBytes reporting the adjusted memory. Sadly I also hit the crash from the GC.Collect call, but I'm able to get far enough in my testing that it isn't blocking.

When we add support for .NET 8 in AWS Lambda, I can add to our runtime startup code something like the following, which looks for the AWS Lambda memory size and configures the .NET runtime with that value unless the user has already configured a value less than that.

private static void AdjustMemorySettings()
{
    int lambdaMemoryInMb;
    if (!int.TryParse(Environment.GetEnvironmentVariable("AWS_LAMBDA_FUNCTION_MEMORY_SIZE"), out lambdaMemoryInMb))
        return;

    ulong memoryInBytes = (ulong)lambdaMemoryInMb * 1048576;

    // If the user has already configured the hard heap limit to something lower than what is available,
    // then make no adjustments, to honor the user's setting.
    if ((ulong)GC.GetGCMemoryInfo().TotalAvailableMemoryBytes < memoryInBytes)
        return;

    AppContext.SetData("GCHeapHardLimit", memoryInBytes);

    // Replace with non-reflection version once RefreshMemoryLimit is public
    var gcType = typeof(GC);
    var method = gcType.GetMethod("_RefreshMemoryLimit", BindingFlags.NonPublic | BindingFlags.Static);
    if (method == null)
        throw new Exception("Failed to find _RefreshMemoryLimit method through reflection");

    method.Invoke(null, new object[] { });
}

@cshung
Member Author

cshung commented May 20, 2023

@normj, the bug is fixed in this PR and it is merged. It should be available in the next daily build.

@mrsharm
Member

mrsharm commented Jun 22, 2023

@cshung will validate this with another customer.

@mangod9
Member

mangod9 commented Aug 2, 2023

@cshung assume this issue is ok to close now?

@normj

normj commented Aug 2, 2023

If it matters, I was planning to implement the above code for real in AWS Lambda based on the latest previews this month, to make sure everything works as expected.

@mangod9
Member

mangod9 commented Aug 3, 2023

Closing as done.

@mangod9 mangod9 closed this as completed Aug 3, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 3, 2023