
Proposal: GC.RegisterMemoryPressureCallback #53895

Open
Sergio0694 opened this issue Jun 8, 2021 · 70 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-GC-coreclr
Milestone

Comments

@Sergio0694
Contributor

Sergio0694 commented Jun 8, 2021

Background and Motivation

There are a number of scenarios where library authors would like to perform some kind of cleanup/trimming of data structures or other objects, with the assumption that this operation should ideally run when there's enough memory pressure, and only for as long as some target object (typically the object being trimmed) is alive. An example of this is TlsOverPerCoreLockedStacksArrayPool&lt;T&gt; in the runtime, which leverages the internal Gen2GcCallback type to automatically execute trimming whenever a gen 2 GC collection occurs. Other library authors (myself included) have needed something like this as well, for instance:

  • WeakReferenceMessenger in Microsoft.Toolkit.Mvvm (link)
  • Ben.StringIntern (link)
  • Some other results from grep.app (link), 17 in total as of now

The current approach relies on just copying the Gen2GcCallback type, but that has a number of issues:

  • For one, it's not a public API, so you have to copy-paste the whole class
  • It relies on finalization, which is inherently a bit wonky and with its own set of complications
  • It strictly ties the execution of the callback to gen 2 collections, which might not be the ideal solution

This proposal is for a proper, built-in API to register a callback to be executed when there's memory pressure. The runtime/GC would then be free to decouple execution from strictly gen 2 collections, leaving more room to make decisions on how to execute the callback as well. It would also be easier for consumers to use, and more reliable (for instance, running arbitrary code from a finalizer is not really the best idea in the first place).

Proposed API

The API shape I had in mind was to essentially just expose the same features as Gen2GcCallback, but with a single API:

namespace System
{
    public static class GC
    {
        public static void RegisterMemoryPressureCallback(Func<object?, bool> callback, object? target);

        // Or possibly (see comments below)
        public static void RegisterMemoryPressureCallback(Func<object?, GCMemoryInfo, bool> callback, object? target);
    }
}

The target can be either null or an object instance. If it's an instance, the GC will hold it as a weak reference (passing an instance to this API will not keep it alive after the call to RegisterMemoryPressureCallback), and will automatically unregister the callback when the object is collected. This means that if you do pass a target instance, the callback can assume that its input value will never be null. This allows consumers to achieve the same as the two overloads of Gen2GcCallback: either pass a null target and ignore the input state in the callback (acting as a static, stateless callback), or pass some target instance and receive it as input in the callback.
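To make the two registration modes concrete, here is a sketch against the first proposed overload. Note that this API does not exist yet, and SomeSharedCache (and its Trim method) is a made-up type used purely for illustration:

```csharp
// Stateless registration: null target, callback ignores its input.
// The delegate must wrap a static method or stateless lambda.
GC.RegisterMemoryPressureCallback(static _ =>
{
    SomeSharedCache.TrimShared(); // hypothetical static cleanup

    return true; // keep the callback registered
}, null);

// Stateful registration: the GC holds 'cache' weakly, and unregisters the
// callback automatically once 'cache' is collected, so 'target' is never null here.
var cache = new SomeSharedCache();

GC.RegisterMemoryPressureCallback(static target =>
{
    ((SomeSharedCache)target!).Trim();

    return true;
}, cache);
```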

NOTE: as per this comment by @GrabYourPitchforks, an Unsafe- prefix was considered for the API to indicate that it doesn't capture/flow the execution context. Context flowing isn't used by any of the existing Gen2GcCallback scenarios mentioned above, and consumers who really need that functionality could still handle it manually.

This API would be easy to use and it would support all the existing scenarios as with Gen2GcCallback.

Usage Examples

See any of the existing Gen2GcCallback usages linked above.

Alternative Designs

Here is an alternative proposal with an interface-based callback from @GrabYourPitchforks:

namespace System
{
    public interface IMemoryPressureCallback
    {
        bool Invoke();
    }

    public static class GC
    {
        public static void RegisterMemoryPressureCallback(IMemoryPressureCallback callback, bool strongReference);
    }
}

Quoting from Levi:

"The reason for using an interface instead of a delegate here is that the delegate is a separate instance from the target object, and we don't want to risk the delegate being collected while the target is still alive, etc. Interfaces solve this problem."

My main issue with this design is that, quoting myself from our previous discussion:

"How would you use this on a public type if you don't want to make the fact it inherits from that interface visible to consumers? [...] if you then need that state object to hold a weak reference to your actual instance, then you'd have to (somewhat unintuitively) use a weak reference to your instance to that state object, and instead make your state object a strong reference in that API. That seems a bit clunky and not expressing that nice callback-object relationship in the API as easily."

Open questions

  • Should the Unsafe prefix be used for the API to indicate no execution context capturing? (Jan suggested no)
  • What should the behavior be in case the callback throws an exception? Should the callback be considered invalid and be unregistered?
  • Should we also expose the API with a GCMemoryInfo parameter, given that most consumers would likely need it?
  • If so, is Func<T1, T2, TResult> fine, or would we want to declare a custom delegate (maybe nested within the GC type)?

Risks

None that I can see. The API would live in the GC type, which is only ever used by more advanced users. Either of the proposed APIs would also be much more reliable and generally better than using Gen2GcCallback as is the case today. Additionally, not having arbitrary code run from within the finalizer queue reduces the chance of other issues (e.g. someone taking a lock from there).

@Sergio0694 Sergio0694 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 8, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jun 8, 2021

@GrabYourPitchforks
Member

If we maintain a strong ref to the callback, its entire object graph remains alive. So the delegate model might not be as fragile as I'd originally thought.

@Sergio0694
Contributor Author

Sergio0694 commented Jun 8, 2021

@GrabYourPitchforks Yeah, my thinking was that consumers would have to ensure that the input delegate wraps either a static method or a stateless lambda. I agree that that's potentially error prone; my main concern with the interface-based pattern, though, is that it'd make existing patterns with a target instance much more verbose and even more error prone in some cases, unless I'm reading that wrong. From this last comment I think we might agree on this now then 😄

For reference (and for others reading), what I'm saying is that, assuming the target instance is a public type that you don't want consumers to see the interface applied to, then things get very complicated and error prone with the interface-based callback.
Consider WeakReferenceMessenger from the MVVM Toolkit, with either of the two proposed APIs.

Case 1 (function)

public sealed class WeakReferenceMessenger
{
    public WeakReferenceMessenger()
    {
        // The callback matches Func<object?, bool>: return true to stay
        // registered, false to unregister. 'target' is never null here,
        // since a non-null target instance was passed at registration.
        static bool MemoryPressureCallback(object? target)
        {
            ((WeakReferenceMessenger)target!).Cleanup();

            return true;
        }

        GC.RegisterMemoryPressureCallback(MemoryPressureCallback, this);
    }

    private void Cleanup() { }
}

Case 2 (interface)

public sealed class WeakReferenceMessenger
{
    public WeakReferenceMessenger()
    {
        GC.RegisterMemoryPressureCallback(new CleanupCallback(this), true);
    }

    private sealed class CleanupCallback : IMemoryPressureCallback
    {
        private readonly WeakReference<WeakReferenceMessenger> target;

        public CleanupCallback(WeakReferenceMessenger target)
        {
            this.target = new WeakReference<WeakReferenceMessenger>(target);
        }

        public bool Invoke()
        {
            if (target.TryGetTarget(out WeakReferenceMessenger? messenger))
            {
                messenger.Cleanup();
                return true;
            }

            return false;
        }
    }

    private void Cleanup() { }
}

This seems way more verbose and actually more error prone to me, as consumers would have to remember to set the reference to be strong, because they'd actually be referencing the callback wrapper and not the actual target instance. They'd then also have to manually hold a weak reference to the actual target, and check whether it's still alive in the callback 🤔

The other API shape just seems much easier to use overall, with fewer chances to get things wrong in general. Thoughts?

@teo-tsirpanis
Contributor

Would supporting unregistering a callback make sense? The method could return an IDisposable.

@ghost

ghost commented Jun 8, 2021

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jun 8, 2021
@mangod9 mangod9 added this to the Future milestone Jun 8, 2021
@cshung
Member

cshung commented Jun 8, 2021

In case the assembly used to define the delegate is collectible (and meant to be collected), wouldn't this design prevent the assembly from being unloaded?

@GrabYourPitchforks
Member

Re: IDisposable, back in Full Framework ASP.NET we had a similar concept and opted not to use IDisposable. The reason is that callers often have the mindset that they need to dispose of IDisposable objects when they're finished, and so they contort their code in order to track lifetimes and call this method deterministically.

In the case of ASPNET, we had a type ISubscriptionToken (see interface definition and APIs which expose these instances) that we used in place of IDisposable for these scenarios. It was intended to convey "you're free to unsubscribe from these notifications if you really need to, but really we expect you to be subscribed in perpetuity, so please simply discard this token if that describes your use case."

@GrabYourPitchforks
Member

In case the assembly used to define the delegate is collectible (and meant to be collected), wouldn't this design prevent the assembly from being unloaded?

The delegate would be strongly held until the next memory pressure event, then it would be invoked. The delegate may choose to reregister itself by returning true. If it does not reregister itself, it becomes eligible for collection, which would presumably then allow the entire assembly to be unloaded.

@Sergio0694
Contributor Author

Related to this: if the delegate was stateful and targeting an object in that assembly as well, it'd simply be removed automatically on the next event. As in:

  • The target would be collected
  • The callback would be unregistered automatically
  • Then the delegate would be collected
  • Finally the whole assembly could be unloaded as well

If the delegate was instead not stateful then yeah you'd just have to manually return false from the callback to unregister it and allow the delegate to be collected, like Levi said as well 🙂
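For the stateless case, the opt-out could look like the following sketch against the proposed API (PluginHost and TrimPluginCaches are made-up names for illustration):

```csharp
// Hypothetical: a stateless callback defined in a collectible assembly.
// Returning false unregisters it, releasing the runtime's strong reference
// to the delegate and allowing the assembly to eventually be unloaded.
GC.RegisterMemoryPressureCallback(static _ =>
{
    if (PluginHost.IsUnloading) // made-up condition
    {
        return false; // unregister: the delegate becomes collectible
    }

    TrimPluginCaches(); // made-up cleanup helper

    return true; // stay registered for the next memory pressure event
}, null);
```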

@GrabYourPitchforks
Member

If we're talking about callback models we probably also have to figure out ExecutionContext stuff, flow suppression, blah blah.

@Sergio0694
Contributor Author

I mean my thinking here was to open the proposal to start a discussion and get an area owner to have a look and comment on whether that'd be something that would be reasonable to add at all, and if so then go through the various open questions and details to get the fancy api-ready-to-review label 😄

But sure going through some more details already sounds great! Can you elaborate a bit more on those points you raised? In general my idea for the API was that it wouldn't give you any guarantees on where the callback would be invoked. As in, the runtime would be free to invoke it from any arbitrary thread (not necessarily the one that registered it, and consecutive invocations of the same callbacks wouldn't give guarantees to use the same thread either) at some point in time where it considered memory pressure to be relevant (eg. before/after a gen 2 collection, or with whatever other heuristics) 🤔

@GrabYourPitchforks
Member

Re: flowing context, normally when we capture a delegate for execution on a separate thread, we're expected to flow the ExecutionContext. This captures async locals, thread culture and identity, and a few other thread-ambient characteristics, and it restores those on the target thread before the delegate is invoked.

The pattern we use is that these sorts of methods capture the EC by default, and we have sibling Unsafe*-prefixed methods that don't capture the EC. If we wanted the behavior of this API to be that it never captures the EC, that's fine. We'd likely have only the Unsafe*-prefixed method in that scenario. And if the caller really wants to capture the EC, they can do so manually.

@Sergio0694
Contributor Author

As discussed on Discord as well, yeah that sounds totally reasonable to me, especially given that I would expect that to simplify the implementation and give the runtime more freedom to handle things internally however it needs to. Also, with respect to all those existing use case scenarios of Gen2GcCallback today, none of them is relying on flowing context, so I would say it's reasonable to assume that consumers of this API would generally not need/expect that either (and if that was the case they could just take care of that manually anyway). I'll go ahead and update the proposed API with this info 😄

@GrabYourPitchforks
Member

For context, the reason the EC stuff matters here is that as an implementation detail we may want to consider dispatching all of these callbacks via ThreadPool.[Unsafe]QueueUserWorkItem when GC pressure is seen. Otherwise we could end up with potentially long-running user code executing on the finalizer thread, which isn't ideal. But this implementation detail might force us to confront these threading behaviors sooner rather than later.

@Sergio0694
Contributor Author

I see, yeah that makes perfect sense, thanks for the additional info!
I've updated the API proposal and added a note about the name with a link to your previous comment as well, for future reference. Guess now we just need to wait for an area owner to have a look to confirm whether the proposal is reasonable and that there aren't other missing open questions that need to be resolved, and then hopefully we can get this marked for review 😄

@Sergio0694
Contributor Author

Sergio0694 commented Jul 19, 2021

Was wondering whether @Maoni0 or someone from the GC team could share some thoughts on the proposal, now that planning for .NET 7 is starting again. Specifically, assuming the design @GrabYourPitchforks and I landed on looks reasonable, I was wondering whether the issue could be assigned the .NET 7 milestone so it could be looked at during internal planning, and/or it could get the "api ready for review" tag in case the proposal seemed acceptable as is. Or if that wasn't the case, if you could share some feedbacks on what to improve or change if needed, so we could iterate on it and work towards making it ready for review.

Thanks! 😄

@jkotas
Member

jkotas commented Jul 19, 2021

the API is using the Unsafe- prefix to indicate that it doesn't capture/flow the execution context.

We have a number of similar low-level global callbacks that do not capture execution context. For example, AppDomain.UnhandledException or NativeLibrary.SetDllImportResolver. I do not think that it makes sense for this API to have the Unsafe prefix.

@jkotas
Member

jkotas commented Jul 19, 2021

If it's an instance, then the GC will keep that reference as a weak reference (passing this instance to this API will not keep that state alive after the call to RegisterMemoryPressureCallback), and also automatically unregister the callback when the object is collected

We have used different pattern to solve the same problem in the recent PosixSignal APIs: #50527 (comment) . Should we have some sort of consistency on how we are exposing callbacks like this?

@jkotas
Member

jkotas commented Jul 19, 2021

Does the API need to communicate the severity of the Memory pressure and/or allow only registering for certain levels?

As is, the API would not really be a sufficient replacement for what is done in ArrayPool.

@Sergio0694
Contributor Author

"We have number of similar low-level global callbacks that do not capture execution context."

Oh, right. I don't have a strong preference either way, especially given that the behavior would be properly documented, and I think it's reasonable to assume that developers using this API would take the time to read its docs first. I liked the Unsafe prefix mostly for consistency with other APIs with similar behavior like Levi mentioned (e.g. UnsafeQueueUserWorkItem), but if we already don't have complete consistency due to those two examples you mentioned, and if you feel it'd be better to drop the prefix here, I'm happy to do so and update the proposal 😄

"We have used different pattern to solve the same problem in the recent PosixSignal APIs [...]. Should we have some sort of consistency on how we are exposing callbacks like this?"

Some reasons for the current proposal were, in no particular order:

  • Just adding an API to the existing GC class keeps things simpler (no new classes, and "GC" made sense).
  • It keeps the same behavior as Gen2GcCallback, where you just set it up once and forget about it. That is, you don't have a (disposable) instance representing the registration that you need to store anywhere and worry about.
  • By only taking the target instance as input and returning the cancellation, we don't need a dedicated type for the context, which we'd also have to decide where to define. Adding that to System didn't seem justified given how niche the API is, and putting it in a separate namespace would be weird given that the API is still on System.GC. Maybe a nested type in GC? I figured just using a Func<object, bool> would be the simplest solution, though feedback is welcome on this. For example, would it be better to have something like a MemoryPressureCallbackContext (made-up name) class with a property for the target instance, and a bool property to re-register or unregister the callback? Related question: in this scenario, I assume the extra allocations for this context class passed to each callback invocation would be negligible? 🤔

"Does the API need to communicate the severity of the Memory pressure and/or allow only registering for certain levels?"

For the current proposal, the goal was to offer a proper replacement for the Gen2GcCallback trick, so I assumed that e.g. the ArrayPool<T> implementation could just replace that bit and keep the rest of the logic as is. Right now the memory pressure check is done after Gen2GcCallback triggers the callback; that is, the memory pressure logic isn't part of the callback invocation logic anyway. I do agree, though, that it would be nice to also offer some additional control, either in a separate overload or in the same API with a default parameter. Question: would it be acceptable to essentially fix some thresholds for the levels in the API itself? That is, suppose we exposed some MemoryPressureLevel enum and used that as a parameter for this API; ArrayPool<T> would then have to rely on whatever threshold is used there as well. Would that be ok and not constitute a limitation if ArrayPool<T> needs to be tweaked in the future? Related: #52098.
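For reference, the runtime's internal Utilities.GetMemoryPressure helper (used by ArrayPool's trimming) is roughly the following; the exact threshold constants should be treated as illustrative, but this shows the kind of fixed thresholds the question above refers to, built on the public GC.GetGCMemoryInfo() API:

```csharp
internal enum MemoryPressure { Low, Medium, High }

internal static MemoryPressure GetMemoryPressure()
{
    // Fractions of the high memory load threshold (approximate values).
    const double HighPressureThreshold = 0.90;
    const double MediumPressureThreshold = 0.70;

    GCMemoryInfo memoryInfo = GC.GetGCMemoryInfo();

    if (memoryInfo.MemoryLoadBytes >= memoryInfo.HighMemoryLoadThresholdBytes * HighPressureThreshold)
    {
        return MemoryPressure.High;
    }

    if (memoryInfo.MemoryLoadBytes >= memoryInfo.HighMemoryLoadThresholdBytes * MediumPressureThreshold)
    {
        return MemoryPressure.Medium;
    }

    return MemoryPressure.Low;
}
```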

@Maoni0
Member

Maoni0 commented Jul 19, 2021

The whole GC team is still heads down with the .NET 6 work, but that shouldn't prevent you from having a discussion about this - it just means someone from the GC team will look at the discussion in the not too distant future.

@Sergio0694
Contributor Author

Thank you for chiming in! And yeah, I mostly just wanted to restart the conversation on this to get more feedback and eventually settle on a good API proposal for FXDC to review. Given that we cut it pretty close with DependentHandle in .NET 6, I figured it'd be a good idea to bump this soon enough to have a chance for it to make it into .NET 7, especially as I'm sure there will be a number of questions, concerns and implementation details to discuss here 😄

Jan raised some great points (unsurprisingly 🙂) so I'm curious to hear back on that, and I'll be looking forward to the GC team also sharing some thoughts on this once their .NET 6 work is done, as you mentioned. Thanks!

@mqudsi

mqudsi commented Aug 18, 2021

if you don't want to make the fact it inherits from that interface visible to consumers

I'm not sure this is actually a good reason. If this is about preventing code breakage (semantic versioning) should you wish to no longer implement GC pressure callbacks, you can still keep implementing the interface but never register with the GC. If it's about not wanting to expose the fact that your code does this at all, then I'd have to ask why? If anything, it makes the behavior of the code more introspectable: a developer wondering whether it's a terrible idea to use your cache (or whatever) can see that it implements IMemoryPressureCallback and feel reassured. And if it's about worrying that someone other than the GC ends up calling the method, just implement it explicitly (bool IMemoryPressureCallback.Invoke() { ... }) so the only way anyone can call it is via an explicit ((IMemoryPressureCallback)foo).Invoke(), and trust that there's probably a very good reason why they're doing so (it's arcane enough that I think that's a safe assumption).

I liked the Unsafe prefix mostly for consistency with other APIs with similar behavior like Levi mentioned (eg. UnsafeQueueUserWorkItem),

This is borderline bikeshedding, but I saw that as mainly to distinguish the EC-free routine from the existing .QueueUserWorkItem() method, and to nudge anyone not knowing which of the two to use to just stick with the regular version. Personally, I don't like the Unsafe prefix in this context, since it doesn't really tell you anything (and since it's just a convention, there's no XmlDoc <unsafebecause>...</unsafebecause> or what have you to quickly bring you up to speed on just why/how it's unsafe). And to be more pedantic, they're not actually unsafe methods; they just behave differently from how you might - or might not - expect them to.

On MemoryPressureLevel: I think a callback with "low"/"medium"/"high" isn't a bad idea, except that no one will really know what "low" or "medium" means, and the concern would be that everyone might think their object is the most expensive object to re-allocate if/when needed; in effect, you might be inadvertently creating a Game Theory meets Tragedy of the Commons type of scenario. However, that ship has actually sailed, since this proposed API will ship after GC.GetGCMemoryInfo() has been made public. So if you don't provide a MemoryPressureLevel parameter in the callback (or at registration time), people will just end up calling GC.GetGCMemoryInfo() themselves, which might a) incur overhead if called by lots of registered instances, and b) result in people mis-interpreting what percentage represents what kind of pressure.

Maybe it would be best for the callback to look more like bool InvokeOrWhatever(object target, GCMemoryInfo memoryInfo);
That way, the default is "there's (some) GC pressure, please free up memory now" and hopefully that's what most external consumers would use. But if someone really knows what they are doing and, e.g. the standard library itself, knows (some of) their objects are used by pretty much everyone all the time, then the GCMemoryInfo instance can be queried to figure out how to act accordingly.

This would avoid a thundering herd responding to the GC pressure callback with their own calls to GC.GetMemoryInfo(), each allocating a new instance - instead, a snapshot of the GCMemoryInfo could be captured before invoking the callback on the registered instances, and they could all get a reference to the same readonly struct.
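Assuming that callback shape, a consumer could use the shared snapshot like this (SomeCache and its Trim method are made-up names; the 0.90 factor is an arbitrary illustrative threshold):

```csharp
// Hypothetical callback matching the suggested two-parameter shape.
// Every registered callback receives the same GCMemoryInfo snapshot,
// so no per-callback GC.GetGCMemoryInfo() call is needed.
static bool OnMemoryPressure(object? target, GCMemoryInfo memoryInfo)
{
    bool highPressure = memoryInfo.MemoryLoadBytes >=
                        memoryInfo.HighMemoryLoadThresholdBytes * 0.90;

    // Trim harder when the load is close to the high-load threshold.
    ((SomeCache)target!).Trim(aggressive: highPressure);

    return true; // stay registered
}
```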

@Sergio0694
Contributor Author

"If this is about preventing code breakage (semantic versioning) should you wish to no longer implement GC pressure callbacks, you can still keep implementing the interface but never register with the GC. If it's about not wanting to expose the fact that your code does this at all, then I'd have to ask why?"

This would strictly be an internal implementation detail for types supporting this, so I really don't feel like it should leak through the public API surface. It's not so much about consumers being able to explicitly call the cleanup method on their end (though that would also not necessarily be good), but more that this just looks like bad design to me. Not to mention, as I said, that you'd be somewhat limited if you later wanted to change the way the type does cleanup, as you'd then have to keep that dummy interface around to avoid breaking consumers using your type.

"Personally, I don't like the Unsafe prefix in this context, since it doesn't really tell you anything"

I mean, as I mentioned I mostly liked that suggestion due to the reason Levi mentioned, but I wouldn't really mind if we ditched the Unsafe prefix as Jan mentioned. If he thinks it's not necessary here, then sure, that's absolutely fine by me, let's do that 😄

"On MemoryPressureLevel: I think a callback with "low" "medium" "high" isn't a bad idea except that no one will really know what "low" or "medium" means [...] since this proposed API will ship after GC.GetGCMemoryInfo() has been made public, so if you don't provide a MemoryPressureLevel parameter in the callback (or at registration time), then people will just end up calling GC.GetGCMemoryInfo() themselves, which might a) incur overhead if called by lots of registered instances, b) result in people mis-interpreting what percentage represents what kind of pressure."

I actually don't mind this idea, especially after seeing that other types we used as reference, such as TlsOverPerCoreLockedStacksArrayPool<T>, do exactly the same anyway, which makes sense. For instance:

public bool Trim()
{
int currentMilliseconds = Environment.TickCount;
Utilities.MemoryPressure pressure = Utilities.GetMemoryPressure();

As a strawman updated proposal, I guess we could have something like this:

namespace System
{
    public static class GC
    {
        public static void RegisterMemoryPressureCallback(Func<object?, GCMemoryInfo, bool> callback, object? target);
    }
}

At this point my concern though is that a Func<,,> like that might not be the most intuitive thing ever, so we might potentially want to define a special purpose delegate type in some location (eg. nested inside the GC class?).
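For instance, a purpose-built delegate nested inside GC could make the parameters self-describing; a rough sketch (the delegate name and its exact shape here are hypothetical, just illustrating the idea):

```csharp
namespace System
{
    public static partial class GC
    {
        // Named parameters make the contract clearer than a raw Func<,,>.
        // Returning false would unregister the callback, mirroring the
        // convention used by the internal Gen2GcCallback today.
        public delegate bool MemoryPressureCallback(object? target, GCMemoryInfo memoryInfo);

        public static void RegisterMemoryPressureCallback(MemoryPressureCallback callback, object? target);
    }
}
```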

Another open question: would the fact that all invoked delegates would get the same memory info be important? Eg. in case one target did some trimming and the GC run right before the next one, then that memory info would become stale. Would that be an issue, or would that be considered fine given that handlers would always be assumed to be invoked in a non-specific order?

@cshung
Member

cshung commented Oct 28, 2021

@Sergio0694, @Wraith2, @antonfirsov, @rickbrew and others who might be interested. I have a prototype branch that should work and would like validation from your scenarios! Check out the documentation as well to get started.

@AraHaan
Member

AraHaan commented Oct 29, 2021

@Sergio0694, @Wraith2, @antonfirsov, @rickbrew and others who might be interested. I have a prototype branch that should work and would like validation from your scenarios! Check out the documentation as well to get started.

Will it trigger multiple times once a callback is registered, i.e. every time it needs to do a Gen2 GC?

Also, what if most of the memory is inside of a static MemoryCache variable? Can we use that callback to force that cache to be cleared, since I think the GC ignores static memory?

I also run into this problem: I basically memory-cache some Discord event information in order to reduce the number of API requests (to avoid getting temporarily API banned on their end by hitting 10k rate-limited requests in under 10 minutes), so having an API like this, where the memory cache neither gets cleared too early nor never clears at all, would help me as well. Especially in cases where I might have duplicates of the same object that was updated, and I only want to keep the newest items of a specific type of data returned from Discord for specific users, roles, channels, servers, etc.

@cshung
Member

cshung commented Oct 29, 2021

Will it trigger multiple times once a callback is registered, i.e. every time it needs to do a Gen2 GC?

Right now, the same callback will be called multiple times when the GC believes we are running low on memory. There are other cases that a blocking gen 2 GC may happen (e.g. user is calling GC.Collect) and the callback will not be executed in that case.

Also, what if most of the memory is inside of a static MemoryCache variable? Can we use that callback to force that cache to be cleared, since I think the GC ignores static memory?

When you say static, do you mean C# static like this?

public class MemoryCache
{
  public static byte[] data = new byte[10000];
}

Assuming this is the only reference to the byte array, setting it to null should let the GC collect it in the next collection.

public void Callback() {
   MemoryCache.data = null;
}

@AraHaan
Member

AraHaan commented Oct 29, 2021

Yes, I mean that the variable/property that holds the data is a C# static. However, they are not always nullable in my codebase (some of them are list-based), and as such I have to calculate which items I need to evict from said list-like collection (perhaps I could make each item carry an embedded timestamp of when it was added, then compare similar items by that timestamp to keep only the newest one and collect the older ones).

@cshung
Member

cshung commented Oct 29, 2021

Yes, I mean that the variable/property that holds the data is a C# static. However, they are not always nullable in my codebase (some of them are list-based), and as such I have to calculate which items I need to evict from said list-like collection (perhaps I could make each item carry an embedded timestamp of when it was added, then compare similar items by that timestamp to keep only the newest one and collect the older ones).

My prototype is only capable of issuing a callback; what the callback does to relieve the memory pressure is flexible, and you can do whatever makes sense for your application. Your idea to eliminate entries that are least recently used seems very reasonable to me.
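As an illustration, the timestamp-based eviction described above could look roughly like this inside such a callback (every type and name here is a hypothetical stand-in, not a real API):

```csharp
using System;
using System.Collections.Generic;

internal static class DiscordEventCache
{
    private const int MaxAgeMilliseconds = 10 * 60 * 1000;

    // Each cached item carries the Environment.TickCount from when it was
    // added, so staleness can be computed cheaply later.
    private static readonly Dictionary<ulong, (object Payload, int AddedTicks)> s_entries = new();

    // Intended to run from the proposed memory pressure callback: evict
    // anything older than the deadline, keeping the newest snapshot per id.
    public static void Trim()
    {
        int now = Environment.TickCount;
        lock (s_entries)
        {
            foreach (ulong key in new List<ulong>(s_entries.Keys))
            {
                if (now - s_entries[key].AddedTicks > MaxAgeMilliseconds)
                {
                    s_entries.Remove(key);
                }
            }
        }
    }
}
```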

@Sergio0694
Contributor Author

Hey @cshung, this is awesome! Thank you for putting together an initial prototype 😄

I'll see if I have time to actually test this in one of my scenarios soon; I had a few pieces of feedback in the meantime.

"Currently it only support one callback. Instead of maintaining a list of callback in the native runtime, I am thinking about letting the managed code to do it."

I think we absolutely need support for multiple callbacks for this to be a viable API, yes. Storing the callback(s) in managed code makes sense to me, especially considering the idea was to stop using the finalizer thread to run the callbacks and use something like ThreadPool.UnsafeQueueUserWorkItem instead. Given all the ThreadPool infrastructure is in managed code, it makes sense to me to keep the list/collection of registered callbacks in managed code as well? 🤔

Regarding the "who keeps the callback alive?" point and the originally proposed signature, do we all agree on how the various registered components should be conceptually linked together? My assumption was to have something like this:

  • The target object is stored as a weak GC handle
  • The callback is stored as a dependent handle from the target object to the callback
  • The callback would be an Action<object> to avoid capturing the target object. It could be a delegate wrapping a static method (or an instance method, with the delegate statically cached by Roslyn, that's fine), but the point is it wouldn't be eg. a lambda also capturing the target in its closure. This also mirrors the setup people use today with Gen2GcCallback.

This should make it so that:

  • The target would always be free to be collected
  • The callback would always remain alive automatically as long as the target is alive

Thoughts on this? Curious to hear if there's any other approaches any of you had in mind for this 🙂
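A minimal sketch of what one registration entry under those liveness rules could look like in managed code, built on the public DependentHandle from .NET 6 (the type and member names here are hypothetical, only the handle semantics come from the design above):

```csharp
using System;
using System.Runtime;

internal sealed class MemoryPressureRegistration : IDisposable
{
    private DependentHandle _handle;

    public MemoryPressureRegistration(Action<object> callback, object target)
    {
        // The callback (the dependent) stays alive exactly as long as the
        // target does; the target itself is never rooted by this entry.
        _handle = new DependentHandle(target, callback);
    }

    // Returns false once the target has been collected, signalling that
    // this registration can be dropped from the list.
    public bool TryInvoke()
    {
        (object? target, object? dependent) = _handle.TargetAndDependent;
        if (target is null)
        {
            return false;
        }

        ((Action<object>)dependent!)(target);
        return true;
    }

    public void Dispose() => _handle.Dispose();
}
```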

@rickbrew
Contributor

  • The callback is stored as a dependent handle from the target object to the callback

I was just going to suggest that. I recently solved something similar in Paint.NET: a static event registrar, where I store a WeakReference to the callback, and then I use ConditionalWeakTable to staple callback to callback.Target. This avoids pinning the callback and the target; the rest of the system (the app, that is) can keep the target alive as normal.
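Roughly, the weak-event trick being described works like this (hypothetical names, not the actual Paint.NET code): the registrar only holds weak references to handlers, while a ConditionalWeakTable "staples" each delegate to its target so the delegate stays alive exactly as long as the target does.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

internal static class StaticEventRegistrar
{
    // Weak references only, so the registrar never roots a handler.
    private static readonly List<WeakReference<Action>> s_handlers = new();

    // Ties each delegate's lifetime to its target without rooting either.
    private static readonly ConditionalWeakTable<object, Action> s_keepAlive = new();

    public static void Register(Action handler)
    {
        if (handler.Target is object target)
        {
            // Add throws on a duplicate key; real code would need to
            // support multiple handlers per target.
            s_keepAlive.Add(target, handler);
        }
        lock (s_handlers)
        {
            s_handlers.Add(new WeakReference<Action>(handler));
        }
    }
}
```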

@jkotas
Member

jkotas commented Oct 29, 2021

I have a prototype branch that should work and would like validation from your scenarios!

This callback would not be sufficient for ArrayPool. ArrayPool would keep using Gen2GcCallback directly if API like this was introduced.

@cshung
Member

cshung commented Oct 29, 2021

I have a prototype branch that should work and would like validation from your scenarios!

This callback would not be sufficient for ArrayPool. ArrayPool would keep using Gen2GcCallback directly if API like this was introduced.

Would you mind elaborating on what is missing?

@jkotas
Member

jkotas commented Oct 29, 2021

ArrayPool trimming is triggered by more than just low memory pressure.

@cshung
Member

cshung commented Oct 29, 2021

I think we absolutely need support for multiple callbacks for this to be a viable API to actually support, yes.

I am thinking about this flow of events:

  1. User calls GC.RegisterMemoryPressureCallback.
  2. If this is the first call to this function, register a static method defined on the GC class with the native runtime.
  3. Store the user-provided delegate (and target object?) in some data structure managed by the GC class that satisfies the liveness constraints.
  4. When the callback happens, iterate through the user-provided delegates, and dispatch them to ThreadPool for execution.

Right now I have only done 1 and 2. 3 and 4 are outside of my regular development work and would take significant time for me to complete. I was hoping that either the community could help me with that, or we could experiment with the prototype and get the feedback we need without it. This would probably lead to a much faster iteration time than if I completed them all first, though it wouldn't solve the whole problem.

Just to re-iterate, this is meant to be a prototype. Of course, I agree we will need multiple callbacks when we eventually release the feature if we decide to.
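Steps 3 and 4 of the flow above could be sketched roughly like this (every name here is hypothetical; only the weak-target and ThreadPool fan-out ideas come from the discussion):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

internal static class MemoryPressureCallbacks
{
    // Step 3: registrations kept on the managed side. Targets are held
    // weakly so a registration never extends an object's lifetime.
    private static readonly List<(WeakReference Target, Action<object> Callback)> s_callbacks = new();

    public static void Register(Action<object> callback, object target)
    {
        lock (s_callbacks)
        {
            s_callbacks.Add((new WeakReference(target), callback));
        }
    }

    // Step 4: the single static method registered with the native runtime
    // (step 2); it prunes dead targets and fans out to the thread pool.
    internal static void OnMemoryPressureNotification()
    {
        lock (s_callbacks)
        {
            s_callbacks.RemoveAll(c => !c.Target.IsAlive);
            foreach ((WeakReference target, Action<object> callback) in s_callbacks)
            {
                if (target.Target is object alive)
                {
                    ThreadPool.UnsafeQueueUserWorkItem(
                        static state =>
                        {
                            var (cb, t) = ((Action<object>, object))state!;
                            cb(t);
                        },
                        (callback, alive));
                }
            }
        }
    }
}
```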

@cshung
Member

cshung commented Oct 29, 2021

ArrayPool trimming is triggered by more than just low memory pressure.

Got it, it probably doesn't make sense to trim only when we run out of memory.

This API wasn't intended to be the only signal for trimming ArrayPool. It is intended to be a Gen2GcCallback replacement, in the sense that it solves the issues mentioned in the initial comment.

@Sergio0694
Contributor Author

@cshung Love the enthusiasm and I'm really glad to see some traction on this proposal! I'd be happy to help out with 3) and 4) given those points would primarily only touch the managed runtime 😊

On this note: if we need another medium to have a quicker back and forth as we work on this, you'd be welcome in the #lowlevel channel in the C# Discord server. Tanner and Levi are there as well so we'd have other folks from the .NET libs teams to brainstorm ideas with too 😄

@jkotas
Member

jkotas commented Oct 29, 2021

Which components that the initial comment talks about would be able to replace Gen2GcCallback with this API? I have looked at TlsOverPerCoreLockedStacksArrayPool, WeakReferenceMessenger, Ben.StringIntern. I do not think any of these would be able to replace Gen2GcCallback with this API. All of them trim on more than just memory pressure.

@cshung
Member

cshung commented Oct 29, 2021

Which components that the initial comment talks about would be able to replace Gen2GcCallback with this API? I have looked at TlsOverPerCoreLockedStacksArrayPool, WeakReferenceMessenger, Ben.StringIntern. I do not think any of these would be able to replace Gen2GcCallback with this API. All of them trim on more than just memory pressure.

I am not sure I understand, if I replace line 292 here

Gen2GcCallback.Register(s => ((TlsOverPerCoreLockedStacksArrayPool<T>)s).Trim(), this);

With the GC.RegisterMemoryPressureCallback(...)

Then would I miss some events that would have called Trim() before? Do you mean there are other reasons we might do a gen 2 GC that are not caused by high memory pressure? Or something else?

@jkotas
Member

jkotas commented Oct 29, 2021

Do you mean some other reasons that we did a gen 2 GC but not caused by high memory pressure?

Correct.

TlsOverPerCoreLockedStacksArrayPool does time-based trimming. It will trim items that have not been used for a while even if there is no memory pressure at all (look for MillisecondsTimeStamp). Gen2 GCs are just used as a convenient trigger for this bookkeeping process.

Similarly, Ben.StringIntern does LRU style trimming, even if there is no memory pressure at all.
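The time-based check being described can be sketched like this (loosely modeled on the pattern the runtime uses; every type and field name below is an illustrative stand-in, not the actual runtime code):

```csharp
using System;
using System.Collections.Generic;

internal sealed class TimeBasedCache
{
    private const int TrimAfterMilliseconds = 60 * 1000;

    private sealed class CachedBucket
    {
        public int MillisecondsTimeStamp; // Environment.TickCount at last use
        public void Clear() { /* drop the pooled items */ }
    }

    private readonly List<CachedBucket> _buckets = new();

    // Shaped like today's Gen2GcCallback target methods: returning true
    // keeps the callback registered.
    public bool Trim()
    {
        int currentMilliseconds = Environment.TickCount;
        foreach (CachedBucket bucket in _buckets)
        {
            // Unused for longer than the deadline => trimmed, even when
            // there is no memory pressure at all.
            if (currentMilliseconds - bucket.MillisecondsTimeStamp > TrimAfterMilliseconds)
            {
                bucket.Clear();
            }
        }
        return true;
    }
}
```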

@Sergio0694
Contributor Author

"TlsOverPerCoreLockedStacksArrayPool does time-based trimming. It will trim items that have not been used for a while even if there is no memory pressure at all (look for MillisecondsTimeStamp). Gen2 GCs are just used as a convenient trigger for this bookkeeping process."

I have a couple questions on this:

  • Even if that type also uses time-based trimming, isn't that separate from the Gen2GcCallback trimming? As in, even if this proposed API couldn't cover both, wouldn't it still be better as a replacement for Gen2GcCallback on its own?
  • You mentioned time-based trimming isn't ideal in general due to power consumption even when the app is idle, eg. on mobile devices. How would you personally imagine a viable API that could replace both those solutions in TlsOverPerCoreLockedStacksArrayPool being structured? Would you imagine it'd be the same as this but implemented differently, or would it be a separate API with some other characteristics?

@jkotas
Member

jkotas commented Oct 30, 2021

Even if that type also uses time-based trimming, isn't that separate from the Gen2GcCallback trimming?

Here is how ArrayPool trimming works today: every once in a while, the array pool goes through all pooled items and trims the ones that have not been used for some time. The aggressiveness of the trimming is controlled by memory pressure. The Gen2 GCs are used as a convenient approximation for "every once in a while". It is based on the simplifying assumption that ArrayPool usage is likely correlated with regular allocations in a typical app.

If this API is implemented as a callback triggered by a small subset of Gen2 GCs, it is not going to be better than Gen2GcCallback, nor be able to replace Gen2GcCallback.

How would you personally imagine a viable API that could replace both those solutions

Good question. Here are my thoughts:

Cache management is a policy-heavy problem. There is no one universal good policy.

For example, consider an app that handles bursty traffic. Good P99 latency is the most important metric of this app. The app author deploys it on a dedicated VM with plenty of memory. Cache trimming is counterproductive for an app like that: it is just going to make P99 latency worse, without improving any metric that matters.

On the other hand, consider a desktop app that is meant to be in the background most of the time, and only occasionally in the foreground. It is desirable for the caches to be trimmed while the app is in the background, so that it does not consume memory unnecessarily. Nobody likes to see apps running in the background to consume GBs of memory.

We have been reactively introducing settings that allow developers to tweak policies of various caches in the system. RetainVM settings or RegEx.CacheSize API are examples of such settings. However, most caches (including arraypool) have hardcoded non-configurable policies.

It is not unusual to see app models try to trim memory based on interesting events. For example, Windows Phone or UWP apps triggered a GC before getting suspended, to minimize the working set of the suspended process. High-density freemium web hosting plays similar tricks sometimes.

I think the set of APIs for cache management should:

  • Allow app developers to control and configure the system-wide policy.
  • Allow app models to participate in the policy.
  • Allow libraries with caches to carry out the policy.
  • Default policy is pay-for-play (consumes CPU cycles only when the rest of the system is active)

Also, I think that it is important to be able to arbitrate effectively between different unrelated caches somehow. There are caches with items that do not occupy much memory but take a lot of time to recreate (e.g. the RegEx cache); on the other end, there are caches that occupy a lot of memory and take little time to recreate (e.g. ArrayPool). The system should ensure that the caches that are cheap to recreate are trimmed first and more aggressively. I have mentioned this issue above, but it did not seem to resonate. We can ignore this point for now.

@Sergio0694
Contributor Author

Sergio0694 commented Oct 30, 2021

I do feel like we're moving a bit away from what the original proposal was, which is to offer a built-in replacement for Gen2GcCallback that didn't suffer from the drawbacks that that approach has (having to copy-paste the code, having cleanup code run on the finalizer thread, etc.), but I do agree that if we are to expose a public API, adding a "better one" that can actually address this issue in general makes sense. Since we're brainstorming ideas, here's a very rough draft following Jan's comments. NOTE: I don't mean this to be a perfectly refined proposal, I'm mostly just curious to know whether something like this is something that's moving towards the direction you think is the correct one for this 😄

Consider something like this:

enum GcCallbackPolicy
{
    // Basically "whenever Gen2GcCallback gets called today"
    Periodic,

    // Only when there's actually memory pressure
    MemoryPressureOnly
}

static void RegisterGcCallback(Func<object?, bool> callback, object? target, GcCallbackPolicy policy);

Now, the idea for this API shape is this:

  • The GcCallbackPolicy enum would allow library authors to indicate the "default policy" for their scenarios, without having to try to estimate the actual memory usage of their cache, which Jan said would likely be very difficult to do and likely not optimal. With this option, for instance, a type like ArrayPool<T> would use "Periodic" (to keep having the callback invoked more often just like it does today with Gen2GcCallback), whereas eg. a type like WeakReferenceMessenger could use "MemoryPressureOnly" given the cached data structures are lightweight and it's not a big deal to keep them alive until there's high memory pressure, and the overhead of having repeated callback invocations more often is not worth it.
  • The system could then override these registered policies when needed. For instance:
    • In the example of "apps that handle bursty traffic for which P99 latency is the most important metric", the app author could configure the app (through an AppContext flag, or a COMPlus flag, or whatever) to let the runtime know that "I don't care about memory usage, I just want maximum speed". The runtime could then override the default policies for registered callbacks, and eg. also call those registered as periodic only when there's memory pressure.
    • Vice versa, something like apps that wanted to minimize working set memory when minimized could also override this behavior, and have it so the runtime could invoke all callbacks regardless (including those registered as "MemoryPressureOnly"), so that they could force a GC collect before suspension and reclaim as much memory as possible despite what library authors had intended for their registered callbacks.

Again, to be clear, not suggesting this API shape is ideal/final, just want to keep iterating on the API shape to see whether we can figure out what would one that would satisfy all these requirements that have been brought up look like 😊
Specifically:

  • What aspects of this API shape would not be enough for those requirements, and why?
  • Would we need callers to be able to indicate more options when registering callbacks? If so, which ones?
  • Is the overall direction with this not the right one, and if so, how should we change it?
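To make the strawman concrete, the two consumers mentioned above might register like this (hypothetical usage; the callback method names are illustrative, and each call is assumed to live in the respective type's constructor):

```csharp
// In TlsOverPerCoreLockedStacksArrayPool<T>'s constructor, keeping today's
// "every gen 2 GC" cadence. Trim() returns bool, as it does today.
GC.RegisterGcCallback(
    static s => ((TlsOverPerCoreLockedStacksArrayPool<T>)s!).Trim(),
    this,
    GcCallbackPolicy.Periodic);

// In WeakReferenceMessenger's constructor, where the cached structures are
// lightweight and trimming only matters under actual pressure.
// TrimRegistrations is a hypothetical name; returning true stays registered.
GC.RegisterGcCallback(
    static s =>
    {
        ((WeakReferenceMessenger)s!).TrimRegistrations();
        return true;
    },
    this,
    GcCallbackPolicy.MemoryPressureOnly);
```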

@jkotas
Member

jkotas commented Oct 30, 2021

having cleanup code run on the finalizer thread

That's not always a drawback. It is appropriate to run the cleanup on the finalizer thread in some cases. For example, I do not think we would want to add more overhead to the ArrayPool trimming by scheduling it somewhere else. I expect a number of other Gen2GcCallback uses would fall into the same bucket.

@jkotas
Member

jkotas commented Oct 30, 2021

type like WeakReferenceMessenger could use "MemoryPressureOnly" given the cached data structures are lightweight and it's not a big deal to keep them alive until there's high memory pressure

If that is the case, why is WeakReferenceMessenger not implemented this way today? If I am reading the code correctly, the WeakReferenceMessenger trimming is always done, without accounting for any memory pressure.

Or why WeakReferenceMessenger needs the automatic trimming on memory pressure at all? Is it worth the overhead?

@AraHaan
Member

AraHaan commented Nov 1, 2021

"I do feel like we're moving a bit away from what the original proposal was [...] Is the overall direction with this not the right one, and if so, how should we change it?"

I think this is perfect.

type like WeakReferenceMessenger could use "MemoryPressureOnly" given the cached data structures are lightweight and it's not a big deal to keep them alive until there's high memory pressure

If that is the case, why is WeakReferenceMessenger not implemented this way today? If I am reading the code correctly, the WeakReferenceMessenger trimming is always done, without accounting for any memory pressure.

Or why WeakReferenceMessenger needs the automatic trimming on memory pressure at all? Is it worth the overhead?

I feel like it's worth at least allowing the trimming policy of WeakReferenceMessenger to be configurable. However, what about apps that want to do both periodic trimming and memory pressure trimming? I think the idea that @Sergio0694 posed needs a Both option.

@Joe4evr
Contributor

Joe4evr commented Nov 1, 2021

However, what about apps that want to do both periodic trimming and memory pressure trimming? I think the idea that Sergio0694 posed needs a Both option.

I was writing an entire paragraph on why the suggestion is good, but Both is not an ideal name, but then I realized: What about making this a [Flags] enum? Then users have the option of passing in Periodic | MemoryPressureOnly, or simply All if they want to register for potential future additions as well.
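Sketched out, that [Flags] variant might look like this (member names are illustrative, not a settled proposal):

```csharp
using System;

[Flags]
public enum GcCallbackPolicy
{
    // Every gen 2 GC, like Gen2GcCallback today.
    Periodic       = 1 << 0,

    // Only under actual memory pressure.
    MemoryPressure = 1 << 1,

    // Opt into everything, including potential future triggers.
    All            = ~0,
}
```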

@Sergio0694
Contributor Author

Re @jkotas:

"That's not always a drawback. It is appropriate to run the cleanup on finalizer thread in some cases. For example, I do not think we would want to add more overhead to the arraypool trimming by scheduling it somewhere else. I expect number of other Gen2GcCallback uses would fall into the same bucket."

The rationale for that was (after discussing with @GrabYourPitchforks) that it could've avoided a number of issues that can arise from code running on the finalizer thread, if people are not very careful about what they're doing there. For instance, taking a lock on the finalizer thread is a very bad idea, but not everyone knows that. I guess one could also make the argument that people using such an API would be aware of these details, sure. Would you imagine this API always invoking callbacks inline on the finalizer thread, or would it expose an option to choose where they get dispatched?

"[...] why is not WeakReferenceMessenger implemented this way today? If I am reading the code correctly, the WeakReferenceMessenger trimming is done always, without accounting for any memory pressure. Or why WeakReferenceMessenger needs the automatic trimming on memory pressure at all? Is it worth the overhead?"

That is correct, today there are no checks for memory pressure. One of the reasons was that, since we can't take a lock on the finalizer thread, we only try to acquire one before doing the cleanup. I wanted to avoid cases where the callback is only invoked when there's really a lot of pressure, then fails to acquire the lock because someone else holds it at that moment, and the trimming is just skipped entirely. This is certainly something that could be improved though.

As to why it needs automatic trimming in the first place, the rationale is that the type should be as easy to use as possible, with consumers not having to care about this implementation detail or trim it manually. The reason it needs trimming at all is that registered recipients can be collected at any time (as they're weak references), which can result in leftover items in the internal mapping (ie. key/value pairs for message types that actually hold no recipients at all). Trimming just ensures the internal data structures are kept as lean as possible, which both saves a bit of memory and can speed up broadcast operations, as they won't have to waste time going through empty collections looking for actual recipients to send messages to.

Re. @AraHaan:

"However what about apps that want to do periodic trimming + memory pressure trimming?"

I feel like this is just a misunderstanding due to me using a bad name for that Periodic option. As the comment says, I just meant that it indicates "every gen 2 GC collection, like Gen2GcCallback does today", whereas MemoryPressureOnly would basically "filter" that and only actually invoke callbacks when there is high memory pressure. The latter is a subset of the former, so there is no need for a way to combine the two options: just use Periodic in that case, which covers both.

@jkotas
Member

jkotas commented Nov 10, 2021

For instance, taking a lock on the finalizer thread is a very bad idea

Taking a long lock on a global singleton data structure is a very bad idea. It does not matter which thread you use to take the lock.

It is fine to take a short lock on the finalizer thread. It happens a lot. In fact, the ArrayPool trimming takes a short lock on the finalizer thread too.

I guess one could also make the argument that people using such an API would be aware of these details as well though

Right.

@Sergio0694
Contributor Author

"Taking a long lock on a global singleton data structure is a very bad idea."

To be clear, the lock is on an internal data structure, not on the publicly exposed singleton instance. Conceptually it should be the same as eg. ArrayPool<T>.Shared internally taking a lock, or any other type that is also available as a singleton that does some form of internal synchronization, no? We're not locking on an instance that external users could also directly access.

Also, to further clarify this, using the singleton instance is not really the recommended approach anyway. That property mostly exists to make the transition easier for MvvmLight users, who were used to the singleton instance for the previous messenger type there. The recommended approach for the MVVM Toolkit is ideally to just inject the messenger into viewmodels with DI. Not including the singleton would've made the migration too difficult for existing users though, so we added it for convenience.

"It is fine to take a short lock on the finalizer thread."

I guess I might've misunderstood how bad this was when I talked about it with @GrabYourPitchforks a while back.
My understanding was that it would be better to avoid it as much as possible unless absolutely necessary.
If that's not the case and short locks are fine, then good to know 😄
