Memory leak from linked cancellation token registrations #78180
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

Tagging subscribers to this area: @mangod9
If you run the repro longer, does the leak persist, or is it bounded at some point?
Your repro isn't disposing anything though. Regardless of what the exact cause of this issue is here, you may find https://github.com/microsoft/reverse-proxy/blob/main/src/ReverseProxy/Utilities/ActivityCancellationTokenSource.cs useful depending on exactly what you're doing with these cancellation tokens. This approach is practically allocation-free when timeouts aren't occurring.
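A minimal sketch of that reuse idea (not the YARP implementation itself; `TimeoutCtsPool`, `Rent`, and `Return` are illustrative names I'm making up), assuming .NET 6+ for `CancellationTokenSource.TryReset()`:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class TimeoutCtsPool
{
    private static readonly ConcurrentQueue<CancellationTokenSource> s_pool = new();

    public static CancellationTokenSource Rent(TimeSpan timeout)
    {
        if (!s_pool.TryDequeue(out CancellationTokenSource? cts))
            cts = new CancellationTokenSource();
        cts.CancelAfter(timeout);
        return cts;
    }

    public static void Return(CancellationTokenSource cts)
    {
        // TryReset (.NET 6+) only succeeds if cancellation was never requested;
        // a fired source is disposed instead of being recycled.
        if (cts.TryReset())
            s_pool.Enqueue(cts);
        else
            cts.Dispose();
    }
}
```

The point being that in the common case (no timeout fires) the same source is recycled, so registrations and finalizable state don't pile up.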
That's the finalizer, which is only invoked when there are no more rooted references to the instance and the GC is able to collect it. And those registrations have a reference back to this instance. When you have:

```csharp
CancellationTokenSource cts1 = ...;
CancellationTokenSource cts2 = CancellationTokenSource.CreateLinkedTokenSource(cts1.Token);
```

that's effectively storing a reference to cts2 into cts1 so that when cts1 has cancellation requested, that request will propagate to cts2. Until you Dispose of cts2, cts1 is going to continue to hold that reference. If you don't Dispose cts2, then as long as cts1 is rooted, cts2 will also be rooted, and it will never have its finalizer called.
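To make the rooting concrete, a tiny sketch of the behavior described above: the linked source is only detached from `cts1` once it is disposed.

```csharp
using System;
using System.Threading;

var cts1 = new CancellationTokenSource();
var cts2 = CancellationTokenSource.CreateLinkedTokenSource(cts1.Token);

bool fired = false;
cts2.Token.Register(() => fired = true);

// Disposing cts2 removes its registration from cts1, so cts1 no longer
// holds a reference to it and cancellation no longer propagates.
cts2.Dispose();
cts1.Cancel();

Console.WriteLine(fired); // False: the link was severed by Dispose
```

Had `cts2.Dispose()` been skipped, `cts1` would keep `cts2` reachable for as long as `cts1` itself is rooted.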
Given what you said, I was wondering: why not unregister right after cancellation occurs?

```csharp
using System.Diagnostics;
using System.Threading;

public class LinkedCancellationTokenSource : CancellationTokenSource
{
    private readonly CancellationTokenRegistration? _registration1;
    private readonly CancellationTokenRegistration? _registration2;

    // first bit = canceled
    // second bit = disposed
    private int _state = 0;

    public LinkedCancellationTokenSource(CancellationToken token1, CancellationToken token2)
    {
        if (token1.CanBeCanceled)
            _registration1 = token1.Register(CancelAndUnregister);
        if (token2.CanBeCanceled)
            _registration2 = token2.Register(CancelAndUnregister);
    }

    private void CancelAndUnregister()
    {
        int state = Interlocked.Or(ref _state, 1);
        if (state == 0)
        {
            Unregister();
            Cancel();
        }
    }

    protected override void Dispose(bool disposing)
    {
        int state = Interlocked.Or(ref _state, 2);
        if ((state & 2) != 2)
        {
            Debug.Assert(state < 2, "State should either be default or canceled, but not disposed");
            Unregister();
            base.Dispose(disposing);
        }
    }

    private void Unregister()
    {
        _registration1?.Unregister();
        _registration2?.Unregister();
    }
}
```

Do you see any issue in doing this?
I played a little more with the repro and I think the reason for this "memory leak" is probably not a bug but a combination of factors.

By disposing the linked source, the leak goes away:

```csharp
using var cts = CancellationTokenSource.CreateLinkedTokenSource(token1, token2);
await FooAsync(cts.Token);
```

Still, I wonder if the solution proposed above (unregistering in the token's cancel callback instead of unregistering only in Dispose) would be worth it.
Here I come back with an answer to my own question 😀 I still think this API is not very practical and can lead to memory safety issues, but at least now I understand it. For now, I decided to leverage Roslyn analyzers with CA2000 to enforce disposal at build time. At least that's one less concern. Closing this issue.
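For reference, enforcing that at build time can be as simple as bumping the CA2000 analyzer severity in `.editorconfig` (treating it as an error rather than a warning is a project-level choice):

```ini
# CA2000: Dispose objects before losing scope
dotnet_diagnostic.CA2000.severity = error
```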
We've avoided doing that in the past as it would require extra virtual method invocations somewhere along the Cancel path, whereas right now the only one that's involved is the one for Dispose. But, technically I expect it could be done.
Description
Context
We currently work on an ASP.NET Core service that handles several thousand requests per second per instance. Internally, each request leads to several additional I/Os, with different response-time requirements. To avoid unnecessarily waiting for an I/O completion when a request must complete within a given time, we rely heavily on cancellation tokens, and more specifically CancellationTokenSource.CreateLinkedTokenSource, to merge the different timeout policies. All good so far; this is probably quite a standard way to do this.
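As a rough illustration of that pattern (the `FooAsync` backend call and the 200 ms budget are made-up placeholders, not our actual code), each request combines its own token with a per-call timeout:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static async Task<string> CallWithTimeoutAsync(CancellationToken requestToken)
{
    // Link the caller's token with a per-call timeout budget.
    using var linked = CancellationTokenSource.CreateLinkedTokenSource(requestToken);
    linked.CancelAfter(TimeSpan.FromMilliseconds(200));

    return await FooAsync(linked.Token);
}

static async Task<string> FooAsync(CancellationToken token)
{
    // Stand-in for a backend I/O call.
    await Task.Delay(10, token);
    return "ok";
}
```

Whichever fires first, the request's own cancellation or the per-call timeout, cancels the linked token passed to the I/O.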
A while ago, we noticed that our cancellation system was leading to a lot of thread contention (about 100k contention events / s according to ETW ContentionStart / Stop events).
Fortunately, we found this 11-year-old blog post from @stephentoub, and by sharing the cancellation tokens the thread contention went down from 100k to something like 10 🎉
The problem
Since we began "pooling" cancellation tokens, despite the thread-contention improvements, we have noticed weird memory patterns (some kind of memory leak). We've tried different implementations for pooling tokens, but the results were always the same, to the point that we began questioning the framework itself.
The repro
It wasn't clear until we made a simple repro. You can find it here: cancellation-tokens-mem-leak-repro.
In this repro you can:

- create fresh timeout tokens (`FrameworkTimeoutCancellationTokenProvider`) or pool them (`CoalescingTimeoutCancellationTokenProvider`)
- link tokens with the framework implementation (`FrameworkLinkedCancellationTokenProvider`) or 2 custom implementations (`CustomLinkedCancellationTokenProvider` and `CustomLinkedNoRegistrationCancellationTokenProvider`)

In short, the results are that whenever we pool tokens and keep a reference to cancellation token registrations, there is a memory leak.
Here are some graphs I've made using dotMemory:
No pooling + Registrations = No leak (Reference)
It's flat after a few seconds = no leak (verified by looking at GC survivors)
Pooling + Registrations = Leak

(The results are the same whether I use `FrameworkLinkedCancellationTokenProvider` or `CustomLinkedCancellationTokenProvider`, so no need to show the results for both.)

![image](https://user-images.githubusercontent.com/12692438/201168576-36bab107-248b-4f96-af66-f57b2a28231e.png)
![image](https://user-images.githubusercontent.com/12692438/201168943-ce7235f1-2255-41a8-8271-9553a03b9e14.png)
![image](https://user-images.githubusercontent.com/12692438/201169675-293c9b04-6559-42ed-918e-af1ff5c40694.png)
![image](https://user-images.githubusercontent.com/12692438/201169766-6aab5fcd-0256-4192-b6dd-5ca87ed7d734.png)

Memory is growing over time: it's leaking.
So what's the leak?

As you can see, there are a lot of registrations and `Linked2CancellationTokenSource` instances surviving GC (there shouldn't be).

With dotMemory we can trace back the retention path for a single surviving `Linked2CancellationTokenSource` instance. It's very deep, so here is the beginning:

And here is the end (GC roots):
Pooling + No registration = No leak ???

I have created `CustomLinkedNoRegistrationCancellationTokenProvider`, which is similar to `Linked2CancellationTokenSource` BUT doesn't keep a reference to the registrations. It turns out that by doing so we can completely get rid of the leak. It feels weird to me, however, not to dispose an `IDisposable`, and this is why I would like to investigate this with you. This is probably more a workaround than a proper solution.

Here is the memory profile anyway:

![image](https://user-images.githubusercontent.com/12692438/201170564-51911fa4-e6e3-42e0-b1a7-999f43d7c16b.png)
![image](https://user-images.githubusercontent.com/12692438/201170946-69d62f9b-ebd1-4492-b30c-8dbfd32d7c7e.png)

After a few seconds it's flat, which is a good sign. We can also look at GC survivors over 20 seconds:

There are far fewer cancellation token sources surviving. The remaining ones are probably expected given how the repro is made.
Reproduction Steps
https://github.com/ogxd/cancellation-tokens-mem-leak-repro
Expected behavior
Expected behavior is no memory leak
Actual behavior
It seems there is a memory leak.
Regression?
I don't know
Known Workarounds
The `CustomLinkedNoRegistrationCancellationTokenProvider` from the repro is a workaround, but it would be better to understand/fix the issue before relying on it in a production environment.

Configuration
Other information
No response