Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak in Microsoft.Extensions.Caching.Memory when handling exceptions #3533

Closed
eerhardt opened this issue Sep 16, 2020 · 5 comments
Closed

Comments

@eerhardt
Copy link
Member

Duplicating dotnet/runtime#42321 in dotnet/extensions in order to allow this issue to be backported to 3.1.

Description

We are leaking objects when calling MemoryCache.GetOrCreate and the factory method is throwing an exception.

Repro

Run the following application, take a memory snapshot, run for a while and take another snapshot

using Microsoft.Extensions.Caching.Memory;
using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var cache = new MemoryCache(new MemoryCacheOptions());
            while (true)
            {
                try
                {
                    int value = cache.GetOrCreate("hello", entry =>
                    {
                        throw new Exception();
                        return 10;
                    });
                }
                catch
                {
                    Console.WriteLine("Get Failed");
                }
            }
        }
    }
}

image

Analysis

It appears the issue is that we aren't Disposing the CacheEntry instances when an exception is being thrown:

https://github.com/dotnet/runtime/blob/33dba9518b4eb7fbc487fadc9718c408f95a826c/src/libraries/Microsoft.Extensions.Caching.Abstractions/src/MemoryCacheExtensions.cs#L98-L112

The reason we aren't calling Dispose is to fix an issue that we were caching null when an Exception was being thrown. See aspnet/Caching#216.

Disposing the CacheEntry is important because every time you create a CacheEntry, it gets stuck in an AsyncLocal "stack":

https://github.com/dotnet/runtime/blob/33dba9518b4eb7fbc487fadc9718c408f95a826c/src/libraries/Microsoft.Extensions.Caching.Memory/src/CacheEntryHelper.cs#L28-L36

and disposing it is what "pops" it off the stack:

https://github.com/dotnet/runtime/blob/33dba9518b4eb7fbc487fadc9718c408f95a826c/src/libraries/Microsoft.Extensions.Caching.Memory/src/CacheEntryHelper.cs#L50-L63

We should always be Disposing the entry. That way the ScopeLease object is always disposed, and the stack is cleared.

However, we still need to fix the original problem: Don't cache null when an Exception happens. The reason this happens is because Disposing the CacheEntry is what "commits the entry into the cache". This method gets called from CacheEntry.Dispose:

https://github.com/dotnet/runtime/blob/33dba9518b4eb7fbc487fadc9718c408f95a826c/src/libraries/Microsoft.Extensions.Caching.Memory/src/MemoryCache.cs#L111-L234

To fix this, we should set a flag indicating whether the CacheEntry.Value was ever set. If it wasn't set, we shouldn't be committing the value into the cache.

cc @Tratcher @Pilchie

@eerhardt eerhardt added this to the 3.1.x milestone Sep 16, 2020
@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Sep 17, 2020

From the source code, Microsoft.Extensions.Caching.Memory 2.1.2 (the newest version supported for ASP.NET Core 2.1 on .NET Framework) appears to have the same issue. Is the fix going to be backported to 2.1.* as well?

Apart from CacheExtensions.GetOrCreate, it looks like a similar leak can occur in public static TItem Set<TItem>(this IMemoryCache cache, object key, TItem value, IChangeToken expirationToken) if entry.AddExpirationToken(expirationToken) throws ArgumentNullException because expirationToken == null. That could only be triggered by a bug in the caller, though.

In contrast, CacheExtensions.GetOrCreateAsync seems immune to this issue because it is an async method and the modified value of CacheEntryHelper.Scopes cannot propagate back to the caller.

@eerhardt
Copy link
Member Author

Is the fix going to be backported to 2.1.* as well?

Yes, it looks like @wtgodbe is porting it to 2.1 with #3536.

it looks like a similar leak can occur in public static TItem Set<TItem>(this IMemoryCache cache, object key, TItem value, IChangeToken expirationToken) if entry.AddExpirationToken(expirationToken) throws ArgumentNullException

Good point. In general, IDiposable objects should always be disposed. We should fix all the places that call CreateEntry.

In contrast, CacheExtensions.GetOrCreateAsync seems immune to this issue because it is an async method and the modified value of CacheEntryHelper.Scopes cannot propagate back to the caller.

GetOrCreateAsync is not immune to this problem (the original scenario we discovered this issue is using the async overload). CacheEntryHelper.Scopes is backed by a static AsyncLocal variable - it will still cause a memory leak.

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Sep 17, 2020

GetOrCreateAsync is not immune to this problem (the original scenario we discovered this issue is using the async overload). CacheEntryHelper.Scopes is backed by a static AsyncLocal variable - it will still cause a memory leak.

I cannot reproduce the leak with the following program using GetOrCreateAsync of Microsoft.Extensions.Caching.Memory 2.1.2 on .NET Core 2.1.22. Private working set climbs to about 6700 K and then stays there. The getScopes() assertions do not fail.

using System;
using System.Diagnostics;
using System.Reflection;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

namespace CachingAsyncLeak
{
    class Program
    {
        static void Main()
        {
            MainAsync().Wait();
        }

        static async Task MainAsync()
        {
            Type cacheEntryHelperType = Type.GetType(
                "Microsoft.Extensions.Caching.Memory.CacheEntryHelper, Microsoft.Extensions.Caching.Memory",
                throwOnError: true);
            PropertyInfo scopesProperty = cacheEntryHelperType.GetProperty(
                "Scopes",
                BindingFlags.Static | BindingFlags.NonPublic);
            var getScopes = (Func<object>)scopesProperty.GetMethod.CreateDelegate(typeof(Func<object>));

            using (var cache = new MemoryCache(new MemoryCacheOptions()))
            {
                while (true)
                {
                    Debug.Assert(getScopes() == null);

                    try
                    {
                        await cache.GetOrCreateAsync("key", Factory); // Tried .ConfigureAwait(false), too.

#pragma warning disable CS1998 // Async method lacks 'await' operators and will run synchronously
                        async Task<int> Factory(ICacheEntry entry)
#pragma warning restore CS1998 // Async method lacks 'await' operators and will run synchronously
                        {
                            Debug.Assert(getScopes() != null);
                            throw new ApplicationException("Ouch.");
                        }
                    }
                    catch (ApplicationException)
                    {
                    }
                }
            }
        }
    }
}
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Extensions.Caching.Memory" Version="2.1.2" />
  </ItemGroup>

</Project>

@eerhardt
Copy link
Member Author

Thanks for the catch here. I believe you are right, GetOrCreateAsync should be immune to the exception problem. With this knowledge I went back to the original scenario which led me to open this bug, and took another look. I discovered another case where we can leak memory:

When we have 2 caches, an "outer" cache depending on an "inner" cache. The "inner" cache entries will maintain a reference back to the "outer" cache entries through its _scope property. If the inner cache entries are longer lived than the outer cache entries - the outer cache entries will be preserved (leaked) for as long as the inner cache entries are valid.

To fix that leak, I also null out the _scope object during Dispose() of the CacheEntry object. - dotnet/runtime#42355. That will allow the outer cache entries to be GC'd when they are expired from their cache.

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Sep 18, 2020

I suspect there is a third risk of leaks, if the factory starts a long-lived chain of tasks or threads whose execution contexts inherit the AsyncLocal<T>.Value of CacheEntryHelper._scopes. If code running in those execution contexts adds other cache entries for unrelated reasons (even to a separate MemoryCache instance), then the CacheEntry.Dispose methods of the new cache entries might see the first cache entry in CacheEntryHelper.Current and possibly add an unbounded number of expiration tokens to it, even though the first cache entry was already added to the cache with a value that did not depend on the other cache entries.

I don't have time to test that scenario right now. Anyway, if it really is a problem, I think it can be handled as a separate issue (https://github.com/dotnet/extensions/issues/3547), perhaps only documented rather than fixed.

A fix there might involve assigning CacheEntryStack._entry = null during CacheEntry.Dispose. That way, if a CacheEntryStack is left in some execution contexts after CacheEntry.Dispose, it would no longer cause expiration options to be propagated to the CacheEntry and would not prevent it from being collected. If the CacheEntryStack._previous field were also deleted (it is only read by debuggers), then execution contexts would not be able to accumulate a chain of stale CacheEntryStack instances, either.

@StephenBonikowsky StephenBonikowsky added this to New Issues in .NET Core impacting internal partners via automation Sep 21, 2020
@StephenBonikowsky StephenBonikowsky moved this from New Issues to 3.1 Servicing - Committed in .NET Core impacting internal partners Sep 21, 2020
@wtgodbe wtgodbe closed this as completed Oct 21, 2020
.NET Core impacting internal partners automation moved this from 3.1 Servicing - Committed to Done Oct 21, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Nov 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
No open projects
Development

No branches or pull requests

4 participants
@eerhardt @wtgodbe @KalleOlaviNiemitalo and others