Async work started in ICacheEntry scope can keep the entry alive longer than needed #42992

Closed
KalleOlaviNiemitalo opened this issue Sep 21, 2020 · 8 comments

Describe the bug

If a method creates an ICacheEntry and then starts asynchronous work (e.g. task, timer, or thread) before it commits the entry to a cache (by calling Dispose), then the asynchronous work inherits an execution context in which CacheEntryHelper._scopes indirectly refers to the entry. This reference can prevent the cache entry from being collected as garbage, even after the entry has been committed to the cache and expired from the cache.

Furthermore, if the asynchronous work creates its own cache entries and adds expiration tokens to them, then those expiration tokens propagate to the outer cache entry, even though the outer cache entry was already assigned a value and committed. This prevents the expiration tokens from being collected as garbage and may cause the outer cache entry to expire prematurely.

To Reproduce

Run the following test against version 2.1.2 of package Microsoft.Extensions.Caching.Memory on .NET Framework 4.8. This code uses CreateCache and TestExpirationToken from the existing tests of the package.

        [Fact]
        public void ExpirationTokenDoesNotPropagateToCommittedCacheEntry()
        {
            IMemoryCache cache = CreateCache();
            const string parentKey = "parent";
            const string orphanKey = "orphan";

            using (var parentCommitted = new SemaphoreSlim(0, 1))
            {
                Task orphanTask;
                ICacheEntry parentEntry;
                using (parentEntry = cache.CreateEntry(parentKey))
                {
                    // Start async work that inherits the execution context.
                    orphanTask = OrphanAsync();

                    parentEntry.SetValue(1);
                }

                // No expiration tokens were added to the parent,
                // and the orphan entry has not been created yet.
                Assert.Empty(parentEntry.ExpirationTokens);

                // Let the async method create the orphan entry.
                parentCommitted.Release();

                // Wait until the async method commits the orphan entry.
                orphanTask.Wait();

                // The expiration token of the orphan entry must not have
                // propagated to the parent entry, which had already
                // been committed before the orphan entry was created.
                Assert.Empty(parentEntry.ExpirationTokens);

                async Task OrphanAsync()
                {
#if MEMORY_CACHE_INTERNAL
                    // This async method must inherit the execution context
                    // in which the parent entry is current.
                    Assert.Same(parentEntry, CacheEntryHelper.Current);
#endif

                    // This method has not awaited yet, so the caller is
                    // still blocked and cannot have released the semaphore.
                    Assert.Equal(0, parentCommitted.CurrentCount);

                    // Await until the parent entry has been committed.
                    await parentCommitted.WaitAsync();

#if MEMORY_CACHE_INTERNAL
                    // Now that the parent entry has been committed,
                    // it must no longer be current.
                    Assert.Null(CacheEntryHelper.Current);
#endif

                    // Create another cache entry and give it an expiration token.
                    // This expiration token must not propagate to the parent entry.
                    using (ICacheEntry orphanEntry = cache.CreateEntry(orphanKey))
                    {
                        orphanEntry.AddExpirationToken(new TestExpirationToken());
                        orphanEntry.SetValue(2);
                    }
                }
            }
        }

Expected behavior

The test passes.

Actual behavior

The test fails at Assert.Null(CacheEntryHelper.Current) if MEMORY_CACHE_INTERNAL is defined, or at the second Assert.Empty(parentEntry.ExpirationTokens) otherwise.

Additional context

A fix might involve assigning CacheEntryStack._entry = null during CacheEntry.Dispose. That way, if a CacheEntryStack is left in some execution contexts after CacheEntry.Dispose, it would no longer cause expiration options to be propagated to the CacheEntry and would not prevent it from being collected. If the CacheEntryStack._previous field were also deleted (it is only read by debuggers), then execution contexts would not be able to accumulate a chain of stale CacheEntryStack instances, either.

Alternatively, one could document the issue and recommend using ExecutionContext.SuppressFlow around async operations that are started within the scope of a cache entry.
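
For reference, here is a minimal sketch of that second approach (not library code; the cache key, helper name, and Func<Task> parameter are only illustrative):

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

static class SuppressFlowWorkaround
{
    public static Task StartWorkOutsideEntryScope(IMemoryCache cache, Func<Task> work)
    {
        using (ICacheEntry entry = cache.CreateEntry("parent"))
        {
            Task background;

            // While flow is suppressed, the started work captures an execution
            // context that does not contain the current cache entry scope, so it
            // neither keeps this entry alive nor propagates expiration tokens to it.
            using (ExecutionContext.SuppressFlow())
            {
                background = Task.Run(work);
            }

            entry.SetValue(1); // commit the entry as usual
            return background;
        }
    }
}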

Originally noted in dotnet/extensions#3533 (comment)

@BrennanConroy transferred this issue from dotnet/extensions Oct 2, 2020
@Dotnet-GitSync-Bot added the area-System.Threading and untriaged labels Oct 2, 2020
@ghost commented Oct 5, 2020

Tagging subscribers to this area: @eerhardt, @maryamariyan
See info in area-owners.md if you want to be subscribed.

@maryamariyan added this to Uncommitted in ML, Extensions, Globalization, etc, POD. via automation Oct 7, 2020
@maryamariyan removed the untriaged label Oct 7, 2020
@maryamariyan added this to the 6.0.0 milestone Oct 7, 2020
@eerhardt (Member) commented Oct 7, 2020

This may have gotten fixed with #42355. We should investigate if it did.

@KalleOlaviNiemitalo (Author)

After reading the #42355 changes, I don't believe they fixed this.

@KalleOlaviNiemitalo (Author)

A fix might involve assigning CacheEntryStack._entry = null during CacheEntry.Dispose.

Such a fix is no longer possible after #45563 removed class CacheEntryStack. Now that CacheEntryHelper uses AsyncLocal<CacheEntry>, there is no way to disconnect the CacheEntry from execution contexts that may have inherited it.
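
A small standalone illustration of the AsyncLocal<T> behavior (not library code): once other work has captured an execution context containing the value, later writes, including clearing the value during Dispose, only affect the current context, not the copies that have already flowed.

using System;
using System.Threading;
using System.Threading.Tasks;

class AsyncLocalFlowDemo
{
    static readonly AsyncLocal<string> Current = new AsyncLocal<string>();

    static async Task Main()
    {
        Current.Value = "entry-scope";

        // The task captures a copy of the current execution context,
        // including the AsyncLocal value, when it is started.
        Task captured = Task.Run(async () =>
        {
            await Task.Delay(100);
            Console.WriteLine(Current.Value ?? "null"); // prints "entry-scope"
        });

        // Clearing the value here changes only this context;
        // the copy that already flowed to the task is unaffected.
        Current.Value = null;

        await captured;
        Console.WriteLine(Current.Value ?? "null"); // prints "null"
    }
}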

@KalleOlaviNiemitalo (Author)

#45592 would let an application developer disable the implicit linking of cache entries and thereby avoid this problem (see the configuration sketch after this list). However:

  • The developer would first have to understand that they need to configure MemoryCacheOptions this way. Perhaps that can be achieved by updating code samples.

    If an application is already leaking memory and the developer is trying to figure out why, then the developer might see that some of the leaked objects have type CacheEntry, and comments in CacheEntry.cs could help the developer understand how to fix the leak. I suspect most of the leaked memory will be in other types of objects, though.

  • The option might be difficult to adopt if the application currently relies on propagation of expiration tokens from synchronously created cache entries and just doesn't need it for asynchronously started work.
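
For illustration, the opt-out would look roughly like this, assuming the option ships under the name it later got, MemoryCacheOptions.TrackLinkedCacheEntries (see the later comments):

using Microsoft.Extensions.Caching.Memory;

// With tracking disabled, an entry created inside another entry's scope no
// longer propagates its expiration tokens to, or keeps alive, the outer entry.
var cache = new MemoryCache(new MemoryCacheOptions
{
    TrackLinkedCacheEntries = false
});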

@StephenCleary (Contributor) commented Feb 20, 2021

The AsyncLocal<T> behavior is much more surprising than just a memory leak. I just spent the evening tracking down an InvalidOperationException ("Collection was modified; enumeration operation may not execute.") that was coming from here, because an entry ended up being its own parent!

The original code is here; I'm trying to build an async cache that stores TaskCompletionSource<T> items. This test reliably reproduces the issue:

[Fact]
public async Task MemoryCache_FailureObserved()
{
    var cache = new MemoryCache(new MemoryCacheOptions());

    TaskCompletionSource<int> tcs = new();
    CancellationTokenSource cts = new();
    var entry = cache.CreateEntry("key");
    entry.Value = tcs;
    entry.AddExpirationToken(new CancellationChangeToken(cts.Token));
    entry.Dispose();

    InvokeAndPropagateCompletion(cache.CreateEntry("key"), tcs);
    var task = tcs.Task;
    await task;
    cache.TryGetValue("key", out _);

#pragma warning disable 1998
    async void InvokeAndPropagateCompletion(ICacheEntry cacheEntry, TaskCompletionSource<int> tcs)
#pragma warning restore 1998
    {
        if (cache.TryGetValue(cacheEntry.Key, out TaskCompletionSource<int> existingTcs) && existingTcs == tcs)
        {
            using (cacheEntry)
                cacheEntry.Value = tcs;
        }

        tcs.TrySetResult(13);
    }
}

I find the implicit behavior surprising. I consider it a pit of failure, and I am in agreement with @davidfowl that linked cache entries should be off by default. Perhaps I'm missing something, but are "linked cache entries" documented anywhere?

In the meantime, I've adopted the workaround below, which works if you replace every instance of memoryCache.CreateEntry(key) with SafeCacheEntry.Create(memoryCache, key):

internal sealed class SafeCacheEntry : ICacheEntry
{
    private readonly ICacheEntry _cacheEntryImplementation;

    private SafeCacheEntry(ICacheEntry cacheEntryImplementation) => _cacheEntryImplementation = cacheEntryImplementation;

    public static ICacheEntry Create(IMemoryCache cache, object key)
    {
        return AsyncCreateEntry().GetAwaiter().GetResult();

#pragma warning disable 1998
        async Task<ICacheEntry> AsyncCreateEntry() => new SafeCacheEntry(cache.CreateEntry(key));
#pragma warning restore 1998
    }

    public void Dispose()
    {
        AsyncDispose().GetAwaiter().GetResult();

#pragma warning disable 1998
        async Task AsyncDispose() => _cacheEntryImplementation.Dispose();
#pragma warning restore 1998
    }

    public object Key => _cacheEntryImplementation.Key;

    public object Value
    {
        get => _cacheEntryImplementation.Value;
        set => _cacheEntryImplementation.Value = value;
    }

    public DateTimeOffset? AbsoluteExpiration
    {
        get => _cacheEntryImplementation.AbsoluteExpiration;
        set => _cacheEntryImplementation.AbsoluteExpiration = value;
    }

    public TimeSpan? AbsoluteExpirationRelativeToNow
    {
        get => _cacheEntryImplementation.AbsoluteExpirationRelativeToNow;
        set => _cacheEntryImplementation.AbsoluteExpirationRelativeToNow = value;
    }

    public TimeSpan? SlidingExpiration
    {
        get => _cacheEntryImplementation.SlidingExpiration;
        set => _cacheEntryImplementation.SlidingExpiration = value;
    }

    public IList<IChangeToken> ExpirationTokens => _cacheEntryImplementation.ExpirationTokens;

    public IList<PostEvictionCallbackRegistration> PostEvictionCallbacks => _cacheEntryImplementation.PostEvictionCallbacks;

    public CacheItemPriority Priority
    {
        get => _cacheEntryImplementation.Priority;
        set => _cacheEntryImplementation.Priority = value;
    }

    public long? Size
    {
        get => _cacheEntryImplementation.Size;
        set => _cacheEntryImplementation.Size = value;
    }
}

This wrapper type works by forcing the CacheEntry constructor and Dispose to run in their own extremely short-lived async local scopes. This prevents the AsyncLocal<T> values from being seen anywhere else, essentially turning off linked cache entries.
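
As I understand it, this works because changes an async method makes to an AsyncLocal<T> are made on the method's own copy of the execution context and are discarded when control returns to the synchronous caller, so the value that CreateEntry sets never becomes visible outside the wrapper. A tiny standalone demonstration of that behavior (not related to the cache itself):

using System;
using System.Threading;
using System.Threading.Tasks;

class AsyncScopeIsolationDemo
{
    static readonly AsyncLocal<string> Current = new AsyncLocal<string>();

    static void Main()
    {
        SetInsideAsync().GetAwaiter().GetResult();

        // The write made inside the async method did not flow back to this context.
        Console.WriteLine(Current.Value ?? "null"); // prints "null"
    }

#pragma warning disable 1998 // async method without await, the same trick as above
    static async Task SetInsideAsync() => Current.Value = "set inside async";
#pragma warning restore 1998
}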

@maryamariyan modified the milestones: 6.0.0, 7.0.0 Jul 22, 2021
@ghost moved this from 6.0.0 to Untriaged in ML, Extensions, Globalization, etc, POD. Jul 22, 2021
@maryamariyan moved this from Untriaged to 7.0.0 in ML, Extensions, Globalization, etc, POD. Aug 4, 2021
@maryamariyan modified the milestones: 7.0.0, Future Dec 23, 2021
@ghost moved this from 7.0.0 to Untriaged in ML, Extensions, Globalization, etc, POD. Dec 23, 2021
@ghost moved this from Untriaged to Future in ML, Extensions, Globalization, etc, POD. Dec 23, 2021
@adamsitnik (Member)

I consider it a pit of failure, and I am in agreement with @davidfowl that #45592 should be off by default

They have been off by default since .NET 7: #57648.
Breaking change doc: https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/7.0/memorycache-tracking

Since this almost undocumented feature is no longer enabled by default, I really doubt we are ever going to spend time working on a fix. Having said that, I am going to close the issue.

If you still believe it's important, please send a PR with a fix; I would be more than happy to review it and help with the benchmarking.

@adamsitnik closed this as not planned Nov 15, 2023
@KalleOlaviNiemitalo (Author)

How about a doc change at MemoryCacheOptions.TrackLinkedCacheEntries, warning about risks if the developer enables this option again?

@github-actions bot locked and limited conversation to collaborators Dec 16, 2023