
GetOrCreate is not atomic which is a serious issue for a cache ( eg poor perf and memory issues) #708

Open
bklooste opened this issue Aug 14, 2018 · 11 comments

@bklooste

commented Aug 14, 2018

This blog post provides the details:

https://tpodolak.com/blog/2017/12/13/asp-net-core-memorycache-getorcreate-calls-factory-method-multiple-times/

We see poor performance in roughly 1 in 10 queries due to unnecessary multiple factory invocations, even under moderate load, and have had to resort to a solution similar to the one in that blog post, which is not great.

A previous report of this was closed without a good reason: aspnet/Caching#359

@sebastienros

Member

commented Aug 14, 2018

There are two distinct concerns in the issue you are referencing:

  • The value that is added to the cache could differ from the one that was returned
  • The lambda is called multiple times concurrently

Based on the blog post you linked, it's clear that you disagree with the fact that the lambda is not blocked from reentrancy "by default". But this behavior is by design (as the post also mentions), just as it is for ConcurrentDictionary.

The idea is that each scenario is different, and some might not require a lock. You are free to lock around the lambda if it is not a cheap operation and even two concurrent calls are worse. After all, if the default implementation had to lock, it would do so quite aggressively on the key; instead, you might want to minimize the locking by providing a better implementation, as the blog post demonstrates.
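For concreteness, here is a minimal sketch of what locking around the lambda could look like on the caller's side (the cache instance, lock object, and method name are my own illustrations, not library API; a single coarse lock trades throughput for simplicity):

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

public static class Example
{
    private static readonly MemoryCache Cache = new MemoryCache(new MemoryCacheOptions());
    private static readonly object CacheLock = new object();

    public static string GetOrCreateLocked(string key, Func<string> factory)
    {
        // Fast path: cache hits take no lock.
        if (Cache.TryGetValue(key, out string value))
            return value;

        lock (CacheLock)
        {
            // Double-check: another thread may have populated the entry
            // while we were waiting for the lock.
            if (Cache.TryGetValue(key, out value))
                return value;

            value = factory();
            Cache.Set(key, value);
            return value;
        }
    }
}
```

With this shape the factory runs at most once per miss, at the cost of serializing all misses behind one lock.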

And for this reason I can only expect this issue will be closed too.

@xied75


commented Oct 2, 2018

@sebastienros I'm here for the 2nd concern you mentioned, i.e. blocking re-entrance. I agree with what you said, that each scenario is different and lock-free use might be possible. But in my opinion a default implementation that prevents re-entry would be hugely useful for us lay people: given we are "educated" enough to seek help from a cache, populating the cache must be something high-cost, so allowing re-entry by default sounds contradictory.

@yngndrw-sage


commented Nov 29, 2018

@sebastienros I'd argue that it doesn't make sense to have a lock-less cache-miss scenario. Consider a common scenario where the cache is populated from an external source, e.g. a web request. Whether or not that source is deterministic doesn't really matter for this discussion. In this scenario, the caller for a given key has to wait for the web request, and parallel web requests do not improve latency. (In fact, they reduce performance due to the extra resources required.)

The typical solution to this is the double-checked lock mentioned in the article that @bklooste linked to. In that implementation a cache hit is typically lock-free, and locks are only used while refreshing the stored item for a given key.

A better solution than the one in the article would be an "async/awaitable keyed lock", but it needs to be bounded so that it doesn't leak memory over time. This narrows the scope of the locking to just the keys being refreshed, but has the disadvantage that you effectively need a second level of cache (this time with an "async/awaitable lock" that operates globally across all of the keys) for the sole purpose of managing the keyed locks.

I'll demonstrate with some pseudo code: (Assume it's all async)

Item GetOrCreate(key, factory)
{
	if (memoryCache.TryGet(key, out item))
	{
		return item
	}

	using (await GetKeyedLock(key).Lock())
	{
		if (memoryCache.TryGet(key, out item))
		{
			return item
		}

		item = await factory(key)
		memoryCache.Set(key, item)

		return item;
	}
}

Lock GetKeyedLock(key)
{
	key = key + "¦lock";
	if (memoryCache.TryGet(key, out lock))
	{
		return lock
	}

	using (await globalLock.Lock()) // a single global lock guarding creation of keyed locks
	{
		if (memoryCache.TryGet(key, out lock))
		{
			return lock
		}

		lock = new Lock()
		memoryCache.Set(key, lock)

		return lock;
	}
}

@aspnet-hello aspnet-hello transferred this issue from aspnet/Caching Dec 13, 2018

@aspnet-hello aspnet-hello added this to the Discussions milestone Dec 13, 2018

@mbirtwistle


commented Jan 15, 2019

I agree that a method called GetOrCreate sounds like it works atomically. I also took 'Threadsafe' in the documentation as confirmation that it was atomic. Otherwise, what's the point of the method? If it's not atomic, I might as well make separate Get / Create calls and know exactly what I have to deal with regarding locking. This really needs to be made clearer in the documentation of GetOrCreate, because it makes a fundamental difference whether the Create lambda should be the actual expensive operation, or a cheap Lazy constructor whose .Value lambda is the expensive operation.

@JunTaoLuo JunTaoLuo removed this from the Discussions milestone Feb 2, 2019

@justinzhong


commented Feb 11, 2019

I came across this issue as well and agree with @mbirtwistle: the current state of affairs requires manually placing locks to guarantee consistent output. I have created a simple demo that runs 100K concurrent requests against MemoryCache.GetOrCreate and ConcurrentDictionary.GetOrAdd using a Lazy factory, with a side-by-side comparison showing that GetOrCreate neither guards against re-entrancy nor protects its internal state, i.e. the payload of a never-expiring cache entry is updated by concurrent calls to GetOrCreate.

memorycache-vs-concurrentdictionary

The demo shows that GetOrCreate returns different instances of Lazy that yield different results, which somewhat defeats the purpose of moving the expensive operation into a Lazy factory. Please correct me if you see any mistakes in the way I constructed the demo; otherwise I hope the aspnet team will look at this and address it in a future release.
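For reference, a minimal repro along these lines can be sketched as follows (the method name and caller count are mine, not the linked demo's):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public static class Repro
{
    // Race N callers through GetOrCreate on a single key and count how many
    // distinct values were observed. With an atomic GetOrCreate this would
    // always be 1; in practice it can be greater under contention.
    public static int DistinctValuesObserved(int callers)
    {
        var cache = new MemoryCache(new MemoryCacheOptions());
        var observed = new ConcurrentBag<Guid>();
        Parallel.For(0, callers, _ =>
            observed.Add(cache.GetOrCreate("key", entry => Guid.NewGuid())));
        return observed.Distinct().Count();
    }
}
```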

Note: the demo code is extended from reading the related issue aspnet/Caching#359 by @amit-ezra

Edit: mentioning specs
.NET Core 2.2
Microsoft.Extensions.Caching.Memory 2.2.0.0

@bklooste

Author

commented Feb 24, 2019

It's especially an issue for newer devs: the general expectation of how it works is not met, which results in investigations and tickets, and lowers people's opinion of Core. Caching is pretty important.

Many people have this issue now without even knowing about it.

@guylando


commented Mar 15, 2019

+1

@BladeWise


commented Mar 16, 2019

These are the extensions I am currently using to ensure that get-or-add and set operations are atomic.
I tested them with a modified version of the code from @justinzhong, and they seem to behave as expected.
Synchronization is kept to a minimum: on a cache miss, a global concurrent dictionary of semaphores is consulted and a semaphore is acquired on a per-memory-cache-key basis.

namespace Extensions
{
    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Caching.Memory;

    public static class MemoryCacheExtensions
    {
        private static readonly ConcurrentDictionary<int, SemaphoreSlim> _semaphores = new ConcurrentDictionary<int, SemaphoreSlim>();

        public static async Task<(T Value, bool Created)> GetOrCreateAtomicAsync<T>(this IMemoryCache memoryCache, object key, Func<ICacheEntry, T> factory)
        {
            if (memoryCache.TryGetValue(key, out var value))
                return ((T)value, false);

            var isOwner = false;
            var semaphoreKey = (memoryCache, key).GetHashCode();
            if (!_semaphores.TryGetValue(semaphoreKey, out var semaphore))
            {
                SemaphoreSlim createdSemaphore = null;
                semaphore = _semaphores.GetOrAdd(semaphoreKey, k => createdSemaphore = new SemaphoreSlim(1)); // Try to add the value, this is not atomic, so multiple semaphores could be created, but just one will be stored!

                if (createdSemaphore != semaphore)
                    createdSemaphore?.Dispose(); // This semaphore was not the one that made it into the dictionary (or none was created here) and will not be used!
                else
                    isOwner = true;
            }

            await semaphore.WaitAsync()
                           .ConfigureAwait(false); // Await the semaphore!
            try
            {
                if (!memoryCache.TryGetValue(key, out value))
                {
                    var entry = memoryCache.CreateEntry(key);
                    entry.SetValue(value = factory(entry));
                    entry.Dispose();
                    return ((T)value, true);
                }

                return ((T)value, false);
            }
            finally
            {
                if (isOwner)
                    _semaphores.TryRemove(semaphoreKey, out _);
                semaphore.Release();
            }
        }

        public static async Task<T> SetAtomicAsync<T>(this IMemoryCache memoryCache, object key, Func<ICacheEntry, T, T> factory)
        {
            var isOwner = false;
            var semaphoreKey = (memoryCache, key).GetHashCode();
            if (!_semaphores.TryGetValue(semaphoreKey, out var semaphore))
            {
                SemaphoreSlim createdSemaphore = null;
                semaphore = _semaphores.GetOrAdd(semaphoreKey, k => createdSemaphore = new SemaphoreSlim(1)); // Try to add the value, this is not atomic, so multiple semaphores could be created, but just one will be stored!

                if (createdSemaphore != semaphore)
                    createdSemaphore?.Dispose(); // This semaphore was not the one that made it into the dictionary, will not be used!
                else
                    isOwner = true;
            }

            await semaphore.WaitAsync()
                           .ConfigureAwait(false); // Await the semaphore!
            try
            {
                var currentValue = default(T);
                if (memoryCache.TryGetValue(key, out var v))
                    currentValue = (T)v;

                T newValue;
                var entry = memoryCache.CreateEntry(key);
                entry.SetValue(newValue = factory(entry, currentValue));
                entry.Dispose();

                return newValue;
            }
            finally
            {
                if (isOwner)
                    _semaphores.TryRemove(semaphoreKey, out _);
                semaphore.Release();
            }
        }
    }
}
@phatcher


commented Apr 27, 2019

@BladeWise I think you can simplify slightly: although GetOrAdd may invoke its value factory more than once, it always returns the single value that was stored in the dictionary, at least according to this blog post.

So I think

SemaphoreSlim createdSemaphore = null;
locks.GetOrAdd(key, k => createdSemaphore = new SemaphoreSlim(1)); // Try to add the value, this is not atomic, so multiple semaphores could be created, but just one will be stored!
semaphore = locks[key]; // Re-read the value from the dictionary, to ensure the used value is exactly the value stored in the dictionary!

can reduce to

SemaphoreSlim createdSemaphore = null;
semaphore = locks.GetOrAdd(key, k => createdSemaphore = new SemaphoreSlim(1)); // Try to add the value, this is not atomic, so multiple semaphores could be created, but just one will be stored!

You could go the whole hog and use a Lazy to create it which would ensure only one semaphore was ever created.
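Sketching that Lazy variant (the class and method names are illustrative): even though GetOrAdd can run the value factory more than once under contention, every caller receives the one Lazy that was stored, so at most one SemaphoreSlim is ever materialized per key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class KeyedLocks
{
    private static readonly ConcurrentDictionary<string, Lazy<SemaphoreSlim>> Locks =
        new ConcurrentDictionary<string, Lazy<SemaphoreSlim>>();

    public static SemaphoreSlim Get(string key) =>
        Locks.GetOrAdd(
            key,
            // The factory may race, but constructing a Lazy is cheap, and only
            // the stored Lazy's Value (the semaphore itself) is ever created.
            _ => new Lazy<SemaphoreSlim>(
                () => new SemaphoreSlim(1, 1),
                LazyThreadSafetyMode.ExecutionAndPublication)).Value;
}
```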

@BladeWise


commented Apr 27, 2019

@phatcher You are correct, moreover

semaphore = locks[key];

introduces a race condition: it can throw a key-not-found exception if the key is created and then removed by another thread while the current thread performs the double check. So just the GetOrAdd return value should be used, as you suggested.

@Timovzl


commented May 21, 2019

The result that is added could be a different one which was returned

@sebastienros Are you sure that the returned value can differ from the value that ends up being added to the cache? aspnet/Caching#359 (comment) seems to contradict this:

and this would be the same with MemoryCache.AddOrGetExisting from .net framework. and by same I mean you would get the same value from all threads (even if lambda was called multiple times)

If the latter is true, then the behavior matches ConcurrentDictionary and could be deemed correct and expected.

In that case, we could easily wrap the values in a Lazy with the right LazyThreadSafetyMode if we truly want to prevent creating multiple values. And let's face it: creating multiple values simultaneously, with one winner, is generally acceptable. Either we lock and wait for the initial creator to finish, or we perform the same operation simultaneously and end up using the winner's value. It's generally not worth fussing over unless creation is excessively expensive or has side effects.
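A sketch of that wrapping (the extension name is mine; note the caveat raised earlier in this thread that MemoryCache.GetOrCreate can hand different Lazy instances to racing callers, so this bounds duplicate work rather than strictly eliminating it):

```csharp
using System;
using System.Threading;
using Microsoft.Extensions.Caching.Memory;

public static class LazyCacheExtensions
{
    // The racing lambda only constructs a cheap Lazy<T>; the expensive
    // factory runs when .Value is first read, once per Lazy instance.
    public static T GetOrCreateLazy<T>(this IMemoryCache cache, object key, Func<T> factory) =>
        cache.GetOrCreate(
            key,
            _ => new Lazy<T>(factory, LazyThreadSafetyMode.ExecutionAndPublication)).Value;
}
```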
