Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify ConcurrentDictionary use, taking into account GetOrAdd can call creation callback more than once #417

Closed
rclabo opened this issue Feb 9, 2021 · 3 comments · Fixed by #441

Comments

@rclabo
Copy link
Contributor

rclabo commented Feb 9, 2021

Recently I saw that David Fowler, lead architect on the .Net Core team at Microsoft, commented that one of the common issues that he has observed is that when people use ConcurrentDictionary they don't take into account that the GetOrAdd method executes the creation callback more than once.

And this can be a problem if the code is creating something expensive for example.

Apparently this is documented but certainly not the behavior that one would expect. We should review the use of ConcurrentDictionary GetOrAdd calls to ensure we are accounting for this odd behavior.

David_Fowler_comment

@jeme
Copy link
Contributor

jeme commented Feb 16, 2021

David Fowlers statement makes it sound like the implementation would call "the" callback (singular) multiple times, I don't feel like that is quite an accurate way of describing the behavior. In the actual behavior, it makes a bit more sense why this happens.

The basics is that the factory is called outside of a lock, which means if you call GetOrAdd simultaneously on two threads, each or their value factories will be executed.

This may be a good read: https://andrewlock.net/making-getoradd-on-concurrentdictionary-thread-safe-using-lazy/

Usages:

ConcurrentDictionary<TKey, TValue>

Reference source:

GetOrAdd

Lucene.Net:
https://github.com/apache/lucenenet/blob/master/src/Lucene.Net/Support/IO/FileSupport.cs#L363
-> Probably not that big of a deal, paths are always of a limited length so the code should be reasonably fast and produce the same result.

https://github.com/apache/lucenenet/blob/master/src/Lucene.Net/Support/Configuration/EnvironmentVariablesConfigurationProvider.cs#L67
-> Probably not that big of a deal, It should produce the same result and is not likely to be that expensive.

https://github.com/apache/lucenenet/blob/master/src/Lucene.Net/Search/SearcherLifetimeManager.cs#L174
-> Already uses lazy so the worst case scenario here is that we end up creating to Lazy objects, then discarding one of them immediately and never running it's factory.

Lucene.Net.TestFramework:

return configurationCache.GetOrAdd(testDirectory, (key) =>

-> Could be candidate for using Lazy pattern unless we say that this is "TestFramework" and therefore unlikely to cause a problem. Also I did not dive deep into what the builder does and how expensive it is. The fix is not that problematic in the end though.

Lucene.Net.Spatial:

var p = provider.GetOrAdd(FieldName, f => new PointPrefixTreeFieldCacheProvider(m_grid, FieldName, m_defaultFieldValuesArrayLen));

-> Assuming inexpensive call to constructor.

AddOrUpdate

Could not find any usages.

ConditionalWeakTable<TKey, TValue>

Reference source:

GetValue

AddOrUpdate

GetOrCreateValue

LurchTable<TKey, TValue>

GetOrAdd

AddOrUpdate

@NightOwl888
Copy link
Contributor

@jeme

Thanks for the insight. Lazy<T> does seem like a reasonable remedy in most (all?) cases where the call is expensive or needs to be atomic for another reason.

NOTE: I have read that LazyInitializer is faster than Lazy<T>, but given the fact that the former uses static methods I am not sure if we can find a way to make use of it in this scenario.

However, GetOrAdd is not the only API we need to watch out for. There are several usages of APIs of collections that may have been assumed to be atomic where locks have been replaced with these API calls, where the documentation states the calls are made outside of the context of locks.

ConditionalWeakTable<TKey, TValue>

ConcurrentDictionary<TKey, TValue>

LurchTable<TKey, TValue>

In addition, we should expand the search to include all of Lucene.NET's dependencies that we maintain.

Although the documentation doesn't specify that locking is an issue with TryGet, TryAdd, TryRemove, we should probably do a code review on those methods of the above collections (where applicable) as well as any caches that are built using them to ensure we are not missing any important locks.

Just doing a quick survey, I noticed there are some calls that should have locks in J2N and it is suspected that this may be the source of concurrency issues with ThaiTokenizer and ICUTokenizer tests of Lucene.NET, which are known to fail without extra locking that was not part of the original design as a result of missing locks in ICU4N.

@rclabo
Copy link
Contributor Author

rclabo commented Feb 16, 2021

@jeme Great research. Thanks for pointing out Andrew Locks' article which explains the situation well. It was a good read.

NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 12, 2021
… issues with adding new items to the cache and reading RamBytesUsed() method (see apache#417)
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 12, 2021
…e with loading the cache by using Lazy<T>. Fixes apache#319. Also see apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 12, 2021
…ing it doesn't matter if multiple threads compete to populate the ConditionalWeakTable. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 13, 2021
…ovider): Added comments to indicate that multiple threads competing in the valueFactory is okay in these cases. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 13, 2021
…y: Use Lazy<T> to ensure the configurationCache.GetOrAdd() factory is atomic. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 13, 2021
…tDictionary to make the valueFactory atomic. See apache#417. Fixes apache#319.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 13, 2021
… by using GetOrAdd() instead of TryGetValue. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 13, 2021
… the CleanupTemporaryFiles() method to be more in line with the original Java implementation, including not allowing new files/directories to be added to the queue concurrently with the deletion process. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 13, 2021
…ag<T> with ConcurrentQueue<T> because we need to be sure the underlying implementation guarantees order and the extra call to Reverse() was just slowing things down. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 13, 2021
…the reason we use Lazy<T> is to make the create operation atomic. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 14, 2021
… the CleanupTemporaryFiles() method to be more in line with the original Java implementation, including not allowing new files/directories to be added to the queue concurrently with the deletion process. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 14, 2021
…ag<T> with ConcurrentQueue<T> because we need to be sure the underlying implementation guarantees order and the extra call to Reverse() was just slowing things down. See apache#417.
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Mar 14, 2021
…the reason we use Lazy<T> is to make the create operation atomic. See apache#417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
… issues with adding new items to the cache and reading RamBytesUsed() method (see #417)
NightOwl888 added a commit that referenced this issue Mar 14, 2021
…e with loading the cache by using Lazy<T>. Fixes #319. Also see #417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
…ing it doesn't matter if multiple threads compete to populate the ConditionalWeakTable. See #417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
…ovider): Added comments to indicate that multiple threads competing in the valueFactory is okay in these cases. See #417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
…y: Use Lazy<T> to ensure the configurationCache.GetOrAdd() factory is atomic. See #417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
…tDictionary to make the valueFactory atomic. See #417. Fixes #319.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
… by using GetOrAdd() instead of TryGetValue. See #417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
… the CleanupTemporaryFiles() method to be more in line with the original Java implementation, including not allowing new files/directories to be added to the queue concurrently with the deletion process. See #417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
…ag<T> with ConcurrentQueue<T> because we need to be sure the underlying implementation guarantees order and the extra call to Reverse() was just slowing things down. See #417.
NightOwl888 added a commit that referenced this issue Mar 14, 2021
…the reason we use Lazy<T> is to make the create operation atomic. See #417.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants