-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: Add GetOrAddAsync to ConcurrentDictionary #83636
Comments
Tagging subscribers to this area: @dotnet/area-system-collections Issue DetailsBackground and motivation
The current proposal is pretty straightforward, the idea is to introduce To be more consistent I also propose to add API Proposalnamespace System.Collections.Concurrent;
public class ConcurrentDictionary<TKey, TValue> : IDictionary<TKey, TValue>, IDictionary, IReadOnlyDictionary<TKey, TValue> where TKey : notnull
{
public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
+ public Task<TValue> GetOrAddAsync(TKey key, Func<TKey, Task<TValue>> valueFactory)
public TValue GetOrAdd<TArg>(TKey key, Func<TKey, TArg, TValue> valueFactory, TArg factoryArgument)
+ public Task<TValue> GetOrAddAsync<TArg>(TKey key, Func<TKey, TArg, Task<TValue>> valueFactory, TArg factoryArgument)
...
public TValue AddOrUpdate<TArg>(TKey key, Func<TKey, TArg, TValue> addValueFactory, Func<TKey, TValue, TArg, TValue> updateValueFactory, TArg factoryArgument)
+ public Task<TValue> AddOrUpdateAsync<TArg>(TKey key, Func<TKey, TArg, Task<TValue>> addValueFactory, Func<TKey, TValue, TArg, Task<TValue>> updateValueFactory, TArg factoryArgument)
public TValue AddOrUpdate(TKey key, Func<TKey, TValue> addValueFactory, Func<TKey, TValue, TValue> updateValueFactory)
+ public Task<TValue> AddOrUpdateAsync(TKey key, Func<TKey, Task<TValue>> addValueFactory, Func<TKey, TValue, Task<TValue>> updateValueFactory)
public TValue AddOrUpdate(TKey key, TValue addValue, Func<TKey, TValue, TValue> updateValueFactory)
+ public Task<TValue> AddOrUpdateAsync(TKey key, TValue addValue, Func<TKey, TValue, Task<TValue>> updateValueFactory)
} API Usagevar results = new ConcurrentDictionary<int, string>();
await Parallel.ForEachAsync(Enumerable.Range(1, 1000), async (i, _) =>
{
await results.GetOrAddAsync(i, GetSomethingAsync);
});
async Task<string> GetSomethingAsync(int i)
{
await Task.Delay(10);
return "Test";
} Alternative DesignsAdd another class that would be async by nature? Dot not use RisksI'm not sure it scales well to add an overload for each sync method.
|
So those overloads would complete synchronously in case the factories aren't called, is that right? In such case, those methods should return |
.... There's nothing preventing using |
The async here is for the case when the factory callback performs async operations like web requests. |
From the documentation for
.... which means that, if we keep this behavior, then simply storing the
Do note that because of the above, whatever web requests you're making would need to be idempotent (even in the case that these methods were implemented) |
What is the expected behavior of the proposed APIs? Are the |
Wasn't clear to me if I should use a ValueTask or a Task, now it is. Thx!
It's mainly to have a nice syntax, similar to the sync one. BTW you'll want to await if you need the result right away: var result = await cache.GetOrAddAsync(key, GetSomethingAsync);
You're right, that's a good counter argument to not add those overloads 🤔
I didn't think much about |
I've had this gist, that I've been using in various personal projects for a while. |
With using System.Collections.Concurrent;
using DotNext.Threading;
var results = new ConcurrentDictionary<int, AsyncLazy<string>>();
await Parallel.ForEachAsync(Enumerable.Range(1, 100), async (i, cancellationToken) =>
{
var result = await results.GetOrAdd(0, x => new AsyncLazy<string>(async t =>
{
Console.WriteLine($"Compute {x}");
await Task.Delay(10, t);
return $"Result {x}";
})).WithCancellation(cancellationToken);
Console.WriteLine(result);
}); @davidfowl Could you explain why in your gist you don't insert in the dictionary the factory task directly? EDIT: It's fine with Lazy too: using System.Collections.Concurrent;
var results = new ConcurrentDictionary<int, Lazy<Task<string>>>();
await Parallel.ForEachAsync(Enumerable.Range(1, 100), async (i, cancellationToken) =>
{
var result = await results.GetOrAdd(0, x => new Lazy<Task<string>>(async () =>
{
Console.WriteLine($"Compute {x}");
await Task.Delay(10);
return $"Test {x}";
})).Value;
Console.WriteLine(result);
}); |
Humm it's linked to "//The factory method will only run once." ? You return the task so that callers can await this one directly? Aren't you missing a lock so that only one EDIT: ok I see the |
Imagine that you have a ConcurrentDictionary<string, int> dict = new();
PeriodicTime timer = new(TimeSpan.FromSeconds(1));
while (await timer.WaitForNextTickAsync())
dict.AddOrUpdate("Key1", 1, (_, existing) => existing + 1); Then you add another asynchronous flow, that uses the proposed API await dict.AddOrUpdateAsync("Key1", 1, async (_, existing) =>
{
await Task.Delay(TimeSpan.FromSeconds(2));
return existing - 1;
}); The second asynchronous flow will deadlock, because the |
I don't foresee us adding async methods to ConcurrentDictionary. The existing helpers like GetOrAdd are there only because they represent a significant use case, but they can also be implemented on top of the primitive TryGetValue/TryAdd/TryRemove/TryUpdate methods, as can the proposed methods, which can also use those same helpers. For example, the proposed public static async ValueTask<TValue> GetOrAddAsync<TKey, TValue>(this ConcurrentDictionary<TKey, TValue> d, TKey key, Func<TKey, ValueTask<TValue>> valueFactory) =>
d.TryGetValue(key, out TValue value) ? value : d.GetOrAdd(key, await valueFactory(key)); If you want different semantics, obviously you can compose it however you'd like. |
This is a very common caching use case (that's why the MemoryCache has an extension method), especially when it prevents the cache stampede problem. I don't know if that means we should add it to concurrent dictionary, but I'd reckon it's common enough to put somewhere in the core libraries. |
So you would go with a locking strategy similar to the one in your gist then? I feel like it's a nicer behavior but it's not consistent with the original
Yep that's the initial goal at my proposal, make it built-in. An extension method would be fine too, a little less discoverable. When migrating, code from sync to async, you have a lot of modifications to be made and I believe it's important that the framework helps the migration path. Async is viral so when you start modifying one method you end modifying a dozen of other methods and if it ends up modifying too much code, you can be lazy and do a "sync over async".
Maybe I should remove Should I just edit the description / title? |
Your gist's solution to that is a completely different method than what's being proposed and by its very nature isn't one that can be added to CD but would need to remain an extension method as it operates on a specific TValue only. If there's such an alternate proposal to discuss, we can, but that's not what this issue was proposing nor what I was answering. It also is one specific take on how to address that problem, and means that you don't get the alternate behavior you might want sometimes, of actually allowing concurrent adds to compete, not serializing behind a potential failure, not falling all concurrent requests when the first fails, etc. And its behavior diverges from the sync counterpart beyond just using asynchrony, and such divergence is something we strive to avoid as it means the behavior of code changes just by going async. |
I am not sure what is the protocol here about updating API proposals, but my guess is that removing the The |
I just edited the description / title. As for using a It's just a little less obvious when migrating code from sync to async and if the dictionary is still used by some sync code you would need to wrap with a Task. |
That's not a given, and the existing sync |
Based on the source code, it is. First the
The "called multiple times" could mean either multiple times on the current thread, or multiple times on multiple threads (but at most once per thread). In reality the second meaning is the correct one. |
Ah, I missed the "each asynchronous flow" part of your comment. So yes. That doesn't really solve anything,, though - generally the problem is going to be that it gets invoked multiple times at all, not 'single time per thread'. |
@Clockwork-Muse pardon me a small digression. Some time ago we had a long discussion about the expected effects of a delegate failure. What would be your expectation in case the |
Uh. Normally it's not an issue for |
@Clockwork-Muse yep, it would be unusual to use the |
I guess it depends on the use case, you might want to put a special value in the dictionary when it fails. For instance you get a
If the |
@jairbubbles - Maybe. Depending on implementation of the collection, if the factory throws an error it could permanently corrupt the collection in subtle (or hopefully not-so subtle) ways. This is almost always the case for single-threaded collections that accept lambdas (since usually if those fail your entire program is borked anyways). The problem for a collection like this is what happens if the collection is implemented in such a way that while the first call is running a second call comes in and gets the task from the first call (as it does in that example gist). The item has been added, so it now returns an exception for that item when you Is that important? 🤷 What do you want the application to do? |
The gist is for a
My use case was pretty simple and switching to a So the real question would be: do we want to help users in that migration path? Is it important to make their life easier? I would say it's the essence of .NET 😊 |
The gist provided earlier explicitly sets things up so multiple callers will wait on the same task. You're right you can't have multiple initiations of the same task, but (for |
This is correct. Waiting the same
Hmm, why did you prefer David Fowler's gist over Stephen Toub's one-liner? Stephen Toub's |
My current use case is very simple, all the parallel tasks are inserting with their own key so I don't need David Fowler's gist. With no extension method one can write: var results = new ConcurrentDictionary<int, Task<string>>();
await Parallel.ForEachAsync(Enumerable.Range(1, 100), async (i, cancellationToken) =>
{
await results.GetOrAdd(i, async x =>
{
await Task.Delay(10);
return $"Test {x}";
});
}); (see https://dotnetfiddle.net/OcMxbM) With Stephen Toub's one liner I can write: var results = new ConcurrentDictionary<int, string>();
await Parallel.ForEachAsync(Enumerable.Range(1, 100), async (i, cancellationToken) =>
{
await results.GetOrAddAsync(i, async x =>
{
await Task.Delay(10);
return $"Test {x}";
});
}); (see https://dotnetfiddle.net/cE1Ma6) The original code was sync and looked like that: var results = new ConcurrentDictionary<int, string>();
Parallel.ForEach(Enumerable.Range(1, 100), i =>
{
results.GetOrAdd(i, x =>
{
Thread.Sleep(10);
return $"Test {x}";
});
}); (see https://dotnetfiddle.net/TNptVk) With the second option I don't change |
So, assuming that the goal was to migrate from sync to async as smoothly as possible, why did you switch from the original simple |
In fact my first migration was like that: var results = new ConcurrentDictionary<int, string>();
await Parallel.ForEachAsync(Enumerable.Range(1, 100), async (i, cancellationToken) =>
{
var result = await DoSomethingAsync(project);
results .TryAdd(project, result);
}); As I don't need to get, I know it's not in the dictionary. It's a bit of a digression but the more I think about it, I feel like we should be able to wrap the result of the var results =await Parallel.ForEachAsync(Enumerable.Range(1, 100), DoSomethingAsync);
// And results would wrap that in a dictionary so that I can access the result efficiently |
This has been proposed (and rejected) in: Provide a result returning Parallel.ForEachAsync overload. |
😮 But it was returning an |
That proposal includes |
EDIT: remove
AddOrUpateAsync
based on comments in the thread. (see @theodorzoulias's comment) + return aValueTask
Background and motivation
ConcurrentDictionary.GetOrAdd
is pretty useful to store results in a thread safe way. But when your code is async, it's missing an overload to pass it a task.The current proposal is pretty straightforward, the idea is to introduce
GetOrAddAsync
methods that would use aTask<TValue>
rather than aValue
.To be more consistent I also propose to addAddOrUpdateAsync
overloads.API Proposal
API Usage
Alternative Designs
Use an extension method as the base class exposes everything that's needed. (see @stephentoub's comment)
Add another class that would be async by nature?Use a
Dictionary<TKey, Task<TValue>>
rather than aDictionary<TKey, TValue>
(see @Clockwork-Muse's comment)Dot not use
ConcurrentDictionary
when your code is async? Though I would add that when migrating code from sync to async you don't want to change everything.Risks
I'm not sure it scales well to add an overload for each sync method.
The text was updated successfully, but these errors were encountered: