-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Dictionary<K,V>.Remove during enumeration #26314
Comments
Looks good. A few minor changes:
namespace System.Collections.Generic
{
public class Dictionary<TKey, TValue>
{
public int RemoveAll(Predicate<KeyValuePair<TKey, TValue>> match);
}
} |
Would it be accurate to say that the following is required?
|
@zakaluka welcome. Actually we're quite specific about new API, and the review above is asking for an addition only to Dictionary, as shown: https://github.com/dotnet/corefx/issues/29979#issuecomment-396678693 In terms of process it takes 2 PR's because it is in CoreCLR and the tests are in CoreFX. It is not hard at all though. You'll want to fork both repos locally. Then 1 Check out the developer guide for CoreFX and CoreCLR After that you will probably need to iterate a few times to get everything the way you want it and tests passing. Tip: in both repos, after building the first time from the root with When you are happy please put up a PR in CoreCLr and a PR in CoreFX to match, and link each to the issue and each other so we can take a look! |
@zakaluka I sent you a collaborator invite. If you accept, I can assign you this issue. It is OK to use it for learning, we are in no hurry. BTW if you accept collaborator, you will be auto subscribed to the repo. You probably want to switch that off as it's a lot of email. |
@danmosemsft Thank you. I've accepted the invite and can definitely work on this issue. I will start with setting up the environments, then I'll dive into the files you point to. |
Great, it's yours. Ping if you get stuck! |
@zakaluka I've been looking into this and another issue on dictionary, I've recently worked my way through setting up coreclr and corefx so if you want a hand let me know. |
@danmosemsft I've got as far as step 4 but i'm hitting a problem with running the perf tests. Specifically the command:
results in:
It's right that the file it wants doesn't exist, either in the testhosts dir in corefx or the source in my coreclr build. I don't know what is supposed to produce it. |
Done, no change. There are a number of build errors but nothing directly related to what i'm doing, the usual epsilon stuff, some avx apicompat and something in windowsresourcemanager, nothing that should really affect what i'm doing in collections. The Coreclr build has nothing but a few warnings related to tests that haven't been updated to the 2.2 runtime yet. Overall pretty clean so i'm not sure what I might have got wrong. |
That file gets generated as part of the build of corefx when calling a full build.cmd that executes with no errors, so if you had errors you won't have the deps file. You can check more info of how we generate the file here: |
So I can't build a subproject because I can't build the entire tree? that doesn't make a lot of sense. How am I supposed to get into a state where I can build anything? Wait and hope the entire tree builds error free at some point? |
I've got everything building after lots of messing around with git merging and rebasing. So now I can try some tests. The first thing I wanted to try was the concurrent access protection that has been put into the other remove methods. The good news is that RemoveAll doesn't seem to be susceptible to the existing test, the bad news is I can't see if there needs to be some more complicated test. RemoveAll doesn't use the buckets to find the item so it doesn't get caught as easily but it will still iterate the entries next list so feasibly it could get caught I just don't know how to put it into that state. Could you take a look @MarcoRossignoli since you've recently been deep in these internals perhaps you'll see what I can't. Implementation with test and bench harness is here. The RemoveAll method implementation is here |
Hi @Wraith2!
I think that also RemoveAll could break , here we could have a loop!
I added new tests yesterday, have you tried to put your row here https://github.com/dotnet/corefx/blob/master/src/System.Collections/tests/Generic/Dictionary/Dictionary.Generic.Tests.ConcurrentAccessDetection.netcoreapp.cs#L34 ?
Why do you move dic code on your repo and not fork from corefx? |
Yup, first thing I tried, my version is here. I was really hoping I could just use your test and not have to do additional work ;) That file is just a copy of yours with my method added in and then some fiddling with the reflection to try other states on the entries array. It passes your test (where I expected it not to) which is why i'm looking for another way to break it. The place you point out is the location that concerns me but I can't find a way to setup a loop there. I was working in it in a local project just because it's faster than working in the full corefx repo. when I've got the behaviour correct I can port only the required changes back into coreclr/corefx and produce pull requests from there. |
Ahhh sorry you're right...i did't browse all files, missed test.cs
Next days i'll clone your repo and try to break...if you find a "solution" before let me know! |
Thanks for that, the test worked in the ValueType case but not for ref types. I worked out a way to alter the entries inside the callback from the match predicate which works for both. I've left your tests alone and added my own derived ones in the same file. PR's are above if you want to check them out, thanks for the help. |
I am wondering whether it would be better to address the scenario that this is trying to solve by removing the |
@Wraith2 I sent you a collaborator invite so I can assign you this issue - if you accept, it will subscribe you to all changes in the repo, you'll probably want to switch that off. |
New PR's made |
…merator behavior change (See dotnet/runtime#26314)
Rationale
Removing all entries in a Dictionary satisfying a predicate is a fairly common requirement but it is unnecessarily inefficient.
Motivation
This came up recently in the context of the efficiency of a 1st party service.
Details
At present to remove all entries in a dictionary satisfying a predicate you must do something like
or equivalently in Linq
This is O(2n) plus the cost of allocating and resizing the list, in about 10 lines. Also, the second pass through the dictionary requires calculating a hashcode for every remove.
If we offered a Remove overload on Dictionary<K,V> that accepted a predicate, we could do it in O(n) with no list, 1 line, with no hash code computation:
Essentially this is because internally we are able to safely remove from the dictionary while iterating over it forwards, while the public iterator will throw if you attempt to do this.
Proposed API
Existing API
Here are existing precedents:
The return value is the number of removes. That's easy for us to compute and potentially useful, and matches the existing method on
List<T>
.We do not return a list of the items removed as that means we have the allocation back. (On the immutable collections, the RemoveAll methods do return the list, but they can do it without allocating.)
Open issues
*
SortedList<TKey, TValue>
copies half of its storage on every single remove, with a predicate it could keep a temporary list of holes it removed, and then coalesce in one go. With enough holes, that might be faster than Array.Copy for each remove.*
SortedDictionary<TKey, TValue>
would need investigation to figure out whether it is possible to safely enumerate its backing tree while modifying it.[edit - changed to add existing API, and to match
List<T>
]The text was updated successfully, but these errors were encountered: