Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Dictionary<K,V>.Remove during enumeration #26314

Closed
danmoseley opened this issue May 30, 2018 · 21 comments
Closed

Allow Dictionary<K,V>.Remove during enumeration #26314

danmoseley opened this issue May 30, 2018 · 21 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections good first issue Issue should be easy to implement, good for first-time contributors
Milestone

Comments

@danmoseley
Copy link
Member

Rationale

Removing all entries in a Dictionary satisfying a predicate is a fairly common requirement but it is unnecessarily inefficient.

Motivation

This came up recently in the context of the efficiency of a 1st party service.

Details

At present to remove all entries in a dictionary satisfying a predicate you must do something like

var removes = new List<object>();
foreach(var entry in dict)
{
    if (predicate(entry.Key))
            removes.Add(entry.Key);
}
foreach(var entry in removes)
{
    dict.Remove(entry);
}

or equivalently in Linq

foreach (var entry in dict.Where(entry => predicate(entry.Key)).ToList() ) 
{
  dict.Remove(entry.Key);
}

This is O(2n) plus the cost of allocating and resizing the list, in about 10 lines. Also, the second pass through the dictionary requires calculating a hashcode for every remove.

If we offered a Remove overload on Dictionary<K,V> that accepted a predicate, we could do it in O(n) with no list, 1 line, with no hash code computation:

dict.Remove(predicate);

Essentially this is because internally we are able to safely remove from the dictionary while iterating over it forwards, while the public iterator will throw if you attempt to do this.

Proposed API

namespace System.Collections.Generic
{
    public class Dictionary<TKey, TValue>
    {
        public int RemoveAll(Predicate<TKey> match) { throw null; }
        public int RemoveAll(Predicate<KeyValuePair<TKey, TValue>> match) { throw null; }
    }
}

Existing API

Here are existing precedents:

    public class List<T> 
    {
        public int RemoveAll(Predicate<T> match) { throw null; } 
    }

    public partial class SortedSet<T>
    {
        public int RemoveWhere(Predicate<T> match) { throw null; }
    }

    public class Hashset<T>
    {
        public int RemoveWhere(Predicate<T> match) { throw null; }
    }

The return value is the number of removes. That's easy for us to compute and potentially useful, and matches the existing method on List<T>.

We do not return a list of the items removed as that means we have the allocation back. (On the immutable collections, the RemoveAll methods do return the list, but they can do it without allocating.)

Open issues

  1. Is the overload that takes key and value worthwhile? I don't have a scenario, but it seems reasonable.
  2. Should this be on any other collections? I suggest not (unless scenario arises).
    * SortedList<TKey, TValue> copies half of its storage on every single remove, with a predicate it could keep a temporary list of holes it removed, and then coalesce in one go. With enough holes, that might be faster than Array.Copy for each remove.
    *SortedDictionary<TKey, TValue> would need investigation to figure out whether it is possible to safely enumerate its backing tree while modifying it.

[edit - changed to add existing API, and to match List<T>]

@terrajobst
Copy link
Member

terrajobst commented Jun 12, 2018

Looks good. A few minor changes:

  • Let's not expose RemoveAll on sets as they already have RemoveWhere.
  • Let's not expose RemoveAll with the delegate just taking the key as this might result in people indexing in the delegate, causing a double-lookup.
  • Ideally, reading from the dictionary during execution of RemoveAll should work.
  • We must handle the case where the predicate modifies the dictionary. We must throw reliably.
namespace System.Collections.Generic
{
    public class Dictionary<TKey, TValue>
    {
        public int RemoveAll(Predicate<KeyValuePair<TKey, TValue>> match);
    }
}

@zakaluka
Copy link

Would it be accurate to say that the following is required?

  1. For the following directories in src, add RemoveAll to collections that don't already have a similar function (like sRemoveAll or RemoveWhere)
    • System.Collections
    • System.Collections.Specialized
    • System.Collections.NonGeneric
    • System.Collections.Immutable
    • System.Collections.Concurrent
  2. For all dictionary collections, add a Remove(key) method if it doesn't have it already
  3. Add corresponding tests for all new methods

@danmoseley
Copy link
Member Author

danmoseley commented Jun 16, 2018

@zakaluka welcome. Actually we're quite specific about new API, and the review above is asking for an addition only to Dictionary, as shown: https://github.com/dotnet/corefx/issues/29979#issuecomment-396678693

In terms of process it takes 2 PR's because it is in CoreCLR and the tests are in CoreFX. It is not hard at all though. You'll want to fork both repos locally. Then

1 Check out the developer guide for CoreFX and CoreCLR
2. In CoreCLR, make a local change to add the API to Dictionary here and build CoreCLR from the root.
3. In CoreFX, add it to the reference surface area here. Also in CoreFX add tests for it here
4. Build CoreFX, but patching in your modified CoreCLR, like this. Basically if you run build.cmd or build-tests.cmd you add -- /p:CoreCLROverridePath=somepath and if you run msbuild you add /p:CoreCLROverridePath=somepath. This will cause your tests to run against your implementation,

After that you will probably need to iterate a few times to get everything the way you want it and tests passing. Tip: in both repos, after building the first time from the root with build after later changes you can build directly in the .csproj folder with msbuild.

When you are happy please put up a PR in CoreCLr and a PR in CoreFX to match, and link each to the issue and each other so we can take a look!

@danmoseley
Copy link
Member Author

@zakaluka I sent you a collaborator invite. If you accept, I can assign you this issue. It is OK to use it for learning, we are in no hurry.

BTW if you accept collaborator, you will be auto subscribed to the repo. You probably want to switch that off as it's a lot of email.

@zakaluka
Copy link

@danmosemsft Thank you. I've accepted the invite and can definitely work on this issue. I will start with setting up the environments, then I'll dive into the files you point to.

@danmoseley
Copy link
Member Author

Great, it's yours. Ping if you get stuck!

@Wraith2
Copy link
Contributor

Wraith2 commented Jun 16, 2018

@zakaluka I've been looking into this and another issue on dictionary, I've recently worked my way through setting up coreclr and corefx so if you want a hand let me know.

@Wraith2
Copy link
Contributor

Wraith2 commented Jun 20, 2018

@danmosemsft I've got as far as step 4 but i'm hitting a problem with running the perf tests. Specifically the command:

..\..\build.cmd . -Release -- /p:CoreCLROverridePath=E:\Programming\csharp7\coreclr\bin\Product\Windows_NT.x64.Release\

results in:

Using E:\Programming\csharp7\corefx\bin\testhost\netcoreapp-Windows_NT-Release-x64\ as the test runtime folder.
  Executing in E:\Programming\csharp7\corefx\bin\tests\System.Collections.Performance.Tests\netcoreapp-Windows_NT-Release-x64\
  ----- start 15:25:56.30 ===============  To repro directly: =====================================================
  pushd E:\Programming\csharp7\corefx\bin\tests\System.Collections.Performance.Tests\netcoreapp-Windows_NT-Release-x64\
  call E:\Programming\csharp7\corefx\bin\testhost\netcoreapp-Windows_NT-Release-x64\\dotnet.exe xunit.console.netcore.exe System.Collections.Performance.Tests.dll
   -xml testResults.xml -notrait Benchmark=true -notrait category=nonnetcoreapptests -notrait category=nonwindowstests  -notrait category=OuterLoop -notrait catego
  ry=failing
  popd
  ===========================================================================================================
  Error initializing the dependency resolver: A fatal error was encountered, missing dependencies manifest at: E:\Programming\csharp7\corefx\bin\testhost\netcoreap
  p-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\Microsoft.NETCore.App.deps.json
  ----- end 15:25:56.31 ----- exit code -2147450741 ----------------------------------------------------------

It's right that the file it wants doesn't exist, either in the testhosts dir in corefx or the source in my coreclr build. I don't know what is supposed to produce it.

@danmoseley
Copy link
Member Author

@joperezr any idea? I've never seen this.
@Wraith2 try clean -all at the root and try again?

@Wraith2
Copy link
Contributor

Wraith2 commented Jun 20, 2018

Done, no change. There are a number of build errors but nothing directly related to what i'm doing, the usual epsilon stuff, some avx apicompat and something in windowsresourcemanager, nothing that should really affect what i'm doing in collections. The Coreclr build has nothing but a few warnings related to tests that haven't been updated to the 2.2 runtime yet. Overall pretty clean so i'm not sure what I might have got wrong.

@joperezr
Copy link
Member

That file gets generated as part of the build of corefx when calling a full build.cmd that executes with no errors, so if you had errors you won't have the deps file. You can check more info of how we generate the file here:
https://github.com/dotnet/corefx/blob/618ea91f4fdcdccbad87926af5094597c6d78255/src/dirs.proj#L19-L40

@Wraith2
Copy link
Contributor

Wraith2 commented Jun 21, 2018

So I can't build a subproject because I can't build the entire tree? that doesn't make a lot of sense. How am I supposed to get into a state where I can build anything? Wait and hope the entire tree builds error free at some point?

@Wraith2
Copy link
Contributor

Wraith2 commented Jun 22, 2018

I've got everything building after lots of messing around with git merging and rebasing. So now I can try some tests. The first thing I wanted to try was the concurrent access protection that has been put into the other remove methods.

The good news is that RemoveAll doesn't seem to be susceptible to the existing test, the bad news is I can't see if there needs to be some more complicated test. RemoveAll doesn't use the buckets to find the item so it doesn't get caught as easily but it will still iterate the entries next list so feasibly it could get caught I just don't know how to put it into that state. Could you take a look @MarcoRossignoli since you've recently been deep in these internals perhaps you'll see what I can't.

Implementation with test and bench harness is here. The RemoveAll method implementation is here

@MarcoRossignoli
Copy link
Member

MarcoRossignoli commented Jun 22, 2018

Hi @Wraith2!

RemoveAll doesn't use the buckets to find the item so it doesn't get caught as easily but it will still iterate the entries next list so feasibly it could get caught I just don't know how to put it into that state

I think that also RemoveAll could break , here we could have a loop!

I just don't know how to put it into that state

I added new tests yesterday, have you tried to put your row here https://github.com/dotnet/corefx/blob/master/src/System.Collections/tests/Generic/Dictionary/Dictionary.Generic.Tests.ConcurrentAccessDetection.netcoreapp.cs#L34 ?

Implementation with test and bench harness is here. The RemoveAll method implementation is here

Why do you move dic code on your repo and not fork from corefx?

@Wraith2
Copy link
Contributor

Wraith2 commented Jun 22, 2018

Yup, first thing I tried, my version is here. I was really hoping I could just use your test and not have to do additional work ;) That file is just a copy of yours with my method added in and then some fiddling with the reflection to try other states on the entries array. It passes your test (where I expected it not to) which is why i'm looking for another way to break it.

The place you point out is the location that concerns me but I can't find a way to setup a loop there.

I was working in it in a local project just because it's faster than working in the full corefx repo. when I've got the behaviour correct I can port only the required changes back into coreclr/corefx and produce pull requests from there.

@MarcoRossignoli
Copy link
Member

MarcoRossignoli commented Jun 22, 2018

Yup, first thing I tried, my version is here

Ahhh sorry you're right...i did't browse all files, missed test.cs

The place you point out is the location that concerns me but I can't find a way to setup a loop there.

Next days i'll clone your repo and try to break...if you find a "solution" before let me know!

@MarcoRossignoli
Copy link
Member

@Wraith2
Copy link
Contributor

Wraith2 commented Jun 24, 2018

Thanks for that, the test worked in the ValueType case but not for ref types. I worked out a way to alter the entries inside the callback from the match predicate which works for both. I've left your tests alone and added my own derived ones in the same file. PR's are above if you want to check them out, thanks for the help.

@jkotas
Copy link
Member

jkotas commented Jun 25, 2018

I am wondering whether it would be better to address the scenario that this is trying to solve by removing the _version bump from Remove method instead (ie allowing Remove to be called during enumeration in general). It would be more flexible (e.g. you can abort the enumeration anytime), allocation free (no need to pack the closure into a delegate), and not introduce code bloat into the Dictionary. Thoughts?

@danmoseley
Copy link
Member Author

@Wraith2 I sent you a collaborator invite so I can assign you this issue - if you accept, it will subscribe you to all changes in the repo, you'll probably want to switch that off.

@Wraith2
Copy link
Contributor

Wraith2 commented Jul 10, 2018

New PR's made

@danmoseley danmoseley changed the title Add Dictionary<K,V>.Remove(predicate) Allow Dictionary<K,V>.Remove during enumeration Jul 29, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
AlexUstinov added a commit to servicetitan/dataobjects-net that referenced this issue Apr 22, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections good first issue Issue should be easy to implement, good for first-time contributors
Projects
None yet
Development

No branches or pull requests

8 participants