Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Add optimization for ImmutableList<T>.AddRange #67

Merged

Conversation

PatrickMcDonald
Copy link
Contributor

...when adding non-empty range to non-empty list

The AddRange method in ImmutableList<T> currently adds the items in a loop, rebalancing after each add.

This PR will add the entire enumerable in one batch, giving a significant performance improvement

85% improvement when adding an array to to an immutable list
98% improvement when adding an immutable list to an immutable list

@PatrickMcDonald
Copy link
Contributor Author

Timings here:

Optimized; iterations: 100000; Time: 00:00:00.2002234
Original; iterations: 100000; Time: 00:00:01.4049949
Optimized List; iterations: 100000; Time: 00:00:00.0413923
Original List; iterations: 100000; Time: 00:00:01.9118593

Test code:

var starterList = ImmutableList.CreateRange(Enumerable.Range(5, 100));
var expected = ImmutableList.CreateRange(Enumerable.Range(5, 150));
var diff = Enumerable.Range(105, 50).ToArray();
var diffList = ImmutableList.CreateRange(diff);

// Test
VerifyListsEqual(expected, starterList.AddRange(diff));
VerifyListsEqual(expected, starterList.AddRangeOriginal(diff));

int iterations = 100000;

Time("Optimized", iterations, () => starterList.AddRange(diff));
Time("Original", iterations, () => starterList.AddRangeOriginal(diff));
Time("Optimized List", iterations, () => starterList.AddRange(diffList));
Time("Original List", iterations, () => starterList.AddRangeOriginal(diffList));

private static void Time(string label, int iterations, Action f)
{
    // Warmup
    f.Invoke();

    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    var sw = System.Diagnostics.Stopwatch.StartNew();

    for (int i = 0; i < iterations; i++)
    {
        f.Invoke();
    }

    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    var elapsed = sw.Elapsed;

    Console.WriteLine("{0}; iterations: {1}; Time: {2}", label, iterations, elapsed);
}

@adamralph
Copy link
Contributor

Nice work. I wonder if it's worth adding a performance test project to the solution, to contain these kind of tests. Not for running as part of the build, but for manual invocation whilst tuning performance.

@PatrickMcDonald
Copy link
Contributor Author

I did see some commented-out unit tests for performance elsewhere. I wouldn't mind guidance on how to publish performance tests, on another project I have pushed performance scripts to a separate branch on my own fork just to keep them somewhere.

@adamralph
Copy link
Contributor

Personally, I would put them in a *.Tests.Performance.csproj and write them as xUnit.net tests, and put their invocation in a build target/task which isn't included in the default targets/tasks.

@AArnott
Copy link

AArnott commented Nov 13, 2014

@adamralph These tests aren't run by a target/task anywhere. But the cloud build will run all tests discovered in assemblies that match a particular pattern (I don't know which pattern they have set up). And VS itself will run (by default) all unit tests. So having them enabled in any project will tend to get them run in normal functional runs, which seems undesirable until we have a better user story for that. That's why the existing perf tests have commented out [Fact] attributes.

So for now, I recommend you can keep the perf test methods, but please mark them as skipped or just comment out the attribute as has been done in this project.

@nguerrera
Copy link
Contributor

[Minor note: we're avoiding the Skip feature of xunit as each one causes a build warning in the command line build. If someone knows how to fix that, then we can go back to the Skip feature, but until then we'll continue to comment out the attributes instead.]

@AArnott
Copy link

AArnott commented Nov 13, 2014

@PatrickMcDonald Thank you very much for the pull request and for providing the perf results demonstrating the improvement. I agree this looks great. As for the code review itself, I need to study it more to remind myself how the tree balancing code works to make sure it can compensate from bulk imbalances. I'll report back here when I sign off.

@adamralph
Copy link
Contributor

Ah ok. I'm not a fan of the VS test runner nor CI test discovery for this
reason and others. But if thats the decided approach then your solution
makes sense.
On 13 Nov 2014 17:12, "Nick Guerrera" notifications@github.com wrote:

[Minor note: we're avoiding the Skip feature of xunit as each one causes a
build warning in the command line build. If someone knows how to fix that,
then we can go back to the Skip feature, but until then we'll continue to
comment out the attributes instead.]


Reply to this email directly or view it on GitHub
#67 (comment).

result = this.Mutate(right: newRight);
}

return MakeBalanced(result);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that MakeBalanced appears to only be able to tolerate slightly imbalanced trees (it makes at most a small set of rotations). It assumes that the tree is only imbalanced by one operation. But the code you've added can put a tree significantly out of balance, so MakeBalanced would then need enhancement to compensate.
Whether I'm reading the code wrong or MakeBalanced has to be enhanced, I think we need to see some focused functional tests added to ensure the tree remains balanced after this operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it maybe looks like the performance gains are too good to be true, that's a shame.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that this may be fixed by making MakeBalanced recursive? Some of the speed gain will be lost here, but probably not all of it.

@PatrickMcDonald
Copy link
Contributor Author

I've added another commit which applies the same optimization to ImmutableList<T>.InsertRange and also improves test coverage of the two relevant methods. Let me know if you want me to squash these commits, I'm wary of rebasing without notice.

@@ -177,12 +188,13 @@ public void InsertRangeTest()
Assert.Throws<ArgumentOutOfRangeException>(() => list.InsertRange(1, new[] { 1 }));
Assert.Throws<ArgumentOutOfRangeException>(() => list.InsertRange(-1, new[] { 1 }));

list = list.InsertRange(0, new[] { 1, 4, 5 });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid changing existing unit tests. Add new ones instead. Existing unit tests may be deliberately testing very specific cases that aren't obvious and changing them may eliminate the test for that unique case.

@AArnott
Copy link

AArnott commented Nov 13, 2014

@PatrickMcDonald Thanks for your concern about rebasing. As is your default position, please fix issues with additional commits rather than rebase, particularly after you file a pull request.

@lessneek
Copy link

@PatrickMcDonald Why do you get sw.Elapsed after GC.Collect() // and etc. but not right after tested action?

@PatrickMcDonald
Copy link
Contributor Author

@inTagger The idea is that I want to include the garbage collection in the timing, but only that relating to the action I am testing. I read about this in Eric Lippert's series in Tech.Pro http://tech.pro/tutorial/1433/performance-benchmark-mistakes-part-four

@PatrickMcDonald
Copy link
Contributor Author

After fixing the rebalancing, I am getting the following times:

Optimized; iterations: 100000; Time: 00:00:00.3508221
Original; iterations: 100000; Time: 00:00:01.5487822
Optimized List; iterations: 100000; Time: 00:00:00.1190315
Original List; iterations: 100000; Time: 00:00:02.1738423

This equates to ~77% improvement adding arrays, ~95% improvement adding ImmutableLists

@AArnott
Copy link

AArnott commented Nov 15, 2014

Great. I'll review soon and get back to you.

@PatrickMcDonald
Copy link
Contributor Author

Thanks @AArnott

internal Node InsertRange(int index, IEnumerable<T> keys)
{
Requires.Range(index >= 0 && index <= this.Count, "index");

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires.NotNull(keys, "keys");

@AArnott
Copy link

AArnott commented Nov 16, 2014

Curious: did your tests fail with your first iteration before the extra balancing code?

int batchSize = 32;
for (int i = 0; i < 128; i++)
{
list = list.AddRange(Enumerable.Range(batchSize * i + 1, batchSize));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add your VerifyBalanced(list.Root); here within the loop as well?

@AArnott
Copy link

AArnott commented Nov 16, 2014

Patrick, I added some verifications of my own (based on my own CR comments). A new test I wrote fails in your VerifyBalanced method. Do you want to merge into your branch and check it out?

https://github.com/AArnott/corefx/tree/immutablelist-addrange-optimization

@AArnott
Copy link

AArnott commented Nov 16, 2014

@PatrickMcDonald The good news is the test I added fails even before your change. So once we get it figured out (either you or I... whoever gets to it first) we should be able to merge your change.
Thanks again for your contribution. I'm really looking forward to the perf gain you've created here.

AArnott and others added 2 commits November 16, 2014 09:01
InsertRangeRandomBalanceTest currently fails on all of my runs because VerifyBalanced finds height differences of 2 or -2.
One failing run is with rand seed: 1135977305
 - This fixes failing InsertRangeRandomBalanceTest
Also refactor AddRangeBalanceTest to add random batches and verify
  balance after each
@PatrickMcDonald
Copy link
Contributor Author

@AArnott The current Insert method currently has a bug where it does not always keep the tree balanced. Your InsertRange test used the builder to insert ranges, which added nodes in a loop rather than calling insertRange. Changing the builder to use AddRange and InsertRange fixes this.

I'm guessing that the InsertRange bug should be fixed in a separate pull request, I have a simple failing test and a fix for this, however it uses changes in this pull request (specifically the adding of the internal Root property, and also the VerifyBalanced helper method)

Alternatively I can add a failing test and a fix in this PR, please let me know what you would like me to do.

@AArnott
Copy link

AArnott commented Nov 16, 2014

Sounds awesome.
I'll accept either combined or individual pull requests. I would encourage individual pull requests because then if we have to revert this larger one (say, if a bug is discovered) we haven't lost the goodness of the simple bug fix. If you do separate them out, please merge the new smaller branch into this one rather than rebase this one since we try to avoid rebasing pushed commits. :)

@AArnott
Copy link

AArnott commented Nov 16, 2014

Actually hang on, I think I misunderstood your last comment. Yes, let's have one pull request that changes both the ImmutableList and its Builder to call your new methods on Node. But it seems we have a bug in Node.Insert itself if it's leaving the tree at a +/- 2 level of imbalance (unless that is actually by design, which I need to research).

@AArnott
Copy link

AArnott commented Nov 16, 2014

It looks like Node.MakeBalanced was buggy, and your new BalanceNode already has the fix (to call Balance directly instead of using the IsLeftHeavy and IsRightHeavy methods. I'm preparing a fix now.

@PatrickMcDonald
Copy link
Contributor Author

I have also started working on a fix for the MakeBalanced method, would I be allowed the opportunity to fix it?

@PatrickMcDonald
Copy link
Contributor Author

PR created for fix, depending on which PR is accepted first I can rebase the other, as there is shared code between the 2 PRs. I assume this PR will have to be rebased anyway, based on suggestions to squash commits to tidy up history in other PRs.

@AArnott
Copy link

AArnott commented Nov 16, 2014

Thanks, @PatrickMcDonald. I will merge #104 first. Please do not rebase this pull request. I don't believe in rebasing pushed commits as it screws up everybody else's history that have already included your commits. We should instead merge the updated master into the branch tracking this pull request and resolve merge conflicts.
I've already built several commits on top of yours for this pull request, including moving some of your verification helper methods into another class so we can apply similar fixes to other collections, so I'll take care of the merge for this pull request and resolving conflicts after we merge #104.

You've really been great in preparing this and working with me to get this ready for taking into the product. I look forward to seeing more from you!

@PatrickMcDonald
Copy link
Contributor Author

Thanks @AArnott, that was alot of fun - hopefully I can find more stuff to do!

@AArnott
Copy link

AArnott commented Nov 17, 2014

Hmmm... I guess the branch from you I merged didn't have your most recent commit, so this pull request wasn't closed by the merge. Do you have time to merge master into your branch and resolve conflicts?

@PatrickMcDonald
Copy link
Contributor Author

Okay I've never done a merge before, please tell me I didn't mess that up!

@dotnet-bot dotnet-bot merged commit a6653da into dotnet:master Nov 17, 2014
@PatrickMcDonald PatrickMcDonald deleted the immutablelist-addrange-optimization branch November 17, 2014 08:52
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
filipnavara pushed a commit to filipnavara/corefx that referenced this pull request Jun 1, 2018
Fix exporting of SystemNative_GetNonCryptographicallySecureRandomBytes
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
8 participants