[WIP] System.Linq optimisations via architecture change #34208

manofstick · 2018-12-21T20:14:46Z

(Update: sorry if people misinterpreted my message here. This is a radical overhaul of the internal implementation of the System.Linq namespace. It is not complete yet, but I was hoping to spark interest and, if someone was excited by the possibilities, maybe get some help finishing this prototype. I am under no illusions: such a radical change has only a small possibility of ultimately being merged, but I feel that it may be possible. I feel this because the API guidelines for the rest of .NET explicitly recommend against the use of Linq because, even in its present state, it is too garbage heavy, even though effort has been put in to minimise that. So if we choose to just accept that Linq is garbage heavy and then look for alternative solutions then... Well, you have this...)

This is a replacement architecture for System.Linq.

This is a proposal/proof-of-concept - think of it as a Christmas gift - hack on it during the holidays to see what you think!

This is not complete - but as it is "plug compatible", existing implementations of IEnumerable work seamlessly (although those are areas where performance may not be optimal - note also that I have gutted all the previous performance-enhancing code).

This relies heavily on generics and has thrown the thought of minimizing garbage out (i.e. things like the initial IEnumerator being the same object as the IEnumerable).

It is slower on smaller collections (up to maybe a few tens of items? This is mainly due to the lookups in JIT_VirtualFunctionPointer caused by the use of generics) but faster on larger collections (where the JIT_VirtualFunctionPointer cost drops away). Where the existing System.Linq has a specialized optimization, it is pretty hard to beat though. ChainLinq currently doesn't implement the ElementAt, Last, etc. family of optimizations.

So there are plenty of reasons to be skeptical, but maybe, just maybe, it could see the light of day one day!

OK, some details.

So why is existing Linq performance bad, and what are the current performance workarounds? This really boils down to two key ideas. The first is that abstracting away from the underlying collection means that you lose the benefits of knowing how to access that collection; the second is that access through MoveNext and Current is quite chatty.
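To make those two points concrete, here is a minimal sketch (not taken from this PR; the class and method names are made up for illustration). It contrasts summing through the IEnumerable<T> abstraction, where every element costs a MoveNext call and a Current call, with summing when the underlying collection is known to be an array.

```csharp
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class EnumerationCost
{
    // Goes through the IEnumerable<T> abstraction: two interface calls
    // (MoveNext and Current) are made for every element.
    public static int SumViaEnumerator(IEnumerable<int> source)
    {
        int total = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                total += e.Current;
            }
        }
        return total;
    }

    // Uses knowledge of the underlying collection: a plain indexed loop
    // over the array, with no per-element interface calls.
    public static int SumViaArray(int[] source)
    {
        int total = 0;
        for (int i = 0; i < source.Length; i++)
        {
            total += source[i];
        }
        return total;
    }
}
```

Both methods compute the same result; the difference is only in how much work is done per element to get at it.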

ChainLinq isolates the collection (which it refers to as a Consumable) from the filter/transform (referred to as a Link). Consumable is defined as:

internal abstract class Consumable<T> : IEnumerable<T>
{
    public abstract Result Consume<Result>(Consumer<T, Result> consumer);

    public abstract IEnumerator<T> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

As you can see, this is just an IEnumerable with an extra function, Consume. The Consumer is an action such as Aggregate or ToList.
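As a rough sketch of how Consume might be used, the following is a simplified, self-contained version. The Consumer shape shown here (a ProcessNext method plus a Result property) is an assumption made for illustration, as are ArrayConsumable, SumConsumer and Demo; the actual ChainLinq types may well differ. The type parameter is named TResult here so it does not clash with the Result property.

```csharp
using System.Collections;
using System.Collections.Generic;

namespace ConsumeSketch
{
    // Assumed shape of a consumer: it receives items one at a time and
    // exposes the accumulated result.
    public abstract class Consumer<T, TResult>
    {
        public TResult Result { get; protected set; }

        // Returns false to signal that no more items are needed.
        public abstract bool ProcessNext(T item);
    }

    // Mirrors the Consumable definition above.
    public abstract class Consumable<T> : IEnumerable<T>
    {
        public abstract TResult Consume<TResult>(Consumer<T, TResult> consumer);

        public abstract IEnumerator<T> GetEnumerator();
        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }

    // Hypothetical consumable that knows its source is an array, so Consume
    // can push items with an indexed loop instead of MoveNext/Current.
    public sealed class ArrayConsumable<T> : Consumable<T>
    {
        private readonly T[] _items;

        public ArrayConsumable(T[] items) => _items = items;

        public override TResult Consume<TResult>(Consumer<T, TResult> consumer)
        {
            for (int i = 0; i < _items.Length; i++)
            {
                if (!consumer.ProcessNext(_items[i]))
                    break;
            }
            return consumer.Result;
        }

        public override IEnumerator<T> GetEnumerator() =>
            ((IEnumerable<T>)_items).GetEnumerator();
    }

    // Hypothetical consumer playing the role of Sum.
    public sealed class SumConsumer : Consumer<int, int>
    {
        public override bool ProcessNext(int item)
        {
            Result += item;
            return true; // Sum always wants every item.
        }
    }

    public static class Demo
    {
        public static int SumOf(params int[] values) =>
            new ArrayConsumable<int>(values).Consume<int>(new SumConsumer());
    }
}
```

The point of the sketch is that the collection drives the loop and pushes items into the consumer, so a collection-specific fast path lives in one place (Consume) rather than in each operator.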

Creating Consumable uses the following classes:

internal abstract class ConsumableForAddition<T> : Consumable<T>
{
    public abstract Consumable<U> AddTail<U>(ILink<T, U> transform);
}

abstract class ConsumableForMerging<T> : ConsumableForAddition<T>
{
    public abstract object TailLink { get; }
    public abstract Consumable<V> ReplaceTailLink<Unknown, V>(ILink<Unknown, V> newLink);
}

Where an ILink<,> is a function like Select or Where.
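One way to picture a link is as something that, given the downstream receiver of U values, builds the upstream receiver of T values. The sketch below follows that idea; the Chain<T> type, the Compose method, and the SelectLink, WhereLink, ListSink and Demo classes are all assumptions made for illustration and are not confirmed by the code in this PR.

```csharp
using System;
using System.Collections.Generic;

namespace LinkSketch
{
    // Assumed: a receiver of items; returns false to stop early.
    public abstract class Chain<T>
    {
        public abstract bool ProcessNext(T item);
    }

    // Assumed shape of ILink<T, U>: wrap the downstream receiver of U
    // values in a receiver that accepts T values.
    public interface ILink<T, U>
    {
        Chain<T> Compose(Chain<U> downstream);
    }

    public sealed class SelectLink<T, U> : ILink<T, U>
    {
        private readonly Func<T, U> _selector;
        public SelectLink(Func<T, U> selector) => _selector = selector;

        public Chain<T> Compose(Chain<U> downstream) => new Activity(_selector, downstream);

        private sealed class Activity : Chain<T>
        {
            private readonly Func<T, U> _selector;
            private readonly Chain<U> _downstream;

            public Activity(Func<T, U> selector, Chain<U> downstream)
            {
                _selector = selector;
                _downstream = downstream;
            }

            // Transform the item and hand it on.
            public override bool ProcessNext(T item) => _downstream.ProcessNext(_selector(item));
        }
    }

    public sealed class WhereLink<T> : ILink<T, T>
    {
        private readonly Func<T, bool> _predicate;
        public WhereLink(Func<T, bool> predicate) => _predicate = predicate;

        public Chain<T> Compose(Chain<T> downstream) => new Activity(_predicate, downstream);

        private sealed class Activity : Chain<T>
        {
            private readonly Func<T, bool> _predicate;
            private readonly Chain<T> _downstream;

            public Activity(Func<T, bool> predicate, Chain<T> downstream)
            {
                _predicate = predicate;
                _downstream = downstream;
            }

            // Rejected items are skipped, but the pipeline keeps going.
            public override bool ProcessNext(T item) =>
                !_predicate(item) || _downstream.ProcessNext(item);
        }
    }

    // Terminal receiver playing the role of ToList.
    public sealed class ListSink<T> : Chain<T>
    {
        public List<T> Items { get; } = new List<T>();

        public override bool ProcessNext(T item)
        {
            Items.Add(item);
            return true;
        }
    }

    public static class Demo
    {
        // Similar in spirit to source.Where(x => x % 2 == 0).Select(x => x * 10).ToList().
        public static List<int> EvenTimesTen(int[] source)
        {
            var sink = new ListSink<int>();
            // Links are composed back to front, ending at the sink.
            Chain<int> select = new SelectLink<int, int>(x => x * 10).Compose(sink);
            Chain<int> head = new WhereLink<int>(x => x % 2 == 0).Compose(select);

            for (int i = 0; i < source.Length; i++)
            {
                if (!head.ProcessNext(source[i]))
                    break;
            }
            return sink.Items;
        }
    }
}
```

In this picture, AddTail would presumably just record another link at the end of the pipeline, with the composition deferred until something actually consumes the result; that reading is a guess based on the method names above.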

Anyway, I have implemented a lot of the Linq functionality using this framework, but I don't think I'll be able to do much over the next while (weeks?) so thought it might be interesting for people to play around with.

But I will try to create some simple performance examples in some follow-up posts.

manofstick · 2019-02-03T05:43:48Z

@stephentoub

Given my inability to get any single person in the world excited by this, maybe I'll look to spinning it out into its own NuGet package - although keeping this alive with your test servers running is beneficial to its development...

Anyway, my limited understanding of the MIT License, which corefx is licensed under, is that this should be OK? i.e. most of the code here is a rewrite to the new underlying architecture, but I would be copying the stubs for the Enumerable extensions and some algorithmic stuff from the GroupBy and OrderBy functions, as well as the test suite.

So does that fly or not?

stephentoub · 2019-02-06T03:58:08Z

> Anyway, my limited understanding of the MIT License, which corefx is licensed under, is that this should be OK?

@richlander, can you comment on this?

dzmitry-lahoda · 2019-02-27T18:11:15Z

@manofstick, so your approach is a one-to-one swap of LINQ such that all tests pass? Or do some breaking changes exist? Is the approach consistently better on all benchmarks? Or will some benchmarks be considerably slower?

manofstick · 2019-02-27T19:01:36Z

@dzmitry-lahoda

There should be no breaking changes.

Until it's complete there will be some performance penalty for jumping between the different implementations - it shouldn't be too bad. There will especially be a penalty as I gutted all the previous optimisations!

In a finished product this should be almost always more performant. One place where it wouldn't be would be where an aggregation function is used without any transforms (no where/select etc) and the underlying enumerable is not something that's optimised for (is not an array, list or output of this library).

richlander · 2019-03-03T17:38:25Z

@manofstick -- You can take parts of corefx with your implementation. No problem. If you do that, you should add a third party notice to your repo. Yours would have a single entry, for the corefx repo.

danmoseley · 2019-03-08T18:50:22Z

@manofstick given the discussion above, would it make sense to close this PR in favor of you creating a NuGet package (which you can certainly link to here)? Then later, when it is more mature and proven, we could possibly continue the discussion?

Thanks again for investing time into this.

manofstick · 2019-03-08T19:38:58Z

Spinning off to a separate project... (when I get time :-)

manofstick · 2019-08-06T09:51:28Z

For anyone interested (crickets!) I have finally started the process of spinning this out to its own project. A few new ideas too... Available here for anyone who wants to help!

manofstick added 30 commits December 14, 2018 16:42

Baseline for ChainLinq

ba8c492

Added Aggregate Consumers

a1e3d6b

Added Any/All

6ebc027

Added Average

23d49e4

Added Contains

cb73b9d

Added Distinct

26ae662

Gutted all the existing Select performance improvements

699973c

Added SelectMany

123629f

Added Range

f2e948e

Moved and cleaned SelectMany enumerators

4c84541

Added Count

0243653

Fix project file

afbe8af

Added Array handling

59e65d2

Added Concat

cad3a4d

Added Sum

eb80027

Consolidated enumerable consumption

85c0e9e

minor refactoring

056dd7f

Append/Prepend built on top of Concat

d4469bd

Added Where

bae05ed

Added Take

f35c7be

Added Skip

df06b1f

Removed tests where optimizations are not yet implemented

0683e64

Added extra layer to class hierachery for internal identification

2bd9dd6

Optimization on Range for Skip/Take

836d517

Fixed Take for Range

283234f

Avoid Composite object when Identity link

84f19d9

Optimization of Merging Skip links

7ad835a

Added Select merging

53f2238

Some Optimizations for Count

e9721db

Optimized Count for SelectMany

867e784

Fixed where SpeedOpt file was in project

abbc516

manofstick added 5 commits February 6, 2019 16:45

Generalized the IPipelineXXX optimization interfaces

9199068

Added a Set using default comparer

1ce670a

Added Except Link

0ef66a1

Slight cleanup of Except

222db93

Fix last checkin mistake

499f35e

manofstick mentioned this pull request Feb 10, 2019

Collaboration or ideas sharing... kevin-montrose/LinqAF#9

Open

manofstick added 7 commits February 11, 2019 16:59

Rearranged code for better perf

eb590a9

Allow section of arrays to be enumerated

f799fe0

Added a pooler for grouping

8a21094

Optimize the case where only a single value is stored in group

70687a1

Merge remote-tracking branch 'Microsoft/master' into ChainLinqFence

96ee05b

Fixes after merge

75e098c

More fixes from merge...

e1dfdee

karelz added the area-System.Linq label Mar 4, 2019

karelz assigned manofstick, jaredpar, 333fred and cston Mar 4, 2019

manofstick closed this Mar 8, 2019

karelz added this to the 3.0 milestone Mar 18, 2019

manofstick mentioned this pull request Nov 1, 2019

Special case strings in System.Linq.Enumerable.OrderBy #42286

Merged
