[WIP ] System.Linq optimisations via architecture change #34208
Given my inability to get any single person in the world excited by this, maybe I'll look to spinning it out into its own NuGet package - although keeping this alive with your test servers running is beneficial to its development... Anyway, my limited understanding of the MIT License, which corefx is licensed under, is that this should be OK? i.e. most of the code here is a rewrite to the new underlying architecture, but I would be copying the stubs for the Enumerable extension, some algorithmic stuff from So does that fly or not?
@richlander, can you comment on this?
@manofstick, so your approach is a one-to-one swap of LINQ so that all tests pass? Or do some breaking changes exist? Is the approach consistently better on all benchmarks? Or will some benchmarks be considerably slower?
There should be no breaking changes. Until it's complete there will be some performance penalty for jumping between the different implementations - it shouldn't be too bad. There will especially be a penalty as I gutted all the previous optimisations! In a finished product this should be almost always more performant. One place where it wouldn't be is where an aggregation function is used without any transforms (no Where/Select etc.) and the underlying enumerable is not something that's optimised for (i.e. it is not an array, a list or the output of this library).
@manofstick -- You can take parts of corefx with your implementation. No problem. If you do that, you should add a third party notice to your repo. Yours would have a single entry, for the corefx repo.
@manofstick given the discussion above, would it make sense to close this PR in favor of you creating a NuGet package (which you can certainly link to here)? Then later, when it is more mature and proven, we could possibly continue the discussion? Thanks again for investing time into this.
Spinning off to separate project... (when I get time :-)
For anyone interested (crickets!) I have finally started the process of spinning this out to its own project. A few new ideas too... Available here for anyone who wants to help!
(update: sorry if people misinterpreted my message here. This is a radical overhaul of the internal implementation of the System.Linq namespace. It is not complete yet, although I was hoping to spark interest, and if someone was excited by the possibilities then maybe get some help finishing this prototype. I am under no illusions: such a radical change has only a small possibility of ultimately being merged, but I feel that it may be possible. I feel this because the API guidelines for the rest of .NET explicitly recommend against use of Linq because, even in its present state, it is too garbage heavy - even when effort has been put in to minimise it. So if we choose to just accept that Linq is garbage heavy and then look for alternative solutions then... Well, you have this...)
This is a replacement architecture for `System.Linq`.
This is a proposal/proof-of-concept - think of it as a Christmas gift - hack-upon it during the holidays to see what you think!
This is not complete - but as it is "plug compatible", existing implementations of `IEnumerable`s work seamlessly (although those are areas where performance may not be optimal - note also that I have gutted all the previous performance-enhancing code).
This relies heavily on generics and has thrown the thought of minimizing garbage out (i.e. things like the initial `IEnumerator` being the same object as the `IEnumerable`).
It is slower on smaller collections (up to maybe a few 10s of items? This is mainly due to the lookups in `JIT_VirtualFunctionPointer` due to the generics use) but faster on larger collections (where the `JIT_VirtualFunctionPointer` cost drops away). Where the existing System.Linq has a specialized optimization it is pretty hard to beat though. ChainLinq currently doesn't include the ElementAt, Last, etc. family of optimizations.
So there are plenty of reasons to be skeptical, but maybe, just maybe, it could see the light of day one day!
OK, some details.
So why is existing Linq performance bad, and what are the current performance workarounds? This really boils down to two key ideas. The first is that abstracting away from the underlying collection means that you lose the benefits of knowing how to access that collection, and the second is that access through `MoveNext` and `Current` is quite chatty.
`ChainLinq` isolates the collection (which it refers to as a `Consumable`) from the filter/transform (referred to as a `Link`).
`Consumable` is defined as:
Which you can see is just an `IEnumerable` with an extra function `Consume`. The `Consumer` is an action such as `Aggregate` or `ToList`.
Creating a `Consumable` uses the following classes:
Where an `ILink<,>` is a function like `Select` or `Where`.
Anyway, I have implemented a lot of the Linq functionality using this framework, but I don't think I'll be able to do much over the next while (weeks?) so thought it might be interesting for people to play around with.
But I will try to create some simple performance examples in some follow-up posts.
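
As a rough illustration of the `Consumable` / `Link` / `Consumer` split described above, here is a minimal C# sketch. It is not ChainLinq's actual code. Only the names `Consumable`, `Consumer`, `Consume` and `ILink<,>` come from the description; every signature, plus `ProcessNext`, `Compose`, `ArrayConsumable`, `LinkedConsumable`, `WhereLink`, `SelectLink`, `SumConsumer`, `ToListConsumer` and `Demo`, is an assumption made for the example. The idea it tries to show is that the source pushes items through the chain in one plain loop, rather than each stage pulling through `MoveNext` and `Current`.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical shapes for illustration only; NOT the PR's actual definitions.

// Receives items pushed by the source; returning false asks the source to stop.
public abstract class Consumer<T> { public abstract bool ProcessNext(T item); }

// A link (Where, Select, ...) wraps a downstream consumer in a filtering or transforming one.
public interface ILink<T, U> { Consumer<T> Compose(Consumer<U> downstream); }

// An IEnumerable with an extra Consume function.
public abstract class Consumable<T> : IEnumerable<T>
{
    public abstract void Consume(Consumer<T> consumer);

    // Simplification: the pull path just buffers everything through Consume.
    public IEnumerator<T> GetEnumerator()
    {
        var buffer = new ToListConsumer<T>();
        Consume(buffer);
        return buffer.Result.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Knows its source is an array, so Consume is a plain loop with no MoveNext/Current calls.
public sealed class ArrayConsumable<T> : Consumable<T>
{
    private readonly T[] _source;
    public ArrayConsumable(T[] source) { _source = source; }

    public override void Consume(Consumer<T> consumer)
    {
        foreach (T item in _source)
            if (!consumer.ProcessNext(item)) break;
    }
}

// Attaches one link to an upstream consumable.
public sealed class LinkedConsumable<T, U> : Consumable<U>
{
    private readonly Consumable<T> _source;
    private readonly ILink<T, U> _link;
    public LinkedConsumable(Consumable<T> source, ILink<T, U> link) { _source = source; _link = link; }
    public override void Consume(Consumer<U> consumer) => _source.Consume(_link.Compose(consumer));
}

public sealed class WhereLink<T> : ILink<T, T>
{
    private readonly Func<T, bool> _predicate;
    public WhereLink(Func<T, bool> predicate) { _predicate = predicate; }
    public Consumer<T> Compose(Consumer<T> downstream) => new Step(_predicate, downstream);

    private sealed class Step : Consumer<T>
    {
        private readonly Func<T, bool> _predicate;
        private readonly Consumer<T> _downstream;
        public Step(Func<T, bool> predicate, Consumer<T> downstream) { _predicate = predicate; _downstream = downstream; }
        // Rejected items are skipped (keep going); accepted items flow downstream.
        public override bool ProcessNext(T item) => !_predicate(item) || _downstream.ProcessNext(item);
    }
}

public sealed class SelectLink<T, U> : ILink<T, U>
{
    private readonly Func<T, U> _selector;
    public SelectLink(Func<T, U> selector) { _selector = selector; }
    public Consumer<T> Compose(Consumer<U> downstream) => new Step(_selector, downstream);

    private sealed class Step : Consumer<T>
    {
        private readonly Func<T, U> _selector;
        private readonly Consumer<U> _downstream;
        public Step(Func<T, U> selector, Consumer<U> downstream) { _selector = selector; _downstream = downstream; }
        public override bool ProcessNext(T item) => _downstream.ProcessNext(_selector(item));
    }
}

// Terminal consumers: the "actions" at the end of a chain.
public sealed class SumConsumer : Consumer<int>
{
    public int Result;
    public override bool ProcessNext(int item) { Result += item; return true; }
}

public sealed class ToListConsumer<T> : Consumer<T>
{
    public readonly List<T> Result = new List<T>();
    public override bool ProcessNext(T item) { Result.Add(item); return true; }
}

public static class Demo
{
    // Roughly data.Where(x => x % 2 == 0).Select(x => x * 2) in this sketch's terms.
    public static Consumable<int> DoubledEvens(int[] data) =>
        new LinkedConsumable<int, int>(
            new LinkedConsumable<int, int>(new ArrayConsumable<int>(data), new WhereLink<int>(x => x % 2 == 0)),
            new SelectLink<int, int>(x => x * 2));
}
```

With this sketch, passing a `SumConsumer` to `Demo.DoubledEvens(new[] { 1, 2, 3, 4 }).Consume(...)` leaves `Result` at 12, and the same object can still be handed to any code that expects an `IEnumerable<int>`. A real implementation would presumably avoid the buffering in `GetEnumerator` and add fast paths for lists and other known sources; those are left out here to keep the shape visible.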