This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
corefx (public archive)

[WIP] System.Linq optimisations via architecture change #34208

Closed
wants to merge 100 commits into from


@manofstick (Contributor) commented Dec 21, 2018

(update: sorry if people misinterpreted my message here. This is a radical overhaul of the internal implementation of the System.Linq namespace. It is not complete yet, although I was hoping to spark interest and, if someone was excited by the possibilities, maybe get some help finishing this prototype. I am under no illusions: such a radical change has only a small possibility of ultimately being merged, but I feel that it may be possible. I feel this way because the API guidelines for the rest of .NET explicitly recommend against using Linq, since even in its present state it is too garbage-heavy, despite the effort that has been put into minimising that. So if we choose to just accept that Linq is garbage-heavy and then look for alternative solutions then... well, you have this...)

This is a replacement architecture for System.Linq.

This is a proposal/proof-of-concept (think of it as a Christmas gift): hack on it during the holidays to see what you think!

This is not complete, but as it is "plug compatible", existing implementations of IEnumerable<T> work seamlessly (although those are areas where performance may not be optimal; note also that I have gutted all the previous performance-enhancing code).

This relies heavily on generics and abandons the goal of minimizing garbage (e.g. tricks like the initial IEnumerator being the same object as the IEnumerable).
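For readers unfamiliar with the trick being given up, here is a simplified sketch loosely modelled on the internal Iterator<TSource> base class in System.Linq. RangeIterator is a hypothetical type, not part of this PR: the enumerable returns itself from its first GetEnumerator call on the creating thread, so a typical foreach allocates one object instead of two. Any later or cross-thread enumeration gets a fresh instance, which is why the state and thread-id checks are needed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified iterator: the enumerable doubles as its own first enumerator.
internal sealed class RangeIterator : IEnumerable<int>, IEnumerator<int>
{
    private readonly int _start, _count;
    private readonly int _threadId = Environment.CurrentManagedThreadId;
    private int _state;   // 0 = never enumerated, 1 = enumerating, -1 = finished/disposed
    private int _index;

    public RangeIterator(int start, int count) { _start = start; _count = count; }

    public IEnumerator<int> GetEnumerator()
    {
        // First enumeration on the creating thread reuses this object (no allocation);
        // every other call pays for a new instance.
        RangeIterator e = _state == 0 && _threadId == Environment.CurrentManagedThreadId
            ? this
            : new RangeIterator(_start, _count);
        e._state = 1;
        return e;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (_state == 1 && _index < _count)
        {
            Current = _start + _index++;
            return true;
        }
        _state = -1;
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() => _state = -1;
}
```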

It is slower on smaller collections (up to maybe a few tens of items; this is mainly due to the JIT_VirtualFunctionPointer lookups needed for the generic virtual method calls) but faster on larger collections (where the JIT_VirtualFunctionPointer cost becomes negligible). Where the existing System.Linq has a specialized optimization it is pretty hard to beat, though. ChainLinq currently doesn't implement the ElementAt, Last, etc. family of optimizations.

So there are plenty of reasons to be skeptical, but maybe, just maybe, it could see the light of day one day!

OK, some details.

So why is existing Linq performance bad, and what are the current performance workarounds? This really boils down to two key ideas. The first is that abstracting away from the underlying collection means you lose the benefit of knowing how to access that collection efficiently; the second is that access through MoveNext and Current is quite chatty (two interface calls per element).
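A minimal sketch of both points, using hypothetical helpers that are not part of the PR: SumPull only sees an IEnumerable<int>, so it must go through MoveNext and Current for every element, while SumArray knows it has an array and indexes it directly.

```csharp
using System.Collections.Generic;

internal static class AccessCost
{
    // Generic pull: the loop only knows it has an IEnumerable<int>, so each element
    // costs an interface call to MoveNext plus one to Current, and the fact that
    // the source may be an array is lost.
    public static long SumPull(IEnumerable<int> source)
    {
        long sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
                sum += e.Current;
        }
        return sum;
    }

    // Collection-aware: with a known array the loop indexes it directly,
    // with no enumerator object and no per-element interface calls.
    public static long SumArray(int[] source)
    {
        long sum = 0;
        for (int i = 0; i < source.Length; i++)
            sum += source[i];
        return sum;
    }
}
```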

ChainLinq isolates the collection (which it refers to as a Consumable) from the filter/transform (referred to as a Link). Consumable is defined as:

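```csharp
using System.Collections;
using System.Collections.Generic;

// A source collection that can be enumerated normally, or "consumed" by handing
// its elements to a Consumer (e.g. an Aggregate or ToList implementation).
internal abstract class Consumable<T> : IEnumerable<T>
{
    public abstract Result Consume<Result>(Consumer<T, Result> consumer);

    public abstract IEnumerator<T> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```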

As you can see, this is just an IEnumerable with one extra function, Consume. The Consumer is an action such as Aggregate or ToList.
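To make the push model concrete, here is a minimal sketch under assumed shapes. The PR's actual Consumer<T, Result> is not shown above, so SketchConsumer, SumConsumer and ArrayConsumable are hypothetical stand-ins. Consume walks the array itself and hands each element to the consumer, while GetEnumerator keeps the type usable as an ordinary IEnumerable<T>.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical consumer: receives elements one at a time and exposes a final Result.
internal abstract class SketchConsumer<T, TResult>
{
    public TResult Result { get; protected set; }
    public abstract void ProcessNext(T item);
}

// Hypothetical terminal operation playing the role of Sum/Aggregate.
internal sealed class SumConsumer : SketchConsumer<int, long>
{
    public override void ProcessNext(int item) => Result += item;
}

// Hypothetical consumable over an array.
internal sealed class ArrayConsumable<T> : IEnumerable<T>
{
    private readonly T[] _array;
    public ArrayConsumable(T[] array) => _array = array;

    // Push path: the consumable drives the loop over its own storage.
    public TResult Consume<TResult>(SketchConsumer<T, TResult> consumer)
    {
        foreach (T item in _array)   // the compiler lowers this to an indexed loop
            consumer.ProcessNext(item);
        return consumer.Result;
    }

    // Pull path: still a plain IEnumerable<T> for code that knows nothing about Consume.
    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_array).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```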

Consumables are created using the following classes:

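```csharp
// A Consumable to which a further Link (e.g. Select or Where) can be appended,
// producing a new Consumable of the link's output type.
internal abstract class ConsumableForAddition<T> : Consumable<T>
{
    public abstract Consumable<U> AddTail<U>(ILink<T, U> transform);
}

// A Consumable that also exposes its last Link, so a newly added Link can
// replace (be merged with) it via ReplaceTailLink rather than simply appended.
abstract class ConsumableForMerging<T> : ConsumableForAddition<T>
{
    public abstract object TailLink { get; }
    public abstract Consumable<V> ReplaceTailLink<Unknown, V>(ILink<Unknown, V> newLink);
}
```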

where an ILink<,> is a transform such as Select or Where.
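A sketch of how links could be chained, assuming (hypothetically, since the members of the PR's ILink<,> are not shown here) that a link wraps a downstream consumer to produce an upstream one. All type names below are illustrative stand-ins. AddTail never touches the data; it only records another link, and Consume builds the whole chain once and then pushes each source element through it, so no MoveNext/Current calls happen between stages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical push-based sink: receives elements and exposes a final result.
internal abstract class Sink<T, TResult>
{
    public abstract void ProcessNext(T item);
    public abstract TResult Result { get; }
}

// Hypothetical link: wraps a downstream sink for U into an upstream sink for T.
internal interface ILinkSketch<T, U>
{
    Sink<T, TResult> Compose<TResult>(Sink<U, TResult> next);
}

internal sealed class IdentityLink<T> : ILinkSketch<T, T>
{
    public Sink<T, TResult> Compose<TResult>(Sink<T, TResult> next) => next;
}

internal sealed class SelectLink<T, U> : ILinkSketch<T, U>
{
    private readonly Func<T, U> _selector;
    public SelectLink(Func<T, U> selector) => _selector = selector;
    public Sink<T, TResult> Compose<TResult>(Sink<U, TResult> next) => new SelectSink<TResult>(_selector, next);

    private sealed class SelectSink<TResult> : Sink<T, TResult>
    {
        private readonly Func<T, U> _selector;
        private readonly Sink<U, TResult> _next;
        public SelectSink(Func<T, U> selector, Sink<U, TResult> next) { _selector = selector; _next = next; }
        public override void ProcessNext(T item) => _next.ProcessNext(_selector(item));
        public override TResult Result => _next.Result;
    }
}

internal sealed class WhereLink<T> : ILinkSketch<T, T>
{
    private readonly Func<T, bool> _predicate;
    public WhereLink(Func<T, bool> predicate) => _predicate = predicate;
    public Sink<T, TResult> Compose<TResult>(Sink<T, TResult> next) => new WhereSink<TResult>(_predicate, next);

    private sealed class WhereSink<TResult> : Sink<T, TResult>
    {
        private readonly Func<T, bool> _predicate;
        private readonly Sink<T, TResult> _next;
        public WhereSink(Func<T, bool> predicate, Sink<T, TResult> next) { _predicate = predicate; _next = next; }
        public override void ProcessNext(T item) { if (_predicate(item)) _next.ProcessNext(item); }
        public override TResult Result => _next.Result;
    }
}

// Two links glued together, so a pipeline only ever stores a single link.
internal sealed class ChainedLink<T, U, V> : ILinkSketch<T, V>
{
    private readonly ILinkSketch<T, U> _first;
    private readonly ILinkSketch<U, V> _second;
    public ChainedLink(ILinkSketch<T, U> first, ILinkSketch<U, V> second) { _first = first; _second = second; }
    public Sink<T, TResult> Compose<TResult>(Sink<V, TResult> next) => _first.Compose(_second.Compose(next));
}

internal sealed class ToListSink<T> : Sink<T, List<T>>
{
    private readonly List<T> _list = new List<T>();
    public override void ProcessNext(T item) => _list.Add(item);
    public override List<T> Result => _list;
}

// Array source plus the links added so far; AddTail plays the role of
// ConsumableForAddition<T>.AddTail (enumeration support omitted for brevity).
internal sealed class ArrayPipeline<TSource, T>
{
    private readonly TSource[] _source;
    private readonly ILinkSketch<TSource, T> _link;
    public ArrayPipeline(TSource[] source, ILinkSketch<TSource, T> link) { _source = source; _link = link; }

    public ArrayPipeline<TSource, U> AddTail<U>(ILinkSketch<T, U> tail) =>
        new ArrayPipeline<TSource, U>(_source, new ChainedLink<TSource, T, U>(_link, tail));

    public TResult Consume<TResult>(Sink<T, TResult> consumer)
    {
        Sink<TSource, TResult> head = _link.Compose(consumer);    // build the sink chain once
        foreach (TSource item in _source) head.ProcessNext(item); // then push every element
        return head.Result;
    }
}
```

Presumably TailLink and ReplaceTailLink on ConsumableForMerging exist so that adjacent links can be merged, for example turning Select(f) followed by Select(g) into a single Select(x => g(f(x))); the sketch does not attempt that.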

Anyway, I have implemented a lot of the Linq functionality using this framework, but I don't think I'll be able to do much over the next while (weeks?), so I thought it might be interesting for people to play around with.

But I will try to create some simple performance examples in some follow-up posts.

@manofstick (Contributor, Author)

@stephentoub

Given my inability to get a single person in the world excited by this, maybe I'll look at spinning it out into its own NuGet package, although keeping this PR alive with your test servers running is beneficial to its development...

Anyway, my limited understanding of the MIT License, under which corefx is licensed, is that this should be OK? i.e. most of the code here is a rewrite targeting the new underlying architecture, but I would be copying the stubs for the Enumerable extension methods, some of the algorithmic code from the GroupBy and OrderBy functions, and the test suite.

So does that fly or not?

@stephentoub (Member)

> Anyway, in my limited understanding of the MIT License which corefx is licensed under is that this should be OK?

@richlander, can you comment on this?

@dzmitry-lahoda commented Feb 27, 2019

@manofstick, so your approach is a one-to-one swap of LINQ, so that all tests pass? Or are there some breaking changes? Is the approach consistently better on all benchmarks, or will some benchmarks be considerably slower?

@manofstick (Contributor, Author)

@dzmitry-lahoda

There should be no breaking changes.

Until it's complete there will be some performance penalty for jumping between the different implementations, though it shouldn't be too bad. The penalty is especially noticeable because I gutted all the previous optimisations!

In a finished product this should almost always be more performant. One place where it wouldn't be is where an aggregation function is used without any transforms (no Where/Select, etc.) and the underlying enumerable is not something that's optimised for (i.e. not an array, a list, or the output of this library).
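A minimal sketch of why that case gains nothing, using a hypothetical Sum helper that is not part of the PR: a consumer can take a fast path when it recognises the source type, but an arbitrary IEnumerable<int> leaves no choice but the same MoveNext/Current loop that existing Linq already uses.

```csharp
using System.Collections.Generic;

internal static class SourceDispatch
{
    // Hypothetical: chooses an access strategy from the runtime type of the source.
    public static long Sum(IEnumerable<int> source)
    {
        long sum = 0;
        switch (source)
        {
            case int[] array:           // known array: direct indexed access
                for (int i = 0; i < array.Length; i++) sum += array[i];
                break;
            case List<int> list:        // known list: struct enumerator, no interface calls
                foreach (int x in list) sum += x;
                break;
            default:                    // unknown IEnumerable: no shortcut available,
                foreach (int x in source) sum += x;   // so MoveNext/Current per element
                break;
        }
        return sum;
    }
}
```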

@richlander (Member) commented Mar 3, 2019

@manofstick, you can take parts of corefx for your implementation. No problem. If you do that, you should add a third-party notice to your repo. Yours would have a single entry, for the corefx repo.

@danmoseley (Member)

@manofstick given the discussion above, would it make sense to close this PR in favor of you creating a NuGet package (which you can certainly link to here)? Then later, when it is more mature and proven, we could possibly continue the discussion?

Thanks again for investing time into this.

@manofstick (Contributor, Author)

Spinning off to a separate project... (when I get time :-)

manofstick closed this on Mar 8, 2019.
karelz added this to the 3.0 milestone on Mar 18, 2019.
@manofstick (Contributor, Author)

For anyone interested (crickets!), I have finally started the process of spinning this out into its own project. A few new ideas too... Available here for anyone who wants to help!

Labels: area-System.Linq, * NO MERGE * (The PR is not ready for merge yet; see discussion for detailed reasons)