System.Linq performance improvement suggestions #14366
Comments
The underlying problem seems to be because You may want to mention this particular problem in issue #14352 |
Did you mean something like this: https://github.com/MarcinJuraszek/corefx/commit/efd50b9f56be93047f3b7b2a6929637b6b567f7d It's not really tested or anything, but that's the general idea how I see some improvements around LINQ queries that use It would be nice to have tests around System.Linq first before attempting to optimize some corner cases (and to make sure nothing breaks when doing the change). |
Exactly. But the implementation of |
Basing it on
I do agree that making some improvements how |
Yes, it would require a change in For
Issue dotnet/corefx#1187 mentions that it will be coming soon, so this is not a problem. |
I really like all the suggestions. I think this issue could be split into a couple of smaller ones, e.g. one for Also, whoever ends up working on any of them should also include unit tests around the given method, to make sure it still works the way it's expected to. Kind of TDD, where e.g. you write tests around existing |
Maybe a good idea would be to just provide a few additional overloads that allow specifying the initial capacity:
|
From my point of view, the number of changes would not be very large, so I suggest not splitting this issue into several smaller ones. But the tests (which definitely should be written) will potentially be much larger than the modifications themselves, so it would be a good idea to create a separate issue for them. In fact, I found that one already exists for the entire System.Linq (#1143). So I don't know whether it is better to create a new issue or just wait for completion of that one.
Yes, it would be great. From my calculations the gain should be 2x for simple transformation by |
@rosieks, good suggestion. But it would require a change in public API. |
@rosieks I think that library should be smart enough to use all possible information to make it more efficient, without forcing users to explicitly specify it. E.g. when input is @ikopylov Getting UT for the entire System.Linq will take a while as there is quite a bit of code to test. I think it would be better to split that effort into different parts of the library, so that it's more obvious what's already done, what's left, who's working on what. That would also make it easier for multiple people to prepare changes for different methods. I still think the same applies to perf improvements. Getting perf tests seems like a good work that could be totally separated from both UT and perf improvements. |
I've updated my fork to use |
Ok. Let's split this issue into several smaller ones.
@MarcinJuraszek, what do you think about that? |
Awesome. I have some thoughts on this task.
List<int> source = new List<int>() { 1, 2, 3 };
var seq = source.Select(o => o + 1);
var arr1 = seq.ToArray();
Assert.AreEqual(source.Count, arr1.Length);
source.Add(10);
var arr2 = seq.ToArray();
Assert.AreEqual(source.Count, arr2.Length);
|
I really like how you split the items into smaller pieces! I'd add 7. Add performance tests for System.Linq
|
Great.
Iteration should still be there, but it will be a fast for loop instead of a heavy foreach loop.
// 350ms (1000000 elements x 1000 times)
static int ForArray(int[] array)
{
int result = 0;
for (int i = 0; i < array.Length; i++)
result += array[i];
return result;
}
// 740ms (1000000 elements x 1000 times)
static int ForList(List<int> array)
{
int result = 0;
for (int i = 0; i < array.Count; i++)
result += array[i];
return result;
}
// 2205ms (1000000 elements x 1000 times)
static int ForEachList(List<int> array)
{
int result = 0;
foreach (int elem in array)
result += elem;
return result;
}
// 4918ms (1000000 elements x 1000 times)
static int ForEachEnumerable(IEnumerable<int> array)
{
int result = 0;
foreach (int elem in array)
result += elem;
return result;
} |
I think one issue with a list of tasks is OK. How about starting with perf tests, so we can actually see if implementing improvements like Also, would be nice to see someone from .NET team approving the plan and/or giving some comments about it. |
I have prepared the changes for ToArray, ToDictionary task: dotnet/corefx@master...ikopylov:linq_to_array_to_dictionary
I agree. I'll try to add some perf-tests. |
I've completed performance tests: dotnet/corefx@master...ikopylov:linq_perf_tests |
I haven't looked deeply, but it looks good at a quick glance. I was thinking about making the perf tests a separate project instead of putting them in the UT project. I don't think it's worth running them as part of UT; I would rather have a separate .exe I could run (outside of VS) that would print perf for all the operations. |
What about |
I checked how performance tests are implemented in other projects and found that they are all done the same way.
That can easily be done for unit-tests by executing them with
Actually, I haven't found a way to run tests inside VS, so I always run them using the stated command.
Right. I've added them to the task list. |
Sounds good. I didn't actually look. What projects already have perf tests?
You can run tests in VS15 by setting the test project as the Startup Project and just running a debug session. For perf tests it's important to compile using the Release flavor and run them outside of devenv. Can you add a command that can be used to do that to your Pull Request, and update the first post in this issue when it's accepted? |
For example,
Steps are quite simple:
|
Actually - @VSadov, can you take a look at reviewing this? Thanks! |
Performance tests for System.Linq (issue #1182)
I really like the overall direction. We should do an API review for this. I've labelled the issue accordingly. |
What are the next steps? Should we just wait to hear back from the API Review Board? When should we expect that to happen? |
@MarcinJuraszek - for the API reviews, I believe @terrajobst conducts API reviews in the open on a weekly cadence. @terrajobst - do we have a more lightweight process, since this doesn't change public surface area but instead suggests implementation optimizations that are aware of more knowledge-rich collections? |
If there are no APIs being added then it doesn't need to be API reviewed. Based on the task from the description I assumed that this will result in new APIs:
|
No, there is no API change. It will add new internal iterator classes to make publicly available API faster for some cases. |
While improving ToArray and ToDictionary methods I encountered an unexpected drawback of type conversion. As it turned out, casting to the generic interface with covariant or contravariant type parameter is a very slow operation. According to my tests it is 20 times slower than the cast to the interface (generic or non-generic) without covariance and contravariance. Unfortunately, The information that I have found by examining the source code of CoreCLR:
|
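The cast cost described above could be measured with something like the following. This is only a rough sketch, not the benchmark used in the thread: the class name `CastBenchmark`, the method names, and the iteration count are all made up for illustration, and the measured ratio will depend on the runtime version and hardware, so no expected numbers are given. `ICollection<T>` is invariant, while `IReadOnlyCollection<out T>` is covariant.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical micro-benchmark, for illustration only.
public static class CastBenchmark
{
    // Keeps the result of the loops observable so the checks are not dropped.
    private static int _sink;

    // Times type checks against ICollection<string> (no variance).
    public static long TimeInvariantCheck(object value, int iterations)
    {
        int hits = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (value is ICollection<string>) hits++;
        }
        sw.Stop();
        _sink += hits;
        return sw.ElapsedMilliseconds;
    }

    // Times type checks against IReadOnlyCollection<string> (covariant).
    public static long TimeVariantCheck(object value, int iterations)
    {
        int hits = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (value is IReadOnlyCollection<string>) hits++;
        }
        sw.Stop();
        _sink += hits;
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        object list = new List<string> { "a", "b" };
        const int iterations = 10000000; // arbitrary choice

        // Warm up so JIT compilation is not included in the timings.
        TimeInvariantCheck(list, 1000);
        TimeVariantCheck(list, 1000);

        Console.WriteLine("ICollection<string>:         " + TimeInvariantCheck(list, iterations) + " ms");
        Console.WriteLine("IReadOnlyCollection<string>: " + TimeVariantCheck(list, iterations) + " ms");
    }
}
```

As with the perf tests discussed earlier, this should be built in Release and run outside the debugger for the numbers to mean anything.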
@terrajobst, public API will not be changed. All changes are internal to System.Linq. |
@ikopylov - since the optimization mostly relies on Count, perhaps special-casing the nongeneric ICollection is sufficient? I am not sure if we can rely on indexing (or copying) functionality when iterating something that is not an array. There is one contract, fairly inconvenient from a performance perspective, that many mutable collections implement: they do not allow mutations while iterating. On the other hand, indexers are OK with mutations, so replacing one with the other is observable. Not sure if this is a big enough change in behavior to be considered compat breaking. |
@terrajobst - yes, I believe there are no public API changes. However, there could be subtle changes in behavior that might need to be called out.
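The observable difference mentioned here can be shown with `List<T>`. The sketch below is illustrative only (the class and method names are made up): the `List<T>` enumerator throws `InvalidOperationException` once the list is modified during a `foreach`, while an index-based loop over the same list keeps going.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, for illustration only.
public static class MutationDemo
{
    // Returns true if foreach throws after the list is mutated mid-iteration.
    public static bool ForEachThrowsOnMutation()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int item in list)
            {
                if (item == 1) list.Add(4); // mutate during enumeration
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Sums the list through the indexer while mutating it; no exception is thrown.
    public static int IndexerSumWithMutation()
    {
        var list = new List<int> { 1, 2, 3 };
        int sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            if (i == 0) list.Add(4); // mutate during the loop
            sum += list[i];
        }
        return sum; // 1 + 2 + 3 + 4
    }

    public static void Main()
    {
        Console.WriteLine(ForEachThrowsOnMutation()); // prints True
        Console.WriteLine(IndexerSumWithMutation());  // prints 10
    }
}
```

So an iterator that silently switches from enumeration to indexing would turn the exception in the first case into the behavior of the second.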
There might still be enough questions for a small discussion even though there are no surface changes. |
I agree with that. In order not to break the behaviour we should use
This is the problem of the library which contains such ill-implemented collections. I've updated a task list slightly to reflect last changes with |
I've just sent a Pull Request for |
We reviewed this issue today. We don't believe it's ready yet. Please see the notes for more details. |
We reviewed this item and concluded that it should be split. @stephentoub, could you take a stab at this? |
This wasn't really a reviewable item, as there wasn't any concrete API being proposed, as discussed earlier in the thread ;) It's a set of proposed improvements to performance in existing implementation rather than new API.
Through this thread as well as a bunch of PRs that have happened in the interim, we did agree that we don't want to expose existing collection interfaces out of LINQ, e.g. the object returned by
At this point, given the original tasks called out on this thread, it looks like the only concrete remaining items would be around specializing methods like ToDictionary to have more knowledge of the original source. I suggest that if someone wants to implement such optimizations and do detailed measurements, such optimizations with associated measurements could be submitted as individual PRs, and an issue like this isn't needed to track them.
So I'm going to leave this closed and not open new issues. Obviously folks should feel free to open subsequent issues / PRs for individual topics to be discussed / solutions implemented. Thanks for the good discussion! |
With Linq-to-Objects it is quite a common practice to perform a series of transformations and then materialize the sequence into a concrete collection type by calling ToArray(), ToList(), or ToDictionary(). These operations would work much faster if they knew the number of elements in the sequence.
Currently System.Linq.Enumerable has special treatment for the ICollection interface only.
I suppose that additional support for IReadOnlyCollection can improve performance in some cases, because through it we can figure out the total number of elements.
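To illustrate why knowing the element count helps, here is a minimal sketch of a ToDictionary-like helper that presizes the dictionary when the source exposes `Count`. The name `ToDictionaryPresized` is hypothetical, and this is not the System.Linq implementation; it only checks `ICollection<T>` and falls back to the default capacity otherwise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only; not the System.Linq code.
public static class PresizedDictionary
{
    public static Dictionary<TKey, TSource> ToDictionaryPresized<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (keySelector == null) throw new ArgumentNullException("keySelector");

        // Use the known element count as the initial capacity, if available,
        // so the dictionary does not have to resize while it is being filled.
        var collection = source as ICollection<TSource>;
        int capacity = collection != null ? collection.Count : 0;

        var result = new Dictionary<TKey, TSource>(capacity);
        foreach (TSource item in source)
            result.Add(keySelector(item), item);
        return result;
    }

    public static void Main()
    {
        var words = new List<string> { "a", "bb", "ccc" };
        Dictionary<int, string> byLength = ToDictionaryPresized(words, w => w.Length);
        Console.WriteLine(byLength[2]); // prints bb
    }
}
```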
Another problem with System.Linq is that in many cases the information about the number of elements is lost. One of the most common examples:
Obviously Select does not change the number of elements in the sequence, but the ToList method cannot take advantage of that. The information is lost.
In this particular scenario, it would be great for Select to return some SelectIterator instance, which implements IReadOnlyCollection, and thereby passes the number of elements to the subsequent methods.
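A rough sketch of that idea follows. The type `SelectListIterator` and the helper `ToPresizedList` are hypothetical names chosen for illustration; they are not existing System.Linq types and not the code from any of the forks mentioned in the comments. `Count` is read from the source on every call, so elements added to the source after the query is built are still counted.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical iterator: wraps a List<TSource> and a selector, and exposes Count.
public sealed class SelectListIterator<TSource, TResult> : IReadOnlyCollection<TResult>
{
    private readonly List<TSource> _source;
    private readonly Func<TSource, TResult> _selector;

    public SelectListIterator(List<TSource> source, Func<TSource, TResult> selector)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");
        _source = source;
        _selector = selector;
    }

    // Select does not change the number of elements, so forward the source count.
    public int Count { get { return _source.Count; } }

    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource item in _source)
            yield return _selector(item);
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class CountAwareExtensions
{
    // Hypothetical ToList-like helper that presizes the list when the count is known.
    public static List<T> ToPresizedList<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        var counted = source as IReadOnlyCollection<T>;
        List<T> list = counted != null ? new List<T>(counted.Count) : new List<T>();
        foreach (T item in source)
            list.Add(item);
        return list;
    }
}

public static class Program
{
    public static void Main()
    {
        var source = new List<int> { 1, 2, 3 };
        var seq = new SelectListIterator<int, int>(source, o => o + 1);
        Console.WriteLine(seq.ToPresizedList().Count); // prints 3
        source.Add(10);
        Console.WriteLine(seq.ToPresizedList().Count); // prints 4
    }
}
```

Note that the `as IReadOnlyCollection<T>` check in the helper is exactly the kind of cast whose cost is raised as a concern in the comments, so a real implementation would need to measure it.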
Steps to measure performance gain:
- [Fact] attribute above that method;
- bin\tests\Windows_NT.AnyCPU.Release\System.Linq.Tests\aspnetcore50\;
- CoreRun.exe xunit.console.netcore.exe System.Linq.Tests.dll -parallel none -trait "Perf=true";

Casting to IReadOnlyCollection<T> is a slow operation, so it is not a good idea to check whether this interface is implemented. The performance drop will be most noticeable on small collections.

Tasks:
- Select: add iterators for List<T>, Array[T], ICollection<T>;
- ToArray, ToDictionary: add special support for ICollection<T> (and, very carefully, for IReadOnlyCollection<T>) to get the initial capacity;
- ToList (???): add special support for IReadOnlyCollection<T> to get the initial capacity (separated as it will affect System.Collections.Generic);
- OrderBy(Descending)/ThenBy(Descending): implement a special iterator for ICollection<T> to propagate Count;
- Cast, Reverse: add iterators for ICollection<T> to propagate Count;
- Range, Repeat: add an iterator that implements ICollection<T>;
- Skip, Take: add an iterator that handles ICollection<T>;