Optimize Enumerable.Any #23397
Comments
Performance measurements > intuition.
I could swear I've seen @JonHanna add this optimization at some point...
Agreed. I've updated my issue; it's not as thorough as it should be, but at least I threw in some code.
How much does the performance decrease for types that aren't
@svick I added another case per your request.
You may remember me discussing my investigation of it when someone else suggested it while I was doing similar optimisations. This whole class of optimisation adds cost to the case where the optimisation isn't available, since we spend a small amount of time testing whether we can take the optimal path and then find we cannot. In most of these cases that's worth it because the gain, when we can take it, is large, especially if it changes the order of the operation from O(n) to O(1). Here, though, it can only change an O(1) operation into an O(1) operation with better constant costs, so it's not as big a win for the potential cost. It's also not always a win at all. Arrays are faster to do the
According to this, it's slower to run some code than to do two attempted casts and then run the exact same code. I think it's reasonable to have some scepticism here.
Yeah, so I was wondering: what could be the explanation for that?
It's certainly intriguing. I wonder if branch prediction benefits more from the same data hitting it repeatedly in the count-using case than in the other. That would be annoying, because it wouldn't be as likely to benefit real cases (I'll happily take a weird, counter-intuitive performance benefit if I've reason to think it would still hold in real cases). I also note that the false case is never hit in the tests, which would also be worth considering (some enumerators have internally different paths for empty collections).
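A minimal hand-rolled sketch of a fairer comparison that also exercises the false case: each candidate is timed on a non-empty and an empty source, JIT warm-up happens before the stopwatch starts, every result is tallied, and nothing is written to the console inside the timed loop. AnyHarness, ViaEnumerator, ViaCount, Measure and Report are hypothetical names; ViaCount assumes a plain ICollection<int> check, and because the loop still repeats the same input it does not rule out the branch-prediction effect suspected above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class AnyHarness
{
    // Candidate 1 (hypothetical): always ask the enumerator for one element.
    public static bool ViaEnumerator(IEnumerable<int> source)
    {
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            return e.MoveNext();
        }
    }

    // Candidate 2 (hypothetical): check ICollection<int>.Count first,
    // fall back to the enumerator when the source is not a collection.
    public static bool ViaCount(IEnumerable<int> source) =>
        source is ICollection<int> c ? c.Count != 0 : ViaEnumerator(source);

    // Times `iterations` calls of `any` on `source`. Results are counted so
    // the calls are not dead code; no console output inside the timed loop.
    public static (TimeSpan Elapsed, int TrueCount) Measure(
        Func<IEnumerable<int>, bool> any, IEnumerable<int> source, int iterations)
    {
        // Warm-up so JIT compilation is not part of the measurement.
        for (int i = 0; i < 1000; i++)
        {
            any(source);
        }

        int trueCount = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (any(source))
            {
                trueCount++;
            }
        }
        stopwatch.Stop();
        return (stopwatch.Elapsed, trueCount);
    }

    // Runs both candidates on a non-empty and an empty source of the same shape.
    public static void Report(string label, IEnumerable<int> nonEmpty, IEnumerable<int> empty, int iterations)
    {
        var enumFull = Measure(ViaEnumerator, nonEmpty, iterations);
        var enumEmpty = Measure(ViaEnumerator, empty, iterations);
        var countFull = Measure(ViaCount, nonEmpty, iterations);
        var countEmpty = Measure(ViaCount, empty, iterations);

        // Console output happens only after all timed runs have finished.
        Console.WriteLine($"{label} enumerator: non-empty {enumFull.Elapsed}, empty {enumEmpty.Elapsed}");
        Console.WriteLine($"{label} count: non-empty {countFull.Elapsed}, empty {countEmpty.Elapsed}");
    }
}
```

For example, AnyHarness.Report("list", new List<int> { 1 }, new List<int>(), 1000000) prints one line per candidate once all four timed runs are done.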
@weitzhandler Microbenchmarks are prone to various issues. Instead of trying to figure out if your code suffers from one of those, I think it would be simpler to use BenchmarkDotNet. It's fairly easy to use and makes sure you're measuring what you want to measure.
@svick Changing to:

```csharp
// Assumes `using static System.Console;` (for Write) and the AnyIteration/AnyCount
// helpers defined elsewhere in the test.
static (TimeSpan, TimeSpan) GetResults<T>(IEnumerable<T> collection)
{
    var iterationStopwatch = new Stopwatch();
    var countStopwatch = new Stopwatch();
    var max = 1000000;
    bool any;

    iterationStopwatch.Start();
    for (int i = 0; i <= max; i++)
    {
        // The progress Write runs inside the timed loop, so console output
        // is included in every measurement.
        Write($"\riteration: {i:########}/{max:########}{new string(' ', 20)}");
        any = AnyIteration(collection);
    }
    iterationStopwatch.Stop();

    countStopwatch.Start();
    for (int i = 0; i <= max; i++)
    {
        Write($"\rcount: {i:########}/{max:########}{new string(' ', 20)}");
        any = AnyCount(collection);
    }
    countStopwatch.Stop();

    return (iterationStopwatch.Elapsed, countStopwatch.Elapsed);
}
```

resulted in:
I'm closing this issue then, because the performance is subjective. At least it's here for further reference.
Incidentally, where did you get that "current code" from? That looks quite a bit out of date.
Current code in master for reference:
I can't say enough good things about BenchmarkDotNet. You gotta try it out!
@Clockwork-Muse you were right.

| Collection | Iteration (elapsed) | Count (elapsed) |
|---|---|---|
| list | 00:00:02.1775856 | 00:00:01.3231993 |
| array | 00:00:02.1587138 | 00:00:02.4961830 |
| hashSets | 00:00:04.0008384 | 00:00:01.3617311 |
| arrayClients | 00:00:02.3180954 | 00:00:02.6931834 |
| enumerable | 00:00:04.2760573 | 00:00:05.2728051 |
But it is often very different from corefx. LINQ is an example of where corefx differs particularly from the reference source.
@JonHanna They are identical (browser vs. GitHub). It may not be the very latest source, but it's fairly well synced and is usually enough for what I need, especially given the fast code search. I shortened the code in my original post for brevity.
The code in the first post seems to be from the netfx one at http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,1165 It's close enough in this case, but not in many others within
@JonHanna I don't get it: how is it different from https://source.dot.net/#System.Linq/System/Linq/AnyAll.cs,11, other than using a literal in the exception?
Hi,
Currently, Enumerable.Any creates an enumerator and attempts to advance it once to obtain the result. Wouldn't it be faster (if we have a resource-heavy collection) to first try getting the collection size, if it's available?
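A minimal sketch of the idea, assuming it takes the form of two type tests (generic ICollection<T>, then non-generic ICollection) before falling back to the enumerator. AnyWithCountCheck is a hypothetical name; this is neither the code from this post nor the System.Linq implementation.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class AnySketch
{
    // Hypothetical extension method illustrating the proposal.
    public static bool AnyWithCountCheck<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }

        // Type test 1: List<T>, HashSet<T> and T[] implement ICollection<T>
        // and report Count without enumerating.
        if (source is ICollection<TSource> genericCollection)
        {
            return genericCollection.Count != 0;
        }

        // Type test 2: non-generic collections.
        if (source is ICollection nonGenericCollection)
        {
            return nonGenericCollection.Count != 0;
        }

        // Fallback: the enumerator-based approach, one MoveNext call.
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            return e.MoveNext();
        }
    }
}
```

When neither type test matches (for example, a Where iterator), both casts are pure overhead on top of the unchanged enumerator path.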
Current code:
Proposed code:
Here's a sloppy test case:
Results:
As you can see, iteration is always way more expensive.
Related: https://github.com/dotnet/corefx/issues/23578