Optimize Enumerable.Any #23397

weitzhandler · 2017-08-31T05:06:10Z

Hi,

Currently, the Enumerable.Any creates an iterator and attempts to iterate once to obtain the result.
Wouldn't it be faster (if we have a heavy resource collection) to first attempt getting the collection size if available?

Current code:

public static bool Any<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    using (IEnumerator<TSource> e = source.GetEnumerator()) {
        if (e.MoveNext()) return true;
    }
    return false;
}

Proposed code:

public static bool Any<TSource>(this IEnumerable<TSource> source) {
     if (source == null) throw Error.ArgumentNull("source");
     if (source is ICollection<T> collectionT) return collectionT.Count > 0;
     if (source is ICollection collection) return collection.Count > 0;
     using (IEnumerator<TSource> e = source.GetEnumerator()) {
         return e.MoveNext()
    }
 }

Here's a sloppy test case:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using static System.Console;

static class Program
{
  static void Main()
  {
    var list = Enumerable.Range(0, short.MaxValue).ToList();
    var array = new int[short.MaxValue];
    
    new HttpClient(); //is first initialization more expensive?
    var hashSets = 
      new HashSet<HttpClient>(Enumerable.Range(1, short.MaxValue).Select(i => new HttpClient()));
    var arrayClients =
      Enumerable.Range(1, short.MaxValue).Select(i => new HttpClient()).ToArray();

    IEnumerable<int> enumerate()
    {
      for (int i = 0; i < short.MaxValue; i++)
        yield return i;
    }                 
    var enumerable = enumerate();

    PrintResult(nameof(list), GetResults(list));
    PrintResult(nameof(array), GetResults(array));
    PrintResult(nameof(hashSets), GetResults(hashSets));
    PrintResult(nameof(arrayClients), GetResults(arrayClients));
    PrintResult(nameof(enumerable), GetResults(enumerable));

    ReadLine();
  }

  static void PrintResult(string type, (TimeSpan, TimeSpan) results)
  {
    WriteLine();
    WriteLine($"{type}:");
    WriteLine($"iteration: {results.Item1}");
    WriteLine($"count:     {results.Item2}");
    WriteLine(new string('*', 27));
  }

  static (TimeSpan, TimeSpan) GetResults<T>(IEnumerable<T> collection)
  {

    var iterationStopwatch = new Stopwatch();
    var countStopwatch = new Stopwatch();

    var max = 1000000;

    bool any;
    for (int i = 0; i <= max; i++)
    {
      Write($"\riteration: {i:########}/{max:########}");

      iterationStopwatch.Start();
      any = AnyIteration(collection);
      iterationStopwatch.Stop();

      countStopwatch.Start();
      any = AnyCount(collection);
      countStopwatch.Stop();
    }

    return (iterationStopwatch.Elapsed, countStopwatch.Elapsed);
  }

  static bool AnyCount<TSource>(IEnumerable<TSource> source)
  {
    if (source is ICollection<TSource> collectionT) return collectionT.Count > 0;
    if (source is ICollection collection) return collection.Count > 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
      if (e.MoveNext()) return true;
    return false;
  }

  static bool AnyIteration<TSource>(IEnumerable<TSource> source)
  {
    using (IEnumerator<TSource> e = source.GetEnumerator())
      if (e.MoveNext()) return true;
    return false;
  }                                                         
}

Results:

iteration: 1000000/1000000
list:
iteration: 00:00:00.1867072
count:     00:00:00.0511948
***************************
iteration: 1000000/1000000
array:
iteration: 00:00:00.1680355
count:     00:00:00.1238209
***************************
iteration: 1000000/1000000
hashSets:
iteration: 00:00:00.1696227
count:     00:00:00.0413590
***************************
iteration: 1000000/1000000
arrayClients:
iteration: 00:00:00.1396707
count:     00:00:00.1151786
***************************
iteration: 1000000/1000000
enumerable:
iteration: 00:00:00.1782133
count:     00:00:00.0876572
***************************

As you can see, iteration is always way more expensive.

Related: https://github.com/dotnet/corefx/issues/23578

The text was updated successfully, but these errors were encountered:

jnm2 · 2017-08-31T07:03:47Z

Performance measurements > intuition.

khellang · 2017-08-31T07:53:40Z

I could swear I've seen @JonHanna add this optimization at some point...

weitzhandler · 2017-08-31T08:05:57Z

@jnm2

Performance measurements > intuition.

Agree. I've updated my issue, although as not as thorough it would have to be but at least I threw in some code.

svick · 2017-08-31T08:27:37Z

How much does the performance decrease for types that aren't ICollections, like yield return enumerables?

weitzhandler · 2017-08-31T08:40:15Z

@svick added another case per your request.

Fixes #23700

JonHanna · 2017-08-31T09:08:32Z

I could swear I've seen @JonHanna add this optimization at some point...

You may remember me discussing investigations of it when someone else suggested it while I was doing similar types of optimisations.

All of this class of optimisation adds cost to the case where the optimisation isn't available, since we spend a small amount of time testing if we can take the optimal path, and then find we cannot. In most of these cases that's worth it because the gain, when we can take it, is large, especially if it changes order of operation from O(n) to O(1). Here though it's only able to change a O(1) operation to an O(1) operation with better constant costs, so it's not as big a win for the potential cost.

It's also not always a win at all. Arrays are faster to do the GetEnumerator(), MoveNext() and Dispose() calls on than to cast to an interface and then call Count on through that interface. T[] will in fact lose about as much as List<T> will gain if this optimisation is used, other types will generally gain, but by different amounts. The difference is greater still if its an empty array since that case already has an optimisation in its enumerator (generally not a case we really worry about when looking at the gains or losses here, but given the point of Any() worth considering) and greater again if the array is being dealt with covariantly (we have a string[] that was cast to IEnumerable<object>, etc.)

JonHanna · 2017-08-31T09:11:23Z

***************************
iteration: 1000000/1000000
enumerable:
iteration: 00:00:00.1782133
count:     00:00:00.0876572
***************************

According to this, it's slower to run some code than to do two attempted casts and then run the exact same code. I think it's reasonable to have some scepticism here.

weitzhandler · 2017-08-31T09:21:05Z

@JonHanna

According to this, it's slower to run some code than to do two attempted casts and then run the exact same code. I think it's reasonable to have some scepticism here.

Yeah, so I was wondering, what could be the explanation to that?

JonHanna · 2017-08-31T09:30:59Z

It's certainly intriguing. I wonder if the branch prediction is benefiting the same data hitting it repeatedly more in the count-using case than the other. That would be annoying because it wouldn't be as likely to benefit real cases (I'll happily take a weird counter-intuitive performance benefit if I've reason it would actually still hold in real cases).

I also note that there's not false case hit in the tests, which would also be worth considering (some enumerators can have internally different paths for empty collections).

svick · 2017-08-31T09:35:20Z

@weitzhandler Microbenchmarks are prone to various issues. Instead of trying to figure out if your code suffers from one of those, I think it would be simpler to use BenchmarkDotNet. It's fairly easy to use and makes sure you're measuring what you want to measure.

weitzhandler · 2017-08-31T09:55:21Z

@svick
I was wrong. It was the microbenchmarks indeed.

Changing to:

static (TimeSpan, TimeSpan) GetResults<T>(IEnumerable<T> collection)
{
  var iterationStopwatch = new Stopwatch();
  var countStopwatch = new Stopwatch();

  var max = 1000000;
  bool any;

  iterationStopwatch.Start();
  for (int i = 0; i <= max; i++)
  {
    Write($"\riteration: {i:########}/{max:########}{new string(' ', 20)}");
    any = AnyIteration(collection);
  }
  iterationStopwatch.Stop();

  countStopwatch.Start();
  for (int i = 0; i <= max; i++)
  {
    Write($"\rcount: {i:########}/{max:########}{new string(' ', 20)}");
    any = AnyCount(collection);
  }
  countStopwatch.Stop();

  return (iterationStopwatch.Elapsed, countStopwatch.Elapsed);
}

resulted in:

count: 1000000/1000000
list:
iteration: 00:00:08.5466694
count:     00:00:08.8490420
***************************
count: 1000000/1000000
array:
iteration: 00:00:08.3404633
count:     00:00:08.2994561
***************************
count: 1000000/1000000
hashSets:
iteration: 00:00:08.3081030
count:     00:00:08.2321981
***************************
count: 1000000/1000000
arrayClients:
iteration: 00:00:08.2333401
count:     00:00:08.2799143
***************************
count: 1000000/1000000
enumerable:
iteration: 00:00:08.2697450
count:     00:00:08.7120273
***************************

I'm closing this issue then, because the performance is subjective.
It's slightly faster on arrays and hashsets but much slower on lists and enumerables. And again, this was a mock up test nothing really intense.

At least it's here for further reference.

JonHanna · 2017-08-31T13:18:56Z

Incidentally, where did you get that "current code" from. That looks quite a bit out of date.

jnm2 · 2017-08-31T13:22:27Z

Current code in master for reference:

https://github.com/dotnet/corefx/blob/53ab4f27097da80a3a358389fbda3393f4fbcc7c/src/System.Linq/src/System/Linq/AnyAll.cs#L11-L22

Clockwork-Muse · 2017-08-31T16:09:44Z

Write($"\riteration: {i:########}/{max:########}{new string(' ', 20)}");
This is almost certainly distorting your results in the updated version, since you're doing it every single iteration, irrespective of any other problems. Try to do nothing except the operation under consideration in performance tests.

jnm2 · 2017-08-31T16:18:09Z

I can't say enough good things about BenchmarkDotNet. You gotta try it out!

weitzhandler · 2017-09-01T08:09:48Z

@JonHanna I took it from here. I like the reference source because easy and fast browsing and searching of types.

weitzhandler · 2017-09-01T08:13:02Z

@Clockwork-Muse you were right.
Here are the results with a plain loop (max = 100000000):

list:
iteration: 00:00:02.1775856
count:     00:00:01.3231993
***************************

array:
iteration: 00:00:02.1587138
count:     00:00:02.4961830
***************************

hashSets:
iteration: 00:00:04.0008384
count:     00:00:01.3617311
***************************

arrayClients:
iteration: 00:00:02.3180954
count:     00:00:02.6931834
***************************

enumerable:
iteration: 00:00:04.2760573
count:     00:00:05.2728051
***************************

JonHanna · 2017-09-01T08:54:15Z

@weitzhandler

I like the reference source because easy and fast browsing and searching of types.

But it is often very different to corefx. Linq would be an example of where corefx is particularly different to reference source.

weitzhandler · 2017-09-01T09:19:09Z

@JonHanna
There are two reference sources, one for the .NET Framework, and the other for .NET Core.

They are identical (browser vs. GitHub), maybe not the very latest source, but it's pretty synced and is usually enough to what I need, especially given the responsive and quick code search option.

The code you see in my original post I shrinked for brevity.

JonHanna · 2017-09-01T09:22:27Z

The code in the first post seems to be from the netfx one at http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,1165 It's close enough in this case, but no in many others within Enumerable

weitzhandler · 2017-09-01T10:05:38Z

@JonHanna I don't get it, in what is it different than https://source.dot.net/#System.Linq/System/Linq/AnyAll.cs,11, other than using a literal in the exception?

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

weitzhandler referenced this issue in weitzhandler/corefx Aug 31, 2017

Optimizations to Enumerable.Any

371ad53

Fixes #23700

weitzhandler closed this as completed Aug 31, 2017

Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corefx Apr 4, 2019

Don't null out buffer in MemoryStream on Capacity reset (#23700)

a591af4

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

danmoseley referenced this issue in dotnet/corefx Apr 4, 2019

Don't null out buffer in MemoryStream on Capacity reset (#23700)

eeb61c3

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 2.1.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Enumerable.Any #23397

Optimize Enumerable.Any #23397

weitzhandler commented Aug 31, 2017

jnm2 commented Aug 31, 2017

khellang commented Aug 31, 2017

weitzhandler commented Aug 31, 2017

svick commented Aug 31, 2017

weitzhandler commented Aug 31, 2017

JonHanna commented Aug 31, 2017

JonHanna commented Aug 31, 2017

weitzhandler commented Aug 31, 2017

JonHanna commented Aug 31, 2017

svick commented Aug 31, 2017

weitzhandler commented Aug 31, 2017 •

edited

Loading

JonHanna commented Aug 31, 2017

jnm2 commented Aug 31, 2017 •

edited

Loading

Clockwork-Muse commented Aug 31, 2017

jnm2 commented Aug 31, 2017

weitzhandler commented Sep 1, 2017

weitzhandler commented Sep 1, 2017 •

edited

Loading

JonHanna commented Sep 1, 2017

weitzhandler commented Sep 1, 2017

JonHanna commented Sep 1, 2017

weitzhandler commented Sep 1, 2017 •

edited

Loading

Optimize Enumerable.Any #23397

Optimize Enumerable.Any #23397

Comments

weitzhandler commented Aug 31, 2017

jnm2 commented Aug 31, 2017

khellang commented Aug 31, 2017

weitzhandler commented Aug 31, 2017

svick commented Aug 31, 2017

weitzhandler commented Aug 31, 2017

JonHanna commented Aug 31, 2017

JonHanna commented Aug 31, 2017

weitzhandler commented Aug 31, 2017

JonHanna commented Aug 31, 2017

svick commented Aug 31, 2017

weitzhandler commented Aug 31, 2017 • edited Loading

JonHanna commented Aug 31, 2017

jnm2 commented Aug 31, 2017 • edited Loading

Clockwork-Muse commented Aug 31, 2017

jnm2 commented Aug 31, 2017

weitzhandler commented Sep 1, 2017

weitzhandler commented Sep 1, 2017 • edited Loading

JonHanna commented Sep 1, 2017

weitzhandler commented Sep 1, 2017

JonHanna commented Sep 1, 2017

weitzhandler commented Sep 1, 2017 • edited Loading

weitzhandler commented Aug 31, 2017 •

edited

Loading

jnm2 commented Aug 31, 2017 •

edited

Loading

weitzhandler commented Sep 1, 2017 •

edited

Loading

weitzhandler commented Sep 1, 2017 •

edited

Loading