This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
dotnet/corefx (public archive)

Use Count in Enumerable.Any if available #40377

Merged: 2 commits, merged on Aug 17, 2019

@stephentoub (Member)

We've been hesitant to make this change in the past, as it adds several interface checks, which do show up in microbenchmarks (as the results below show).

However, widespread "wisdom" holds that `Any()` is as fast as or faster than `Count() > 0`, and there are even FxCop rules/analyzers that warn about using the latter instead of the former. In its current form, that is frequently incorrect: if the source implements `ICollection<T>`, its `Count` is generally O(1) and allocation-free, whereas `Any()` almost always ends up allocating an enumerator.
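A minimal sketch of the idea, assuming a hypothetical `AnySketch.AnyCountFirst` helper (this is not the corefx source): consult `Count` when the source is a collection, and otherwise fall back to a single `MoveNext()`. It only checks the public `ICollection<T>` and non-generic `ICollection` interfaces; the change discussed here also consults LINQ's internal `IIListProvider`, which user code cannot access.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper illustrating a Count-first Any; not the actual corefx implementation.
public static class AnySketch
{
    public static bool AnyCountFirst<T>(IEnumerable<T> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));

        // List<T>, T[], HashSet<T>, etc. implement ICollection<T> and usually
        // expose an O(1) Count, so no enumerator has to be allocated.
        if (source is ICollection<T> genericCollection)
            return genericCollection.Count != 0;

        // Some collections only implement the non-generic ICollection.
        if (source is ICollection nonGenericCollection)
            return nonGenericCollection.Count != 0;

        // Fallback: allocate an enumerator and examine at most one element.
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            return e.MoveNext();
        }
    }
}
```

For example, `AnyCountFirst(new List<int> { 1 })` answers from `Count` without calling `GetEnumerator()`, while a lazy sequence such as `Enumerable.Range(0, 3).Where(i => i > 0)` still goes through the enumerator path.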

On balance, it seems better to just have Any() map closely to Count() so that their performance can be reasoned about in parallel. I'd like a second opinion, though. @cston? @ahsonkhan? @bartonjs?

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args) => BenchmarkSwitcher.FromTypes(new[] { typeof(Program) }).Run(args);

    // A compiler-generated iterator: implements none of the collection interfaces.
    private static IEnumerable<int> Iterator() { yield return 1; }

    public IEnumerable<object[]> Sources()
    {
        yield return new object[] { "Empty", Enumerable.Empty<int>() };
        yield return new object[] { "Range", Enumerable.Range(0, 10) };
        yield return new object[] { "List", new List<int>() { 1, 2, 3 } };
        yield return new object[] { "int[]", new int[] { 1, 2, 3 } };
        yield return new object[] { "int[].Select", new int[] { 1, 2, 3 }.Select(i => i) };
        yield return new object[] { "int[].Select.Where", new int[] { 1, 2, 3 }.Select(i => i).Where(i => i % 2 == 0) };
        yield return new object[] { "Iterator", Iterator() };
        yield return new object[] { "Iterator.Select", Iterator().Select(i => i) };
        yield return new object[] { "Iterator.Select.Where", Iterator().Select(i => i).Where(i => i % 2 == 0) };
    }

    // The source arrives typed as object (see the P.S. below) and is
    // reinterpreted as IEnumerable<int> without a runtime type check.
    [Benchmark]
    [ArgumentsSource(nameof(Sources))]
    public void Any(string name, object source) => Unsafe.As<IEnumerable<int>>(source).Any();
}
```

produces:

| Method | Toolchain | name | source | Mean | Allocated |
|---|---|---|---|---:|---:|
| Any | New | Empty | Syste(...)nt32] [42] | 6.966 ns | - |
| Any | Old | Empty | Syste(...)nt32] [42] | 5.421 ns | - |
| Any | New | Iterator | Progr(...)>d__1 [22] | 20.192 ns | 32 B |
| Any | Old | Iterator | Progr(...)>d__1 [22] | 13.645 ns | 32 B |
| Any | New | Iterator.Select | Syste(...)nt32] [76] | 42.764 ns | 88 B |
| Any | Old | Iterator.Select | Syste(...)nt32] [76] | 35.661 ns | 88 B |
| Any | New | Iterator.Select.Where | Syste(...)nt32] [62] | 74.852 ns | 144 B |
| Any | Old | Iterator.Select.Where | Syste(...)nt32] [62] | 65.916 ns | 144 B |
| Any | New | List | Syste(...)nt32] [47] | 3.979 ns | - |
| Any | Old | List | Syste(...)nt32] [47] | 12.500 ns | 40 B |
| Any | New | Range | Syste(...)rator [36] | 7.972 ns | - |
| Any | Old | Range | Syste(...)rator [36] | 15.880 ns | 40 B |
| Any | New | int[] | System.Int32[] | 11.606 ns | - |
| Any | Old | int[] | System.Int32[] | 9.594 ns | 32 B |
| Any | New | int[].Select | Syste(...)nt32] [71] | 8.505 ns | - |
| Any | Old | int[].Select | Syste(...)nt32] [71] | 19.888 ns | 48 B |
| Any | New | int[].Select.Where | Syste(...)nt32] [62] | 59.662 ns | 104 B |
| Any | Old | int[].Select.Where | Syste(...)nt32] [62] | 48.749 ns | 104 B |

P.S. @adamsitnik, I could not figure out how to get the benchmark to take an `IEnumerable<int>`; everything I tried resulted in errors like `error CS0266: Cannot implicitly convert type 'object' to 'System.Collections.Generic.IEnumerable<int>'`. This is with BenchmarkDotNet 11.5.

@bartonjs (Member) left a review comment:

The perf losses seem low, particularly when considered long-term (zero allocation means less GC noise, so the few ns spent in the if might still be better).

Perhaps it's also better to let the compiler further optimize things for people, via overloads like these?

```csharp
public static bool Any<T>(this ICollection<T> source);
public static int Count<T>(this ICollection<T> source);
...
```

@stephentoub (Member, Author)

> Perhaps it's also better to let the compiler further optimize things for people?

https://github.com/dotnet/corefx/issues/7580

@stephentoub (Member, Author)

> The perf losses seem low, particularly when considered long-term (zero allocation means less GC noise, so the few ns spent in the if might still be better).

OK, thanks for weighing in.

@ahsonkhan (Member) commented on Aug 16, 2019:

> The perf losses seem low, particularly when considered long-term (zero allocation means less GC noise, so the few ns spent in the if might still be better).

The usages that went down to zero allocations generally also improved in running time (within the microbenchmark results shared).
The usages that got slower don't benefit from zero allocation (their allocations are unchanged), so there are no long-term savings to offset the slowdown.

However, I don't know how heavily iterators are used, so the small regression seems fine.

Edit: I just realized `Iterator` is just an `IEnumerable<int>` and not a type, so ignore that.

> On balance, it seems better to just have Any() map closely to Count() so that their performance can be reasoned about in parallel.

In that case, why not go all in and implement `Any()` in terms of `Count()`? Or are the savings from not calling `GetCount` (for `IIListProvider`) when it isn't "cheap" worth the custom implementation?

@stephentoub (Member, Author)

> why not go all in and implement Any() in terms of Count()

Because that makes `Any` O(N) instead of O(1) when none of the interfaces are implemented, which is a common case.
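To see why `Any()` is not simply `Count() > 0`, here is a small sketch that records how many elements each approach pulls from a plain C# iterator, which implements none of the collection interfaces. The `PullCounter` class and its members are hypothetical, written only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: counts how many elements are pulled from an iterator.
public static class PullCounter
{
    public static int Pulled;

    // Compiler-generated iterator: no ICollection<T>, so Any/Count must enumerate.
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++; // incremented each time an element is produced
            yield return i;
        }
    }

    // Elements pulled by Enumerable.Any() on a fresh iterator.
    public static int PulledByAny(int n)
    {
        Pulled = 0;
        Numbers(n).Any();
        return Pulled;
    }

    // Elements pulled by Count() > 0 on a fresh iterator.
    public static int PulledByCountGreaterThanZero(int n)
    {
        Pulled = 0;
        _ = Numbers(n).Count() > 0;
        return Pulled;
    }
}
```

For a five-element iterator, `PulledByAny(5)` returns 1 while `PulledByCountGreaterThanZero(5)` returns 5: `Any()` stops after the first successful `MoveNext()`, whereas `Count()` has to walk the whole sequence.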

@ahsonkhan (Member) left a review comment:

LGTM

@ahsonkhan (Member) commented on Aug 17, 2019:

> Because that makes Any O(N) instead of O(1) when none of the interface are implemented, a common case.

Ah, right, I missed the iterator loop at the end (one iteration for `Any` vs. N for `Count`).

@adamsitnik (Member)

> I could not figure out how to get the benchmark to take an IEnumerable<int>

I've fixed that in dotnet/BenchmarkDotNet#1228

@stephentoub (Member, Author)

> I've fixed that

Thanks, @adamsitnik.

@karelz added this to the 5.0 milestone on Dec 19, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Use Count in Enumerable.Any if available

We've been hesitant to make this change in the past, as it adds several interface checks.  However, wide-spread "wisdom" is that `Any()` is as fast or faster than `Count() > 0`, and there are even FxCop rules/analyzers that warn about using the latter instead of the former, but in its current form that can frequently be incorrect: if the source does implement `ICollection<T>`, generally its `Count` is O(1) and allocation-free, whereas `Any()` will almost always end up allocating an enumerator.  On balance, it seems better to just have `Any()` map closely to `Count()` so that their performance can be reasoned about in parallel.

* Add test coverage for Enumerable.Any


Commit migrated from dotnet/corefx@9021bc1