Avoid enumeration allocation via interface #1579

Closed
benaadams opened this Issue Sep 17, 2015 · 16 comments

Projects

None yet

4 participants

@benaadams
Contributor

Can the Jitter/runtime resolve down one level when calling foreach on an interface where a strongly typed enumerator exists? Or otherwise prevent boxing; can always just be for the foreach case.

There may need to be a pattern; for example the Interface enumerator returning result of call to strongly typed Enumerator.

Enumerating Dictionary doesn't allocate; however casting to an IDictionary then enumerating allocates. Shouldn't it follow the same code path as Dictionary?

var dict = new Dictionary<string, string>();
var idict = (IDictionary<string, string>)dict;

for (var i = 0; i < 1000000; i++)
{
    // Doesn't allocate
    foreach (var item in dict)
    {
        ;
    }
}

for (var i = 0; i < 1000000; i++)
{
    // Allocates 998k
    foreach (var item in idict)
    {
        ;
    }
}

allocs

@stephentoub
Member

Shouldn't it follow the same code path as Dictionary?

It does: the same Enumerator is returned. But when accessed via the interface, you're using the interface method that's typed to return IEnumerator<KeyValuePair<TKey,TValue>> rather than Dictionary<TKey, TValue>.Enumerator, so the struct gets boxed.

https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Collections/Generic/Dictionary.cs#L269-L271

@benaadams
Contributor

IEnumerator<KeyValuePair<TKey, TValue>> IEnumerable<KeyValuePair<TKey, TValue>>.GetEnumerator() boxing should effect Dictionary equally?

Wouldn't it be IDictionaryEnumerator IDictionary.GetEnumerator() on this line? https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Collections/Generic/Dictionary.cs#L682

@benaadams
Contributor

Ah, that's non-generic

@stephentoub
Member

boxing should effect Dictionary equally?

No, it has a strongly-typed struct Enumerator that's returned from its GetEnumerator() method:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Collections/Generic/Dictionary.cs#L265-L267

foreach binds to this pattern more tightly than it does to the interface implementation.

@benaadams
Contributor

Obviously is been this way for years; but could there any potential work around similar to the one for arrays?

var array = new string[0];
var iList = (IList<string>)array;

for (var i = 0; i < 1000000; i++)
{
    // Doesn't allocate
    foreach (var item in array)
    {
        ;
    }
}

for (var i = 0; i < 1000000; i++)
{
    // Doesn't allocate
    foreach (var item in iList)
    {
        ;
    }
}

dsd

Doesn't alloc obv in picture (comment is wrong)

@stephentoub
Member

Are you specifically concerned about the case of iterating over an empty Dictionary via the interface? That's the only one that array optimizes in this way.

@benaadams
Contributor

Ah yeah :( For reference:

array-enumerator-allocating

Was wondering if the was some kind of pattern that could be introduced, for example if the interface enumerator returned the result of the strongly typed enumerator function:

IEnumerator<KeyValuePair<TKey, TValue>> IEnumerable<KeyValuePair<TKey, TValue>>.GetEnumerator() {
            return GetEnumerator();
        }

The jit could resolve down one level; and bind to the strongly typed enumerator?

@mikedn
Contributor
mikedn commented Sep 18, 2015

The jit could resolve down one level; and bind to the strongly typed enumerator?

It could, in certain cases. This is related to devirtualization, see #1532 as well.

That said, if you want performance it's best to avoid interfaces. Code like the above may be optimized but such code doesn't make sense and similar code that does make sense is less likely to be optimized.

@benaadams
Contributor

Would be happy if it worked with the standard generic collections and their generic interface's IEnumerables/foreach as a starter; obviously it would be better if their was a pattern that could be used for people who implemented those interfaces.

@mikedn
Contributor
mikedn commented Sep 18, 2015

obviously it would be better if their was a pattern that could be used for people who implemented those interfaces

The best that collection implementers can do is to ensure that the explicitly implemented GetEnumerator returns the a struct enumerator just like the "ducking" GetEnumerator does, the standard generic collections already do that. Beyond that it's up to the JIT compiler to figure it out.

@benaadams
Contributor

For context; in aspnet iterating the headers IDictionary in aspnet allocates 256 kBytes for request and 256 kBytes for response per 4k requests (bottom line):

idictionary

When used as a Dictionary it doesn't allocate anything (bottom line):

idictionary

However the default implementation of the headers IDictionary in Kestrel isn't a Dictionary; and it can also be user overridden, hence the importance of the interface, so it can't be cast or expressed as concrete type.

This flexibility comes at a cost for 1M rps of 128 MBytes of garbage per second; and at 6M rps of 768 MBytes per second or 46 GBytes per minute for the GC to clean up by using an interface for Enumerable.

@benaadams benaadams changed the title from Enumerating IDictionary Allocates to Avoid enumeration allocation via interface Sep 24, 2015
@mattwarren
Contributor

I know that the extra allocations are the main issue, but just for reference here's the perf difference:

BenchmarkDotNet=v0.7.7.0
OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, ProcessorCount=8
HostCLR=MS.NET 4.0.30319.0, Arch=64-bit  [RyuJIT]
Type=Framework_DictionaryVsIDictionary  Mode=Throughput  Platform=HostPlatform  Jit=HostJit  .NET=HostFramework  
Method AvrTime StdDev op/s
DictionaryEnumeration 22.3095 ns 0.3227 ns 44,823,991.41
IDictionaryEnumeration 33.8472 ns 0.3365 ns 29,544,535.06

and here's the BenchmarkDotNet benchmark I used (easier than hand-rolling benchmarks each time)

public class Framework_DictionaryVsIDictionary
{
    Dictionary<string, string> dict;
    IDictionary<string, string> idict;

    [Setup]
    public void Setup()
    {
        dict = new Dictionary<string, string>();
        idict = (IDictionary<string, string>)dict;
    }

    [Benchmark]
    public Dictionary<string, string> DictionaryEnumeration()
    {
        // Doesn't allocate
        foreach (var item in dict)
        {
            ;
        }
        return dict;
    }

    [Benchmark]
    public IDictionary<string, string> IDictionaryEnumeration()
    {
        // Allocates 998k
        foreach (var item in idict)
        {
            ;
        }
        return idict;
    }
}
@mattwarren
Contributor

And just for completeness, here's the results when using different sized Dictionaries:

Method InitialSize AvrTime StdDev op/s
DictionaryEnumeration 0 22.2737 ns 0.3171 ns 44,896,091.66
DictionaryEnumeration 1 30.0156 ns 0.4659 ns 33,316,034.57
DictionaryEnumeration 5 57.6569 ns 1.6896 ns 17,343,980.51
DictionaryEnumeration 10 103.7891 ns 2.0078 ns 9,634,922.66
DictionaryEnumeration 100 807.9180 ns 7.7998 ns 1,237,749.37
DictionaryEnumeration 1000 7,838.9546 ns 18.3068 ns 127,568.03
IDictionaryEnumeration 0 33.3727 ns 0.1456 ns 29,964,588.40
IDictionaryEnumeration 1 51.3341 ns 1.0662 ns 19,480,210.95
IDictionaryEnumeration 5 130.5508 ns 1.9283 ns 7,659,862.79
IDictionaryEnumeration 10 216.5261 ns 7.0871 ns 4,618,399.40
IDictionaryEnumeration 100 1,768.5816 ns 14.4940 ns 565,424.87
IDictionaryEnumeration 1000 17,350.8851 ns 132.1878 ns 57,633.98

The benchmark now looks like this:

public class Framework_DictionaryVsIDictionary
{
    Dictionary<string, string> dict;
    IDictionary<string, string> idict;

    [Params(0, 1, 5, 10, 100, 1000)]
    public int InitialSize = 0;

    [Setup]
    public void Setup()
    {
        dict = new Dictionary<string, string>(InitialSize);
        idict = (IDictionary<string, string>)dict;

        foreach (var value in Enumerable.Range(0, InitialSize))
            dict.Add(value.ToString(), "Value:" + value.ToString());
    }

    [Benchmark]
    public Dictionary<string, string> DictionaryEnumeration()
    {
        // Doesn't allocate
        foreach (var item in dict)
        {
            ;
        }
        return dict;
    }

    [Benchmark]
    public IDictionary<string, string> IDictionaryEnumeration()
    {
        // Allocates 998k
        foreach (var item in idict)
        {
            ;
        }
        return idict;
    }
}
@stephentoub
Member

Yes, the issue isn't just enumerator allocations, it's also interface-based dispatch. In addition to boxing the enumerator, the MoveNext and Current calls made per element go from being potentially-inlineable non-virtual calls to being interface calls.

@mattwarren
Contributor

ah, thanks for the explanation, that explains the "extra" overhead I see when doing the benchmarks with the different InitialSize values.

@stephentoub
Member

I don't think there's any action to be taken here. If there's a concrete request for something specific to be done, please feel free to reopen and share. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment