Performance decrease when using interfaces #9105

Open
Thealexbarney opened this Issue Jan 25, 2017 · 32 comments

Projects

None yet

9 participants

@Thealexbarney

The general guidelines that I see for writing in C# usually say something like, "Accept as general of an object as you can for your parameters." For example, instead of requiring a List as a parameter when you don't actually use List specific methods, accept an IList, or even an IEnumerable if you can. This way you're not tied to a specific implementation. The basic concept of interfaces.

This works well in theory, but in practice, I've found that it's usually a bad idea when it comes to interfaces like IList, ICollection, or IEnumerable. In many of my projects, I've found that changing methods' parameter types from arrays or Lists to interfaces such as IList result in a 2-4x increase in execution time for the program.

Because virtual calls are used when using interfaces, making it so the JIT doesn't inline them, indexer access time is greatly increased. I was curious just how big the performance decrease was, so I put together a little demo program to try and benchmark this, and the results were worse than I expected:

Times are in milliseconds

18       Array
218      Array as IList
26       List
223      List as IList
19       ArrayWrapper
188      ArrayWrapper as IList

Using an IList resulted in about an order of magnitude longer execution time than not using an IList. I wondered if some of this was due to the extra stuff the CLR does when accessing an Array or List as an IList, so I made a simple wrapper class around an array. This resulted in some improvement, but not much.

This virtually negates the advantage of something like accepting an IList as a parameter instead of a T[] to decouple the code from any specific implementation. A new developer might expect that passing an object to a method accepting an IList would result in the same performance as if the method accepted that object's concrete type.

Is there any way to lessen this kind of performance loss? Or is it unavoidable without changing the JIT to be able to inline virtual calls?

@mikedn
Contributor
mikedn commented Jan 25, 2017

The general guidelines that I see for writing in C# usually say something like, "Accept as general of an object as you can for your parameters."

Such guidelines need to be taken with a grain of salt. They usually make sense in public APIs but beyond that collection interfaces should be used only when needed. It makes no sense whatsoever to have a private method taking IList<T> when all its callers use List<T>.

Is there any way to lessen this kind of performance loss? Or is it unavoidable without changing the JIT to be able to inline virtual calls?

Not really. JIT improvements could help here and there but in the general case there's not much that can be done.

@benaadams
Contributor
benaadams commented Jan 25, 2017 edited

@Thealexbarney you can optimise specific cases with tight loops by casting; for example changing your SumIList method to the following would allow the the inlines when ilist is a list

private static int SumIList(IList<int> ilist)
{
    int result = 0;

    var list = ilist as List<int>;
    if (list != null)
    {
        var length = list.Count;
        for (int i = 0; i < length; i++)
            result += list[i];
    }
    else
    {
        var length = ilist.Count;
        for (int i = 0; i < length; i++)
            result += ilist[i];
    }

    return result;
}

In your example you are also doing a double interface dispatch per loop (for both count and item) and comparing that against a fully inlined bounds-check eliminated loop. The time should halve if you cache the count when going via the interface.

@jkotas
Member
jkotas commented Jan 25, 2017 edited

Abstractions have performance costs. Pretty much all runtime abstractions - interfaces, generics, async, linq, ... - have extra performance costs compared to their less abstract equivalents. This overhead does not matter for most of the code and it is often worth paying to get the productivity benefits that these abstractions provide. However, one has to be careful about this overhead for the performance critical parts.

@vancem wrote a great articles on this topic years ago. Links to the archived copies can be found here.

@jkotas jkotas added the question label Jan 25, 2017
@vancem
Member
vancem commented Jan 25, 2017

As Jan points out, abstractions often have costs. This is part of what makes design hard, because you have to make tradeoffs. We have problems with people going 'overboard' both ways. We often have people designing things without any concern for perf, often with really bad consequences. More rarely, we have people who try to optimize 'everything' and end up making hard-to-maintain code.

What we really want is to take advantage of a 90%-10% dynamic that typically happens with software. It is typically the case that WELL UNDER 10% of your code in is on high volume (HOT) code paths. Moreover, it is NOT THAT hard to anticipate this. In this code, there is a good chance you need to NOT be as abstract (e.g. using Arrays instead of IList), and think carefully about the APIs (make them chunky, so that they are not called often to do little work). The other 90% of your code you can skew toward all the good software abstractions etc that make the code understandable, reusable and maintainable.

In the article Jan mentions above I go into more detail, but it is not rocket science, it is al about exploiting the 90-10% characteristic. People who design general purpose libraries have the hardest since they really don't know what scenarios are the most important to optimize.

For what it is worth...

@Thealexbarney

Yes, abstractions have costs, but a 10x increase in execution time for this example doesn't seem very reasonable to me. I'm not very knowledgeable about the inner workings of the CLR's code generation, so somebody correct me if I'm wrong, but isn't one of the advantages of compiling at run-time the fact that the JIT compiler knows the type behind the interface, and can optimize accordingly?

I've written code across various projects that only operated on an object's indexer and length. We wanted to be able to run Arrays Lists, and other types through this code. Using an IList seemed like a good solution, but wasn't feasible for our project because of the performance issues.

I was curious how the JVM handled this, so I did a similar benchmark in Java testing a method that took an interface as a parameter. Going through an interface there didn't have much, if any, noticeable overhead compared to using a plain array. Looking at the compiled assembly confirmed that the interface calls were being inlined and optimized. It looks like when a new type goes through the interface, the method is JIT compiled for the new base type. This doesn't happen with the CLR.

I guess what I'm trying to ask is why is this still a problem after years of the CLR being developed? I can't speak for others, but I've personally run into it a decent number of times. Is there something about the way the CLR is designed that makes it difficult to improve on? Is it not worth the effort to improve? Or maybe there's some other reason that I'm overlooking.

@jkotas
Member
jkotas commented Jan 26, 2017 edited

Yes, it would be nice for CLR to have the optimizations you are asking about. They are not easy to build, but it is certainly doable with enough effort. They have not been built yet because of people working on CLR did not find them to be the best bang for the buck when choosing areas to improve so far.

@fatalerrorx

I agree with @Thealexbarney 10x increase is unreasonable. I very often use IReadOnlyList instead of List for safety and code correctness.

@mattwarren
Contributor

If anyone is interested in understanding 'why interface calls are more expensive', this BOTR page is an interesting read. Despite the title it is about interface dispatch, from the page:

Although it is possible for Virtual stub dispatching (VSD) to dispatch both virtual instance and interface method calls, it is currently used only for interface dispatch.

@NickStrupat
NickStrupat commented Jan 26, 2017 edited

Hi all, @fatalerrorx, @benaadams

Correct me if I'm wrong, but don't generic constraints alleviate this pressure to overload/specialize in a lot of cases?

private static int SumIList<TList>(TList list) where TList : IList<int>
{
    int result = 0;
    if (list != null)
    {
        int length = list.Count;
        for (int i = 0; i < length; i++)
            result += list[i];
    }
    return result;
}

The JIT already knows to compile a different function for each type used, right?

@mikedn
Contributor
mikedn commented Jan 26, 2017

Correct me if I'm wrong, but don't generic constraints alleviate this pressure to overload in a lot of cases?

Your example code still uses IList<int>.

The JIT already knows to compile a different function for each type used, right?

This only happens when TList is a struct type. If it's a reference type then the same code is used for arrays, List<int> etc.

@vancem
Member
vancem commented Jan 26, 2017

I don't want to get into sound too defensive about optimization the .NET runtime does, but I do want to clarify two things that are important in deciding how important various optimization are.

Yes, abstractions have costs, but a 10x increase in execution time for this example doesn't seem very reasonable to me

I think you tested the worst case since the operation being done was literally one instruction. Thus I do think it is not fair to use the 10x as the representative increase. Of those cases where you are doing 'flyweight' things in hot loops, that actually have impact in overall performance, how problematic would it have been to simply use arrays?

I was curious how the JVM handled this

First, I would like to confirm that the Java experiment you did was 'representative' of your real code. What is going on is code specialization, and every system has its limits. There is a good chance that your microbenchmark lives within the limits (the whole function was inlined), but in more typical code, it is problematic (the methods are bigger and you have to decide which are worth it). Certainly 'hotspot' style dynamic re-compilation can do this (and that may be what is happening in your case), but that is a non-trivial investment.

My main point, however is that you should be careful about over-extrapolation of microbechmarks. If the benchmark REALLY represents your hot paths, then I can't argue with that, but real-world code has a surprising tendency to be both not as bad in some case (as in the first case above) or as good as (if the java JIT 'falls off' its optimization for your real code in the second case).

This is kind of tradeoff we have to make when the .NET Runtime decides how to invest in optimization. What we have found is that in 'real-world' code the performance problems are only very rarely related to JIT code quality issues of the type you suggest (but they do turn up in microbenchmarks all the time, which is what is happening here).

Anyway, what would be more useful to us is not this microbenchmark, but examples of 'hot kernels' in your code where you find this to be an actual problem. That would certainly add weight for us to do some optimization of those scenarios.

@NickStrupat

@mikedn, wow I assumed it did that work for us. I must have misinterpreted something I read somewhere. I just took @Thealexbarney 's demo program and tried my change; same as not using generic. I'm kind of disappointed.

@NickStrupat

Now it makes sense why the LINQ operators don't use this (non)-"trick". I wish they would have added that to the JIT compiler to support the LINQ implementation using that optimization.

@markrendle

Regarding the generic constraint comment (@NickStrupat) there are areas where using that could avoid (a) the interface invocation tax and (b) the struct-boxing tax incurred when an instance is used via its interface, particularly for things like IEnumerable and IEnumerator, which work really well as structs.

@NickStrupat
NickStrupat commented Jan 27, 2017 edited

@markrendle absolutely. What I'm a bit disappointed about is that JIT doesn't emit a specialized method for classes as well. It could emit a method where all of the calls to the interface methods are resolved to call the methods on the class (at least if that class is sealed so it knows there won't be polymorphic behaviour).

I just tested @Thealexbarney 's demo program with an added SealedArrayWrapper and having all tests call a generic version of the Sum... method. The results are the same sealed or not.

Developing a library which reads the method body as IL and emits a DynamicMethod with the modified IL for sealed classes could be a workable solution.

@vancem
Member
vancem commented Jan 27, 2017

Adding @cmckinsey

@NickStrupat
NickStrupat commented Jan 27, 2017 edited

So in summary, RyuJIT does not take advantage of the optimization opportunity to output specialized native executable code for generic methods taking an interface/class-constrained parameter where T is a sealed class, despite value types receiving this optimization. The specialized code would in theory be calling the method directly instead of during a virtual table look-up call, assuming this is how it works for value types(?).

Is that correct? Does anyone have expertise on what changes that would require?

Scenarios like this where high performance abstractions can be achieved through using a sealed derived class combined with generic methods in hot loops or elsewhere could be considered very valuable for people with their eyes on the "performance as a feature" aspect of .NET development.

@mikedn
Contributor
mikedn commented Jan 27, 2017

So in summary, RyuJIT does not take advantage of the optimization opportunity to output specialized native executable code for generic methods taking an interface/class-constrained parameter where T is a sealed class, despite value types receiving this optimization. The specialized code would in theory be calling the method directly instead of during a virtual table look-up call, assuming this is how it works for value types

It works that way for value types because there aren't any reasonable alternatives (e.g the code for List<double> and List<int> has to be different because the 2 value types have different sizes and require different instructions, the only way to avoid this is to do box those values like Java generics do).

Doing the same thing indiscriminately in the case of reference types would result in a code explosion and in many cases it would be useless or worse, have adverse effects. If such specialization is done then it should be done only when needed and that makes things more difficult. The JIT needs to somehow figure which code paths are worth optimizing either at runtime (like Java supposedly does) or by having profiling information collected in advance (e.g. via tools like mpgo). And each approach has its pros and cons (e.g. the Java approach wouldn't probably work in AOT runtimes such as .NET Native and CoreRT).

@benaadams
Contributor
benaadams commented Jan 27, 2017 edited

@NickStrupat the caller can choose to force the implementation for the generic by creating a non-generic struct wrapper

struct ListWrapper : IList<int>
{
    private List<int> _list;
    public ListWrapper(List<int> list)
    {
        _list = list;
    } 
    public int Count => _list.Count;
    //...
}

var result = SumIList(new ListWrapper(list));

It should also give you the benefits of inlining back

@Thealexbarney
Thealexbarney commented Jan 28, 2017 edited

@vancem The microbenchmark is not completely representative of the hot spots in our code, but it's not too far off. The most severely I run into this performance decrease is when working with code that uses interfaces such as IList or IReadOnlyList.

Here's one example of a possible situation. Let's say someone's writing some code that does some sort of data processing, and the functions to call this code accept an Array. This implementation is working fine, and performance is good.

Now this person needs to pass in portion of a large array to this function. They have two main options: Create a new, smaller array, allocating memory in the process, or create an object (Similar to ArraySegment or Span) that abstracts the access to the original array, requiring duplicated code to handle the object, incurring a slight increase in execution time, but without a large memory allocation.

I won't go into any more detail, but there are a few more memory abstractions that this person uses in different situations, and a common theme between all of them is that they can all implement IList. Currently there are multiple versions of the code that have to be maintained, but if a single version of the code accepted an IList, ease of maintenance and the size of the code base would be better. However, making the original code accept an IList instead of an Array results in about a 2-4x increase in execution time, depending on the situation.

I'm not that great at creating scenarios, but I hope this shows one situation where a developer could run into this issue.

@NickStrupat

@mikedn I'm not suggesting indiscriminately doing the same thing for reference types. Only for sealed classes.

@benaadams The wrapper struct can actually be generic and receive the same inlining treatment.

@mikedn
Contributor
mikedn commented Jan 31, 2017

I'm not suggesting indiscriminately doing the same thing for reference types. Only for sealed classes.

How does doing it only for sealed classes help? List<T> and most collection classes aren't sealed anyway. The only commonly used "collection class" that is sealed is T[]. But even then wholesale generic method specialization doesn't necessarily solve this issue. The JIT needs to somehow figure out which specialization to call and at compile time this is only possible if the actual type of the collection is known at the callsite. As in:

void foo() {
    IList<int> nums = new int[42];
    ...
    // the compiler knows that nums is int[] so it could call
    // a specialized version of SumIList
    SumIList(nums);
}

Otherwise the best a compiler can do is something along the lines of @benaadams first example. And that kind of approach needs to be driven by profiling information otherwise you risk ending up with a larger and slower program.

@NickStrupat

@mikedn I think we've each been talking about slightly different scenarios, so I'll try to specify again.

If the calling code knows the IList<T> is T[] or some derived and sealed class of any IList<T>, the following method could and should perform the same "inlining" as value-type IList<T>s already receive.

private static int SumIList<TList>(TList list) where TList : IList<int>
{
    int result = 0;
    if (list != null)
    {
        int length = list.Count;
        for (int i = 0; i < length; i++)
            result += list[i];
    }
    return result;
}

If TList is a value-type, we get the same performance as if SumIList were overloaded specifically for TList, but not if TList is a sealed class.

Currently, as @benaadams points out, we need to make a value-type wrapper to get the "inlining". If someone has a List<T> or some other standard IList<T> collection, they need to make a struct wrapper for that particular class.

@benaadams
Contributor

@NickStrupat as @mikedn points out if that was done wholesale it would cause a binary size explosion especially as then it would start inlining code into these functions. It would be worthwhile in hot paths but disadvantageous to do in all paths.

There is a proposal in rosyln to indicate that you want specific handling for various generic types dotnet/roslyn#15822 <-- Shameless self promotion 😉

@NickStrupat

@benaadams, @mikedn, I don't know how many other ways I can point this out... I'm not talking about doing it wholesale.

I'm talking about doing it only for sealed classes where TList is the sealed class. In this scenario, the JIT compiler knows just as much about that type as if it were a struct. Only if it's sealed is this a good idea. Since the vast majority of classes are unsealed, there would be no code explosion in most cases.

I fully recognize and understand that generating different native code methods for every and any T would result in an undesirable code explosive à la C++ class templates. Again, I'm not proposing that.

@benaadams, generic specialization proposals get a +1 from me! I loved designing C++ libs which took advantage of that.

@benaadams
Contributor
benaadams commented Feb 1, 2017 edited

@NickStrupat sealed is only relevant for virtual methods; otherwise it already has all the type information. It would also cause code bloat unnecessarily for everywhere a sealed type was used rather than just in hot paths. e.g. as string is a sealed type creating a Dictionary<string, object> would generate full unique assembly for that dictionary, as would Dictionary<string, string> and List<string> etc but there is likely no reason to do so.

Regarding non-generic devirtualization for sealed types that is being looked at as part of #9230; though that is different than this issue.

@mikedn
Contributor
mikedn commented Feb 1, 2017

I think we've each been talking about slightly different scenarios, so I'll try to specify again.

I don't think so. Let's try again:

If the calling code knows the IList is T[] or some derived and sealed class of any IList, the following method could and should perform the same "inlining" as value-type ILists already receive.

Considering that doing this will invariably increase code size one can wonder why should the JIT be doing this in the absence of any indication that the code is performance sensitive. Does the fact that TList happens to be sealed implies that SumIList is performance sensitive? Clearly not.

If TList is a value-type, we get the same performance as if SumIList were overloaded specifically for TList, but not if TList is a sealed class.

And as I stated before value types works the way they work because there aren't any reasonable alternatives, not because the JIT is doing an optimization. If anything, the JIT could do better in the area of value types by avoiding generating separate code for similar structs (e.g. 2 structs that both contain a single int field)

@benaadams
Contributor
benaadams commented Feb 2, 2017 edited

After rechecking my assumptions, I take it back, alas a concrete struct wrapper doesn't inline via a generic. https://gist.github.com/benaadams/ba3c008a328f02c75663213cd9affa65

// incorrect numbers

The interface calls are slower, but closer to a non-inlined call (as above)

@benaadams
Contributor
benaadams commented Feb 2, 2017 edited

Ahh... had it wrong; they do inline, the wrappers were classes not structs, updated code, results are

 31      Array
237      Array as IList
 30      ArrayWrapper
149      ArrayWrapper Non-inlined        ** Non-inlined direct call
 45      List
207      List as IList
220      ArrayWrapper as IList
235      Array as Generic                ** via constraint
206      List as Generic                 ** via constraint
 34      ArrayWrapper as Generic         ** via constraint
@NickStrupat

Considering that doing this will invariably increase code size one can wonder why should the JIT be doing this in the absence of any indication that the code is performance sensitive.

@mikedn Generics in general already causes this "code explosion" relative to non-generic types/collections). What are the chances that the current implementation of generics and their surrounding code base (LINQ, etc.) represents exactly the right balance of native code size vs performance? There is no real reason other than performance to generate the specializations at all. The type arguments could just be used for type-safety to wrap a non-generic with casting. Clearly the performance gains were considered more important than the potentially large generated code size.

Does the fact that TList happens to be sealed implies that SumIList is performance sensitive? Clearly not.

Currently the mechanism for achieving the inlining is "wrap with a struct", which is no less cryptic than "seal your class". That said, I would definitely be interested in some fine-grained control of which inlined specializations are generated.

And as I stated before value types works the way they work because there aren't any reasonable alternatives, not because the JIT is doing an optimization.

There is definitely at least one reasonable alternative. You could just box the value type wrapper like many parts of the .NET BCL already/still do.

If anything, the JIT could do better in the area of value types by avoiding generating separate code for similar structs (e.g. 2 structs that both contain a single int field)

That would be very cool. So the easy case would be if the fields match exactly?

@mikedn
Contributor
mikedn commented Feb 9, 2017

Generics in general already causes this "code explosion" relative to non-generic types/collections). What are the chances that the current implementation of generics and their surrounding code base (LINQ, etc.) represents exactly the right balance of native code size vs performance?

There is definitely at least one reasonable alternative. You could just box the value type wrapper like many parts of the .NET BCL already/still do.

There may be places where the cost of boxing is small enough that code sharing would be desirable but in general boxing is costly enough that code sharing isn't a reasonable option. I don't think anybody wants List<int> to be a list of objects rather than a list of ints.

Currently the mechanism for achieving the inlining is "wrap with a struct", which is no less cryptic than "seal your class".

IMO the problem is not that it is cryptic, the problem is that sealed can be and is used for other purposes. I see no reason why making a class sealed because I don't want 3rd parties to inherit from it would result in more code being generated, code that most of the time is not needed.

That would be very cool. So the easy case would be if the fields match exactly?

They need to have identical layout but they don't need to match exactly. For example, most List<int> and List<SomeInt32BasedEnum> methods should generate identical code.

@benaadams
Contributor

There is definitely at least one reasonable alternative. You could just box the value type wrapper

There's still full backwards compat with .NET Framework 1.0 and 1.1 so you can use ArrayList for your ints if you want this behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment