
New IL instruction for typeswitch #12260

Open
gafter opened this issue Mar 13, 2019 · 31 comments


gafter commented Mar 13, 2019

It would be very convenient to have a CLR instruction for switching on the type of an object. The C# compiler could use this to generate code for a pattern-matching switch statement. The IL instruction would be followed by a series of (type, label) pairs. At runtime it would consume its operand, determine the first type in the list that the object inherits from, and jump to the corresponding label with a reference of the appropriate type on the stack. It would be required to operate in constant amortized time, no matter the number of types and whether or not those types are sealed.

We currently generate a series of type tests and branches for this case.

It is intended that this instruction would run in constant time for user-written types.
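The semantics just described can be modeled with a short sketch. Python is used here purely as a neutral notation, and the names `type_switch`, `cases`, and `default` are illustrative, not part of any proposed API:

```python
# Model of the proposed semantics: dispatch to the first (type, handler)
# pair whose type the operand is an instance of (a subtype test, like IL
# `isinst`); fall through to `default` when nothing matches.
def type_switch(obj, cases, default):
    for t, handler in cases:       # first match wins, in declaration order
        if isinstance(obj, t):
            return handler(obj)
    return default(obj)

result = type_switch(
    "hello",
    [(int, lambda x: "int"), (str, lambda s: "str:" + s)],
    lambda x: "no match",
)
```

The proposal is that the runtime perform this dispatch in amortized constant time rather than the linear scan written above.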

@AaronRobinsonMSFT AaronRobinsonMSFT changed the title New IL instruction for typeswich New IL instruction for typeswitch Mar 14, 2019

davidwrighton commented Mar 15, 2019

Why would this be better than a sequence of if/else pairs? Is this about reducing IL size? Do you expect the JIT to have a more optimal implementation than the C# compiler generating the equivalent of the code below?

if (o.GetType() == typeof(string))
{
    ...
}
else if (o.GetType() == typeof(object))
{
    ...
}
else
{
    ...
}

@jzabroski
Contributor

@davidwrighton For literature on compiler optimization of pattern matching, the following two OCaml papers provide the requisite background:

Compiling Pattern Matching to Good Decision Trees
Optimizing Pattern Matching

Further, by hoisting this to a logical IL instruction, it is conceivable that really large F#/C# data science algorithms utilizing complex pattern matching could benefit from hot-path optimization of the compiled decision tree. I realize C# only implements basic switch-on-type pattern matching right now, but the general benefits extend to further optimizations. For OOP type matching like switch-on-type, the compiler can specifically reason that:

a) there are significantly more "default case" type matches that fall through to the default case
b) it is probably more reasonable to optimize for the default case to be evaluated first

The only remaining question is: if you're going to add an instruction, why take only one type parameter? The IL instruction should allow stacking it so it works as a decision tree. For a view into how OCaml does this, see the free chapter 23 of Real World OCaml: https://v1.realworldocaml.org/v1/en/html/the-compiler-backend-byte-code-and-native-code.html

@gafter What do you think?


gafter commented Mar 15, 2019

@davidwrighton

Here are two reasons:

  1. The IL sequence you provided is only correct for sealed types. Even for sealed types, it would not be a binary-compatible code-generation strategy in the face of types possibly being changed from sealed to unsealed. Only the runtime knows whether the types are sealed.
  2. Only the runtime can know which concrete types occur most frequently at runtime in the context of this kind of instruction, and could therefore perform the type tests in an order which minimizes overall runtime. I would not expect the runtime to reorder a series of if-then-else type tests on the basis of frequency of occurrence of concrete types (though a runtime could indeed do that), but selecting locally optimal code for a single typeswitch instruction based on knowledge available at runtime (or on the basis of profiles) seems more plausible.

[edited 2019-11-04 to add]
It is intended that the instruction run in amortized constant time. A straightforward translation of the sequence of type tests would not do that.
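The first point can be illustrated with a minimal sketch (Python as neutral notation; the class names are made up): an exact-type comparison, the analogue of `o.GetType() == typeof(Base)`, silently stops matching once a derived type appears, while a subtype test, the analogue of `isinst`, keeps working.

```python
class Base: pass
class Derived(Base): pass  # e.g. Base was changed from sealed to unsealed

o = Derived()
exact_match = type(o) is Base        # exact-type test: misses the subtype
subtype_match = isinstance(o, Base)  # subtype test: still matches
```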


gafter commented Mar 16, 2019

@jzabroski Thanks for those references!


jkotas commented Mar 16, 2019

The IL sequence you provided is only correct for sealed types

Ok, you can flip the sequence to use isinst instead. (The JIT has an optimization that turns isinst into a simple type check for sealed types.)

Only the runtime can know which concrete types occur most frequently at runtime

The most efficient way to do a type switch over a large number of types is a hashtable. The runtime can make this work well by providing an efficient way to get a type's hash code, if that is not efficient enough today.

Other than that, the compiler or the compiler support library can build a hashtable about as efficient as the runtime itself could. I think it is likely that building the hashtable in the compiler or the compiler support library would give you more flexibility for implementing more complex matching patterns efficiently.
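The hashtable idea can be sketched as follows (Python as neutral notation; this is an illustrative model, not the runtime's or any prototype's actual implementation): cache the matched case index under the operand's exact runtime type, so only the first occurrence of each concrete type pays for a linear subtype scan.

```python
class TypeSwitchDispatch:
    def __init__(self, case_types):
        self.case_types = case_types
        self.cache = {}  # exact runtime type -> case index (-1 for no match)

    def get_index(self, obj):
        t = type(obj)
        if t not in self.cache:
            # First time this exact type is seen: linear subtype scan.
            self.cache[t] = next(
                (i for i, c in enumerate(self.case_types) if isinstance(obj, c)),
                -1)
        return self.cache[t]  # later lookups cost one hash probe

dispatch = TypeSwitchDispatch([int, str, list])
```

A real implementation would additionally have to cooperate with type unloading, since the cached keys keep types reachable.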


gafter commented Mar 17, 2019

The most efficient way to do a type switch over large number of types is hashtable.

Are you suggesting that we build a hashtable dynamically and incrementally? Wouldn't that prevent the unloading of types once they appear as keys in that table?

I think it is likely that building the hashtable in the compiler or the compiler support library would give you more flexibility for implementing more complex matching patterns efficiently

That would remove all flexibility that the runtime would otherwise have to implement type switches in the most efficient manner for the platform.


jkotas commented Mar 17, 2019

Wouldn't that prevent the unloading of types once they appear as keys in that table?

Yes, the hashtable would need to be aware of type unloading. Hashtables aware of type unloading are not unusual, and the primitives available for building them are not great. That is something we should improve.

That would remove all flexibility that the runtime would otherwise have to implement type switches in the most efficient manner for the platform.

As C# pattern matching is getting more complex, I think it is starting to look pretty similar to dynamic from the runtime's point of view. The runtime support for dynamic is in a separate library, separate from the JIT and core runtime. It is pretty efficient, it tunes itself for the most frequently used signatures where possible, etc. In theory, it could have been a bit more efficient if it were all built into the JIT. In practice, I do not think it would actually be more efficient: we would spend all our time dealing with the complexity of building it into the JIT, and there would be no time or appetite left to fine-tune the performance. Note that we have done work in the runtime for dynamic by introducing and improving the primitives that it is built out of; e.g., we have done a lot of performance work on delegates.

We should consider taking a similar route for pattern matching as we took for the dynamic keyword: identify the key primitive operations that the core runtime needs to provide in order to make it work great, but keep the overall runtime support for pattern matching in a separate library.


gafter commented Mar 17, 2019

@jkotas You may have convinced me.

It looks like the break-even point for using a hashtable is about 20 types in a switch. See dotnet/roslyn#31515 (comment) for a benchmark.

I'm not sure how the jit could do better. Without evidence to the contrary, perhaps this issue should be closed?


gafter commented Mar 17, 2019

Improved the prototype library so that the break-even point for using a hashtable is about 10 types in a switch. See https://github.com/gafter/TypeSwitch/blob/master/TypeSwitch/TypeSwitchDispatch.cs

@gafter gafter closed this as completed Mar 17, 2019
@gafter gafter reopened this May 10, 2019

gafter commented May 10, 2019

I believe it is more appropriate for this kind of support to be directly in the runtime, where it can be optimized dynamically based on the program's actual behavior. The library approach fixes the time-space tradeoff at compile-time and makes it nearly impossible for the runtime to do any better.


GSPP commented May 11, 2019

Maybe we can teach the JIT to recognize the pattern:

if (type == typeof(A)) ...
else if (type == typeof(B)) ...
...

No IL changes and the IL works on older runtimes.


gafter commented May 11, 2019

They are subtype tests, not exact type tests.


jkotas commented May 11, 2019

The runtime/JIT does recognize many framework types as intrinsics and optimizes them in custom ways. The library approach does not prevent the time-space tradeoff from getting optimized based on the actual program behavior if the type is recognized as intrinsic.


gafter commented May 12, 2019

@jkotas Can you please give me an example of a case in which the runtime eliminates a static field due to replacing it and its use with intrinsic behavior?


jkotas commented May 13, 2019

The JIT is able to inline the value of readonly static fields. It does not eliminate it. The field is still there, but it is not used on the hot path.


gafter commented May 13, 2019

@jkotas Making an appropriate time-space tradeoff would require eliminating the static field (which uses a lot of space to get reasonable performance). If the runtime does not do that kind of thing, then a new instruction would be an appropriate approach.

@jzabroski
Contributor

@jkotas Is there any way I can check whether the JIT has inlined something?

For example from some code I have written, how would I tell this has been inlined:

static class Constructor<T>
    where T : class, new()  // T must be a reference type for ?? to apply
{
    // Cast the lambda to an expression tree, then compile it.
    public static readonly Func<IScope, T> Empty =
        ((Expression<Func<IScope, T>>)(scope => scope.Exists<T>() ?? new T())).Compile();
}

Similarly, is there any "Auditing Machine" I can query inside the runtime to determine how far away the code is/was from being JIT'ed?


jkotas commented May 14, 2019

@jzabroski Your question is not relevant to this issue. Please open a new issue on this.

Making an appropriate time-space tradeoff would require eliminating the static field

@gafter I do not think elimination of the static field would be required for a decent time-space tradeoff.

Designs that require low-level implementation are harder to evolve and improve. They may be better in theory, but we find them to be worse in practice because of the experience and amount of work required to optimize them.

I do not have a problem with this issue being open as a new IL instruction proposal. But it is not obvious to me that implementing this as a new IL instruction will achieve the best results.


gafter commented May 31, 2019

Dumping my notes for future reference

A code sequence

push e

typeswitch 3
  type1 label1
  type2 label2
  type3 label3

fallthrough: // original e on stack
...
label1: // type1 on stack
...

Would be defined to be equivalent to the following, except required to execute in (amortized) constant time:

push e

dup
isinst type1
brfalse next1
isinst type1
goto label1
next1:

dup
isinst type2
brfalse next2
isinst type2
goto label2
next2:

dup
isinst type3
brfalse next3
isinst type3
goto label3
next3:

fallthrough: // original e on stack
...
label1: // type1 on stack
...


GSPP commented Jun 1, 2019

Let's make the JIT recognize that IL pattern then. No need for new opcodes.


omariom commented Jun 1, 2019

Let's make the JIT recognize that IL pattern then. No need for new opcodes.

Then all the patterns the JIT recognizes must be documented,
as C# is not the only language generating IL.
And that would make IL the RISC layer, with the recognized patterns becoming the CISC layer.


gafter commented Jun 1, 2019

Let's make the JIT recognize that IL pattern then. No need for new opcodes.

The problem is that this pattern must be reliably recognized by all JITs, not just one of them under some circumstances. Otherwise programmers cannot rely on the performance in writing their algorithms. Adding a new instruction adds confidence that JITs recognize it.

@AndyAyersMS
Member

It seems like this is something that might fit fairly well as an intrinsic. No new IL or IL pattern matching would be needed, and it could be engineered so that jit/runtime optimization was possible but also optional.

We'd agree on some BCL method(s) for this operation, e.g. int TypeSwitch(Type t, Type x_0, Type x_1, ...), and find an efficient way to cope with the variadic cases (one can always just implement up to some N, and chain calls to handle cases larger than N, I suppose). This would return the index i of the lowest-numbered x_i that satisfies t isinst x_i, or -1 if none satisfy.

This method (or set of methods) would be targeted by C#, which when doing pattern matching would follow this up with an actual switch to direct control to the appropriate label. There might be some complexities fitting this in with side effect ordering -- I don't know how much code can run during the matching phase currently.

The TypeSwitch method(s) would be fully implemented in the BCL by default. So no jit or runtime work would be required. This implementation could be via hashtable, linear search, etc, whatever seems suitable.

The TypeSwitch method(s) could also be marked with [Intrinsic] so that they could be recognized by the jit as methods with known behavior. Doing this allows the possibility for special handling, but does not create an obligation. Then when the jit encountered a call to TypeSwitch where the x_i are all jit-time known types, the jit could interact with the runtime to see if it was possible to optimize the test away entirely (say, t's type is known at jit time), or if it made sense to peel off some of the cases for eager testing, or build a full decision tree based on what is known about the likely types of t and the class relationships of the x_i, or fuse the call and following switch into a hash-based indirect jump sequence. If the x_i are not jit-time constants, or the evidence for benefit from specialization is weak, the jit could just defer to calling the helper.
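Under the stated assumptions, the shape of such a helper and its caller might look like this sketch (Python as neutral notation; `type_switch_index` and `describe` are hypothetical names, not a real BCL API): the helper returns the index of the first matching type, or -1, and the caller then switches on that index.

```python
def type_switch_index(obj, *case_types):
    # Return the index i of the lowest-numbered case type that obj is an
    # instance of, or -1 if none match.
    for i, t in enumerate(case_types):
        if isinstance(obj, t):
            return i
    return -1

def describe(o):
    # The caller's follow-up switch, directing control by index.
    i = type_switch_index(o, bool, int, str)
    if i == 0:
        return "bool"
    if i == 1:
        return "int"
    if i == 2:
        return "str"
    return "unmatched"
```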

@jzabroski
Contributor

That is a very pragmatic middle ground. Also, by hoisting TypeSwitch up to the intrinsic level, it might even be possible to target completely different architectures, such as a C#-to-FPGA synthesizer, since you have not yet dipped into the needless abstraction of CIL registers (and even then, the registers become fields in a special component named device in the FPGA). Your approach is likely the most portable, and it meets the specific use case of "emit an event if not optimized", so that developers can see in BenchmarkDotNet (or whatever) what is going on and on what basis they can make optimizer assumptions.


gafter commented Jun 6, 2019

@AndyAyersMS

... could be engineered so that jit/runtime optimization was possible but also optional ...

The "non-optimized" path is likely to be significantly worse than the straightforward IL sequence (for a few reasons*). That is very unfortunate, though possibly not fatal. The fatal flaw in this approach is that it relies on an optional optimization for an asymptotic performance improvement. A straightforward complexity analysis demonstrates that you cannot reliably get that asymptotic improvement for the program as a whole unless the optimization is reliably applied (nearly) everywhere it is applicable. That is why I believe a new IL instruction is the best approach - it guarantees that the runtime recognizes the "pattern". I'll explain in more detail when we meet on Monday afternoon.

If the performance is linear time (in the number of types at the call site) when not optimized but constant (amortized) time (which is what we want) when "optimized", then it is not something that we would be likely to rely on in designing language features (though, as I will explain, we need to rely on that in designing upcoming features), and probably not something we would benefit from using in normal C# code-gen scenarios.

*Some of the reasons that your proposed intrinsic is likely to be significantly slower than the (also unfortunately linear-time) straightforward IL sequence:

  • In the general (most common) case the intrinsic would require creating an object (garbage) for the params array.
  • The type tests performed inside the intrinsic are likely to use reflection without any knowledge of the specific types passed as parameters or the knowledge of the type of the input value, all of which are known at the call site. For example it cannot produce better code for sealed classes.
  • Since the method has no way of distinguishing one call site from another, there is no basis on which the method could build a cache. However, see TypeSwitchDispatch, which is about the best I believe a library can do.
  • Although the method performs the type tests in order to return an integer to the caller indicating which test succeeded, after a value is returned the caller must then perform a (checked) cast to that nth type again, so the code performs an extra type test that should not be needed.


jkotas commented Jun 6, 2019

It is fine to add a guarantee that the generated code will always be amortized constant time and never linear (for a large number of types; it is not observable for a small number of types), for either the API or the IL instruction design.

For non-optimized code, the JIT is going to call a method to implement this, even if it is a special IL instruction. For optimized code, the JIT can recognize the method call as an intrinsic and achieve the same outcome as an IL instruction; again, no fundamental difference between the two designs.

I believe that the interesting optimizations for the API vs. IL instruction design discussion are:

  • Memory footprint optimizations - make the supporting runtime structures as small as possible.
  • Constant multiplier optimizations – full inlining of the switch.
  • Constant and type propagation optimizations - partially or fully eliminating the switch when the JIT can reason about the type of the argument.
  • Dynamic tuning optimizations - adjusting the code based on dynamic behavior of the program.

The question to ask is whether introducing a new IL instruction is worth the trouble and whether it will make the above set of optimizations significantly easier to do.

@cartermp

Adding @dsyme to this thread.


dsyme commented Jun 11, 2019

A primitive or intrinsic to aid efficient compilation of pattern matching would be very useful, and clearly is part of an improved overall design for .NET in the 21st century. I thought a lot about proposing such an intrinsic around 2001-03, as we worked through the foundations necessary for F# (generics etc.), but didn't iterate to a concrete design.

  1. It would be great if the switch were over a compact set of integers. I thought about a design which would dynamically augment the method table with extra slots for a compact integer tag (e.g. one slot for each commonly used switch) to avoid any hashing, but there are obvious tradeoffs and difficulties with that (method table size, NGEN etc.). The runtime gurus on the thread would make much better decisions than me on this.

  2. On the same topic, for large union switches F# stores an extra integer tag in the object to allow switch-based dispatch on that tag.

  3. In advanced ML-family languages supporting GADTs, each branch of a switch in pattern matching can potentially introduce localized type information; e.g., the type system means you "know" that on one particular branch a type variable T has definite type string. It's possible that F#, C# and the CLR should eventually support this. Introducing this contextual type information is much more suited to a new opcode, though it could also be done by recognizing an intrinsic.

@YairHalberstadt
Contributor

How would you guarantee constant-time switches for n types, each of which has thousands of implementors/derived types?

Eg:

private string Switch(object o)
{
    return o switch
    {
        IReadOnlyList<object> _ => ...,
        IList _ => ...,
        IEnumerable _ => ...,
        ISerializable _ => ...,
        // etc.
    };
}

@YairHalberstadt
Contributor

Ah, I see that we cache the index at which a type was found, so that after first use it is constant time.
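That caching behavior can be demonstrated with a small sketch (Python as neutral notation, with invented names): the linear scan over the case list runs once per distinct concrete type, and repeat lookups hit the per-type cache.

```python
class CountingDispatch:
    def __init__(self, case_types):
        self.case_types = case_types
        self.cache = {}   # exact type -> case index
        self.scans = 0    # number of linear scans performed

    def get_index(self, obj):
        t = type(obj)
        if t not in self.cache:
            self.scans += 1  # only the first occurrence of t pays for this
            self.cache[t] = next(
                (i for i, c in enumerate(self.case_types) if isinstance(obj, c)),
                -1)
        return self.cache[t]

d = CountingDispatch([dict, list, tuple])
first = d.get_index([1, 2])   # triggers the one-time scan for list
second = d.get_index([3])     # served from the cache
```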

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020

gafter commented Jul 29, 2020

This issue tracks the same request as #29002. I believe the current consensus is that it can be done as an API in which the set of types to match is passed as generic type arguments (e.g. the elements of a tuple type). See https://github.com/gafter/TypeSwitch/blob/master/TypeSwitch/TypeSwitchDispatch.cs for a prototype that does not address the issue mentioned in #29002 (comment)
