RyuJIT: Guess unique interface for guarded devirtualization #37563

EgorBo · 2020-06-07T14:00:32Z

A good use-case for the Guarded Devirtualization is a quite popular case when an interface has only one implementation, I noticed that this case wasn't finished (getUniqueImplementingClass(objClass) was commented) and decided to try to add a super-naive implementation for getUniqueImplementingClass in VM.

Example:

public static class Tests
{
    public static int CallInterfaceMethod(IAnimal animal)
    {
        return animal.GetLegsCount();
    }
}


public interface IAnimal
{
    int GetLegsCount();
}

// the only IAnimal implementation:
public class Spider : IAnimal
{
    public int GetLegsCount() => 8;
}

Current codegen for CallInterfaceMethod

G_M37771_IG01:
       4883EC28             sub      rsp, 40
G_M37771_IG02:
       49BB4800671DFC7F0000 mov      r11, 0x7FFC1D670048
       48B84800671DFC7F0000 mov      rax, 0x7FFC1D670048
       FF10                 call     qword ptr [rax]IAnimal:GetLegsCount():int:this
       90                   nop      
G_M37771_IG03:
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code: 32

^ nothing is devirtualized, just a regular virtual call.

Codegen with COMPlus_JitEnableGuardedDevirtualization=1 and this PR:

G_M37771_IG01:
       4883EC28             sub      rsp, 40
G_M37771_IG02:
       49BB98918B1DFC7F0000 mov      r11, 0x7FFC1D8B9198
       4C3919               cmp      qword ptr [rcx], r11
       7507                 jne      SHORT G_M37771_IG04
G_M37771_IG03:
       B808000000           mov      eax, 8  ;;; <------- Spider:GetLegsCount() is inlined!
       EB16                 jmp      SHORT G_M37771_IG05
G_M37771_IG04:
       49BB4800E51CFC7F0000 mov      r11, 0x7FFC1CE50048
       48B84800E51CFC7F0000 mov      rax, 0x7FFC1CE50048
       FF10                 call     qword ptr [rax]IAnimal:GetLegsCount():int:this
G_M37771_IG05:
       90                   nop      
G_M37771_IG06:
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code: 54

So guarded devirtualization transformed:

return animal.GetLegsCount();

to:

if (animal is Spider) return 8; // Spider:GetLegsCount() is inlined!
else return animal.GetLegsCount(); // fallback

Now let's add one more IAnimal implementation and check codegen again:

public class Spider : IAnimal
{
    public int GetLegsCount() => 8;
}

public class Dog : IAnimal
{
    public int GetLegsCount() => 4;
}

G_M37771_IG01:
       4883EC28             sub      rsp, 40
G_M37771_IG02:
       49BB4800E51CFC7F0000 mov      r11, 0x7FFC1CE50048
       48B84800E51CFC7F0000 mov      rax, 0x7FFC1CE50048
       FF10                 call     qword ptr [rax]IAnimal:GetLegsCount():int:this
       90                   nop      
G_M37771_IG03:
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code: 32

Interface has two implementations - the optimization gives up as expected.

cc @AndyAyersMS

PS: this code is Debug/Checked configuration only.

AndyAyersMS · 2020-06-08T17:52:20Z

I would imagine this is potentially quite expensive at jit time. This does not seem like the kind of information that should be searched for.

At one point, @davidwrighton had a prototype implementation for getUniqueImplementingClass that tracked this information as classes were loaded. I think that approach, or perhaps something profile based, is likely to be a better path to enabling guarded devirtualization.

That being said, something like this might prove useful as part of an exploration of the impact of guarded devirt on steady-state performance and code size, as a source of plausible guesses.

Any thoughts on how to characterize diffs? PMI/SPMI are not useful here.

davidwrighton · 2020-06-08T18:38:38Z

Mmm, I ended up restricting the work that got checked in from trying to implement getUniqueImplementingClass to just the logic to build the data structure that would support gathering that data in a memory safe fashion (CrossLoaderAllocatorHash), and ran out of time to implement the actual getUniqueImplementingClass. As we're pulling back into having some resources for working on this sort of thing, it is likely worth bringing that back and trying some benchmarking.

EgorBo · 2020-06-08T19:07:41Z

@AndyAyersMS @davidwrighton initially I wanted to add a special ILLink substep (extension) to analyze the whole app at once and store info in a file (basically a list of "unique" implementations) and read that file in JIT. If you think it's a good idea I can make a prototype 🙂

We have a similiar ILLink substep in Xamarin.iOS: https://github.com/xamarin/xamarin-macios/blob/5fb09b1b841c073b638ff66d07a779a0951dadb5/tools/linker/MonoTouch.Tuner/SealerSubStep.cs#L11-L13 (we mark all leaf implementations with "sealed" since we can't jit on iOS)

EgorBo · 2020-06-08T20:22:24Z

Also, there in ILLink we can mark empty methods (in unique implementations) to focus on them as a start, e.g.

foo.Dispose();

to:

if (foo is FooImpl) no-op; // FooImpl has empty Dispose impl
else foo.Dispose();

optimized:

if (foo is not FooImpl) foo.Dispose(); // goto BB with 0 weight (rarely executed)

AndyAyersMS · 2020-06-09T02:46:23Z

... various interesting ideas to leverage the linker's analysis capabilities

It's tricky for RyuJit to initiate any sort of type discovery, so you'd need to read in those linker produced data files on the runtime side and decide how to robustly identify types via text so the runtime can map them to the right data structures and then relay that data to the jit. Since we're just experimenting and you're already touching the assemblies in the linker and we're not worried that much yet about jit-time perf, seems like it might be simpler to convey this information via custom attributes.

We've done a bit of brainstorming internally to try and address a similar problem: how to convey analysis data from AOT to JIT, in the context of crossgen2. No concrete proposals as of yet.

Another idea along these lines is to do it all in process, say by leveraging the TieredPGO stuff I added in #36887 and do some sort of method table profiling, and make that data available in real time back to Tier1. I have a bit of experience with probabilistic online algorithms that try and determine the most frequently occurring element in a stream using a fixed amount space, so it might be interesting to experiment with something like that (or similarly, try and tap into the VSD cache and see what sort of cell is in place).

Ultimately even when we have a decent source of predictions we may not want to act on all of them. We may need profile data to or may need to wait for deopt to be able to really take advantage. Hard to say for sure without data.

EgorBo · 2020-06-09T19:48:23Z

@AndyAyersMS I've changed the implementation to rely on custom attributes, e.g.:

[UniqueImpl(typeof(Spider))]
public interface IAnimal
{
    int GetLegsCount();
}

and rely on ILLink to define and use it where needed. it's not a perfect solution obviously but it's a way better than iterating all types each call. I understand that there should be a more complicated solution under the hood, ideally with the ability to de-optimize code if the assumption that the impl is unique becomes invalid for a method. Anyway will try to come up with some ILLink task + find related benchmarks.

MichalStrehovsky · 2020-06-10T07:49:50Z

(Accidentally hit close...)

Would it make sense to also allow a bool flag on the an attribute that says: this is the only implementation, and I know the app doesn't use ref emit and Assembly.Load, hence you don't need to guard?

EgorBo · 2020-06-10T09:16:02Z

(Accidentally hit close...)

Would it make sense to also allow a bool flag on the an attribute that says: this is the only implementation, and I know the app doesn't use ref emit and Assembly.Load, hence you don't need to guard?

Sounds like a good idea

EgorBo added 2 commits June 7, 2020 13:21

Add getUniqueImplementingClass call to jitinterface

b199388

Implement getUniqueImplementingClass

2c6b351

Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 7, 2020

Implement getUniqueImplementingClass

e88ec6b

MichalStrehovsky closed this Jun 10, 2020

MichalStrehovsky reopened this Jun 10, 2020

Address feedback

a38b93d

EgorBo closed this Aug 6, 2020

EgorBo mentioned this pull request Oct 19, 2020

Unclear logic in Compiler::impDevirtualizeCall #43607

Closed

dotnet locked as resolved and limited conversation to collaborators Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RyuJIT: Guess unique interface for guarded devirtualization #37563

RyuJIT: Guess unique interface for guarded devirtualization #37563

EgorBo commented Jun 7, 2020 •

edited

AndyAyersMS commented Jun 8, 2020

davidwrighton commented Jun 8, 2020

EgorBo commented Jun 8, 2020 •

edited

EgorBo commented Jun 8, 2020 •

edited

AndyAyersMS commented Jun 9, 2020

EgorBo commented Jun 9, 2020

MichalStrehovsky commented Jun 10, 2020

EgorBo commented Jun 10, 2020

RyuJIT: Guess unique interface for guarded devirtualization #37563

RyuJIT: Guess unique interface for guarded devirtualization #37563

Conversation

EgorBo commented Jun 7, 2020 • edited

AndyAyersMS commented Jun 8, 2020

davidwrighton commented Jun 8, 2020

EgorBo commented Jun 8, 2020 • edited

EgorBo commented Jun 8, 2020 • edited

AndyAyersMS commented Jun 9, 2020

EgorBo commented Jun 9, 2020

MichalStrehovsky commented Jun 10, 2020

EgorBo commented Jun 10, 2020

EgorBo commented Jun 7, 2020 •

edited

EgorBo commented Jun 8, 2020 •

edited

EgorBo commented Jun 8, 2020 •

edited