Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Devirtualize EqualityComparer for reference types #76714

Merged
merged 1 commit into from
Oct 10, 2022

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Oct 6, 2022

I noticed that we give up on reference types for EqualityComparer.Default<T>/Comparer.Default<T>, e.g.:

static bool Foo(string s1, string s2) => 
    EqualityComparer<string>.Default.Equals(s1, s2);

used to emit:

; Method ConsoleApp5.Program:Foo(System.String,System.String):bool
G_M38758_IG01:
       sub      rsp, 40
       mov      r8, rcx
       mov      rax, rdx
G_M38758_IG02:
       mov      rcx, 0xD1FFAB1E      ; const ptr
       mov      rcx, gword ptr [rcx]
       mov      rdx, r8
       mov      r8, rax
       call     [System.Collections.Generic.GenericEqualityComparer`1[System.__Canon]
:Equals(System.__Canon,System.__Canon):bool:this]
       nop      
G_M38758_IG03:
       add      rsp, 40
       ret      
; Total bytes of code: 41

New codegen:

; Method ConsoleApp5.Program:Foo(System.String,System.String):bool
G_M38758_IG01:
       sub      rsp, 40
G_M38758_IG02:
       test     rcx, rcx
       je       SHORT G_M38758_IG08
G_M38758_IG03:
       test     rdx, rdx
       je       SHORT G_M38758_IG07
G_M38758_IG04:
       mov      r11, 0xD1FFAB1E      ; code for System.IEquatable`1[__Canon][System.__Canon]:Equals
       call     [r11]System.IEquatable`1[System.__Canon]:Equals(System.__Canon):bool:this
G_M38758_IG05:
       nop      
G_M38758_IG06:
       add      rsp, 40
       ret      
G_M38758_IG07:
       xor      eax, eax
       jmp      SHORT G_M38758_IG05
G_M38758_IG08:
       test     rdx, rdx
       jne      SHORT G_M38758_IG09
       mov      eax, 1
       jmp      SHORT G_M38758_IG05
G_M38758_IG09:
       xor      eax, eax
       jmp      SHORT G_M38758_IG05
; Total bytes of code: 53

So now GenericEqaulityCompare<string>.get_Default().Equals(string,string) is inlined. Although, it still contains a runtime lookup 馃 - it seems that inliner doesn't propagate the known generic context here.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 6, 2022
@ghost ghost assigned EgorBo Oct 6, 2022
@ghost
Copy link

ghost commented Oct 6, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

I noticed that we give up on reference types for EqualityComparer.Default<T>/Comparer.Default<T>, e.g.:

static bool Foo(string s1, string s2) => 
    EqualityComparer<string>.Default.Equals(s1, s2);

used to emit:

; Method ConsoleApp5.Program:Foo(System.String,System.String):bool
G_M38758_IG01:
       sub      rsp, 40
       mov      r8, rcx
       mov      rax, rdx
G_M38758_IG02:
       mov      rcx, 0xD1FFAB1E      ; const ptr
       mov      rcx, gword ptr [rcx]
       mov      rdx, r8
       mov      r8, rax
       call     [System.Collections.Generic.GenericEqualityComparer`1[System.__Canon]
:Equals(System.__Canon,System.__Canon):bool:this]
       nop      
G_M38758_IG03:
       add      rsp, 40
       ret      
; Total bytes of code: 41

New codegen:

; Method ConsoleApp5.Program:Foo(System.String,System.String):bool
G_M38758_IG01:
       sub      rsp, 40
G_M38758_IG02:
       test     rcx, rcx
       je       SHORT G_M38758_IG08
G_M38758_IG03:
       test     rdx, rdx
       je       SHORT G_M38758_IG07
G_M38758_IG04:
       mov      r11, 0xD1FFAB1E      ; code for System.IEquatable`1[__Canon][System.__Canon]:Equals
       call     [r11]System.IEquatable`1[System.__Canon]:Equals(System.__Canon):bool:this
G_M38758_IG05:
       nop      
G_M38758_IG06:
       add      rsp, 40
       ret      
G_M38758_IG07:
       xor      eax, eax
       jmp      SHORT G_M38758_IG05
G_M38758_IG08:
       test     rdx, rdx
       jne      SHORT G_M38758_IG09
       mov      eax, 1
       jmp      SHORT G_M38758_IG05
G_M38758_IG09:
       xor      eax, eax
       jmp      SHORT G_M38758_IG05
; Total bytes of code: 53

So now GenericEqaulityCompare<string>.get_Default().Equals(string,string) is inlined. Although, it still contains a runtime lookup 馃

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Oct 6, 2022

Ah, I assume we currently hit the same problem with Dynamic PGO:

using System.Runtime.CompilerServices;

class Program
{
    static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            Test(new Foo<string>());
            Thread.Sleep(16);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static bool Test(Foo<string> foo) => foo.CallEquals("1", "2");
    }

    public class Foo<T> where T : IEquatable<T>
    {
        public virtual bool CallEquals(T a1, T a2) => a1.Equals(a2);
    }
}

With PGO enabled we get this for Tier1 for Test:

; Assembly listing for method ConsoleApp5.Program:Test(ConsoleApp5.Foo`1[System.String]):bool
; Tier-1 compilation
; with Dynamic PGO: edge weights are valid, and fgCalledCount is 87
G_M27674_IG01:
       sub      rsp, 40
G_M27674_IG02:
       mov      rdx, 0xD1FFAB1E      ; ConsoleApp5.Foo`1[System.String]
       cmp      qword ptr [rcx], rdx
       jne      SHORT G_M27674_IG05
       mov      rdx, 0xD1FFAB1E      ; "1"
       mov      r8, 0xD1FFAB1E      ; "2"
       call     [ConsoleApp5.Foo`1[System.__Canon]:CallEquals(System.__Canon,System.__Canon):bool:this]
G_M27674_IG03:
       movzx    rax, al
G_M27674_IG04:
       add      rsp, 40
       ret      
G_M27674_IG05:
       mov      rdx, 0xD1FFAB1E      ; "1"
       mov      r8, 0xD1FFAB1E      ; "2"
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+48H]
       call     [rax+20H]ConsoleApp5.Foo`1[System.__Canon]:CallEquals(System.__Canon,System.__Canon):bool:this
       jmp      SHORT G_M27674_IG03
; Total bytes of code 85

while it's expected to be just return false in the guarded path.
Meaning we give up and not trying to propagate the known generic context during inlining.
cc @AndyAyersMS

@EgorBo EgorBo marked this pull request as ready for review October 6, 2022 16:00
@AndyAyersMS
Copy link
Member

Interesting -- do you think this change on its own will be worthwhile or should we look into why we can't further devirtualize?

@EgorBo
Copy link
Member Author

EgorBo commented Oct 6, 2022

Interesting -- do you think this change on its own will be worthwhile or should we look into why we can't further devirtualize?

I assume it's worthwhile even as is since it inlines null-checks, but, probably, it makes sense to find the reason why we don't devirtualize further indeed as part of this PR, I pinged you just in case if you hit this too/know simple answer 馃檪

@AndyAyersMS
Copy link
Member

Interesting -- do you think this change on its own will be worthwhile or should we look into why we can't further devirtualize?

I assume it's worthwhile even as is since it inlines null-checks, but, probably, it makes sense to find the reason why we don't devirtualize further indeed as part of this PR, I pinged you just in case if you hit this too/know simple answer 馃檪

I don't know offhand.

I'll take a deeper look if you like; I have some vague notes from long ago about cases where we could try and sharpen the context when inlining....

@build-analysis build-analysis bot mentioned this pull request Oct 6, 2022
2 tasks
@AndyAyersMS
Copy link
Member

In the first case the issue is that Equals contains a ldarga.1 (feeding a constrained callvirt) and this makes the inliner think the arg can be modified. So, it only uses the declared type for the arg instead of the actual type.

Importing BB03 (PC=000) of 'System.Collections.Generic.GenericEqualityComparer`1[System.__Canon]:Equals(System.__Canon,System.__Canon):bool:this'
    [ 0]   0 (0x000) ldarg.1
lvaGrabTemp returning 4 (V04 tmp2) called for Inlining Arg.
...
Importing BB05 (PC=016) of 'System.Collections.Generic.GenericEqualityComparer`1[System.__Canon]:Equals(System.__Canon,System.__Canon):bool:this'
    [ 0]  16 (0x010) ldarga.s 1
    [ 1]  18 (0x012) ldarg.2
    [ 2]  19 (0x013) constrained. (1B00002B) callvirt 0A0015E4
In Compiler::impImportCall: opcode is callvirt, kind=2, callRetType is bool, structSize is 0

impDevirtualizeCall: Trying to devirtualize interface call:
    class for 'this' is System.__Canon (attrib 20020000)
    base method is System.IEquatable`1[System.__Canon]::Equals
--- base class is interface
--- no derived method: object class was canonical
    Class not final or exact
Considering guarded devirtualization at IL offset 25 (0x19)
Not guessing for class or method: no GDV profile pgo data, or pgo disabled
INLINER: during 'impMarkInlineCandidate' result 'failed this call site' reason 'target not direct' for 'System.Collections.Generic.GenericEqualityComparer`1[System.__Canon]:Equals(System.__Canon,System.__Canon):bool:this' calling 'System.IEquatable`1[System.__Canon]:Equals(System.__Canon):bool:this'
INLINER: during 'impMarkInlineCandidate' result 'failed this call site' reason 'target not direct'

If I hack around that by telling the jit to ignore the ldarga during the IL scan, then things devirtualize nicely:

Importing BB05 (PC=016) of 'System.Collections.Generic.GenericEqualityComparer`1[System.__Canon]:Equals(System.__Canon,System.__Canon):bool:this'
    [ 0]  16 (0x010) ldarga.s 1
    [ 1]  18 (0x012) ldarg.2
    [ 2]  19 (0x013) constrained. (1B00002B) callvirt 0A0015E4
In Compiler::impImportCall: opcode is callvirt, kind=2, callRetType is bool, structSize is 0

impDevirtualizeCall: Trying to devirtualize interface call:
    class for 'this' is System.String [final] (attrib 20040010)
    base method is System.IEquatable`1[System.__Canon]::Equals
--- base class is interface
    devirt to System.String::Equals -- final class
               [000038] --CXG------                         *  CALLV stub int    System.IEquatable`1[__Canon][System.__Canon].Equals
               [000039] ---XG------ this                    +--*  IND       ref   
               [000036] -----------                         |  \--*  ADDR      long  
               [000035] -------N---                         |     \--*  LCL_VAR   ref    V00 this         
               [000037] ----------- arg1                    \--*  LCL_VAR   ref    V01 arg1         
    final class; can devirtualize
... after devirt...
               [000038] --CXG------                         *  CALL nullcheck int    System.String.Equals
               [000039] ---XG------ this                    +--*  IND       ref   
               [000036] -----------                         |  \--*  ADDR      long  
               [000035] -------N---                         |     \--*  LCL_VAR   ref    V00 this         
               [000037] ----------- arg1                    \--*  LCL_VAR   ref    V01 arg1         
INLINER: during 'impMarkInlineCandidate' result 'CheckCanInline Success' reason 'CheckCanInline Success' for 'System.Collections.Generic.GenericEqualityComparer`1[System.__Canon]:Equals(System.__Canon,System.__Canon):bool:this' calling 'System.String:Equals(System.String):bool:this'
INLINER: during 'impMarkInlineCandidate' result 'CheckCanInline Success' reason 'CheckCanInline Success'


STMT00009 ( 0x010[E-] ... ??? ) <- INLRT @ 0x000[E-]
               [000038] I-CXG------                         *  CALL nullcheck int    System.String.Equals (exactContextHnd=0x00007FFB6A4CAC91)
               [000039] ---XG------ this                    +--*  IND       ref   
               [000036] -----------                         |  \--*  ADDR      long  
               [000035] -------N---                         |     \--*  LCL_VAR   ref    V00 this         
               [000037] ----------- arg1                    \--*  LCL_VAR   ref    V01 arg1    

and the resulting codegen is:

G_M19911_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; size=4 bbWeight=1    PerfScore 0.25
G_M19911_IG02:              ;; offset=0004H
       4885C9               test     rcx, rcx
       743F                 je       SHORT G_M19911_IG12
                                                ;; size=5 bbWeight=1    PerfScore 1.25
G_M19911_IG03:              ;; offset=0009H
       4885D2               test     rdx, rdx
       7436                 je       SHORT G_M19911_IG11
                                                ;; size=5 bbWeight=0.50 PerfScore 0.62
G_M19911_IG04:              ;; offset=000EH
       483BCA               cmp      rcx, rdx
       742A                 je       SHORT G_M19911_IG10
                                                ;; size=5 bbWeight=0.50 PerfScore 0.62
G_M19911_IG05:              ;; offset=0013H
       8B4108               mov      eax, dword ptr [rcx+08H]
       3B4208               cmp      eax, dword ptr [rdx+08H]
       751E                 jne      SHORT G_M19911_IG09
                                                ;; size=8 bbWeight=0.42 PerfScore 2.49
G_M19911_IG06:              ;; offset=001BH
       488D410C             lea      rax, bword ptr [rcx+0CH]
       448B4108             mov      r8d, dword ptr [rcx+08H]
       4503C0               add      r8d, r8d
       4883C20C             add      rdx, 12
       488BC8               mov      rcx, rax
       FF15658E1600         call     [System.SpanHelpers:SequenceEqual(byref,byref,ulong):bool]
                                                ;; size=24 bbWeight=0.29 PerfScore 1.82
G_M19911_IG07:              ;; offset=0033H
       90                   nop
                                                ;; size=1 bbWeight=1    PerfScore 0.25
G_M19911_IG08:              ;; offset=0034H
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; size=5 bbWeight=1    PerfScore 1.25
G_M19911_IG09:              ;; offset=0039H
       33C0                 xor      eax, eax
       EBF6                 jmp      SHORT G_M19911_IG07
                                                ;; size=4 bbWeight=0.12 PerfScore 0.28
G_M19911_IG10:              ;; offset=003DH
       B801000000           mov      eax, 1
       EBEF                 jmp      SHORT G_M19911_IG07
                                                ;; size=7 bbWeight=0.08 PerfScore 0.18
G_M19911_IG11:              ;; offset=0044H
       33C0                 xor      eax, eax
       EBEB                 jmp      SHORT G_M19911_IG07
                                                ;; size=4 bbWeight=0.00 PerfScore 0.01
G_M19911_IG12:              ;; offset=0048H
       4885D2               test     rdx, rdx
       7507                 jne      SHORT G_M19911_IG13
       B801000000           mov      eax, 1
       EBDF                 jmp      SHORT G_M19911_IG07
                                                ;; size=12 bbWeight=0.00 PerfScore 0.00
G_M19911_IG13:              ;; offset=0054H
       33C0                 xor      eax, eax
       EBDB                 jmp      SHORT G_M19911_IG07
                                                ;; size=4 bbWeight=0    PerfScore 0.00

; Total bytes of code 88

I still don't see all this expansion as desirable, unless perhaps we know one or the other of the arguments and can trim out some of the paths (note there is a lot of redundancy here but we don't have the optimization abilities to clean it up, eg some sort of tail merge).

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Oct 7, 2022

Looks like the second example runs into the same problem:

Invoking compiler for the inlinee method Foo`1[System.__Canon]:CallEquals(System.__Canon,System.__Canon):bool:this :
IL to import:
IL_0000  0f 01             ldarga.s     0x1
IL_0002  04                ldarg.2     
IL_0003  fe 16 02 00 00 1b constrained. 0x1B000002
IL_0009  6f 04 00 00 0a    callvirt     0xA000004
IL_000e  2a                ret         

We're generally not able to determine during the IL prescan if a given ldarga will be "consumed" by the constrained callvirt and so not lead us to believe the arg can be modified. A more general solution would require the IL prescan to maintain an accurate stack, which means resolving all call tokens (or at least accessing some of the related info) so we can do accurate pops for calls.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 7, 2022

I still don't see all this expansion as desirable

I think we should just remove AggressiveInlining from here https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/EqualityComparer.cs#L119 so inliner will decide itself if it's needed e.g. the constant case you mentioned.

@AndyAyersMS
Copy link
Member

SPMI can't really test this change as any method that might benefit won't have the right info in its context. Can you run some PMI diffs.

An alternative fix to recognizing that ldarga feeds a constrained callvirt is to update the arg/local sig parsing for shared methods to take the exact context when we have it. But this might be complicated.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 7, 2022

@AndyAyersMS jus a few diffs: 11 improved, 11 regressedacross -f --pmi , biggest regression is

28 (16.97% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.Text.TextChange:Equals(System.Object):bool:this

Example of an improvement: https://www.diffchecker.com/DL6aKLo5

I wanted to improve it because it's emitted by Roslyn for records Equals/GetHashCode

@AndyAyersMS
Copy link
Member

An alternative fix to recognizing that ldarga feeds a constrained callvirt is to update the arg/local sig parsing for shared methods to take the exact context when we have it. But this might be complicated.

Been playing around with this and maybe it's not too bad. At least the initial bit of getting properly typed args for methods in generic classes when we have an exact context for them.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants