Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Index use can't be completely optimized away #11870

Open
AndyAyersMS opened this issue Jan 22, 2019 · 6 comments
Open

JIT: Index use can't be completely optimized away #11870

AndyAyersMS opened this issue Jan 22, 2019 · 6 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@AndyAyersMS
Copy link
Member

In the following example we'd like the jit to produce the same (or very similar code) for both Fint and Findex:

using System;

class X
{
    public static void Fint(Span<int> c)
    {
        for (int i = 1; i <= c.Length; i++)
        {
            c[c.Length - i] += 1;
        }
    }

    public static void Findex(Span<int> c)
    {
        for (int i = 1; i <= c.Length; i++)
        {
            c[new Index(i, fromEnd: true)] += 1;
        }
    }
}

However currently the Index ctor won't inline (see #11848) and the IL is not jit-friendly. If we revise it and then force it to inline, we still see some codegen deficiencies:

;; Fint

G_M7659_IG03:
       mov      r8d, edx
       sub      r8d, ecx
       cmp      r8d, edx
       jae      SHORT G_M7659_IG05
       movsxd   r8, r8d
       lea      r8, bword ptr [rax+4*r8]
       inc      dword ptr [r8]
       inc      ecx
       cmp      ecx, edx
       jle      SHORT G_M7659_IG03

;; Findex (with Index ctor changes)

G_M36427_IG03:
       test     ecx, ecx
       jl       SHORT G_M36427_IG08

G_M36427_IG04:
       mov      r8d, ecx
       not      r8d
       test     r8d, r8d
       jl       SHORT G_M36427_IG05
       jmp      SHORT G_M36427_IG06

G_M36427_IG05:
       not      r8d
       mov      r9d, edx
       sub      r9d, r8d
       mov      r8d, r9d

G_M36427_IG06:
       cmp      r8d, edx
       jae      SHORT G_M36427_IG09
       movsxd   r8, r8d
       lea      r8, bword ptr [rax+4*r8]
       inc      dword ptr [r8]
       inc      ecx
       cmp      ecx, edx
       jle      SHORT G_M36427_IG03

Root cause is that Index uses an inverted (~negative) value to encode "from end" and the jit can't propagate this from construction down to use.

Range prop might be the right place to address this but it currently does not model not. But doing that, opts would only kick in when Index is used as an array index.

Tentatively putting this in 3.0,but may defer depending on urgency of removing overhead from use of Index.

category:cq
theme:optimization
skill-level:expert
cost:medium

@AndyAyersMS
Copy link
Member Author

Also worth looking at the locally created clearly non-negative case:

using System;

class X
{
    public static void Fint(Span<int> c)
    {
        for (int i = 0; i < c.Length; i++)
        {
            c[i] += 1;
        }
    }

    public static void Findex(Span<int> c)
    {
        for (int i = 0; i < c.Length; i++)
        {
            c[new Index(i)] += 1;
        }
    }
}

Here the jit is currently (13ae47e) is unable to remove a redundant branch as there are complementary conditions (note IR is from from a similar case, may not match exactly if you compiled the above)

G_M15796_IG07:
       85C9                 test     ecx, ecx                   // Index ctor: value < 0? [no]
       7C40                 jl       SHORT G_M15796_IG14

G_M15796_IG08:
       448BC1               mov      r8d, ecx
       4585C0               test     r8d, r8d                   // Index.IsFromEnd ? [no]
       7D0B                 jge      SHORT G_M15796_IG09
       41F7D0               not      r8d                                 
       448BCF               mov      r9d, edi
       452BC8               sub      r9d, r8d
       EB03                 jmp      SHORT G_M15796_IG10

G_M15796_IG09:
       458BC8               mov      r9d, r8d                            

G_M15796_IG10:
       443BCF               cmp      r9d, edi                   // array index in bounds [yes]
       732B                 jae      SHORT G_M15796_IG15
       4D63C1               movsxd   r8, r9d
       420FB73440           movzx    rsi, word  ptr [rax+2*r8]
       FFC1                 inc      ecx
       3BCF                 cmp      ecx, edi
       7CD3                 jl       SHORT G_M15796_IG07

we could tweak GetOffset to reverse the then/else but we should really fix this in the jit.

Once we do that we'd have something like:

;;; with initial redundant branch optimized away

G_M15796_IG07:
       85C9                 test     ecx, ecx
       7C2D                 jl       SHORT G_M15796_IG12

G_M15796_IG08:
       448BC1               mov      r8d, ecx
       443BC7               cmp      r8d, edi
       732B                 jae      SHORT G_M15796_IG13
       4D63C0               movsxd   r8, r8d
       420FB73440           movzx    rsi, word  ptr [rax+2*r8]
       FFC1                 inc      ecx
       3BCF                 cmp      ecx, edi
       7CE6                 jl       SHORT G_M15796_IG07

Haven't investigated but would hope VN could now realize we're referring to the loop index var and so assertion prop can clean up both the negative check and the bounds check.

@tarekgh
Copy link
Member

tarekgh commented Feb 9, 2019

@tarekgh

@AndyAyersMS
Copy link
Member Author

You can get assertion prop to look for reversed relop assertions, but it's clunky. Does not seem to hit all that often, though it helps clean up second the example above.

Found 15 files with textual diffs.
PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -543 (-0.00% of base)
    diff is an improvement.
Top file improvements by size (bytes):
        -176 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.01% of base)
        -150 : System.Private.CoreLib.dasm (-0.00% of base)
         -55 : System.Net.Security.dasm (-0.03% of base)
         -45 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
         -40 : Microsoft.CodeAnalysis.dasm (-0.00% of base)
15 total files with size differences (15 improved, 0 regressed), 114 unchanged.
Top method improvements by size (bytes):
         -79 (-25.99% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CodeAddressInfo:TryLookupMethodOrModule(ref):int:this
         -33 (-5.55% of base) : System.Private.CoreLib.dasm - String:ReplaceCore(ref,ref,ref,int):ref:this
         -32 (-2.58% of base) : System.Private.CoreLib.dasm - Array:FindIndex(ref,int,int,ref):int (5 methods)
         -23 (-4.51% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MemberLookup:LookupDefaultPropertyInBaseInterface(ref,ref,ref,byref)
         -17 (-10.90% of base) : System.Net.Security.dasm - SniHelper:GetSniFromExtension(struct,byref,byref):ref
Top method improvements by size (percentage):
         -79 (-25.99% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CodeAddressInfo:TryLookupMethodOrModule(ref):int:this
         -16 (-12.21% of base) : System.Net.Security.dasm - SniHelper:SkipOpaqueType2(struct,byref):struct
         -17 (-10.90% of base) : System.Net.Security.dasm - SniHelper:GetSniFromExtension(struct,byref,byref):ref
         -14 (-7.49% of base) : System.Net.Security.dasm - SniHelper:GetSniFromServerNameList(struct,byref,byref):ref
         -33 (-5.55% of base) : System.Private.CoreLib.dasm - String:ReplaceCore(ref,ref,ref,int):ref:this
55 total methods with size differences (55 improved, 0 regressed), 193297 unchanged.

Would be less clunky if there was a VN operation to find the VN of the reverse relop (rather than actually forming the reverse tree and value numbering that), and if assertion lookup was driven just by VN and not by tree. Former seems doable, haven't looked at the latter.

Constant prop seems prone to leave off VNs on propagated constants so one has to bail out sometimes, perhaps if this was fixed it would find a few more cases?

Haven't yet looked beyond this to see why we can't subsequently show the index is always in bounds. Would guess though that when we remove the branch we don't go back and update VNs so we don't benefit as much from branch pruning as we could... such updates are probably intractable now anyways since SSA defs and uses aren't linked. I suppose if assertion prop manages to remove a non-bounds check branch we could consider just rerunning things.

@mikedn
Copy link
Contributor

mikedn commented Feb 24, 2019

Would guess though that when we remove the branch we don't go back and update VNs so we don't benefit as much from branch pruning as we could...

Hmm, that sounds like the issue that prompted me to attempt SCCP in VN. I need to retry that with this example. In fact, I need to try that on its own because when I tried it last time it was mixed up with a bunch of other experiments.

@mikedn
Copy link
Contributor

mikedn commented Feb 25, 2019

Random idea of the idea: what if we somehow teach VN to recognize induction variables and assign them some special VNFunc, something like VNF_Linear(vnInit, vnStep, vnLimit)? Easier said than done I'd guess, finding init and step should be doable, at least in trivial cases. The limit, well, not so much.

@AndyAyersMS
Copy link
Member Author

Don't think we can promise any improvements here for 3.0, so will move to future.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants