Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Jit] Skip test instruction after add? #32389

Closed
benaadams opened this issue Feb 16, 2020 · 12 comments
Closed

[Jit] Skip test instruction after add? #32389

benaadams opened this issue Feb 16, 2020 · 12 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@benaadams
Copy link
Member

benaadams commented Feb 16, 2020

The following C#

long lengthToExamine = length - Vector256<byte>.Count;
if (lengthToExamine != 0)
{

Produces asm:

add      r8, -32
test     r8, r8
je       SHORT G_M64667_IG11

However add sets flags so could this be?

add      r8, -32
jz       SHORT G_M64667_IG11

category:cq
theme:optimization
skill-level:expert
cost:medium

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Feb 16, 2020
@GrabYourPitchforks
Copy link
Member

Related: #10742, #6794

@jkotas jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization labels Feb 16, 2020
@AndyAyersMS
Copy link
Member

@benaadams do you have a repro? I see this getting optimized in

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Foo(long x)
    {
   	long lengthToExamine = x - Vector256<byte>.Count;
	if (lengthToExamine != 0)
	{    
            return 1;
	}
	else
        {
	    return 0;
        }
    }

as

       4883C1E0             add      rcx, -32
       7406                 je       SHORT G_M34204_IG05

@benaadams
Copy link
Member Author

gist outputs for SequenceEqualThreshold:SequenceEqualPR(byref,byref,long):bool

G_M14281_IG09:
       cmp      r8, 32
       jl       SHORT G_M14281_IG12
       xor      r9, r9
       add      r8, -32
       test     r8, r8
       je       SHORT G_M14281_IG11

and

G_M14281_IG12:
       cmp      r8, 16
       jl       SHORT G_M14281_IG15
       xor      r9, r9
       add      r8, -16
       test     r8, r8
       je       SHORT G_M14281_IG14

@AndyAyersMS
Copy link
Member

Ah, in your case we have split trees, so OptimizeConstCompare can't see the add. Roslyn is the one that inspires this since there are multiple uses of lengthToExamine ...

***** BB14
STMT00038 (IL 0x0A2...0x0AA)
N005 (  3,  3) [000154] -A------R---              *  ASG       long   $202
N004 (  1,  1) [000153] D------N----              +--*  LCL_VAR   long   V06 loc3         d:2 $202
N003 (  3,  3) [000152] ------------              \--*  ADD       long   $202
N001 (  1,  1) [000149] ------------                 +--*  LCL_VAR   long   V02 arg2         u:1 (last use) $c0
N002 (  1,  1) [000151] ------------                 \--*  CNS_INT   long   -32 $106

***** BB14
STMT00039 (IL 0x0AB...0x0AC)
N004 (  5,  5) [000158] ------------              *  JTRUE     void  
N003 (  3,  3) [000157] J------N----              \--*  EQ        int    $186
N001 (  1,  1) [000155] ------------                 +--*  LCL_VAR   long   V06 loc3         u:2 $202
N002 (  1,  1) [000156] ------------                 \--*  CNS_INT   long   0 $100

If I add a second use in my simple example we produce

       488D41E0             lea      rax, [rcx-32]
       4885C0               test     rax, rax
       7401                 je       SHORT G_M42597_IG04

which would be even harder to clean up later on.

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Feb 24, 2020
@AndyAyersMS
Copy link
Member

Seems like another case for forward substitution -- seems like the tracking issue for this is #4655.

@AndyAyersMS
Copy link
Member

Have looked into implementing forward sub and am not yet happy with any of the approaches. So am going to mark this as future.

@benaadams
Copy link
Member Author

@AndyAyersMS is the instruction stream still mutable post emit?

e.g. lea is emitted; next is test could the test emit then undo the lea and change it to an add? (Since when the lea is being emitted it doesn't know next will be a test; but the test can look back)

@AndyAyersMS
Copy link
Member

Perhaps? We are streaming other things along with emission, in particular gc liveness tracking, so "undoing" a previously emitted instruction may not be so easy.

@AndyAyersMS
Copy link
Member

cc @kunalspathak as this just came up after a presentation Kunal gave that included the new arm64 peepholes...

@kunalspathak
Copy link
Member

@benaadams - We don't have the infrastructure yet to undo the already emitted instruction. @BruceForstall and I discussed some ideas on possibility to do that. But currently, only thing we can do is to avoid emitting current instruction by looking at previous instruction. E.g. #38586 and #38179

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@AndyAyersMS
Copy link
Member

@kunalspathak perhaps worth looking this one over again, given #53214?

@kunalspathak
Copy link
Member

Thank you @AndyAyersMS for reminding me. I verified and we do eliminate the test after the add in #53214.
Below is the assembly for bool SequenceEqualPR(Span<byte> input, Span<byte> expected).

G_M14186_IG08:              ;; offset=008DH
 00007ffa`02de8aed        483BCA               cmp      rcx, rdx
 00007ffa`02de8af0        0F84F9000000         je       G_M14186_IG17
 00007ffa`02de8af6        4983F820             cmp      r8, 32
 00007ffa`02de8afa        7C60                 jl       SHORT G_M14186_IG11
 00007ffa`02de8afc        33C0                 xor      eax, eax
 00007ffa`02de8afe        4983C0E0             add      r8, -32
 00007ffa`02de8b02        7434                 je       SHORT G_M14186_IG10
 00007ffa`02de8b04                             align    [0 bytes]
G_M14186_IG11:              ;; offset=00F6H
 00007ffa`02de8b56        4983F810             cmp      r8, 16
 00007ffa`02de8b5a        7C5F                 jl       SHORT G_M14186_IG14
 00007ffa`02de8b5c        33C0                 xor      eax, eax
 00007ffa`02de8b5e        4983C0F0             add      r8, -16
 00007ffa`02de8b62        7433                 je       SHORT G_M14186_IG13
 00007ffa`02de8b64                             align    [0 bytes]

@ghost ghost locked as resolved and limited conversation to collaborators Jul 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

7 participants