Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT doesn't hoist invariant division by non power of 2 constant #4726

Closed
mikedn opened this issue Nov 29, 2015 · 4 comments · Fixed by dotnet/coreclr#8106
Closed

RyuJIT doesn't hoist invariant division by non power of 2 constant #4726

mikedn opened this issue Nov 29, 2015 · 4 comments · Fixed by dotnet/coreclr#8106
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@mikedn
Copy link
Contributor

mikedn commented Nov 29, 2015

static int foo(int x) {
    int sum = 0;
    for (int i = 0; i < 42; i++)
        sum += x / 3;
    return sum;
}

generates

G_M24519_IG03:
       BA56555555           mov      edx, 0x55555556
       8BC1                 mov      eax, ecx
       F7EA                 imul     edx:eax, edx
       8BC2                 mov      eax, edx
       8B542404             mov      edx, dword ptr [rsp+04H]
       03D0                 add      edx, eax
       C1E81F               shr      eax, 31
       03D0                 add      edx, eax
       41FFC0               inc      r8d
       4183F82A             cmp      r8d, 42
       89542404             mov      dword ptr [rsp+04H], edx
       7CDD                 jl       SHORT G_M24519_IG03

It also does only partial CSE:

return (x / 3) > 6 ? 6 : (x / 3);

generates

G_M24519_IG02:
       B856555555           mov      eax, 0x55555556
       F7E9                 imul     edx:eax, ecx
       8BC2                 mov      eax, edx
       448BC0               mov      r8d, eax
       41C1E81F             shr      r8d, 31
       4103C0               add      eax, r8d
       83F806               cmp      eax, 6
       7F0D                 jg       SHORT G_M24519_IG04
       B856555555           mov      eax, 0x55555556
       F7E9                 imul     edx:eax, ecx
       8BC2                 mov      eax, edx
; somehow the shift disappeared but not the multiplication
       4103C0               add      eax, r8d
G_M24519_IG03:
       C3                   ret
G_M24519_IG04:
       B806000000           mov      eax, 6
G_M24519_IG05:
       C3                   ret

CSE works fine in JIT32 which doesn't do reciprocal multiplication. Both CSE and hoisting work fine in RyuJIT for division by power of 2.

Shouldn't reciprocal multiplication have been done in lowering instead of morphing? The generated IR is larger and more likely to hinder further optimization rather than exposing new optimization opportunities.

@mikedn
Copy link
Contributor Author

mikedn commented Nov 30, 2015

Looks like this doesn't work due to a GT_ASG that's present in the division tree. That's how the shift got CSEd but the rest of tree wasn't.

@mikedn
Copy link
Contributor Author

mikedn commented Nov 12, 2016

Also, because this optimization is done only during global morph cases that rely on constant propagation to have been performed are not handled:

    static int Test(int x)
    {
        int y = 3;
        return x / y;
    }

generates:

G_M55886_IG02:
       BE03000000   mov      esi, 3
       8BC1         mov      eax, ecx
       99           cdq
       F7FE         idiv     edx:eax, esi

PS:
Funny, code like int y = 3; return x / y; actually occurs in the real world, if only by accident. Complex.GetHashCode and ManualResetEventSlim.Wait hit this limitation. Those constants should have been const.

@mikedn
Copy link
Contributor Author

mikedn commented Nov 13, 2016

GT_MULHI register requirements are incorrectly set and we get an unnecessary mov:

       B856555555           mov      eax, 0x55555556
       F7E9                 imul     edx:eax, ecx
       8BC2                 mov      eax, edx
       8BD0                 mov      edx, eax ; this is not needed
       C1EA1F               shr      edx, 31
       03C2                 add      eax, edx

This happens because SetMulOpCounts sets the dest candidate to rax instead of rdx. This basically guarantees that you'll get a useless mov every time GT_MULHI is used. And if you try changing the candidate to rdx you discover that genCodeForMulHi is broken, it tries to force the implicit source operand into targetReg instead of REG_RAX.

@mikedn
Copy link
Contributor Author

mikedn commented Nov 13, 2016

When 2 shifts are needed the currently generated code is less than ideal. The second shift depends on the first shift:

       B893244992           mov      eax, 0xFFFFFFFF92492493
       F7E9                 imul     edx:eax, ecx
       03D1                 add      edx, ecx
       8BC2                 mov      eax, edx
       C1F802               sar      eax, 2
       8BD0                 mov      edx, eax
       C1EA1F               shr      edx, 31
       03C2                 add      eax, edx

The second shift is simply extracting the sign bit and the previous shift doesn't change the sign bit. So, we could generate this instead:

       B893244992           mov      eax, 0xFFFFFFFF92492493
       F7E9                 imul     edx:eax, ecx
       03D1                 add      edx, ecx
       8BC2                 mov      eax, edx
       C1F802               sar      eax, 2
       C1EA1F               shr      edx, 31
       03C2                 add      eax, edx

Saves a mov and it may be faster as those 2 shifts could be dispatched in parallel.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants