RyuJIT doesn't hoist invariant division by non power of 2 constant #4726

mikedn · 2015-11-29T07:51:08Z

static int foo(int x) {
    int sum = 0;
    for (int i = 0; i < 42; i++)
        sum += x / 3;
    return sum;
}

generates

G_M24519_IG03:
       BA56555555           mov      edx, 0x55555556
       8BC1                 mov      eax, ecx
       F7EA                 imul     edx:eax, edx
       8BC2                 mov      eax, edx
       8B542404             mov      edx, dword ptr [rsp+04H]
       03D0                 add      edx, eax
       C1E81F               shr      eax, 31
       03D0                 add      edx, eax
       41FFC0               inc      r8d
       4183F82A             cmp      r8d, 42
       89542404             mov      dword ptr [rsp+04H], edx
       7CDD                 jl       SHORT G_M24519_IG03

It also does only partial CSE:

return (x / 3) > 6 ? 6 : (x / 3);

generates

G_M24519_IG02:
       B856555555           mov      eax, 0x55555556
       F7E9                 imul     edx:eax, ecx
       8BC2                 mov      eax, edx
       448BC0               mov      r8d, eax
       41C1E81F             shr      r8d, 31
       4103C0               add      eax, r8d
       83F806               cmp      eax, 6
       7F0D                 jg       SHORT G_M24519_IG04
       B856555555           mov      eax, 0x55555556
       F7E9                 imul     edx:eax, ecx
       8BC2                 mov      eax, edx
; somehow the shift disappeared but not the multiplication
       4103C0               add      eax, r8d
G_M24519_IG03:
       C3                   ret
G_M24519_IG04:
       B806000000           mov      eax, 6
G_M24519_IG05:
       C3                   ret

CSE works fine in JIT32 which doesn't do reciprocal multiplication. Both CSE and hoisting work fine in RyuJIT for division by power of 2.

Shouldn't reciprocal multiplication have been done in lowering instead of morphing? The generated IR is larger and more likely to hinder further optimization rather than exposing new optimization opportunities.

The text was updated successfully, but these errors were encountered:

mikedn · 2015-11-30T10:31:13Z

Looks like this doesn't work due to a GT_ASG that's present in the division tree. That's how the shift got CSEd but the rest of tree wasn't.

mikedn · 2016-11-12T22:45:41Z

Also, because this optimization is done only during global morph cases that rely on constant propagation to have been performed are not handled:

    static int Test(int x)
    {
        int y = 3;
        return x / y;
    }

generates:

G_M55886_IG02:
       BE03000000   mov      esi, 3
       8BC1         mov      eax, ecx
       99           cdq
       F7FE         idiv     edx:eax, esi

PS:
Funny, code like int y = 3; return x / y; actually occurs in the real world, if only by accident. Complex.GetHashCode and ManualResetEventSlim.Wait hit this limitation. Those constants should have been const.

mikedn · 2016-11-13T08:13:40Z

GT_MULHI register requirements are incorrectly set and we get an unnecessary mov:

       B856555555           mov      eax, 0x55555556
       F7E9                 imul     edx:eax, ecx
       8BC2                 mov      eax, edx
       8BD0                 mov      edx, eax ; this is not needed
       C1EA1F               shr      edx, 31
       03C2                 add      eax, edx

This happens because SetMulOpCounts sets the dest candidate to rax instead of rdx. This basically guarantees that you'll get a useless mov every time GT_MULHI is used. And if you try changing the candidate to rdx you discover that genCodeForMulHi is broken, it tries to force the implicit source operand into targetReg instead of REG_RAX.

mikedn · 2016-11-13T08:39:04Z

When 2 shifts are needed the currently generated code is less than ideal. The second shift depends on the first shift:

       B893244992           mov      eax, 0xFFFFFFFF92492493
       F7E9                 imul     edx:eax, ecx
       03D1                 add      edx, ecx
       8BC2                 mov      eax, edx
       C1F802               sar      eax, 2
       8BD0                 mov      edx, eax
       C1EA1F               shr      edx, 31
       03C2                 add      eax, edx

The second shift is simply extracting the sign bit and the previous shift doesn't change the sign bit. So, we could generate this instead:

       B893244992           mov      eax, 0xFFFFFFFF92492493
       F7E9                 imul     edx:eax, ecx
       03D1                 add      edx, ecx
       8BC2                 mov      eax, edx
       C1F802               sar      eax, 2
       C1EA1F               shr      edx, 31
       03C2                 add      eax, edx

Saves a mov and it may be faster as those 2 shifts could be dispatched in parallel.

russellhadley closed this as completed in dotnet/coreclr#8106 Nov 21, 2016

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

ghost locked as resolved and limited conversation to collaborators Jan 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RyuJIT doesn't hoist invariant division by non power of 2 constant #4726

RyuJIT doesn't hoist invariant division by non power of 2 constant #4726

mikedn commented Nov 29, 2015

mikedn commented Nov 30, 2015

mikedn commented Nov 12, 2016 •

edited

mikedn commented Nov 13, 2016

mikedn commented Nov 13, 2016

RyuJIT doesn't hoist invariant division by non power of 2 constant #4726

RyuJIT doesn't hoist invariant division by non power of 2 constant #4726

Comments

mikedn commented Nov 29, 2015

mikedn commented Nov 30, 2015

mikedn commented Nov 12, 2016 • edited

mikedn commented Nov 13, 2016

mikedn commented Nov 13, 2016

mikedn commented Nov 12, 2016 •

edited