Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Move magic division optimization from morph to lowering #8106

Merged
merged 5 commits into from
Nov 21, 2016

Conversation

mikedn
Copy link

@mikedn mikedn commented Nov 13, 2016

Fixes #2183

This fixes all issues mentioned in 2183 by moving magic division to lowering and fixing GT_MULHI register requirements.

As mentioned in the issue doing magic division during global morph has a few problems - increased IR complexity, missed or partial CSE and LICM, missed magic division in cases that depend on constant propagation and folding.

It did, however, had one positive effect. In code like a = x / 3; b = x % 3; most of the common division code was CSEed. To preserve this useful side-effect I expanded x % 3 to x - (x / 3) * 3 in morph (there was already code to do this for ARM). This allows CSE to remove a redundant x / 3 (if any).

This expansion is currently done only if we know that lower will do magic division because magic division does the expansion anyway. We could do this for mod by power of 2 but the lowering or something else will need to fold (shl(shr(x, 2), 2) into a single and like the current mod by power of 2 does. But that's stuff for another PR.

@mmitche
Copy link
Member

mmitche commented Nov 14, 2016

@dotnet-bot test Windows_NT x86 legacy_backend Checked Build and Test

Currently this optimization is done during global morph. Global morph
takes place very early so cases that rely on constant propagation and
folding (e.g "int d = 3; return n / d;") aren't handled.

Also, the IR generated during morph is complex and unlikely to enable
further optimizations. Quite the contrary, the presence of GT_ASG and
GT_MULHI nodes blocks other optimizations (CSE, LICM currently).

The generated code is identical to the code generated by the morph
implementation with one exception: when 2 shifts are needed
(e.g. for x / 7) they are now computed independently:

mov eax, edx
sar eax, 2
shr edx, 31
add eax, edx

instead of:
mov eax, edx
sar eax, 2
mov edx, eax
shr edx, 31
add eax, edx

This results in shorter code and avoids creating an additional temp.
This is no longer needed, lowering does this now.
Doing magic division in morph did have one positive effect. For code like
`x = a / 10; y = a % 10;` some (not all due to an assignment) of the common
code was CSEd.

To preserve this optimization we transform `a % b` to `a - (a / b) * b` if we
know that lowering will do magic division. If a redundant `a / b` exists
then CSE can pick it up.
We only care about the upper 32 bits of the 64 bit result and those are
in EDX. It doesn't make sense to set the destination candidate to EAX,
most of the time this will result in a useless `mov eax, edx` being
generated.

Also, genCodeForMulHi needs to check if the implicit operand is in EAX
rather than targetReg.
@mikedn
Copy link
Author

mikedn commented Nov 15, 2016

jit-diff summary:

Total bytes of diff: -722 (-0.01 % of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file regressions by size (bytes):
          17 : System.Runtime.Numerics.dasm (0.03 % of base)
Top file improvements by size (bytes):
        -618 : System.Private.CoreLib.dasm (-0.02 % of base)
         -51 : System.Xml.ReaderWriter.dasm (-0.01 % of base)
         -19 : System.Reflection.Metadata.dasm (-0.01 % of base)
         -15 : System.Collections.Concurrent.dasm (-0.03 % of base)
          -9 : System.Runtime.InteropServices.dasm (-0.28 % of base)
11 total files with size differences.
Top method regessions by size (bytes):
          22 : System.Private.CoreLib.dasm - System.Threading.ManualResetEventSlim:Wait(int,struct):bool:this
          19 : System.Private.CoreLib.dasm - System.Globalization.Calendar:GetWeekOfYearOfMinSupportedDateTime(int,int):int:this
          19 : System.Runtime.Numerics.dasm - System.Numerics.Complex:GetHashCode():int:this
           9 : System.Private.CoreLib.dasm - System.Globalization.Calendar:GetWeekOfYearFullDays(struct,int,int):int:this
           4 : System.Private.CoreLib.dasm - System.Globalization.HijriCalendar:IsLeapYear(int,int):bool:this
Top method improvements by size (bytes):
         -96 : System.Private.CoreLib.dasm - System.Globalization.TimeSpanFormat:FormatCustomized(struct,ref,ref):ref
         -80 : System.Private.CoreLib.dasm - System.Globalization.TimeSpanFormat:FormatStandard(struct,bool,ref,int):ref
         -30 : System.Private.CoreLib.dasm - System.DateTimeFormat:FormatCustomized(struct,ref,ref,struct):ref
         -22 : System.Private.CoreLib.dasm - System.DateTimeFormat:AppendNumber(ref,long,int)
         -17 : System.Private.CoreLib.dasm - System.Globalization.JulianCalendar:AddMonths(struct,int):struct:this
121 total methods with size differences.

Elimination of useless mov instructions is responsible for most improvements. Improved elimination of redundant div code also helps here and there.

ManualResetEventSlim:Wait and Complex:GetHashCode regressed because they're now using magic division. GetWeekOfYearOfMinSupportedDateTime regressed due to not so fortunate register allocation (for example we end up using a larger lea instead of smaller add).

@mikedn mikedn changed the title [WIP] Move magic division optimization from morph to lowering Move magic division optimization from morph to lowering Nov 15, 2016
@jkotas
Copy link
Member

jkotas commented Nov 19, 2016

cc @dotnet/jit-contrib

@russellhadley
Copy link

LGTM.

@russellhadley russellhadley merged commit d92a1ce into dotnet:master Nov 21, 2016
@mikedn mikedn deleted the magic-div branch April 10, 2017 05:44
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Move magic division optimization from morph to lowering

Commit migrated from dotnet/coreclr@d92a1ce
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RyuJIT doesn't hoist invariant division by non power of 2 constant
6 participants