Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Math.Pow(x, c) where c is 2, 1, -1 or 0 #31978

Open
wants to merge 8 commits into
base: master
from

Conversation

@EgorBo
Copy link
Contributor

EgorBo commented Feb 8, 2020

Resurrects dotnet/coreclr#26552
Optimizes:

Math.Pow(x,  2) --> x*x
Math.Pow(x,  1) --> x
Math.Pow(x, -1) --> 1/x

(same for MathF and float)

This time it's done in the importer.cpp and handles all kinds of the first argument (introduces a temp variable if needed, e.g. for GT_CALL).

Example:

static double Pow2(double x)  => Math.Pow(x, 2);
static double Pow1(double x)  => Math.Pow(x, 1);
static double PowN1(double x) => Math.Pow(x, -1);

Current codegen:

; Method Tests:Pow2(double):double
       vzeroupper
       vmovsd   xmm1, qword ptr [reloc @RWD00]
       jmp      System.Math:Pow(double,double):double


; Method Tests:Pow1(double):double
       vzeroupper
       vmovsd   xmm1, qword ptr [reloc @RWD00]
       jmp      System.Math:Pow(double,double):double


; Method Tests:PowN1(double):double
       vzeroupper
       vmovsd   xmm1, qword ptr [reloc @RWD00]
       jmp      System.Math:Pow(double,double):double

New codegen:

; Method Tests:Pow2(double):double
       vzeroupper
       vmulsd   xmm0, xmm0, xmm0
       ret


; Method Tests:Pow1(double):double
       vzeroupper
       ret  ; just return xmm0


; Method Tests:PowN1(double):double
       vzeroupper
       vmovsd   xmm1, qword ptr [reloc @RWD00]
       vdivsd   xmm0, xmm1, xmm0
       ret

It seems this pattern can be found in gamedev, e.g.. Xenko (a game engine): https://github.com/xenko3d/xenko/search?q=Math.Pow&unscoped_q=Math.Pow
Also the dotnet/performance benchmarks use it: https://github.com/dotnet/performance/blob/8aed638c9ee65c034fe0cca4ea2bdc3a68d2a6b5/src/benchmarks/micro/runtime/Burgers/Burgers.cs
Jitdiff for bcl:

Total bytes of diff: -13 (-0.00% of base)
    diff is an improvement.

Top file improvements by size (bytes):
         -13 : System.Private.CoreLib.dasm (-0.00% of base)

1 total files with size differences (1 improved, 0 regressed), 107 unchanged.

Top method improvements by size (bytes):
          -9 (-2.05% of base) : System.Private.CoreLib.dasm - System.Globalization.CalendricalCalculationsHelper:DefaultEphemerisCorrection(int):double
          -4 (-0.73% of base) : System.Private.CoreLib.dasm - System.Globalization.CalendricalCalculationsHelper:EquationOfTime(double):double

Top method improvements by size (percentage):
          -9 (-2.05% of base) : System.Private.CoreLib.dasm - System.Globalization.CalendricalCalculationsHelper:DefaultEphemerisCorrection(int):double
          -4 (-0.73% of base) : System.Private.CoreLib.dasm - System.Globalization.CalendricalCalculationsHelper:EquationOfTime(double):double

2 total methods with size differences (2 improved, 0 regressed), 261780 unchanged.

The optimization can be extended to handle more cases once some sort of fast-math mode appears in .NET Core.

@benaadams

This comment has been minimized.

Copy link
Member

benaadams commented Feb 8, 2020

If you can do it at import with consts, would it be worth going higher? e.g. to 5 in you linked example 3-4 crops up
image

For 5 I was going to suggest smootherstep

image

However, you'd probably write it like

x * x * x * (x * (x * 6 - 15) + 10)
@EgorBo

This comment has been minimized.

Copy link
Contributor Author

EgorBo commented Feb 8, 2020

@benaadams If I understand you correctly I can't optimize other constants in "safe math" mode, e.g.
Math.Pow(x, 4) can be optimized to

        vmulsd  xmm0, xmm0, xmm0
        vmulsd  xmm0, xmm0, xmm0

(a single xmm0 register!)

but it might return a slightly different value (and violate the ieee754 spec)
see https://godbolt.org/z/R78Ev-

EgorBo added 3 commits Feb 8, 2020
@stephentoub

This comment has been minimized.

Copy link
Member

stephentoub commented Feb 8, 2020

@EgorBo

This comment has been minimized.

Copy link
Contributor Author

EgorBo commented Feb 9, 2020

CI failures are unrelated (#31985)

@@ -4226,6 +4226,39 @@ GenTree* Compiler::impMathIntrinsic(CORINFO_METHOD_HANDLE method,

op1 = nullptr;

if (intrinsicID == CORINFO_INTRINSIC_Pow && impStackTop().val->IsCnsFltOrDbl())

This comment has been minimized.

Copy link
@tannergooding

tannergooding Feb 11, 2020

Member

Are there cases we are going to miss by handling this in the importer, rather than later in morph or lowering?

This comment has been minimized.

Copy link
@EgorBo

EgorBo Feb 11, 2020

Author Contributor

@tannergooding not sure honestly, but "inlining" cases are handled, e.g.

float Foo(float x, float y) => MathF.Pow(x, y);

float Test(float x) => Foo(x, 2.0f);

Test will be optimized.

This comment has been minimized.

Copy link
@EgorBo

EgorBo Feb 11, 2020

Author Contributor

However, not sure about this one:

float Foo(float x, float y) => MathF.Pow(x, y + 1.0f);
float Test(float x) => Foo(x, 1.0f);

(constant folding after inlining)

// Math.Pow(x, 2) -> x*x
impPopStack();
GenTree* arg0 = impPopStack().val;
if (arg0->OperIs(GT_LCL_VAR, GT_CNS_DBL))

This comment has been minimized.

Copy link
@EgorBo

EgorBo Feb 14, 2020

Author Contributor

hm.. looks like I actually need fgMakeMultiUse here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.