New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to make "force optimization mode" for RyuJIT? #978

Closed
AndreyAkinshin opened this Issue May 11, 2015 · 8 comments

Comments

Projects
None yet
8 participants
@AndreyAkinshin
Member

AndreyAkinshin commented May 11, 2015

Let's consider the following method:

public double Run()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13);
}

RyuJIT generates the following asm:

       F20F510598000000     sqrtsd   xmm0, qword ptr [@RWD00]
       F20F510D98000000     sqrtsd   xmm1, qword ptr [@RWD08]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D94000000     sqrtsd   xmm1, qword ptr [@RWD16]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D90000000     sqrtsd   xmm1, qword ptr [@RWD24]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D8C000000     sqrtsd   xmm1, qword ptr [@RWD32]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D88000000     sqrtsd   xmm1, qword ptr [@RWD40]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D84000000     sqrtsd   xmm1, qword ptr [@RWD48]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D80000000     sqrtsd   xmm1, qword ptr [@RWD56]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D7C000000     sqrtsd   xmm1, qword ptr [@RWD64]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D78000000     sqrtsd   xmm1, qword ptr [@RWD72]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D74000000     sqrtsd   xmm1, qword ptr [@RWD80]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D70000000     sqrtsd   xmm1, qword ptr [@RWD88]
       F20F58C1             addsd    xmm0, xmm1
       F20F510D6C000000     sqrtsd   xmm1, qword ptr [@RWD96]
       F20F58C1             addsd    xmm0, xmm1
       C3                   ret

But If I add one Math.Sqrt to the target method:

public double Run()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13) + Math.Sqrt(14);
}

Then RyuJIT applies constant folding:

N001 [000001]   dconst    1.0000000000000000 => $c0 {DblCns[1.000000]}
N002 [000002]   mathFN    => $c0 {DblCns[1.000000]}
N003 [000003]   dconst    2.0000000000000000 => $c1 {DblCns[2.000000]}
N004 [000004]   mathFN    => $c2 {DblCns[1.414214]}
N005 [000005]   +         => $c3 {DblCns[2.414214]}
N006 [000006]   dconst    3.0000000000000000 => $c4 {DblCns[3.000000]}
N007 [000007]   mathFN    => $c5 {DblCns[1.732051]}
N008 [000008]   +         => $c6 {DblCns[4.146264]}
N009 [000009]   dconst    4.0000000000000000 => $c7 {DblCns[4.000000]}
N010 [000010]   mathFN    => $c1 {DblCns[2.000000]}
N011 [000011]   +         => $c8 {DblCns[6.146264]}
N012 [000012]   dconst    5.0000000000000000 => $c9 {DblCns[5.000000]}
N013 [000013]   mathFN    => $ca {DblCns[2.236068]}
N014 [000014]   +         => $cb {DblCns[8.382332]}
N015 [000015]   dconst    6.0000000000000000 => $cc {DblCns[6.000000]}
N016 [000016]   mathFN    => $cd {DblCns[2.449490]}
N017 [000017]   +         => $ce {DblCns[10.831822]}
N018 [000018]   dconst    7.0000000000000000 => $cf {DblCns[7.000000]}
N019 [000019]   mathFN    => $d0 {DblCns[2.645751]}
N020 [000020]   +         => $d1 {DblCns[13.477573]}
N021 [000021]   dconst    8.0000000000000000 => $d2 {DblCns[8.000000]}
N022 [000022]   mathFN    => $d3 {DblCns[2.828427]}
N023 [000023]   +         => $d4 {DblCns[16.306001]}
N024 [000024]   dconst    9.0000000000000000 => $d5 {DblCns[9.000000]}
N025 [000025]   mathFN    => $c4 {DblCns[3.000000]}
N026 [000026]   +         => $d6 {DblCns[19.306001]}
N027 [000027]   dconst    10.000000000000000 => $d7 {DblCns[10.000000]}
N028 [000028]   mathFN    => $d8 {DblCns[3.162278]}
N029 [000029]   +         => $d9 {DblCns[22.468278]}
N030 [000030]   dconst    11.000000000000000 => $da {DblCns[11.000000]}
N031 [000031]   mathFN    => $db {DblCns[3.316625]}
N032 [000032]   +         => $dc {DblCns[25.784903]}
N033 [000033]   dconst    12.000000000000000 => $dd {DblCns[12.000000]}
N034 [000034]   mathFN    => $de {DblCns[3.464102]}
N035 [000035]   +         => $df {DblCns[29.249005]}
N036 [000036]   dconst    13.000000000000000 => $e0 {DblCns[13.000000]}
N037 [000037]   mathFN    => $e1 {DblCns[3.605551]}
N038 [000038]   +         => $e2 {DblCns[32.854556]}
N039 [000041]   lclVar    V01 tmp0         d:2 => $e2 {DblCns[32.854556]}
N040 [000042]   =         => $e2 {DblCns[32.854556]}

And generates an optimized version of asm:

       F20F100508000000     movsd    xmm0, qword ptr [@RWD00]
       C3                   ret

Is it possible to force RyuJIT apply optimization for simple methods with short expressions?

@briansull

This comment has been minimized.

Show comment
Hide comment
@briansull

briansull May 12, 2015

Contributor

You will always see the optimization that you want if you assign the result to a local:
(see Run2B, Run13B and Run14B)

The JIT's value numbering does compute that the expression is computing a constant value, but the JIT currently doesn't replace such expressions with the computed constant.

When you increase the complexity of the expression to include 14 Math.Sqrt calls the JIT decides to introduce a compiler temp local variable to prevent the expression stack from growing any deeper. This triggers an optimization that we have for assignments of constant into a local variable. When that optimization fires the JIT replaces the large expression on the right-hand-side of the assignment with a simple double constant.

Here is the repro that I used:
All of these return the computed constant as in your optimized version,
(except for Run2A and Run 13A, which don't optimize)

using System;
using System.Runtime.CompilerServices;

class Program {
public static double Run2A()
{
return Math.Sqrt(1) + Math.Sqrt(2);
}

public static double Run13A()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13);
}

public static double Run14A()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13) + Math.Sqrt(14);
}

public static double Run2B()
{
    double res = Math.Sqrt(1) + Math.Sqrt(2);
    return res;
}

public static double Run13B()
{
    double res = Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
                 Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
                 Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13);
    return res;
}

public static double Run14B()
{
    double res = Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
                 Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
                 Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13) + Math.Sqrt(14);
    return res;
}

static void Main()
{
    Run2A();
    Run13A();
    Run14A();
    Run2B();
    Run13B();
    Run14B();
}

}

Contributor

briansull commented May 12, 2015

You will always see the optimization that you want if you assign the result to a local:
(see Run2B, Run13B and Run14B)

The JIT's value numbering does compute that the expression is computing a constant value, but the JIT currently doesn't replace such expressions with the computed constant.

When you increase the complexity of the expression to include 14 Math.Sqrt calls the JIT decides to introduce a compiler temp local variable to prevent the expression stack from growing any deeper. This triggers an optimization that we have for assignments of constant into a local variable. When that optimization fires the JIT replaces the large expression on the right-hand-side of the assignment with a simple double constant.

Here is the repro that I used:
All of these return the computed constant as in your optimized version,
(except for Run2A and Run 13A, which don't optimize)

using System;
using System.Runtime.CompilerServices;

class Program {
public static double Run2A()
{
return Math.Sqrt(1) + Math.Sqrt(2);
}

public static double Run13A()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13);
}

public static double Run14A()
{
    return Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
           Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
           Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13) + Math.Sqrt(14);
}

public static double Run2B()
{
    double res = Math.Sqrt(1) + Math.Sqrt(2);
    return res;
}

public static double Run13B()
{
    double res = Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
                 Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
                 Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13);
    return res;
}

public static double Run14B()
{
    double res = Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + Math.Sqrt(4) + Math.Sqrt(5) + 
                 Math.Sqrt(6) + Math.Sqrt(7) + Math.Sqrt(8) + Math.Sqrt(9) + Math.Sqrt(10) + 
                 Math.Sqrt(11) + Math.Sqrt(12) + Math.Sqrt(13) + Math.Sqrt(14);
    return res;
}

static void Main()
{
    Run2A();
    Run13A();
    Run14A();
    Run2B();
    Run13B();
    Run14B();
}

}

@BruceForstall

This comment has been minimized.

Show comment
Hide comment
@BruceForstall

BruceForstall May 12, 2015

Contributor

@briansull you write: "the JIT currently doesn't replace such expressions with the computed constant." Is there (and should there be) an issue opened to track adding this optimization?

Contributor

BruceForstall commented May 12, 2015

@briansull you write: "the JIT currently doesn't replace such expressions with the computed constant." Is there (and should there be) an issue opened to track adding this optimization?

@cmckinsey

This comment has been minimized.

Show comment
Hide comment
@cmckinsey

cmckinsey May 12, 2015

Contributor

+1 Ideally we'd be less code shape sensitive. We also might want to add a forward-substitution pass make tree edges out of single-def single-use locals thereby giving more scope of optimization to the morpher. Thanks for the report Andrey.

Contributor

cmckinsey commented May 12, 2015

+1 Ideally we'd be less code shape sensitive. We also might want to add a forward-substitution pass make tree edges out of single-def single-use locals thereby giving more scope of optimization to the morpher. Thanks for the report Andrey.

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin May 12, 2015

Member

As a developer, I expect that methods

int Foo1()
{
    return /* expression */;
}
int Foo2()
{
    var value = /* expression */;
    return value;
}

should generate the same asm code for optimized source. What do you think?

Member

AndreyAkinshin commented May 12, 2015

As a developer, I expect that methods

int Foo1()
{
    return /* expression */;
}
int Foo2()
{
    var value = /* expression */;
    return value;
}

should generate the same asm code for optimized source. What do you think?

@masonwheeler

This comment has been minimized.

Show comment
Hide comment
@masonwheeler

masonwheeler May 12, 2015

@AndreyAkinshin Yes, they definitely should! Not for debug code, but in an optimized release build, yes.

masonwheeler commented May 12, 2015

@AndreyAkinshin Yes, they definitely should! Not for debug code, but in an optimized release build, yes.

@briansull

This comment has been minimized.

Show comment
Hide comment
@briansull

briansull May 12, 2015

Contributor

Yes, we should open a tracking issue to correct this.

Contributor

briansull commented May 12, 2015

Yes, we should open a tracking issue to correct this.

@briansull

This comment has been minimized.

Show comment
Hide comment
@briansull

briansull May 12, 2015

Contributor

I opened issue #987 to track this

Contributor

briansull commented May 12, 2015

I opened issue #987 to track this

@RussKeldorph RussKeldorph added this to the Future milestone Oct 29, 2015

@schellap

This comment has been minimized.

Show comment
Hide comment
@schellap

schellap Dec 14, 2015

This should be fixed now with 121d095

schellap commented Dec 14, 2015

This should be fixed now with 121d095

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment