This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Improve performance for Math.Abs #15823

Merged
merged 4 commits into from Jan 11, 2018

@benaadams
Member Author

Total bytes of diff: -251 (-0.01% of base)
    diff is an improvement.

Total byte diff includes -285 bytes from reconciling methods
        Base had    4 unique methods,      345 unique bytes
        Diff had    1 unique methods,       60 unique bytes

Top file improvements by size (bytes):
        -251 : System.Private.CoreLib.dasm (-0.01% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regessions by size (bytes):
          60 : System.Private.CoreLib.dasm - Math:ThrowAbsOverflow() (0/1 methods)
          14 : System.Private.CoreLib.dasm - Math:Abs(short):short
          14 : System.Private.CoreLib.dasm - Math:Abs(byte):byte
           8 : System.Private.CoreLib.dasm - Math:Abs(long):long
           6 : System.Private.CoreLib.dasm - Math:Abs(int):int

Top method improvements by size (bytes):
         -91 : System.Private.CoreLib.dasm - Math:AbsHelper(long):long (1/0 methods)
         -87 : System.Private.CoreLib.dasm - Math:AbsHelper(short):short (1/0 methods)
         -85 : System.Private.CoreLib.dasm - Math:AbsHelper(byte):byte (1/0 methods)
         -82 : System.Private.CoreLib.dasm - Math:AbsHelper(int):int (1/0 methods)
          -8 : System.Private.CoreLib.dasm - Random:.ctor(int):this

10 total methods with size differences (5 improved, 5 regressed), 16760 unchanged.

@benaadams
Member Author

Random improves because Abs doesn't inline anymore; will fix

@benaadams
Member Author

Post

; Assembly listing for method Math:Abs(int):int
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  7,  5.50)     int  ->  rcx        
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 40
G_M42159_IG01:
       sub      rsp, 40
       nop      
G_M42159_IG02:
       test     ecx, ecx
       jge      SHORT G_M42159_IG03
       neg      ecx
       test     ecx, ecx
       jl       SHORT G_M42159_IG05
G_M42159_IG03:
       mov      eax, ecx
G_M42159_IG04:
       add      rsp, 40
       ret      
G_M42159_IG05:
       call     Math:ThrowAbsOverflow()
       int3     
; Total bytes of code 28, prolog size 5 for method Math:Abs(int):int

Member

@jkotas left a comment

Thanks

this.flags = flags;
return;
}
if (!((flags & ~(SignMask | ScaleMask)) == 0 && (flags & ScaleMask) <= (28 << 16)))
Member

Is this intentional?

Member Author

It was, but it is also completely unnecessary as the function doesn't return anything; will revert.

@benaadams
Member Author

benaadams commented Jan 11, 2018

Total bytes of diff: -94 (0.00% of base)
    diff is an improvement.

Total byte diff includes -225 bytes from reconciling methods
        Base had    5 unique methods,      445 unique bytes
        Diff had    3 unique methods,      220 unique bytes

Top file improvements by size (bytes):
         -94 : System.Private.CoreLib.dasm (0.00% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regessions by size (bytes):
         100 : System.Private.CoreLib.dasm - Decimal:<DecDivMod1E9>g__D32DivMod1E9|161_0(int,byref):int (0/1 methods)
          90 : System.Private.CoreLib.dasm - Decimal:Remainder(struct,struct):struct
          60 : System.Private.CoreLib.dasm - Decimal:ThrowInvalidDecimalBytes():this (0/1 methods)
          60 : System.Private.CoreLib.dasm - Math:ThrowAbsOverflow() (0/1 methods)
          31 : System.Private.CoreLib.dasm - Decimal:Ceiling(struct):struct
          31 : System.Private.CoreLib.dasm - Math:Ceiling(struct):struct
          14 : System.Private.CoreLib.dasm - Math:Abs(short):short
          14 : System.Private.CoreLib.dasm - Math:Abs(byte):byte
          14 : System.Private.CoreLib.dasm - Random:.ctor(int):this
          10 : System.Private.CoreLib.dasm - Decimal:op_UnaryNegation(struct):struct
          10 : System.Private.CoreLib.dasm - Math:Abs(struct):struct
           8 : System.Private.CoreLib.dasm - Math:Abs(long):long
           6 : System.Private.CoreLib.dasm - Decimal:ToDecimal(ref):struct
           6 : System.Private.CoreLib.dasm - Math:Abs(int):int
           5 : System.Private.CoreLib.dasm - Decimal:Abs(struct):struct
           5 : System.Private.CoreLib.dasm - Decimal:Negate(struct):struct

Top method improvements by size (bytes):
        -100 : System.Private.CoreLib.dasm - Decimal:<DecDivMod1E9>g__D32DivMod1E9|160_0(int,byref):int (1/0 methods)
         -91 : System.Private.CoreLib.dasm - Math:AbsHelper(long):long (1/0 methods)
         -87 : System.Private.CoreLib.dasm - Math:AbsHelper(short):short (1/0 methods)
         -85 : System.Private.CoreLib.dasm - Math:AbsHelper(byte):byte (1/0 methods)
         -82 : System.Private.CoreLib.dasm - Math:AbsHelper(int):int (1/0 methods)
         -62 : System.Private.CoreLib.dasm - Decimal:.ctor(int,int,int,int):this
         -51 : System.Private.CoreLib.dasm - Decimal:SetBits(ref):this

23 total methods with size differences (7 improved, 16 regressed), 16746 unchanged.

The Decimal regressions are from the JIT choosing to inline Decimal.Abs and Decimal..ctor, due to their reduced size after removing the throw, rather than from Math.Abs(decimal), which is force-inlined.

}
throw new ArgumentException(SR.Arg_DecBitCtor);
Member

Where did this exception go?

Member

I would put the decimal optimization into separate PR. They are pretty unrelated to the Abs change.

Member Author

@benaadams Jan 11, 2018

It was to improve Math.Abs(decimal):

; Assembly listing for method Math:Abs(struct):struct
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 RetBuf       [V00,T01] (  7,  7   )   byref  ->  rcx        
;  V01 arg0         [V01,T00] (  7, 12   )   byref  ->  rdx        
;* V02 loc0         [V02    ] (  0,  0   )  struct (16) zero-ref   
;* V03 tmp1         [V03    ] (  0,  0   )  struct (16) zero-ref   
;  V04 tmp2         [V04,T02] (  4,  7   )     int  ->  rax        
;  V05 tmp3         [V05,T11] (  3,  0   )     ref  ->  rsi         class-hnd exact
;* V06 tmp4         [V06    ] (  0,  0   )     int  ->  zero-ref   
;* V07 tmp5         [V07    ] (  0,  0   )     int  ->  zero-ref   
;* V08 tmp6         [V08    ] (  0,  0   )     int  ->  zero-ref   
;  V09 tmp7         [V09,T03] (  2,  2   )     int  ->  rax         V02.flags(offs=0x00) P-INDEP
;  V10 tmp8         [V10,T04] (  2,  2   )     int  ->   r8         V02.hi(offs=0x04) P-INDEP
;  V11 tmp9         [V11,T05] (  2,  2   )     int  ->   r9         V02.lo(offs=0x08) P-INDEP
;  V12 tmp10        [V12,T06] (  2,  2   )     int  ->  rdx         V02.mid(offs=0x0c) P-INDEP
;  V13 tmp11        [V13,T07] (  2,  2   )     int  ->  rax         V03.flags(offs=0x00) P-INDEP
;  V14 tmp12        [V14,T08] (  2,  2   )     int  ->   r8         V03.hi(offs=0x04) P-INDEP
;  V15 tmp13        [V15,T09] (  2,  2   )     int  ->   r9         V03.lo(offs=0x08) P-INDEP
;  V16 tmp14        [V16,T10] (  2,  2   )     int  ->  rdx         V03.mid(offs=0x0c) P-INDEP
;  V17 tmp15        [V17,T12] (  2,  0   )     ref  ->  rcx        
;  V18 tmp16        [V18,T13] (  2,  0   )     ref  ->  rdx        
;  V19 OutArgs      [V19    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 32
G_M42161_IG01:
       push     rsi
       sub      rsp, 32
G_M42161_IG02:
       mov      eax, dword ptr [rdx]
       mov      r8d, dword ptr [rdx+4]
       mov      r9d, dword ptr [rdx+8]
       mov      edx, dword ptr [rdx+12]
       and      eax, 0xD1FFAB1E
       test     eax, 0xD1FFAB1E
       jne      G_M42161_IG04
       mov      r10d, eax
       and      r10d, 0xD1FFAB1E
       cmp      r10d, 0xD1FFAB1E
       jg       G_M42161_IG04
       mov      dword ptr [rcx], eax
       mov      dword ptr [rcx+4], r8d
       mov      dword ptr [rcx+8], r9d
       mov      dword ptr [rcx+12], edx
       mov      rax, rcx
G_M42161_IG03:
       add      rsp, 32
       pop      rsi
       ret      
************** Beginning of cold code **************
G_M42161_IG04:
       lea      rcx, [(reloc)]
       call     CORINFO_HELP_NEWSFAST
       mov      rsi, rax
       mov      ecx, 0x2648
       call     CORINFO_HELP_STRCNS_CURRENT_MODULE
       mov      rcx, rax
       xor      rdx, rdx
       call     SR:GetResourceString(ref,ref):ref
       mov      rdx, rax
       mov      rcx, rsi
       call     ArgumentException:.ctor(ref):this
       mov      rcx, rsi
       call     CORINFO_HELP_THROW
       int3     
; Total bytes of code 134, prolog size 5 for method Math:Abs(struct):struct
; ============================================================

But it seems to have made things worse by making the variables do-not-enreg[X] addr-exposed:

; Assembly listing for method Math:Abs(struct):struct
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 RetBuf       [V00,T01] (  7,  7   )   byref  ->  rcx        
;  V01 arg0         [V01,T00] (  7, 12   )   byref  ->  rdx        
;* V02 loc0         [V02    ] (  0,  0   )  struct (16) zero-ref   
;  V03 tmp1         [V03    ] ( 13, 24   )  struct (16) [rsp+0x28]   do-not-enreg[XS] addr-exposed
;  V04 tmp2         [V04,T02] (  4,  7   )     int  ->  rax        
;* V05 tmp3         [V05    ] (  0,  0   )     int  ->  zero-ref   
;* V06 tmp4         [V06    ] (  0,  0   )     int  ->  zero-ref   
;* V07 tmp5         [V07    ] (  0,  0   )     int  ->  zero-ref   
;  V08 tmp6         [V08,T03] (  2,  2   )     int  ->  rax         V02.flags(offs=0x00) P-INDEP
;  V09 tmp7         [V09,T04] (  2,  2   )     int  ->   r8         V02.hi(offs=0x04) P-INDEP
;  V10 tmp8         [V10,T05] (  2,  2   )     int  ->   r9         V02.lo(offs=0x08) P-INDEP
;  V11 tmp9         [V11,T06] (  2,  2   )     int  ->  rdx         V02.mid(offs=0x0c) P-INDEP
;  V12 tmp10        [V12    ] (  4,  3   )     int  ->  [rsp+0x28]   do-not-enreg[X] addr-exposed V03.flags(offs=0x00) P-DEP
;  V13 tmp11        [V13    ] (  4,  3   )     int  ->  [rsp+0x2C]   do-not-enreg[X] addr-exposed V03.hi(offs=0x04) P-DEP
;  V14 tmp12        [V14    ] (  4,  3   )     int  ->  [rsp+0x30]   do-not-enreg[X] addr-exposed V03.lo(offs=0x08) P-DEP
;  V15 tmp13        [V15    ] (  4,  3   )     int  ->  [rsp+0x34]   do-not-enreg[X] addr-exposed V03.mid(offs=0x0c) P-DEP
;  V16 OutArgs      [V16    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 56
G_M42161_IG01:
       sub      rsp, 56
       nop      
G_M42161_IG02:
       mov      eax, dword ptr [rdx]
       mov      r8d, dword ptr [rdx+4]
       mov      r9d, dword ptr [rdx+8]
       mov      edx, dword ptr [rdx+12]
       xor      r10d, r10d
       mov      dword ptr [rsp+28H], r10d
       mov      dword ptr [rsp+2CH], r10d
       mov      dword ptr [rsp+30H], r10d
       mov      dword ptr [rsp+34H], r10d
       and      eax, 0xD1FFAB1E
       test     eax, 0xD1FFAB1E
       jne      G_M42161_IG04
       mov      r10d, eax
       and      r10d, 0xD1FFAB1E
       cmp      r10d, 0xD1FFAB1E
       jg       G_M42161_IG04
       mov      dword ptr [rsp+30H], r9d
       mov      dword ptr [rsp+34H], edx
       mov      dword ptr [rsp+2CH], r8d
       mov      dword ptr [rsp+28H], eax
       mov      eax, dword ptr [rsp+28H]
       mov      dword ptr [rcx], eax
       mov      eax, dword ptr [rsp+2CH]
       mov      dword ptr [rcx+4], eax
       mov      eax, dword ptr [rsp+30H]
       mov      dword ptr [rcx+8], eax
       mov      eax, dword ptr [rsp+34H]
       mov      dword ptr [rcx+12], eax
       mov      rax, rcx
G_M42161_IG03:
       add      rsp, 56
       ret      
************** Beginning of cold code **************
G_M42161_IG04:
       lea      rcx, bword ptr [rsp+28H]
       call     Decimal:ThrowInvalidDecimalBytes():this
       int3     
; Total bytes of code 144, prolog size 5 for method Math:Abs(struct):struct

Member Author

Not really sure why, but will revert

Member

I would expect the decimal improvement will grow once it is fine tuned with all the feedback. E.g.:

  • I think that Decimal Abs can assume that the incoming decimal is valid. It is what all other Decimal ops do. I do not think it needs to throw at all.
  • There seemed to be a bug introduced in SetBits by your change - it was not throwing on invalid flags anymore.
  • SetBits can be folded into constructor that takes byte[]. It does not look like an improvement for it to be separate.

More smaller PRs move faster.

@mikedn

mikedn commented Jan 11, 2018

Fortunately this version of Math.Abs(long) also generates reasonable code on 32-bit targets. For a loop like foreach (long x in longArray) s += Math.Abs(x); we get:

G_M55888_IG03:
       894DF0       mov      gword ptr [ebp-10H], ecx
       8D5CF108     lea      ebx, bword ptr [ecx+8*esi+8]
       8B0B         mov      ecx, dword ptr [ebx]
       8B5B04       mov      ebx, dword ptr [ebx+4]
; begin abs
       85DB         test     ebx, ebx
       7D0B         jge      SHORT G_M55888_IG04
       F7D9         neg      ecx
       83D300       adc      ebx, 0
       F7DB         neg      ebx
       85DB         test     ebx, ebx
       7C12         jl       SHORT G_M55888_IG06
G_M55888_IG04:
; end abs
       03C1         add      eax, ecx
       13D3         adc      edx, ebx
       46           inc      esi
       3BFE         cmp      edi, esi
       8B4DF0       mov      ecx, gword ptr [ebp-10H]
       7FD9         jg       SHORT G_M55888_IG03

Unfortunately it won't be easy to get rid of the useless test after the neg. The JIT does something like this in certain cases but it does not have any support for this particular case.
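
For readers less familiar with x86, here is a rough C# sketch of what the `neg` / `adc` / `neg` sequence between the "begin abs" and "end abs" markers computes: a 64-bit negation built from two 32-bit halves. `NegateViaHalves` is a hypothetical helper written only for illustration; it is not part of the PR or of CoreLib.

```csharp
using System;

static class HalvesNegationSketch
{
    // Hypothetical helper: negates a 64-bit value using only 32-bit arithmetic,
    // mirroring the neg ecx / adc ebx, 0 / neg ebx sequence in the listing.
    public static long NegateViaHalves(long value)
    {
        unchecked
        {
            uint lo = (uint)value;          // low 32 bits (ecx in the listing)
            uint hi = (uint)(value >> 32);  // high 32 bits (ebx in the listing)

            // "neg" sets the carry flag whenever its operand is non-zero.
            uint carry = lo != 0 ? 1u : 0u;
            lo = 0u - lo;                   // neg ecx
            hi = hi + carry;                // adc ebx, 0
            hi = 0u - hi;                   // neg ebx

            return (long)(((ulong)hi << 32) | lo);
        }
    }
}
```

After this sequence the sign of the result lives in the high half, which is why the listing only needs `test ebx, ebx` to detect the `long.MinValue` overflow case.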

@mikedn

mikedn commented Jan 11, 2018

Some perf numbers from @aobatact's benchmark extended with the new abs version:

| Method     | x    | Mean      | Error     | StdDev    |
|------------|------|-----------|-----------|-----------|
| If         | -100 | 0.6153 ns | 0.0015 ns | 0.0013 ns |
| MyAbs      | -100 | 0.5509 ns | 0.0009 ns | 0.0008 ns |
| MathAbs    | -100 | 1.7725 ns | 0.0019 ns | 0.0017 ns |
| NewMathAbs | -100 | 0.2772 ns | 0.0032 ns | 0.0030 ns |
| If         | 100  | 0.2616 ns | 0.0116 ns | 0.0103 ns |
| MyAbs      | 100  | 0.2286 ns | 0.0013 ns | 0.0012 ns |
| MathAbs    | 100  | 0.2251 ns | 0.0017 ns | 0.0014 ns |
| NewMathAbs | 100  | 0.0000 ns | 0.0000 ns | 0.0000 ns |

Don't ask me how come NewMathAbs(100) takes 0ns, I do not know. Too bad it doesn't take -0.2ns, it would have been a suitable number for an abs benchmark 😁

@jkotas jkotas merged commit 288bd46 into dotnet:master Jan 11, 2018
@jkotas
Member

jkotas commented Jan 11, 2018

@mikedn and @benaadams Thank you!

@damienleroy

damienleroy commented Jan 11, 2018

Hi all,
Just a quick question about the merge.

        public static int Abs(int value)
        {
            if (value < 0)
            {
                value = -value;
                if (value < 0)
                {
                    ThrowAbsOverflow();
                }
            }
            return value;
        }

As I see it, once a value smaller than zero has been negated, it has to be positive.
Why do you check again whether value is negative?

@pdelvo

pdelvo commented Jan 11, 2018

@LeroyD The problem is that numbers are represented in two's complement, which has one more negative number than positive numbers. You can represent -2147483648 = 0b1000...000, but not 2147483648. Negating this number gives the number itself (to negate in two's complement you invert all bits and add 1; negative numbers start with 1).
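
The "invert all bits and add 1" rule can be checked directly. The sketch below is only an illustration; `NegateByBits` is a hypothetical name, and `unchecked` is written out explicitly so the example does not depend on compiler settings.

```csharp
using System;

static class TwosComplementDemo
{
    // Two's complement negation spelled out: invert all bits, then add 1.
    // The unchecked context lets the addition wrap around instead of throwing.
    public static int NegateByBits(int value) => unchecked(~value + 1);
}
```

For ordinary inputs this matches unary minus, for example `NegateByBits(5)` is `-5`. For `int.MinValue` the inverted bits are `int.MaxValue`, and adding 1 wraps back to `int.MinValue`, so the result is still negative. That is the case the second `if (value < 0)` catches.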

@damienleroy

OK, understood.
That's why the original version used this condition to throw the exception:

if (value == int.MinValue)
{ ... }

Thanks for your reply.

@mikedn

mikedn commented Jan 11, 2018

It's worth noting that this specific implementation works in C# because the specification says:

If this occurs within a checked context, a System.OverflowException is thrown; if it occurs within an unchecked context, the result is the value of the operand and the overflow is not reported.

Do not use this implementation in other languages without first checking the specific language behavior. For example in C++ this code exhibits undefined behavior and some C++ compilers may very well discard the second if (value < 0) or do some other unexpected optimizations.
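
To make the checked/unchecked distinction concrete, here is a small sketch (the class and method names are hypothetical). It assumes nothing beyond the C# `checked` and `unchecked` operators quoted from the specification above.

```csharp
using System;

static class CheckedNegationDemo
{
    // Unchecked context: overflow is not reported and the operand comes back unchanged.
    public static int NegateUnchecked(int value) => unchecked(-value);

    // Checked context: the same negation throws System.OverflowException on overflow.
    public static bool NegateCheckedThrows(int value)
    {
        try
        {
            _ = checked(-value);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

`NegateUnchecked(int.MinValue)` returns `int.MinValue`, while `NegateCheckedThrows(int.MinValue)` returns `true`. The second `if (value < 0)` test in the merged `Abs` only makes sense when the negation runs in an unchecked context; in a checked context the negation itself would throw first.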

@adamsitnik
Member

Don't ask me how come NewMathAbs(100) takes 0ns, I do not know

@mikedn if something takes 0ns with BenchmarkDotNet, it means that the given method was either empty or returned a const. In that case, I suspect that JIT was smart enough to recognize that the field does not change and it's calculating the same thing all the time and decided to simply return a const.
The DisassemblyDiagnoser can be used to verify the output asm code.

@AndreyAkinshin I think that we should print some kind of a warning in that case ;)

@mikedn

mikedn commented Jan 11, 2018

In that case, I suspect that JIT was smart enough to recognize that the field does not change and it's calculating the same thing all the time and decided to simply return a const.

Nope, there is no way that the JIT can do that. Fields are set outside the method so there's no way for the JIT to know what value is there. The generated code is:

G_M12548_IG01:
       4883EC28             sub      rsp, 40
G_M12548_IG02:
       8B4108               mov      eax, dword ptr [rcx+8]
       85C0                 test     eax, eax
       7D06                 jge      SHORT G_M12548_IG03
       F7D8                 neg      eax
       85C0                 test     eax, eax
       7C08                 jl       SHORT G_M12548_IG05
G_M12548_IG03:
       89410C               mov      dword ptr [rcx+12], eax
G_M12548_IG04:
       4883C428             add      rsp, 40
       C3                   ret
G_M12548_IG05:
       E812EEFFFF           call     AbsHelper:AbsThrowHelper()
       CC                   int3

Most likely the code that we attempt to measure is just too fast. Measuring some sort of loop that uses Math.Abs would have been preferable probably. Anyway, the essence is that the new version is better than the old version, the absolute numbers are kind of pointless at this scale (0.2ns is basically 1 CPU cycle, good luck measuring that).

@AndreyAkinshin
Member

0.2ns is basically 1 CPU cycle, good luck measuring that

BenchmarkDotNet should handle such cases. =) Otherwise, it's a bug which should be fixed.
@mikedn, could you share the whole source code of the benchmark and the environment info (a few lines before the summary table)?

@mikedn

mikedn commented Jan 11, 2018

could you share the whole source code of the Benchmark

You can reproduce this with something as simple as

[Benchmark]
public void Empty() => y = -x; // x and y are int fields

and the environment info

BenchmarkDotNet=v0.10.9, OS=Windows 10.0.16299
Processor=Intel Core i5-4440 CPU 3.10GHz (Haswell), ProcessorCount=4
Frequency=3026466 Hz, Resolution=330.4184 ns, Timer=TSC
.NET Core SDK=2.1.4
  [Host] : .NET Core ? (Framework 4.6.0.0), 64bit RyuJIT

BenchmarkDotNet should handle such cases. =) Otherwise, it's a bug which should be fixed.

Attempting to measure the immeasurable is hardly a bug, more like a case of trying too hard.

@benaadams benaadams deleted the math.abs-perf branch January 12, 2018 07:18
@benaadams
Member Author

Might need to bump up the executions and use OperationsPerInvoke

A for loop would likely dominate, so maybe direct repetition?

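[Benchmark(OperationsPerInvoke = 10)]
public static void Negate() // x and y are int fields; nothing is returned
{
    // Ten dependent negations, matching OperationsPerInvoke = 10
    y = -x;
    x = -y;
    y = -x;
    x = -y;
    y = -x;

    x = -y;
    y = -x;
    x = -y;
    y = -x;
    x = -y;
}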

@AndreyAkinshin
Member

Well, it's a tricky example.
I should say more about how BenchmarkDotNet evaluates performance.
0 ns is the time of an empty method like void Idle(), int Idle() => 0, etc.
If the performance impact of a method is indistinguishable from the performance impact of Idle (we are talking about many repetitions of course), we say that "this method also takes 0 ns".
Typically, we are able to measure a method which takes a few CPU cycles. We generate an unrolled loop like this (unrolling is very important for Haswell+; inlining is prevented):

for (int i = 0; i < N / 16; i++)
{
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
    Method();
}

If this code takes the same time for Idle and for our tested method, we say that the duration of the tested method is 0 ns. I guess that we get 0 ns for void Empty() => y = -x because of smart branch prediction and out-of-order execution on modern processors. The bottleneck here is the loop; Empty() and Idle() have the same performance impact here.

Next, the most important fact about such a method is not the body of the method, but how we call it. The trick by @benaadams with OperationsPerInvoke is valid, but we should understand that we get different values for different implementations (it depends on the instruction dependency graph between fields inside the method).
I used the same trick in this post: Performance exercise: Division. If we want to get a reliable result, we should build a linear graph which prevents instruction-level parallelism (@benaadams wrote a great implementation from this point of view). However, we can play with it and build other kinds of dependency graphs. In some cases, we even get different "winners" for such nanobenchmarks depending on the dependency graph.
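
As a rough illustration of the two shapes of dependency graph mentioned above, consider the sketch below. The class, field, and method names are all hypothetical, and whether a given CPU actually overlaps the independent operations is an assumption that would have to be measured.

```csharp
using System;

sealed class DependencyShapes
{
    private int x = 1, y;
    private int a = 1, b = 2, c = 3, d = 4;

    // Linear chain: every negation reads the value written by the previous one,
    // so the operations cannot overlap.
    public int Chain()
    {
        y = -x;
        x = -y;
        y = -x;
        x = -y;
        return x;
    }

    // Independent operations: no negation depends on another,
    // so the CPU is free to execute them in parallel.
    public int Independent()
    {
        a = -a;
        b = -b;
        c = -c;
        d = -d;
        return a + b + c + d;
    }
}
```

Both methods perform four negations, yet a nanobenchmark may report different per-operation times for them, which is the effect described above.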

@mikedn

mikedn commented Jan 12, 2018

A for loop would likely dominate, so maybe direct repetition?

Actually I tried a for loop and the results were pretty good. Too lazy to run it again now to post the numbers but AFAIR the new implementation was something like 2x faster than the old one for negative numbers while being identical with the old one for positive numbers.

And a for loop scenario is likely better as it's more likely to showcase real world performance. Nobody cares about the performance of a lone Math.Abs hidden in a dark corner of a program. What people care about is how it performs in hot code and that tends to directly or indirectly involve loops. And if the loop's own code dominates then, well, you're probably optimizing and measuring the wrong thing.
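
A loop measurement of the kind described here could be sketched as follows. This is an ad hoc `Stopwatch` harness under my own assumptions (the names, the warm-up call, and the checksum are illustrative); it is not the benchmark that produced the numbers in this thread, and it is not a substitute for BenchmarkDotNet.

```csharp
using System;
using System.Diagnostics;

static class AbsLoopTiming
{
    // The hot loop under test: sums the absolute values of an array.
    public static long SumAbs(long[] values)
    {
        long sum = 0;
        foreach (long v in values)
        {
            sum += Math.Abs(v);
        }
        return sum;
    }

    // Times repeated runs of the loop. The checksum is returned so the
    // work cannot be discarded as unused.
    public static TimeSpan Time(long[] values, int iterations, out long checksum)
    {
        checksum = SumAbs(values); // warm-up call so JIT compilation is not timed
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            checksum += SumAbs(values);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Comparing the elapsed time for an all-negative array against an all-positive one would show the relative difference between the two paths, which is the comparison that matters here rather than the absolute per-call time.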

I should say more about how BenchmarkDotNet evaluates performance.
0 ns is the time of an empty method like void Idle(), int Idle() => 0, etc.
If the performance impact of a method is indistinguishable from the performance impact of Idle (we are talking about many repetitions of course), we say that "this method also takes 0 ns".

That's definitely a case of trying too hard. Measuring the accurate time taken by very small pieces of code is practically impossible to get right with the tools you have at your disposal. You'd need to measure the number of cycles such sequences of code require and then, if you really want, you can extrapolate the execution time from the number of cycles. Attempting to measure the time and compensate with all kinds of adjustments and statistics is pretty much useless; in the end you have no way to prove that the absolute result is somehow meaningful. In such cases it's good enough if the results of a couple of benchmarks are comparable.

dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Jan 13, 2018
* Improve perf for Math.Abs

* Inline Math.Abs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jan 13, 2018
* Improve perf for Math.Abs

* Inline Math.Abs

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request Jan 13, 2018
* Improve perf for Math.Abs

* Inline Math.Abs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@@ -63,6 +99,12 @@ public static decimal Abs(decimal value)
return decimal.Abs(value);
}

[StackTraceHidden]

What does this do, prevent inlining?

Member Author

Stops it showing up in the exception stack trace (#14652), so it looks like the throw happened in place rather than in a throw helper function.
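
Putting the pieces of this thread together, the throw-helper pattern might look like the sketch below. It assumes .NET 6 or later, where `System.Diagnostics.StackTraceHiddenAttribute` is public (at the time of this PR it was internal to CoreLib). The class name and the exception message are placeholders, not the actual CoreLib source.

```csharp
using System;
using System.Diagnostics;

static class AbsSketch
{
    public static int Abs(int value)
    {
        if (value < 0)
        {
            // Explicitly unchecked so int.MinValue wraps to itself instead of throwing here.
            value = unchecked(-value);
            if (value < 0)
            {
                ThrowAbsOverflow();
            }
        }
        return value;
    }

    // The throw lives in a separate method to keep Abs small enough to inline.
    // [StackTraceHidden] omits this helper's frame from the exception stack trace.
    [StackTraceHidden]
    private static void ThrowAbsOverflow()
    {
        // Placeholder message; the real resource string is not reproduced here.
        throw new OverflowException("Negating the minimum value of a two's complement number is invalid.");
    }
}
```

`AbsSketch.Abs(-5)` returns `5`, and `AbsSketch.Abs(int.MinValue)` throws `OverflowException` with a stack trace that does not list `ThrowAbsOverflow`.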

safern pushed a commit to dotnet/corefx that referenced this pull request Jan 16, 2018
* Improve perf for Math.Abs

* Inline Math.Abs

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
am11 pushed a commit to am11/corert that referenced this pull request Jan 29, 2018
* Improve perf for Math.Abs

* Inline Math.Abs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>