Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make typedefof more efficient #5019

Open
ForNeVeR opened this issue May 27, 2018 · 25 comments
Open

Make typedefof more efficient #5019

ForNeVeR opened this issue May 27, 2018 · 25 comments
Labels
Area-Compiler-Optimization The F# optimizer, release code gen etc. Feature Improvement
Milestone

Comments

@ForNeVeR
Copy link
Contributor

ForNeVeR commented May 27, 2018

I don't think it's actually a suggestion, so I'm filing it as a compiler issue.

As @rspeele mentions in comment to TaskBuilder.fs repo, currently typedefof looks rather inefficient. Compare the following C# and the following F# snippets:

using System.Collections.Generic;
public class C {
    public void M() {
        var x = typeof(List<>);
    }
}
open System.Collections.Generic
type C() =
    member this.M() =
        let x = typedefof<List<_>>
        ()

C# compiles to the following CIL:

         ldtoken [mscorlib]System.Collections.Generic.List`1
         call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
         stloc.0

While F# compiles to the following:

         ldtoken class [mscorlib]System.Collections.Generic.List`1<object>
         call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
         stloc.1
         ldloc.1
         callvirt instance bool [mscorlib]System.Type::get_IsGenericType()
         brfalse.s IL_001b
         
         ldloc.1
         callvirt instance class [mscorlib]System.Type [mscorlib]System.Type::GetGenericTypeDefinition()
         br.s IL_001c
         
IL_001b: ldloc.1
IL_001c: stloc.0

which is an equivalent of

Type typeFromHandle = typeof(List<object>);
Type type = (!typeFromHandle.IsGenericType) ? typeFromHandle : typeFromHandle.GetGenericTypeDefinition()

Could we improve that? Are there any issues I miss with plain ldtoken that require the current approach?

@buybackoff
Copy link
Contributor

buybackoff commented May 27, 2018

If you use let x = typeof<List<_>> instead of typedefof the IL is the same as in C#. My original question in TaskBuilder issue was about which one to pick to make sure JIT-time constant magic doesn't break. Since it is JIT work, typeof should be fine. Much bigger issue would be if that didn't work from F#.

I created a benchmark for that. If writing simple F# code like below it is 100+ time slower than C#, because the method equality uses F# GenericEquality and when I just want to generate 10 millions longs profiler reminds me about my old SO question. Also the ULong64 case in F# generates many GC0, which indicates that casting is not optimized away by JIT.

type Worker() =
  [<MethodImplAttribute(MethodImplOptions.AggressiveInlining)>]
  member this.Work<'T>(x: 'T) : 'T =
    if typeof<'T> = typeof<uint64> then
      let d : uint64 = unbox(box(x))
      unbox(box(d))
    else 
      x

main
F#

Case MOPS Elapsed GC0 GC1 GC2 Memory
F# Long 45.46 22 ms 0.0 0.0 0.0 0.008 MB
F# Ulong 37.04 27 ms 3.0 0.0 0.0 4.891 MB

Main
C#

Case MOPS Elapsed GC0 GC1 GC2 Memory
C# Ulong 5000.00 2 ms 0.0 0.0 0.0 0.000 MB
C# Long 5000.00 2 ms 0.0 0.0 0.0 0.000 MB

image

Then I changed the code to

type Worker() =
  [<MethodImplAttribute(MethodImplOptions.AggressiveInlining)>]
  member this.Work<'T>(x: 'T) : 'T =
    if obj.ReferenceEquals(typeof<'T>, typeof<uint64>) then
      let d : uint64 = unbox(box(x))
      unbox(box(d))
    else 
      x

and found that the method is not inlined for some reason even though it has the attribute (I believe the issue on this attributed was merged long time ago)
image

But adding inline to the F# method makes it exactly the same as C# without allocations etc.:

type Worker() =
  [<MethodImplAttribute(MethodImplOptions.AggressiveInlining)>]
  member inline this.Work<'T>(x: 'T) : 'T =
    if obj.ReferenceEquals(typeof<'T>, typeof<uint64>) then
      let d : uint64 = unbox(box(x))
      unbox(box(d + 1UL))
    else 
      x

main
F#

Case MOPS Elapsed GC0 GC1 GC2 Memory
F# Long 2500.00 4 ms 0.0 0.0 0.0 0.000 MB
F# Ulong 2500.00 4 ms 0.0 0.0 0.0 0.000 MB

Main
C#

Case MOPS Elapsed GC0 GC1 GC2 Memory
C# Ulong 2500.00 4 ms 0.0 0.0 0.0 0.000 MB
C# Long 2500.00 4 ms 0.0 0.0 0.0 0.000 MB

But increasing the number of operations for more precise measurement reveals that F# does less operations per second in both cases and the difference to C# is similar to the difference between the two cases (they differ by ULong addition) so likely the if branch is not eliminated in F#. I do not want to disassembly and dig deeper for now, probably @adamsitnik could help to show how to use disassembly with BenchmarkDotNet.

main
F#

Case MOPS Elapsed GC0 GC1 GC2 Memory
F# Long 2173.91 46 ms 0.0 0.0 0.0 0.000 MB
F# Ulong 1923.08 52 ms 0.0 0.0 0.0 0.000 MB

Main
C#

Case MOPS Elapsed GC0 GC1 GC2 Memory
C# Ulong 2325.58 43 ms 0.0 0.0 0.0 0.000 MB
C# Long 2272.73 44 ms 0.0 0.0 0.0 0.000 MB

The C# code is as simple/idiomatic as one could write, no special tricks are needed.

    class Worker
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public T Work<T>(T x)
        {
            if (typeof(T) == typeof(ulong))
            {
                var u = (ulong)(object)x;
                u = u + 1;
                return (T)(object)u;
            }
            else
            {
                return x;
            }
        }
    }

Update:
I have looked into the IL and found that C# generates equality using System.Type::op_Equality where in F# I used obj.ReferenceEquals. How to make F# generate the same? When I use simple .Equals it generates a method call and the benchmark becomes many times slower.

call bool [System.Runtime]System.Type::op_Equality(class [System.Runtime]System.Type, class [System.Runtime]System.Type)

@rspeele
Copy link

rspeele commented May 27, 2018

Hi, perhaps I misunderstand your meaning with this statement:

If you use let x = typeof<List<_>> instead of typedefof the IL is the same as in C#

But F#'s typeof<List<_>> is equivalent to C#'s typeof(List<object>).

It's not a replacement for C#'s typeof(List<>), which is why in F# you must use either typedefof<List<_>> or typeof<List<_>>.GetGenericTypeDefinition(). In both of these scenarios, again, the <_> is effectively <obj>, it's just idiomatic to use _.


I should mention regardless of this being changed I still probably wouldn't want to do reflection-based checks to return cached bool tasks in TaskBuilder.fs, just because it adds complexity and there would always be some overhead to it in the majority of cases where not returning bools. But this issue is a separate thing and I must admit there have been times when doing heavy reflection code in F# that I have (probably as an unnecessary premature optimization) chosen to save the result of typedefof in a variable.

@buybackoff
Copy link
Contributor

@rspeele in the original issue my focus was on JIT compile-time constant and if branch elimination. If it worked then...

there would always be some overhead to it in the majority of cases where not returning bools.

... there would be exactly zero overhead because JIT compiler just eliminates the wrong branches entirely (details here). I promised some benchmarks in the original issue and just posted it there.

@zpodlovics
Copy link

zpodlovics commented May 28, 2018

@buybackoff I guess this issue #526 still haunt us. I guess of the performance related issues comes from unwanted heap allocations (or even stack because of the excessive struct copy due the inability to pass struct's by reference,hopefully this will change with Span<'T> and inref<'T>).

F# should have some kind of infrastructure (even better: constraints or even attributes something like the roslyn checker provides!) to check the heap / stack allocations - perferrably by the compiler. It it possible to calculate the heap / stack allocation based on the TAST and the generated IL? While for functional languages It's an active research area [1], even a basic or a Roslyn HeapAllocationAnalyzer like tool will help a lot due the quick feedback loop to fix existing F# problems like this issue (it also helps that we know it's possible to solve it, because C# already have these tricks):

https://github.com/Microsoft/RoslynClrHeapAllocationAnalyzer

Also take a look at this "constraint" for C# as "The proposal is to declare a local function as non-capturing. The keyword to indicate that would be static":
dotnet/csharplang#1565

[1] http://www.raml.co/
[2] https://arxiv.org/abs/1801.01896

@dsyme
Copy link
Contributor

dsyme commented May 29, 2018

I have looked into the IL and found that C# generates equality using System.Type::op_Equality where in F# I used obj.ReferenceEquals. How to make F# generate the same?

@buybackoff What happens when you use System.Type.op_Equality explicitly, e.g.

    if System.Type.op_Equality(typeof<'T>, typeof<uint64>) then
      ...

Note that open FSharp.Core.Operators.NonStructuralComparison may be sensible for this kind of low-allocation code in any case. In that case, I believe op_Equality will always be called directly for all uses of a = b but of course you will lose structural quality calls and won't be able to use = on some types

@buybackoff
Copy link
Contributor

@dsyme yep, this removes the difference (and actually F# becomes slightly faster). Haven't looked into IL.

But what is the deal with inlining here? Is F# compiler or JIT does the work when the inline is present on the method?

@dsyme
Copy link
Contributor

dsyme commented May 29, 2018

But what is the deal with inlining here? Is F# compiler or JIT does the work when the inline is present on the method?

The F# compiler does the work for inline. The JIT does the work for the AggressiveInlining attribute if there are any remaining calls to the method (which there won't be in F# code).

@buybackoff
Copy link
Contributor

@dsyme I formulated the question incorrectly. When F# inlines the method (I realize that this is like literally pasting IL code to a call site) then boxing disappears. Does F# compiler eliminate the unbox(box(x)) part when types are the same or JIT? And why do you thing this method is not inlined with the AggressiveInlining attribute?

@dsyme
Copy link
Contributor

dsyme commented May 29, 2018

@dsyme I formulated the question incorrectly. When F# inlines the method (I realize that this is like literally pasting IL code to a call site) then boxing disappears. Does F# compiler eliminate the unbox(box(x)) part when types are the same or JIT?

I think the JIT will. F# doesn't. The inlining will also type-specialise the code and that might help the JIT.

And why do you thing this method is not inlined with the AggressiveInlining attribute?

No idea I'm afraid. I'd just be guessing.

@zpodlovics
Copy link

zpodlovics commented May 30, 2018

FYI:

"Also, there are a number of reasons why methods with AggressiveInlining might not get inlined. If you can drop a checked clrjit.dll into your dotnet install, you can set COMPlus_JitPrintInlinedMethods to get a log of inlining decisions with failure reasons (this is less voluminous than a full jit dump)."

https://github.com/dotnet/coreclr/issues/5996#issuecomment-228467831

There is an optimization pass to detect box + isinst: (https://github.com/dotnet/coreclr/issues/12877 + https://github.com/dotnet/coreclr/issues/14472) https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h#L3153 however I did not yet found a box + unbox optimization pass in compiler.h / compiler.cpp.

https://github.com/dotnet/coreclr/issues/14305

@zpodlovics
Copy link

@buybackoff last time I checked the if typeof<'T> = typeof then else if pattern created a somewhat horrible and the slowest code (fsharp/fslang-design#287 (comment)).

 TypeofStructDU | 342.482 ns | 3.3741 ns | 3.1561 ns |
    [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
    let inline matchTypeofStructDU(x: 'T) (v: int) = 
        if typeof<StructCase1> = typeof<'T> then 
            Baseline.Invoke(v,1)
        else if typeof<StructCase2> = typeof<'T> then
                Baseline.Invoke(v,2)
        else if typeof<StructCase3> = typeof<'T> then
                Baseline.Invoke(v,3)
        else if typeof<StructCase4> = typeof<'T> then
                Baseline.Invoke(v,4)
        else if typeof<StructCase5> = typeof<'T> then
                Baseline.Invoke(v,5)
        else if typeof<StructCase6> = typeof<'T> then
                Baseline.Invoke(v,6)
        else if typeof<StructCase7> = typeof<'T> then
                Baseline.Invoke(v,7)
        else if typeof<StructCase8> = typeof<'T> then
                Baseline.Invoke(v,8)
        else if typeof<StructCase9> = typeof<'T> then
                Baseline.Invoke(v,9)
        else if typeof<StructCase10> = typeof<'T> then
                Baseline.Invoke(v,10)
        else 0

    let typeofStructDUInstance = (Unchecked.defaultof<StructCase10>) 

    [<MethodImpl(MethodImplOptions.NoInlining)>]
    let typeofStructDU() = matchTypeofStructDU typeofStructDUInstance 1

Generated IL code snippet (full namespace retracted) for typeof as match:

  .method assembly static int32  matchTypeofStructDU$cont@396(class [FSharp.Core]Microsoft.FSharp.Core.Unit unitVar) cil managed
  {
    .custom instance void [System.Runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) 
    // Code size       284 (0x11c)
    .maxstack  4
    IL_0000:  ldtoken    StructCase3
    IL_0005:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_000a:  ldtoken    StructCase10
    IL_000f:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0014:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_0019:  brfalse.s  IL_0023

    IL_001b:  ldc.i4.1
    IL_001c:  ldc.i4.3
    IL_001d:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0022:  ret

    IL_0023:  ldtoken    StructCase4
    IL_0028:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_002d:  ldtoken    StructCase10
    IL_0032:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0037:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_003c:  brfalse.s  IL_0046

    IL_003e:  ldc.i4.1
    IL_003f:  ldc.i4.4
    IL_0040:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0045:  ret

    IL_0046:  ldtoken    StructCase5
    IL_004b:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0050:  ldtoken    StructCase10
    IL_0055:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_005a:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_005f:  brfalse.s  IL_0069

    IL_0061:  ldc.i4.1
    IL_0062:  ldc.i4.5
    IL_0063:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0068:  ret

    IL_0069:  ldtoken    StructCase6
    IL_006e:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0073:  ldtoken    StructCase10
    IL_0078:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_007d:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_0082:  brfalse.s  IL_008c

    IL_0084:  ldc.i4.1
    IL_0085:  ldc.i4.6
    IL_0086:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_008b:  ret

    IL_008c:  ldtoken    StructCase7
    IL_0091:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0096:  ldtoken    StructCase10
    IL_009b:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00a0:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_00a5:  brfalse.s  IL_00af

    IL_00a7:  ldc.i4.1
    IL_00a8:  ldc.i4.7
    IL_00a9:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_00ae:  ret

    IL_00af:  ldtoken    StructCase8
    IL_00b4:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00b9:  ldtoken    StructCase10
    IL_00be:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00c3:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_00c8:  brfalse.s  IL_00d2

    IL_00ca:  ldc.i4.1
    IL_00cb:  ldc.i4.8
    IL_00cc:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_00d1:  ret

    IL_00d2:  ldtoken    StructCase9
    IL_00d7:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00dc:  ldtoken    StructCase10
    IL_00e1:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00e6:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_00eb:  brfalse.s  IL_00f6

    IL_00ed:  ldc.i4.1
    IL_00ee:  ldc.i4.s   9
    IL_00f0:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_00f5:  ret

    IL_00f6:  ldtoken    StructCase10
    IL_00fb:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0100:  ldtoken    StructCase10
    IL_0105:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_010a:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_010f:  brfalse.s  IL_011a

    IL_0111:  ldc.i4.1
    IL_0112:  ldc.i4.s   10
    IL_0114:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0119:  ret

    IL_011a:  ldc.i4.0
    IL_011b:  ret
  } // end of method MatchBenchmark::matchTypeofStructDU$cont@396

Generated IL code snippet (full namespace retracted) for benchmark:

 .method public static int32  typeofStructDU() cil managed noinlining
  {
    // Code size       77 (0x4d)
    .maxstack  4
    IL_0000:  ldtoken    StructCase1
    IL_0005:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_000a:  ldtoken    StructCase10
    IL_000f:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0014:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_0019:  brfalse.s  IL_0023

    IL_001b:  ldc.i4.1
    IL_001c:  ldc.i4.1
    IL_001d:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0022:  ret

    IL_0023:  ldtoken    StructCase2
    IL_0028:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_002d:  ldtoken    StructCase10
    IL_0032:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0037:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_003c:  brfalse.s  IL_0046

    IL_003e:  ldc.i4.1
    IL_003f:  ldc.i4.2
    IL_0040:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0045:  ret

    IL_0046:  ldnull
    IL_0047:  call       int32 MatchBenchmark::matchTypeofStructDU$cont@396(class [FSharp.Core]Microsoft.FSharp.Core.Unit)
    IL_004c:  ret
  } // end of method MatchBenchmark::typeofStructDU

@buybackoff
Copy link
Contributor

@zpodlovics the point is that the same C# method is inlined by JIT and they use the same JIT... which probably means IL is different. But in such cases like this I prefer F# order, not a hint, to inline.

There are not so many reasons when a simple method is not inlined - the most common is exception throwing, like in this case:

image

The method is F#-intrinsic, but not JIT-intrisic, so it is a normal one and any throw 100% makes a method not a candidate for inlining (which is easy to avoid with ThrowHelper pattern).

@buybackoff
Copy link
Contributor

@zpodlovics

a somewhat horrible and the slowest code

"horrible and the slowest code" is this part: typeof<StructCase2> = typeof<'T> which translates to bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>.

Try using what @dsyme showed above: if System.Type.op_Equality(typeof<'T>, typeof<uint64>) and JIT will happily recognize that this condition is JIT-time constant and will remove false branches completely.

@zpodlovics
Copy link

zpodlovics commented May 30, 2018

@buybackoff I did not noticed that System.Type.op_Equality comment earlier, I will try it out.

It may worth to reconsider the exception handling in F# code base expecially in standard library and move out the slow path to helpers (throwhelpers). This could shrink the fastpath number of ILs, that could also improve the JIT inlining decisions, but that's a different story / issue.

@AndyAyersMS
Copy link
Member

Adjacent box / unbox.any is handled via a peephole optimization in the jit importer (look at importer.cpp around line 14890).

I tried looking at inlining decisions in your linked repro case but it appears that inlining of Worker.Work was done by the frontend compiler (even without the inline keyword).

The jit in .Net Core 2.1 has a number of enhancements to type equality opts and boxing opts so you might also give that a try, if you're still using 2.0.

@buybackoff
Copy link
Contributor

It may worth to reconsider the exception handling in F# code base expecially in standard library and move out the slow path to helpers (throwhelpers). This could shrink the fastpath number of ILs, that could also improve the JIT inlining decisions, but that's a different story / issue.

This is a low-hanging fruit. Corefx uses this everywhere with interesting comments on why two nested method. Since @AndyAyersMS is here he could give authoritative answer if this pattern is worth it to change many LOCs

Just looked into Array.fs and found that exceptions are inlined with F# inline keyword, which is quite opposite to ThrowHelper pattern.

@zpodlovics
Copy link

zpodlovics commented May 30, 2018

@buybackoff Exactly, this why I mentioned. It's assumed to be a low-hanging fruit, but I am afraid it will be probably the most critical realization for F# performance related issues. It's not accidental that CoreCLR/CoreFX/Kestrel code refactored to that style.

a) The most common execution path shrinked significantly (fastpath). The less code means lot more opportunity to inline (some slow code path could also prevent inlining due the complexity, size, etc) and the JIT will generate lot less code to execute that path.
b) The non-common path (slowpath) moved out to own method/function to avoid code bloat when the fastpath is inlinied.
c) The other non-common path (exception) moved out to own method/function to avoid code bloat when the fastpath is inlinied.

Inlining/NoInlining flags should be used wisely: https://github.com/dotnet/coreclr/issues/1270#issuecomment-180862050

The original reason why raise marked inline is ... to avoid warnings is code coverage tests (instead of fixing the code coverage tool). That was reasonably reasonable at that time, but the CoreCLR changed/improved significantly since then. Please see my late concerns about here: #3121 (comment)

There are some legitimate case when inlining (at IL level by the compiler) exception throwing is required for example NoDynamicInvocation case here: https://github.com/dotnet/coreclr/issues/1270#issuecomment-181767435

@buybackoff
Copy link
Contributor

@zpodlovics but that is tedious work without much fun. I try to use ThrowHelper by default after I created such helper for myself, but usually it makes sense when a profiler shows that e.g. method call takes more time than some numeric calculation. But at least in F# Core many throwing functions are reused from helper modules and it's easier to change code there than everywhere.

@zpodlovics
Copy link

zpodlovics commented May 30, 2018

@buybackoff I did the same in my code. Well, we can always introduce a an attribute for forced inline exception throwing (on raise) eg.: ForceInlineThrow (or something else), and the compiler do a transformation and generate helper static methods for raise code. But I am afraid this automatic tranformation may result a non-equivalent program and/or error prone (but the same could be true for inlining exception throwing).

@AndyAyersMS
Copy link
Member

I would say in CoreCLR/CoreFX where're kind of in an ok but funny place. We have some number of methods that throw manually converted over to use throw helpers. But not all. We'd really prefer that this transformation be done automatically, behind the scenes (via say the ILLinker). But we don't have that yet.

The pattern is:

  • The main method does its checks (say argument validation) and then conditionally calls throw helper with any needed args. Type the helper so no implicit boxing or parms array allocation is needed.
  • The throw helper does all formatting and exception object creation, then unconditionally throws. It is deliberately not marked with [MethodImpl(MethodImplOptions.NoInlining)].

The jit recognizes this pattern and does not inline the throw helper, and moves its call site to the "cold" section of the calling method's code (either end of method when jitting, or in a split off cold part when prejitting).

Note that because of resource lookup, boxing, etc the formatting of the exception message can be a fairly large chunk of code (and the trend is to make these messages be more detailed). So we also get overall size wins when throw helpers can be reused across a number of different methods.

Whether this is worth doing is hard to say -- if you have a number of frequently called methods that have conditional throws then it probably is helpful. You can be selective and only convert things gradually.

@manofstick
Copy link
Contributor

Not re: the original topic, but where it meandered to... #5307 affects this (sprinkling breadcrumbs...)

(Oh, and BTW, not sure if it was ever changed, but I remember looked many moons ago at AggressiveInlining in f#... From memory the MethodImplAttributes were handled as special cases, and AggressiveInlining were not passed through to the IL (probably the enum value was created after the fsharp compiler did that bit...), which might explain why it didn't work. From memory again I think I had a crack at adding it, and it wasn't particularly hard - not sure why I never posted a PR though... Hmmm. Maybe there was some issue...)

@zpodlovics
Copy link

@manofstick Last time I checked AggressiveInlining was fine in F#. The decompiled IL (with ildasm) have the aggressiveinlining marker. Congratulation for your amazing work on equality!

    [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
    static member op_Equality(this : SOption<'a>, other : SOption<'a>) =
        this.IsSome = other.IsSome && this.Value = other.Value        
    .method public specialname static bool 
            op_Equality(valuetype Program/VOption`1<!a> this,
                        valuetype Program/VOption`1<!a> other) cil managed aggressiveinlining
    {
      // Code size       44 (0x2c)
      .maxstack  4
      .locals init (!a V_0,
               !a V_1)
      IL_0000:  ldarga.s   this
      IL_0002:  call       instance bool valuetype Program/VOption`1<!a>::get_IsSome()
      IL_0007:  ldarga.s   other
      IL_0009:  call       instance bool valuetype Program/VOption`1<!a>::get_IsSome()
      IL_000e:  bne.un.s   IL_002a

      IL_0010:  ldarga.s   this
      IL_0012:  call       instance !0 valuetype Program/VOption`1<!a>::get_Value()
      IL_0017:  stloc.0
      IL_0018:  ldarga.s   other
      IL_001a:  call       instance !0 valuetype Program/VOption`1<!a>::get_Value()
      IL_001f:  stloc.1
      IL_0020:  ldloc.0
      IL_0021:  ldloc.1
      IL_0022:  tail.
      IL_0024:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<!a>(!!0,
                                                                                                                                !!0)
      IL_0029:  ret

      IL_002a:  ldc.i4.0
      IL_002b:  ret
    } // end of method VOption`1::op_Equality

@NinoFloris
Copy link
Contributor

Any way to move this forward?

@dsyme what are your thoughts on loading an open generic token like C# does by repurposing _ within a typedefof call by adding an optimization in the compiler?

@dsyme
Copy link
Contributor

dsyme commented Mar 23, 2020

@dsyme what are your thoughts on loading an open generic token like C# does by repurposing _ within a typedefof call by adding an optimization in the compiler?

Couldn't we just add a specific optimization to generate the open ldtoken?

@NinoFloris
Copy link
Contributor

That was the suggestion indeed!

Without adding discards support it is a bit confusing you must fully specify the type yet you don't care about the arguments.

Related to this is something I ran into a while back with a complex type; as typedefof is also just a normal generic function it must also respect the constraints on the type parameters involved. This can lead to awkward signatures where you must put a lot of emphasis on the arguments (think nonsense like Foo<'T, 'U when 'T :> seq<'U>, 'U :> IEquatable<'U>>). I'm assuming it would have to become an intrinsic to loosen type checks for typedef, if it's even something you want to entertain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compiler-Optimization The F# optimizer, release code gen etc. Feature Improvement
Projects
Status: New
Development

No branches or pull requests

9 participants