Speed up `for … in xs -> …` in computed collections #16948
* Use `Array.map` for `[|for … in xs -> …|]` when `xs` is an array.
* Use `List.map` for `[for … in xs -> …]` when `xs` is a list.
❗ Release notes required
The gains for arrays achieved by precise allocation (as part of

What I am fearing is that the lack of inlining might give us a degradation, e.g. for small arrays which previously relied on

For that to happen, I guess this would need to move to

Or is there another way to achieve inlining of the mapping?
That's a good callout — my guess, without looking, is that this isn't a problem, since this transformation is happening after the main optimization phases, so I think that the lambda representing the mapping should already have any specializations in place (if that were going to happen).
If my guess above is wrong, then yes, we could simply emit

I went ahead and threw your example code into the emitted IL tests, and this is what I see:

Full IL

.assembly extern runtime { }
.assembly extern FSharp.Core { }
.assembly assembly
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.FSharpInterfaceDataVersionAttribute::.ctor(int32,
int32,
int32) = ( 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 )
.hash algorithm 0x00008004
.ver 0:0:0:0
}
.mresource public FSharpSignatureData.assembly
{
}
.mresource public FSharpOptimizationData.assembly
{
}
.module assembly.exe
.imagebase {value}
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003
.corflags 0x00000001
.class public abstract auto ansi sealed assembly
extends [runtime]System.Object
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = ( 01 00 07 00 00 00 00 00 )
.class auto ansi serializable sealed nested assembly beforefieldinit result@11
extends class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,int32>
{
.field static assembly initonly class assembly/result@11 @_instance
.method assembly specialname rtspecialname instance void .ctor() cil managed
{
.custom instance void [runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
.custom instance void [runtime]System.Diagnostics.DebuggerNonUserCodeAttribute::.ctor() = ( 01 00 00 00 )
.maxstack 8
IL_0000: ldarg.0
IL_0001: call instance void class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,int32>::.ctor()
IL_0006: ret
}
.method public strict virtual instance int32 Invoke(int32 x) cil managed
{
.maxstack 8
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldc.i4.2
IL_0003: beq.s IL_0007
IL_0005: ldarg.1
IL_0006: ret
IL_0007: ldarg.1
IL_0008: ldc.i4.2
IL_0009: mul
IL_000a: ret
}
.method private specialname rtspecialname static void .cctor() cil managed
{
.maxstack 10
IL_0000: newobj instance void assembly/result@11::.ctor()
IL_0005: stsfld class assembly/result@11 assembly/result@11::@_instance
IL_000a: ret
}
}
.field static assembly int32[] input@10
.custom instance void [runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [runtime]System.Diagnostics.DebuggerBrowsableState) = ( 01 00 00 00 00 00 00 00 )
.field static assembly int32[] result@11
.custom instance void [runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [runtime]System.Diagnostics.DebuggerBrowsableState) = ( 01 00 00 00 00 00 00 00 )
.method public specialname static int32 get_valueToCompare() cil managed
{
.custom instance void [runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
.custom instance void [runtime]System.Diagnostics.DebuggerNonUserCodeAttribute::.ctor() = ( 01 00 00 00 )
.maxstack 8
IL_0000: ldc.i4.2
IL_0001: ret
}
.method public specialname static int32[] get_input() cil managed
{
.maxstack 8
IL_0000: ldsfld int32[] assembly::input@10
IL_0005: ret
}
.method public specialname static int32[] get_result() cil managed
{
.maxstack 8
IL_0000: ldsfld int32[] assembly::result@11
IL_0005: ret
}
.method private specialname rtspecialname static void .cctor() cil managed
{
.maxstack 8
IL_0000: ldc.i4.0
IL_0001: stsfld int32 '<StartupCode$assembly>'.$assembly::init@
IL_0006: ldsfld int32 '<StartupCode$assembly>'.$assembly::init@
IL_000b: pop
IL_000c: ret
}
.method assembly specialname static void staticInitialization@() cil managed
{
.maxstack 8
IL_0000: nop
IL_0001: ldc.i4.2
IL_0002: newarr [runtime]System.Int32
IL_0007: dup
IL_0008: ldc.i4.0
IL_0009: ldc.i4.1
IL_000a: stelem [runtime]System.Int32
IL_000f: dup
IL_0010: ldc.i4.1
IL_0011: ldc.i4.2
IL_0012: stelem [runtime]System.Int32
IL_0017: stsfld int32[] assembly::input@10
IL_001c: ldsfld class assembly/result@11 assembly/result@11::@_instance
IL_0021: call int32[] assembly::get_input()
IL_0026: call !!1[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::Map<int32,int32>(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!0,!!1>,
!!0[])
IL_002b: stsfld int32[] assembly::result@11
IL_0030: ret
}
.property int32 valueToCompare()
{
.get int32 assembly::get_valueToCompare()
}
.property int32[] input()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = ( 01 00 09 00 00 00 00 00 )
.get int32[] assembly::get_input()
}
.property int32[] result()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = ( 01 00 09 00 00 00 00 00 )
.get int32[] assembly::get_result()
}
}
.class private abstract auto ansi sealed '<StartupCode$assembly>'.$assembly
extends [runtime]System.Object
{
.field static assembly int32 init@
.custom instance void [runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [runtime]System.Diagnostics.DebuggerBrowsableState) = ( 01 00 00 00 00 00 00 00 )
.custom instance void [runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
.custom instance void [runtime]System.Diagnostics.DebuggerNonUserCodeAttribute::.ctor() = ( 01 00 00 00 )
.method public static void main@() cil managed
{
.entrypoint
.maxstack 8
IL_0000: call void assembly::staticInitialization@()
IL_0005: ret
}
}

The specialized equality check (

.method public strict virtual instance int32 Invoke(int32 x) cil managed
{
.maxstack 8
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldc.i4.2
IL_0003: beq.s IL_0007
IL_0005: ldarg.1
IL_0006: ret
IL_0007: ldarg.1
IL_0008: ldc.i4.2
IL_0009: mul
IL_000a: ret
}

If I take your example code and change it to make it generic at the

If you want me to run more benchmarks of various scenarios, though, or to add more emitted IL tests, or to look into ways of getting rid of the closure, let me know.
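For readers following the IL, here is a hedged reconstruction of F# source that is consistent with it, next to a hand-written loop that avoids the closure. The reviewer's actual example is not shown in the thread, so the mapping body is an assumption inferred from the `Invoke` method (the multiplier could equally be `valueToCompare` after constant folding), and `resultViaLoop` is a hypothetical name, not something the compiler emits.

```fsharp
// Assumed source: names match the properties in the IL (valueToCompare, input, result).
let valueToCompare = 2
let input = [| 1; 2 |]

// Comprehension form. Per the IL above, this becomes a call to Array.map
// with a closure class whose Invoke method holds the specialized int comparison.
let result = [| for x in input -> if x = valueToCompare then x * 2 else x |]

// Hand-written equivalent with the mapping written inline in a while-loop.
// No FSharpFunc object is needed here; this is only an illustration of the
// "get rid of the closure" idea, not the code the compiler generates.
let resultViaLoop =
    let res = Array.zeroCreate<int> input.Length
    let mutable i = 0
    while i < input.Length do
        let x = input.[i]
        res.[i] <- if x = valueToCompare then x * 2 else x
        i <- i + 1
    res

printfn "%A" result
printfn "%A" resultViaLoop
```

Both arrays hold the same elements; the difference is only in whether a closure object is allocated and invoked per element.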
Thanks!
There is some more work that could be done to enable this optimization in more scenarios, e.g., when the

For example, I don't love that this gets optimized:

    let y = f ()
    [
        for x in xs -> x + y
    ]

but this does not:

    [
        let y = f ()
        for x in xs -> x + y
    ]

I started solving this, but I have not had a chance to finish it yet. I think that this PR is probably worthwhile on its own (in fact, I believe that the same phenomenon applies to the optimization introduced in #16832 anyway), so I can just open a new PR to address this issue once I have time.
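A small sketch of why the two shapes above are interchangeable from the user's point of view, which is what makes the difference in optimization surprising. Here `f` and `xs` are hypothetical stand-ins with arbitrary values, and `viaMap` shows the `List.map` call that the simple form corresponds to.

```fsharp
// Hypothetical stand-ins for the `f` and `xs` in the comment above.
let f () = 10
let xs = [ 1; 2; 3 ]

// Binding outside the comprehension: the shape this PR optimizes.
let bindingOutside =
    let y = f ()
    [
        for x in xs -> x + y
    ]

// Binding inside the comprehension: same result, but per the comment above
// this shape is not yet recognized by the optimization.
let bindingInside =
    [
        let y = f ()
        for x in xs -> x + y
    ]

// The library call that the simple form corresponds to.
let viaMap =
    let y = f ()
    List.map (fun x -> x + y) xs

printfn "%b" (bindingOutside = bindingInside && bindingInside = viaMap)
```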
Good stuff Brian. This can go as is, follow ups are appreciated :) As for language flags, I think similar to Edgar's approach with attributes, you can merge these flags into one umbrella flag given you come up with a good name. But this is optional.
Description
Semi-related followup to #16650 and #16832.
* Use `Array.map` for `[|for … in xs -> …|]` when `xs` is an array. We see speedups up to 10×, and allocate between ¼ and ⅓ as much.
* Use `List.map` for `[for … in xs -> …]` when `xs` is a list. We see more moderate speedups of ~1.2×.

Examples
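As a minimal sketch of the two rewrites listed above (this is illustrative code, not taken from the PR; `square`, `xs`, and `ys` are hypothetical names), each comprehension is paired with the library call it is equivalent to:

```fsharp
// Hypothetical mapping function and inputs, for illustration only.
let square x = x * x
let xs = [| 1; 2; 3; 4 |]
let ys = [ 1; 2; 3; 4 ]

// Comprehension forms covered by this PR.
let arrayViaComprehension = [| for x in xs -> square x |]
let listViaComprehension = [ for y in ys -> square y ]

// The equivalent library calls.
let arrayViaMap = Array.map square xs
let listViaMap = List.map square ys

// F# arrays and lists compare structurally with (=).
printfn "%b" (arrayViaComprehension = arrayViaMap)
printfn "%b" (listViaComprehension = listViaMap)
```

The rewrite applies only when the source collection's type is statically known to be an array or a list, as the two bullets state.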
Benchmarks
Arrays
Source
Lists
Source
Questions
Notes/followups
* The transformation emits a call to `Array.map` or `List.map`, which results in the emission and invocation of an extra closure type. Because the transformation runs at the IlxGen stage, after the optimization phases, the closure is not inlined. It would probably be even faster in most cases if we just emitted `while`-loops instead, at the expense of adding more code to the compiler.
* … `for` in list or array comprehensions. That could be done: use a fast integer iteration for arrays even when inside of a list comprehension; use the fast list iteration for lists even when inside of an array comprehension; use a fast integer iteration for `ResizeArray<_>`, or `IList`, or `ICollection`; etc.

Checklist