Fix opcodes from vb compiler when initialize an array and function local variant, jump invoke and delegate invoke. #519

Open
RevensofT opened this issue Feb 15, 2015 · 19 comments

@RevensofT

Hi guys,

I want to point out some inefficient opcodes that the VB compiler emits, and I hope the VB dev team can find some time to fix them and improve VB's performance.

  1. Array

When I create an array like this "Dim Data(input.Length - 1) As Char", the compiler rewrites it to "Dim Data(input.Length - 1 + 1) As Char"; that is 4 unnecessary bytes, and you lose 4 of the 64 bytes available if you aim to create a tiny method.

  2. Function local variable

A VB function always has its own implicit local variable: "Function foo() As String" gets "Dim foo As String" even when it is unused. That is bad for performance when you aim for a tiny method header, because even a single local variable turns it into a fat method header.

  3. Option to invoke a method by jmp

A jump invoke might not be as fast as call or callvirt, but it is safe against stack overflow errors in recursive methods.

  4. Delegate invoke

Instead of using the Invoke method of the delegate class, make it syntactic sugar.

 ldfld obj Delegate::ThisObj
 ldfld native int Delegate::MethodPtr
 //Some arguments
 calli void(arguments) // a field that stores the method signature might help with this line

The cost of invoking a method is really high; fewer method invocations mean better performance.

  5. Inline method

A long method is hard to read and maintain, but it does not suffer from method invocation cost. An inline method, created from a method with no return value (a Sub), could avoid that cost without any problem by using the caller's local variables as the arguments of the inline method.

  • Use the keyword Inline when declaring a method.
  • All arguments and local variables of an inline method belong to the caller's local variables.
  • Great performance with code that is easy to read and maintain, without "#Region" inside the method.
Public Sub A()
      Dim First = 15, Second = 9
      Addition(First, Second)
      Subtraction(16, First)
End Sub

Inline Sub Addition(A As Integer, B As Integer)
      Console.WriteLine(A + B)
End Sub

Inline Sub Subtraction(A As Integer, B As Integer)
      Console.WriteLine(A - B)
End Sub

Code will compile to this.

Public Sub A()
      Dim First = 15, Second = 9
      Console.WriteLine(First + Second)

      Dim Third = 16
      Console.WriteLine(Third - First)
End Sub

Passing a local variable does not create any new local variable; anything else (perhaps including a class field, for performance) creates a new local variable.

P.S. Sorry for the missing label. I can't add one, and I don't see the option described in the help document's topic on applying labels to an issue.

@mikedn

mikedn commented Feb 15, 2015

  1. As far as I can tell the input.Length - 1 + 1 is not optimized to input.Length because VB does integer overflow checks by default. If you disable these checks then the expression is optimized as expected.

  2. I suppose this is a valid request but it's unclear why you care whether the IL method header is fat or not. How do you measure the performance impact of this?

  3. Presumably you're asking for tailcall optimization to be performed when possible.

  4. That's unnecessary. Delegate.Invoke is treated specially by the runtime and JIT compiler. The native code generated for the Invoke call is identical to the code that would be produced by the IL code you have shown.

  5. Inlining is an optimization that's normally performed by the JIT compiler and in the particular example that you have shown that's exactly what happens. Doing inlining in the VB compiler (or any other managed compiler) is questionable because the managed compiler doesn't know the size of the native code to be able to appreciate if inlining is beneficial or not. Additionally, the JIT compiler tends to perform worse when the method code size gets large so arbitrary inlining can end up making the code slower.
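
Point 1 above can be illustrated with a minimal sketch. The module and function names here (ReassociationSketch, RoundTrip) are hypothetical and not taken from the thread. With VB's default integer overflow checks enabled, (x - 1) + 1 is not always equivalent to x, which is one reason a compiler cannot blindly simplify it:

```vb
Imports System

Module ReassociationSketch
    ' Hypothetical helper: same shape as the expression the compiler
    ' emits for an array bound, (upperBound) + 1.
    Function RoundTrip(x As Integer) As Integer
        Return (x - 1) + 1
    End Function

    Sub Main()
        ' Ordinary values round-trip unchanged.
        Console.WriteLine(RoundTrip(10))

        ' With overflow checks enabled (the VB default), x - 1 overflows
        ' for Integer.MinValue and an OverflowException is thrown.
        ' With checks removed, the value simply wraps and round-trips.
        Try
            Console.WriteLine(RoundTrip(Integer.MinValue))
        Catch ex As OverflowException
            Console.WriteLine("OverflowException")
        End Try
    End Sub
End Module
```

Simplifying the expression to x would remove that exception, so the simplification is only obviously safe when overflow checks are off, or when the compiler can prove the operand cannot overflow.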

@RevensofT
Author

Hi @mikedn ,

  1. I disabled integer overflow checks and enabled optimizations, but the result is still the same as I described.

  2. I just run a huge loop (100M to 1000M iterations) and compare the time each method takes.

  3. Not tail.call; I really mean jmp. A jump can be done anywhere in a method, not only as a tail call optimization at the end of a method as in F#. I know they are essentially the same, but I don't want people to think it only works under limited conditions, as it does in F#.

    Public Sub Junction(A As Object, B As Object)
        If TypeOf A Is SomeType Then
            Jump SomeType_Handle
        Else
            Jump Common_Handle
        End If
    End Sub
  4. Delegate invoke has some cost. When I ran tests comparing it with IL code like the one I wrote, the IL code always took less time than the delegate invoke.

  5. Can't the VB compiler do it with a regex, as with constant field values? It's not as if the compiler goes straight ahead to build IL code. As for the downside of large methods, I haven't found any slowdown in my tests or my work (I do this using ILGenerator); could you specify the code size at which performance starts to drop?

@mikedn

mikedn commented Feb 15, 2015

  1. I cannot reproduce this result. Here's the x86 code generated when overflow checks are enabled:
mov         edx,dword ptr [ecx+4]  
add         edx,0FFFFFFFFh  
jo          01C808F2  
add         edx,1  
jo          01C808F2  
mov         ecx,6D94160Ah  
call        00BF3234  

And here's the code generated when overflow checks are disabled, the -1+1 expression is clearly gone:

mov         edx,dword ptr [ecx+4]  
mov         ecx,6D94160Ah  
call        00B13234  
  2. Show the code you're using to test this.

  3. I see. IMO that's something that the JIT compiler could do.

  4. But how do you even test that? Delegate's _methodPtr field is not public so you can't quite write the code you're suggesting to begin with.

  5. I have no idea what you are trying to say with regex, constant field value etc.

As for code size - that's complicated. I'm not even sure if it has something to do with the code size, it may be related to the number of local variables. In any case, I looked at the code generated by the JIT compiler long enough to know that code quality suffers when the methods are large/complicated. And let's not forget that there are at least 3 different JIT compilers out there and each has different characteristics.

@AdamSpeight2008
Contributor

@RevensofT I think the "extra" +1 is for backwards compatibility with VB6, which had Option Base 1.

@gafter added the Area-Compilers, Enhancement, Tenet-Performance, and Language-VB labels Feb 15, 2015
@RevensofT
Author

Hi @AdamSpeight2008 ,
Wow, I never knew VB had an option like that, thanks. However, I think the compiler should evaluate constant expressions in advance: something like extra - 1 + 1 should compile to extra, and a constant such as "Test " + "method" should compile to "Test method" too.

@mikedn

  1. I don't know what the CLR does, but I don't think it's a good idea to leave optimization to the CLR. The compiler should create the shortest path for the CLR to follow; the CLR should do as little side work as possible and concentrate on running CIL as fast as it can. Any optimization should be done when compiling to CIL, and I don't see that happening when I look at the CIL code using ildasm.

  2. Sure, this is my test code, but you need to install IL Extension to write CIL code directly.

VB code

Imports System.Runtime.CompilerServices

Module Module1

    Public Const Max As UInteger = 1000000000
    Public Const A As Integer = -1, B As Integer = 256

    <MethodImplAttribute(MethodImplOptions.ForwardRef)>
    Public Sub Thin(A As Integer, B As Integer)
    End Sub
    <MethodImplAttribute(MethodImplOptions.ForwardRef)>
    Public Sub Fat(A As Integer, B As Integer)
    End Sub

    Public Sub TestThin()
        Dim Time = Date.Now

        For i As UInteger = 0 To Max
            Thin(A, B)
        Next
        Console.Write("Thin : ")
        Console.WriteLine(Date.Now - Time)
    End Sub

    Public Sub TestFat()
        Dim Time = Date.Now

        For i As UInteger = 0 To Max
            Fat(A, B)
        Next
        Console.Write("Fat : ")
        Console.WriteLine(Date.Now - Time)
    End Sub

    Sub Main()
        TestThin()
        TestFat()

        TestFat()
        TestThin()

        TestThin()
        TestFat()

        Console.ReadKey()
    End Sub
End Module

IL code (my test project's name is ILTest)

.class public sealed ILTest.Module1
{
    .method public static void Thin(int32 A, int32 B) cil managed
    {
        .maxstack 2

        ldarg.0
        ldarg.1
        add
        starg 0
        ret
    }
    .method public static void Fat(int32 A, int32 B) cil managed
    {
        .maxstack 2
        .locals init (int32 Useless)

        ldarg.0
        ldarg.1
        add
        starg 0
        ret
    }
}

As you can see, the only difference between these methods is that the Fat method has a local variable (I need to do this in CIL because the compiler removes any unused local variable).

Result

Thin : 00:00:05.1372935
Fat : 00:00:05.5303172
Fat : 00:00:05.5233163
Thin : 00:00:05.0562958
Thin : 00:00:05.0552829
Fat : 00:00:05.5223180

As you can see, the Thin method is always faster than the Fat method, and the only difference between the two is that one has a local variable and the other does not.

  4. Bypassing access modifiers isn't that hard, and the VB developer team could easily change it to public readonly on their side; but for the test I did, I just created a new class the same way the delegate constructor does and ran the test.

  5. I'm talking about a precompile step, text editing before compiling to CIL. The VB compiler should have one, unless it goes straight ahead to write CIL without preparing the code.

@mikedn

mikedn commented Feb 16, 2015

  1. The VB compiler already does constant folding, for example Console.WriteLine(2 + 3) is transformed into Console.WriteLine(5). The problem with your example is that the actual expression is (input.Length - 1) + 1 and in this case constant folding is impossible without also doing expression reassociation. That's possible but it duplicates work because the JIT compiler does it anyway.

You should probably open a separate issue for this specific point and clearly state that you expect the IL to be optimized. The compiler team can then decide what to do about this and other expression optimization issues - for example Console.WriteLine(-1 + 1 + input.Length) is compiled as Console.WriteLine(0 + input.Length).

  2. Those timings indicate that you're running a debug build, in the release build the timings are identical for both Fat and Thin because the generated native code is identical - a single ret instruction. It's not the fat/thin header that causes the difference in debug builds, the difference is the result of an additional x86 instruction that's used to initialize that local.

  4. The VB team doesn't own the Delegate code so they can't change the access modifiers, you'll have to ask the runtime team to do that. Anyway, this doesn't answer my question. You claimed that in your tests the IL code that you have shown is faster than Invoke. How did you test given that you can't actually write the code you're suggesting?

  5. Compilers don't "edit text", they parse the text into some data structures (trees, graphs etc.) and work on those data structures. As I already mentioned at 1, the managed compilers already do some simple optimizations like constant folding. There may be some other optimizations that could be added to managed compilers but inlining isn't one of them. The JIT compiler is in a much better position to do inlining for a number of reasons:

  • knowledge of CPU characteristics like instruction size, number of available registers, calling convention
  • ability to bypass access modifiers
  • guaranteed access to the code that needs to be inlined (the managed compiler doesn't always have access to the IL code if the method to be inlined is outside of the project)

@RevensofT
Author

  1. Thanks, I haven't rechecked it since VS 2012; but the VB compiler still has to fix it. It should not leave its mess for the JIT to clean up.

  2. I don't think anyone would test performance in a debug build; of course I did it in release mode. Could you post the code with which you get the same result for both methods? I'm curious why you get the same result.

  4. C# already customizes its delegates to have optional arguments, so why can't VB do the same? Have you ever played around with delegates? Which part of a delegate do you think belongs to the runtime?

  5. Code optimization should be done in the compile-to-CIL step; the JIT should focus on translating bytecode for the CLR for better performance.
    What I propose is a way to get rid of the method call cost, not to optimize it like RyuJIT does.

@mikedn

mikedn commented Feb 16, 2015

  2. I simply copy pasted your code in a VB project with IL support. The results for a debug build are:
Thin : 00:00:04.6988712
Fat : 00:00:04.9461733
Fat : 00:00:04.9852200
Thin : 00:00:04.7018761
Thin : 00:00:04.5566761
Fat : 00:00:04.9311626

The results for a release build are:

Thin : 00:00:01.9664632
Fat : 00:00:01.9624628
Fat : 00:00:01.9424247
Thin : 00:00:01.9201779
Thin : 00:00:01.9444396
Fat : 00:00:01.9323991

Clearly, your results are much closer to my debug results. Maybe you're running a release build but you're running in the debugger (F5). By default VS debugger disables JIT optimizations so what you get are really debug results.

  4. I don't understand what you're trying to say, C# uses Delegate.Invoke too.

  5. Well, I already tried to explain why some optimizations like inlining cannot be done properly at IL level. If you didn't understand feel free to ask for more details and I'll see what else I can add. If you choose to ignore my explanation, well, good luck convincing the compiler team to do this simply because you believe that's how things should be done.
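
For the measurement problem discussed under point 2, a timing harness along the following lines might make debugger-influenced runs easier to spot. This is only a sketch: the names TimingSketch and TimeIt are hypothetical, and it uses Stopwatch instead of the Date.Now subtraction used in the thread. Because the work is passed as a delegate, the delegate call itself is included in the measured time.

```vb
Imports System
Imports System.Diagnostics

Module TimingSketch
    ' Hypothetical helper: runs the work once untimed so that JIT
    ' compilation is not measured, then times the given iterations.
    Function TimeIt(work As Action, iterations As Integer) As TimeSpan
        work()
        Dim watch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            work()
        Next
        watch.Stop()
        Return watch.Elapsed
    End Function

    Sub Main()
        ' Running under the Visual Studio debugger (F5) disables JIT
        ' optimizations by default, even for a release build.
        If Debugger.IsAttached Then
            Console.WriteLine("Warning: debugger attached, timings may reflect unoptimized code.")
        End If

        Dim sink As Integer = 0
        Dim elapsed = TimeIt(Sub() sink += 1, 1000000)
        Console.WriteLine(elapsed)
    End Sub
End Module
```

Starting the program without the debugger (Ctrl+F5 in Visual Studio) is the simpler way to get optimized timings; the Debugger.IsAttached check only guards against forgetting to do so.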

@RevensofT
Author

  2. Thanks, it seems to be as you said.

4) & 5) I will put those topics on hold for now; the error in 2) might have affected those test results too, so I need to recheck them.

About the C# delegate: it has [opt] and .param, which VB doesn't have, even though VB had optional parameters before C#.

.method public hidebysig newslot virtual 
        instance int32  Invoke(int32 A,
                               [opt] int32 B) runtime managed
{
  .param [2] = int32(0x00000005)
} // end of method TestGate::Invoke

@RevensofT
Author

Update for 4) test code.

VB code

Imports System.Runtime.CompilerServices

Module Module1

    Public Const Max As UInteger = 100000000

    Public Sub Test_Runtime()
        Dim Subject As New List(Of UInteger)
        Dim Target As New Action(Of UInteger)(AddressOf Subject.Add)

        Dim Time = Date.Now

        For i As UInteger = 0 To Max
            Target.Invoke(i)
        Next
        Console.Write(Date.Now - Time)
        Console.WriteLine(" : Runtime")
    End Sub

    <MethodImplAttribute(MethodImplOptions.ForwardRef)>
    Public Sub Test_CIL()
    End Sub

    Sub Main()
        Test_CIL()
        Test_Runtime()

        Test_Runtime()
        Test_CIL()

        Test_CIL()
        Test_Runtime()

        Console.ReadKey()
    End Sub
End Module

CIL code

.class public sealed ILTest.Module1
{
    .method public static void  Test_CIL() cil managed
    {
        .maxstack  3
        .locals init ([0] class [mscorlib]System.Collections.Generic.List`1<uint32> Subject,
                      [1] native int Target,
                      [2] valuetype [mscorlib]System.DateTime Time,
                      [3] uint32 i)
        newobj     instance void class [mscorlib]System.Collections.Generic.List`1<uint32>::.ctor()
        stloc.0
        ldloc.0
        ldvirtftn  instance void class [mscorlib]System.Collections.Generic.List`1<uint32>::Add(!0)
        stloc.1
        call       valuetype [mscorlib]System.DateTime [mscorlib]System.DateTime::get_Now()
        stloc.2
        ldc.i4.0
        stloc.3
Loop:   ldloc.0
        ldloc.3
        ldloc.1
        calli      void(class [mscorlib]System.Collections.Generic.List`1<uint32>, uint32)
        ldloc.3
        ldc.i4.1
        add.ovf.un
        stloc.3
        ldloc.3
        ldc.i4     100000000
        ble.un.s   Loop
        call       valuetype [mscorlib]System.DateTime [mscorlib]System.DateTime::get_Now()
        ldloc.2
        call       valuetype [mscorlib]System.TimeSpan [mscorlib]System.DateTime::op_Subtraction(valuetype [mscorlib]System.DateTime,
                                                                                                     valuetype [mscorlib]System.DateTime)
        box        [mscorlib]System.TimeSpan
        call       void [mscorlib]System.Console::Write(object)
        ldstr      " : CIL"
        call       void [mscorlib]System.Console::WriteLine(string)
        ret
    }

}

Result

00:00:00.8380550 : CIL
00:00:01.0010617 : Runtime
00:00:01.0100505 : Runtime
00:00:00.9870990 : CIL
00:00:00.9840563 : CIL
00:00:01.0150865 : Runtime

Update 5) test code.

Imports System.Runtime.CompilerServices

Module Module1

    Public Const Max As UInteger = 1000000000
    Public Const A As Integer = -1, B As Integer = 256

    Public Sub Add(A As Integer, B As Integer)
        A = A + B
        A += A
    End Sub

    Public Sub Test_Method()
        Dim Time = Date.Now

        For i As UInteger = 0 To Max
            Add(A, B)
            Add(A, B)
            Add(A, B)
            Add(A, B)
        Next
        Console.Write(Date.Now - Time)
        Console.WriteLine(" : Method")
    End Sub

    Public Sub Test_Inline()
        Dim X = A, Y = B
        Dim Time = Date.Now

        For i As UInteger = 0 To Max
            X = A
            Y = B
            X = X + Y
            X += X

            X = A
            Y = B
            X = X + Y
            X += X

            X = A
            Y = B
            X = X + Y
            X += X

            X = A
            Y = B
            X = X + Y
            X += X
        Next
        Console.Write(Date.Now - Time)
        Console.WriteLine(" : Inline")
    End Sub

    Sub Main()
        Test_Method()
        Test_Inline()

        Test_Inline()
        Test_Method()

        Test_Method()
        Test_Inline()

        Console.ReadKey()
    End Sub
End Module

Result

00:00:04.7772708 : Method
00:00:00.7050473 : Inline
00:00:00.7080334 : Inline
00:00:04.7442713 : Method
00:00:04.7302734 : Method
00:00:00.7040401 : Inline

I'm sure I'm not biasing this test, but the results differ greatly between the two methods.

@mikedn
It's not that I'm ignoring your explanation, but what you explain is beyond my ability to test. Even if I know it in theory, I can't experiment with it, so I can't run a real experiment to confirm whether what I know is right or wrong, or whether there is something else the theory doesn't mention.

@mikedn

mikedn commented Feb 16, 2015

  4. Eh, that's a bit of apples to oranges comparison - you've changed ldflds to ldlocs. I'll have to post the generated assembly code to explain what's going on.
    VB loop code:
01592FF8  mov         ecx,edi  
01592FFA  mov         edx,esi  
01592FFC  mov         eax,dword ptr [ecx+0Ch]  ; ldfld Delegate::_methodPtr
01592FFF  mov         ecx,dword ptr [ecx+4]    ; ldfld Delegate::_target
01593002  call        eax  
01593004  add         esi,1  
01593007  jb          01593093  
0159300D  cmp         esi,5F5E100h  
01593013  jbe         01592FF8  

IL loop code:

01592EFB  mov         ecx,edi  
01592EFD  mov         edx,esi  
01592EFF  call        ebx  
01592F01  add         esi,1  
01592F04  jb          01592F90  
01592F0A  cmp         esi,5F5E100h  
01592F10  jbe         01592EFB  

The VB loop code already looks very much as you suggested - the JIT compiler replaced the Invoke call with the x86 equivalent of 2 ldfld and a calli.

The reason why the IL loop is slightly faster is that it lacks the 2 memory reads that correspond to those 2 ldflds - the 3 locals that you use in the IL version end up being stored in registers.

So, even if the VB compiler follows your suggestion the results will very likely be identical to what you have now. It's technically possible for the JIT compiler to produce IL loop like code from the VB loop. It just has to take advantage of the fact that delegates are immutable, that would allow it to hoist the ldflds outside the loop.

@mikedn

mikedn commented Feb 16, 2015

  5. You've walked straight into a JIT compiler limitation. A method as simple as Add can and should be inlined. Unfortunately the JIT compiler disables inlining when you assign to a method's arguments. Change Add like below and you'll get identical performance for both cases.
Public Sub Add(A As Integer, B As Integer)
    Dim x As Integer = A + B
    x += x
End Sub

@RevensofT
Author

  4. Thanks; I see, it is inlined at runtime, so the delegate Invoke method is already syntactic sugar. I withdraw this proposal because it already does what I proposed.

  5. Wow, that's a really interesting result; what are the conditions for inlining like this?

By the way, speaking of registers, I'm really curious what data they store. Is it the address of a local variable or its value? Are field addresses and method addresses stored in registers too?

@mikedn

mikedn commented Feb 17, 2015

Wow, that's a really interesting result; what are the conditions for inlining like this?

The main factor that decides inlining is the native code size - the larger the code, the smaller the chance that the method will get inlined. The .NET JIT compiler also pays attention to the number of local variables, if there are too many it gives up inlining.

Assigning to a method argument always prevents inlining (in the current JIT compilers). Exception handling (try/catch) also prevents inlining. Crossing security boundaries also prevents inlining but it's less likely that you'll run into such cases.

By the way, speaking of registers, I'm really curious what data they store. Is it the address of a local variable or its value? Are field addresses and method addresses stored in registers too?

Registers are basically small but very fast pieces of memory in the CPU. Anything that doesn't exceed the register size (32/64 bits usually) can be stored in a register - that includes integers, doubles, pointers/addresses, references. The thing is that there is a limited number (8-32) of registers available and the number varies from CPU to CPU. Deciding what variables should be stored in registers is an important compiler optimization and it can impact other optimizations as well. That's one reason why managed compilers don't do many optimizations and leave most of the work to the JIT compiler.

About the C# delegate: it has [opt] and .param, which VB doesn't have, even though VB had optional parameters before C#.

If I understand correctly you want to be able to use optional parameters with delegates in VB. I think you should create a separate issue for this, I haven't seen this request before and since C# allows this it makes sense that VB should allow it too.

@RevensofT
Author

I see. I tested it (based on the code added for 5)) and it seems the local variable limit is 8 (which might be related to registers), the code size limit is no more than 84 for a Sub and 58 for a Function, and exception handling prevents inlining as you said, as do recursive methods. Thanks a lot @mikedn .

Registers are basically small but very fast pieces of memory in the CPU...

I see, registers are in the CPU cache (I guess L2 or L1).

About delegates: yes, I mean optional parameters. When I wrote that, I was thinking that if the C# team can do something about their delegates, the VB team could too (but it's fine now; I know the delegate Invoke method is just syntactic sugar). However, has no one requested optional delegates yet? I stay away from delegates (including relaxed delegate conversion) as much as possible because I think they drag code performance down (at least, they prevent inlining). I think I should investigate this; thank you for your suggestion.

@mikedn

mikedn commented Feb 18, 2015

and it seems the local variable limit is 8

Indeed, in the x86 JIT compiler that limit is 8. There's also a limit of 10 for the number of method arguments. RyuJIT appears to have increased both limits to 32.

the code size limit is no more than 84 for a Sub and 58 for a Function

I doubt that there's a particular native code size limit, it's likely that the limit depends on other factors. At a minimum the limit should increase with the number of arguments. It may also depend on AggressiveInlining.

I see, registers are in the CPU cache (I guess L2 or L1).

Not cache, the registers are in the CPU core itself. Cache access is actually slow compared to register access.

However, has no one requested optional delegates yet?

None that I know of here.
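
The AggressiveInlining remark above refers to an inlining hint that code can give to the JIT compiler. The sketch below shows how the hints are applied in VB; the module and function names are hypothetical, and MethodImplOptions.AggressiveInlining requires .NET Framework 4.5 or later. The attribute is only a request, so whether a given call is actually inlined still depends on the JIT compiler's own rules, such as the ones listed earlier in this thread.

```vb
Imports System
Imports System.Runtime.CompilerServices

Module InliningHintsSketch
    ' Asks the JIT compiler to inline this method if it can.
    <MethodImpl(MethodImplOptions.AggressiveInlining)>
    Function AddHinted(a As Integer, b As Integer) As Integer
        Return a + b
    End Function

    ' Forbids inlining; useful as a baseline when measuring call cost.
    <MethodImpl(MethodImplOptions.NoInlining)>
    Function AddNeverInlined(a As Integer, b As Integer) As Integer
        Return a + b
    End Function

    Sub Main()
        ' Both calls return the same value; only the generated
        ' native code may differ.
        Console.WriteLine(AddHinted(1, 2))
        Console.WriteLine(AddNeverInlined(1, 2))
    End Sub
End Module
```

Timing the two variants in an optimized build, outside the debugger, is one way to see the call overhead in isolation.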

@RevensofT
Author

There's also a limit of 10 for the number of method arguments. RyuJIT appears to have increased both limits to 32.

Thanks; sadly, it doesn't look like RyuJIT will be applied to .NET soon.

I doubt that there's a particular native code size limit, it's likely that the limit depends on other factors.

All I can do is observe the CIL code size; there isn't much I can do to clarify the inlining conditions.

It may also depend on AggressiveInlining.

I tried to use it before, but it seems I don't understand the conditions under which it works, and the description in MSDN doesn't help at all: "The method should be inlined if possible".

Not cache, the registers are in the CPU core itself. Cache access is actually slow compared to register access.

I see, thanks again.

@theoy
Contributor

theoy commented Mar 5, 2015

This issue reports five separate issues, and therefore is not really actionable. We'll split it up.

@theoy added this to the 1.0-rc2 milestone Mar 5, 2015
@theoy assigned theoy and VSadov and unassigned theoy Mar 5, 2015
@theoy modified the milestones: 1.0 (stable), 1.0-rc2 Mar 5, 2015
@VSadov
Member

VSadov commented Apr 27, 2015

None of the 5 issues seems to be a defect, though.

@VSadov added this to the 1.1 milestone Apr 27, 2015
@VSadov removed this from the 1.0 (stable) milestone Apr 27, 2015
@VSadov removed their assignment May 12, 2015
@VSadov modified the milestones: C# 7 and VB 15, 1.1 May 12, 2015
@jaredpar removed this from the C# 7 and VB 15 milestone Nov 23, 2015
@dpoeschl added this to the 2.0 milestone Dec 3, 2015
@gafter modified the milestones: 2.0 (Preview 1), 2.0 (RC) Jun 20, 2016
@gafter modified the milestones: 2.1, 2.0 (RC) Jul 19, 2016
@jaredpar modified the milestones: Unknown, 2.1 Jan 31, 2017
8 participants