Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow implicit widenings when tailcalling #54864

Merged
merged 2 commits into from
Jul 5, 2021

Conversation

jakobbotsch
Copy link
Member

The managed calling convention dictates that the callee is responsible
for widening up to 4 bytes. We can use this to allow some more
tailcalls.

Fix #51957

I couldn't use SPMI as the new tailcalls change the jitinterface calls (thanks for explaining @sandreenko), so I used PMI. A few diffs in frameworks:

Analyzing CodeSize diffs...
Found 10 files with textual diffs.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 55385375
Total bytes of diff: 55385274
Total bytes of delta: -101 (-0,00% of base)
    diff is an improvement.


Top file improvements (bytes):
         -24 : Microsoft.CodeAnalysis.dasm (-0,00% of base)
         -20 : System.Private.DataContractSerialization.dasm (-0,00% of base)
         -15 : System.Reflection.Metadata.dasm (-0,00% of base)
         -14 : System.Drawing.Common.dasm (-0,00% of base)
         -10 : System.Diagnostics.EventLog.dasm (-0,01% of base)
          -6 : Microsoft.CodeAnalysis.CSharp.dasm (-0,00% of base)
          -6 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0,00% of base)
          -4 : System.Net.Quic.dasm (-0,00% of base)
          -2 : System.Reflection.MetadataLoadContext.dasm (-0,00% of base)

9 total files with Code Size differences (9 improved, 0 regressed), 263 unchanged.

Top method improvements (bytes):
         -14 (-33,33% of base) : System.Drawing.Common.dasm - System.Drawing.Printing.PrinterSettings:get_Duplex():int:this
         -10 (-47,62% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlBinaryReader:GetNodeType():int:this
         -10 (-66,67% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlBufferReader:GetNodeType():int:this
         -10 (-37,04% of base) : System.Diagnostics.EventLog.dasm - System.Diagnostics.EventLogEntry:get_EntryType():int:this
          -6 (-30,00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.SyntaxToken:get_RawContextualKind():int:this
          -6 (-31,58% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.ConstantValue:get_Int16Value():short:this
          -6 (-30,00% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.ConstantValue:get_UInt16Value():ushort:this
          -6 (-30,00% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.ConstantValue:get_Int32Value():int:this
          -6 (-30,00% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.ConstantValue:get_UInt32Value():int:this
          -6 (-30,00% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.IdentifierTokenSyntax:get_RawContextualKind():int:this
          -5 (-12,20% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.GenericParameter:get_Index():int:this
          -5 (-12,20% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Parameter:get_SequenceNumber():int:this
          -5 (-12,20% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.LocalVariable:get_Index():int:this
          -2 (-2,74% of base) : System.Net.Quic.dasm - System.Net.Quic.Implementations.MsQuic.MsQuicConnection:GetRemoteAvailableUnidirectionalStreamCount():int:this
          -2 (-2,74% of base) : System.Net.Quic.dasm - System.Net.Quic.Implementations.MsQuic.MsQuicConnection:GetRemoteAvailableBidirectionalStreamCount():int:this
          -2 (-3,45% of base) : System.Reflection.MetadataLoadContext.dasm - System.Reflection.TypeLoading.Ecma.EcmaGenericParameterType:ComputePosition():int:this

16 total methods with Code Size differences (16 improved, 0 regressed), 369818 unchanged.

Diffs look as expected, we get some cases where we were relying on the widening anyway.

 ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
 ; optimized code
 ; rsp based frame
-; partially interruptible
+; fully interruptible
 ; No PGO data
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T00] (  4,  4   )     ref  ->  rcx         this class-hnd
-;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
+;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
 ;
-; Lcl frame size = 40
+; Lcl frame size = 0
 
 G_M58155_IG01:
-       sub      rsp, 40
-						;; bbWeight=1    PerfScore 0.25
+						;; bbWeight=1    PerfScore 0.00
 G_M58155_IG02:
        mov      rax, qword ptr [rcx]
        mov      rax, qword ptr [rax+120]
-       call     qword ptr [rax+8]Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.IdentifierTokenSyntax:get_PossibleKeywordKind():ushort:this
-       nop      
-						;; bbWeight=1    PerfScore 7.25
+       mov      rax, qword ptr [rax+8]
+						;; bbWeight=1    PerfScore 6.00
 G_M58155_IG03:
-       add      rsp, 40
-       ret      
-						;; bbWeight=1    PerfScore 1.25
+       rex.jmp  rax
+						;; bbWeight=1    PerfScore 2.00
 
-; Total bytes of code 20, prolog size 4, PerfScore 10.75, instruction count 7, allocated bytes for code 20 (MethodHash=91bc1cd4) for method Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.IdentifierTokenSyntax:get_RawContextualKind():int:this
+; Total bytes of code 14, prolog size 0, PerfScore 9.40, instruction count 4, allocated bytes for code 14 (MethodHash=91bc1cd4) for method Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.IdentifierTokenSyntax:get_RawContextualKind():int:this
 ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
 ; optimized code
 ; rsp based frame
-; partially interruptible
+; fully interruptible
 ; No PGO data
 ; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T00] (  4,  3.50)     ref  ->  rcx         this class-hnd
-;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
+;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
 ;  V02 cse0         [V02,T01] (  3,  2.50)     int  ->  rax         "CSE - aggressive"
 ;
-; Lcl frame size = 40
+; Lcl frame size = 0
 
 G_M35284_IG01:
-       sub      rsp, 40
-						;; bbWeight=1    PerfScore 0.25
+						;; bbWeight=1    PerfScore 0.00
 G_M35284_IG02:
        mov      eax, dword ptr [rcx+56]
        cmp      eax, -1
        je       SHORT G_M35284_IG04
 						;; bbWeight=1    PerfScore 3.25
 G_M35284_IG03:
-       add      rsp, 40
        ret      
-						;; bbWeight=0.50 PerfScore 0.63
+						;; bbWeight=0.50 PerfScore 0.50
 G_M35284_IG04:
        mov      edx, 8
        mov      r8d, 1
        xor      r9d, r9d
-       call     System.Drawing.Printing.PrinterSettings:GetModeField(int,short,long):short:this
-       nop      
-						;; bbWeight=0.50 PerfScore 1.00
+						;; bbWeight=0.50 PerfScore 0.38
 G_M35284_IG05:
-       add      rsp, 40
-       ret      
-						;; bbWeight=0.50 PerfScore 0.63
+       jmp      System.Drawing.Printing.PrinterSettings:GetModeField(int,short,long):short:this
+						;; bbWeight=0.50 PerfScore 1.00
 
-; Total bytes of code 42, prolog size 4, PerfScore 9.95, instruction count 13, allocated bytes for code 42 (MethodHash=b14c762b) for method System.Drawing.Printing.PrinterSettings:get_Duplex():int:this
+; Total bytes of code 28, prolog size 0, PerfScore 7.93, instruction count 8, allocated bytes for code 28 (MethodHash=b14c762b) for method System.Drawing.Printing.PrinterSettings:get_Duplex():int:this

cc @dotnet/jit-contrib

The managed calling convention dictates that the callee is responsible
for widening up to 4 bytes. We can use this to allow some more
tailcalls.

Fix dotnet#51957
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 28, 2021
// is tracked with https://github.com/dotnet/runtime/issues/8413.
//
// 3) If the callee has a 9 to 16 byte struct argument and the callee has
// If the callee has a 9 to 16 byte struct argument and the callee has
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was outdated, both of the issues have been fixed.

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
Member Author

PTAL @dotnet/jit-contrib

@EgorBo
Copy link
Member

EgorBo commented Jul 5, 2021

Are failed jitstress jobs related?

@EgorBo
Copy link
Member

EgorBo commented Jul 5, 2021

image

Are we losing method name here? 🙂 I'd be nice to keep it for indirect calls where possible for better readability but it's a separate issue.

@jakobbotsch
Copy link
Member Author

Are failed jitstress jobs related?

I believe the failing test is #54200

Are we losing method name here? 🙂 I'd be nice to keep it for indirect calls where possible for better readability but it's a separate issue.

Hmm, looks like it. I can try to take a look at that separately.

@jakobbotsch jakobbotsch merged commit 66c339b into dotnet:main Jul 5, 2021
@jakobbotsch jakobbotsch deleted the allow-widening-tailcalls branch July 5, 2021 14:02
@ghost ghost locked as resolved and limited conversation to collaborators Aug 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stale narrowing check for tail calls in importer
2 participants