Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: enable tail calls and copy omission for implicit byref structs #33004

Merged
merged 9 commits into from
Mar 11, 2020

Conversation

AndyAyersMS
Copy link
Member

Handle cases where a caller passes an implicit byref argument struct to
a callee in tail position. There is no need to create a local copy or
to block tail calling.

This pattern comes up increasingly often as we write more layered code
operating on spans and similar structs.

Handle cases where a caller passes an implicit byref argument struct to
a callee in tail position. There is no need to create a local copy or
to block tail calling.

This pattern comes up increasingly often as we write more layered code
operating on spans and similar structs.
@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib

Addresses issues seen in #13824 (see comment) and #32843 (see comment).

This modifies the promotion heuristic for implicit byrefs to not consider uses at (some) call sites as significant for promotion, and modifies the legality checks for tail calls to allow implicit byref params to be passed as args in tail calls. To make this work we then must avoid copying the param to a local struct at the call.

Jit diffs shows a lot of hits in the json parser and other places where spans are passed through various layers.

Found 92 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -25235 (-0.05% of base)
    diff is an improvement.

Top file regressions (bytes):
         124 : System.Linq.Parallel.dasm (0.01% of base)
         116 : System.Reflection.Metadata.dasm (0.03% of base)
          14 : xunit.core.dasm (0.02% of base)
           6 : System.Composition.Hosting.dasm (0.01% of base)
           2 : System.CodeDom.dasm (0.00% of base)

Top file improvements (bytes):
       -5553 : System.Private.CoreLib.dasm (-0.12% of base)
       -2843 : System.Security.Cryptography.Pkcs.dasm (-0.66% of base)
       -2260 : System.Text.Json.dasm (-0.47% of base)
       -2253 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.04% of base)
       -1963 : Microsoft.CodeAnalysis.CSharp.dasm (-0.05% of base)
       -1586 : System.Security.Cryptography.Algorithms.dasm (-0.52% of base)
       -1024 : System.Data.Common.dasm (-0.07% of base)
        -813 : System.Collections.Immutable.dasm (-0.07% of base)
        -793 : Microsoft.CodeAnalysis.dasm (-0.05% of base)
        -625 : Newtonsoft.Json.dasm (-0.07% of base)
        -603 : System.Threading.Tasks.Dataflow.dasm (-0.07% of base)
        -476 : System.Memory.dasm (-0.19% of base)
        -451 : System.Security.Cryptography.Cng.dasm (-0.30% of base)
        -434 : System.Net.Http.dasm (-0.06% of base)
        -297 : System.Reflection.MetadataLoadContext.dasm (-0.16% of base)
        -257 : System.Runtime.Numerics.dasm (-0.36% of base)
        -245 : System.Drawing.Common.dasm (-0.08% of base)
        -212 : System.IO.Compression.dasm (-0.28% of base)
        -203 : System.Collections.dasm (-0.04% of base)
        -167 : System.Numerics.Tensors.dasm (-0.07% of base)

92 total files with Code Size differences (87 improved, 5 regressed), 139 unchanged.

Top method regressions (bytes):
         596 ( 3.62% of base) : System.Linq.Parallel.dasm - UnionQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (49 methods)
         158 ( 1.06% of base) : System.Linq.Parallel.dasm - ExceptQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (49 methods)
         158 ( 1.06% of base) : System.Linq.Parallel.dasm - IntersectQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (49 methods)
          54 ( 5.61% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteNumberEscape(ReadOnlySpan`1,long):this (4 methods)
          37 ( 4.91% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:PassArgumentByRef(bool,BoundExpression,KeyValuePair`2,KeyValuePair`2,TypeSymbol,String,DiagnosticBag):BoundExpression:this
          35 ( 1.12% of base) : System.Numerics.Tensors.dasm - DenseTensor`1:Reshape(ReadOnlySpan`1):Tensor`1:this (7 methods)
          30 ( 4.97% of base) : Microsoft.CodeAnalysis.CSharp.dasm - LocalRewriter:MakeIsOperator(BoundIsOperator,CSharpSyntaxNode,BoundExpression,BoundTypeExpression,Conversion,TypeSymbol):BoundExpression:this
          28 ( 4.54% of base) : Microsoft.CodeAnalysis.dasm - ValueTuple`2:op_Equality(ValueTuple`2,ValueTuple`2):bool (7 methods)
          27 ( 5.68% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteStringEscape(ReadOnlySpan`1,DateTime):this (2 methods)
          26 ( 3.04% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:HeapSort(Span`1) (5 methods)
          21 ( 3.58% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteStringEscape(ReadOnlySpan`1,DateTimeOffset):this (2 methods)
          21 ( 3.58% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteNumberEscape(ReadOnlySpan`1,Decimal):this (2 methods)
          21 ( 3.58% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteStringEscape(ReadOnlySpan`1,Guid):this (2 methods)
          19 ( 3.79% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteNumberEscape(ReadOnlySpan`1,double):this (2 methods)
          19 ( 3.79% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteNumberEscape(ReadOnlySpan`1,float):this (2 methods)
          17 ( 1.31% of base) : System.Runtime.Numerics.dasm - BigInteger:.ctor(ReadOnlySpan`1,bool,bool):this
          16 ( 2.71% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteLiteralEscape(ReadOnlySpan`1,ReadOnlySpan`1):this (2 methods)
          16 ( 1.08% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteStringEscape(ReadOnlySpan`1,ReadOnlySpan`1):this (4 methods)
          16 ( 2.71% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteBase64Escape(ReadOnlySpan`1,ReadOnlySpan`1):this (2 methods)
          14 ( 4.81% of base) : System.Private.CoreLib.dasm - MemoryStream:Read(Span`1):int:this

Top method improvements (bytes):
        -761 (-19.18% of base) : System.Private.CoreLib.dasm - Utf8Parser:TryParse(ReadOnlySpan`1,byref,byref,ushort):bool (16 methods)
        -336 (-82.76% of base) : System.Private.CoreLib.dasm - Path:Join(ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1):String
        -324 (-53.64% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteStringByOptions(ReadOnlySpan`1,ReadOnlySpan`1):this (4 methods)
        -306 (-3.05% of base) : System.Security.Cryptography.Algorithms.dasm - KeyFormatHelper:ReadEncryptedPkcs8(ref,ReadOnlyMemory`1,ReadOnlySpan`1,ReadOnlySpan`1,KeyReader`1,byref,byref) (7 methods)
        -306 (-3.05% of base) : System.Security.Cryptography.Pkcs.dasm - KeyFormatHelper:ReadEncryptedPkcs8(ref,ReadOnlyMemory`1,ReadOnlySpan`1,ReadOnlySpan`1,KeyReader`1,byref,byref) (7 methods)
        -262 (-14.24% of base) : Newtonsoft.Json.dasm - DictionaryWrapper`2:Contains(KeyValuePair`2):bool:this (7 methods)
        -243 (-9.14% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:IsProperSubsetOf(IEnumerable`1,MutationInput):bool (7 methods)
        -240 (-83.33% of base) : System.Private.CoreLib.dasm - Path:Join(ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1):String
        -238 (-9.06% of base) : System.Reflection.MetadataLoadContext.dasm - EcmaModule:GetTypeCoreNoCache(ReadOnlySpan`1,ReadOnlySpan`1,byref):RoDefinitionType:this
        -216 (-69.23% of base) : System.Text.Json.dasm - JsonWriterHelper:ValidatePropertyAndValue(ReadOnlySpan`1,ReadOnlySpan`1) (4 methods)
        -190 (-6.61% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Decrypt(byref,ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1,Span`1):int
        -190 (-6.61% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Decrypt(byref,ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1,Span`1):int
        -187 (-6.50% of base) : System.Security.Cryptography.Pkcs.dasm - PasswordBasedEncryption:Decrypt(byref,ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1,Span`1):int
        -176 (-38.60% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteNumberByOptions(ReadOnlySpan`1,long):this (4 methods)
        -174 (-4.22% of base) : System.Security.Cryptography.Algorithms.dasm - KeyFormatHelper:ReadSubjectPublicKeyInfo(ref,ReadOnlyMemory`1,KeyReader`1,byref,byref) (7 methods)
        -174 (-4.23% of base) : System.Security.Cryptography.Algorithms.dasm - KeyFormatHelper:ReadPkcs8(ref,ReadOnlyMemory`1,KeyReader`1,byref,byref) (7 methods)
        -174 (-4.22% of base) : System.Security.Cryptography.Pkcs.dasm - KeyFormatHelper:ReadSubjectPublicKeyInfo(ref,ReadOnlyMemory`1,KeyReader`1,byref,byref) (7 methods)
        -174 (-4.23% of base) : System.Security.Cryptography.Pkcs.dasm - KeyFormatHelper:ReadPkcs8(ref,ReadOnlyMemory`1,KeyReader`1,byref,byref) (7 methods)
        -167 (-59.01% of base) : Microsoft.CodeAnalysis.CSharp.dasm - CSharpAttributeData:CreateMetadataExpression(TypedConstant,EmitContext):IMetadataExpression:this
        -162 (-53.64% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteLiteralByOptions(ReadOnlySpan`1,ReadOnlySpan`1):this (2 methods)

Top method regressions (percentages):
           9 (12.50% of base) : Microsoft.CodeAnalysis.dasm - SyntaxWalker:VisitTrivia(SyntaxTrivia):this
           9 ( 8.74% of base) : Microsoft.CodeAnalysis.CSharp.dasm - CSharpSyntaxWalker:VisitTrivia(SyntaxTrivia):this
           9 ( 8.74% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicSyntaxWalker:VisitTrivia(SyntaxTrivia):this
          12 ( 8.28% of base) : CommandLine.dasm - <>c__DisplayClass0_1:<MapValues>b__1(KeyValuePair`2):bool:this
          13 ( 7.93% of base) : System.Transactions.Local.dasm - TransactionsEtwProvider:TransactionScopeCurrentChanged(TransactionTraceIdentifier,TransactionTraceIdentifier):this
          11 ( 7.69% of base) : System.Reflection.Metadata.dasm - StateMachineMethodTableReader:.ctor(int,bool,int,MemoryBlock,int):this
           4 ( 7.41% of base) : System.Net.Sockets.dasm - UdpReceiveResult:op_Equality(UdpReceiveResult,UdpReceiveResult):bool
           7 ( 6.48% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Binder:ReportDiagnosticsIfObsolete(DiagnosticBag,Conversion,SyntaxNodeOrToken,bool):this
          10 ( 6.41% of base) : System.Transactions.Local.dasm - TransactionsEtwProvider:TransactionPromoted(TransactionTraceIdentifier,TransactionTraceIdentifier):this
          11 ( 6.15% of base) : System.Reflection.Metadata.dasm - CustomDebugInformationTableReader:.ctor(int,bool,int,int,int,MemoryBlock,int):this
          27 ( 5.68% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteStringEscape(ReadOnlySpan`1,DateTime):this (2 methods)
          54 ( 5.61% of base) : System.Text.Json.dasm - Utf8JsonWriter:WriteNumberEscape(ReadOnlySpan`1,long):this (4 methods)
           9 ( 5.56% of base) : System.Reflection.Metadata.dasm - FieldLayoutTableReader:.ctor(int,bool,int,MemoryBlock,int):this
           9 ( 5.56% of base) : System.Reflection.Metadata.dasm - FieldRVATableReader:.ctor(int,bool,int,MemoryBlock,int):this
           9 ( 5.56% of base) : System.Reflection.Metadata.dasm - NestedClassTableReader:.ctor(int,bool,int,MemoryBlock,int):this
           2 ( 5.26% of base) : Microsoft.CodeAnalysis.dasm - <>c:<ReportAnalyzerExecutionTime>b__42_2(KeyValuePair`2):Assembly:this
           2 ( 5.26% of base) : System.Private.DataContractSerialization.dasm - XmlUTF8NodeWriter:WriteGuidText(Guid):this
           9 ( 5.26% of base) : System.Reflection.Metadata.dasm - ClassLayoutTableReader:.ctor(int,bool,int,MemoryBlock,int):this
           6 ( 5.17% of base) : System.Transactions.Local.dasm - TransactionsEtwProvider:TransactionScopeCreated(TransactionTraceIdentifier,int):this
           9 ( 5.14% of base) : System.Reflection.Metadata.dasm - InterfaceImplTableReader:.ctor(int,bool,int,int,MemoryBlock,int):this

Top method improvements (percentages):
        -240 (-83.33% of base) : System.Private.CoreLib.dasm - Path:Join(ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1):String
        -336 (-82.76% of base) : System.Private.CoreLib.dasm - Path:Join(ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1,ReadOnlySpan`1):String
         -20 (-80.00% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Int(Decimal):Decimal
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlDecimal:Add(SqlDecimal,SqlDecimal):SqlDecimal
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlDecimal:Multiply(SqlDecimal,SqlDecimal):SqlDecimal
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlDecimal:Divide(SqlDecimal,SqlDecimal):SqlDecimal
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlDouble:Add(SqlDouble,SqlDouble):SqlDouble
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlDouble:Subtract(SqlDouble,SqlDouble):SqlDouble
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlDouble:Multiply(SqlDouble,SqlDouble):SqlDouble
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlDouble:Divide(SqlDouble,SqlDouble):SqlDouble
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:OnesComplement(SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:Add(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:Subtract(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:Multiply(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:Divide(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:Mod(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:Modulus(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:BitwiseAnd(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:BitwiseOr(SqlInt64,SqlInt64):SqlInt64
         -20 (-80.00% of base) : System.Data.Common.dasm - SqlInt64:Xor(SqlInt64,SqlInt64):SqlInt64

1996 total methods with Code Size differences (1811 improved, 185 regressed), 238046 unchanged.

Lots of interesting diffs, here's simple one:

;; SqlDecimal:Add(SqlDecimal,SqlDecimal):SqlDecimal
;; before

G_M20502_IG01:
       push     rsi
       sub      rsp, 32
       mov      rsi, rcx

G_M20502_IG02:
       mov      rcx, rsi
       call     SqlDecimal:op_Addition(SqlDecimal,SqlDecimal):SqlDecimal
       mov      rax, rsi

G_M20502_IG03:
       add      rsp, 32
       pop      rsi
       ret      

;; after

G_M20502_IG02:
       jmp      SqlDecimal:op_Addition(SqlDecimal,SqlDecimal):SqlDecimal

More is possible here:

  • can do this for any call in tail position, not just tail calls
  • can handle passing a nested sub-struct to callee
  • can handle promoted case by writing back promoted fields to the original struct before tail calling
    (and avoid write-back if original struct and promoted fields haven't diverged)
  • true liveness for robust last-use detection would be nice

@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 29, 2020
@AndyAyersMS
Copy link
Member Author

Needs an extra safety check to make sure we're not introducing aliasing -- eg same byref passed to callee in two param slots, and callee writes to one.... hopefully doesn't happen often.

@jashook jashook self-requested a review February 29, 2020 04:14
Copy link
Contributor

@jashook jashook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm minus a nit

//
// Otherwise, see how many appearances there are. We keep two early ref counts: total
// number of references to the struct or some field, and how many of these are
// arguments to calls. We undo promotion unless we see enougn non-call uses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: enougn -> enough

@benaadams
Copy link
Member

Raised issue for "Linux arm checked" #33012

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Feb 29, 2020

As with anything aliasing related, checking is costly... still working on it.

@jaykrell
Copy link
Contributor

It’d be nice to test & implement similar for mono. /cc @lambdageek

@AndyAyersMS
Copy link
Member Author

Note for coreclr, this is windows-x64 and windows-arm64 only.

More register-intensive ABIs like SysV should (ideally) already be able to optimize some of these cases -- that is, if we keep passed-via-registers structs in registers, they can just flow through to callees. But if we materialize the struct on the frame (as has been our habit) then we will miss these cases on SysV too. The fix for that is quite different than the fix in this PR.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Mar 1, 2020

A first cut at an alias analysis is in, not sure if its complexity matches its value, though.

Belive there is a bug in morph that I'm working around for now; will chase this down separately, and if/when fixed we can get rid of the check for TYP_I_IMPL args aliasing our byref arg; anything that can alias the struct should be TYP_BYREF.

Have changed the "do not return" case to require that there be just one reference to the implicit byref in the method. We're only doing the alias analysis for tail calls, so those are the only cases where we can safely deal with multiple references.

This version has basically the same diffs as before (we lose 9 cases); we don't expect to see true aliases very often. Magic arg limit of 6 chosen so we still get the copy elimination and tail calls in SPC's Number:TryFormatInt32

public static bool TryFormatInt32(int value, int hexMask, ReadOnlySpan<char> format, IFormatProvider? provider, Span<char> destination, out int charsWritten)
{
// Fast path for default format
if (format.Length == 0)
{
return value >= 0 ?
TryUInt32ToDecStr((uint)value, digits: -1, destination, out charsWritten) :
TryNegativeInt32ToDecStr(value, digits: -1, NumberFormatInfo.GetInstance(provider).NegativeSign, destination, out charsWritten);
}
return TryFormatInt32Slow(value, hexMask, format, provider, destination, out charsWritten);

;;; BEFORE

; Assembly listing for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T01] (  6,  4   )     int  ->  rsi        
;  V01 arg1         [V01,T03] (  3,  2.50)     int  ->  rdx        
;  V02 arg2         [V02,T00] (  4,  7   )   byref  ->   r8         ld-addr-op
;  V03 arg3         [V03,T02] (  5,  3.50)     ref  ->   r9         class-hnd
;  V04 arg4         [V04,T04] (  2,  4   )   byref  ->  rcx        
;  V05 arg5         [V05,T16] (  3,  1.50)   byref  ->  rdi        
;  V06 OutArgs      [V06    ] (  1,  1   )  lclBlk (48) [rsp+0x00]   "OutgoingArgSpace"
;  V07 tmp1         [V07,T17] (  3,  1.50)     ref  ->  rax         class-hnd "Inline return value spill temp"
;* V08 tmp2         [V08    ] (  0,  0   )   byref  ->  zero-ref    V12._pointer(offs=0x00) P-INDEP "field V02._pointer (fldOffset=0x0)"
;* V09 tmp3         [V09    ] (  0,  0   )     int  ->  zero-ref    V12._length(offs=0x08) P-INDEP "field V02._length (fldOffset=0x8)"
;  V10 tmp4         [V10,T08] (  4,  2.50)   byref  ->  rbx         V13._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
;  V11 tmp5         [V11,T09] (  4,  2.50)     int  ->  rbp         V13._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;* V12 tmp6         [V12    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref"
;* V13 tmp7         [V13    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref"
;  V14 tmp8         [V14    ] (  6,  6   )  struct (16) [rsp+0x48]   do-not-enreg[XSB] must-init addr-exposed "by-value struct argument"
;  V15 tmp9         [V15,T05] (  3,  3   )   byref  ->   r9         stack-byref "BlockOp address local"
;  V16 tmp10        [V16,T12] (  2,  2   )     int  ->  rcx         "argument with side effect"
;  V17 tmp11        [V17,T10] (  2,  2   )     ref  ->   r8         "argument with side effect"
;  V18 tmp12        [V18,T06] (  3,  3   )   byref  ->   r8         stack-byref "BlockOp address local"
;  V19 tmp13        [V19,T13] (  2,  2   )     int  ->  rcx         "argument with side effect"
;  V20 tmp14        [V20    ] (  2,  2   )  struct (16) [rsp+0x38]   do-not-enreg[XSB] must-init addr-exposed "by-value struct argument"
;  V21 tmp15        [V21,T07] (  3,  3   )   byref  ->   r8         stack-byref "BlockOp address local"
;  V22 tmp16        [V22,T14] (  2,  2   )     int  ->  rcx         "argument with side effect"
;  V23 tmp17        [V23,T15] (  2,  2   )     int  ->  rdx         "argument with side effect"
;  V24 tmp18        [V24,T11] (  2,  2   )     ref  ->   r9         "argument with side effect"
;
; Lcl frame size = 88

G_M23990_IG01:
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 88
       vzeroupper 
       xor      rax, rax
       mov      qword ptr [rsp+48H], rax
       mov      qword ptr [rsp+38H], rax
       mov      esi, ecx
       mov      rcx, bword ptr [rsp+A0H]
       mov      rdi, bword ptr [rsp+A8H]
						;; bbWeight=1    PerfScore 9.75
G_M23990_IG02:
       mov      rbx, bword ptr [rcx]
       mov      ebp, dword ptr [rcx+8]
						;; bbWeight=1    PerfScore 4.00
G_M23990_IG03:
       cmp      dword ptr [r8+8], 0
       jne      SHORT G_M23990_IG10
						;; bbWeight=1    PerfScore 3.00
G_M23990_IG04:
       test     esi, esi
       jge      SHORT G_M23990_IG08
       test     r9, r9
       je       SHORT G_M23990_IG05
       mov      rcx, r9
       call     NumberFormatInfo:<GetInstance>g__GetProviderNonNull|42_0(IFormatProvider):NumberFormatInfo
       jmp      SHORT G_M23990_IG06
						;; bbWeight=0.50 PerfScore 2.88
G_M23990_IG05:
       call     NumberFormatInfo:get_CurrentInfo():NumberFormatInfo
						;; bbWeight=0.50 PerfScore 0.50
G_M23990_IG06:
       mov      ecx, esi
       mov      r8, gword ptr [rax+40]
       lea      r9, bword ptr [rsp+48H]
       mov      bword ptr [r9], rbx
       mov      dword ptr [r9+8], ebp
       mov      bword ptr [rsp+20H], rdi
       lea      r9, bword ptr [rsp+48H]
       mov      edx, -1
       call     Number:TryNegativeInt32ToDecStr(int,int,String,Span`1,byref):bool
       nop      
						;; bbWeight=0.50 PerfScore 3.88
G_M23990_IG07:
       add      rsp, 88
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       ret      
						;; bbWeight=0.50 PerfScore 1.63
G_M23990_IG08:
       mov      ecx, esi
       lea      r8, bword ptr [rsp+48H]
       mov      bword ptr [r8], rbx
       mov      dword ptr [r8+8], ebp
       lea      r8, bword ptr [rsp+48H]
       mov      r9, rdi
       mov      edx, -1
       call     Number:TryUInt32ToDecStr(int,int,Span`1,byref):bool
       nop      
						;; bbWeight=0.50 PerfScore 2.50
G_M23990_IG09:
       add      rsp, 88
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       ret      
						;; bbWeight=0.50 PerfScore 1.63
G_M23990_IG10:
       mov      ecx, esi
						;; bbWeight=0.50 PerfScore 0.13
G_M23990_IG11:
       vmovdqu  xmm0, xmmword ptr [r8]
       vmovdqu  xmmword ptr [rsp+38H], xmm0
						;; bbWeight=0.50 PerfScore 1.50
G_M23990_IG12:
       lea      r8, bword ptr [rsp+48H]
       mov      bword ptr [r8], rbx
       mov      dword ptr [r8+8], ebp
       mov      bword ptr [rsp+28H], rdi
       lea      r8, bword ptr [rsp+38H]
       lea      rax, bword ptr [rsp+48H]
       mov      bword ptr [rsp+20H], rax
       call     Number:<TryFormatInt32>g__TryFormatInt32Slow|33_0(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
       nop      
						;; bbWeight=0.50 PerfScore 3.38
G_M23990_IG13:
       add      rsp, 88
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       ret      
						;; bbWeight=0.50 PerfScore 1.63

; Total bytes of code 228, prolog size 23, PerfScore 59.28, (MethodHash=5b40a24c) for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool

;;; AFTER

; Assembly listing for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T01] (  6,  4   )     int  ->  rsi        
;  V01 arg1         [V01,T03] (  3,  2.50)     int  ->  rdx        
;  V02 arg2         [V02,T00] (  4,  7   )   byref  ->   r8         ld-addr-op
;  V03 arg3         [V03,T02] (  5,  3.50)     ref  ->   r9         class-hnd
;  V04 arg4         [V04,T04] (  3,  3   )   byref  ->  rdi        
;  V05 arg5         [V05,T06] (  3,  1.50)   byref  ->  rbx        
;  V06 OutArgs      [V06    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V07 tmp1         [V07,T07] (  3,  1.50)     ref  ->  rax         class-hnd "Inline return value spill temp"
;* V08 tmp2         [V08    ] (  0,  0   )   byref  ->  zero-ref    V12._pointer(offs=0x00) P-INDEP "field V02._pointer (fldOffset=0x0)"
;* V09 tmp3         [V09    ] (  0,  0   )     int  ->  zero-ref    V12._length(offs=0x08) P-INDEP "field V02._length (fldOffset=0x8)"
;* V10 tmp4         [V10    ] (  0,  0   )   byref  ->  zero-ref    V13._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
;* V11 tmp5         [V11    ] (  0,  0   )     int  ->  zero-ref    V13._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;* V12 tmp6         [V12    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref"
;* V13 tmp7         [V13    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref"
;  V14 rat0         [V14,T05] (  2,  2   )   byref  ->   r9         "Fast tail call lowering is creating a new local variable"
;
; Lcl frame size = 32

G_M23990_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 32
       mov      esi, ecx
       mov      rdi, bword ptr [rsp+60H]
       mov      rbx, bword ptr [rsp+68H]
						;; bbWeight=1    PerfScore 5.50
G_M23990_IG02:
       cmp      dword ptr [r8+8], 0
       jne      SHORT G_M23990_IG09
						;; bbWeight=1    PerfScore 3.00
G_M23990_IG03:
       test     esi, esi
       jge      SHORT G_M23990_IG07
       test     r9, r9
       je       SHORT G_M23990_IG04
       mov      rcx, r9
       call     NumberFormatInfo:<GetInstance>g__GetProviderNonNull|42_0(IFormatProvider):NumberFormatInfo
       jmp      SHORT G_M23990_IG05
						;; bbWeight=0.50 PerfScore 2.88
G_M23990_IG04:
       call     NumberFormatInfo:get_CurrentInfo():NumberFormatInfo
						;; bbWeight=0.50 PerfScore 0.50
G_M23990_IG05:
       mov      r9, rdi
       mov      bword ptr [rsp+60H], rbx
       mov      r8, gword ptr [rax+40]
       mov      ecx, esi
       mov      edx, -1
						;; bbWeight=0.50 PerfScore 1.88
G_M23990_IG06:
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       jmp      Number:TryNegativeInt32ToDecStr(int,int,String,Span`1,byref):bool
						;; bbWeight=0.50 PerfScore 1.88
G_M23990_IG07:
       mov      ecx, esi
       mov      r9, rbx
       mov      r8, rdi
       mov      edx, -1
						;; bbWeight=0.50 PerfScore 0.50
G_M23990_IG08:
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       jmp      Number:TryUInt32ToDecStr(int,int,Span`1,byref):bool
						;; bbWeight=0.50 PerfScore 1.88
G_M23990_IG09:
       mov      bword ptr [rsp+60H], rdi
       mov      bword ptr [rsp+68H], rbx
       mov      ecx, esi
						;; bbWeight=0.50 PerfScore 1.13
G_M23990_IG10:
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       jmp      Number:<TryFormatInt32>g__TryFormatInt32Slow|33_0(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
						;; bbWeight=0.50 PerfScore 1.88

; Total bytes of code 130, prolog size 19, PerfScore 34.00, (MethodHash=5b40a24c) for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool

Note we could tighten up the latter a bit more, there's no need to copy the span in to RDI/RBX and back.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Mar 1, 2020

@AndyAyersMS I rebased #32843 on top of this change (and removed the in weirdness that was contentious in that PR) and I'm seeing significantly improved codegen. 😄

Method Toolchain Mean Error StdDev Ratio RatioSD
TryParseInt32 master 1,218.0 ns 30.51 ns 33.91 ns 1.00 0.00
TryParseInt32 utf8parser 841.8 ns 7.34 ns 6.87 ns 0.69 0.02

The only remaining nit is the "jump-to-jump" pattern that appears in the below codegen, but I don't know if that's related to this PR or if it's a separate opportunity for further cleanup. (Edit: maybe it's for predict-not-taken scenarios?)

00007ffe`449db320 4585c9          test    r9d,r9d
00007ffe`449db323 7505            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca905a (00007ffe`449db32a)
00007ffe`449db325 e9dedaffff      jmp     CLRStub[MethodDescPrestub]@7ffe449d8e08 (00007ffe`449d8e08)
00007ffe`449db32a 4183c920        or      r9d,20h
00007ffe`449db32e 4183f967        cmp     r9d,67h
00007ffe`449db332 7f0e            jg      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9072 (00007ffe`449db342)
00007ffe`449db334 4183f964        cmp     r9d,64h
00007ffe`449db338 74eb            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9055 (00007ffe`449db325)
00007ffe`449db33a 4183f967        cmp     r9d,67h
00007ffe`449db33e 74e5            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9055 (00007ffe`449db325)
00007ffe`449db340 eb18            jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca908a (00007ffe`449db35a)
00007ffe`449db342 4183f96e        cmp     r9d,6Eh
00007ffe`449db346 7408            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9080 (00007ffe`449db350)
00007ffe`449db348 4183f978        cmp     r9d,78h
00007ffe`449db34c 7407            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9085 (00007ffe`449db355)
00007ffe`449db34e eb0a            jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca908a (00007ffe`449db35a)
00007ffe`449db350 e9d3daffff      jmp     CLRStub[MethodDescPrestub]@7ffe449d8e28 (00007ffe`449d8e28)
00007ffe`449db355 e94edbffff      jmp     CLRStub[MethodDescPrestub]@7ffe449d8ea8 (00007ffe`449d8ea8)
00007ffe`449db35a e991f3ffff      jmp     CLRStub[MethodDescPrestub]@7ffe449da6f0 (00007ffe`449da6f0)

For reference, here's the C# that was used for this codegen:

https://github.com/GrabYourPitchforks/runtime/blob/ab36e65d9d1938a22e8c0baa6aa51e1797016817/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Integer.Signed.cs#L129-L158

@AndyAyersMS
Copy link
Member Author

@GrabYourPitchforks thanks.

For the jump intensive TryParse example -- we don't have a concept of a "conditional tail jump" so what you see is about the best we can do for now. Ideally we could invert the first branch

G_M5679_IG02:
       4585C9               test     r9d, r9d
       7505                 jne      SHORT G_M5679_IG04
G_M5679_IG03:
       E9D613A5FF           jmp      TryParseInt32D(System.ReadOnlySpan`1[Byte],...)

and emit something like:

G_M5679_IG02:
       4585C9               test     r9d, r9d
       xxxxxxxx             je       TryParseInt32D(System.ReadOnlySpan`1[Byte],...)

G_M5679_IG04:

and similarly elsewhere, but this sort of construct would be a new concept for the jit. And I'm not sure how often we see tail calls with no need for argument setup. Will keep it on my list of things to watch for.

SysV codegen looks like it needs some TLC here too. On SysV spans are passed in registers, and here the jit can't resist spilling them to the stack.

G_M51636_IG01:
       push     rbp
       sub      rsp, 16
       lea      rbp, [rsp+10H]
       mov      bword ptr [rbp-10H], rdi
       mov      qword ptr [rbp-08H], rsi

G_M51636_IG02:
       test     r8d, r8d
       jne      SHORT G_M51636_IG05

G_M51636_IG03:
       mov      rdi, bword ptr [rbp-10H]
       mov      rsi, qword ptr [rbp-08H]

G_M51636_IG04:
       lea      rsp, [rbp]
       pop      rbp
       jmp      TryParseInt32D(ReadOnlySpan`1,byref,byref):bool

@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib any other comments? I figured some of this might need re-expressing...

@erozenfeld
Copy link
Member

I'll review this tomorrow.

Copy link
Member

@erozenfeld erozenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any holes in the algorithm but left some questions and suggestions.

src/coreclr/src/jit/lclmorph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/lclmorph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
{
JITDUMP("Arg [%06u] is unpromoted implicit byref V%02u, seeing if we can still tail call\n",
dspTreeID(arg->GetNode()), lcl->GetLclNum());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this JITDUMP to under the next else.

src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
return result;
}

// Will return different answers if x and y refer to same struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y is not a struct here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'y' is a ref so it parsed ok to me, but will revise to make it clearer.

lcl = addr->AsLclVarCommon();
}
else if (addr->OperIs(GT_ADDR))
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a semantic difference between OBJ(lcl) and OBJ(ADDR(lcl)) where lcl is an implicit byref parameter? Do they both represent a copy of the struct? When do we end up with one vs. the other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this means changes over time, and we're straddling that change.

See above?

// Here the call to ZZZ would need to be a slow tail call,
// so we won't tail call at all. But the only
// reference to x is at the call, and it's not in
// a loop, so we can still can avoid making a copy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have a test case when the call is in a loop and a test case for NoReturnLastUse?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add these.

hasByrefParameter = true;
break;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this var to something like copyByRefStructParameter.

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few questions.


if (addr->OperIs(GT_LCL_VAR))
{
lcl = addr->AsLclVarCommon();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this case, LCL_VAR is used as an address here (in other cases it is used as a value), so it has to be 'byref' or 'ref' type. Then how can it be lvaIsImplicitByRefLocal if it always should return false for TYP_IMPL size lclVars?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is potentially confusing.

The short answer is that before morph, arguments passed implicitly by-reference are represented by-value in the IR.
This gets changed in fgMorphImplicitByRefArgs where we make the by-reference aspect explicit. But the above helper gets called both before and after, and so needs to recognize cases where the local either represents a struct or a pointer to a struct.

Note the first two "match" cases here are pre-existing logic; all I did was move this to a helper method and add the last case.

// Return Value:
// GenTreeLclVarCommon node for the local, or nullptr.

GenTreeLclVarCommon* GenTree::IsImplicitByrefParameterValue(Compiler* compiler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can change GenTreeLclVarCommon to GenTreeLclVar, it is more precise for this case.

if (!call->IsTailCallViaHelper() && (varDsc->lvRefCnt(RCS_EARLY) == 1) && !fgMightHaveLoop())
{
assert(!call->IsTailCall());
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for AsLclVarCommon.

assert(!call->IsTailCall());
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum();
LclVarDsc* varDsc = lvaGetDesc(varNum);
const unsigned short totalAppearances = varDsc->lvRefCnt(RCS_EARLY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: how do you decide what is marked as const and what is not? Like why varNum is not in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to mark all locals that won't be redefined as const, but am more likely to do this for lines I explicitly edit.

{
// First, see if this arg is a byref param.
GenTreeLclVarCommon* const lcl = arg->GetNode()->IsImplicitByrefParameterValue(this);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: create a temp for lcl->GetLclNum()?

// a local struct assembled from the promoted fields.
if (lcl != nullptr)
{
JITDUMP("Arg [%06u] is unpromoted implicit byref V%02u, seeing if we can still tail call\n",
Copy link
Contributor

@sandreenko sandreenko Mar 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sentence is confusing: "Arg 000001 is unpromoted implicit byref V002, seeing if we can still tail call ... no, V002 was promoted. "
Can we check varDsc->lvPromoted before we print that it was not promoted?

JITDUMP("... yes, arg is the only appearance of V%02u\n", lcl->GetLclNum());
hasByrefParameter = false;
}
// If no other call arg refers to this byref, and no other arg is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find it in https://github.com/dotnet/runtime/blob/master/docs/coding-guidelines/clr-jit-coding-conventions.md#73-commenting-code-blocks, but my personal opinion is that we should not allow comments after } and before 'else {`.

// Do we pass 'lcl' more than once to the callee?
if (arg2->isStruct && arg2->passedByRef)
{
GenTreeLclVarCommon* const lcl2 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, in the previous use of IsImplicitByrefParameterValue was for optimizations, so if it was not able to find the lcl it was correct, it was just rejecting the optimization. Here it looks different: if it doesn't find the local, then it is optimizing. There are cases where locals are buried under IND(ADDR(IND(ADDR())) sequences, I think IsImplicitByrefParameterValue won't give us the right result for such cases and it could lead to an incorrect optimization.
Note that if IND long(ADDR(IND long(ADDR(LCL_VAR long))) doesn't widen, then lclmorph doesn't mark LCL_VAR as lvaSetVarAddrExposed, so the next check won't catch it also.

Copy link
Member

@erozenfeld erozenfeld Mar 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it looks different: if it doesn't find the local, then it is optimizing.

No, if we don't find the local, we are being conservative and checking arg for exposure if arg2 is a TYP_BYREF or TYP_I_IMPL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if IND long(ADDR(IND long(ADDR(LCL_VAR long))) doesn't widen, then lclmorph doesn't mark LCL_VAR as lvaSetVarAddrExposed, so the next check won't catch it also.

I think that's fine, there is no real address exposure here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean a case like call(LCL_VAR v01, IND(ADDR(IND(ADDR(LCL_VAR v01))) ), when we are checking arg2 for interference with arg1, how will we catch it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yes, that may be a problem. Also, can we have call(LCL_VAR V01, ADD(LCL_VAR V01, INT x)) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, won't we have lvHasLdAddrOp set in these cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't set lvHasLdAddrOp if the next instruction consumes it:

if (notStruct && notLastInstr && notDebugCode &&
impILConsumesAddr(codeAddr + sz, impTokenLookupContextHandle, info.compScopeHnd))
{
// We can skip the addrtaken, as next IL instruction consumes
// the address.
}
else
{
lvaTable[varNum].lvHasLdAddrOp = 1;
if (!info.compIsStatic && (varNum == 0))
{
// Addr taken on "this" pointer is significant,
// go ahead to mark it as permanently addr-exposed here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't recognize an implicit byref arg via IsImplicitByrefParameterValue we won't tail call. So for instance, passing a sub-struct of an implicit byref param currently blocks tail calling. I thought about unblocking this but left it as a future enhancement.

If there's an explicit byref arg1 that could alias some other implicit byref arg2 (which is also an implicit byref param), then the only way for arg1 to alias arg2 is for the param local to be address exposed or address taken. This must be an "unconsumed" case per above.

AndyAyersMS and others added 4 commits March 10, 2020 18:52
Co-Authored-By: Eugene Rozenfeld <erozen@microsoft.com>
Use no-opt to ensure tail called methods don't promote their struct
args and so undo the aliasing checks.

Note that one of the tests that could tail call doesn't because of
the way morph reuses struct temps.
@AndyAyersMS
Copy link
Member Author

OSX builds failing with

Error: python 3.7.6_1 is already installed
To upgrade to 3.7.7, run `brew upgrade python`.

@jashook
Copy link
Contributor

jashook commented Mar 11, 2020

Fixed this morning with #33481


GenTreeLclVarCommon* GenTree::IsImplicitByrefParameterValue(Compiler* compiler)
{
#if defined(TARGET_AMD64) || defined(TARGET_ARM64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be limited to windows amd64?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows, yes.

We seem to think arm64 can have these too. I don't know the ABI well enough to say one way or another.

#if defined(TARGET_AMD64) || defined(TARGET_ARM64)
unsigned char lvIsImplicitByRef : 1; // Set if the argument is an implicit byref.
#endif // defined(TARGET_AMD64) || defined(TARGET_ARM64)

Perhaps we need a define just for this and a cleanup of all references?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like windows arm64 uses the standard ARM64 EABI, so there are no implicit byrefs there. I'll put up a separate PR to clean this up here and throughout the rest of the jit codebase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that arm64 has implicit byrefs, but defining it for unix amd64 seems incorrect for sure.

Copy link
Contributor

@jashook jashook Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> 16 byte structs on arm64 windows or linux I believe are passed by reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see it now.. (from the EABI)

If the argument type is a Composite Type larger than 16 bytes, then the argument is copied to memory allocated by the caller, and the argument is replaced by a pointer to the copy.

So the right condition is windows x64, and all arm64.

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@erozenfeld erozenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for making the changes.

@AndyAyersMS AndyAyersMS merged commit 514db28 into dotnet:master Mar 11, 2020
@AndyAyersMS AndyAyersMS deleted the ImplicitByRefTailCallOpt branch March 11, 2020 20:16
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants