Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: profile updates for return merges and tail calls #48773

Merged
merged 7 commits into from
Mar 2, 2021

Conversation

AndyAyersMS
Copy link
Member

Stop trying to update the common return block profile data during return
merging, as it is not yet known which return blocks will become tail calls.
Start updating constant return block profile data during return merging
as this is when we transform the flow.

Update the common return block profile data during return merging in
morph (adding more counts) and when creating tail calls (removing counts).

Update profile consistency checker to handle switches properly and to use
tolerant compares.

Add extra dumping when solving for edge weights or adjusting flow edge
weights to help track down where errors are coming from.

Add new FMT_WT formatting string for profile weights, and start using it
in fgprofile.cpp.

Stop trying to update the common return block profile data during return
merging, as it is not yet known which return blocks will become tail calls.
Start updating constant return block profile data during return merging
as this is when we transform the flow.

Update the common return block profile data during return merging in
morph (adding more countss) and when creating tail calls (removing counts).

Update profile consistency checker to handle switches properly and to use
tolerant compares.

Add extra dumping when solving for edge weights or adjusting flow edge
weights to help track down where errors are coming from.

Add new FMT_WT formatting string for profile weights, and start using it
in fgprofile.

handle constant return merges too
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 25, 2021
@AndyAyersMS
Copy link
Member Author

@briansull ptal
cc @dotnet/jit-contrib

On my local Tiered-PGO SPMI collection this reduces the number of consistency check failures from 247 to 182 (out of ~2600 compiles).

Will post some info on diffs shortly.

Update FMT_WT to %g so we don't see huge digit strings.
@AndyAyersMS
Copy link
Member Author

Second commit fixed some spurious diffs w/o pgo. Only see a handful of diffs now in regular collections, all in roslyn:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 9345
Total bytes of diff: 9355
Total bytes of delta: 10 (0.11% of base)
    diff is a regression.


Top file regressions (bytes):
          11 : 28712.dasm (1.19% of base)
           9 : 28162.dasm (1.07% of base)
           4 : 30953.dasm (0.21% of base)
           1 : 30152.dasm (0.16% of base)

Top file improvements (bytes):
          -8 : 30357.dasm (-0.67% of base)
          -7 : 30215.dasm (-1.55% of base)

6 total files with Code Size differences (2 improved, 4 regressed), 1 unchanged.

Top method regressions (bytes):
          11 ( 1.19% of base) : 28712.dasm - Microsoft.CodeAnalysis.CSharp.Binder:LookupMembersInType(Microsoft.CodeAnalysis.CSharp.LookupResult,Microsoft.CodeAnalysis.CSharp.Symbols.TypeSymbol,System.String,int,Roslyn.Utilities.ConsList`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],int,Microsoft.CodeAnalysis.CSharp.Binder,bool,byref):this
           9 ( 1.07% of base) : 28162.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseStatementNoDeclaration(bool):Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.StatementSyntax:this
           4 ( 0.21% of base) : 30953.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:BindMethodBody(Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol,Microsoft.CodeAnalysis.CSharp.TypeCompilationState,Microsoft.CodeAnalysis.DiagnosticBag,byref):Microsoft.CodeAnalysis.CSharp.BoundBlock
           1 ( 0.16% of base) : 30152.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ScanPossibleTypeArgumentList(byref):int:this

Top method improvements (bytes):
          -8 (-0.67% of base) : 30357.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseUsingExpression(byref,byref,byref):this
          -7 (-1.55% of base) : 30215.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ScanCast():bool:this

Top method regressions (percentages):
          11 ( 1.19% of base) : 28712.dasm - Microsoft.CodeAnalysis.CSharp.Binder:LookupMembersInType(Microsoft.CodeAnalysis.CSharp.LookupResult,Microsoft.CodeAnalysis.CSharp.Symbols.TypeSymbol,System.String,int,Roslyn.Utilities.ConsList`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],int,Microsoft.CodeAnalysis.CSharp.Binder,bool,byref):this
           9 ( 1.07% of base) : 28162.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseStatementNoDeclaration(bool):Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.StatementSyntax:this
           4 ( 0.21% of base) : 30953.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:BindMethodBody(Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol,Microsoft.CodeAnalysis.CSharp.TypeCompilationState,Microsoft.CodeAnalysis.DiagnosticBag,byref):Microsoft.CodeAnalysis.CSharp.BoundBlock
           1 ( 0.16% of base) : 30152.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ScanPossibleTypeArgumentList(byref):int:this

Top method improvements (percentages):
          -7 (-1.55% of base) : 30215.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ScanCast():bool:this
          -8 (-0.67% of base) : 30357.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseUsingExpression(byref,byref,byref):this

6 total methods with Code Size differences (2 improved, 4 regressed), 1 unchanged.

My local TieredPGO collection has more diffs, but still not extensive...

31 total files with Code Size differences (19 improved, 12 regressed), 12 unchanged.

Top method regressions (bytes):
          48 ( 7.35% of base) : 6376.dasm - System.RuntimeType:GetTypeCodeImpl():int:this
          27 ( 3.09% of base) : 6228.dasm - System.Buffer:Memmove(byref,byref,long)
          20 ( 4.56% of base) : 6560.dasm - System.Array:Copy(System.Array,System.Array,int)
          12 ( 4.23% of base) : 6538.dasm - System.Runtime.CompilerServices.CastHelpers:IsInstanceOfAny(long,System.Object):System.Object
           7 ( 0.84% of base) : 6289.dasm - System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:FindValue(System.__Canon):byref:this
           5 ( 0.99% of base) : 6554.dasm - System.Diagnostics.Tracing.ManifestBuilder:AppendKeywords(System.Text.StringBuilder,long,System.String):this
           5 ( 0.39% of base) : 6458.dasm - System.Diagnostics.Tracing.EventSource:DebugCheckEvent(byref,System.Diagnostics.Tracing.EventSource+EventMetadata[],System.Reflection.MethodInfo,System.Diagnostics.Tracing.EventAttribute,System.Diagnostics.Tracing.ManifestBuilder,int)
           4 ( 2.63% of base) : 63284.dasm - System.Diagnostics.ProcessModuleCollection:.ctor(int):this
           4 ( 1.66% of base) : 6434.dasm - System.Runtime.CompilerServices.CastHelpers:StelemRef_Helper(byref,long,System.Object)
           3 ( 0.83% of base) : 6330.dasm - MemberInfoCache`1[__Canon][System.__Canon]:GetMemberList(int,System.String,int):System.__Canon[]:this
           2 ( 0.47% of base) : 6422.dasm - System.Number:TryUInt32ToDecStr(int,int,System.Span`1[Char],byref):bool
           2 ( 0.34% of base) : 6277.dasm - System.SpanHelpers:SequenceEqual(byref,byref,long):bool

Top method improvements (bytes):
         -32 (-8.77% of base) : 6344.dasm - System.Reflection.RuntimeMethodInfo:get_InvocationFlags():int:this
         -13 (-0.97% of base) : 6559.dasm - System.Collections.Generic.Dictionary`2[Int32,__Canon][System.Int32,System.__Canon]:TryInsert(int,System.__Canon,ubyte):bool:this
          -8 (-1.92% of base) : 6509.dasm - System.RuntimeType:IsAssignableFrom(System.Type):bool:this
          -8 (-1.98% of base) : 6334.dasm - System.Reflection.RuntimePropertyInfo:GetIndexParametersNoCopy():System.Reflection.ParameterInfo[]:this
          -8 (-5.41% of base) : 6341.dasm - System.Reflection.Associates:IncludeAccessor(System.Reflection.MethodInfo,bool):bool
          -7 (-0.71% of base) : 6367.dasm - System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiByte_Intrinsified(long,long):long
          -6 (-2.93% of base) : 6297.dasm - System.Runtime.CompilerServices.CastHelpers:TryGet(long,long):int
          -6 (-0.67% of base) : 6374.dasm - System.Diagnostics.Tracing.ManifestBuilder:GetTypeName(System.Type):System.String:this
          -4 (-1.38% of base) : 6305.dasm - System.Runtime.CompilerServices.CastHelpers:ChkCastAny(long,System.Object):System.Object
          -3 (-0.59% of base) : 63329.dasm - System.Diagnostics.ProcessStartInfo:AppendArgumentsTo(byref):this
          -3 (-0.53% of base) : 6459.dasm - System.Diagnostics.Tracing.EventSource:GetHelperCallFirstArg(System.Reflection.MethodInfo):int
          -3 (-1.36% of base) : 6283.dasm - ListBuilder`1[__Canon][System.__Canon]:Add(System.__Canon):this
          -2 (-0.78% of base) : 6331.dasm - System.Reflection.CerHashtable`2[__Canon,__Canon][System.__Canon,System.__Canon]:get_Item(System.__Canon):System.__Canon:this
          -2 (-0.37% of base) : 6468.dasm - System.Diagnostics.Tracing.EventSource:AddEventDescriptor(byref,System.String,System.Diagnostics.Tracing.EventAttribute,System.Reflection.ParameterInfo[],bool)
          -2 (-1.16% of base) : 6221.dasm - System.Runtime.CompilerServices.CastHelpers:IsInstanceOfClass(long,System.Object):System.Object
          -2 (-0.47% of base) : 6372.dasm - System.Text.ASCIIUtility:WidenAsciiToUtf16(long,long,long):long
          -2 (-1.35% of base) : 6329.dasm - RuntimeTypeCache:GetMemberCache(byref):MemberInfoCache`1[__Canon]:this
          -2 (-0.95% of base) : 6573.dasm - System.Globalization.CultureInfo:get_CompareInfo():System.Globalization.CompareInfo:this
          -1 (-0.06% of base) : 6466.dasm - System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:TryInsert(System.__Canon,System.__Canon,ubyte):bool:this

Top method regressions (percentages):
          48 ( 7.35% of base) : 6376.dasm - System.RuntimeType:GetTypeCodeImpl():int:this
          20 ( 4.56% of base) : 6560.dasm - System.Array:Copy(System.Array,System.Array,int)
          12 ( 4.23% of base) : 6538.dasm - System.Runtime.CompilerServices.CastHelpers:IsInstanceOfAny(long,System.Object):System.Object
          27 ( 3.09% of base) : 6228.dasm - System.Buffer:Memmove(byref,byref,long)
           4 ( 2.63% of base) : 63284.dasm - System.Diagnostics.ProcessModuleCollection:.ctor(int):this
           4 ( 1.66% of base) : 6434.dasm - System.Runtime.CompilerServices.CastHelpers:StelemRef_Helper(byref,long,System.Object)
           5 ( 0.99% of base) : 6554.dasm - System.Diagnostics.Tracing.ManifestBuilder:AppendKeywords(System.Text.StringBuilder,long,System.String):this
           7 ( 0.84% of base) : 6289.dasm - System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:FindValue(System.__Canon):byref:this
           3 ( 0.83% of base) : 6330.dasm - MemberInfoCache`1[__Canon][System.__Canon]:GetMemberList(int,System.String,int):System.__Canon[]:this
           2 ( 0.47% of base) : 6422.dasm - System.Number:TryUInt32ToDecStr(int,int,System.Span`1[Char],byref):bool
           5 ( 0.39% of base) : 6458.dasm - System.Diagnostics.Tracing.EventSource:DebugCheckEvent(byref,System.Diagnostics.Tracing.EventSource+EventMetadata[],System.Reflection.MethodInfo,System.Diagnostics.Tracing.EventAttribute,System.Diagnostics.Tracing.ManifestBuilder,int)
           2 ( 0.34% of base) : 6277.dasm - System.SpanHelpers:SequenceEqual(byref,byref,long):bool

Top method improvements (percentages):
         -32 (-8.77% of base) : 6344.dasm - System.Reflection.RuntimeMethodInfo:get_InvocationFlags():int:this
          -8 (-5.41% of base) : 6341.dasm - System.Reflection.Associates:IncludeAccessor(System.Reflection.MethodInfo,bool):bool
          -6 (-2.93% of base) : 6297.dasm - System.Runtime.CompilerServices.CastHelpers:TryGet(long,long):int
          -8 (-1.98% of base) : 6334.dasm - System.Reflection.RuntimePropertyInfo:GetIndexParametersNoCopy():System.Reflection.ParameterInfo[]:this
          -8 (-1.92% of base) : 6509.dasm - System.RuntimeType:IsAssignableFrom(System.Type):bool:this
          -4 (-1.38% of base) : 6305.dasm - System.Runtime.CompilerServices.CastHelpers:ChkCastAny(long,System.Object):System.Object
          -3 (-1.36% of base) : 6283.dasm - ListBuilder`1[__Canon][System.__Canon]:Add(System.__Canon):this
          -2 (-1.35% of base) : 6329.dasm - RuntimeTypeCache:GetMemberCache(byref):MemberInfoCache`1[__Canon]:this
          -2 (-1.16% of base) : 6221.dasm - System.Runtime.CompilerServices.CastHelpers:IsInstanceOfClass(long,System.Object):System.Object
         -13 (-0.97% of base) : 6559.dasm - System.Collections.Generic.Dictionary`2[Int32,__Canon][System.Int32,System.__Canon]:TryInsert(int,System.__Canon,ubyte):bool:this
          -2 (-0.95% of base) : 6573.dasm - System.Globalization.CultureInfo:get_CompareInfo():System.Globalization.CompareInfo:this
          -2 (-0.78% of base) : 6331.dasm - System.Reflection.CerHashtable`2[__Canon,__Canon][System.__Canon,System.__Canon]:get_Item(System.__Canon):System.__Canon:this
          -7 (-0.71% of base) : 6367.dasm - System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiByte_Intrinsified(long,long):long
          -6 (-0.67% of base) : 6374.dasm - System.Diagnostics.Tracing.ManifestBuilder:GetTypeName(System.Type):System.String:this
          -3 (-0.59% of base) : 63329.dasm - System.Diagnostics.ProcessStartInfo:AppendArgumentsTo(byref):this
          -3 (-0.53% of base) : 6459.dasm - System.Diagnostics.Tracing.EventSource:GetHelperCallFirstArg(System.Reflection.MethodInfo):int
          -2 (-0.47% of base) : 6372.dasm - System.Text.ASCIIUtility:WidenAsciiToUtf16(long,long,long):long
          -2 (-0.37% of base) : 6468.dasm - System.Diagnostics.Tracing.EventSource:AddEventDescriptor(byref,System.String,System.Diagnostics.Tracing.EventAttribute,System.Reflection.ParameterInfo[],bool)
          -1 (-0.06% of base) : 6466.dasm - System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:TryInsert(System.__Canon,System.__Canon,ubyte):bool:this

31 total methods with Code Size differences (19 improved, 12 regressed), 12 unchanged.

Most common diff is a return block being placed earlier in the flow as it now has a more representative profile weight.

@AndyAyersMS
Copy link
Member Author

Arm64 failure is a CI issue:

D:\h\w\ADFA097D\w\AAD60982\e>dotnet D:\h\w\ADFA097D\p\xunit\xunit.console.dll JIT\jit64\JIT.jit64.XUnitWrapper.dll -parallel collections -nocolor -noshadow -xml testResults.xml -trait TestGroup=JIT.jit64 
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET 5.0.0)
System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.
System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.

@@ -2628,6 +2671,9 @@ void flowList::setEdgeWeights(BasicBlock::weight_t theMinWeight, BasicBlock::wei
{
assert(theMinWeight <= theMaxWeight);

JITDUMP("Setting edge weights for BB?? -> " FMT_BB " to [" FMT_WT " .. " FMT_WT "]\n", getBlock()->bbNum,
theMinWeight, theMaxWeight);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is backwards; I'll fix in a subsequent PR (and pass in the actual dest block to remove the BB??), if that's ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe just merge in my next round of changes here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with latest commit.

Also fix dump output from `setEdgeWeights` and pass in the destination
of the edge.
@AndyAyersMS
Copy link
Member Author

Latest commit fixed another 20 or so checker failures, so number of cases failing with profile checks enabled and asserting is now down to 159.

Getting to zero is not realistic as there are some things we won't be able to get right or even get consistent for a while. One example is inlining an unprofiled method with flow into a profiled one.

@AndyAyersMS
Copy link
Member Author

Similar Arm64 CI issue as before:

D:\h\w\B8B10A2A\w\AE560A02\e>dotnet D:\h\w\B8B10A2A\p\xunit\xunit.console.dll profiler\unittest\profiler.unittest.XUnitWrapper.dll -parallel collections -nocolor -noshadow -xml testResults.xml -trait TestGroup=profiler.unittest 
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET 5.0.0)
System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.
System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.

@AndyAyersMS
Copy link
Member Author

Breakdown of when we detect inconsistencies in the remaining 159 failures:

After Phase Count
Compute edge weights (1, false) 67
Optimize layout 15
Clone loops 15
Global local var liveness 15
Find loops 14
Assertion prop 14
Invert loops 11
LSRA resolve 4
Compute edge weights (2, false) 2
OTHER 1
SSA: rename 1

The "other" entry is an assert I need to address here.

@AndyAyersMS
Copy link
Member Author

Failures seen above are apparently tracked by #48820

Base automatically changed from master to main March 1, 2021 09:08
@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib anyone up for a review...?

@BruceForstall BruceForstall self-requested a review March 2, 2021 01:29
src/coreclr/jit/fgprofile.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/flowgraph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved

nextBlock->setBBProfileWeight(newNextWeight);

if (newNextWeight == 0.0f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all these 0.0f constants be some ZERO_WEIGHT define?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the relevant define is BB_ZERO_WEIGHT. A quick regex suggests it's almost exactly as popular as the 0 literal when used for comparisons with bbWeight.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat ambivalent about the value of having BB_ZERO_WEIGHT and BB_UNITY_WEIGHT since going forward we're quite unlikely to ever change them from 0.0f and 1.0f.

But I suppose we should be consistent and perhaps the symbolic form calls more attention to itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed what I could easily find over to BB_ZERO_WEIGHT.

// converting this tail call to a branch, we'll add appropriate
// successor information then.
fgRemoveBlockAsPred(compCurBB);
// If this block has a flow successor, make suitable updates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if all this new code (or some of it?) should be extracted to a fgUpdateTailcallProfileWeights or similar?

In fact, I wonder if most/all of the profile weight maintenance should be extracted into functions in fgprofile.cpp instead of spread around the various bits of the code base.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it belongs with the transformation; this is an important part of the transformation and not some additional / optional side work.

In the (hopefully not too distant) future we'll always have profile weights on every block so some of the conditional guarding will go way.

src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Mar 2, 2021

Thanks, Bruce.

I think I'm also going to fold the "run rarely" logic into setBBProfileWeight -- no point in duplicating it everywhere or (more likely) forgetting to do it...

@AndyAyersMS
Copy link
Member Author

Interop IEnumeratorTest on windows arm64 is failing with result -1073740791 aka 0xC0000409 -- stack overflow.

Seems quite likely unrelated to my change....

@AndyAyersMS AndyAyersMS merged commit cb606ad into dotnet:main Mar 2, 2021
@AndyAyersMS AndyAyersMS deleted the ProfileInconsistencies branch March 2, 2021 22:03
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Mar 24, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants