New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements for object stack allocation. #21950

Merged
merged 1 commit into from Jan 15, 2019

Conversation

Projects
None yet
3 participants
@erozenfeld
Copy link
Member

erozenfeld commented Jan 11, 2019

This change enables object stack allocation for more cases:

  1. Objects with gc fields can now be stack-allocated.
  2. Object stack allocation is enabled for x86.

ObjectAllocator updates the types of trees containing references
to possibly-stack-allocated objects to TYP_BYREF or TYP_I_IMPL as appropriate.
That allows us to remove the hacks in gcencode.cpp and refine reporting of pointers:
the pointer is not reported when we can prove that it always points to a stack-allocated object or is null (typed as TYP_I_IMPL);
the pointer is reported as an interior pointer when it may point to either a stack-allocated object or a heap-allocated object (typed as TYP_BYREF);
the pointer is reported as a normal pointer when it points to a heap-allocated object (typed as TYP_REF).

ObjectAllocator also adds flags to indirections:
GTF_IND_TGTANYWHERE when the indirection may be the heap or the stack
(that results in checked write barriers used for writes)
or the new GTF_IND_TGT_NOT_HEAP when the indirection is null or stack memory
(that results in no barrier used for writes).

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 11, 2019

x64 PMI diffs with #21944 applied to both base and diff and object stack allocation enabled in both base and diff:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary:
(Lower is better)
Total bytes of diff: -2225 (-0.01% of base)
    diff is an improvement.
Top file regressions by size (bytes):
         154 : Microsoft.DotNet.ProjectModel.dasm (0.07% of base)
          88 : xunit.execution.dotnet.dasm (0.04% of base)
          67 : System.Net.Http.dasm (0.01% of base)
          50 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.00% of base)
          34 : System.Private.Xml.dasm (0.00% of base)
Top file improvements by size (bytes):
       -1870 : System.Threading.Tasks.Dataflow.dasm (-0.30% of base)
        -421 : System.Private.DataContractSerialization.dasm (-0.05% of base)
        -277 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00% of base)
         -80 : NuGet.Configuration.dasm (-0.15% of base)
         -14 : System.Linq.Parallel.dasm (0.00% of base)
15 total files with size differences (7 improved, 8 regressed), 114 unchanged.
Top method regressions by size (bytes):
         154 ( 4.28% of base) : Microsoft.DotNet.ProjectModel.dasm - ProjectReader:ReadProject(ref,ref,ref,ref):ref:this
          51 ( 2.44% of base) : System.Private.Xml.dasm - XslAstRewriter:Refactor(ref,int):this
          50 ( 8.00% of base) : xunit.execution.dotnet.dasm - <>c__DisplayClass26_0:<Find>b__0():this
          46 ( 1.21% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEventSession:EnableProvider(struct,int,long,ref):bool:this
          38 (11.73% of base) : xunit.execution.dotnet.dasm - <>c__DisplayClass28_0:<Find>b__0():this
Top method improvements by size (bytes):
        -189 (-14.12% of base) : System.Threading.Tasks.Dataflow.dasm - TransformBlock`2:get_DebuggerDisplayContent():ref:this (5 methods)
        -189 (-14.12% of base) : System.Threading.Tasks.Dataflow.dasm - TransformManyBlock`2:get_DebuggerDisplayContent():ref:this (5 methods)
        -148 (-12.52% of base) : System.Threading.Tasks.Dataflow.dasm - BroadcastBlock`1:get_DebuggerDisplayContent():ref:this (5 methods)
         -93 (-19.25% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodCompiler:GetEntryPoint(ref,ref,ref,struct):ref
         -92 (-8.03% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LocalRewriter:VisitForEachStatement(ref):ref:this
Top method regressions by size (percentage):
          38 (11.73% of base) : xunit.execution.dotnet.dasm - <>c__DisplayClass28_0:<Find>b__0():this
          25 ( 9.69% of base) : System.Private.DataContractSerialization.dasm - DataContractJsonSerializerImpl:.ctor(ref,ref):this (2 methods)
          50 ( 8.00% of base) : xunit.execution.dotnet.dasm - <>c__DisplayClass26_0:<Find>b__0():this
          13 ( 5.88% of base) : System.Security.Cryptography.X509Certificates.dasm - ECDsaCertificateExtensions:HasECDsaKeyUsage(ref):bool
          12 ( 5.56% of base) : System.Private.DataContractSerialization.dasm - DataContractSerializer:.ctor(ref,ref):this (2 methods)
Top method improvements by size (percentage):
         -18 (-32.14% of base) : System.Threading.Tasks.Dataflow.dasm - BroadcastBlock`1:get_ValueForDebugger():int:this
         -18 (-31.58% of base) : System.Threading.Tasks.Dataflow.dasm - BroadcastBlock`1:get_ValueForDebugger():long:this
         -18 (-29.51% of base) : System.Threading.Tasks.Dataflow.dasm - BroadcastBlock`1:get_ValueForDebugger():double:this
         -72 (-28.02% of base) : System.Threading.Tasks.Dataflow.dasm - BroadcastBlock`1:get_HasValueForDebugger():bool:this (5 methods)
         -90 (-25.35% of base) : System.Threading.Tasks.Dataflow.dasm - TransformBlock`2:get_OutputCountForDebugger():int:this (5 methods)
72 total methods with size differences (53 improved, 19 regressed), 193309 unchanged.
@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 11, 2019

No diffs with object stack allocation disabled.

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 11, 2019

Some of the regressions are due to stack offset changes after moving allocations from the heap to the stack. For example, on x64
mov qword ptr [rbp-80H], rax is 4 bytes while mov qword ptr [rbp-B0H], rax is 7 bytes.

@@ -349,7 +349,7 @@ CONFIG_STRING(JitInlineReplayFile, W("JitInlineReplayFile"))
#endif // defined(DEBUG) || defined(INLINE_DATA)

CONFIG_INTEGER(JitInlinePolicyModel, W("JitInlinePolicyModel"), 0)
CONFIG_INTEGER(JitObjectStackAllocation, W("JitObjectStackAllocation"), 0)
CONFIG_INTEGER(JitObjectStackAllocation, W("JitObjectStackAllocation"), 1)

This comment has been minimized.

@erozenfeld

erozenfeld Jan 11, 2019

Member

I will not be merge this change. I'd like to run some ci testing with object stack allocation enabled. I'll revert this change before merging.

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 11, 2019

@erozenfeld erozenfeld requested review from AndyAyersMS and echesakovMSFT Jan 11, 2019

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 11, 2019

Example of a good diff (BroadcastBlock``1:get_HasValueForDebugger():bool:this from System.Threading.Tasks.Dataflow.dll):

G_M19268_IG01:
-     push     rdi
-     push     rsi
-     sub      rsp, 40
+     sub      rsp, 24
+     xor      rax, rax
+     mov      qword ptr [rsp+10H], rax

G_M19268_IG02:
-     mov      rsi, gword ptr [rcx+8]
+     mov      rax, gword ptr [rcx+8]
-     mov      ecx, dword ptr [rsi]
-     mov      rcx, 0xD1FFAB1E
-     call     CORINFO_HELP_NEWSFAST
-     mov      rdi, rax
-     lea      rcx, bword ptr [rdi+8]
-     mov      rdx, rsi
-     call     CORINFO_HELP_ASSIGN_REF
-     mov      rax, gword ptr [rdi+8]
+     mov      rdx, rax
+     mov      edx, dword ptr [rdx]
+     xor      rdx, rdx
+     lea      rcx, bword ptr [rsp+08H]
+     mov      qword ptr [rcx], rdx
      mov      eax, dword ptr [rax+96]

G_M19268_IG03:
-     add      rsp, 40
+     add      rsp, 24
-     pop      rsi
-     pop      rdi

-; Total bytes of code 56, prolog size 6 for method BroadcastBlock`1:get_ValueForDebugger():int:this
+; Total bytes of code 38, prolog size 11 for method BroadcastBlock`1:get_ValueForDebugger():int:this
@AndyAyersMS
Copy link
Member

AndyAyersMS left a comment

Overall this looks good -- nice to see it rounding into shape.

Left you a two small notes.

{
tree->ChangeType(newType);
}
lclVarDsc->lvType = newType;

This comment has been minimized.

@AndyAyersMS

AndyAyersMS Jan 11, 2019

Member

Suggest you sink this update out of both then/else and add a JITDUMP message describing the change.

@@ -321,6 +392,8 @@ bool ObjectAllocator::MorphAllocObjNodes()

const unsigned int stackLclNum = MorphAllocObjNodeIntoStackAlloc(asAllocObj, block, stmt);
m_HeapLocalToStackLocalMap.AddOrUpdate(lclNum, stackLclNum);
MarkLclVarAsDefinitelyStackPointing(lclNum);
MarkLclVarAsPossiblyStackPointing(lclNum);

This comment has been minimized.

@AndyAyersMS

AndyAyersMS Jan 11, 2019

Member

Can you add a note here and/or elsewhere that the possibly stack pointing set is kept as a superset of the definitely stack pointing set (and a local is in both it is definitely stack pointing).

MarkLclVarAsPossiblyStackPointing(lclNum);

// Check if this pointer always points to the stack.
if (lclVarDsc->lvSingleDef == 1)

This comment has been minimized.

@AndyAyersMS

AndyAyersMS Jan 11, 2019

Member

We should really get around to writing a checker for the early meaning of lvSingleDef someday. I think the upstream phases maintain it, but ...

@erozenfeld erozenfeld force-pushed the erozenfeld:RetypeStackPointingRefs branch from 45c257d to a377055 Jan 11, 2019

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 11, 2019

@AndyAyersMS I addressed your feedback and added a couple of small fixes for issues found in x86 testing. PTAL.

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 12, 2019

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test

Improvements for object stack allocation.
This change enables object stack allocation for more cases.

1. Objects with gc fields can now be stack-allocated.
2. Object stack allocation is enabled for x86.

ObjectAllocator updates the types of trees containing references
to possibly-stack-allocated objects to TYP_BYREF or TYP_I_IMPL as appropriate.
That allows us to remove the hacks in gcencode.cpp and refine reporting of pointers:
the pointer is not reported when we can prove that it always points to a stack-allocated object or is null (typed as TYP_I_IMPL);
the pointer is reported as an interior pointer when it may point to either a stack-allocated object or a heap-allocated object (typed as TYP_BYREF);
the pointer is reported as a normal pointer when it points to a heap-allocated object (typed as TYP_REF).

ObjectAllocator also adds flags to indirections:
GTF_IND_TGTANYWHERE when the indirection may be the heap or the stack
(that results in checked write barriers used for writes)
or the new GTF_IND_TGT_NOT_HEAP when the indirection is null or stack memory
(that results in no barrier used for writes).

@erozenfeld erozenfeld force-pushed the erozenfeld:RetypeStackPointingRefs branch from a377055 to 01731f7 Jan 13, 2019

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 13, 2019

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test

2 similar comments
@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 14, 2019

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 14, 2019

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test

@AndyAyersMS

This comment has been minimized.

Copy link
Member

AndyAyersMS commented Jan 14, 2019

The Tizen leg is broken (and now, removed) so you might as well ignore it.

{
object o = (f1 == 0) ? (object)new SimpleClassB(f1, f2) : (object)new SimpleClassA(f1, f2);
return (o is SimpleClassB) || !(o is SimpleClassA) ? 0 : 1;
GC.Collect();
return !(o is SimpleClassA) ? 0 : 1;

This comment has been minimized.

@echesakovMSFT

echesakovMSFT Jan 15, 2019

Member

I bet there should be a reason why you prefered this over (o is SimpleClassA) ? 1 : 0

This comment has been minimized.

@erozenfeld

erozenfeld Jan 15, 2019

Member

It's the result of refactoring the previous version of the test. I didn't bother simplifying this.

@erozenfeld erozenfeld merged commit f2d3dfe into dotnet:master Jan 15, 2019

31 of 32 checks passed

Tizen armel Cross Checked Innerloop Build and Test Build finished.
Details
CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
Details
CentOS7.1 x64 Debug Innerloop Build Build finished.
Details
Linux-musl x64 Debug Build Build finished.
Details
OSX10.12 x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Checked crossgen_comparison Build and Test Build finished.
Details
Ubuntu arm Cross Checked no_tiered_compilation_innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Release crossgen_comparison Build and Test Build finished.
Details
Ubuntu x64 Checked CoreFX Tests Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
Ubuntu16.04 arm64 Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu16.04 arm64 Cross Checked no_tiered_compilation_innerloop Build and Test Build finished.
Details
Windows_NT arm Cross Checked Innerloop Build and Test Build finished.
Details
Windows_NT arm Cross Debug Innerloop Build Build finished.
Details
Windows_NT arm64 Cross Checked Innerloop Build and Test Build finished.
Details
Windows_NT arm64 Cross Debug Innerloop Build Build finished.
Details
Windows_NT x64 Checked CoreFX Tests Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x64 Release CoreFX Tests Build finished.
Details
Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x86 Release Innerloop Build and Test Build finished.
Details
Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
license/cla All CLA requirements met.
Details

@erozenfeld erozenfeld deleted the erozenfeld:RetypeStackPointingRefs branch Jan 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment