[RyuJIT] Fully enregister structs that fit into a single register when profitable #8016

CarolEidt · 2017-05-04T22:05:08Z

When a struct is small enough to fit into a register, and is only accessed in its entirety (e.g. to initialize or pass as an argument), it should be fully enregistered. Below is a simple test case that was provided with a bug filed against the desktop version. It should generate simply xor eax; ret on x86 and x64, but instead generates many unnecessary copies. It is addressed by full enregistration of structs that fit into a register (see work item dotnet/coreclr#7 in https://github.com/dotnet/runtime/blob/master/docs/design/coreclr/jit/first-class-structs.md#support-full-enregistration-of-struct-types):

struct foo { public byte b1, b2, b3, b4; }
static foo getfoo() { return new foo(); }

category:cq
theme:structs
skill-level:expert
cost:large

The text was updated successfully, but these errors were encountered:

sandreenko · 2020-06-17T10:47:20Z

The issue became more important with !JitDoOldStructRetyping, because in the past we were able to create merge return lclVar as a primitive type and enregister (it was usually not accessed by fields) and now the merge return lclVar has struct type and can't be enregistered, so I have ~15 regressions like that (but they are big, like 423 (33.68% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinopEasyOut:TypeToIndex(TypeSymbol):Nullable1`).

Off-topic: I see that we are creating merge return block even if there is exactly one return, but the method is synced or profiler is enabled, do we have plans to improve that? @echesakovMSFT

The groundwork to fix this issue was done in #32362, so I made a quick prototype to fix this one and was able to compile framework libraries win x64(crossgen and PMI).
Here are the diffs:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -1745 (-0.00% of base) (there are ~1000 byte regression that is temporary workaround for STORE_IND src containement issue, it could be fixed and eliminated)
Top method improvements (bytes):
         -70 (-55.56% of base) : Microsoft.CodeAnalysis.dasm - CommonAssemblyWellKnownAttributeData`1:set_AssemblyAlgorithmIdAttributeSetting(Nullable`1):this (7 methods)
         -65 (-24.07% of base) : System.Private.CoreLib.dasm - Vector128:WithLower(Vector128`1,Vector64`1):Vector128`1 (6 methods)
         -50 (-19.61% of base) : System.Private.CoreLib.dasm - Vector128:<WithLower>g__SoftwareFallback|73_0(Vector128`1,Vector64`1):Vector128`1 (6 methods)
         -50 (-11.90% of base) : System.Private.CoreLib.dasm - Vector64:ToVector128Unsafe(Vector64`1):Vector128`1 (6 methods)
         -42 (-0.32% of base) : System.Security.Cryptography.Pkcs.dasm - 
...
         -10 (-10.53% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SyntaxRemover:GetRemovedSpan(TextSpan,TextSpan):TextSpan:this
         -10 (-22.73% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - BaseListEditor:.ctor(TextSpan,int,bool,bool):this
         -10 (-55.56% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceProcess:set_ExitStatus(Nullable`1):this
         -10 (-66.67% of base) : Microsoft.Extensions.Logging.EventLog.dasm - WindowsEventLog:set_DefaultEventId(Nullable`1):this
         -10 (-55.56% of base) : Newtonsoft.Json.dasm - JsonContainerContract:set_ItemIsReference(Nullable`1):this
         -10 (-55.56% of base) : Newtonsoft.Json.dasm - JsonContainerContract:set_ItemReferenceLoopHandling(Nullable`1):this
         -10 (-55.56% of base) : Newtonsoft.Json.dasm - 
...
685 total methods with Code Size differences (656 improved, 29 regressed), 242849 unchanged.

Diff example for -10 (-55.56% of base) : Newtonsoft.Json.dasm - JsonSchema:set_Required(Nullable1):this`:
before:

IN0003: 000000 mov      qword ptr [rsp+10H], rdx (prolog)
IN0001: 000005 mov      ax, word  ptr [rsp+10H]
IN0002: 00000A mov      word  ptr [rcx+152], ax

after:

IN0001: 000000 mov      word  ptr [rcx+152], dx

and there are 100+ methods with Nullable arguments that were improved like that, so it looks like it is the main winner, full diffs:
diffs.Enreg.2.txt @AndyAyersMS

However, the result diffs are very small and not impressive. So unless I do not find a workaround for the regressions that I see or we do not consider improved methods as performance critical I do not think it is worth finishing in this release (but possible). Maybe I have missed some opportunities to make the improvements bigger, do we know benchmarks that suffer from this issue?

My raw prototype. It would need a week or so to finish (make sure all LCL_VARs that were changed into LCL_FLDs have lvFieldAccessed set, delete STORE_BLOCK->STOREIND workaround, carefully review changes in scopeinfo.cpp and regset.cpp, because there is a bug there right now).

Thoughts @CarolEidt , @dotnet/jit-contrib ?

Edit: we could try to avoid changes after lowering (5/11 changed files) if we replace TYP_STRUCT to native type in lclVarTable during lowering, it would require to change types in trees that work with these LCL_VAR as well, but we are doing it almost always nowadays (struct type on STORE_LCL_VAR is the only exception that came to my mind).

echesakov · 2020-06-17T17:12:36Z

Off-topic: I see that we are creating merge return block even if there is exactly one return, but the method is synced or profiler is enabled, do we have plans to improve that? @echesakovMSFT

IIRC, this is by design - in case of synchronized methods this is when CORINFO_HELP_MON_EXIT is called, and in case of attached profiler - this is where the JIT injects a call to the CORINFO_HELP_PROF_FCN_LEAVE hook.

AndyAyersMS · 2020-06-17T17:52:16Z

full diffs

Is this from a PMI that also instantiates with Nullable<int>? If so, we should either add that in as a default type to try, or possibly as an optional type to try. If not, you might want to add it and rerun the diffs.

Small structs are not all that common, at least in FX code, so not surprising you don't see too many diffs.

FYI, your diff script is disabling loop cloning. Probably doesn't matter here, but in general you should include the impact of cloning in your diffs.

sandreenko · 2020-06-17T19:06:35Z

Is this from a PMI that also instantiates with Nullable? If so, we should either add that in as a default type to try, or possibly as an optional type to try. If not, you might want to add it and rerun the diffs.

With `Type[] typesToTry = new Type[] { default + typeof(int?) };

Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -8522 (-0.02% of base)
1336 total methods with Code Size differences (1288 improved, 48 regressed), 244398 unchanged.

FYI, your diff script is disabling loop cloning. Probably doesn't matter here, but in general you should include the impact of cloning in your diffs.

It could show size regressions some times that could be misleading, so I include it only with PRs changes.

AndyAyersMS · 2020-06-17T19:14:57Z

Did you look at the codegen for the simple example at the top of the issue?

Overall improvement/regression ratio looks pretty good.

I'd be in favor of finishing this up now since you have the context and seem confident it is just a few more days work.

I would suggest spending a part of a day trying to understand the more prominent regressions, then spend two or so days working through the other issues you mention, and then check back to in to see if things still seem on track.

sandreenko · 2020-06-17T20:28:32Z

Did you look at the codegen for the simple example at the top of the issue?

Yes, it does not improve with JitDoOldStructRetyping because we have "a field access":

               [000005] ------------              *  RETURN    int   
               [000004] ------------              \--*  LCL_FLD   int    V00 loc0         [+0]

with !JitDoOldStructRetyping we generate what we wanted:

               [000005] -----+------              *  RETURN    struct
               [000004] -----+------              \--*  CNS_INT   int    0

so the final code is

IN0001: 000000 xor      eax, eax
IN0002: 000002 ret

so this issue is more beneficial for no-retyping. I am planning to finish #37745 ignoring regressions from this issue, then try to finish it before Prev8.

CarolEidt · 2020-06-17T21:16:55Z

@sandreenko - I support the idea of moving forward with this for Prev8 if possible, and although it's more code I like the use of the layout info to get the register type for the structs that can fit in a register, rather than retyping the lclVars in Lowering.

CarolEidt · 2020-10-15T22:46:17Z

This was originally issue 11407 in dotnet/coreclr, and the associated test is JIT\Regression\JitBlue\GitHub_11407.
The code for getfoo() is now xor eax,eax; ret as expected, thanks to @sandreenko 's changes.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

sandreenko mentioned this issue Jun 30, 2020

Disable JitDoOldStructRetyping by default. #37745

Merged

CarolEidt modified the milestones: Future, 5.0.0 Oct 15, 2020

CarolEidt self-assigned this Oct 15, 2020

CarolEidt closed this as completed Oct 15, 2020

PathogenDavid mentioned this issue Nov 25, 2020

Workaround unavoidable marshaling on function pointers MochiLibraries/Biohazrd#99

Closed

ghost locked as resolved and limited conversation to collaborators Dec 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RyuJIT] Fully enregister structs that fit into a single register when profitable #8016

[RyuJIT] Fully enregister structs that fit into a single register when profitable #8016

CarolEidt commented May 4, 2017 •

edited by sandreenko

Loading

sandreenko commented Jun 17, 2020 •

edited

Loading

echesakov commented Jun 17, 2020

AndyAyersMS commented Jun 17, 2020

sandreenko commented Jun 17, 2020

AndyAyersMS commented Jun 17, 2020

sandreenko commented Jun 17, 2020

CarolEidt commented Jun 17, 2020

CarolEidt commented Oct 15, 2020

[RyuJIT] Fully enregister structs that fit into a single register when profitable #8016

[RyuJIT] Fully enregister structs that fit into a single register when profitable #8016

Comments

CarolEidt commented May 4, 2017 • edited by sandreenko Loading

sandreenko commented Jun 17, 2020 • edited Loading

echesakov commented Jun 17, 2020

AndyAyersMS commented Jun 17, 2020

sandreenko commented Jun 17, 2020

AndyAyersMS commented Jun 17, 2020

sandreenko commented Jun 17, 2020

CarolEidt commented Jun 17, 2020

CarolEidt commented Oct 15, 2020

CarolEidt commented May 4, 2017 •

edited by sandreenko

Loading

sandreenko commented Jun 17, 2020 •

edited

Loading