Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RyuJIT] Fully enregister structs that fit into a single register when profitable #8016

Closed
CarolEidt opened this issue May 4, 2017 · 8 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@CarolEidt
Copy link
Contributor

CarolEidt commented May 4, 2017

When a struct is small enough to fit into a register, and is only accessed in its entirety (e.g. to initialize or pass as an argument), it should be fully enregistered. Below is a simple test case that was provided with a bug filed against the desktop version. It should generate simply xor eax; ret on x86 and x64, but instead generates many unnecessary copies. It is addressed by full enregistration of structs that fit into a register (see work item dotnet/coreclr#7 in https://github.com/dotnet/runtime/blob/master/docs/design/coreclr/jit/first-class-structs.md#support-full-enregistration-of-struct-types):

struct foo { public byte b1, b2, b3, b4; }
static foo getfoo() { return new foo(); }

category:cq
theme:structs
skill-level:expert
cost:large

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@sandreenko
Copy link
Contributor

sandreenko commented Jun 17, 2020

The issue became more important with !JitDoOldStructRetyping, because in the past we were able to create merge return lclVar as a primitive type and enregister (it was usually not accessed by fields) and now the merge return lclVar has struct type and can't be enregistered, so I have ~15 regressions like that (but they are big, like 423 (33.68% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinopEasyOut:TypeToIndex(TypeSymbol):Nullable1`).

Off-topic: I see that we are creating merge return block even if there is exactly one return, but the method is synced or profiler is enabled, do we have plans to improve that? @echesakovMSFT

The groundwork to fix this issue was done in #32362, so I made a quick prototype to fix this one and was able to compile framework libraries win x64(crossgen and PMI).
Here are the diffs:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -1745 (-0.00% of base) (there are ~1000 byte regression that is temporary workaround for STORE_IND src containement issue, it could be fixed and eliminated)
Top method improvements (bytes):
         -70 (-55.56% of base) : Microsoft.CodeAnalysis.dasm - CommonAssemblyWellKnownAttributeData`1:set_AssemblyAlgorithmIdAttributeSetting(Nullable`1):this (7 methods)
         -65 (-24.07% of base) : System.Private.CoreLib.dasm - Vector128:WithLower(Vector128`1,Vector64`1):Vector128`1 (6 methods)
         -50 (-19.61% of base) : System.Private.CoreLib.dasm - Vector128:<WithLower>g__SoftwareFallback|73_0(Vector128`1,Vector64`1):Vector128`1 (6 methods)
         -50 (-11.90% of base) : System.Private.CoreLib.dasm - Vector64:ToVector128Unsafe(Vector64`1):Vector128`1 (6 methods)
         -42 (-0.32% of base) : System.Security.Cryptography.Pkcs.dasm - 
...
         -10 (-10.53% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SyntaxRemover:GetRemovedSpan(TextSpan,TextSpan):TextSpan:this
         -10 (-22.73% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - BaseListEditor:.ctor(TextSpan,int,bool,bool):this
         -10 (-55.56% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceProcess:set_ExitStatus(Nullable`1):this
         -10 (-66.67% of base) : Microsoft.Extensions.Logging.EventLog.dasm - WindowsEventLog:set_DefaultEventId(Nullable`1):this
         -10 (-55.56% of base) : Newtonsoft.Json.dasm - JsonContainerContract:set_ItemIsReference(Nullable`1):this
         -10 (-55.56% of base) : Newtonsoft.Json.dasm - JsonContainerContract:set_ItemReferenceLoopHandling(Nullable`1):this
         -10 (-55.56% of base) : Newtonsoft.Json.dasm - 
...
685 total methods with Code Size differences (656 improved, 29 regressed), 242849 unchanged.

Diff example for -10 (-55.56% of base) : Newtonsoft.Json.dasm - JsonSchema:set_Required(Nullable1):this`:
before:

IN0003: 000000 mov      qword ptr [rsp+10H], rdx (prolog)
IN0001: 000005 mov      ax, word  ptr [rsp+10H]
IN0002: 00000A mov      word  ptr [rcx+152], ax

after:

IN0001: 000000 mov      word  ptr [rcx+152], dx

and there are 100+ methods with Nullable arguments that were improved like that, so it looks like it is the main winner, full diffs:
diffs.Enreg.2.txt @AndyAyersMS

However, the result diffs are very small and not impressive. So unless I do not find a workaround for the regressions that I see or we do not consider improved methods as performance critical I do not think it is worth finishing in this release (but possible). Maybe I have missed some opportunities to make the improvements bigger, do we know benchmarks that suffer from this issue?

My raw prototype. It would need a week or so to finish (make sure all LCL_VARs that were changed into LCL_FLDs have lvFieldAccessed set, delete STORE_BLOCK->STOREIND workaround, carefully review changes in scopeinfo.cpp and regset.cpp, because there is a bug there right now).

Thoughts @CarolEidt , @dotnet/jit-contrib ?

Edit: we could try to avoid changes after lowering (5/11 changed files) if we replace TYP_STRUCT to native type in lclVarTable during lowering, it would require to change types in trees that work with these LCL_VAR as well, but we are doing it almost always nowadays (struct type on STORE_LCL_VAR is the only exception that came to my mind).

@echesakov
Copy link
Contributor

Off-topic: I see that we are creating merge return block even if there is exactly one return, but the method is synced or profiler is enabled, do we have plans to improve that? @echesakovMSFT

IIRC, this is by design - in case of synchronized methods this is when CORINFO_HELP_MON_EXIT is called, and in case of attached profiler - this is where the JIT injects a call to the CORINFO_HELP_PROF_FCN_LEAVE hook.

@AndyAyersMS
Copy link
Member

full diffs

Is this from a PMI that also instantiates with Nullable<int>? If so, we should either add that in as a default type to try, or possibly as an optional type to try. If not, you might want to add it and rerun the diffs.

Small structs are not all that common, at least in FX code, so not surprising you don't see too many diffs.

FYI, your diff script is disabling loop cloning. Probably doesn't matter here, but in general you should include the impact of cloning in your diffs.

@sandreenko
Copy link
Contributor

Is this from a PMI that also instantiates with Nullable? If so, we should either add that in as a default type to try, or possibly as an optional type to try. If not, you might want to add it and rerun the diffs.

With `Type[] typesToTry = new Type[] { default + typeof(int?) };

Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -8522 (-0.02% of base)
1336 total methods with Code Size differences (1288 improved, 48 regressed), 244398 unchanged.

FYI, your diff script is disabling loop cloning. Probably doesn't matter here, but in general you should include the impact of cloning in your diffs.

It could show size regressions some times that could be misleading, so I include it only with PRs changes.

@AndyAyersMS
Copy link
Member

Did you look at the codegen for the simple example at the top of the issue?

Overall improvement/regression ratio looks pretty good.

I'd be in favor of finishing this up now since you have the context and seem confident it is just a few more days work.

I would suggest spending a part of a day trying to understand the more prominent regressions, then spend two or so days working through the other issues you mention, and then check back to in to see if things still seem on track.

@sandreenko
Copy link
Contributor

Did you look at the codegen for the simple example at the top of the issue?

Yes, it does not improve with JitDoOldStructRetyping because we have "a field access":

               [000005] ------------              *  RETURN    int   
               [000004] ------------              \--*  LCL_FLD   int    V00 loc0         [+0]

with !JitDoOldStructRetyping we generate what we wanted:

               [000005] -----+------              *  RETURN    struct
               [000004] -----+------              \--*  CNS_INT   int    0

so the final code is

IN0001: 000000 xor      eax, eax
IN0002: 000002 ret

so this issue is more beneficial for no-retyping. I am planning to finish #37745 ignoring regressions from this issue, then try to finish it before Prev8.

@CarolEidt
Copy link
Contributor Author

@sandreenko - I support the idea of moving forward with this for Prev8 if possible, and although it's more code I like the use of the layout info to get the register type for the structs that can fit in a register, rather than retyping the lclVars in Lowering.

@CarolEidt
Copy link
Contributor Author

This was originally issue 11407 in dotnet/coreclr, and the associated test is JIT\Regression\JitBlue\GitHub_11407.
The code for getfoo() is now xor eax,eax; ret as expected, thanks to @sandreenko 's changes.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants