Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep structs in registers #43867

Closed
5 of 10 tasks
CarolEidt opened this issue Oct 27, 2020 · 6 comments
Closed
5 of 10 tasks

Keep structs in registers #43867

CarolEidt opened this issue Oct 27, 2020 · 6 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Bottom Up Work Not part of a theme, epic, or user story tenet-performance Performance related issue User Story A single user-facing feature. Can be grouped under an epic.
Milestone

Comments

@CarolEidt
Copy link
Contributor

CarolEidt commented Oct 27, 2020

Epic for improving the ability of the JIT to keep structs in registers.

Overview

The JIT currently supports keeping structs in registers when the individual fields are promoted to local variables (also known as scalar replacement). However, there are a number of cases where it forces them to the stack. Some of these cases are more fundamental, e.g. when they are address exposed. Others, however, are due to JIT limitations in handling argument passing and returns.

The Jit does not support keeping small structs in registers when the fields are not promoted and not accessed, for example, when the struct is only used as an argument or block.

.NET 6 scope:

Work items for multi-reg

Work items for single-reg

Stretch goals for 6.0:

Stretch goal for multi-reg:

Stretch goal for single-reg:

  • Add support for LclFld with struct type;
  • LclVar struct as first-class nodes, meaning don't create OBJ(ADDR(LCL_VAR)) when pass as args etc.

.NET 6 Result:

We have a set of repro-tests that show incorrect/inefficient code generation in 5.0 that is expected to be fixed. It is covered by issues in the previous section.

The perf results are shown in the bottom messages.

category:planning
theme:planning
skill-level:expert
cost:large
impact:medium

@CarolEidt CarolEidt added tenet-performance Performance related issue area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Team Epic labels Oct 27, 2020
@CarolEidt CarolEidt added this to the 6.0.0 milestone Oct 27, 2020
@CarolEidt CarolEidt self-assigned this Oct 27, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Oct 27, 2020
@CarolEidt CarolEidt added this to Needs Triage in .NET Core CodeGen via automation Oct 27, 2020
@CarolEidt CarolEidt moved this from Needs Triage to Optimizations (RegAlloc/Struct) in .NET Core CodeGen Oct 27, 2020
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Oct 30, 2020
@JulieLeeMSFT JulieLeeMSFT moved this from Optimizations (RegAlloc/Struct) to Team Epics in .NET Core CodeGen Nov 2, 2020
@JulieLeeMSFT JulieLeeMSFT added Bottom Up Work Not part of a theme, epic, or user story User Story A single user-facing feature. Can be grouped under an epic. and removed Team Epic labels Nov 16, 2020
@CarolEidt CarolEidt removed their assignment Dec 4, 2020
@sandreenko
Copy link
Contributor

some additional test casts that I am planning to improve in this issue for 6.0:
1,

    struct S
    {
        public short i0;
        public short i1;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static S Test()
    {
        S s = new S();
        TestS(s);
        s = GetS1();
        TestS(s);
        s = GetS2();
        TestS(s);
        return s;
    }

currently generates:

; Total bytes of code 88, prolog size 4, PerfScore 28.05, instruction count 22, allocated bytes for code 88 (MethodHash=67013932) for method SmallStructEnreg:Test():S
; ============================================================

*************** After end code gen, before unwindEmit()
G_M50893_IG01:        ; func=00, offs=000000H, size=0004H, bbWeight=1    PerfScore 0.25, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG

IN0014: 000000 sub      rsp, 56

G_M50893_IG02:        ; offs=000004H, size=004FH, bbWeight=1    PerfScore 17.75, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref

IN0001: 000004 xor      ecx, ecx
IN0002: 000006 lea      rax, bword ptr [V10 rsp+20H]
IN0003: 00000B mov      word  ptr [rax], cx
IN0004: 00000E mov      word  ptr [rax+2], cx
IN0005: 000012 mov      ecx, dword ptr [V10 rsp+20H]
IN0006: 000016 call     SmallStructEnreg:TestS(S)
IN0007: 00001B call     SmallStructEnreg:GetS1():S
IN0008: 000020 mov      dword ptr [V02 rsp+30H], eax
IN0009: 000024 mov      ecx, dword ptr [V02 rsp+30H]
IN000a: 000028 mov      dword ptr [V10 rsp+20H], ecx
IN000b: 00002C mov      ecx, dword ptr [V10 rsp+20H]
IN000c: 000030 call     SmallStructEnreg:TestS(S)
IN000d: 000035 call     SmallStructEnreg:GetS2():S
IN000e: 00003A mov      dword ptr [V03 rsp+28H], eax
IN000f: 00003E mov      ecx, dword ptr [V03 rsp+28H]
IN0010: 000042 mov      dword ptr [V10 rsp+20H], ecx
IN0011: 000046 mov      ecx, dword ptr [V10 rsp+20H]
IN0012: 00004A call     SmallStructEnreg:TestS(S)
IN0013: 00004F mov      eax, dword ptr [V03 rsp+28H]

G_M50893_IG03:        ; offs=000053H, size=0005H, bbWeight=1    PerfScore 1.25, epilog, nogc, extend

IN0015: 000053 add      rsp, 56
IN0016: 000057 ret 

with a small struct enreg we are going to get:

; Total bytes of code 46, prolog size 5, PerfScore 13.85, instruction count 15, allocated bytes for code 46 (MethodHash=67013932) for method SmallStructEnreg:Test():S
; ============================================================

*************** After end code gen, before unwindEmit()
G_M50893_IG01:        ; func=00, offs=000000H, size=0005H, bbWeight=1    PerfScore 1.25, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG

IN000b: 000000 push     rsi
IN000c: 000001 sub      rsp, 32

G_M50893_IG02:        ; offs=000005H, size=0023H, bbWeight=1    PerfScore 6.25, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref

IN0001: 000005 xor      ecx, ecx
IN0002: 000007 call     SmallStructEnreg:TestS(S)
IN0003: 00000C call     SmallStructEnreg:GetS1():S
IN0004: 000011 mov      ecx, eax
IN0005: 000013 call     SmallStructEnreg:TestS(S)
IN0006: 000018 call     SmallStructEnreg:GetS2():S
IN0007: 00001D mov      esi, eax
IN0008: 00001F mov      ecx, esi
IN0009: 000021 call     SmallStructEnreg:TestS(S)
IN000a: 000026 mov      eax, esi

G_M50893_IG03:        ; offs=000028H, size=0006H, bbWeight=1    PerfScore 1.75, epilog, nogc, extend

IN000d: 000028 add      rsp, 32
IN000e: 00002C pop      rsi
IN000f: 00002D ret 

it is a wider case of #8016.

  1. Inefficient codegen for casts between same size types. #11413
  2. together with the issues in the head it will allow generating code without stack access for cases like:
    struct S
    {
        public short i0;
        public short i1;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static S Test()
    {
        S s = new S();
        s.i0 = 1;
        s.i1 = 2;
        return s;
    }

@sandreenko
Copy link
Contributor

sandreenko commented Mar 1, 2021

Link #48377

@sandreenko
Copy link
Contributor

sandreenko commented May 10, 2021

April update:
#52039 prepared backed to see LCL_VAR struct after lowering but there are not many changes because most of the struct local vars are marked as 'doNotEnreg` in several places, we have fixed one, now need to fix the others that are mostly in morphCopyBlock.

Also, I got estimates of the expected perf results using a local branch with fixed morph (improvements are in addition to what already have been merged):

benchmarks.run.windows.x64.checked.mch
Total bytes of delta: -8352 
700 total methods with Code Size differences (528 improved, 172 regressed)

libraries.crossgen.windows.x64.checked.mch
Total bytes of delta: -29514
2876 total methods with Code Size differences (2184 improved, 692 regressed)

libraries.crossgen2.windows.x64.checked.mch
Total bytes of delta: -166676
12003 total methods with Code Size differences (10614 improved, 1389 regressed)

libraries.pmi.windows.x64.checked.mch
Total bytes of delta: -110843
7409 total methods with Code Size differences (6383 improved, 1026 regressed)

tests.pmi.windows.x64.checked.mch
Total bytes of delta: -150047
5068 total methods with Code Size differences (4559 improved, 509 regressed)

Also, I have checked that the next changes will close at least #7200, #4323, #51472

A few general improvements required for the struct enreg work were merged:
#51851
#52292

and fixes:
#50669

and cleanups:
#51092

@sandreenko
Copy link
Contributor

With #55558 and #55727 we finished struct optimizations in 6.0 scope and soon I will open new work items for 7.0.
The 6.0 work gave us significant code size improvements and prepared opportunities for bigger optimizations in the next release cycle.

There were also many fixes for struct, especially with passing float in int regs or in SIMD types.

Some examples:
#56375
#54694
#46345
#46899
#49101
#52581
#53578
#56521

The additional checks and cleanups should help us against silent bad codegen cases that we saw in the past releases.

Some examples:
#52802
#51092
#53983

It is hard to gather performance wins for 7.0 changes because there were many of them, but if we just some asm diffs from the main PR we will get something like:

PRs https://github.com//pull/55558 https://github.com//pull/55045 https://github.com//pull/55727 https://github.com//pull/52292 https://github.com//pull/49879 https://github.com//pull/48377 https://github.com//pull/45824

x64 windows pmi libraries:
~ -85kb

arm64 linux pmi libraries:
~ -95kb

.NET Core CodeGen automation moved this from Team Epics to Done Aug 2, 2021
@CarolEidt
Copy link
Contributor Author

Nice work @sandreenko !

@stephentoub
Copy link
Member

To you both 😄

@ghost ghost locked as resolved and limited conversation to collaborators Sep 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Bottom Up Work Not part of a theme, epic, or user story tenet-performance Performance related issue User Story A single user-facing feature. Can be grouped under an epic.
Projects
Archived in project
Development

No branches or pull requests

5 participants