This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

JIT: allow slightly more general promotion of structs with struct fields #22867

Merged 1 commit on Feb 28, 2019

AndyAyersMS
Member

For a while now the jit has been able to promote an outer struct A with an
inner struct field B that itself has a single non-struct field C, provided
that C occupies all of B and that C and B are pointer-sized.

For example, this comes up when supporting promotion of `Span<T>`, as a span
contains a `ByReference<T>` field that itself contains a pointer-sized field.

This change relaxes the constraints slightly, allowing B and C to be less than
pointer sized, provided C still occupies all of B, and B is suitably aligned
within A.

Doing so allows promotion of the new `Range` type, which contains two `Index`
fields that each wrap an `int`. This improves performance for uses of `Range`
for simple examples like those in #22079.
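
As a rough illustration only (this is not the JIT's C++ code), the relaxed rule can be modeled as a small predicate. The names `PromotionRule` and `CanPromoteWrappedField` are hypothetical, and reading "suitably aligned" as natural alignment (B's offset is a multiple of B's size) is an assumption.

```csharp
// Hypothetical model of the relaxed rule; these names are illustrative
// and do not correspond to the JIT's actual implementation.
static class PromotionRule
{
    // outerOffset: offset of the inner struct B within the outer struct A.
    // innerSize:   size of B in bytes.
    // fieldSize:   size in bytes of C, the single non-struct field of B.
    public static bool CanPromoteWrappedField(int outerOffset, int innerSize, int fieldSize)
    {
        if (innerSize <= 0)
        {
            return false;
        }

        // C must occupy all of B.
        if (fieldSize != innerSize)
        {
            return false;
        }

        // Assumption: "suitably aligned" means B's offset within A is a
        // multiple of B's size (natural alignment).
        return outerOffset % innerSize == 0;
    }
}
```

Under this model the two `Index` fields of `Range` (4 bytes each, at offsets 0 and 4, each wrapping an `int`) both qualify, whereas the older rule would also have required them to be pointer-sized.
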
@AndyAyersMS
Member Author

cc @dotnet/jit-contrib

Will keep #22079 open after this, but move remainder of work to future.

FX diff impact:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -2776 (-0.01% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          39 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00% of base)
Top file improvements by size (bytes):
       -1350 : Microsoft.CodeAnalysis.dasm (-0.09% of base)
       -1112 : System.Reflection.Metadata.dasm (-0.27% of base)
        -317 : System.Private.CoreLib.dasm (-0.01% of base)
         -19 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
         -17 : System.Linq.Expressions.dasm (-0.00% of base)
6 total files with size differences (5 improved, 1 regressed), 123 unchanged.
Top method regressions by size (bytes):
          37 ( 7.66% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:PopulateParamTableRows():this
          36 (10.14% of base) : Microsoft.CodeAnalysis.dasm - PEModule:HasDeprecatedOrObsoleteAttribute(struct,byref):bool:this
          29 ( 3.06% of base) : System.Reflection.Metadata.dasm - ControlFlowBuilder:CopyCodeAndFixupBranches(ref,ref):this
          25 ( 5.97% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:GetOrAddDocument(ref,ref):int:this
          22 (11.76% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:AddManifestResource(int,struct,struct,int):struct:this
Top method improvements by size (bytes):
        -112 (-5.28% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:SerializeMethodDebugInfo(ref,int,int):this
        -110 (-14.19% of base) : System.Reflection.Metadata.dasm - NamespaceCache:MergeDuplicateNamespaces(ref,byref):this
         -93 (-12.33% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:SerializeStateMachineLocalScopes(ref,int):this
         -88 (-11.84% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:SerializeEncMethodDebugInformation(ref,int):this
         -68 (-3.41% of base) : System.Private.CoreLib.dasm - RuntimeHelpers:GetSubArray(ref,struct):ref (5 methods)
Top method regressions by size (percentage):
          22 (11.76% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:AddManifestResource(int,struct,struct,int):struct:this
          36 (10.14% of base) : Microsoft.CodeAnalysis.dasm - PEModule:HasDeprecatedOrObsoleteAttribute(struct,byref):bool:this
          37 ( 7.66% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:PopulateParamTableRows():this
          15 ( 7.35% of base) : Microsoft.CodeAnalysis.dasm - PEModule:HasInterfaceTypeAttribute(struct,byref):bool:this
          15 ( 7.35% of base) : Microsoft.CodeAnalysis.dasm - PEModule:HasTypeLibTypeAttribute(struct,byref):bool:this
Top method improvements by size (percentage):
         -63 (-65.62% of base) : System.Private.CoreLib.dasm - String:EnumerateRunes():struct:this
         -56 (-35.00% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:AddFieldDefinition(int,struct,struct):struct:this
         -56 (-33.73% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:AddProperty(int,struct,struct):struct:this
         -59 (-32.42% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:AddMemberReference(struct,struct,struct):struct:this
         -59 (-32.42% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:AddCustomDebugInformation(struct,struct,struct):struct:this
177 total methods with size differences (137 improved, 40 regressed), 193175 unchanged.

Test case impact:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Span<int> TrimFirstLast_OpenCoded(Span<int> s) => s.Slice(1, s.Length - 1);
 
    [MethodImpl(MethodImplOptions.NoInlining)]
    static Span<int> TrimFirstLast_Range(Span<int> s) => s[new Range(Index.FromStart(1), Index.FromEnd(0))];
PMI Diffs for d:\bugs\22079\ex.exe for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -88 (-12.39% of base)
    diff is an improvement.
Top file improvements by size (bytes):
         -88 : ex.dasm (-12.39% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements by size (bytes):
         -88 (-44.67% of base) : ex.dasm - X:TrimFirstLast_Range(struct):struct
Top method improvements by size (percentage):
         -88 (-44.67% of base) : ex.dasm - X:TrimFirstLast_Range(struct):struct
1 total methods with size differences (1 improved, 0 regressed), 3 unchanged.
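
For reference, a minimal sketch of why the two test methods are comparable: both produce the same slice, but the `Range` version first has to unpack two `Index` values. The class and method names below are hypothetical, and resolving the range through `Range.GetOffsetAndLength` is only an approximation of what the indexer does.

```csharp
using System;

static class RangeSliceCheck
{
    // Mirrors TrimFirstLast_OpenCoded from the test case.
    public static Span<int> OpenCoded(Span<int> s) => s.Slice(1, s.Length - 1);

    // Resolves the same Range by hand before slicing.
    public static Span<int> ViaRange(Span<int> s)
    {
        Range r = new Range(Index.FromStart(1), Index.FromEnd(0));
        (int offset, int length) = r.GetOffsetAndLength(s.Length);
        return s.Slice(offset, length);
    }
}
```

For a five-element span both methods return the last four elements, so any difference in generated code comes from how the `Range` and `Index` structs are handled rather than from the work requested.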

@AndyAyersMS
Member Author

Perf numbers. Still a big gap, but better than it was. Notes explaining why over in #22079.

| Test | Before | After |
| --- | --- | --- |
| OpenCoded | 450 | 450 |
| Range | 2568 | 1554 |
@AndyAyersMS
Member Author

Looks like this hit a whole raft of Jenkins hangups.

@AndyAyersMS
Member Author

@dotnet-bot test Windows_NT x86 Release Innerloop Build and Test
@dotnet-bot test Windows_NT x86 Checked Innerloop Build and Test
@dotnet-bot test Windows_NT x64 Release CoreFX Tests
@dotnet-bot test Windows_NT x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)
@dotnet-bot test Windows_NT x86 Checked Innerloop Build and Test (Jit - TieredCompilation=0)
@dotnet-bot test Windows_NT x64 Checked Innerloop Build and Test
@dotnet-bot test Windows_NT arm64 Cross Checked Innerloop Build and Test
@dotnet-bot test Windows_NT arm Cross Checked Innerloop Build and Test
@dotnet-bot test Ubuntu arm Cross Release crossgen_comparison Build and Test
@dotnet-bot test Ubuntu arm Cross Checked crossgen_comparison Build and Test
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test

@AndyAyersMS
Member Author

More hangups...

@dotnet-bot test Windows_NT x64 Checked CoreFX Tests
@dotnet-bot test Windows_NT arm64 Cross Checked Innerloop Build and Test

The Release x64 CoreFX test failure may be real; will need to investigate:

20:10:04       System.IO.Tests.FileInfo_GetSetTimes.CopyToMillisecondPresent [FAIL]
20:10:04       Assert.NotEqual() Failure
20:10:04       Expected: Not 0
20:10:04       Actual:   0

@sandreenko

LGTM
Does it work for fields smaller than int (byte or short)?
Could you please give an example of jit-diff regression dasm from this?

@AndyAyersMS
Member Author

Yes, it handles bytes.

But there is something odd going on in VN that I want to look into further.

Example:

using System;

struct B
{
    public byte x;
}

struct BB
{
    public B b1;
    public B b2;
}

class X
{
    public static int Main()
    {
        BB bb = new BB();
        bb.b1.x = 64;
        bb.b2.x = 36;
        return (int) bb.b1.x + (int) bb.b2.x;
    }
}
;; before
;; Promotion blocked: struct contains struct field with one field, but that field has invalid size or type
...
       mov      eax, 100
       ret      

;; after
;; Promoting struct local V00 (BB):
;; lvaGrabTemp returning 2 (V02 tmp1) (a long lifetime temp) called for field V00.b1 (fldOffset=0x0).
;; lvaGrabTemp returning 3 (V03 tmp2) (a long lifetime temp) called for field V00.b2 (fldOffset=0x1).
...
       push     rax
       nop      
       mov      byte  ptr [rsp+04H], 64
       mov      byte  ptr [rsp], 36
       movzx    rax, byte  ptr [rsp+04H]
       movzx    rdx, byte  ptr [rsp]
       add      eax, edx
       add      rsp, 8
       ret      

For some reason VN is unable to propagate constants through the promoted fields.
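
As a side check, the layout that makes `BB` eligible can be measured from C#. This is a sketch that assumes a recent .NET where `System.Runtime.CompilerServices.Unsafe` is available; `LayoutCheck` and `Measure` are hypothetical names, and the structs are repeated so the snippet stands alone.

```csharp
using System;
using System.Runtime.CompilerServices;

struct B
{
    public byte x;
}

struct BB
{
    public B b1;
    public B b2;
}

static class LayoutCheck
{
    // Returns the size of B, the size of BB, and the byte offset of b2 within BB.
    public static (int SizeOfB, int SizeOfBB, int OffsetOfB2) Measure()
    {
        BB bb = default;
        int offsetOfB2 = (int)Unsafe.ByteOffset(ref bb.b1, ref bb.b2);
        return (Unsafe.SizeOf<B>(), Unsafe.SizeOf<BB>(), offsetOfB2);
    }
}
```

With the default sequential layout this should report sizes 1 and 2 and an offset of 1, which lines up with `fldOffset=0x1` for `V00.b2` in the dump: each `B` is fully occupied by its single byte field.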

@AndyAyersMS
Member Author

Some notes on the byte wrapper example. First, in morph: in the after case, we have:

[000010] ------------              /--*  CNS_INT   int    64
[000012] -A--G-------              *  ASG       ubyte 
[000011] ----G--N----              \--*  FIELD     ubyte  x
[000009] ------------                 \--*  ADDR      byref 
[000008] ------------                    \--*  LCL_VAR   ubyte  V02 tmp1

Where tmp1 is a normalize on load promoted field.

Morph transforms the LHS into

[000011] *---G--N----              *  IND       ubyte 
[000009] -----+------              \--*  ADDR      byref 
[000008] -----+-N----                 \--*  LCL_VAR   ubyte  V02 tmp1 

and then tries to simplify the IND(ADDR(...)).

Because morph sees a small type and a normalize on load temp, it won't fold this to just tmp1. So it ends up creating a local field and marking tmp1 as DNER.

Local V02 should not be enregistered because: was accessed as a local field

[000010] -----+------              /--*  CNS_INT   int    64
[000012] -A--G+------              *  ASG       ubyte 
[000008] D----+-N----              \--*  LCL_FLD   ubyte  V02 tmp1         [+0] Fseq[x]

But here the IR is storing to tmp1, so it seems odd to check for normalize on load.

Later on at the return we do hit uses and here the normalize on load check behaves similarly. But if we allowed folding in that case it looks like we'd go on to call fgMorphLclVar and introduce the needed casts.

So I'm wondering if we can just remove those normalize on load blockers entirely, or if not, put in a more accurate check.

None of this should be needed to enable const prop later on, so will look into that next.
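
For readers less familiar with the term, here is a toy model of "normalize on load" for a `ubyte` local. It is only a conceptual sketch with hypothetical names, not how the JIT represents locals: the store leaves the upper bits of the slot alone, and each load narrows and zero-extends the value.

```csharp
// Toy model: a 32-bit slot stands in for the storage of a ubyte local.
static class NormalizeOnLoadModel
{
    // The store does no narrowing, so upper bits may hold anything.
    public static uint Store(int value) => unchecked((uint)value);

    // The load narrows to a byte and zero-extends back to int.
    public static int Load(uint slot) => unchecked((byte)slot);
}
```

In this model a store needs no normalization at all, which is why checking for normalize on load at the store in the IR above looks unnecessary, while a use such as the one at the return still needs the cast.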

@mikedn

mikedn commented Feb 27, 2019

Ahm, promotion of small int fields, be careful with that because the JIT does something weird: https://github.com/dotnet/coreclr/issues/20957#issuecomment-442171482. It also results in other CQ issues: https://github.com/dotnet/coreclr/issues/20957#issuecomment-445941114

@AndyAyersMS
Member Author

Second, for value numbering: When VN encounters the write to a GT_LCL_FLD, since it is an entire write, it simply uses the value number of the source.

N001 [000010]   CNS_INT   64 => $42 {IntCns 64}
    VNForCastOper(ubyte) is $43
N002 [000008]   LCL_FLD   V02 tmp1         d:2[+0] Fseq[x] => $42 {IntCns 64}
N003 [000012]   ASG       => $42 {IntCns 64}

***** BB01, stmt 2 (after)
N001 (  1,  1) [000010] ------------              /--*  CNS_INT   int    64 $42
N003 (  6,  7) [000012] -A--G---R---              *  ASG       ubyte  $42
N002 (  4,  5) [000008] D------N----              \--*  LCL_FLD   ubyte  V02 tmp1         d:2[+0] Fseq[x] $42

But for reads of GT_LCL_FLD, VN always goes via the field selectors:

  VNApplySelectors:
    VNForHandle(x) is $100, fieldType is ubyte
    VNForMapSelect($42, $100):ubyte returns $140 {$42[$100]}
  VNApplySelectors:
    VNForHandle(x) is $100, fieldType is ubyte
    VNForMapSelect($42, $100):ubyte returns $140 {$42[$100]}
N001 [000024]   LCL_FLD   V02 tmp1         u:2[+0] Fseq[x] (last use) => $140 {$42[$100]}

N004 ( 10, 12) [000033] ----G-------              *  RETURN    int    $1c0
N002 (  4,  5) [000029] ------------              |  /--*  LCL_FLD   ubyte  V03 tmp2         u:2[+0] Fseq[x] (last use) $141
N003 (  9, 11) [000032] ----G-------              \--*  ADD       int    $180
N001 (  4,  5) [000024] ------------                 \--*  LCL_FLD   ubyte  V02 tmp1         u:2[+0] Fseq[x] (last use) $140

So the read doesn't pick up the constant.

In the "before" case none of the accesses are entire, things match up, and constants propagate.

Seems like we need to duplicate the "entire" logic here and just use the SSA value number for the local.
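
A toy model of the mismatch, far simpler than the JIT's value number store and with entirely hypothetical names: the write gives the local the constant's value number, but a read through a field selector yields a fresh opaque number unless the "entire" shortcut is applied on the read side too.

```csharp
using System.Collections.Generic;

// Toy value-numbering model; not the JIT's data structures.
sealed class ToyValueNumbers
{
    private readonly Dictionary<int, int> _vnOfConstant = new Dictionary<int, int>();
    private readonly Dictionary<int, int> _constantOfVN = new Dictionary<int, int>();
    private readonly Dictionary<(int Map, int Selector), int> _selects =
        new Dictionary<(int Map, int Selector), int>();
    private int _nextVN = 1;

    // Each distinct constant gets one value number.
    public int ForIntConstant(int value)
    {
        if (!_vnOfConstant.TryGetValue(value, out int vn))
        {
            vn = _nextVN++;
            _vnOfConstant[value] = vn;
            _constantOfVN[vn] = value;
        }
        return vn;
    }

    // Selecting a field from a value yields an opaque value number,
    // memoized so that identical selections agree with each other.
    public int MapSelect(int mapVN, int selectorVN)
    {
        var key = (mapVN, selectorVN);
        if (!_selects.TryGetValue(key, out int vn))
        {
            vn = _nextVN++;
            _selects[key] = vn;
        }
        return vn;
    }

    public bool TryGetConstant(int vn, out int value) => _constantOfVN.TryGetValue(vn, out value);

    // Read of a local field. With reuseLocalVN set, an entire read takes the
    // local's own value number; otherwise it always goes through the selector.
    public int ReadLocalField(int localVN, int selectorVN, bool isEntire, bool reuseLocalVN)
        => (isEntire && reuseLocalVN) ? localVN : MapSelect(localVN, selectorVN);
}
```

In this model, reading a local whose value number is the constant 64 through the selector returns a number with no known constant, mirroring `$140 {$42[$100]}` above, while the entire-read shortcut returns the constant's number directly.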

@AndyAyersMS
Member Author

Am going to split off the VN change as a separate PR. Will leave morph alone for now.

@briansull left a comment

LGTM

@CarolEidt left a comment

LGTM

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 27, 2019

The regression in PopulateParamTableRows arises because we now do field-wise assignment of a 3-field struct instead of a whole-struct assignment.

@AndyAyersMS
Member Author

The CoreFX failures (release, checked) seem somewhat random. Let's see if they repro.

@dotnet-bot test Windows_NT x64 Checked CoreFX Tests
@dotnet-bot test Windows_NT x64 Release CoreFX Tests

@AndyAyersMS
Member Author

Got some different checked CoreFX failures this time.

@AndyAyersMS
Member Author

Ubuntu arm stuff never ran

@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test
@dotnet-bot test Ubuntu arm Cross Checked no_tiered_compilation_innerloop Build and Test

@AndyAyersMS
Member Author

And will take one more shot at checked CoreFX...

@dotnet-bot test Windows_NT x64 Checked CoreFX Tests

@AndyAyersMS
Member Author

Not a fan of rerunning tests until they pass, but I am fairly sure the CoreFX failures seen above were unrelated.

@AndyAyersMS merged commit 8f5bf71 into dotnet:master on Feb 28, 2019
@AndyAyersMS deleted the Range branch on February 28, 2019 20:15
AndyAyersMS added a commit to AndyAyersMS/coreclr that referenced this pull request Mar 1, 2019
AndyAyersMS added a commit that referenced this pull request Mar 1, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Commit migrated from dotnet/coreclr@8f5bf71
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022