Improve struct inits. #52292

sandreenko · 2021-05-05T08:34:52Z

There is 1 change:

Keep ASG struct(LCL_VAR, 0) as STORE_LCL_VAR struct(0).

Small positive diffs (spmi x64):

benchmarks.run.windows.x64.checked.mch
Total bytes of delta: -86 (-0.07% of base)
88 total files with Code Size differences (87 improved, 1 regressed), 13 unchanged.

libraries.crossgen.windows.x64.checked.mch
Total bytes of delta: -152 (-0.16% of base)
142 total files with Code Size differences (141 improved, 1 regressed), 26 unchanged.

libraries.crossgen2.windows.x64.checked.mch
Total bytes of delta: -182 (-0.03% of base)
196 total files with Code Size differences (177 improved, 19 regressed), 864 unchanged.

libraries.pmi.windows.x64.checked.mch
Total bytes of delta: -576 (-0.08% of base)
573 total files with Code Size differences (565 improved, 8 regressed), 20 unchanged.

tests.pmi.windows.x64.checked.mch
Total bytes of delta: -11384 (-0.34% of base)
1951 total files with Code Size differences (1926 improved, 25 regressed), 255 unchanged.

A typical improvement looks like:

before:
N001 (  1,  1) [000020] ------------        t20 =    CNS_INT   int    0 $41
               [000201] Dc-----N----       t201 =    LCL_VAR_ADDR byref  V02 tmp2         
                                                  *    ubyte  V02.m_value (offs=0x00) -> V13 tmp13        
                                                  /--*  t201   byref  
                                                  +--*  t20    int    
N003 (  5,  4) [000021] sA----------              *  STORE_BLK struct<System.Data.SqlTypes.SqlBoolean, 1> (init) (Unroll)

codegen:
IN001a: 000070 xor      ecx, ecx
IN001b: 000072 mov      byte  ptr [rsp+30H], cl

after:
N001 (  1,  1) [000020] -c----------        t20 =    CNS_INT   int    0 $41
                                                  /--*  t20    int    
N003 (  5,  4) [000021] DA--G-------              *  STORE_LCL_VAR struct<System.Data.SqlTypes.SqlBoolean, 1>(AX)(P) V02 tmp2 

codegen:
IN0017: 00006D mov      byte  ptr [rsp+30H], 0

sandreenko

PTAL @echesakovMSFT @tannergooding , cc @dotnet/jit-contrib

sandreenko · 2021-05-05T16:07:06Z

src/coreclr/jit/vartype.h

+
+// Check if type1 matches any type from the list.
+template <typename... T>
+bool TypeIsInList(var_types type1, var_types type2, T... rest)


@AndyAyersMS was right in #43386 (comment).
I am looking for a better name for this method; I could not keep TypeIs because it would conflict with the member of the same name inside GenTree methods.

Prior art here is GenTree::StaticOperIs, I personally do not like that name.

I would be voting for V/varTypeIs (both cases are in use today...).
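For readers unfamiliar with the pattern, here is a minimal sketch of how a variadic helper with the signature quoted above could be written. The `var_types` enum below is a hypothetical stand-in with only a few members, and the base-case overload is an assumption; the body in the PR may differ.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's var_types enum; only a few members are modeled.
enum var_types
{
    TYP_UNDEF,
    TYP_BOOL,
    TYP_INT,
    TYP_LONG,
    TYP_FLOAT,
    TYP_DOUBLE
};

// Base case (assumed): a single candidate type.
inline bool TypeIsInList(var_types type1, var_types type2)
{
    return type1 == type2;
}

// Recursive case: compare against the first candidate, then against the rest.
template <typename... T>
bool TypeIsInList(var_types type1, var_types type2, T... rest)
{
    return (type1 == type2) || TypeIsInList(type1, rest...);
}

// Usage example for the sketch.
inline void TypeIsInListExample()
{
    assert(TypeIsInList(TYP_FLOAT, TYP_INT, TYP_FLOAT));
    assert(!TypeIsInList(TYP_BOOL, TYP_INT, TYP_FLOAT));
}
```

The recursion peels one candidate per step, so the call site stays a flat list of types regardless of how many are checked.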

sandreenko · 2021-05-05T16:10:20Z

src/coreclr/jit/rationalize.cpp

-                var_types   simdType        = location->TypeGet();
-                GenTree*    initVal         = assignment->AsOp()->gtOp2;
-                CorInfoType simdBaseJitType = comp->getBaseJitTypeOfSIMDLocal(location);
-                if (simdBaseJitType != CORINFO_TYPE_UNDEF)


We have never had simdBaseJitType == CORINFO_TYPE_UNDEF on x64; otherwise we would hit an assert later on:

codegenxarch.cpp:4439: assert(varTypeUsesFloatReg(targetType) == varTypeUsesFloatReg(op1Type));

from a tree like:

STORE_LCL_VAR simd16 V01
\--* CNS_INT 0

This is possible, but we handle this in register allocation by lying about the type:
https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lsra.cpp#L6995-L7000

    if (simdNode->GetSimdBaseJitType() == CORINFO_TYPE_UNDEF)
    {
        // There are a few scenarios where we can get a LCL_VAR which
        // doesn't know the underlying baseType. In that scenario, we
        // will just lie and say it is a float. Codegen doesn't actually
        // care what the type is but this avoids an assert that would
        // otherwise be fired from the more general checks that happen.
        simdNode->SetSimdBaseJitType(CORINFO_TYPE_FLOAT);
    }

Namely there are a variety of scenarios where the base type might not get copied around for things like SIMD locals and where we lose the class handle and type information.

Ideally, we would preserve this 100% of the time, since it's needed by things like CSE, but we don't today.

The potential issue was:
ASG SIMD(LCL_VAR, 0) with simdBaseJitType == CORINFO_TYPE_UNDEF reaches rationalize, which does not create STORE_LCL_VAR(SIMD(0)) and leaves STORE_LCL_VAR(0).
The code in lsra does not help: it runs after rationalize, so it won't create a new node.
codegenxarch then fails.

These lies could also be deleted now (I will push a change soon). It may require adding if ((simdNode->gtSIMDIntrinsicID != SIMDIntrinsicUpperSave) && (simdNode->gtSIMDIntrinsicID != SIMDIntrinsicUpperRestore)) to simdcodegenxarch.cpp:genSIMDIntrinsic, as we have this condition for arm64.

ASG SIMD(LCL_VAR , 0)

Is an interesting tree. In my testing (wrote a checker that looks for all ASG SIMD trees with mismatched types of LHS and RHS and ran it through SPMI), it only comes up when you initobj/initblk a SIMD local, and even then morph transforms it into ASG SIMD(LCL_VAR, SIMD(0)). I am curious to know if you know of other sources because my (WIP) fix for #51500 is to never import such ASGs in the first place.

Oh, this code exists here to catch what morph did not for some reason, for example:

repro-7984.zip
unzip it and run like

"D:\Sergey\git\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe" D:\Sergey\git\runtime\artifacts\obj\coreclr\windows.x64.Checked\ide\jit\Debug\clrjit.dll(or any jit that you want to use) "repro-7984.mc" -jitoption force AltJit= -jitoption force AltJitNgen= -jitoption force JitDump=* -jitoption force NgenDump=*

to see the dump; it will show you that in this case the issue is in fgMorphBlock that does not call morphTree for the trees that it creates. @kunalspathak recently pointed me to another issue caused by the same source.
I have decided not to hack fgMorphBlock anymore to squeeze positive diffs (and support struct enreg), so I am rewriting it now. However, it will take time, and for this PR I want to keep this "catch morph failures" block in place.

fgMorphInitBlock: using field by field initialization.
GenTreeNode creates assertion:
[000079] IA----------              *  ASG       simd12 (init)
In BB01 New Local Constant Assertion: V06 == 0 index=#01, mask=0000000000000001
GenTreeNode creates assertion:
[000082] -A----------              *  ASG       float
In BB01 New Local Constant Assertion: V07 == 0.000000 index=#02, mask=0000000000000002
fgMorphInitBlock (after):
[000083] -A---+------              *  COMMA     void
[000079] IA----------              +--*  ASG       simd12 (init) <- forgot to call fgMorphTree :-(
[000077] D------N----              |  +--*  LCL_VAR   simd12<System.Numerics.Vector3> V06 tmp4
[000078] ------------              |  \--*  CNS_INT   simd12 0
[000082] -A----------              \--*  ASG       float
[000080] D------N----                 +--*  LCL_VAR   float  V07 tmp5
[000081] ------------                 \--*  CNS_DBL   float  0.00000000000000000

I have decided not to hack fgMorphBlock anymore to squeeze positive diffs (and support struct enreg), so I am rewriting it now. However, it will take time.

I believe you 😄. Good luck!

it will show you that in this case the issue is in fgMorphBlock that does not call morphTree for the trees that it creates

Thanks!

sandreenko · 2021-05-05T16:12:31Z

src/coreclr/jit/lower.cpp

        }
-        else if (!src->OperIs(GT_LCL_VAR) || (varDsc->GetLayout()->GetRegisterType() == TYP_UNDEF))
+        else if (!varDsc->IsEnregisterable())


The main goal of this change is to keep ASG(LCL_VAR, 0) as STORE_LCL_VAR(0) without taking the address of the local.

tannergooding · 2021-05-05T19:26:01Z

src/coreclr/jit/codegenarm64.cpp

-        // these implemented over lclVars created by CSE without full handle information (and
-        // therefore potentially without a base type).
-        if ((simdNode->gtSIMDIntrinsicID != SIMDIntrinsicUpperSave) &&
+    if (!TypeIsInList(simdNode->GetSimdBaseType(), TYP_INT, TYP_LONG, TYP_FLOAT, TYP_DOUBLE, TYP_USHORT, TYP_UBYTE,


This check is probably better served by one that does a value >= TYP_BYTE && value <= TYP_ULONG (I think we have an existing helper for this already), since these values are sequentially ordered in the enum.

I believe the relevant helper is varTypeIsArithmetic, although it also allows TYP_BOOL.

varTypeIsArithmetic is the one that the rest of the SIMD checks use.
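The range-check idea suggested above can be sketched as follows. The enum here is a hypothetical subset whose ordering is an assumption of the sketch (the real `var_types` enum has more members), and `IsInRange`, `IsIntegralBaseType`, and `IsArithmeticLike` are illustrative names, not JIT helpers.

```cpp
#include <cassert>

// Hypothetical subset of var_types. The sequential ordering is an assumption
// of this sketch; the range checks below are only valid if it holds.
enum var_types
{
    TYP_UNDEF,
    TYP_VOID,
    TYP_BOOL,
    TYP_BYTE,
    TYP_UBYTE,
    TYP_SHORT,
    TYP_USHORT,
    TYP_INT,
    TYP_UINT,
    TYP_LONG,
    TYP_ULONG,
    TYP_FLOAT,
    TYP_DOUBLE,
    TYP_REF
};

// Two comparisons replace a long list of equality tests.
inline bool IsInRange(var_types value, var_types first, var_types last)
{
    assert(first <= last);
    return (value >= first) && (value <= last);
}

// Illustrative helper: the TYP_BYTE..TYP_ULONG range mentioned in the review.
inline bool IsIntegralBaseType(var_types value)
{
    return IsInRange(value, TYP_BYTE, TYP_ULONG);
}

// Illustrative helper: a wider range that, like the check discussed above,
// also admits TYP_BOOL.
inline bool IsArithmeticLike(var_types value)
{
    return IsInRange(value, TYP_BOOL, TYP_DOUBLE);
}
```

The trade-off is that a range check silently changes meaning if the enum is reordered, whereas an explicit list does not.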

Do we have tests for TYP_BOOL SIMD intrinsics? Why don't they fail on this check today?

Because we filter it out by resolving the actual handle for all generic types during import: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/simd.cpp#L128

If the CorInfoType isn't recognized here, then we return CORINFO_TYPE_UNDEF and that is ultimately filtered out and treated as "unsupported":

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/simd.cpp#L1922-L1925

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/simdashwintrinsic.cpp#L207-L213

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L662-L729

etc

And yes, we have tests (such as https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/General/NotSupported) that cover types like bool and nint/nuint (the latter two of which are only supported on Vector<T> and not yet on Vector64/128/256<T>).

tannergooding · 2021-05-05T19:31:10Z

src/coreclr/jit/codegenarm64.cpp

+        }
+
+        const GenTree* op1 = simdNode->gtGetOp1();
+        if ((simdNode->gtSIMDIntrinsicID == SIMDIntrinsicInit) && (op1->IsIntegralConst(0) || op1->IsFPZero()))


We have GenTree::IsSIMDZero that should be preferred here, since it will be updated to include the get_Zero HWIntrinsics in the future: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gentree.h#L7061

tannergooding · 2021-05-05T19:36:46Z

src/coreclr/jit/importer.cpp

@@ -1042,30 +1030,30 @@ bool Compiler::impCheckImplicitArgumentCoercion(var_types sigType, var_types nod
        return true;
    }

-    if (TypeIs(sigType, TYP_BOOL, TYP_UBYTE, TYP_BYTE, TYP_USHORT, TYP_SHORT, TYP_UINT, TYP_INT))


TypeIs matches the names we have elsewhere, like OperIs. I don't think it's worth changing without considering a standard approach for the entire JIT.

The problem is that TypeIs is declared for GenTree as a member function. We can't declare methods inside an enum (even if it is an enum class), so the helper has to be global. If we add a global TypeIs, then every call of the form GenTree::method() { TypeIs() } goes through C++ name lookup and overload resolution, which has to choose between GenTree::TypeIs() and the global ::TypeIs(), and that could lead to errors.
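A minimal sketch of the hazard, using hypothetical stand-ins (`Node` for GenTree, a three-member `var_types`): inside a member function, an unqualified `TypeIs(...)` call finds the member first, so the free function is never considered and the call quietly tests the node's own type instead of the value passed in.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's var_types enum.
enum var_types
{
    TYP_UNDEF,
    TYP_INT,
    TYP_FLOAT
};

// Hypothetical free function: "is `value` one of the listed types?"
inline bool TypeIs(var_types value, var_types type)
{
    return value == type;
}

template <typename... T>
bool TypeIs(var_types value, var_types type, T... rest)
{
    return (value == type) || TypeIs(value, rest...);
}

// Hypothetical stand-in for GenTree, with a member of the same name.
struct Node
{
    var_types gtType;

    // Member: "is this node's type the listed type?"
    bool TypeIs(var_types type) const
    {
        return gtType == type;
    }

    // Member: "is this node's type one of the listed types?"
    template <typename... T>
    bool TypeIs(var_types type, T... rest) const
    {
        return TypeIs(type) || TypeIs(rest...);
    }

    // Intended to ask "is sigType TYP_INT?", but the unqualified name
    // resolves to the member, so it asks "is gtType sigType or TYP_INT?".
    bool UnqualifiedCall(var_types sigType) const
    {
        return TypeIs(sigType, TYP_INT);
    }

    // Explicit qualification selects the free function.
    bool QualifiedCall(var_types sigType) const
    {
        return ::TypeIs(sigType, TYP_INT);
    }
};

// Usage example for the sketch.
inline void NameHidingExample()
{
    Node node{TYP_INT};
    assert(node.UnqualifiedCall(TYP_FLOAT)); // member picked: compiles, wrong answer
    assert(!node.QualifiedCall(TYP_FLOAT));  // free function picked: intended answer
}
```

Both calls compile without diagnostics, which is why a distinct name such as TypeIsInList avoids the problem.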

tannergooding · 2021-05-05T19:46:01Z

src/coreclr/jit/simdcodegenxarch.cpp

@@ -446,7 +446,7 @@ void CodeGen::genSIMDScalarMove(
                }
                else
                {
-                    genSIMDZero(targetType, TYP_FLOAT, targetReg);


I'm slightly worried about how this might impact CSE, particularly since (AFAIR) it relies on having the handle available (which requires the baseType to resolve)

These changes are (and I think the intent for them is to stay that way) all contained to codegen, so this shouldn't be a problem?

It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).

I guess a bit of it will be moot when I eventually port this over to the SimdAsHWIntrinsic path...

It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).

My understanding is that some intrinsic generate different code depending on their base type, for example, init(SIMD16, 1) will need to know the base type to know how many elements are in the SIMD16.
The other intrinsics in some cases do not depend on base type, like SIMDIntrinsicInit(0), so we should not include baseType in CSE/VN considerations.

It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).

I don't see what is lost here, the generated code does not depend on the base type.

There are multiple instructions for zeroing and while they are treated the same on most modern CPUs, not all CPUs do.

And so, it may be that on certain hardware we want to generate pxor or xorpd instead of xorps. Likewise, if we ever add AVX-512 or SVE support, we may want different logic here for various masking or other operations that can occur.
Carrying the type through to codegen is therefore goodness IMO, even if today we happen to ignore it.

For both TYP_SIMD16 and TYP_SIMD32 there are the following "simd zeroing" instructions:

xorps - float

xorpd - double

pxor - integer
Where xmm is used for TYP_SIMD16 and ymm is used for TYP_SIMD32
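As a rough illustration of what "carrying the type through to codegen" could allow, the sketch below maps a base type to one of the three zeroing mnemonics listed above. `SelectZeroingInstruction` and the small enum are hypothetical, and treating an unknown base type as float mirrors the "lie" described earlier in the thread; it is not the JIT's actual genSIMDZero logic.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the JIT's var_types enum.
enum var_types
{
    TYP_UNDEF,
    TYP_INT,
    TYP_LONG,
    TYP_FLOAT,
    TYP_DOUBLE
};

// Pick a zeroing mnemonic that matches the data domain of the base type.
inline std::string SelectZeroingInstruction(var_types baseType)
{
    switch (baseType)
    {
        case TYP_DOUBLE:
            return "xorpd";
        case TYP_INT:
        case TYP_LONG:
            return "pxor";
        case TYP_FLOAT:
            return "xorps";
        case TYP_UNDEF:
        default:
            // Assumption: with no known base type, fall back to the float form.
            return "xorps";
    }
}

// Usage example for the sketch.
inline void SelectZeroingInstructionExample()
{
    assert(SelectZeroingInstruction(TYP_INT) == "pxor");
    assert(SelectZeroingInstruction(TYP_UNDEF) == "xorps");
}
```

If the base type were always dropped before codegen, a selection like this would have nothing to switch on.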

On modern Intel/AMD (the architecture manuals list Silvermont/Ryzen and newer), all three of these are dependency breaking/zeroing idioms and should actually be replaced in the renamer and so should be equivalent/interchangeable.

Noting that Silvermont is listed under the Intel Atom section, I didn't see anything listed under the non-Atom sections

However, on some hardware (particularly "older" hardware, such as Intel Core microarchitecture), the recommendation is to not mix floating-point and integer code:

3.5.1.10 Mixing SIMD Data Type
Previous microarchitectures (before Intel Core microarchitecture) do not have explicit restrictions on mixing integer and floating-point (FP) operations on XMM registers. For Intel Core microarchitecture, mixing integer and floating-point operations on the content of an XMM register can degrade performance. Software should avoid mixed-use of integer/FP operation on XMM registers. Specifically:
• Use SIMD integer operations to feed SIMD integer operations. Use PXOR for idiom.
• Use SIMD floating-point operations to feed SIMD floating-point operations. Use XORPS for idiom.
• When floating-point operations are bitwise equivalent, use PS data type instead of PD data type.
MOVAPS and MOVAPD do the same thing, but MOVAPS takes one less byte to encode the instruction.

Likewise, once you get to AVX-512, there are additional instructions such as vpxord and vpxorq, which get used instead because the element size is impactful for masked instructions.

I see, thanks for the explanation, so would you prefer to lie about the type when it is not available to better support a future possibility of an arch where void genSIMDZero(var_types targetType, var_types baseType, regNumber targetReg); will use base_type?

I'd prefer we always pass the "correct" type for the operation here and to finish plumbing through things like GenTreeJitIntrinsic->m_classLayout, but short of that the existing behavior of lying works well enough today so we can still generate something.

I guess I did not clearly describe the scenario that I want to support:

struct RandomStructWith16ByteSize { int a; float b; int c; float d; }

and we do:

RandomStructWith16ByteSize a = new RandomStructWith16ByteSize(); // zero init

if a can be put into an xmm register I want it to be there and generate a zero-init when there is no existing "correct type".
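A C++ analogue of this scenario, as a sketch: the struct mixes int and float fields, yet its zero-initialized value is just 16 zero bytes, so no single "correct" SIMD base type exists and none is needed to describe the bits. `MakeZeroed` and `IsAllZeroBits` are illustrative helpers, and the sketch assumes 4-byte int and float with an all-zero-bits representation of 0.0f.

```cpp
#include <cassert>
#include <cstring>

// C++ analogue of the struct from the comment above.
struct RandomStructWith16ByteSize
{
    int   a;
    float b;
    int   c;
    float d;
};

// Assumption of this sketch: 4-byte int and float, no padding.
static_assert(sizeof(RandomStructWith16ByteSize) == 16, "sketch assumes a 16-byte layout");

// Value-initialization zeroes every field, regardless of its type.
inline RandomStructWith16ByteSize MakeZeroed()
{
    return RandomStructWith16ByteSize{};
}

// Compare the object representation against 16 zero bytes.
inline bool IsAllZeroBits(const RandomStructWith16ByteSize& s)
{
    unsigned char zeros[sizeof(RandomStructWith16ByteSize)] = {};
    return std::memcmp(&s, zeros, sizeof(RandomStructWith16ByteSize)) == 0;
}

// Usage example for the sketch.
inline void ZeroInitExample()
{
    assert(IsAllZeroBits(MakeZeroed()));
}
```

Any 16-byte zeroing idiom produces this bit pattern, which is the case the comment wants to keep in an xmm register.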

Maybe we should just explicitly support TYP_UNDEF and we can handle it as appropriate (which is currently treating it as TYP_FLOAT) in codegen ?

For context, I've been slowly doing work to get rid of the older SIMD support (like SIMDIntrinsicInit) and replacing it with the newer HWIntrinsic support. When that happens, genSIMDZero will be replaced with NI_Vector64_get_Zero, NI_Vector128_get_Zero, and NI_Vector256_get_Zero so everything can be centrally handled. -- #35421 and #37882 are the two past PRs and #52288 is the latest

At that point, the codegen will generate different instructions based on the type by default and we'll need to handle TYP_UNDEF somehow anyways (different HWIntrinsics have different contracts for what instruction they'll generate, since they are mostly a 1-to-1 mapping with the hardware).

sandreenko · 2021-05-05T20:48:28Z

The regressions are covered by #52286

sandreenko · 2021-05-06T01:03:13Z

The PR was updated based on the previous discussion.

sandreenko · 2021-05-07T16:12:30Z

@dotnet/jit-contrib Could somebody please take a look?

tannergooding · 2021-05-07T16:20:43Z

src/coreclr/jit/lower.cpp

+                    }
+                    GenTreeSIMD* simdTree =
+                        comp->gtNewSIMDNode(regType, src, SIMDIntrinsicInit, simdBaseJitType, varDsc->lvExactSize);
+                    BlockRange().InsertAfter(src, simdTree);


Shouldn't we lower the newly created simdTree so containment and other transforms can happen as appropriate?

Nice catch, thank you! It fixes a todo that I added because now we have SIMD16 instead of SIMD12.

sandreenko · 2021-05-07T18:18:55Z

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

azure-pipelines · 2021-05-07T18:19:26Z

Azure Pipelines successfully started running 2 pipeline(s).

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 5, 2021

sandreenko marked this pull request as ready for review May 5, 2021 16:13

tannergooding reviewed May 5, 2021

tannergooding reviewed May 5, 2021

tannergooding reviewed May 5, 2021

Support STORE_LCL_VAR(0) after lowering.

bb7356e

sandreenko force-pushed the improveSIMDInit branch from 02bacd1 to bb7356e Compare May 6, 2021 01:02

tannergooding reviewed May 7, 2021

fix a nice catch from Tanner

e8e2a8b

tannergooding approved these changes May 7, 2021

BruceForstall approved these changes May 7, 2021

sandreenko merged commit d0b9864 into dotnet:main May 8, 2021

sandreenko deleted the improveSIMDInit branch May 8, 2021 00:34

sandreenko mentioned this pull request May 10, 2021

Keep structs in registers #43867

Closed

10 tasks

karelz added this to the 6.0.0 milestone May 20, 2021

JulieLeeMSFT mentioned this pull request Jun 8, 2021

What's new in .NET 6 Preview 5 dotnet/core#6099

Closed

ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
