Adding EVEX encoding logic for RR/AM pathways #77419

DeepakRajendrakumaran · 2022-10-24T23:00:54Z

This PR enables EVEX encoding for the following paths:

emitOutputRR()
paths using emitOutputAM()

ghost · 2022-10-24T23:01:18Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a draft PR created for verifying some configs.

Author:	DeepakRajendrakumaran
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

DeepakRajendrakumaran · 2022-10-28T21:43:07Z

@tannergooding @BruceForstall @anthonycanino

These are the next set of changes for enabling EVEX encoding paths.

BruceForstall

I have some minor questions and suggestions

BruceForstall · 2022-10-29T00:38:15Z

src/coreclr/jit/instrsxarch.h

@@ -29,117 +29,117 @@
 #endif
 /*****************************************************************************/
 #ifndef INST0
-#define INST0(id, nm, um, mr,                 flags)
+#define INST0(id, nm, um, mr,                 tt, flags)


Add a comment for tt in the comment section at the top of the file

BruceForstall · 2022-10-29T00:43:24Z

src/coreclr/jit/instr.h

@@ -330,6 +339,31 @@ enum insBarrier : unsigned
 };
 #endif

+#if defined(TARGET_XARCH)
+// Represents tupletype attribute of instruction.
+// This is used in determining factor N while calculated compressed displacement in EVEX encoding


Suggested change

// This is used in determining factor N while calculated compressed displacement in EVEX encoding

// This is used in determining factor N while calculating compressed displacement in EVEX encoding

BruceForstall · 2022-10-29T00:44:27Z

src/coreclr/jit/instr.h

+// Represents tupletype attribute of instruction.
+// This is used in determining factor N while calculated compressed displacement in EVEX encoding
+// Reference: Section 2.6.5 in Intel 64 and ia-32 architectures software developer's manual volume 2.
+enum insTupleType : uint32_t


Is there a reason why the 0xF00 bits are not used?

Whoops. Yeah that's a mistake. Fixed
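
For readers unfamiliar with the "factor N" mentioned in the comment above: EVEX encodes an 8-bit displacement as disp8*N, so a displacement can use the short form only when it is a multiple of N and the quotient fits in a signed byte. The following is a minimal sketch of that check, not the code from this PR. The function name `tryCompressDisp8` is hypothetical, and it assumes N has already been derived from the instruction's tuple type and operand/vector size.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not taken from the JIT sources.
// Tries to express `disp` as a scaled 8-bit displacement (disp8 * N).
// `n` is the scaling factor N, assumed to be computed elsewhere from the
// instruction's tuple type and operand/vector size.
inline bool tryCompressDisp8(int32_t disp, int32_t n, int8_t* compressed)
{
    if (n <= 0)
    {
        return false; // no valid scaling factor
    }
    if ((disp % n) != 0)
    {
        return false; // not a multiple of N: must use the full disp32 form
    }

    const int32_t scaled = disp / n;
    if ((scaled < INT8_MIN) || (scaled > INT8_MAX))
    {
        return false; // quotient does not fit in a signed byte
    }

    *compressed = static_cast<int8_t>(scaled);
    return true;
}
```

With N = 64, for example, a displacement of 128 would compress to a disp8 of 2, while a displacement of 100 would not compress because it is not a multiple of 64.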

BruceForstall · 2022-10-29T00:55:30Z

src/coreclr/jit/instr.cpp

@@ -1789,7 +1789,7 @@ instruction CodeGen::ins_MathOp(genTreeOps oper, var_types type)
 }

 // Conversions to or from floating point values
-instruction CodeGen::ins_FloatConv(var_types to, var_types from)
+instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr)


Can you document attr? (Best would be to add a new-style function header comment that documents all the arguments and the function behavior)

Is this newly required for AVX-512? Does it change any existing behavior?

Done. It's required now because we split INS_cvtsi2ss into INS_cvtsi2ss32 and INS_cvtsi2ss64, and did the same for INS_cvtsi2sd. The split is needed because these instructions have different characteristics depending on whether the input is m32 or m64 (W bit set or not).

BruceForstall · 2022-10-29T00:56:30Z

src/coreclr/jit/instr.cpp

@@ -1867,7 +1887,7 @@ instruction CodeGen::ins_MathOp(genTreeOps oper, var_types type)
    }
 }

-instruction CodeGen::ins_FloatConv(var_types to, var_types from)
+instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr)


It might be best to ifdef this function declaration so you don't touch the ARM implementation which doesn't use this argument and for which it might be confusing.

BruceForstall · 2022-10-29T00:57:17Z

src/coreclr/jit/codegenxarch.cpp

@@ -7015,7 +7015,7 @@ void CodeGen::genIntToFloatCast(GenTree* treeNode)

    // Note that here we need to specify srcType that will determine
    // the size of source reg/mem operand and rex.w prefix.
-    instruction ins = ins_FloatConv(dstType, TYP_INT);
+    instruction ins = ins_FloatConv(dstType, TYP_INT, emitTypeSize(srcType));


The other cases use emitTypeSize(dstType) but this uses srcType?

Ah, it should be srcType. We are trying to differentiate between VCVTSI2SS xmm1, xmm2, m32 and VCVTSI2SS xmm1, xmm2, m64.
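
To illustrate why the source size matters here, the sketch below picks between 32-bit and 64-bit variants of an integer-to-float conversion based on the size of the integer source operand. It is a simplified stand-in, not the actual `ins_FloatConv` implementation: the enum `ConvIns` and the function `pickIntToFloatConv` are hypothetical names, and the real function takes `var_types` and `emitAttr` arguments instead.

```cpp
// Hypothetical stand-ins for INS_cvtsi2ss32/64 and INS_cvtsi2sd32/64.
enum class ConvIns
{
    cvtsi2ss32,
    cvtsi2ss64,
    cvtsi2sd32,
    cvtsi2sd64
};

// Selects the conversion variant from the *source* operand size in bytes.
// The destination type only decides between the ss (float) and sd (double)
// families; the 32/64 suffix follows the integer source (m32 vs. m64).
inline ConvIns pickIntToFloatConv(bool toDouble, unsigned srcSizeBytes)
{
    const bool src64 = (srcSizeBytes == 8);
    if (toDouble)
    {
        return src64 ? ConvIns::cvtsi2sd64 : ConvIns::cvtsi2sd32;
    }
    return src64 ? ConvIns::cvtsi2ss64 : ConvIns::cvtsi2ss32;
}
```

Passing the destination size instead would conflate the two axes: a conversion from a 32-bit integer to double would then be treated as having a 64-bit source.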

BruceForstall · 2022-10-29T01:01:27Z

src/coreclr/jit/emitxarch.cpp

+inline bool hasTupleTypeInfo(instruction ins)
+{
+    assert((unsigned)ins < ArrLen(insTupleTypeInfos));
+    return ((insTupleTypeInfos[ins] != INS_TT_NONE));


nit

Suggested change

return ((insTupleTypeInfos[ins] != INS_TT_NONE));

return (insTupleTypeInfos[ins] != INS_TT_NONE);

BruceForstall · 2022-10-29T01:02:42Z

src/coreclr/jit/emitxarch.cpp

+inline insTupleType insTupleTypeInfo(instruction ins)
+{
+    assert((unsigned)ins < ArrLen(insTupleTypeInfos));
+    assert((insTupleTypeInfos[ins] != INS_TT_NONE));


nit

Suggested change

assert((insTupleTypeInfos[ins] != INS_TT_NONE));

assert(insTupleTypeInfos[ins] != INS_TT_NONE);

BruceForstall · 2022-10-29T01:07:09Z

src/coreclr/jit/emitxarch.cpp

+    // Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doind so currently
+    // since we cannot differentiate EVEX vs VEX without 'code' untill all paths have EVEX support.


nits

Suggested change

// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doind so currently

// since we cannot differentiate EVEX vs VEX without 'code' untill all paths have EVEX support.

// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doing so currently

// since we cannot differentiate EVEX vs VEX without 'code' until all paths have EVEX support.

BruceForstall · 2022-10-29T01:12:58Z

src/coreclr/jit/emitxarch.h

+//    id - Instruction descriptor.
+//
+// Returns:
+//    `true` if the instruction does embeddded broadcast.


nit

Suggested change

// `true` if the instruction does embeddded broadcast.

// `true` if the instruction does embedded broadcast.

BruceForstall · 2022-11-05T00:13:56Z

No diffs, as expected. Throughput is slightly improved on x64 (about 0.09%) and slightly regressed on x86 (minimal on full opts, 0.18% on MinOpts).

BruceForstall · 2022-11-05T00:34:28Z

src/coreclr/jit/instr.h

+    INS_TT_NONE             = 0x00000,
+    INS_TT_FULL             = 0x00001,
+    INS_TT_HALF             = 0x00002,
+    INS_TT_IS_BROADCAST     = 0x00003,
+    INS_TT_FULL_MEM         = 0x00010,
+    INS_TT_TUPLE1_SCALAR    = 0x00020,
+    INS_TT_TUPLE1_FIXED     = 0x00040,
+    INS_TT_TUPLE2           = 0x00080,
+    INS_TT_TUPLE4           = 0x00100,
+    INS_TT_TUPLE8           = 0x00200,
+    INS_TT_HALF_MEM         = 0x00400,
+    INS_TT_QUARTER_MEM      = 0x00800,
+    INS_TT_EIGHTH_MEM       = 0x01000,
+    INS_TT_MEM128           = 0x02000,
+    INS_TT_MOVDDUP          = 0x04000,
+    INS_TT_IS_NON_BROADCAST = 0x7FFFC


A few minor nits to consider for later:

The high zero isn't needed, especially since there aren't any cases which set it. Remove it?

Thus, the "7" in INS_TT_IS_NON_BROADCAST is not needed

I was confused at first by INS_TT_IS_BROADCAST being 3 and not 4. Maybe rename INS_TT_IS_BROADCAST to INS_TT_BROADCAST_MASK and INS_TT_IS_NON_BROADCAST to INS_TT_NON_BROADCAST_MASK to make it clear they are bit masks? (I notice they currently aren't used)
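
A rough sketch of what the suggested renaming could look like, using only a subset of the values from the hunk above. This is an illustration of the review suggestion, not code from the PR: the `Sketch` enum, the shortened `TT_` names, and the helper `inBroadcastGroup` are all hypothetical, and the review notes that the masks are currently unused.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down version of insTupleType with the high zero
// dropped and the two mask entries renamed as suggested in the review.
enum TupleTypeSketch : uint32_t
{
    TT_NONE               = 0x0000,
    TT_FULL               = 0x0001,
    TT_HALF               = 0x0002,
    TT_BROADCAST_MASK     = 0x0003, // TT_FULL | TT_HALF, a mask rather than a third flag
    TT_FULL_MEM           = 0x0010,
    TT_TUPLE4             = 0x0100,
    TT_MOVDDUP            = 0x4000,
    TT_NON_BROADCAST_MASK = 0xFFFC  // every bit outside the broadcast mask
};

// Returns true if any bit of `tt` falls inside the broadcast mask.
inline bool inBroadcastGroup(uint32_t tt)
{
    return (tt & TT_BROADCAST_MASK) != 0;
}
```

Spelling the value 3 as a mask makes it visible that it is the union of the FULL and HALF bits, which is the point of confusion raised above.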

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 24, 2022

ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 24, 2022

DeepakRajendrakumaran force-pushed the avx512-RR branch 7 times, most recently from bf2025a to 7468f73 Compare October 28, 2022 21:36

DeepakRajendrakumaran marked this pull request as ready for review October 28, 2022 21:42

BruceForstall requested changes Oct 29, 2022

ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Oct 29, 2022

DeepakRajendrakumaran added 2 commits November 2, 2022 10:14

Cleaning up TODO-XARCH-CLEANUP on EA_8BYTE size.

b172f73

Adding EVEX encoding pathways for emitOutputRR().

d6e5fd2

DeepakRajendrakumaran force-pushed the avx512-RR branch from 7876962 to c24f908 Compare November 2, 2022 17:17

Adding EVEX encoding support for AM paths.

2570374

DeepakRajendrakumaran force-pushed the avx512-RR branch from c24f908 to 77aa41e Compare November 2, 2022 17:51

Moving 'JitStressEvexEncoding' under Debug flag and other review cleanup.

0a1d2b9

DeepakRajendrakumaran force-pushed the avx512-RR branch from 77aa41e to 0a1d2b9 Compare November 4, 2022 01:29

BruceForstall approved these changes Nov 5, 2022

BruceForstall merged commit 15f015f into dotnet:main Nov 5, 2022

ghost locked as resolved and limited conversation to collaborators Dec 5, 2022

BruceForstall added the avx512 Related to the AVX-512 architecture label Mar 27, 2023
