Simplifying the emitter handling of 4-byte encoded SSE instructions #21528

tannergooding · 2018-12-13T18:41:12Z

This is an incremental cleanup on the emitter around the 4-byte encoded SSE instruction handling.

Currently, we set UseSSE4=true whenever the compiler supports any ISA that requires such encoding. In the emitter, we then check this value, along with some other metadata, to determine whether we need to increase the estimated number of bytes emitted or whether we have an extra byte to actually emit.

We can simplify this logic greatly by removing UseSSE4 and relying only on EncodedBySSE38orSSE3A and UseVEXEncoding.
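As a rough illustration of the check this reduces to, here is a minimal, self-contained sketch. It is not the emitter code: `InsInfo`, `needsExtraOpcodeByte`, and `estimatedSize` are hypothetical names, and the boolean inputs stand in for the results of EncodedBySSE38orSSE3A(ins) and UseVEXEncoding().

```cpp
#include <cstddef>

// Hypothetical stand-in for what the emitter knows about one instruction.
struct InsInfo
{
    bool        encodedBySSE38orSSE3A; // stand-in for EncodedBySSE38orSSE3A(ins)
    std::size_t baseSize;              // size estimate before any extra opcode byte
};

// The extra byte is only needed for the legacy (non-VEX) encoding of an
// SSE38/SSE3A-encoded instruction; the VEX encoding does not require it.
bool needsExtraOpcodeByte(const InsInfo& info, bool useVexEncoding)
{
    return !useVexEncoding && info.encodedBySSE38orSSE3A;
}

// The same predicate can drive both the size estimate and the decision to
// actually emit the additional byte.
std::size_t estimatedSize(const InsInfo& info, bool useVexEncoding)
{
    return info.baseSize + (needsExtraOpcodeByte(info, useVexEncoding) ? 1 : 0);
}
```

With this shape, no separate UseSSE4 flag is consulted: the two inputs alone decide whether the estimate grows by one byte.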

tannergooding · 2018-12-13T18:41:20Z

CC. @CarolEidt, @fiigii

tannergooding · 2018-12-13T18:41:50Z

src/jit/emitxarch.cpp

@@ -25,19 +25,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 #include "emit.h"
 #include "codegen.h"

-bool IsSSE2Instruction(instruction ins)
+bool IsSSEInstruction(instruction ins)


SSE2 wasn't an accurate name, since this was also checking SSE and SSE3 instructions.

tannergooding · 2018-12-13T18:42:32Z

src/jit/emitxarch.cpp

-    return (ins >= INS_FIRST_SSE2_INSTRUCTION) && (ins <= INS_LAST_SSE2_INSTRUCTION);
-}
-
-bool IsSSE4Instruction(instruction ins)


We don't need this grouping anymore, since we just rely on EncodedBySSE38orSSE3A (and since some instructions in this grouping weren't 4-byte encoded).

tannergooding · 2018-12-13T18:43:08Z

src/jit/emitxarch.cpp

-//
-// Note that this should be true for any of the instructions in instrsXArch.h
-// that use the SSE38 or SSE3A macro.
-bool emitter::Is4ByteSSE4OrAVXInstruction(instruction ins)


This is just simplified to EncodedBySSE38orSSE3A.

tannergooding · 2018-12-13T18:44:57Z

src/jit/instrsxarch.h

@@ -177,7 +177,7 @@ INSTMUL(imul_15,        "imul",             IUM_RD, BAD_CODE,     0x4400003868,
 #define VEX3INT(c1,c2)   PACK4(c1, 0xc5, 0x02, c2)
 #define VEX3FLT(c1,c2)   PACK4(c1, 0xc5, 0x02, c2)

-INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
+INST3(FIRST_SSE_INSTRUCTION, "FIRST_SSE_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)


We can simplify this grouping to just SSE, which covers everything prior to AVX.
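For readers unfamiliar with the sentinel pattern, the grouping works roughly as sketched below. The enum is a hypothetical, heavily shortened list (the real one is generated from instrsxarch.h), and INS_LAST_SSE_INSTRUCTION is assumed to be the renamed counterpart of INS_LAST_SSE2_INSTRUCTION.

```cpp
#include <cassert>

// Hypothetical, shortened instruction list with FIRST/LAST sentinel entries.
enum instruction
{
    INS_mov,                   // general-purpose, outside the SSE group
    INS_FIRST_SSE_INSTRUCTION, // sentinel marking the start of the group
    INS_addps,                 // SSE
    INS_addpd,                 // SSE2
    INS_pshufb,                // SSSE3
    INS_LAST_SSE_INSTRUCTION,  // sentinel marking the end of the group
    INS_vbroadcastss,          // AVX, outside the SSE group
    INS_count
};

// A single range check covers every pre-AVX SIMD instruction, which is why
// "SSE" describes the group better than "SSE2" did.
bool IsSSEInstruction(instruction ins)
{
    assert(ins < INS_count);
    return (ins >= INS_FIRST_SSE_INSTRUCTION) && (ins <= INS_LAST_SSE_INSTRUCTION);
}
```

Because membership depends only on position in the list, the grouping stays correct as long as new pre-AVX instructions are added between the two sentinels.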

tannergooding · 2018-12-13T18:45:35Z

src/jit/instrsxarch.h

@@ -177,7 +177,7 @@ INSTMUL(imul_15,        "imul",             IUM_RD, BAD_CODE,     0x4400003868,
 #define VEX3INT(c1,c2)   PACK4(c1, 0xc5, 0x02, c2)
 #define VEX3FLT(c1,c2)   PACK4(c1, 0xc5, 0x02, c2)

-INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)


SSE2 wasn't accurate since we also had SSE and SSE3 instructions.

tannergooding · 2018-12-13T18:46:15Z

src/jit/instrsxarch.h


-INST3(FIRST_SSE4_INSTRUCTION, "FIRST_SSE4_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)


SSE4 wasn't accurate, since this covered anything that was SSE38 or SSE3A encoded (including some SSSE3 and other instructions). It also didn't only cover instructions that required the 4-byte encoding.

tannergooding · 2018-12-13T18:47:53Z

src/jit/emitxarch.cpp

@@ -1306,6 +1286,11 @@ bool emitter::EncodedBySSE38orSSE3A(instruction ins)

    size_t insCode = 0;

+    if (!IsSSEOrAVXInstruction(ins))


This covers the additional check that Is4ByteSSE4Instruction and Is4ByteSSE4OrAVXInstruction were doing, but as a single check (rather than multiple separate ones).
It also short-circuits the path for any instruction that can't be SSE38/SSE3A.
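A simplified sketch of the early-out follows. The table layout, `InsDesc`, and `encodedBySSE38orSSE3ASketch` are hypothetical; the real function works on the packed opcode value (insCode) rather than a separate escape-byte field.

```cpp
#include <cassert>
#include <cstdint>

enum instruction { INS_mov, INS_addps, INS_pshufb, INS_roundps, INS_count };

// Hypothetical per-instruction description.
struct InsDesc
{
    bool         isSSEOrAVX; // stand-in for IsSSEOrAVXInstruction(ins)
    std::uint8_t escapeByte; // byte after 0x0F in the opcode: 0x38, 0x3A, or 0 if neither
};

static const InsDesc insTable[INS_count] = {
    {false, 0x00}, // mov
    {true, 0x00},  // addps:   0F 58
    {true, 0x38},  // pshufb:  66 0F 38 00
    {true, 0x3A},  // roundps: 66 0F 3A 08
};

bool encodedBySSE38orSSE3ASketch(instruction ins)
{
    assert(ins < INS_count);
    const InsDesc& desc = insTable[ins];

    // Early out: anything that is not an SSE or AVX instruction cannot use
    // the 0F 38 / 0F 3A opcode maps, so the opcode is never inspected.
    if (!desc.isSSEOrAVX)
    {
        return false;
    }

    return (desc.escapeByte == 0x38) || (desc.escapeByte == 0x3A);
}
```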

fiigii

LGTM overall, just one naming suggestion.

fiigii · 2018-12-13T20:28:41Z

src/jit/emitxarch.cpp

-bool emitter::Is4ByteSSE4Instruction(instruction ins)
+// that use the SSE38 or SSE3A macro but returns false if the VEX encoding is
+// in use, since that encoding does not require an additional byte.
+bool emitter::Is4ByteSSEInstruction(instruction ins)


Now the code mixes EncodedBySSE38orSSE3A and Is4ByteSSEInstruction, which looks confusing. Perhaps we can rename this function to EncodedByLegacySSE38orSSE3A or something similar.

tannergooding · 2018-12-13 (edited)

I think it might be okay to keep it as is, and instead follow up on the code calling Is4ByteSSEInstruction so we can remove this function as well.

Currently, the majority of places calling Is4ByteSSEInstruction do so to determine whether they should update the sz estimate (7 of 11 references). This logic would be better centralized (as most of the other size calculations are).

Of the remaining 4 usages, 2 are assert(IsAVXInstruction(ins) || Is4ByteSSEInstruction(ins));, which can be simplified to assert(IsAVXInstruction(ins) || EncodedBySSE38orSSE3A(ins));. The other 2 determine whether the additional byte needs to be output, and could be simplified to !UseVEXEncoding() && EncodedBySSE38orSSE3A(ins) or centralized.

It would be a slightly more involved change, however, so I think it would be better as a follow-up fix.
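The assert simplification mentioned above can be checked exhaustively with a small sketch. It rests on two assumptions that are not shown in this diff: that IsAVXInstruction(ins) is equivalent to UseVEXEncoding() && IsSSEOrAVXInstruction(ins), and that EncodedBySSE38orSSE3A(ins) is only ever true for SSE or AVX instructions (which the new early-out guarantees). All function names below are hypothetical.

```cpp
// Old condition: IsAVXInstruction(ins) || Is4ByteSSEInstruction(ins)
bool oldAssertCondition(bool useVex, bool isSSEOrAVX, bool encoded)
{
    bool isAVXInstruction = useVex && isSSEOrAVX; // assumed definition
    bool is4ByteSSE       = !useVex && encoded;
    return isAVXInstruction || is4ByteSSE;
}

// Proposed condition: IsAVXInstruction(ins) || EncodedBySSE38orSSE3A(ins)
bool newAssertCondition(bool useVex, bool isSSEOrAVX, bool encoded)
{
    bool isAVXInstruction = useVex && isSSEOrAVX; // assumed definition
    return isAVXInstruction || encoded;
}

// Walks every reachable combination of the three inputs.
bool conditionsAgree()
{
    for (int v = 0; v < 2; v++)
    {
        for (int s = 0; s < 2; s++)
        {
            for (int e = 0; e < 2; e++)
            {
                if ((e != 0) && (s == 0))
                {
                    continue; // unreachable: encoded implies isSSEOrAVX
                }
                if (oldAssertCondition(v != 0, s != 0, e != 0) !=
                    newAssertCondition(v != 0, s != 0, e != 0))
                {
                    return false;
                }
            }
        }
    }
    return true;
}
```

Under those assumptions the two conditions agree on every reachable input. They differ only in the unreachable case where an instruction is SSE38/SSE3A encoded but not an SSE or AVX instruction.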

fiigii · 2018-12-13T20:32:02Z

Please also run noavx and sse2only CI tests.

tannergooding · 2018-12-13T21:09:27Z

test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic
test Windows_NT x64 Checked jitsse2only

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic
test Windows_NT x86 Checked jitsse2only

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic
test Ubuntu x64 Checked jitsse2only

tannergooding · 2018-12-14T21:01:08Z

CC. @CarolEidt. This should be ready for review.
As per the title, this is just an iterative improvement trying to simplify some of the emitter logic around the SSE types.

CarolEidt

LGTM - I'm kind of surprised that it's possible to make incremental improvements like this!


Simplifying the emitter handling of 4-byte encoded SSE instructions

4b978fe


fiigii approved these changes Dec 13, 2018


CarolEidt approved these changes Dec 14, 2018


tannergooding merged commit 813bd6e into dotnet:master Dec 14, 2018

mikedn mentioned this pull request Mar 8, 2019

Handle addressing modes for HW intrinsics #22944

Merged

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Simplifying the emitter handling of 4-byte encoded SSE instructions (d…

51d747c

…otnet/coreclr#21528) Commit migrated from dotnet/coreclr@813bd6e
