-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Simplifying the emitter handling of 4-byte encoded SSE instructions #21528
Conversation
CC. @CarolEidt, @fiigii |
@@ -25,19 +25,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | |||
#include "emit.h" | |||
#include "codegen.h" | |||
|
|||
bool IsSSE2Instruction(instruction ins) | |||
bool IsSSEInstruction(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SSE2
wasn't an accurate name, since this was also checking SSE
and SSE3
instructions
return (ins >= INS_FIRST_SSE2_INSTRUCTION) && (ins <= INS_LAST_SSE2_INSTRUCTION); | ||
} | ||
|
||
bool IsSSE4Instruction(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this grouping anymore, since we just rely on EncodedBySSE38OrSSE3A
(and since we had some instructions in this grouping that weren't 4-byte encoded).
// | ||
// Note that this should be true for any of the instructions in instrsXArch.h | ||
// that use the SSE38 or SSE3A macro. | ||
bool emitter::Is4ByteSSE4OrAVXInstruction(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just simplified to EncodedBySSE38OrSSE3A
@@ -177,7 +177,7 @@ INSTMUL(imul_15, "imul", IUM_RD, BAD_CODE, 0x4400003868, | |||
#define VEX3INT(c1,c2) PACK4(c1, 0xc5, 0x02, c2) | |||
#define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2) | |||
|
|||
INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) | |||
INST3(FIRST_SSE_INSTRUCTION, "FIRST_SSE_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can simplify this grouping to just SSE
, which covers everything prior to AVX
@@ -177,7 +177,7 @@ INSTMUL(imul_15, "imul", IUM_RD, BAD_CODE, 0x4400003868, | |||
#define VEX3INT(c1,c2) PACK4(c1, 0xc5, 0x02, c2) | |||
#define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2) | |||
|
|||
INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SSE2
wasn't accurate since we also had SSE
and SSE3
instructions.
|
||
INST3(FIRST_SSE4_INSTRUCTION, "FIRST_SSE4_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SSE4
wasn't accurate, since this covered anything that was SSE38
or SSE3A
encoded (including some SSSE3
and other instructions). It also didn't only cover instructions that required the 4-byte encoding.
@@ -1306,6 +1286,11 @@ bool emitter::EncodedBySSE38orSSE3A(instruction ins) | |||
|
|||
size_t insCode = 0; | |||
|
|||
if (!IsSSEOrAVXInstruction(ins)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This covers the additional check that Is4ByteSSE4Instruction
and Is4ByteSSE4OrAVXInstruction
were doing, but as a single check (rather than multiple separate ones).
It also short-circuits the path for any instruction that can't be SSE38/SSE3A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just one naming suggestion.
bool emitter::Is4ByteSSE4Instruction(instruction ins) | ||
// that use the SSE38 or SSE3A macro but returns false if the VEX encoding is | ||
// in use, since that encoding does not require an additional byte. | ||
bool emitter::Is4ByteSSEInstruction(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, the code mixes using EncodedBySSE38orSSE3A
and Is4ByteSSEInstruction
that looks confusing. Perhaps, we can rename this function to EncodedByLegacySSE38orSSE3A
or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be okay to keep it as is, and instead follow up with the code calling Is4ByteSSEInstruction
so we can remove this as well.
Currently, the majority of places calling Is4ByteSSEInstruction
are doing so to determine if they should update the sz
estimate (7/11 references). This logic would be better suited to be centralized (as most of the other size calculations are).
Of the remaining 4 usages, 2 are assert(IsAVXInstruction(ins) || Is4ByteSSEInstruction(ins));
, which can be simplified to assert(IsAVXInstruction(ins) || EncodedBySSE38OrSSE3A(ins));
and the remaining two are for determining if the additional byte needs to be output (and could just be simplified to !UseVEXEncoding() && EncodedBySSE38orSSE3A(ins)
or centralized).
It would be a slightly more involved change, however; so I think it would be better as a followup fix.
Please also run noavx and sse2only CI tests. |
test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic |
CC. @CarolEidt. This should be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I'm kind of surprised that it's possible to make incremental improvements like this!
This is an incremental cleanup on the emitter around the 4-byte encoded SSE instruction handling.
Currently, we set
UseSSE4=true
whenever the compiler supports any ISA that requires such encoding. In the emitter, we then check this value along with some other metadata/values to determine if we need to increase the estimated bytes emitted or if we have an extra byte to actually emit.We can simplify this logic greatly by just getting rid of
UseSSE4
and relying only onEncodedBySSE38OrSSE3A
andUseVexEncoding
.