Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Simplifying the emitter handling of 4-byte encoded SSE instructions #21528

Merged
merged 1 commit into from
Dec 14, 2018
Merged

Simplifying the emitter handling of 4-byte encoded SSE instructions #21528

merged 1 commit into from
Dec 14, 2018

Conversation

tannergooding
Copy link
Member

This is an incremental cleanup on the emitter around the 4-byte encoded SSE instruction handling.

Currently, we set UseSSE4=true whenever the compiler supports any ISA that requires such encoding. In the emitter, we then check this value along with some other metadata/values to determine if we need to increase the estimated bytes emitted or if we have an extra byte to actually emit.

We can simplify this logic greatly by just getting rid of UseSSE4 and relying only on EncodedBySSE38OrSSE3A and UseVexEncoding.

@tannergooding
Copy link
Member Author

CC. @CarolEidt, @fiigii

@@ -25,19 +25,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#include "emit.h"
#include "codegen.h"

bool IsSSE2Instruction(instruction ins)
bool IsSSEInstruction(instruction ins)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSE2 wasn't an accurate name, since this was also checking SSE and SSE3 instructions

return (ins >= INS_FIRST_SSE2_INSTRUCTION) && (ins <= INS_LAST_SSE2_INSTRUCTION);
}

bool IsSSE4Instruction(instruction ins)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this grouping anymore, since we just rely on EncodedBySSE38OrSSE3A (and since we had some instructions in this grouping that weren't 4-byte encoded).

//
// Note that this should be true for any of the instructions in instrsXArch.h
// that use the SSE38 or SSE3A macro.
bool emitter::Is4ByteSSE4OrAVXInstruction(instruction ins)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just simplified to EncodedBySSE38OrSSE3A

@@ -177,7 +177,7 @@ INSTMUL(imul_15, "imul", IUM_RD, BAD_CODE, 0x4400003868,
#define VEX3INT(c1,c2) PACK4(c1, 0xc5, 0x02, c2)
#define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2)

INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
INST3(FIRST_SSE_INSTRUCTION, "FIRST_SSE_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify this grouping to just SSE, which covers everything prior to AVX

@@ -177,7 +177,7 @@ INSTMUL(imul_15, "imul", IUM_RD, BAD_CODE, 0x4400003868,
#define VEX3INT(c1,c2) PACK4(c1, 0xc5, 0x02, c2)
#define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2)

INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSE2 wasn't accurate since we also had SSE and SSE3 instructions.


INST3(FIRST_SSE4_INSTRUCTION, "FIRST_SSE4_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSE4 wasn't accurate, since this covered anything that was SSE38 or SSE3A encoded (including some SSSE3 and other instructions). It also didn't only cover instructions that required the 4-byte encoding.

@@ -1306,6 +1286,11 @@ bool emitter::EncodedBySSE38orSSE3A(instruction ins)

size_t insCode = 0;

if (!IsSSEOrAVXInstruction(ins))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This covers the additional check that Is4ByteSSE4Instruction and Is4ByteSSE4OrAVXInstruction were doing, but as a single check (rather than multiple separate ones).
It also short-circuits the path for any instruction that can't be SSE38/SSE3A.

Copy link

@fiigii fiigii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just one naming suggestion.

bool emitter::Is4ByteSSE4Instruction(instruction ins)
// that use the SSE38 or SSE3A macro but returns false if the VEX encoding is
// in use, since that encoding does not require an additional byte.
bool emitter::Is4ByteSSEInstruction(instruction ins)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, the code mixes using EncodedBySSE38orSSE3A and Is4ByteSSEInstruction that looks confusing. Perhaps, we can rename this function to EncodedByLegacySSE38orSSE3A or something similar.

Copy link
Member Author

@tannergooding tannergooding Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be okay to keep it as is, and instead follow up with the code calling Is4ByteSSEInstruction so we can remove this as well.

Currently, the majority of places calling Is4ByteSSEInstruction are doing so to determine if they should update the sz estimate (7/11 references). This logic would be better suited to be centralized (as most of the other size calculations are).

Of the remaining 4 usages, 2 are assert(IsAVXInstruction(ins) || Is4ByteSSEInstruction(ins));, which can be simplified to assert(IsAVXInstruction(ins) || EncodedBySSE38OrSSE3A(ins)); and the remaining two are for determining if the additional byte needs to be output (and could just be simplified to !UseVEXEncoding() && EncodedBySSE38orSSE3A(ins) or centralized).

It would be a slightly more involved change, however; so I think it would be better as a followup fix.

@fiigii
Copy link

fiigii commented Dec 13, 2018

Please also run noavx and sse2only CI tests.

@tannergooding
Copy link
Member Author

test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic
test Windows_NT x64 Checked jitsse2only

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic
test Windows_NT x86 Checked jitsse2only

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic
test Ubuntu x64 Checked jitsse2only

@tannergooding
Copy link
Member Author

CC. @CarolEidt. This should be ready for review.
As per the title, this is just an iterative improvement trying to simplify some of the emitter logic around the SSE types.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'm kind of surprised that it's possible to make incremental improvements like this!

@tannergooding tannergooding merged commit 813bd6e into dotnet:master Dec 14, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants