[AMD64][APX] Introduce the remaining NCI instructions - CTEST, CFCMOV#127536
[AMD64][APX] Introduce the remaining NCI instructions - CTEST, CFCMOV#127536Ruihan-Yin wants to merge 6 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR extends the x64 JIT’s APX support by introducing new conditional instructions (CTEST and CFCMOV), updating EVEX/APX encoding paths in the emitter, and adding targeted emitter unit tests.
Changes:
- Add APX instruction definitions for CFCMOV and CTEST (and related instruction-range markers).
- Update emitter logic to recognize/encode these new conditional instructions and related EVEX fields (e.g., DFV/NF/ND handling).
- Add AMD64 emitter unit tests covering CFCMOV and CTEST encodings and integrate them into the unit test dispatcher.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/instrsxarch.h | Adds APX instruction definitions/ranges for CFCMOV + CTEST under TARGET_AMD64. |
| src/coreclr/jit/emitxarch.h | Updates emitter APIs and adds helpers to classify APX conditional instructions. |
| src/coreclr/jit/emitxarch.cpp | Implements new APX conditional classification and extends EVEX/APX emission logic (ND/NF/DFV, prefix/layout handling). |
| src/coreclr/jit/codegenxarch.cpp | Adds CCMP-to-CTEST optimization for compare-with-zero and introduces emitter unit tests for CFCMOV/CTEST. |
| src/coreclr/jit/codegenlinear.cpp | Wires new “cfcmov” and “ctest” unit test sections into genEmitterUnitTests(). |
| src/coreclr/jit/codegen.h | Declares new emitter unit test entry points and tightens AMD64-only helpers. |
| INST3(cfcmovg, "cfcmovg", IUM_WR, 0x00004F, BAD_CODE, 0x00004F, 1C, 2X, INS_TT_NONE, Reads_OF | Reads_SF | Reads_ZF | INS_Flags_Has_NDD | INS_Flags_Has_NF) | ||
| #define LAST_CFCMOV_INSTRUCTION INS_cfcmovg | ||
| #define FIRST_CTEST_INSTRUCTION INS_ctesto | ||
| INST3(ctesto, "ctesto", IUM_RD, 0x000084, 0x00008F6, BAD_CODE, 1C, 2X, INS_TT_NONE, Writes_OF | Writes_SF | Writes_ZF | Writes_CF | INS_FLAGS_Has_Sbit) |
There was a problem hiding this comment.
ctesto uses a different MI encoding value (0x00008F6) than the rest of the CTEST variants (0x00000F6). Since the intent (per emitter logic) appears to be that these variants share the same base encoding and differ only by the SCC field, this inconsistency is very likely a typo and can break emission for ctesto immediate forms. Make ctesto’s MI encoding consistent with the other CTEST instructions (or document why it must differ).
| INST3(ctesto, "ctesto", IUM_RD, 0x000084, 0x00008F6, BAD_CODE, 1C, 2X, INS_TT_NONE, Writes_OF | Writes_SF | Writes_ZF | Writes_CF | INS_FLAGS_Has_Sbit) | |
| INST3(ctesto, "ctesto", IUM_RD, 0x000084, 0x00000F6, BAD_CODE, 1C, 2X, INS_TT_NONE, Writes_OF | Writes_SF | Writes_ZF | Writes_CF | INS_FLAGS_Has_Sbit) |
| if (intConst->IconValue() == 0) | ||
| { | ||
| // ctest reg, reg is 1-byte shorter encoding than ccmp reg, 0. | ||
| instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION); |
There was a problem hiding this comment.
Deriving ctestIns via instruction-enum arithmetic is brittle (it silently depends on the relative ordering and contiguity of CCMP and CTEST instruction enums). Prefer an explicit mapping helper (e.g., derive insCC once and map to the corresponding CTEST instruction via a small table/switch) so future instruction table edits don’t break this transformation.
| instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION); | |
| // Derive the matching ctest instruction explicitly from the same condition | |
| // instead of relying on the relative enum layout of CCMP/CTEST instructions. | |
| instruction ctestIns = JumpKindToCtest(condDesc.jumpKind1); |
There was a problem hiding this comment.
added an assert to make sure the offset between FIRST_CCMP_INSTRUCTION and FIRST_CTEST_INSTRUCTION is not changed. This is a simply, 1:1 mapping, probably not necessarily to use a switch table.
But I am open to change the design if other reviewers are inclined to enhance the robustness with the switch table.
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
| assert((FIRST_CTEST_INSTRUCTION - FIRST_APX_INSTRUCTION) == 32); | ||
| instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION); |
There was a problem hiding this comment.
This peephole relies on a hard-coded ‘32’ offset and an assert that references FIRST_APX_INSTRUCTION (even though the conversion uses FIRST_CCMP_INSTRUCTION). To make the mapping more robust/clear, prefer a compile-time check (static_assert) on (FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION) and compute the offset once from those two symbols (removing the magic number and the unrelated FIRST_APX reference).
| assert((FIRST_CTEST_INSTRUCTION - FIRST_APX_INSTRUCTION) == 32); | |
| instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION); | |
| constexpr int ccmpToCtestOffset = FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION; | |
| static_assert(ccmpToCtestOffset == 32); | |
| instruction ctestIns = static_cast<instruction>(ccmpIns + ccmpToCtestOffset); |
| dst += emitOutputByte(dst, 0x66); | ||
| } | ||
|
|
||
| if (IsCFCMOV(ins)) | ||
| { | ||
| // XArch-APX-TODO: JIT does not emit sub-32b CMOV, whether to use 16b operands in CFCMOV is to be | ||
| // determined. | ||
| break; | ||
| } |
There was a problem hiding this comment.
For EA_2BYTE, this emits an operand-size prefix (0x66) and then breaks for CFCMOV, leaving the emitter to continue with a partially-applied size handling. If CFCMOV truly must not be encoded for 16-bit operands, it would be safer to assert/noway_assert before emitting the prefix (or otherwise prevent reaching this case), rather than emitting bytes and then bailing out mid-handling.
There was a problem hiding this comment.
Updated the comments.
The reason to break here is that CFCMOV does not follow the normal encoding rule that
A = (B | 0x01)
A: opcode for 16b/32b/64b
B: opcode for 8b
The handling here is to mostly reuse the emitting infra but make sure CFCMOV gets the right opcode.
Co-authored-by: Copilot <copilot@github.com>
|
Failures look unrelated. Hi, @tannergooding @kg, please take a look at this PR, thanks :) |
Introduce the renaming APX-NCI instructions:
CTESTccandCFCMOVccCTESTcc:

with CTEST, now CCMP reg, 0 can be optimized to CTEST reg, reg, which is self contained and 1-byte smaller, this optimization comes with the PR as well
CFCMOVcc:

There is no CFCMOV optimization in JIT at the moment.