Skip to content

[AMD64][APX] Introduce the remaining NCI instructions - CTEST, CFCMOV#127536

Open
Ruihan-Yin wants to merge 6 commits intodotnet:mainfrom
Ruihan-Yin:apx-nci-PR
Open

[AMD64][APX] Introduce the remaining NCI instructions - CTEST, CFCMOV#127536
Ruihan-Yin wants to merge 6 commits intodotnet:mainfrom
Ruihan-Yin:apx-nci-PR

Conversation

@Ruihan-Yin
Copy link
Copy Markdown
Member

Introduce the renaming APX-NCI instructions: CTESTcc and CFCMOVcc

CTESTcc:
image

image

with CTEST, now CCMP reg, 0 can be optimized to CTEST reg, reg, which is self contained and 1-byte smaller, this optimization comes with the PR as well

CFCMOVcc:
image

There is no CFCMOV optimization in JIT at the moment.

Copilot AI review requested due to automatic review settings April 28, 2026 22:56
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 28, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Apr 28, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR extends the x64 JIT’s APX support by introducing new conditional instructions (CTEST and CFCMOV), updating EVEX/APX encoding paths in the emitter, and adding targeted emitter unit tests.

Changes:

  • Add APX instruction definitions for CFCMOV and CTEST (and related instruction-range markers).
  • Update emitter logic to recognize/encode these new conditional instructions and related EVEX fields (e.g., DFV/NF/ND handling).
  • Add AMD64 emitter unit tests covering CFCMOV and CTEST encodings and integrate them into the unit test dispatcher.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/coreclr/jit/instrsxarch.h Adds APX instruction definitions/ranges for CFCMOV + CTEST under TARGET_AMD64.
src/coreclr/jit/emitxarch.h Updates emitter APIs and adds helpers to classify APX conditional instructions.
src/coreclr/jit/emitxarch.cpp Implements new APX conditional classification and extends EVEX/APX emission logic (ND/NF/DFV, prefix/layout handling).
src/coreclr/jit/codegenxarch.cpp Adds CCMP-to-CTEST optimization for compare-with-zero and introduces emitter unit tests for CFCMOV/CTEST.
src/coreclr/jit/codegenlinear.cpp Wires new “cfcmov” and “ctest” unit test sections into genEmitterUnitTests().
src/coreclr/jit/codegen.h Declares new emitter unit test entry points and tightens AMD64-only helpers.

Comment thread src/coreclr/jit/instrsxarch.h Outdated
INST3(cfcmovg, "cfcmovg", IUM_WR, 0x00004F, BAD_CODE, 0x00004F, 1C, 2X, INS_TT_NONE, Reads_OF | Reads_SF | Reads_ZF | INS_Flags_Has_NDD | INS_Flags_Has_NF)
#define LAST_CFCMOV_INSTRUCTION INS_cfcmovg
#define FIRST_CTEST_INSTRUCTION INS_ctesto
INST3(ctesto, "ctesto", IUM_RD, 0x000084, 0x00008F6, BAD_CODE, 1C, 2X, INS_TT_NONE, Writes_OF | Writes_SF | Writes_ZF | Writes_CF | INS_FLAGS_Has_Sbit)
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctesto uses a different MI encoding value (0x00008F6) than the rest of the CTEST variants (0x00000F6). Since the intent (per emitter logic) appears to be that these variants share the same base encoding and differ only by the SCC field, this inconsistency is very likely a typo and can break emission for ctesto immediate forms. Make ctesto’s MI encoding consistent with the other CTEST instructions (or document why it must differ).

Suggested change
INST3(ctesto, "ctesto", IUM_RD, 0x000084, 0x00008F6, BAD_CODE, 1C, 2X, INS_TT_NONE, Writes_OF | Writes_SF | Writes_ZF | Writes_CF | INS_FLAGS_Has_Sbit)
INST3(ctesto, "ctesto", IUM_RD, 0x000084, 0x00000F6, BAD_CODE, 1C, 2X, INS_TT_NONE, Writes_OF | Writes_SF | Writes_ZF | Writes_CF | INS_FLAGS_Has_Sbit)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected,

to provide more details, there are 2 equivalent forms for the MI form for CTEST
image

There is no semantical difference between the two, but for consistency, we use /0 variant.

Comment thread src/coreclr/jit/emitxarch.cpp Outdated
if (intConst->IconValue() == 0)
{
// ctest reg, reg is 1-byte shorter encoding than ccmp reg, 0.
instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION);
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deriving ctestIns via instruction-enum arithmetic is brittle (it silently depends on the relative ordering and contiguity of CCMP and CTEST instruction enums). Prefer an explicit mapping helper (e.g., derive insCC once and map to the corresponding CTEST instruction via a small table/switch) so future instruction table edits don’t break this transformation.

Suggested change
instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION);
// Derive the matching ctest instruction explicitly from the same condition
// instead of relying on the relative enum layout of CCMP/CTEST instructions.
instruction ctestIns = JumpKindToCtest(condDesc.jumpKind1);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

@Ruihan-Yin Ruihan-Yin Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an assert to make sure the offset between FIRST_CCMP_INSTRUCTION and FIRST_CTEST_INSTRUCTION is not changed. This is a simply, 1:1 mapping, probably not necessarily to use a switch table.

But I am open to change the design if other reviewers are inclined to enhance the robustness with the switch table.

Comment thread src/coreclr/jit/codegenxarch.cpp Outdated
Comment thread src/coreclr/jit/emitxarch.cpp Outdated
Comment thread src/coreclr/jit/emitxarch.cpp Outdated
Comment thread src/coreclr/jit/codegenxarch.cpp
Ruihan-Yin and others added 2 commits April 29, 2026 12:04
Co-authored-by: Copilot <copilot@github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/jit/emitxarch.cpp
Comment thread src/coreclr/jit/emitxarch.cpp Outdated
Comment thread src/coreclr/jit/emitxarch.cpp Outdated
Comment thread src/coreclr/jit/emitxarch.cpp Outdated
Co-authored-by: Copilot <copilot@github.com>
Copilot AI review requested due to automatic review settings April 30, 2026 19:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Comment thread src/coreclr/jit/emitxarch.cpp Outdated
Comment thread src/coreclr/jit/emitxarch.cpp
Comment on lines +8706 to +8707
assert((FIRST_CTEST_INSTRUCTION - FIRST_APX_INSTRUCTION) == 32);
instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION);
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This peephole relies on a hard-coded ‘32’ offset and an assert that references FIRST_APX_INSTRUCTION (even though the conversion uses FIRST_CCMP_INSTRUCTION). To make the mapping more robust/clear, prefer a compile-time check (static_assert) on (FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION) and compute the offset once from those two symbols (removing the magic number and the unrelated FIRST_APX reference).

Suggested change
assert((FIRST_CTEST_INSTRUCTION - FIRST_APX_INSTRUCTION) == 32);
instruction ctestIns = (instruction)(ccmpIns + FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION);
constexpr int ccmpToCtestOffset = FIRST_CTEST_INSTRUCTION - FIRST_CCMP_INSTRUCTION;
static_assert(ccmpToCtestOffset == 32);
instruction ctestIns = static_cast<instruction>(ccmpIns + ccmpToCtestOffset);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix proposed here: #127536 (comment)

Comment on lines 14783 to +14791
dst += emitOutputByte(dst, 0x66);
}

if (IsCFCMOV(ins))
{
// XArch-APX-TODO: JIT does not emit sub-32b CMOV, whether to use 16b operands in CFCMOV is to be
// determined.
break;
}
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For EA_2BYTE, this emits an operand-size prefix (0x66) and then breaks for CFCMOV, leaving the emitter to continue with a partially-applied size handling. If CFCMOV truly must not be encoded for 16-bit operands, it would be safer to assert/noway_assert before emitting the prefix (or otherwise prevent reaching this case), rather than emitting bytes and then bailing out mid-handling.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

@Ruihan-Yin Ruihan-Yin Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comments.
The reason to break here is that CFCMOV does not follow the normal encoding rule that

A = (B | 0x01)
A: opcode for 16b/32b/64b
B: opcode for 8b

The handling here is to mostly reuse the emitting infra but make sure CFCMOV gets the right opcode.

Comment thread src/coreclr/jit/emitxarch.h Outdated
Comment thread src/coreclr/jit/instrsxarch.h Outdated
Co-authored-by: Copilot <copilot@github.com>
@Ruihan-Yin
Copy link
Copy Markdown
Member Author

Failures look unrelated.

Hi, @tannergooding @kg, please take a look at this PR, thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants