Remove unnecessary delayfree from xarch FMA and TERNLOG instructions by tannergooding · Pull Request #128350 · dotnet/runtime

tannergooding · 2026-05-19T00:37:59Z

These instructions are fully reorderable and so, much like various commutative nodes, do not need to be marked delay-free except in special scenarios.
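As an illustration of what "fully reorderable" means for FMA: x86 provides three operand orderings (132, 213, 231) of each fused multiply-add, so whichever input already sits in the target register, one of the forms can compute `a * b + c` in place. The sketch below models the operand roles of those three forms in Python. The `select_form` and `evaluate` helpers and the role names `a`, `b`, `c` are hypothetical, introduced only for illustration; this is not the JIT's selection logic, which also has to consider memory operands and the special scenarios mentioned above.

```python
# Operand roles of the three x86 FMA forms; dst is both an input and the output.
def fma132(dst, src2, src3):
    return dst * src3 + src2

def fma213(dst, src2, src3):
    return src2 * dst + src3

def fma231(dst, src2, src3):
    return src2 * src3 + dst

FORMS = {"132": fma132, "213": fma213, "231": fma231}

def select_form(target_holds):
    """Hypothetical helper: pick a form and operand order for a * b + c,
    given which input ('a', 'b' or 'c') already occupies the target register."""
    if target_holds == "c":
        return "231", ("c", "a", "b")   # dst is the addend
    if target_holds == "a":
        return "213", ("a", "b", "c")   # dst is a multiplicand
    if target_holds == "b":
        return "213", ("b", "a", "c")   # dst is the other multiplicand
    raise ValueError(f"unknown operand role: {target_holds!r}")

def evaluate(target_holds, values):
    """Run the selected form on concrete values to check it yields a * b + c."""
    form, order = select_form(target_holds)
    return FORMS[form](*(values[name] for name in order))
```

Because every input can legally share a register with the target under some form, no input has to be kept alive in a separate register just to avoid being overwritten, which is the constraint a delay-free marking would impose.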

This resolves #62215

Copilot

Pull request overview

This PR updates xarch JIT HWIntrinsic register allocation/codegen handling to avoid marking AVX FMA and AVX-512 TernaryLogic operands as “delay-free” in most cases, while adding codegen support for additional operand/target overlap scenarios for TernaryLogic.

Changes:

- Simplifies LSRA operand-use construction for AVX2/AVX512 FMA intrinsics, only using delay-free constraints when required by CopyUpperBits semantics.
- Adds a dedicated LSRA path for NI_AVX512_TernaryLogic when the control byte is an immediate, avoiding delay-free uses in that case.
- Extends genHWIntrinsic_R_R_R_RM_I to adjust TernaryLogic control immediates when the target register overlaps certain operands.
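The control-immediate adjustment in the last item can be pictured as a permutation of an 8-entry truth table. For `vpternlog`, each result bit is looked up in the control byte at index `(a << 2) | (b << 1) | c`, where `a`, `b`, `c` are the corresponding bits of the three operands (`a` being the destination operand). If the operands are placed in a different order, the same function needs a re-indexed control byte. The following is a minimal sketch of that idea; `permute_control` is a hypothetical helper written for this explanation and is not the code in `hwintrinsiccodegenxarch.cpp`.

```python
def ternlog(control, a, b, c):
    """Bit selected by vpternlog for input bits a, b, c (a is the destination operand)."""
    return (control >> ((a << 2) | (b << 1) | c)) & 1

def permute_control(control, order):
    """Hypothetical helper: control byte to use after reordering the operands.

    `order` names, for each new operand slot, which original operand
    ('A', 'B' or 'C') now sits there; "BAC" swaps the first two operands.
    """
    if sorted(order) != ["A", "B", "C"]:
        raise ValueError("order must be a permutation of 'ABC'")
    result = 0
    for index in range(8):
        # Bits seen by the reordered instruction in slots 0, 1 and 2.
        slots = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
        # Map them back to the original operand names.
        original = dict(zip(order, slots))
        bit = ternlog(control, original["A"], original["B"], original["C"])
        result |= bit << index
    return result
```

For example, `A & ~B` has control byte `0x30`; with the first two operands swapped the same function needs `0x0C`. Symmetric functions such as the three-way XOR (`0x96`) are unchanged by any reordering.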

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/coreclr/jit/lsraxarch.cpp | Adjusts LSRA use/target preferencing for FMA intrinsics and adds a special LSRA build path for `AVX512_TernaryLogic` with immediate control byte. |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Adds `TernaryLogic`-specific handling to rewrite the control byte when operand/target register overlap is detected. |

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

tannergooding · 2026-05-19T20:58:16Z

CC. @dotnet/jit-contrib, @EgorBo for review.

Diffs are here: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1427007&view=ms.vss-build-web.run-extensions-tab

This removes a large number of unnecessary vmovaps instructions before or after vpternlog and vfmadd instructions, such as:

-       vpternlogq xmm4, xmm0, xmm9, -106
-       vmovaps  xmm0, xmm4
+       vpternlogq xmm0, xmm4, xmm9, -106
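A note on why the immediate is identical on both sides of this diff: `-106` is the signed rendering of the control byte `0x96`, which is the truth table of a three-way XOR. XOR does not care about operand order, so exchanging `xmm4` and `xmm0` needs no control-byte rewrite here. The sketch below checks this in Python; the helper names are hypothetical and the truth-table columns `0xF0`, `0xCC`, `0xAA` are the conventional encodings of the first, second and third operand.

```python
A, B, C = 0xF0, 0xCC, 0xAA  # truth-table columns for the three vpternlog operands

def to_control_byte(signed_imm):
    """Convert the signed imm8 shown in the disassembly to its unsigned byte."""
    return signed_imm & 0xFF

def swap_first_two(control):
    """Control byte that gives the same result with the first two operands exchanged."""
    result = 0
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        swapped_index = (b << 2) | (a << 1) | c
        result |= ((control >> swapped_index) & 1) << index
    return result

control = to_control_byte(-106)
```

Here `control` equals `A ^ B ^ C`, and `swap_first_two(control)` returns the same byte. A non-symmetric function such as `A & ~B` (`0x30`) would instead map to `0x0C`.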

The x86 diffs show the largest size improvement due to the limited register set available (XMM0-7 only).

The Linux x64 diffs then show a smaller improvement because they end up selecting XMM16-XMM31 in many cases, which require more bytes to encode. All XMM registers are CALLEE_TRASH on Linux and most of the SIMD methods don't involve calls.

The Windows x64 diffs then show a size regression because the register allocator ends up selecting callee-saved registers more often, which bloats the method prologue/epilogue because those registers have to be saved and restored.

The change is an improvement overall, which is visible when looking at the three variations together. We probably want to look into the Windows x64 register ordering, though, since it really should prefer the EVEX registers over the callee-saved registers.

EgorBo · 2026-05-19T22:25:32Z

> The Windows x64 diffs then show a size regression

I'm trying to understand why we should take this when quite a few lines of changes regressed more contexts than they improved (size-wise), and PerfScore says that 6 collections regressed (overall) while only 3 improved 😐

tannergooding added 2 commits May 18, 2026 17:06

Remove some unnecessary delay free markings from xarch FMA instructions

5a77621

Remove some unnecessary delay free markings from xarch TERNLOG instructions

8a0bfb2

Copilot AI review requested due to automatic review settings May 19, 2026 00:37

Copilot started reviewing on behalf of tannergooding May 19, 2026 00:38

github-actions Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 19, 2026

Copilot AI reviewed May 19, 2026

Comment thread src/coreclr/jit/hwintrinsiccodegenxarch.cpp Outdated

Comment thread src/coreclr/jit/lsraxarch.cpp Outdated

tannergooding added 2 commits May 18, 2026 18:05

Resolve feedback

c679229

Remove a bad assert

de5be2c

Copilot AI review requested due to automatic review settings May 19, 2026 02:02

Copilot started reviewing on behalf of tannergooding May 19, 2026 02:02

Copilot AI reviewed May 19, 2026

Remove an assert that's invalid for the jmp table fallback case

2e39f1c

build-analysis Bot mentioned this pull request May 19, 2026

[wasm] Tests failing with DirectoryNotFoundException trying to load test data #128293

Open

Fix formatting

4366dce

Copilot AI review requested due to automatic review settings May 19, 2026 12:57

Copilot started reviewing on behalf of tannergooding May 19, 2026 12:58

Copilot AI reviewed May 19, 2026

Comment thread src/coreclr/jit/lsraxarch.cpp

Comment thread src/coreclr/jit/lsraxarch.cpp Outdated

tannergooding added 2 commits May 19, 2026 07:16

Only swap if all operands are used

adf7887

Don't reorder if op3 is a used from a spill temp

0b0ce37

Copilot AI review requested due to automatic review settings May 19, 2026 17:21

Copilot started reviewing on behalf of tannergooding May 19, 2026 17:22

Copilot AI reviewed May 19, 2026

Comment thread src/coreclr/jit/hwintrinsiccodegenxarch.cpp

tannergooding requested a review from EgorBo May 19, 2026 20:46

This was referenced May 19, 2026

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

tannergooding wants to merge 8 commits into dotnet:main from tannergooding:remove-unnecessary-delayfree
