Ensure FMA optimizations kick in under embedded broadcast #116891

Merged
merged 1 commit into from
Jun 23, 2025

tannergooding
Member

Follow up to #116804. The logic is correct, but it missed a case where embedded broadcast containment would interfere with the FMA optimization and cause it to be skipped entirely.

@Copilot Copilot AI review requested due to automatic review settings June 21, 2025 20:01
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 21, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR ensures that FMA (Fused Multiply-Add/Subtract) optimizations are correctly applied even when constant vectors are candidates for embedded broadcast. It refactors the intrinsic classification and refines the containment logic to account for negative-zero broadcasts interfering with FMA.

  • Unified handling of AVX2/AVX512 FMA intrinsics in LowerFusedMultiplyOp
  • Updated containment logic in ContainCheckHWIntrinsic to skip embedded broadcasts when better FMA transformations exist
  • Added a new predicate OperIsVectorFusedMultiplyOp to identify FMA intrinsics
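As a rough illustration of why a negative-zero constant matters here: negating a floating-point value is equivalent to XOR-ing its bits with the bit pattern of -0.0, and a fused multiply-add whose multiplicand is negated computes -(a * b) + c, which is the "negated" FMA form. The scalar sketch below assumes that this is the pattern the overview refers to; the function names are hypothetical and the code is not taken from the JIT.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Negate a float by flipping its sign bit, i.e. XOR with the bit pattern of -0.0f.
// (Hypothetical helper for illustration only.)
float NegateViaXor(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    bits ^= 0x80000000u; // 0x80000000 is the bit pattern of -0.0f
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Unfolded shape: a separate XOR-based negation feeds the fused multiply-add.
float MultiplyAddWithExplicitNegate(float a, float b, float c)
{
    return std::fma(NegateViaXor(a), b, c);
}

// Folded shape: the negation is part of the fused operation itself,
// which computes -(a * b) + c with a single rounding.
float MultiplyAddNegated(float a, float b, float c)
{
    return std::fma(-a, b, c);
}
```

Both shapes produce the same value, so a compiler that recognizes the XOR with -0.0 as a negation can drop the separate XOR and pick the negated FMA form instead. If the -0.0 constant has already been folded into the XOR as an embedded broadcast, that recognition may no longer apply, which appears to be the case this PR addresses.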

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/coreclr/jit/lowerxarch.cpp | Refactored FMA case switches, adjusted operand indexing, and enhanced embedded-broadcast checks |
| src/coreclr/jit/gentree.h | Declared OperIsVectorFusedMultiplyOp to flag FMA intrinsics |
| src/coreclr/jit/gentree.cpp | Defined GenTree::OperIsVectorFusedMultiplyOp() with documentation and intrinsic ID checks |

Comments suppressed due to low confidence (2)

src/coreclr/jit/lowerxarch.cpp:9881

  • New logic skips embedded-broadcast folding for FMA when negative-zero constants are involved. Add targeted unit tests to cover both the ordinary embedded-broadcast path and the skipped path to prevent regressions.
                            containedOperand->IsCnsVec() && node->isEmbeddedBroadcastCompatibleHWIntrinsic(comp);

src/coreclr/jit/lowerxarch.cpp:1509

  • The operand index was changed from Op(1) to Op(2). Verify that this matches the intended third operand of the FMA node and doesn't introduce an off-by-one reference.
                GenTree* argOp = hwArg->Op(2);
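
The skip described in the first suppressed comment can be modeled with a toy decision function. This is only a sketch under stated assumptions: `NodeKind`, `Node`, `IsFusedMultiplyOp`, and `ShouldUseEmbeddedBroadcast` are hypothetical stand-ins, not the JIT's GenTree types or the actual logic in ContainCheckHWIntrinsic, and the exact condition used by the PR may differ.

```cpp
// Hypothetical, simplified node kinds; the real JIT uses GenTree nodes and intrinsic IDs.
enum class NodeKind
{
    ConstantVector,
    Xor,
    FusedMultiplyAdd,
    Other
};

struct Node
{
    NodeKind kind;
    bool     isAllNegativeZero; // only meaningful for ConstantVector nodes
};

// Toy stand-in for a predicate in the spirit of OperIsVectorFusedMultiplyOp.
bool IsFusedMultiplyOp(const Node& node)
{
    return node.kind == NodeKind::FusedMultiplyAdd;
}

// Decide whether `constant`, an operand of `parent`, should be folded into
// `parent` as an embedded broadcast. `parentUser` is the node consuming `parent`.
bool ShouldUseEmbeddedBroadcast(const Node& constant, const Node& parent, const Node& parentUser)
{
    if (constant.kind != NodeKind::ConstantVector)
    {
        return false; // only constant vectors are broadcast candidates in this model
    }

    // An XOR with an all-negative-zero vector is a negation in disguise.
    const bool isNegation = (parent.kind == NodeKind::Xor) && constant.isAllNegativeZero;

    if (isNegation && IsFusedMultiplyOp(parentUser))
    {
        // Assumption: leaving the constant uncontained lets the FMA lowering
        // fold the negation into the fused operation instead.
        return false;
    }

    return true;
}
```

In this model the ordinary path (a constant feeding a non-FMA user) still takes the embedded broadcast, while the negation-feeding-FMA path declines it, which mirrors the two paths the suppressed comment suggests covering with tests.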

@tannergooding
Member Author

CC. @dotnet/jit-contrib, small improvement with some diffs in common math helpers, ensuring we can do the valid FMA transformations when embedded broadcast exists.

Member

@kunalspathak kunalspathak left a comment

LGTM

@tannergooding tannergooding merged commit 383f9af into dotnet:main Jun 23, 2025
110 checks passed
@tannergooding tannergooding deleted the fma-emb-broadcast branch June 23, 2025 17:56