JIT: Rewrite BlendVariableMask when mask is created from vector #126062
saucecontrol wants to merge 2 commits into `dotnet:main`
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR updates JIT rationalization for xarch NI_AVX512_BlendVariableMask to avoid keeping the mask-form blend when the mask operand originates from a vector-to-mask conversion, preventing a deoptimization where a mask is created only to be embedded.
Changes:
- Extend `RewriteHWIntrinsicBlendv` to detect when the blend mask is produced via `NI_AVX512_ConvertVectorToMask`.
- Avoid the "keep embedded mask" early-return in that scenario so the blend can be rewritten back to the non-mask form.
Comments suppressed due to low confidence (1)
src/coreclr/jit/rationalize.cpp:685

`op3` is now a local `GenTree*`, but it is later passed by address to `RewriteHWIntrinsicToNonMask(&op3, ...)`. `RewriteHWIntrinsicToNonMask` expects `use` to be the actual operand edge so it can replace/remove nodes (e.g., it removes `NI_AVX512_ConvertVectorToMask` and updates the parent via `ReplaceOperand`). Passing a local pointer means `node->Op(3)` will not be updated, leaving the blend node still pointing at the removed intrinsic (dangling operand / miscompile). Use `GenTree*& op3 = node->Op(3);` (or otherwise pass `&node->Op(3)` / an operand reference) when calling `RewriteHWIntrinsicToNonMask`.
```cpp
GenTree* op2 = node->Op(2);
GenTree* op3 = node->Op(3);

// We're in the post-order visit and are traversing in execution order, so
// everything between op2 and node will have already been rewritten to LIR
// form and doing the IsInvariantInRange check is safe. This allows us to
// catch cases where something is embedded masking compatible but where we
// could never actually contain it and so we want to rewrite it to the non-mask
// variant

SideEffectSet scratchSideEffects;

if (scratchSideEffects.IsLirInvariantInRange(m_compiler, op2, node))
{
    unsigned  tgtMaskSize     = simdSize / genTypeSize(simdBaseType);
    var_types tgtSimdBaseType = TYP_UNDEF;

    if (op2->isEmbeddedMaskingCompatible(m_compiler, tgtMaskSize, tgtSimdBaseType))
    {
        // Make sure we had a mask to begin with. We don't want to create a mask
        // solely for the purpose of embedding it.
        if (!op3->OperIsHWIntrinsic() ||
            (op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask))
        {
            // We are going to utilize the embedded mask, so we don't need to rewrite. However,
            // we want to fixup the simdBaseType here since it simplifies lowering and allows
            // both embedded broadcast and the mask to be live simultaneously.
            if (tgtSimdBaseType != TYP_UNDEF)
            {
                op2->AsHWIntrinsic()->SetSimdBaseType(tgtSimdBaseType);
            }
            return;
        }
    }
}

if (!ShouldRewriteToNonMaskHWIntrinsic(op3))
{
    return;
}

parents.Push(op3);
RewriteHWIntrinsicToNonMask(&op3, parents);
(void)parents.Pop();
```
cc @dotnet/jit-contrib SPMI doesn't show any diffs, but this does fix up my motivating case. Something like:

```csharp
static Vector128<float> AddToNegative(Vector128<float> v1, Vector128<float> v2)
    => Sse41.BlendVariable(v1, v1 + v2, v1);
```

```diff
  vmovups xmm0, xmmword ptr [rdx]
- vpmovd2m k1, xmm0
- vaddps xmm0 {k1}, xmm0, xmmword ptr [r8]
+ vaddps xmm1, xmm0, xmmword ptr [r8]
+ vblendvps xmm0, xmm0, xmm1, xmm0
  vmovups xmmword ptr [rcx], xmm0
  mov rax, rcx
  ret
-; Total bytes of code: 24
+; Total bytes of code: 23
```
@tannergooding, @kg PTAL.
```cpp
if (!op3->OperIsHWIntrinsic() ||
    (op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask))
```
The `ConvertVectorToMask` check can be simplified/standardized by using the existing helper `OperIsConvertVectorToMask()` instead of checking `OperIsHWIntrinsic()` + `GetHWIntrinsicId()`. This avoids duplicating intrinsic IDs and reads more clearly.
```diff
- if (!op3->OperIsHWIntrinsic() ||
-     (op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask))
+ if (!op3->OperIsConvertVectorToMask())
```
This changes when `BlendVariableMask` is rewritten back to `BlendVariable` (and can eliminate a `ConvertVectorToMask`), but I couldn't find an existing JIT codegen test covering this scenario (no `BlendVariableMask` hits under src/tests). Consider adding an asm pattern test to prevent regressions for `Vector128`/`Vector256` cases.
```cpp
// Make sure we had a mask to begin with. We don't want to create a mask
// solely for the purpose of embedding it.
```
This needs an elaboration covering why, as it's operating under the presumption that `vpmov*2m` is more expensive than `vblendvps`, which can depend on the hardware and may change in the future.

For example, this is what we have today, where it may be slower or may be similar perf (using XMM/YMM/ZMM and `vpmov*2m` vs `vblendvps`):
- AMD Zen4: 3/4/5 vs 1
- AMD Zen5: 3/4/6 vs 2
- Intel Skylake-X: 3 vs 1-2
- Intel Emerald Rapids: 3 vs 2-3
Then there is also the case where we don't have a `vpblendvw`, and `vpblendvb` has slightly different behavior here, so it might not be valid to switch back. Consider that a mask of `0x8000` selects a whole word under word-granular blending, but needs to be `0x8080` if using `vpblendvb`.
This catches cases where `BlendVariable` is 'upgraded' to `BlendVariableMask` on import but the blend mask was not `TYP_MASK`.

There is logic in place that checks whether the blend could be used as an EVEX embedded mask and rewrites back to `BlendVariable` if not. However, it misses cases where the mask is created from a vector anyway, and creating the mask just to embed it is a deoptimization.