Fix typing of BlendVariableMask condition when optimizing TernaryLogic #114837

BruceForstall · 2025-04-20T02:36:02Z

TernaryLogic nodes normalize small types to large types. When optimizing a TernaryLogic node to a BlendVariableMask, if we leave the normalized type, then the Blend uses the wrong instruction.

To fix this, change the Blend node to use the mask simd base type. We only do this for mask base types that are small, which is unsatisfying: there exists codegen today for double->int casts, for example, that uses a 'double' type mask but an 'int' sized TernaryLogic node. This works because we end up only using lane '0', as the vector is converted to scalar. I didn't find any other case like this, so hopefully if the mask type is small we can safely use it.

Fixes #114572

Copilot

Pull Request Overview

This PR fixes a type mismatch issue in the BlendVariableMask optimization for TernaryLogic nodes by adjusting the SIMD base type used for the condition masks. It also adds a regression test to ensure the correctness of the optimization fix.

Adjust normalization of TernaryLogic nodes to use the mask’s SIMD base type when the condition element type is small.
Add a regression test that validates the behavior under specific vectorized scenarios.

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File	Description
src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.cs	New regression test for TernaryLogic node behavior
src/coreclr/jit/lowerxarch.cpp	Updates to adjust the SIMD base type for BlendVariableMask condition masks

Files not reviewed (1)

src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.csproj: Language not supported

Comments suppressed due to low confidence (1)

src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.cs:37

Consider adding additional tests to cover cases where the condition mask has a type other than ushort (such as for double-to-int conversions) to ensure the new type adjustment logic is robust.

Assert.Equal(Vector256.Create((ushort)0, (ushort)1, ...), s_2);

Copilot · 2025-04-20T02:36:18Z

src/coreclr/jit/lowerxarch.cpp

+                var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType();
+                uint32_t  conditionElemSize = genTypeSize(conditionBaseType);
+                uint32_t  elemSize          = genTypeSize(simdBaseType);
+                if (varTypeIsShort(conditionBaseType) && (conditionElemSize < elemSize))


Consider explicitly documenting why non-short condition types (e.g., for double-to-int conversion) are not adjusted here, or add handling for these cases if needed.

BruceForstall · 2025-04-20T02:36:23Z

Looks like this optimization logic originated in #97468

dotnet-policy-service · 2025-04-20T02:36:37Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

BruceForstall · 2025-04-20T02:37:14Z

@tannergooding PTAL
cc @dotnet/jit-contrib

BruceForstall · 2025-04-20T02:37:53Z

cc @dotnet/intel

saucecontrol · 2025-04-20T19:18:44Z

src/coreclr/jit/lowerxarch.cpp

+                var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType();
+                uint32_t  conditionElemSize = genTypeSize(conditionBaseType);
+                uint32_t  elemSize          = genTypeSize(simdBaseType);
+                if (varTypeIsShort(conditionBaseType) && (conditionElemSize < elemSize))


Shouldn't this be checking varTypeIsSmall? Same problem can happen with a byte compare.

Simplified repro:

public class Runtime_114572 { public static byte s_ub = 1; public static ushort s_us = 1; public static void Problem1() { var v0 = Vector128<ushort>.Zero; var v1 = Vector128<ushort>.One; var vs = Vector128.CreateScalar(s_us); var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512BW.VL.CompareGreaterThanOrEqual(v0, vs), 0xE4); Console.WriteLine(vr); } public static void Problem2() { var v0 = Vector128<byte>.Zero; var v1 = Vector128<byte>.One; var vs = Vector128.CreateScalar(s_ub); var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512BW.VL.CompareGreaterThanOrEqual(v0, vs), 0xE4); Console.WriteLine(vr); } }

Though I'm not sure it's not always wrong to allow a larger blend when the condition element size was smaller.

Notably, the TernaryLogic nodes mentioned in the comment (from cast normalization lowering) are gone in the rewrite I did in #114410. If anything, they should have been ConditionalSelect nodes so that they wouldn't have required transformation back to mask blend, but GT_SELECT ended up being more efficient since the result is a scalar anyway.

Shouldn't this be checking varTypeIsSmall? Same problem can happen with a byte compare.

Indeed. Fixed.

I added your byte test case (and simplified ushort case as well).

BruceForstall · 2025-04-20T21:23:34Z

/azp run runtime-coreclr outerloop

azure-pipelines · 2025-04-20T21:23:45Z

Azure Pipelines successfully started running 1 pipeline(s).

BruceForstall · 2025-04-22T18:02:58Z

No diffs

tannergooding · 2025-04-23T15:28:24Z

src/coreclr/jit/lowerxarch.cpp

+                // However, there is code, such as for double->int conversions, that generates a 'double'
+                // base type mask for a TernaryLogic node with base type 'int'. That works because it only
+                // cares about the '0' elem, as the result will be cast to scalar.
+                var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType();


Don't think this is quite correct. We only require that condition produce TYP_MASK which could include things like LCL_VAR (such as from some CSE'd node).

In general we should also likely be doing this for all scenarios and only making the swap if condition is TYP_MASK and if we know the base type that the mask produced.

Consider for example the following where the mask will be TYP_UINT but the TernaryLogic node is TYP_ULONG, so we'd end up ignoring half the mask bits.

var v0 = Vector128.Create<ulong>(x); var v1 = Vector128.Create<ulong>(y); var v2 = Vector128.Create<uint>(z); var v3 = Vector128.Create<uint>(w); var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512F.VL.CompareGreaterThanOrEqual(v2, v3).AsUInt64(), 0xE4);

This means that any scenario where the type of the mask is smaller than the size of the ternary logic node, we need to retype to the smaller type (as more mask bits are present). You also need to consider the other scenario, where if the mask is larger than the size of the ternary logic node we'd also need to retype (as fewer mask bits are present).

This is why normally we strictly require that maskSize == operSize. The case of a TernaryLogic representing ConditionalSelect is then special because it is bitwise and so equates to BlendVariableMask based on the base type of the mask.

I don't think I understand the implications of your comment. Should we simply give up on this optimization converting TernaryLogic to BlendVariableMask if the TernaryLogic and mask sizes are not equal?

In your example above, are you saying that we need to retype to the minimum of the mask and TernaryLogic node operation size, since no matter which is larger, the upper bits are garbage?

This code I added is trying to choose the correct BlendVariableMask operation size. Maybe after #114410 it can do this unconditionally by using the operand mask size. But maybe you're suggesting it should be unconditional based on the minimum of the sizes?

I'm saying that for TernaryLogic (CndSel in particular) -> BlendVariableMask you need to always take the base type of the condition, regardless of what the type of the TernaryLogic was

If the base type of the condition isn't known (i.e. it's a CSE'd TYP_MASK local), then we need to give up.

Ok, I changed this code to unconditionally use the 'condition' base element type, and bail if the condition is not a HW intrinsic. This leads to some asm diffs due to the cast typing issue in the comment, previously, but the functionality should be identical, since in the casting case, no matter the opmask size, we only use the lower dword in the end.

e.g., diffs:

< vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4} > vblendmps xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4} ... < vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4} > vblendmpd xmm0 {k1}, xmm0, qword ptr [reloc @RWD96] {1to2}

TernaryLogic nodes normalize small types to large types. When optimizing a TernaryLogic node to a BlendVariableMask, if we leave the normalized type, then the Blend uses the wrong instruction. To fix this, change the Blend node to use the mask simd base type. We only do this for mask base types that are small, which is unsatisfying: there exists codegen today for double->int casts, for example, that uses a 'double' type mask but an 'int' sized TernaryLogic node. This works because we end up only using lane '0', as the vector is converted to scalar. I didn't find any other case like this, so hopefully if the mask type is small we can safely use it. Fixes dotnet#114572

…nt type But only if the condition is a HW intrinsic.

Copilot AI review requested due to automatic review settings April 20, 2025 02:36

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 20, 2025

Copilot AI reviewed Apr 20, 2025

View reviewed changes

dotnet-policy-service bot assigned BruceForstall Apr 20, 2025

BruceForstall requested a review from tannergooding April 20, 2025 02:37

BruceForstall mentioned this pull request Apr 20, 2025

JIT: Debug/Release generate different results #114572

Closed

build-analysis bot mentioned this pull request Apr 20, 2025

CI flakiness: mono interpreter build getting killed #114123

Open

saucecontrol reviewed Apr 20, 2025

View reviewed changes

BruceForstall force-pushed the Fix114572 branch from 98c7dbe to 8ad5895 Compare April 20, 2025 21:21

BruceForstall requested a review from EgorBo April 22, 2025 18:03

tannergooding reviewed Apr 23, 2025

View reviewed changes

BruceForstall force-pushed the Fix114572 branch from 8ad5895 to 785427f Compare April 26, 2025 00:04

BruceForstall added 2 commits April 25, 2025 17:05

Unconditionally set the base element type to the condition base eleme…

88ccf95

…nt type But only if the condition is a HW intrinsic.

BruceForstall force-pushed the Fix114572 branch from 785427f to 88ccf95 Compare April 26, 2025 00:05

tannergooding approved these changes Apr 26, 2025

View reviewed changes

BruceForstall merged commit 9df77ee into dotnet:main Apr 26, 2025
114 checks passed

BruceForstall deleted the Fix114572 branch April 26, 2025 02:09

github-actions bot locked and limited conversation to collaborators May 27, 2025

Fix typing of BlendVariableMask condition when optimizing TernaryLogic #114837

Fix typing of BlendVariableMask condition when optimizing TernaryLogic #114837

Uh oh!

Conversation

BruceForstall commented Apr 20, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Apr 20, 2025

Choose a reason for hiding this comment

Uh oh!

BruceForstall commented Apr 20, 2025

Uh oh!

dotnet-policy-service bot commented Apr 20, 2025

Uh oh!

BruceForstall commented Apr 20, 2025

Uh oh!

BruceForstall commented Apr 20, 2025

Uh oh!

saucecontrol Apr 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BruceForstall Apr 20, 2025

Choose a reason for hiding this comment

Uh oh!

BruceForstall commented Apr 20, 2025

Uh oh!

azure-pipelines bot commented Apr 20, 2025

Uh oh!

BruceForstall commented Apr 22, 2025

Uh oh!

tannergooding Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

BruceForstall Apr 25, 2025

Choose a reason for hiding this comment

Uh oh!

tannergooding Apr 25, 2025

Choose a reason for hiding this comment

Uh oh!

BruceForstall Apr 26, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

saucecontrol Apr 20, 2025 •

edited

Loading