Skip to content

Fix typing of BlendVariableMask condition when optimizing TernaryLogic #114837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 26, 2025

Conversation

BruceForstall
Copy link
Contributor

TernaryLogic nodes normalize small types to large types. When optimizing a TernaryLogic node to a BlendVariableMask, if we leave the normalized type, then the Blend uses the wrong instruction.

To fix this, change the Blend node to use the mask simd base type. We only do this for mask base types that are small, which is unsatisfying: there exists codegen today for double->int casts, for example, that uses a 'double' type mask but an 'int' sized TernaryLogic node. This works because we end up only using lane '0', as the vector is converted to scalar. I didn't find any other case like this, so hopefully if the mask type is small we can safely use it.

Fixes #114572

@Copilot Copilot AI review requested due to automatic review settings April 20, 2025 02:36
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 20, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a type mismatch issue in the BlendVariableMask optimization for TernaryLogic nodes by adjusting the SIMD base type used for the condition masks. It also adds a regression test to ensure the correctness of the optimization fix.

  • Adjust normalization of TernaryLogic nodes to use the mask’s SIMD base type when the condition element type is small.
  • Add a regression test that validates the behavior under specific vectorized scenarios.

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.cs New regression test for TernaryLogic node behavior
src/coreclr/jit/lowerxarch.cpp Updates to adjust the SIMD base type for BlendVariableMask condition masks
Files not reviewed (1)
  • src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.csproj: Language not supported
Comments suppressed due to low confidence (1)

src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.cs:37

  • Consider adding additional tests to cover cases where the condition mask has a type other than ushort (such as for double-to-int conversions) to ensure the new type adjustment logic is robust.
Assert.Equal(Vector256.Create((ushort)0, (ushort)1, ...), s_2);

var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType();
uint32_t conditionElemSize = genTypeSize(conditionBaseType);
uint32_t elemSize = genTypeSize(simdBaseType);
if (varTypeIsShort(conditionBaseType) && (conditionElemSize < elemSize))
Copy link
Preview

Copilot AI Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider explicitly documenting why non-short condition types (e.g., for double-to-int conversion) are not adjusted here, or add handling for these cases if needed.

Copilot uses AI. Check for mistakes.

@BruceForstall
Copy link
Contributor Author

Looks like this optimization logic originated in #97468

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BruceForstall
Copy link
Contributor Author

@tannergooding PTAL
cc @dotnet/jit-contrib

@BruceForstall
Copy link
Contributor Author

cc @dotnet/intel

var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType();
uint32_t conditionElemSize = genTypeSize(conditionBaseType);
uint32_t elemSize = genTypeSize(simdBaseType);
if (varTypeIsShort(conditionBaseType) && (conditionElemSize < elemSize))
Copy link
Member

@saucecontrol saucecontrol Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be checking varTypeIsSmall? Same problem can happen with a byte compare.

Simplified repro:

public class Runtime_114572
{
    public static byte s_ub = 1;
    public static ushort s_us = 1;

    public static void Problem1()
    {
        var v0 = Vector128<ushort>.Zero;
        var v1 = Vector128<ushort>.One;
        var vs = Vector128.CreateScalar(s_us);

        var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512BW.VL.CompareGreaterThanOrEqual(v0, vs), 0xE4);
        Console.WriteLine(vr);
    }

    public static void Problem2()
    {
        var v0 = Vector128<byte>.Zero;
        var v1 = Vector128<byte>.One;
        var vs = Vector128.CreateScalar(s_ub);

        var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512BW.VL.CompareGreaterThanOrEqual(v0, vs), 0xE4);
        Console.WriteLine(vr);
    }
}

Though I'm not sure it's not always wrong to allow a larger blend when the condition element size was smaller.

Notably, the TernaryLogic nodes mentioned in the comment (from cast normalization lowering) are gone in the rewrite I did in #114410. If anything, they should have been ConditionalSelect nodes so that they wouldn't have required transformation back to mask blend, but GT_SELECT ended up being more efficient since the result is a scalar anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be checking varTypeIsSmall? Same problem can happen with a byte compare.

Indeed. Fixed.

I added your byte test case (and simplified ushort case as well).

@BruceForstall
Copy link
Contributor Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Contributor Author

No diffs

@BruceForstall BruceForstall requested a review from EgorBo April 22, 2025 18:03
// However, there is code, such as for double->int conversions, that generates a 'double'
// base type mask for a TernaryLogic node with base type 'int'. That works because it only
// cares about the '0' elem, as the result will be cast to scalar.
var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this is quite correct. We only require that condition produce TYP_MASK which could include things like LCL_VAR (such as from some CSE'd node).

In general we should also likely be doing this for all scenarios and only making the swap if condition is TYP_MASK and if we know the base type that the mask produced.

Consider for example the following where the mask will be TYP_UINT but the TernaryLogic node is TYP_ULONG, so we'd end up ignoring half the mask bits.

var v0 = Vector128.Create<ulong>(x);
var v1 = Vector128.Create<ulong>(y);
var v2 = Vector128.Create<uint>(z);
var v3 = Vector128.Create<uint>(w);

var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512F.VL.CompareGreaterThanOrEqual(v2, v3).AsUInt64(), 0xE4);

This means that any scenario where the type of the mask is smaller than the size of the ternary logic node, we need to retype to the smaller type (as more mask bits are present). You also need to consider the other scenario, where if the mask is larger than the size of the ternary logic node we'd also need to retype (as fewer mask bits are present).

This is why normally we strictly require that maskSize == operSize. The case of a TernaryLogic representing ConditionalSelect is then special because it is bitwise and so equates to BlendVariableMask based on the base type of the mask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand the implications of your comment. Should we simply give up on this optimization converting TernaryLogic to BlendVariableMask if the TernaryLogic and mask sizes are not equal?

In your example above, are you saying that we need to retype to the minimum of the mask and TernaryLogic node operation size, since no matter which is larger, the upper bits are garbage?

This code I added is trying to choose the correct BlendVariableMask operation size. Maybe after #114410 it can do this unconditionally by using the operand mask size. But maybe you're suggesting it should be unconditional based on the minimum of the sizes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying that for TernaryLogic (CndSel in particular) -> BlendVariableMask you need to always take the base type of the condition, regardless of what the type of the TernaryLogic was

If the base type of the condition isn't known (i.e. it's a CSE'd TYP_MASK local), then we need to give up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I changed this code to unconditionally use the 'condition' base element type, and bail if the condition is not a HW intrinsic. This leads to some asm diffs due to the cast typing issue in the comment, previously, but the functionality should be identical, since in the casting case, no matter the opmask size, we only use the lower dword in the end.

e.g., diffs:

<        vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
>        vblendmps xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
...
<       vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
>       vblendmpd xmm0 {k1}, xmm0, qword ptr [reloc @RWD96] {1to2}

TernaryLogic nodes normalize small types to large types. When optimizing
a TernaryLogic node to a BlendVariableMask, if we leave the normalized
type, then the Blend uses the wrong instruction.

To fix this, change the Blend node to use the mask simd base type.
We only do this for mask base types that are small, which is
unsatisfying: there exists codegen today for double->int casts,
for example, that uses a 'double' type mask but an 'int' sized
TernaryLogic node. This works because we end up only using lane '0', as
the vector is converted to scalar. I didn't find any other case like
this, so hopefully if the mask type is small we can safely use it.

Fixes dotnet#114572
…nt type

But only if the condition is a HW intrinsic.
@BruceForstall BruceForstall merged commit 9df77ee into dotnet:main Apr 26, 2025
114 checks passed
@BruceForstall BruceForstall deleted the Fix114572 branch April 26, 2025 02:09
@github-actions github-actions bot locked and limited conversation to collaborators May 27, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: Debug/Release generate different results
3 participants