-
Notifications
You must be signed in to change notification settings - Fork 5k
Fix typing of BlendVariableMask condition when optimizing TernaryLogic #114837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a type mismatch issue in the BlendVariableMask optimization for TernaryLogic nodes by adjusting the SIMD base type used for the condition masks. It also adds a regression test to ensure the correctness of the optimization fix.
- Adjust normalization of TernaryLogic nodes to use the mask’s SIMD base type when the condition element type is small.
- Add a regression test that validates the behavior under specific vectorized scenarios.
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.cs | New regression test for TernaryLogic node behavior |
src/coreclr/jit/lowerxarch.cpp | Updates to adjust the SIMD base type for BlendVariableMask condition masks |
Files not reviewed (1)
- src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.csproj: Language not supported
Comments suppressed due to low confidence (1)
src/tests/JIT/Regression/JitBlue/Runtime_114572/Runtime_114572.cs:37
- Consider adding additional tests to cover cases where the condition mask has a type other than ushort (such as for double-to-int conversions) to ensure the new type adjustment logic is robust.
Assert.Equal(Vector256.Create((ushort)0, (ushort)1, ...), s_2);
src/coreclr/jit/lowerxarch.cpp
Outdated
var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType(); | ||
uint32_t conditionElemSize = genTypeSize(conditionBaseType); | ||
uint32_t elemSize = genTypeSize(simdBaseType); | ||
if (varTypeIsShort(conditionBaseType) && (conditionElemSize < elemSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider explicitly documenting why non-short condition types (e.g., for double-to-int conversion) are not adjusted here, or add handling for these cases if needed.
Copilot uses AI. Check for mistakes.
Looks like this optimization logic originated in #97468 |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
@tannergooding PTAL |
cc @dotnet/intel |
src/coreclr/jit/lowerxarch.cpp
Outdated
var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType(); | ||
uint32_t conditionElemSize = genTypeSize(conditionBaseType); | ||
uint32_t elemSize = genTypeSize(simdBaseType); | ||
if (varTypeIsShort(conditionBaseType) && (conditionElemSize < elemSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be checking varTypeIsSmall
? Same problem can happen with a byte compare.
Simplified repro:
public class Runtime_114572
{
public static byte s_ub = 1;
public static ushort s_us = 1;
public static void Problem1()
{
var v0 = Vector128<ushort>.Zero;
var v1 = Vector128<ushort>.One;
var vs = Vector128.CreateScalar(s_us);
var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512BW.VL.CompareGreaterThanOrEqual(v0, vs), 0xE4);
Console.WriteLine(vr);
}
public static void Problem2()
{
var v0 = Vector128<byte>.Zero;
var v1 = Vector128<byte>.One;
var vs = Vector128.CreateScalar(s_ub);
var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512BW.VL.CompareGreaterThanOrEqual(v0, vs), 0xE4);
Console.WriteLine(vr);
}
}
Though I'm not sure it's not always wrong to allow a larger blend when the condition element size was smaller.
Notably, the TernaryLogic nodes mentioned in the comment (from cast normalization lowering) are gone in the rewrite I did in #114410. If anything, they should have been ConditionalSelect nodes so that they wouldn't have required transformation back to mask blend, but GT_SELECT
ended up being more efficient since the result is a scalar anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be checking
varTypeIsSmall
? Same problem can happen with a byte compare.
Indeed. Fixed.
I added your byte test case (and simplified ushort case as well).
98c7dbe
to
8ad5895
Compare
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
src/coreclr/jit/lowerxarch.cpp
Outdated
// However, there is code, such as for double->int conversions, that generates a 'double' | ||
// base type mask for a TernaryLogic node with base type 'int'. That works because it only | ||
// cares about the '0' elem, as the result will be cast to scalar. | ||
var_types conditionBaseType = condition->AsHWIntrinsic()->GetSimdBaseType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this is quite correct. We only require that condition
produce TYP_MASK
which could include things like LCL_VAR
(such as from some CSE'd node).
In general we should also likely be doing this for all scenarios and only making the swap if condition is TYP_MASK
and if we know the base type that the mask produced.
Consider for example the following where the mask
will be TYP_UINT
but the TernaryLogic
node is TYP_ULONG
, so we'd end up ignoring half the mask bits.
var v0 = Vector128.Create<ulong>(x);
var v1 = Vector128.Create<ulong>(y);
var v2 = Vector128.Create<uint>(z);
var v3 = Vector128.Create<uint>(w);
var vr = Avx512F.VL.TernaryLogic(v0, v1, Avx512F.VL.CompareGreaterThanOrEqual(v2, v3).AsUInt64(), 0xE4);
This means that any scenario where the type of the mask is smaller than the size of the ternary logic node, we need to retype to the smaller type (as more mask bits are present). You also need to consider the other scenario, where if the mask is larger than the size of the ternary logic node we'd also need to retype (as fewer mask bits are present).
This is why normally we strictly require that maskSize == operSize
. The case of a TernaryLogic representing ConditionalSelect
is then special because it is bitwise and so equates to BlendVariableMask
based on the base type of the mask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the implications of your comment. Should we simply give up on this optimization converting TernaryLogic to BlendVariableMask if the TernaryLogic and mask sizes are not equal?
In your example above, are you saying that we need to retype to the minimum of the mask and TernaryLogic node operation size, since no matter which is larger, the upper bits are garbage?
This code I added is trying to choose the correct BlendVariableMask operation size. Maybe after #114410 it can do this unconditionally by using the operand mask size. But maybe you're suggesting it should be unconditional based on the minimum of the sizes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying that for TernaryLogic (CndSel in particular) -> BlendVariableMask
you need to always take the base type of the condition, regardless of what the type of the TernaryLogic
was
If the base type of the condition isn't known (i.e. it's a CSE'd TYP_MASK local), then we need to give up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I changed this code to unconditionally use the 'condition' base element type, and bail if the condition is not a HW intrinsic. This leads to some asm diffs due to the cast typing issue in the comment, previously, but the functionality should be identical, since in the casting case, no matter the opmask size, we only use the lower dword in the end.
e.g., diffs:
< vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
> vblendmps xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
...
< vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
> vblendmpd xmm0 {k1}, xmm0, qword ptr [reloc @RWD96] {1to2}
8ad5895
to
785427f
Compare
TernaryLogic nodes normalize small types to large types. When optimizing a TernaryLogic node to a BlendVariableMask, if we leave the normalized type, then the Blend uses the wrong instruction. To fix this, change the Blend node to use the mask simd base type. We only do this for mask base types that are small, which is unsatisfying: there exists codegen today for double->int casts, for example, that uses a 'double' type mask but an 'int' sized TernaryLogic node. This works because we end up only using lane '0', as the vector is converted to scalar. I didn't find any other case like this, so hopefully if the mask type is small we can safely use it. Fixes dotnet#114572
…nt type But only if the condition is a HW intrinsic.
785427f
to
88ccf95
Compare
TernaryLogic nodes normalize small types to large types. When optimizing a TernaryLogic node to a BlendVariableMask, if we leave the normalized type, then the Blend uses the wrong instruction.
To fix this, change the Blend node to use the mask simd base type. We only do this for mask base types that are small, which is unsatisfying: there exists codegen today for double->int casts, for example, that uses a 'double' type mask but an 'int' sized TernaryLogic node. This works because we end up only using lane '0', as the vector is converted to scalar. I didn't find any other case like this, so hopefully if the mask type is small we can safely use it.
Fixes #114572