Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Account for dead FOR splice simplification in cost model #131486

Merged

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Mar 16, 2025

Fixes #131359

After #129645, a first-order recurrence will no longer have it's splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead.

The legacy cost model didn't account for this, so this accounts for it in planContainsAdditionalSimplifications to avoid the "VPlan cost model and legacy cost model disagreed" assertion.

Fixes llvm#131359

After llvm#129645, a first-order recurrence will no longer have it's splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead.

The legacy cost model didn't account for this, so update this to avoid the "VPlan cost model and legacy cost model disagreed" assertion.

Alternatively we could also account for this in planContainsAdditionalSimplifications
@llvmbot
Copy link
Member

llvmbot commented Mar 16, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Fixes #131359

After #129645, a first-order recurrence will no longer have it's splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead.

The legacy cost model didn't account for this, so update this to avoid the "VPlan cost model and legacy cost model disagreed" assertion.

Alternatively we could also account for this in planContainsAdditionalSimplifications


Full diff: https://github.com/llvm/llvm-project/pull/131486.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+5)
  • (added) llvm/test/Transforms/LoopVectorize/X86/pr131359.ll (+61)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 08e125eca591e..f6b91000a2e7b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6541,6 +6541,11 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
       // TODO: Consider vscale_range info.
       if (VF.isScalable() && VF.getKnownMinValue() == 1)
         return InstructionCost::getInvalid();
+      // If a FOR has no users inside the loop we won't generate a splice.
+      if (none_of(Phi->users(), [this](User *U) {
+            return TheLoop->contains(cast<Instruction>(U));
+          }))
+        return 0;
       SmallVector<int> Mask(VF.getKnownMinValue());
       std::iota(Mask.begin(), Mask.end(), VF.getKnownMinValue() - 1);
       return TTI.getShuffleCost(TargetTransformInfo::SK_Splice,
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr131359.ll b/llvm/test/Transforms/LoopVectorize/X86/pr131359.ll
new file mode 100644
index 0000000000000..2d0c0ce891ae7
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr131359.ll
@@ -0,0 +1,61 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -p loop-vectorize -S %s | FileCheck %s
+
+; Make sure the legacy cost model doesn't add a cost for a splice when the
+; first-order recurrence isn't used inside the loop. The VPlan cost model
+; eliminates the dead VPInstruction::FirstOrderRecurrenceSplice so the two cost
+; models would go out of sync otherwise.
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64"
+
+define void @h() {
+; CHECK-LABEL: define void @h() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 0>, %[[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[STEP_ADD]] = add <4 x i32> [[VEC_IND]], splat (i32 4)
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <4 x i32> [[STEP_ADD]], splat (i32 4)
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], 40
+; CHECK-NEXT:    br i1 [[TMP0]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[STEP_ADD]], i32 3
+; CHECK-NEXT:    br i1 false, label %[[F_EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 40, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    br label %[[FOR_COND_I:.*]]
+; CHECK:       [[FOR_COND_I]]:
+; CHECK-NEXT:    [[D_0_I:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], %[[SCALAR_PH]] ], [ [[E_0_I:%.*]], %[[FOR_COND_I]] ]
+; CHECK-NEXT:    [[E_0_I]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INC_I:%.*]], %[[FOR_COND_I]] ]
+; CHECK-NEXT:    [[INC_I]] = add i32 [[E_0_I]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[E_0_I]], 43
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT_I]], label %[[F_EXIT]], label %[[FOR_COND_I]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[F_EXIT]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.cond.i
+
+for.cond.i:
+  %d.0.i = phi i32 [ 0, %entry ], [ %e.0.i, %for.cond.i ]
+  %e.0.i = phi i32 [ 0, %entry ], [ %inc.i, %for.cond.i ]
+  %inc.i = add i32 %e.0.i, 1
+  %exitcond.not.i = icmp eq i32 %e.0.i, 43
+  br i1 %exitcond.not.i, label %f.exit, label %for.cond.i
+
+f.exit:
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.

@llvmbot
Copy link
Member

llvmbot commented Mar 16, 2025

@llvm/pr-subscribers-vectorizers

Author: Luke Lau (lukel97)

Changes

Fixes #131359

After #129645, a first-order recurrence will no longer have it's splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead.

The legacy cost model didn't account for this, so update this to avoid the "VPlan cost model and legacy cost model disagreed" assertion.

Alternatively we could also account for this in planContainsAdditionalSimplifications


Full diff: https://github.com/llvm/llvm-project/pull/131486.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+5)
  • (added) llvm/test/Transforms/LoopVectorize/X86/pr131359.ll (+61)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 08e125eca591e..f6b91000a2e7b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6541,6 +6541,11 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
       // TODO: Consider vscale_range info.
       if (VF.isScalable() && VF.getKnownMinValue() == 1)
         return InstructionCost::getInvalid();
+      // If a FOR has no users inside the loop we won't generate a splice.
+      if (none_of(Phi->users(), [this](User *U) {
+            return TheLoop->contains(cast<Instruction>(U));
+          }))
+        return 0;
       SmallVector<int> Mask(VF.getKnownMinValue());
       std::iota(Mask.begin(), Mask.end(), VF.getKnownMinValue() - 1);
       return TTI.getShuffleCost(TargetTransformInfo::SK_Splice,
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr131359.ll b/llvm/test/Transforms/LoopVectorize/X86/pr131359.ll
new file mode 100644
index 0000000000000..2d0c0ce891ae7
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr131359.ll
@@ -0,0 +1,61 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -p loop-vectorize -S %s | FileCheck %s
+
+; Make sure the legacy cost model doesn't add a cost for a splice when the
+; first-order recurrence isn't used inside the loop. The VPlan cost model
+; eliminates the dead VPInstruction::FirstOrderRecurrenceSplice so the two cost
+; models would go out of sync otherwise.
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64"
+
+define void @h() {
+; CHECK-LABEL: define void @h() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 0>, %[[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[STEP_ADD]] = add <4 x i32> [[VEC_IND]], splat (i32 4)
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <4 x i32> [[STEP_ADD]], splat (i32 4)
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], 40
+; CHECK-NEXT:    br i1 [[TMP0]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[STEP_ADD]], i32 3
+; CHECK-NEXT:    br i1 false, label %[[F_EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 40, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    br label %[[FOR_COND_I:.*]]
+; CHECK:       [[FOR_COND_I]]:
+; CHECK-NEXT:    [[D_0_I:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], %[[SCALAR_PH]] ], [ [[E_0_I:%.*]], %[[FOR_COND_I]] ]
+; CHECK-NEXT:    [[E_0_I]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INC_I:%.*]], %[[FOR_COND_I]] ]
+; CHECK-NEXT:    [[INC_I]] = add i32 [[E_0_I]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[E_0_I]], 43
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT_I]], label %[[F_EXIT]], label %[[FOR_COND_I]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[F_EXIT]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.cond.i
+
+for.cond.i:
+  %d.0.i = phi i32 [ 0, %entry ], [ %e.0.i, %for.cond.i ]
+  %e.0.i = phi i32 [ 0, %entry ], [ %inc.i, %for.cond.i ]
+  %inc.i = add i32 %e.0.i, 1
+  %exitcond.not.i = icmp eq i32 %e.0.i, 43
+  br i1 %exitcond.not.i, label %f.exit, label %for.cond.i
+
+f.exit:
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.

@@ -6541,6 +6541,11 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
// TODO: Consider vscale_range info.
if (VF.isScalable() && VF.getKnownMinValue() == 1)
return InstructionCost::getInvalid();
// If a FOR has no users inside the loop we won't generate a splice.
if (none_of(Phi->users(), [this](User *U) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves another case where we would crash, e.g. if the first-order recurrence is used by another instruction that can be removed. It may be enough to add a variant of the test where the FOR is used by another binary instruction? Actually, that simplification would already trigger planContainsAdditionalSimplifications.

Could we catch this FOR simplification in planContainsAdditionalSimplifications as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point, adding a dead binary op does indeed cause another assertion failure, I've added a test for it.

It looks like a dead use of the FOR isn't enough to trigger planContainsAdditionalSimplifications on its own because the legacy cost model will detect it as dead and return true for skipCostComputation.

So I've moved the check from the legacy cost model to planContainsAdditionalSimplifications to check for VPFirstOrderRecurrencePHIs without any VPInstruction::FirstOrderRecurrenceSplice uses.

@lukel97 lukel97 changed the title [VPlan] Don't cost FOR splice if unused in legacy cost model [VPlan] Account for dead FOR splice simplification in cost model Mar 17, 2025
Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@Mel-Chen Mel-Chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

; and won't be costed in the VPlan cost model. Make sure we account for this
; simplifcation in comparison to the legacy cost model.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we remove target datalayout?

Comment on lines 104 to 111
;.
; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
;.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use --check-globals=none to remove those change lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about that option, that's very handy. Thanks!

@@ -7467,6 +7467,16 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
}
continue;
}
// If a FOR's splice wasn't used it will have been removed, so the VPlan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be slightly more compact

Suggested change
// If a FOR's splice wasn't used it will have been removed, so the VPlan
// Unused FOR splices are removed by VPlan transforms, so the VPlan-based cost

@@ -0,0 +1,111 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth using a slightly more descriptive name for the file, e.g. `llvm/test/Transforms/LoopVectorize/X86/pr131359-dead-for-splice.ll

Comment on lines 46 to 47
%d.0.i = phi i32 [ 0, %entry ], [ %e.0.i, %for.cond.i ]
%e.0.i = phi i32 [ 0, %entry ], [ %inc.i, %for.cond.i ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be slightly clearer to use more descriptive names for phis.

Suggested change
%d.0.i = phi i32 [ 0, %entry ], [ %e.0.i, %for.cond.i ]
%e.0.i = phi i32 [ 0, %entry ], [ %inc.i, %for.cond.i ]
%for = phi i32 [ 0, %entry ], [ %e.0.i, %for.cond.i ]
%iv = phi i32 [ 0, %entry ], [ %iv.next, %for.cond.i ]

entry:
br label %for.cond.i

for.cond.i:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for.cond.i:
loop:

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ,thanks!

@lukel97 lukel97 merged commit eef5ea0 into llvm:main Mar 17, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants