[LV] Use vscale for tuning when updating profile information #143690
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-vectorizers
Author: David Sherwood (david-arm)
Changes
In fixVectorizedLoop we call setProfileInfoAfterUnrolling to update the profile information after vectorising; however, for scalable VFs we pessimistically assume vscale=1. We can improve upon this by using the value of vscale used for tuning, e.g. when targeting neoverse-v1 the expected value is 2.
Patch is 21.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143690.diff
3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 333e50ee98418..eeea1cad6abff 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2688,6 +2688,20 @@ static void cse(BasicBlock *BB) {
}
}
+/// This function attempts to return a value that represents the vectorization
+/// factor at runtime. For fixed-width VFs we know this precisely at compile
+/// time, but for scalable VFs we calculate it based on an estimate of the
+/// vscale value.
+static unsigned getEstimatedRuntimeVF(ElementCount VF,
+ std::optional<unsigned> VScale) {
+ unsigned EstimatedVF = VF.getKnownMinValue();
+ if (VF.isScalable())
+ if (VScale)
+ EstimatedVF *= *VScale;
+ assert(EstimatedVF >= 1 && "Estimated VF shouldn't be less than 1");
+ return EstimatedVF;
+}
+
InstructionCost
LoopVectorizationCostModel::getVectorCallCost(CallInst *CI,
ElementCount VF) const {
@@ -2787,10 +2801,10 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
//
// For scalable vectorization we can't know at compile time how many
// iterations of the loop are handled in one vector iteration, so instead
- // assume a pessimistic vscale of '1'.
+ // use the value of vscale used for tuning.
Loop *VectorLoop = LI->getLoopFor(HeaderBB);
- setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop,
- VF.getKnownMinValue() * UF);
+ unsigned VFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());
+ setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop, VFxUF);
}
void InnerLoopVectorizer::fixNonInductionPHIs(VPTransformState &State) {
@@ -4017,20 +4031,6 @@ ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
return MaxVF;
}
-/// This function attempts to return a value that represents the vectorization
-/// factor at runtime. For fixed-width VFs we know this precisely at compile
-/// time, but for scalable VFs we calculate it based on an estimate of the
-/// vscale value.
-static unsigned getEstimatedRuntimeVF(ElementCount VF,
- std::optional<unsigned> VScale) {
- unsigned EstimatedVF = VF.getKnownMinValue();
- if (VF.isScalable())
- if (VScale)
- EstimatedVF *= *VScale;
- assert(EstimatedVF >= 1 && "Estimated VF shouldn't be less than 1");
- return EstimatedVF;
-}
-
bool LoopVectorizationPlanner::isMoreProfitable(const VectorizationFactor &A,
const VectorizationFactor &B,
const unsigned MaxTripCount,
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
new file mode 100644
index 0000000000000..9661f1b3b6641
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
@@ -0,0 +1,123 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --filter "br" --filter "^.*:" --version 5
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -mcpu=neoverse-v1 -force-vector-interleave=1 -S < %s | FileCheck %s -check-prefix=CHECK-V1-IC1
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -mcpu=neoverse-v2 -force-vector-interleave=1 -S < %s | FileCheck %s -check-prefix=CHECK-V2-IC1
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -mcpu=neoverse-v2 -force-vector-interleave=4 -S < %s | FileCheck %s -check-prefix=CHECK-V2-IC4
+
+target triple = "aarch64-unknown-linux-gnu"
+
+@a = dso_local global [1024 x i32] zeroinitializer, align 16
+@b = dso_local global [1024 x i32] zeroinitializer, align 16
+
+; Check correctness of profile info for vectorization without epilog.
+; Function Attrs: nofree norecurse nounwind uwtable
+define dso_local void @_Z3foov() local_unnamed_addr #0 {
+; CHECK-V1-IC1-LABEL: define dso_local void @_Z3foov(
+; CHECK-V1-IC1-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-V1-IC1: [[ENTRY:.*:]]
+; CHECK-V1-IC1: br i1 [[MIN_ITERS_CHECK:%.*]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK-V1-IC1: [[VECTOR_PH]]:
+; CHECK-V1-IC1: br label %[[VECTOR_BODY:.*]]
+; CHECK-V1-IC1: [[VECTOR_BODY]]:
+; CHECK-V1-IC1: br i1 [[TMP16:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF0]], !llvm.loop [[LOOP1:![0-9]+]]
+; CHECK-V1-IC1: [[MIDDLE_BLOCK]]:
+; CHECK-V1-IC1: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF4:![0-9]+]]
+; CHECK-V1-IC1: [[SCALAR_PH]]:
+; CHECK-V1-IC1: br label %[[FOR_BODY:.*]]
+; CHECK-V1-IC1: [[FOR_BODY]]:
+; CHECK-V1-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF5:![0-9]+]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-V1-IC1: [[FOR_COND_CLEANUP]]:
+;
+; CHECK-V2-IC1-LABEL: define dso_local void @_Z3foov(
+; CHECK-V2-IC1-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-V2-IC1: [[ENTRY:.*:]]
+; CHECK-V2-IC1: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK-V2-IC1: [[VECTOR_PH]]:
+; CHECK-V2-IC1: br label %[[VECTOR_BODY:.*]]
+; CHECK-V2-IC1: [[VECTOR_BODY]]:
+; CHECK-V2-IC1: br i1 [[TMP6:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-V2-IC1: [[MIDDLE_BLOCK]]:
+; CHECK-V2-IC1: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC1: [[SCALAR_PH]]:
+; CHECK-V2-IC1: br label %[[FOR_BODY:.*]]
+; CHECK-V2-IC1: [[FOR_BODY]]:
+; CHECK-V2-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF6:![0-9]+]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC1: [[FOR_COND_CLEANUP]]:
+;
+; CHECK-V2-IC4-LABEL: define dso_local void @_Z3foov(
+; CHECK-V2-IC4-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-V2-IC4: [[VEC_EPILOG_VECTOR_BODY1:.*:]]
+; CHECK-V2-IC4: br i1 [[MIN_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH:.*]], label %[[VECTOR_MAIN_LOOP_ITER_CHECK:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK-V2-IC4: [[VECTOR_MAIN_LOOP_ITER_CHECK]]:
+; CHECK-V2-IC4: br i1 false, label %[[VEC_EPILOG_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF0]]
+; CHECK-V2-IC4: [[VECTOR_PH]]:
+; CHECK-V2-IC4: br label %[[VECTOR_BODY:.*]]
+; CHECK-V2-IC4: [[VECTOR_BODY]]:
+; CHECK-V2-IC4: br i1 [[TMP20:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-V2-IC4: [[MIDDLE_BLOCK]]:
+; CHECK-V2-IC4: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC4: [[VEC_EPILOG_ITER_CHECK]]:
+; CHECK-V2-IC4: br i1 [[MIN_EPILOG_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF6:![0-9]+]]
+; CHECK-V2-IC4: [[VEC_EPILOG_PH]]:
+; CHECK-V2-IC4: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
+; CHECK-V2-IC4: [[VEC_EPILOG_VECTOR_BODY]]:
+; CHECK-V2-IC4: br i1 [[TMP38:%.*]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC4: [[VEC_EPILOG_MIDDLE_BLOCK]]:
+; CHECK-V2-IC4: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF8:![0-9]+]]
+; CHECK-V2-IC4: [[VEC_EPILOG_SCALAR_PH]]:
+; CHECK-V2-IC4: br label %[[FOR_BODY:.*]]
+; CHECK-V2-IC4: [[FOR_BODY]]:
+; CHECK-V2-IC4: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; CHECK-V2-IC4: [[FOR_COND_CLEANUP]]:
+;
+entry:
+ br label %for.body
+
+for.body: ; preds = %for.body, %entry
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds [1024 x i32], ptr @b, i64 0, i64 %indvars.iv
+ %0 = load i32, ptr %arrayidx, align 4
+ %1 = trunc i64 %indvars.iv to i32
+ %mul = mul nsw i32 %0, %1
+ %arrayidx2 = getelementptr inbounds [1024 x i32], ptr @a, i64 0, i64 %indvars.iv
+ %2 = load i32, ptr %arrayidx2, align 4
+ %add = add nsw i32 %2, %mul
+ store i32 %add, ptr %arrayidx2, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond = icmp eq i64 %indvars.iv.next, 1024
+ br i1 %exitcond, label %for.cond.cleanup, label %for.body, !prof !0
+
+for.cond.cleanup: ; preds = %for.body
+ ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 1023}
+;.
+; CHECK-V1-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK-V1-IC1: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK-V1-IC1: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-V1-IC1: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-V1-IC1: [[PROF4]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK-V1-IC1: [[PROF5]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V1-IC1: [[LOOP6]] = distinct !{[[LOOP6]], [[META3]], [[META2]]}
+;.
+; CHECK-V2-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK-V2-IC1: [[PROF1]] = !{!"branch_weights", i32 1, i32 255}
+; CHECK-V2-IC1: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC1: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-V2-IC1: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-V2-IC1: [[PROF5]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK-V2-IC1: [[PROF6]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC1: [[LOOP7]] = distinct !{[[LOOP7]], [[META4]], [[META3]]}
+;.
+; CHECK-V2-IC4: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK-V2-IC4: [[PROF1]] = !{!"branch_weights", i32 1, i32 63}
+; CHECK-V2-IC4: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC4: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-V2-IC4: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-V2-IC4: [[PROF5]] = !{!"branch_weights", i32 1, i32 15}
+; CHECK-V2-IC4: [[PROF6]] = !{!"branch_weights", i32 2, i32 0}
+; CHECK-V2-IC4: [[LOOP7]] = distinct !{[[LOOP7]], [[META3]], [[META4]]}
+; CHECK-V2-IC4: [[PROF8]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK-V2-IC4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC4: [[LOOP10]] = distinct !{[[LOOP10]], [[META4]], [[META3]]}
+;.
diff --git a/llvm/test/Transforms/LoopVectorize/check-prof-info.ll b/llvm/test/Transforms/LoopVectorize/check-prof-info.ll
index 17013c5908065..0e1e4dfecd1e6 100644
--- a/llvm/test/Transforms/LoopVectorize/check-prof-info.ll
+++ b/llvm/test/Transforms/LoopVectorize/check-prof-info.ll
@@ -1,6 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --filter "br" --filter "^.*:" --version 5
; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=1 -S < %s | FileCheck %s
-; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=4 -S < %s | FileCheck %s -check-prefix=CHECK-MASKED
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=4 -S < %s | FileCheck %s -check-prefix=CHECK-IC4
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=1 \
+; RUN: -scalable-vectorization=on -force-target-supports-scalable-vectors -S < %s | FileCheck %s -check-prefix=CHECK-SCALABLE
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
@@ -10,15 +12,53 @@ target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
; Check correctness of profile info for vectorization without epilog.
; Function Attrs: nofree norecurse nounwind uwtable
define dso_local void @_Z3foov() local_unnamed_addr #0 {
-; CHECK-LABEL: @_Z3foov(
-; CHECK: [[VECTOR_BODY:vector\.body]]:
-; CHECK: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
-; CHECK: [[FOR_BODY:for\.body]]:
-; CHECK: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
-; CHECK-MASKED: [[VECTOR_BODY:vector\.body]]:
-; CHECK-MASKED: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
-; CHECK-MASKED: [[FOR_BODY:for\.body]]:
-; CHECK-MASKED: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
+; CHECK-LABEL: define dso_local void @_Z3foov(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK: [[ENTRY:.*:]]
+; CHECK: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2:![0-9]+]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK: br i1 [[TMP6:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK: br label %[[FOR_BODY:.*]]
+; CHECK: [[FOR_COND_CLEANUP]]:
+; CHECK: [[FOR_BODY]]:
+; CHECK: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
+;
+; CHECK-IC4-LABEL: define dso_local void @_Z3foov(
+; CHECK-IC4-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-IC4: [[ENTRY:.*:]]
+; CHECK-IC4: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2:![0-9]+]]
+; CHECK-IC4: [[VECTOR_PH]]:
+; CHECK-IC4: br label %[[VECTOR_BODY:.*]]
+; CHECK-IC4: [[VECTOR_BODY]]:
+; CHECK-IC4: br i1 [[TMP18:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-IC4: [[MIDDLE_BLOCK]]:
+; CHECK-IC4: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; CHECK-IC4: [[SCALAR_PH]]:
+; CHECK-IC4: br label %[[FOR_BODY:.*]]
+; CHECK-IC4: [[FOR_COND_CLEANUP]]:
+; CHECK-IC4: [[FOR_BODY]]:
+; CHECK-IC4: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
+;
+; CHECK-SCALABLE-LABEL: define dso_local void @_Z3foov(
+; CHECK-SCALABLE-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-SCALABLE: [[ENTRY:.*:]]
+; CHECK-SCALABLE: br i1 [[MIN_ITERS_CHECK:%.*]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2:![0-9]+]]
+; CHECK-SCALABLE: [[VECTOR_PH]]:
+; CHECK-SCALABLE: br label %[[VECTOR_BODY:.*]]
+; CHECK-SCALABLE: [[VECTOR_BODY]]:
+; CHECK-SCALABLE: br i1 [[TMP16:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-SCALABLE: [[MIDDLE_BLOCK]]:
+; CHECK-SCALABLE: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; CHECK-SCALABLE: [[SCALAR_PH]]:
+; CHECK-SCALABLE: br label %[[FOR_BODY:.*]]
+; CHECK-SCALABLE: [[FOR_COND_CLEANUP]]:
+; CHECK-SCALABLE: [[FOR_BODY]]:
+; CHECK-SCALABLE: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
;
entry:
br label %for.body
@@ -44,15 +84,53 @@ for.body: ; preds = %for.body, %entry
; Check correctness of profile info for vectorization with epilog.
; Function Attrs: nofree norecurse nounwind uwtable
define dso_local void @_Z3foo2v() local_unnamed_addr #0 {
-; CHECK-LABEL: @_Z3foo2v(
-; CHECK: [[VECTOR_BODY:vector\.body]]:
-; CHECK: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
-; CHECK: [[FOR_BODY:for\.body]]:
-; CHECK: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
-; CHECK-MASKED: [[VECTOR_BODY:vector\.body]]:
-; CHECK-MASKED: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
-; CHECK-MASKED: [[FOR_BODY:for\.body]]:
-; CHECK-MASKED: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
+; CHECK-LABEL: define dso_local void @_Z3foo2v(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK: [[ENTRY:.*:]]
+; CHECK: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK: br i1 [[TMP6:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7]], !llvm.loop [[LOOP14:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK: br i1 false, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK: br label %[[FOR_BODY:.*]]
+; CHECK: [[FOR_COND_CLEANUP]]:
+; CHECK: [[FOR_BODY]]:
+; CHECK: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF15:![0-9]+]], !llvm.loop [[LOOP16:![0-9]+]]
+;
+; CHECK-IC4-LABEL: define dso_local void @_Z3foo2v(
+; CHECK-IC4-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-IC4: [[ENTRY:.*:]]
+; CHECK-IC4: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2]]
+; CHECK-IC4: [[VECTOR_PH]]:
+; CHECK-IC4: br label %[[VECTOR_BODY:.*]]
+; CHECK-IC4: [[VECTOR_BODY]]:
+; CHECK-IC4: br i1 [[TMP18:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7]], !llvm.loop [[LOOP14:![0-9]+]]
+; CHECK-IC4: [[MIDDLE_BLOCK]]:
+; CHECK-IC4: br i1 false, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11]]
+; CHECK-IC4: [[SCALAR_PH]]:
+; CHECK-IC4: br label %[[FOR_BODY:.*]]
+; CHECK-IC4: [[FOR_COND_CLEANUP]]:
+; CHECK-IC4: [[FOR_BODY]]:
+; CHECK-IC4: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF15:![0-9]+]], !llvm.loop [[LOOP16:![0-9]+]]
+;
+; CHECK-SCALABLE-LABEL: define dso_local void @_Z3foo2v(
+; CHECK-SCALABLE-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-SCALABLE: [[ENTRY:.*:]]
+; CHECK-SCALABLE: br i1 [[MIN_ITERS_CHECK:%.*]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2]]
+; CHECK-SCALABLE: [[VECTOR_PH]]:
+; CHECK-SCALABLE: br label %[[VECTOR_BODY:.*]]
+; CHECK-SCALABLE: [[VECTOR_BODY]]:
+; CHECK-SCALABLE: br i1 [[TMP16:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7]], !llvm.loop [[LOOP14:![0-9]+]]
+; CHECK-SCALABLE: [[MIDDLE_BLOCK]]:
+; CHECK-SCALABLE: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11]]
+; CHECK-SCALABLE: [[SCALAR_PH]]:
+; CHECK-SCALABLE: br label %[[FOR_BODY:.*]]
+; CHECK-SCALABLE: [[FOR_COND_CLEANUP]]:
+; CHECK-SCALABLE: [[FOR_BODY]]:
+; CHECK-SCALABLE: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF15:![0-9]+]], !llvm.loop [[LOOP16:![0-9]+]]
;
entry:
br label %for.body
@@ -80,11 +158,6 @@ attributes #0 = { "use-soft-float"="false" }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
-; CHECK: [[LP1_255]] = !{!"branch_weights", i32 1, i32 255}
-; CHECK: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-MASKED: [[LP1_63]] = !{!"branch_weights", i32 1, i32 63}
-; CHECK-MASKED: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK: [[LP1_2]] = !{!"branch_weights", i32 1, i32 2}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project c292b5b5e059e6ce3e6449e6827ef7e1037c21c4)"}
@@ -94,3 +167,40 @@ attributes #0 = { "use-soft-float"="false" }
!5 = !{!"Simple C++ TBAA"}
!6 = !{!"branch_weights", i32 1, i32 1023}
!7 = !{!"branch_weights", i32 1, i32 1026}
+;.
+; CHECK: [[PROF2]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK: [[PROF7]] = !{!"branch_weights", i32 1, i32 255}
+; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META9:![0-9]+]], [[META10:![0-9]+]]}
+; CHECK: [[META9]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META10]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[PROF11]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK: [[PROF12]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK: [[LOOP13]] = distinct !{[[LOOP13]], [[META10]], [[META9]]}
+; C...
[truncated]
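The core of the change is the call to getEstimatedRuntimeVF when updating the profile. The sketch below is a standalone C++ model of that helper, assuming a hypothetical SimpleElementCount struct in place of llvm::ElementCount (the struct and the name estimateRuntimeVF are illustrative and not part of LLVM). It only shows how a tuning vscale, when one is available, scales the known minimum lane count.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::ElementCount: a known minimum lane count
// plus a flag saying whether the real count is that minimum times vscale.
struct SimpleElementCount {
  unsigned KnownMin;
  bool Scalable;
};

// Follows the logic of getEstimatedRuntimeVF from the diff: fixed-width VFs
// are returned unchanged, scalable VFs are multiplied by the tuning vscale
// when the target provides one.
static unsigned estimateRuntimeVF(SimpleElementCount VF,
                                  std::optional<unsigned> VScale) {
  unsigned EstimatedVF = VF.KnownMin;
  if (VF.Scalable && VScale)
    EstimatedVF *= *VScale;
  assert(EstimatedVF >= 1 && "Estimated VF shouldn't be less than 1");
  return EstimatedVF;
}
```

With a known minimum of 4 lanes and a tuning vscale of 2, the estimate is 8. Without a tuning value it falls back to the known minimum, which is the same result as the previous vscale=1 assumption.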
LGTM
; Check correctness of profile info for vectorization without epilog.
; Function Attrs: nofree norecurse nounwind uwtable
define dso_local void @_Z3foov() local_unnamed_addr #0 {
Would be good to clean up the test a bit, dropping dso_local, local_unnamed_addr, #0
I just copied this from the existing test, so we should clean up the existing test too. Happy to do that in this patch.
br label %for.body

for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
Suggested change:
- %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
for consistency with other, newer tests
%arrayidx = getelementptr inbounds [1024 x i32], ptr @b, i64 0, i64 %indvars.iv
%0 = load i32, ptr %arrayidx, align 4
%1 = trunc i64 %indvars.iv to i32
%mul = mul nsw i32 %0, %1
%arrayidx2 = getelementptr inbounds [1024 x i32], ptr @a, i64 0, i64 %indvars.iv
%2 = load i32, ptr %arrayidx2, align 4
%add = add nsw i32 %2, %mul
store i32 %add, ptr %arrayidx2, align 4
We don't really care about the body here, right? Might be worth making it a bit simpler, maybe just with a store of the IV or load/store?
Loop *VectorLoop = LI->getLoopFor(HeaderBB);
setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop,
VF.getKnownMinValue() * UF);
unsigned VFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());
Suggested change:
- unsigned VFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());
+ unsigned EstimatedVFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());
Might be worth updating the name as well?
@b = dso_local global [1024 x i32] zeroinitializer, align 16

; Check correctness of profile info for vectorization without epilog.
; Function Attrs: nofree norecurse nounwind uwtable
Suggested change:
- ; Function Attrs: nofree norecurse nounwind uwtable
@a = dso_local global [1024 x i32] zeroinitializer, align 16
@b = dso_local global [1024 x i32] zeroinitializer, align 16

; Check correctness of profile info for vectorization without epilog.
Might be good to spell out that we expect the branch weight computations to use vscale = 2 for neoverse-v1 and vscale = 1 for neoverse-v2?
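The weights checked in the AArch64 test can be reproduced by hand. The following is a simplified model, not LLVM's setProfileInfoAfterUnrolling: it assumes latch weights {1, N - 1} stand for an estimated trip count of N, and that the vector loop's trip count is the scalar one divided by the estimated VF x UF. It also assumes a minimum VF of 4 lanes for all three RUN lines, which is inferred from the expected weights rather than stated in the PR. The helper names are hypothetical.

```cpp
#include <cassert>
#include <utility>

// Simplified reading of latch branch weights {exit, backedge}: the loop is
// assumed to run backedge / exit + 1 times on average.
static unsigned estimatedTripCount(unsigned ExitWeight,
                                   unsigned BackedgeWeight) {
  assert(ExitWeight != 0 && "model needs a non-zero exit weight");
  return BackedgeWeight / ExitWeight + 1;
}

// Weights {exit, backedge} for the vector loop latch once each vector
// iteration covers EstimatedVFxUF scalar iterations.
static std::pair<unsigned, unsigned>
vectorLoopWeights(unsigned ScalarTripCount, unsigned EstimatedVFxUF) {
  assert(EstimatedVFxUF >= 1 && "estimate must be at least 1");
  unsigned VectorTripCount = ScalarTripCount / EstimatedVFxUF;
  if (VectorTripCount == 0)
    return {0, 0};
  return {1, VectorTripCount - 1};
}
```

The scalar latch weights {1, 1023} give an estimated trip count of 1024. Under the assumptions above, neoverse-v1 (vscale 2, interleave 1) covers 4 * 2 = 8 iterations per vector iteration and yields {1, 127}; neoverse-v2 (vscale 1) yields {1, 255} with interleave 1 and {1, 63} with interleave 4. These agree with the vector-body weights in the CHECK-V1-IC1, CHECK-V2-IC1 and CHECK-V2-IC4 lines.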
LGTM, thanks!