[LV] Use vscale for tuning when updating profile information #143690

david-arm · 2025-06-11T12:17:48Z

In fixVectorizedLoop we call setProfileInfoAfterUnrolling to update the profile information after vectorising. However, for scalable VFs we pessimistically assume vscale=1. We can improve upon this by using the value of vscale used for tuning, e.g. when targeting neoverse-v1 the expected value is 2.
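
As a rough illustration of the estimate described above, the following standalone sketch mirrors the shape of getEstimatedRuntimeVF from the patch without depending on LLVM. SimpleElementCount and estimateRuntimeVF are hypothetical names used only for this example; the real code operates on llvm::ElementCount.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::ElementCount: a minimum lane count plus a
// flag saying whether the real count is that minimum multiplied by vscale.
struct SimpleElementCount {
  unsigned KnownMin;
  bool Scalable;
};

// Fixed-width VFs are known exactly at compile time. Scalable VFs are scaled
// by the tuning value of vscale when one is available; otherwise the estimate
// falls back to the known minimum, which is the old vscale=1 behaviour.
unsigned estimateRuntimeVF(SimpleElementCount VF,
                           std::optional<unsigned> VScale) {
  unsigned Estimated = VF.KnownMin;
  if (VF.Scalable && VScale)
    Estimated *= *VScale;
  assert(Estimated >= 1 && "Estimated VF shouldn't be less than 1");
  return Estimated;
}
```

Under these assumptions, a scalable VF with a known minimum of 4 and a tuning vscale of 2 is estimated as 8 lanes, while the same VF with no tuning value stays at 4. A fixed-width VF is returned unchanged regardless of vscale.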

llvmbot · 2025-06-11T12:18:28Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: David Sherwood (david-arm)

Changes

In fixVectorizedLoop we call setProfileInfoAfterUnrolling to update the profile information after vectorising. However, for scalable VFs we pessimistically assume vscale=1. We can improve upon this by using the value of vscale used for tuning, e.g. when targeting neoverse-v1 the expected value is 2.

Patch is 21.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143690.diff

3 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+17-17)
(added) llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll (+123)
(modified) llvm/test/Transforms/LoopVectorize/check-prof-info.ll (+135-25)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 333e50ee98418..eeea1cad6abff 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2688,6 +2688,20 @@ static void cse(BasicBlock *BB) {
   }
 }
 
+/// This function attempts to return a value that represents the vectorization
+/// factor at runtime. For fixed-width VFs we know this precisely at compile
+/// time, but for scalable VFs we calculate it based on an estimate of the
+/// vscale value.
+static unsigned getEstimatedRuntimeVF(ElementCount VF,
+                                      std::optional<unsigned> VScale) {
+  unsigned EstimatedVF = VF.getKnownMinValue();
+  if (VF.isScalable())
+    if (VScale)
+      EstimatedVF *= *VScale;
+  assert(EstimatedVF >= 1 && "Estimated VF shouldn't be less than 1");
+  return EstimatedVF;
+}
+
 InstructionCost
 LoopVectorizationCostModel::getVectorCallCost(CallInst *CI,
                                               ElementCount VF) const {
@@ -2787,10 +2801,10 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
   //
   // For scalable vectorization we can't know at compile time how many
   // iterations of the loop are handled in one vector iteration, so instead
-  // assume a pessimistic vscale of '1'.
+  // use the value of vscale used for tuning.
   Loop *VectorLoop = LI->getLoopFor(HeaderBB);
-  setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop,
-                               VF.getKnownMinValue() * UF);
+  unsigned VFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());
+  setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop, VFxUF);
 }
 
 void InnerLoopVectorizer::fixNonInductionPHIs(VPTransformState &State) {
@@ -4017,20 +4031,6 @@ ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
   return MaxVF;
 }
 
-/// This function attempts to return a value that represents the vectorization
-/// factor at runtime. For fixed-width VFs we know this precisely at compile
-/// time, but for scalable VFs we calculate it based on an estimate of the
-/// vscale value.
-static unsigned getEstimatedRuntimeVF(ElementCount VF,
-                                      std::optional<unsigned> VScale) {
-  unsigned EstimatedVF = VF.getKnownMinValue();
-  if (VF.isScalable())
-    if (VScale)
-      EstimatedVF *= *VScale;
-  assert(EstimatedVF >= 1 && "Estimated VF shouldn't be less than 1");
-  return EstimatedVF;
-}
-
 bool LoopVectorizationPlanner::isMoreProfitable(const VectorizationFactor &A,
                                                 const VectorizationFactor &B,
                                                 const unsigned MaxTripCount,
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
new file mode 100644
index 0000000000000..9661f1b3b6641
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
@@ -0,0 +1,123 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --filter "br" --filter "^.*:" --version 5
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -mcpu=neoverse-v1 -force-vector-interleave=1 -S < %s |  FileCheck %s -check-prefix=CHECK-V1-IC1
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -mcpu=neoverse-v2 -force-vector-interleave=1 -S < %s |  FileCheck %s -check-prefix=CHECK-V2-IC1
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -mcpu=neoverse-v2 -force-vector-interleave=4 -S < %s |  FileCheck %s -check-prefix=CHECK-V2-IC4
+
+target triple = "aarch64-unknown-linux-gnu"
+
+@a = dso_local global [1024 x i32] zeroinitializer, align 16
+@b = dso_local global [1024 x i32] zeroinitializer, align 16
+
+; Check correctness of profile info for vectorization without epilog.
+; Function Attrs: nofree norecurse nounwind uwtable
+define dso_local void @_Z3foov() local_unnamed_addr #0 {
+; CHECK-V1-IC1-LABEL: define dso_local void @_Z3foov(
+; CHECK-V1-IC1-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-V1-IC1:  [[ENTRY:.*:]]
+; CHECK-V1-IC1:    br i1 [[MIN_ITERS_CHECK:%.*]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK-V1-IC1:  [[VECTOR_PH]]:
+; CHECK-V1-IC1:    br label %[[VECTOR_BODY:.*]]
+; CHECK-V1-IC1:  [[VECTOR_BODY]]:
+; CHECK-V1-IC1:    br i1 [[TMP16:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF0]], !llvm.loop [[LOOP1:![0-9]+]]
+; CHECK-V1-IC1:  [[MIDDLE_BLOCK]]:
+; CHECK-V1-IC1:    br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF4:![0-9]+]]
+; CHECK-V1-IC1:  [[SCALAR_PH]]:
+; CHECK-V1-IC1:    br label %[[FOR_BODY:.*]]
+; CHECK-V1-IC1:  [[FOR_BODY]]:
+; CHECK-V1-IC1:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF5:![0-9]+]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-V1-IC1:  [[FOR_COND_CLEANUP]]:
+;
+; CHECK-V2-IC1-LABEL: define dso_local void @_Z3foov(
+; CHECK-V2-IC1-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-V2-IC1:  [[ENTRY:.*:]]
+; CHECK-V2-IC1:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK-V2-IC1:  [[VECTOR_PH]]:
+; CHECK-V2-IC1:    br label %[[VECTOR_BODY:.*]]
+; CHECK-V2-IC1:  [[VECTOR_BODY]]:
+; CHECK-V2-IC1:    br i1 [[TMP6:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-V2-IC1:  [[MIDDLE_BLOCK]]:
+; CHECK-V2-IC1:    br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC1:  [[SCALAR_PH]]:
+; CHECK-V2-IC1:    br label %[[FOR_BODY:.*]]
+; CHECK-V2-IC1:  [[FOR_BODY]]:
+; CHECK-V2-IC1:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF6:![0-9]+]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC1:  [[FOR_COND_CLEANUP]]:
+;
+; CHECK-V2-IC4-LABEL: define dso_local void @_Z3foov(
+; CHECK-V2-IC4-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-V2-IC4:  [[VEC_EPILOG_VECTOR_BODY1:.*:]]
+; CHECK-V2-IC4:    br i1 [[MIN_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH:.*]], label %[[VECTOR_MAIN_LOOP_ITER_CHECK:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK-V2-IC4:  [[VECTOR_MAIN_LOOP_ITER_CHECK]]:
+; CHECK-V2-IC4:    br i1 false, label %[[VEC_EPILOG_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF0]]
+; CHECK-V2-IC4:  [[VECTOR_PH]]:
+; CHECK-V2-IC4:    br label %[[VECTOR_BODY:.*]]
+; CHECK-V2-IC4:  [[VECTOR_BODY]]:
+; CHECK-V2-IC4:    br i1 [[TMP20:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-V2-IC4:  [[MIDDLE_BLOCK]]:
+; CHECK-V2-IC4:    br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC4:  [[VEC_EPILOG_ITER_CHECK]]:
+; CHECK-V2-IC4:    br i1 [[MIN_EPILOG_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF6:![0-9]+]]
+; CHECK-V2-IC4:  [[VEC_EPILOG_PH]]:
+; CHECK-V2-IC4:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
+; CHECK-V2-IC4:  [[VEC_EPILOG_VECTOR_BODY]]:
+; CHECK-V2-IC4:    br i1 [[TMP38:%.*]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC4:  [[VEC_EPILOG_MIDDLE_BLOCK]]:
+; CHECK-V2-IC4:    br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF8:![0-9]+]]
+; CHECK-V2-IC4:  [[VEC_EPILOG_SCALAR_PH]]:
+; CHECK-V2-IC4:    br label %[[FOR_BODY:.*]]
+; CHECK-V2-IC4:  [[FOR_BODY]]:
+; CHECK-V2-IC4:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; CHECK-V2-IC4:  [[FOR_COND_CLEANUP]]:
+;
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds [1024 x i32], ptr @b, i64 0, i64 %indvars.iv
+  %0 = load i32, ptr %arrayidx, align 4
+  %1 = trunc i64 %indvars.iv to i32
+  %mul = mul nsw i32 %0, %1
+  %arrayidx2 = getelementptr inbounds [1024 x i32], ptr @a, i64 0, i64 %indvars.iv
+  %2 = load i32, ptr %arrayidx2, align 4
+  %add = add nsw i32 %2, %mul
+  store i32 %add, ptr %arrayidx2, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp eq i64 %indvars.iv.next, 1024
+  br i1 %exitcond, label %for.cond.cleanup, label %for.body, !prof !0
+
+for.cond.cleanup:                                 ; preds = %for.body
+  ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 1023}
+;.
+; CHECK-V1-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK-V1-IC1: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK-V1-IC1: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-V1-IC1: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-V1-IC1: [[PROF4]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK-V1-IC1: [[PROF5]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V1-IC1: [[LOOP6]] = distinct !{[[LOOP6]], [[META3]], [[META2]]}
+;.
+; CHECK-V2-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK-V2-IC1: [[PROF1]] = !{!"branch_weights", i32 1, i32 255}
+; CHECK-V2-IC1: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC1: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-V2-IC1: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-V2-IC1: [[PROF5]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK-V2-IC1: [[PROF6]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC1: [[LOOP7]] = distinct !{[[LOOP7]], [[META4]], [[META3]]}
+;.
+; CHECK-V2-IC4: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK-V2-IC4: [[PROF1]] = !{!"branch_weights", i32 1, i32 63}
+; CHECK-V2-IC4: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC4: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-V2-IC4: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-V2-IC4: [[PROF5]] = !{!"branch_weights", i32 1, i32 15}
+; CHECK-V2-IC4: [[PROF6]] = !{!"branch_weights", i32 2, i32 0}
+; CHECK-V2-IC4: [[LOOP7]] = distinct !{[[LOOP7]], [[META3]], [[META4]]}
+; CHECK-V2-IC4: [[PROF8]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK-V2-IC4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC4: [[LOOP10]] = distinct !{[[LOOP10]], [[META4]], [[META3]]}
+;.
diff --git a/llvm/test/Transforms/LoopVectorize/check-prof-info.ll b/llvm/test/Transforms/LoopVectorize/check-prof-info.ll
index 17013c5908065..0e1e4dfecd1e6 100644
--- a/llvm/test/Transforms/LoopVectorize/check-prof-info.ll
+++ b/llvm/test/Transforms/LoopVectorize/check-prof-info.ll
@@ -1,6 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --filter "br" --filter "^.*:" --version 5
 ; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=1 -S < %s |  FileCheck %s
-; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=4 -S < %s |  FileCheck %s -check-prefix=CHECK-MASKED
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=4 -S < %s |  FileCheck %s -check-prefix=CHECK-IC4
+; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=1 \
+; RUN:   -scalable-vectorization=on -force-target-supports-scalable-vectors -S < %s |  FileCheck %s -check-prefix=CHECK-SCALABLE
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 
@@ -10,15 +12,53 @@ target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 ; Check correctness of profile info for vectorization without epilog.
 ; Function Attrs: nofree norecurse nounwind uwtable
 define dso_local void @_Z3foov() local_unnamed_addr #0 {
-; CHECK-LABEL: @_Z3foov(
-; CHECK:  [[VECTOR_BODY:vector\.body]]:
-; CHECK:    br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
-; CHECK:  [[FOR_BODY:for\.body]]:
-; CHECK:    br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
-; CHECK-MASKED:  [[VECTOR_BODY:vector\.body]]:
-; CHECK-MASKED:    br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
-; CHECK-MASKED:  [[FOR_BODY:for\.body]]:
-; CHECK-MASKED:    br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
+; CHECK-LABEL: define dso_local void @_Z3foov(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK:  [[ENTRY:.*:]]
+; CHECK:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2:![0-9]+]]
+; CHECK:  [[VECTOR_PH]]:
+; CHECK:    br label %[[VECTOR_BODY:.*]]
+; CHECK:  [[VECTOR_BODY]]:
+; CHECK:    br i1 [[TMP6:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK:  [[MIDDLE_BLOCK]]:
+; CHECK:    br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; CHECK:  [[SCALAR_PH]]:
+; CHECK:    br label %[[FOR_BODY:.*]]
+; CHECK:  [[FOR_COND_CLEANUP]]:
+; CHECK:  [[FOR_BODY]]:
+; CHECK:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
+;
+; CHECK-IC4-LABEL: define dso_local void @_Z3foov(
+; CHECK-IC4-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-IC4:  [[ENTRY:.*:]]
+; CHECK-IC4:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2:![0-9]+]]
+; CHECK-IC4:  [[VECTOR_PH]]:
+; CHECK-IC4:    br label %[[VECTOR_BODY:.*]]
+; CHECK-IC4:  [[VECTOR_BODY]]:
+; CHECK-IC4:    br i1 [[TMP18:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-IC4:  [[MIDDLE_BLOCK]]:
+; CHECK-IC4:    br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; CHECK-IC4:  [[SCALAR_PH]]:
+; CHECK-IC4:    br label %[[FOR_BODY:.*]]
+; CHECK-IC4:  [[FOR_COND_CLEANUP]]:
+; CHECK-IC4:  [[FOR_BODY]]:
+; CHECK-IC4:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
+;
+; CHECK-SCALABLE-LABEL: define dso_local void @_Z3foov(
+; CHECK-SCALABLE-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-SCALABLE:  [[ENTRY:.*:]]
+; CHECK-SCALABLE:    br i1 [[MIN_ITERS_CHECK:%.*]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2:![0-9]+]]
+; CHECK-SCALABLE:  [[VECTOR_PH]]:
+; CHECK-SCALABLE:    br label %[[VECTOR_BODY:.*]]
+; CHECK-SCALABLE:  [[VECTOR_BODY]]:
+; CHECK-SCALABLE:    br i1 [[TMP16:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-SCALABLE:  [[MIDDLE_BLOCK]]:
+; CHECK-SCALABLE:    br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; CHECK-SCALABLE:  [[SCALAR_PH]]:
+; CHECK-SCALABLE:    br label %[[FOR_BODY:.*]]
+; CHECK-SCALABLE:  [[FOR_COND_CLEANUP]]:
+; CHECK-SCALABLE:  [[FOR_BODY]]:
+; CHECK-SCALABLE:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
 ;
 entry:
   br label %for.body
@@ -44,15 +84,53 @@ for.body:                                         ; preds = %for.body, %entry
 ; Check correctness of profile info for vectorization with epilog.
 ; Function Attrs: nofree norecurse nounwind uwtable
 define dso_local void @_Z3foo2v() local_unnamed_addr #0 {
-; CHECK-LABEL: @_Z3foo2v(
-; CHECK:  [[VECTOR_BODY:vector\.body]]:
-; CHECK:    br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
-; CHECK:  [[FOR_BODY:for\.body]]:
-; CHECK:    br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
-; CHECK-MASKED:  [[VECTOR_BODY:vector\.body]]:
-; CHECK-MASKED:    br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
-; CHECK-MASKED:  [[FOR_BODY:for\.body]]:
-; CHECK-MASKED:    br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
+; CHECK-LABEL: define dso_local void @_Z3foo2v(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK:  [[ENTRY:.*:]]
+; CHECK:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2]]
+; CHECK:  [[VECTOR_PH]]:
+; CHECK:    br label %[[VECTOR_BODY:.*]]
+; CHECK:  [[VECTOR_BODY]]:
+; CHECK:    br i1 [[TMP6:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7]], !llvm.loop [[LOOP14:![0-9]+]]
+; CHECK:  [[MIDDLE_BLOCK]]:
+; CHECK:    br i1 false, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11]]
+; CHECK:  [[SCALAR_PH]]:
+; CHECK:    br label %[[FOR_BODY:.*]]
+; CHECK:  [[FOR_COND_CLEANUP]]:
+; CHECK:  [[FOR_BODY]]:
+; CHECK:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF15:![0-9]+]], !llvm.loop [[LOOP16:![0-9]+]]
+;
+; CHECK-IC4-LABEL: define dso_local void @_Z3foo2v(
+; CHECK-IC4-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-IC4:  [[ENTRY:.*:]]
+; CHECK-IC4:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2]]
+; CHECK-IC4:  [[VECTOR_PH]]:
+; CHECK-IC4:    br label %[[VECTOR_BODY:.*]]
+; CHECK-IC4:  [[VECTOR_BODY]]:
+; CHECK-IC4:    br i1 [[TMP18:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7]], !llvm.loop [[LOOP14:![0-9]+]]
+; CHECK-IC4:  [[MIDDLE_BLOCK]]:
+; CHECK-IC4:    br i1 false, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11]]
+; CHECK-IC4:  [[SCALAR_PH]]:
+; CHECK-IC4:    br label %[[FOR_BODY:.*]]
+; CHECK-IC4:  [[FOR_COND_CLEANUP]]:
+; CHECK-IC4:  [[FOR_BODY]]:
+; CHECK-IC4:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF15:![0-9]+]], !llvm.loop [[LOOP16:![0-9]+]]
+;
+; CHECK-SCALABLE-LABEL: define dso_local void @_Z3foo2v(
+; CHECK-SCALABLE-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-SCALABLE:  [[ENTRY:.*:]]
+; CHECK-SCALABLE:    br i1 [[MIN_ITERS_CHECK:%.*]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]], !prof [[PROF2]]
+; CHECK-SCALABLE:  [[VECTOR_PH]]:
+; CHECK-SCALABLE:    br label %[[VECTOR_BODY:.*]]
+; CHECK-SCALABLE:  [[VECTOR_BODY]]:
+; CHECK-SCALABLE:    br i1 [[TMP16:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF7]], !llvm.loop [[LOOP14:![0-9]+]]
+; CHECK-SCALABLE:  [[MIDDLE_BLOCK]]:
+; CHECK-SCALABLE:    br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF11]]
+; CHECK-SCALABLE:  [[SCALAR_PH]]:
+; CHECK-SCALABLE:    br label %[[FOR_BODY:.*]]
+; CHECK-SCALABLE:  [[FOR_COND_CLEANUP]]:
+; CHECK-SCALABLE:  [[FOR_BODY]]:
+; CHECK-SCALABLE:    br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF15:![0-9]+]], !llvm.loop [[LOOP16:![0-9]+]]
 ;
 entry:
   br label %for.body
@@ -80,11 +158,6 @@ attributes #0 = { "use-soft-float"="false" }
 !llvm.module.flags = !{!0}
 !llvm.ident = !{!1}
 
-; CHECK: [[LP1_255]] = !{!"branch_weights", i32 1, i32 255}
-; CHECK: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-MASKED: [[LP1_63]] = !{!"branch_weights", i32 1, i32 63}
-; CHECK-MASKED: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK: [[LP1_2]] = !{!"branch_weights", i32 1, i32 2}
 
 !0 = !{i32 1, !"wchar_size", i32 4}
 !1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project c292b5b5e059e6ce3e6449e6827ef7e1037c21c4)"}
@@ -94,3 +167,40 @@ attributes #0 = { "use-soft-float"="false" }
 !5 = !{!"Simple C++ TBAA"}
 !6 = !{!"branch_weights", i32 1, i32 1023}
 !7 = !{!"branch_weights", i32 1, i32 1026}
+;.
+; CHECK: [[PROF2]] = !{!"branch_weights", i32 1, i32 127}
+; CHECK: [[PROF7]] = !{!"branch_weights", i32 1, i32 255}
+; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META9:![0-9]+]], [[META10:![0-9]+]]}
+; CHECK: [[META9]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META10]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[PROF11]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK: [[PROF12]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK: [[LOOP13]] = distinct !{[[LOOP13]], [[META10]], [[META9]]}
+; C...
[truncated]

lukel97

LGTM

fhahn · 2025-06-11T13:52:31Z

llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll

+
+; Check correctness of profile info for vectorization without epilog.
+; Function Attrs: nofree norecurse nounwind uwtable
+define dso_local void @_Z3foov() local_unnamed_addr #0 {


Would be good to clean up the test a bit, dropping dso_local, local_unnamed_addr and #0.

david-arm · 2025-06-12

I just copied this from the existing test, so we should clean up the existing test as well. Happy to do that in this patch.

fhahn · 2025-06-11T13:53:49Z

llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll

+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]


Suggested change

-  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]

for consistency with other, newer tests

fhahn · 2025-06-11T13:53:51Z

llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll

+  %arrayidx = getelementptr inbounds [1024 x i32], ptr @b, i64 0, i64 %indvars.iv
+  %0 = load i32, ptr %arrayidx, align 4
+  %1 = trunc i64 %indvars.iv to i32
+  %mul = mul nsw i32 %0, %1
+  %arrayidx2 = getelementptr inbounds [1024 x i32], ptr @a, i64 0, i64 %indvars.iv
+  %2 = load i32, ptr %arrayidx2, align 4
+  %add = add nsw i32 %2, %mul
+  store i32 %add, ptr %arrayidx2, align 4


We don't really care about the body here right? Might be worth making it a bit simpler, maybe just with a store of the IV or load/store?

fhahn · 2025-06-11T13:54:15Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

  Loop *VectorLoop = LI->getLoopFor(HeaderBB);
-  setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop,
-                               VF.getKnownMinValue() * UF);
+  unsigned VFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());


Suggested change

-  unsigned VFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());
+  unsigned EstimatedVFxUF = getEstimatedRuntimeVF(VF * UF, Cost->getVScaleForTuning());

Might be worth updating the name as well?

fhahn · 2025-06-11T13:56:09Z

llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll

+@b = dso_local global [1024 x i32] zeroinitializer, align 16
+
+; Check correctness of profile info for vectorization without epilog.
+; Function Attrs: nofree norecurse nounwind uwtable


Suggested change

-; Function Attrs: nofree norecurse nounwind uwtable

fhahn · 2025-06-11T13:57:07Z

llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll

+@a = dso_local global [1024 x i32] zeroinitializer, align 16
+@b = dso_local global [1024 x i32] zeroinitializer, align 16
+
+; Check correctness of profile info for vectorization without epilog.


Might be good to spell out that we expect the branch weight computations to use vscale = 2 for neoverse-v1 and vscale = 1 for neoverse-v2?
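
To make the expected weights concrete, here is a simplified model of how an original trip-count estimate is divided between the vector loop and the scalar remainder. It is a sketch of the arithmetic only, not LLVM's setProfileInfoAfterUnrolling: the names Weights, weightsForTripCount and splitProfile are hypothetical, a single loop invocation is assumed, and a known-minimum VF of 4 is assumed for every RUN line (inferred from the checked weights rather than stated in the patch).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Branch weights as {exit weight, backedge weight}, the order used by the
// test's !prof metadata for a latch whose true edge leaves the loop.
using Weights = std::pair<uint64_t, uint64_t>;

// Simplified model: with one loop invocation, a trip count of N corresponds
// to weights {1, N - 1}; a loop that is never expected to run gets {0, 0}.
Weights weightsForTripCount(uint64_t TripCount) {
  if (TripCount == 0)
    return {0, 0};
  return {1, TripCount - 1};
}

struct SplitProfile {
  Weights VectorLoop;
  Weights ScalarRemainder;
};

// Split the original trip-count estimate given the estimated number of
// scalar iterations covered by one vector iteration (estimated VF * UF).
SplitProfile splitProfile(uint64_t OrigTripCount, uint64_t EstimatedVFxUF) {
  assert(EstimatedVFxUF != 0 && "Estimated VF * UF must be non-zero");
  return {weightsForTripCount(OrigTripCount / EstimatedVFxUF),
          weightsForTripCount(OrigTripCount % EstimatedVFxUF)};
}
```

With the test's original weights of 1:1023 (an estimated trip count of 1024), this model gives 1:127 for neoverse-v1 at interleave 1 (4 * 2 = 8 elements per vector iteration), 1:255 for neoverse-v2 at interleave 1 (4 elements) and 1:63 for neoverse-v2 at interleave 4 (16 elements). The scalar remainder is 0:0 in all three cases, which agrees with the branch_weights checked in the new AArch64 test.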

fhahn

LGTM, thanks!

david-arm requested review from fhahn, lukel97, hassnaaHamdi and paulwalker-arm June 11, 2025 12:17

llvmbot added vectorizers llvm:transforms labels Jun 11, 2025

lukel97 approved these changes Jun 11, 2025

fhahn reviewed Jun 11, 2025

david-arm force-pushed the prof_vscale branch from 7fbe672 to 0866c8a June 13, 2025 12:46

paulwalker-arm approved these changes Jun 13, 2025

fhahn reviewed Jun 14, 2025

david-arm merged commit a75e062 into llvm:main Jun 16, 2025
7 checks passed
