Skip to content

[PGO] Make the PGO instrumentation insert point after alloca #142043

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 30, 2025

Conversation

xur-llvm
Copy link
Contributor

We're changing PGO instrumentation to insert the intrinsic after alloca instructions. For sampled instrumentation, a conditional check is placed before the intrinsic. If this intrinsic comes before an alloca, the alloca (whose size might be unknown due to Phi node) becomes conditional, resulting in inefficient code. We have seen some stack overflows due to this.

This patch guarantees the intrinsic is always after the alloca.

@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels May 29, 2025
@llvmbot
Copy link
Member

llvmbot commented May 29, 2025

@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-transforms

Author: None (xur-llvm)

Changes

We're changing PGO instrumentation to insert the intrinsic after alloca instructions. For sampled instrumentation, a conditional check is placed before the intrinsic. If this intrinsic comes before an alloca, the alloca (whose size might be unknown due to Phi node) becomes conditional, resulting in inefficient code. We have seen some stack overflows due to this.

This patch guarantees the intrinsic is always after the alloca.


Full diff: https://github.com/llvm/llvm-project/pull/142043.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+4-4)
  • (added) llvm/test/Transforms/PGOProfile/entry_alloca.ll (+38)
  • (modified) llvm/test/Transforms/PGOProfile/split-indirectbr-critical-edges.ll (+1)
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index a063fb2ec3fe1..3347f3d376c94 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -855,7 +855,7 @@ BasicBlock *FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB(Edge *E) {
   auto canInstrument = [](BasicBlock *BB) -> BasicBlock * {
     // There are basic blocks (such as catchswitch) cannot be instrumented.
     // If the returned first insertion point is the end of BB, skip this BB.
-    if (BB->getFirstInsertionPt() == BB->end())
+    if (BB->getFirstNonPHIOrDbgOrAlloca() == BB->end())
       return nullptr;
     return BB;
   };
@@ -952,7 +952,7 @@ void FunctionInstrumenter::instrument() {
       Name, PointerType::get(M.getContext(), 0));
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
-    IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
+    IRBuilder<> Builder(&EntryBB, EntryBB.getFirstNonPHIOrDbgOrAlloca());
     // llvm.instrprof.cover(i8* <name>, i64 <hash>, i32 <num-counters>,
     //                      i32 <index>)
     Builder.CreateIntrinsic(
@@ -1010,7 +1010,7 @@ void FunctionInstrumenter::instrument() {
   if (PGOTemporalInstrumentation) {
     NumCounters += PGOBlockCoverage ? 8 : 1;
     auto &EntryBB = F.getEntryBlock();
-    IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
+    IRBuilder<> Builder(&EntryBB, EntryBB.getFirstNonPHIOrDbgOrAlloca());
     // llvm.instrprof.timestamp(i8* <name>, i64 <hash>, i32 <num-counters>,
     //                          i32 <index>)
     Builder.CreateIntrinsic(Intrinsic::instrprof_timestamp,
@@ -1021,7 +1021,7 @@ void FunctionInstrumenter::instrument() {
   }
 
   for (auto *InstrBB : InstrumentBBs) {
-    IRBuilder<> Builder(InstrBB, InstrBB->getFirstInsertionPt());
+    IRBuilder<> Builder(InstrBB, InstrBB->getFirstNonPHIOrDbgOrAlloca());
     assert(Builder.GetInsertPoint() != InstrBB->end() &&
            "Cannot get the Instrumentation point");
     // llvm.instrprof.increment(i8* <name>, i64 <hash>, i32 <num-counters>,
diff --git a/llvm/test/Transforms/PGOProfile/entry_alloca.ll b/llvm/test/Transforms/PGOProfile/entry_alloca.ll
new file mode 100644
index 0000000000000..bc06bd14d21ca
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/entry_alloca.ll
@@ -0,0 +1,38 @@
+; Note: Make sure that instrumention intrinsic is after entry alloca.
+;RUN: opt < %s -passes=pgo-instr-gen -S | FileCheck %s
+
+%struct.A = type { i32, [0 x i32] }
+
+; CHECK-LABEL: @foo()
+; CHECK-NEXT:   %1 = alloca %struct.A
+; CHECK-NEXT:   call void @llvm.instrprof.increment(ptr @__profn_foo
+; CHECK-NEXT:   call void @bar(ptr
+
+define dso_local i32 @foo() {
+  %1 = alloca %struct.A, align 4
+  call void @bar(ptr noundef nonnull %1) #3
+  %2 = load i32, ptr %1, align 4
+  %3 = icmp sgt i32 %2, 0
+  br i1 %3, label %4, label %15
+
+4:
+  %5 = getelementptr inbounds i8, ptr %1, i64 4
+  %6 = zext nneg i32 %2 to i64
+  br label %7
+
+7:
+  %8 = phi i64 [ 0, %4 ], [ %13, %7 ]
+  %9 = phi i32 [ 0, %4 ], [ %12, %7 ]
+  %10 = getelementptr inbounds [0 x i32], ptr %5, i64 0, i64 %8
+  %11 = load i32, ptr %10, align 4
+  %12 = add nsw i32 %11, %9
+  %13 = add nuw nsw i64 %8, 1
+  %14 = icmp eq i64 %13, %6
+  br i1 %14, label %15, label %7
+
+15:
+  %16 = phi i32 [ 0, %0 ], [ %12, %7 ]
+  ret i32 %16
+}
+
+declare void @bar(ptr noundef)
diff --git a/llvm/test/Transforms/PGOProfile/split-indirectbr-critical-edges.ll b/llvm/test/Transforms/PGOProfile/split-indirectbr-critical-edges.ll
index 8b92f8ccb51fb..cf022dd7c4e65 100644
--- a/llvm/test/Transforms/PGOProfile/split-indirectbr-critical-edges.ll
+++ b/llvm/test/Transforms/PGOProfile/split-indirectbr-critical-edges.ll
@@ -42,6 +42,7 @@ if.end:                                           ; preds = %if.end.preheader, %
 ;; The edge will not be profiled.
 ; CHECK-LABEL: @cannot_split(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    %targets = alloca <2 x ptr>, align 16
 ; CHECK-NEXT:    call void @llvm.instrprof.increment
 ; CHECK: indirect:
 ; CHECK-NOT:     call void @llvm.instrprof.increment

@xur-llvm xur-llvm requested review from david-xl and weiguozhi May 29, 2025 21:47
@@ -0,0 +1,38 @@
; Note: Make sure that instrumention intrinsic is after entry alloca.
;RUN: opt < %s -passes=pgo-instr-gen -S | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case with sampling on too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@xur-llvm xur-llvm force-pushed the users/xur-llvm/pgowork branch from 75769f2 to 834741e Compare May 29, 2025 22:48
We're changing PGO instrumentation to insert the intrinsic after
alloca instructions. For sampled instrumentation, a conditional
check is placed before the intrinsic. If this intrinsic comes
before an alloca, the alloca will be placed in the BB after entry.
In this case, the allocas will not be combined, resulting in
inefficient code. We have seen some stack overflows due to this.

This patch guarantees the intrinsic is always after the alloca.
@xur-llvm xur-llvm force-pushed the users/xur-llvm/pgowork branch from 834741e to c987f23 Compare May 30, 2025 21:00
@xur-llvm
Copy link
Contributor Author

Updated the commit message and used two allocas in the test (if we use llc, we can see that allocas are not combined without this patch).

@xur-llvm xur-llvm merged commit a004c70 into main May 30, 2025
6 of 10 checks passed
@xur-llvm xur-llvm deleted the users/xur-llvm/pgowork branch May 30, 2025 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants