[Inline] Accumulate the cost of the inlined function to the new call site #111104

dianqk · 2024-10-04T07:21:10Z

Consider the following IR, the call void @bar is a local host call site:

@define void foo() {
  ...
loop:
  call void @bar()
  ...
}

define void @bar() {
  ...
  call void @baz()
  ...
}

define void @baz() {
  ...
}

If foo can inline baz after inlined bar, I think it can also inline a bar that has inlined baz. With this in mind, I accumulated the previous cost into the new call site, which is a linearly increasing limit.

I did some experimenting on DianQK/perf/inline-extra-cost and the current thresholds look OK. I'm looking forward to getting some feedback before modifying this threshold to an option.

…site

llvmbot · 2024-10-04T07:21:45Z

@llvm/pr-subscribers-llvm-transforms

Author: DianQK (DianQK)

Changes

Fixes #111102.

Consider the following IR, the call void @bar is a local host call site:

@<!-- -->define void foo() {
  ...
loop:
  call void @<!-- -->bar()
  ...
}

define void @<!-- -->bar() {
  ...
  call void @<!-- -->baz()
  ...
}

define void @<!-- -->baz() {
  ...
}

If foo can inline baz after inlined bar, I think it can also inline a bar that has inlined baz. With this in mind, I accumulated the previous cost into the new call site, which is a linearly increasing limit.

I did some experimenting on DianQK/perf/inline-extra-cost and the current thresholds look OK. I'm looking forward to getting some feedback before modifying this threshold to an option.

Full diff: https://github.com/llvm/llvm-project/pull/111104.diff

8 Files Affected:

(modified) llvm/include/llvm/Analysis/InlineAdvisor.h (+9-3)
(modified) llvm/include/llvm/Analysis/InlineCost.h (+3)
(modified) llvm/lib/Analysis/InlineAdvisor.cpp (+4-2)
(modified) llvm/lib/Analysis/InlineCost.cpp (+5)
(modified) llvm/lib/Transforms/IPO/Inliner.cpp (+15)
(modified) llvm/lib/Transforms/IPO/ModuleInliner.cpp (+18-1)
(modified) llvm/test/Transforms/Inline/inline-history-noinline.ll (+1-1)
(added) llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll (+109)

diff --git a/llvm/include/llvm/Analysis/InlineAdvisor.h b/llvm/include/llvm/Analysis/InlineAdvisor.h
index 871a6e97861e29..2880553817ebd8 100644
--- a/llvm/include/llvm/Analysis/InlineAdvisor.h
+++ b/llvm/include/llvm/Analysis/InlineAdvisor.h
@@ -74,7 +74,8 @@ class InlineAdvisor;
 class InlineAdvice {
 public:
   InlineAdvice(InlineAdvisor *Advisor, CallBase &CB,
-               OptimizationRemarkEmitter &ORE, bool IsInliningRecommended);
+               OptimizationRemarkEmitter &ORE, bool IsInliningRecommended,
+               std::optional<int> InliningCost = std::nullopt);
 
   InlineAdvice(InlineAdvice &&) = delete;
   InlineAdvice(const InlineAdvice &) = delete;
@@ -108,6 +109,7 @@ class InlineAdvice {
 
   /// Get the inlining recommendation.
   bool isInliningRecommended() const { return IsInliningRecommended; }
+  std::optional<int> inliningCost() const { return InliningCost; }
   const DebugLoc &getOriginalCallSiteDebugLoc() const { return DLoc; }
   const BasicBlock *getOriginalCallSiteBasicBlock() const { return Block; }
 
@@ -129,6 +131,7 @@ class InlineAdvice {
   const BasicBlock *const Block;
   OptimizationRemarkEmitter &ORE;
   const bool IsInliningRecommended;
+  const std::optional<int> InliningCost;
 
 private:
   void markRecorded() {
@@ -145,8 +148,11 @@ class DefaultInlineAdvice : public InlineAdvice {
   DefaultInlineAdvice(InlineAdvisor *Advisor, CallBase &CB,
                       std::optional<InlineCost> OIC,
                       OptimizationRemarkEmitter &ORE, bool EmitRemarks = true)
-      : InlineAdvice(Advisor, CB, ORE, OIC.has_value()), OriginalCB(&CB),
-        OIC(OIC), EmitRemarks(EmitRemarks) {}
+      : InlineAdvice(Advisor, CB, ORE, OIC.has_value(),
+                     OIC && OIC->isVariable()
+                         ? std::optional<int>(OIC->getCost())
+                         : std::nullopt),
+        OriginalCB(&CB), OIC(OIC), EmitRemarks(EmitRemarks) {}
 
 private:
   void recordUnsuccessfulInliningImpl(const InlineResult &Result) override;
diff --git a/llvm/include/llvm/Analysis/InlineCost.h b/llvm/include/llvm/Analysis/InlineCost.h
index c5978ce54fc18b..1190308dba4193 100644
--- a/llvm/include/llvm/Analysis/InlineCost.h
+++ b/llvm/include/llvm/Analysis/InlineCost.h
@@ -58,6 +58,9 @@ const uint64_t MaxSimplifiedDynamicAllocaToInline = 65536;
 
 const char FunctionInlineCostMultiplierAttributeName[] =
     "function-inline-cost-multiplier";
+/// Cost of call site accumulation added after inlining.
+const char FunctionInlineAdditionalCostAttributeName[] =
+    "function-inline-additional-cost";
 
 const char MaxInlineStackSizeAttributeName[] = "inline-max-stacksize";
 } // namespace InlineConstants
diff --git a/llvm/lib/Analysis/InlineAdvisor.cpp b/llvm/lib/Analysis/InlineAdvisor.cpp
index c6907cb128bb47..9706dd212d7bc5 100644
--- a/llvm/lib/Analysis/InlineAdvisor.cpp
+++ b/llvm/lib/Analysis/InlineAdvisor.cpp
@@ -175,10 +175,12 @@ DefaultInlineAdvisor::getAdviceImpl(CallBase &CB) {
 
 InlineAdvice::InlineAdvice(InlineAdvisor *Advisor, CallBase &CB,
                            OptimizationRemarkEmitter &ORE,
-                           bool IsInliningRecommended)
+                           bool IsInliningRecommended,
+                           std::optional<int> InliningCost)
     : Advisor(Advisor), Caller(CB.getCaller()), Callee(CB.getCalledFunction()),
       DLoc(CB.getDebugLoc()), Block(CB.getParent()), ORE(ORE),
-      IsInliningRecommended(IsInliningRecommended) {}
+      IsInliningRecommended(IsInliningRecommended), InliningCost(InliningCost) {
+}
 
 void InlineAdvice::recordInlineStatsIfNeeded() {
   if (Advisor->ImportedFunctionsStats)
diff --git a/llvm/lib/Analysis/InlineCost.cpp b/llvm/lib/Analysis/InlineCost.cpp
index d2c329ba748e58..e45423f2130d8a 100644
--- a/llvm/lib/Analysis/InlineCost.cpp
+++ b/llvm/lib/Analysis/InlineCost.cpp
@@ -1017,6 +1017,11 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
             InlineConstants::FunctionInlineCostMultiplierAttributeName))
       Cost *= *AttrCostMult;
 
+    if (std::optional<int> AttrAdditonalCost = getStringFnAttrAsInt(
+            CandidateCall,
+            InlineConstants::FunctionInlineAdditionalCostAttributeName))
+      Cost += *AttrAdditonalCost;
+
     if (std::optional<int> AttrThreshold =
             getStringFnAttrAsInt(CandidateCall, "function-inline-threshold"))
       Threshold = *AttrThreshold;
diff --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp
index 23ee23eb047f58..977bbcd35f73e8 100644
--- a/llvm/lib/Transforms/IPO/Inliner.cpp
+++ b/llvm/lib/Transforms/IPO/Inliner.cpp
@@ -376,6 +376,11 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
           getStringFnAttrAsInt(
               *CB, InlineConstants::FunctionInlineCostMultiplierAttributeName)
               .value_or(1);
+      int CBInliningAdditionalCost =
+          getStringFnAttrAsInt(
+              *CB, InlineConstants::FunctionInlineAdditionalCostAttributeName)
+              .value_or(0);
+      std::optional<int> InliningCost = Advice->inliningCost();
 
       // Setup the data structure used to plumb customization into the
       // `InlineFunction` routine.
@@ -435,6 +440,16 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                     InlineConstants::FunctionInlineCostMultiplierAttributeName,
                     itostr(CBCostMult * IntraSCCCostMultiplier));
                 ICB->addFnAttr(NewCBCostMult);
+              } else if (InliningCost && *InliningCost > 0) {
+                // Similar to hot call site thresholds that can cause Inliner to
+                // inline numerous functions causing compile time issues, a
+                // linear accumulator was created to mitigate the problem.
+                Attribute NewCBAdditionalCost = Attribute::get(
+                    M.getContext(),
+                    InlineConstants::FunctionInlineAdditionalCostAttributeName,
+                    itostr(CBInliningAdditionalCost +
+                           (*InliningCost - CBInliningAdditionalCost) / 16));
+                ICB->addFnAttr(NewCBAdditionalCost);
               }
             }
           }
diff --git a/llvm/lib/Transforms/IPO/ModuleInliner.cpp b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
index dbc733826944b9..c196be9d6dd163 100644
--- a/llvm/lib/Transforms/IPO/ModuleInliner.cpp
+++ b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
@@ -225,6 +225,11 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
       Advice->recordUnattemptedInlining();
       continue;
     }
+    int CBInliningAdditionalCost =
+        getStringFnAttrAsInt(
+            *CB, InlineConstants::FunctionInlineAdditionalCostAttributeName)
+            .value_or(0);
+    std::optional<int> InliningCost = Advice->inliningCost();
 
     // Setup the data structure used to plumb customization into the
     // `InlineFunction` routine.
@@ -265,8 +270,20 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
               NewCallee = ICB->getCalledFunction();
         }
         if (NewCallee)
-          if (!NewCallee->isDeclaration())
+          if (!NewCallee->isDeclaration()) {
             Calls->push({ICB, NewHistoryID});
+            if (InliningCost && *InliningCost > 0) {
+              // Similar to hot call site thresholds that can cause Inliner to
+              // inline numerous functions causing compile time issues, a linear
+              // accumulator was created to mitigate the problem.
+              Attribute NewCBAdditionalCost = Attribute::get(
+                  M.getContext(),
+                  InlineConstants::FunctionInlineAdditionalCostAttributeName,
+                  itostr(CBInliningAdditionalCost +
+                         (*InliningCost - CBInliningAdditionalCost) / 16));
+              ICB->addFnAttr(NewCBAdditionalCost);
+            }
+          }
       }
     }
 
diff --git a/llvm/test/Transforms/Inline/inline-history-noinline.ll b/llvm/test/Transforms/Inline/inline-history-noinline.ll
index 742bd25ecd9bb9..fbe633fc3c797b 100644
--- a/llvm/test/Transforms/Inline/inline-history-noinline.ll
+++ b/llvm/test/Transforms/Inline/inline-history-noinline.ll
@@ -29,4 +29,4 @@ define internal void @a() {
   ret void
 }
 
-; CHECK: [[NOINLINE]] = { noinline }
+; CHECK: [[NOINLINE]] = { noinline {{.*}}}
diff --git a/llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll b/llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll
new file mode 100644
index 00000000000000..a1730d76cd547c
--- /dev/null
+++ b/llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll
@@ -0,0 +1,109 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; This tests that a hot callsite gets the (higher) inlinehint-threshold even without
+; without inline hints and gets inlined because the cost is less than
+; RUN: opt < %s -passes=inline -inline-threshold=0 -locally-hot-callsite-threshold=30 -S | FileCheck %s
+; RUN: opt < %s -passes=module-inline -inline-threshold=0 -locally-hot-callsite-threshold=30 -S | FileCheck %s
+
+; Due to the hot call site, foo0 inlined foo1, foo2, and foo3,
+; but foo4 is not inlined due to the accumulated cost.
+
+declare void @bar(ptr)
+
+define void @foo0(ptr %p) {
+; CHECK-LABEL: define void @foo0(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:  [[HEADER:.*:]]
+; CHECK-NEXT:    [[I_I2:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    [[I_I1:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    [[I_I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 4, ptr [[I_I]])
+; CHECK-NEXT:    call void @bar(ptr [[I_I]])
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 4, ptr [[I_I1]])
+; CHECK-NEXT:    call void @bar(ptr [[I_I1]])
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 4, ptr [[I_I2]])
+; CHECK-NEXT:    call void @bar(ptr [[I_I2]])
+; CHECK-NEXT:    call void @foo4(ptr [[P]]) #[[ATTR1:[0-9]+]]
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 4, ptr [[I_I2]])
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 4, ptr [[I_I1]])
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 4, ptr [[I_I]])
+; CHECK-NEXT:    br label %[[LOOP]]
+;
+header:
+  br label %loop
+
+loop:
+  call void @foo1(ptr %p)
+  br label %loop
+}
+
+define void @foo1(ptr %p) {
+; CHECK-LABEL: define void @foo1(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo2(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo2(ptr %p)
+  ret void
+}
+
+define void @foo2(ptr %p) {
+; CHECK-LABEL: define void @foo2(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo3(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo3(ptr %p)
+  ret void
+}
+
+define void @foo3(ptr %p) {
+; CHECK-LABEL: define void @foo3(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo4(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo4(ptr %p)
+  ret void
+}
+
+define void @foo4(ptr %p) {
+; CHECK-LABEL: define void @foo4(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo5(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo5(ptr %p)
+  ret void
+}
+
+define void @foo5(ptr %p) {
+; CHECK-LABEL: define void @foo5(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @bar(ptr %i)
+  ret void
+}

llvmbot · 2024-10-04T07:21:46Z

@llvm/pr-subscribers-llvm-analysis

Author: DianQK (DianQK)

Changes

Fixes #111102.

Consider the following IR, the call void @bar is a local host call site:

@<!-- -->define void foo() {
  ...
loop:
  call void @<!-- -->bar()
  ...
}

define void @<!-- -->bar() {
  ...
  call void @<!-- -->baz()
  ...
}

define void @<!-- -->baz() {
  ...
}

If foo can inline baz after inlined bar, I think it can also inline a bar that has inlined baz. With this in mind, I accumulated the previous cost into the new call site, which is a linearly increasing limit.

I did some experimenting on DianQK/perf/inline-extra-cost and the current thresholds look OK. I'm looking forward to getting some feedback before modifying this threshold to an option.

Full diff: https://github.com/llvm/llvm-project/pull/111104.diff

8 Files Affected:

(modified) llvm/include/llvm/Analysis/InlineAdvisor.h (+9-3)
(modified) llvm/include/llvm/Analysis/InlineCost.h (+3)
(modified) llvm/lib/Analysis/InlineAdvisor.cpp (+4-2)
(modified) llvm/lib/Analysis/InlineCost.cpp (+5)
(modified) llvm/lib/Transforms/IPO/Inliner.cpp (+15)
(modified) llvm/lib/Transforms/IPO/ModuleInliner.cpp (+18-1)
(modified) llvm/test/Transforms/Inline/inline-history-noinline.ll (+1-1)
(added) llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll (+109)

diff --git a/llvm/include/llvm/Analysis/InlineAdvisor.h b/llvm/include/llvm/Analysis/InlineAdvisor.h
index 871a6e97861e29..2880553817ebd8 100644
--- a/llvm/include/llvm/Analysis/InlineAdvisor.h
+++ b/llvm/include/llvm/Analysis/InlineAdvisor.h
@@ -74,7 +74,8 @@ class InlineAdvisor;
 class InlineAdvice {
 public:
   InlineAdvice(InlineAdvisor *Advisor, CallBase &CB,
-               OptimizationRemarkEmitter &ORE, bool IsInliningRecommended);
+               OptimizationRemarkEmitter &ORE, bool IsInliningRecommended,
+               std::optional<int> InliningCost = std::nullopt);
 
   InlineAdvice(InlineAdvice &&) = delete;
   InlineAdvice(const InlineAdvice &) = delete;
@@ -108,6 +109,7 @@ class InlineAdvice {
 
   /// Get the inlining recommendation.
   bool isInliningRecommended() const { return IsInliningRecommended; }
+  std::optional<int> inliningCost() const { return InliningCost; }
   const DebugLoc &getOriginalCallSiteDebugLoc() const { return DLoc; }
   const BasicBlock *getOriginalCallSiteBasicBlock() const { return Block; }
 
@@ -129,6 +131,7 @@ class InlineAdvice {
   const BasicBlock *const Block;
   OptimizationRemarkEmitter &ORE;
   const bool IsInliningRecommended;
+  const std::optional<int> InliningCost;
 
 private:
   void markRecorded() {
@@ -145,8 +148,11 @@ class DefaultInlineAdvice : public InlineAdvice {
   DefaultInlineAdvice(InlineAdvisor *Advisor, CallBase &CB,
                       std::optional<InlineCost> OIC,
                       OptimizationRemarkEmitter &ORE, bool EmitRemarks = true)
-      : InlineAdvice(Advisor, CB, ORE, OIC.has_value()), OriginalCB(&CB),
-        OIC(OIC), EmitRemarks(EmitRemarks) {}
+      : InlineAdvice(Advisor, CB, ORE, OIC.has_value(),
+                     OIC && OIC->isVariable()
+                         ? std::optional<int>(OIC->getCost())
+                         : std::nullopt),
+        OriginalCB(&CB), OIC(OIC), EmitRemarks(EmitRemarks) {}
 
 private:
   void recordUnsuccessfulInliningImpl(const InlineResult &Result) override;
diff --git a/llvm/include/llvm/Analysis/InlineCost.h b/llvm/include/llvm/Analysis/InlineCost.h
index c5978ce54fc18b..1190308dba4193 100644
--- a/llvm/include/llvm/Analysis/InlineCost.h
+++ b/llvm/include/llvm/Analysis/InlineCost.h
@@ -58,6 +58,9 @@ const uint64_t MaxSimplifiedDynamicAllocaToInline = 65536;
 
 const char FunctionInlineCostMultiplierAttributeName[] =
     "function-inline-cost-multiplier";
+/// Cost of call site accumulation added after inlining.
+const char FunctionInlineAdditionalCostAttributeName[] =
+    "function-inline-additional-cost";
 
 const char MaxInlineStackSizeAttributeName[] = "inline-max-stacksize";
 } // namespace InlineConstants
diff --git a/llvm/lib/Analysis/InlineAdvisor.cpp b/llvm/lib/Analysis/InlineAdvisor.cpp
index c6907cb128bb47..9706dd212d7bc5 100644
--- a/llvm/lib/Analysis/InlineAdvisor.cpp
+++ b/llvm/lib/Analysis/InlineAdvisor.cpp
@@ -175,10 +175,12 @@ DefaultInlineAdvisor::getAdviceImpl(CallBase &CB) {
 
 InlineAdvice::InlineAdvice(InlineAdvisor *Advisor, CallBase &CB,
                            OptimizationRemarkEmitter &ORE,
-                           bool IsInliningRecommended)
+                           bool IsInliningRecommended,
+                           std::optional<int> InliningCost)
     : Advisor(Advisor), Caller(CB.getCaller()), Callee(CB.getCalledFunction()),
       DLoc(CB.getDebugLoc()), Block(CB.getParent()), ORE(ORE),
-      IsInliningRecommended(IsInliningRecommended) {}
+      IsInliningRecommended(IsInliningRecommended), InliningCost(InliningCost) {
+}
 
 void InlineAdvice::recordInlineStatsIfNeeded() {
   if (Advisor->ImportedFunctionsStats)
diff --git a/llvm/lib/Analysis/InlineCost.cpp b/llvm/lib/Analysis/InlineCost.cpp
index d2c329ba748e58..e45423f2130d8a 100644
--- a/llvm/lib/Analysis/InlineCost.cpp
+++ b/llvm/lib/Analysis/InlineCost.cpp
@@ -1017,6 +1017,11 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
             InlineConstants::FunctionInlineCostMultiplierAttributeName))
       Cost *= *AttrCostMult;
 
+    if (std::optional<int> AttrAdditonalCost = getStringFnAttrAsInt(
+            CandidateCall,
+            InlineConstants::FunctionInlineAdditionalCostAttributeName))
+      Cost += *AttrAdditonalCost;
+
     if (std::optional<int> AttrThreshold =
             getStringFnAttrAsInt(CandidateCall, "function-inline-threshold"))
       Threshold = *AttrThreshold;
diff --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp
index 23ee23eb047f58..977bbcd35f73e8 100644
--- a/llvm/lib/Transforms/IPO/Inliner.cpp
+++ b/llvm/lib/Transforms/IPO/Inliner.cpp
@@ -376,6 +376,11 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
           getStringFnAttrAsInt(
               *CB, InlineConstants::FunctionInlineCostMultiplierAttributeName)
               .value_or(1);
+      int CBInliningAdditionalCost =
+          getStringFnAttrAsInt(
+              *CB, InlineConstants::FunctionInlineAdditionalCostAttributeName)
+              .value_or(0);
+      std::optional<int> InliningCost = Advice->inliningCost();
 
       // Setup the data structure used to plumb customization into the
       // `InlineFunction` routine.
@@ -435,6 +440,16 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                     InlineConstants::FunctionInlineCostMultiplierAttributeName,
                     itostr(CBCostMult * IntraSCCCostMultiplier));
                 ICB->addFnAttr(NewCBCostMult);
+              } else if (InliningCost && *InliningCost > 0) {
+                // Similar to hot call site thresholds that can cause Inliner to
+                // inline numerous functions causing compile time issues, a
+                // linear accumulator was created to mitigate the problem.
+                Attribute NewCBAdditionalCost = Attribute::get(
+                    M.getContext(),
+                    InlineConstants::FunctionInlineAdditionalCostAttributeName,
+                    itostr(CBInliningAdditionalCost +
+                           (*InliningCost - CBInliningAdditionalCost) / 16));
+                ICB->addFnAttr(NewCBAdditionalCost);
               }
             }
           }
diff --git a/llvm/lib/Transforms/IPO/ModuleInliner.cpp b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
index dbc733826944b9..c196be9d6dd163 100644
--- a/llvm/lib/Transforms/IPO/ModuleInliner.cpp
+++ b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
@@ -225,6 +225,11 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
       Advice->recordUnattemptedInlining();
       continue;
     }
+    int CBInliningAdditionalCost =
+        getStringFnAttrAsInt(
+            *CB, InlineConstants::FunctionInlineAdditionalCostAttributeName)
+            .value_or(0);
+    std::optional<int> InliningCost = Advice->inliningCost();
 
     // Setup the data structure used to plumb customization into the
     // `InlineFunction` routine.
@@ -265,8 +270,20 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
               NewCallee = ICB->getCalledFunction();
         }
         if (NewCallee)
-          if (!NewCallee->isDeclaration())
+          if (!NewCallee->isDeclaration()) {
             Calls->push({ICB, NewHistoryID});
+            if (InliningCost && *InliningCost > 0) {
+              // Similar to hot call site thresholds that can cause Inliner to
+              // inline numerous functions causing compile time issues, a linear
+              // accumulator was created to mitigate the problem.
+              Attribute NewCBAdditionalCost = Attribute::get(
+                  M.getContext(),
+                  InlineConstants::FunctionInlineAdditionalCostAttributeName,
+                  itostr(CBInliningAdditionalCost +
+                         (*InliningCost - CBInliningAdditionalCost) / 16));
+              ICB->addFnAttr(NewCBAdditionalCost);
+            }
+          }
       }
     }
 
diff --git a/llvm/test/Transforms/Inline/inline-history-noinline.ll b/llvm/test/Transforms/Inline/inline-history-noinline.ll
index 742bd25ecd9bb9..fbe633fc3c797b 100644
--- a/llvm/test/Transforms/Inline/inline-history-noinline.ll
+++ b/llvm/test/Transforms/Inline/inline-history-noinline.ll
@@ -29,4 +29,4 @@ define internal void @a() {
   ret void
 }
 
-; CHECK: [[NOINLINE]] = { noinline }
+; CHECK: [[NOINLINE]] = { noinline {{.*}}}
diff --git a/llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll b/llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll
new file mode 100644
index 00000000000000..a1730d76cd547c
--- /dev/null
+++ b/llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll
@@ -0,0 +1,109 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; This tests that a hot callsite gets the (higher) inlinehint-threshold even without
+; without inline hints and gets inlined because the cost is less than
+; RUN: opt < %s -passes=inline -inline-threshold=0 -locally-hot-callsite-threshold=30 -S | FileCheck %s
+; RUN: opt < %s -passes=module-inline -inline-threshold=0 -locally-hot-callsite-threshold=30 -S | FileCheck %s
+
+; Due to the hot call site, foo0 inlined foo1, foo2, and foo3,
+; but foo4 is not inlined due to the accumulated cost.
+
+declare void @bar(ptr)
+
+define void @foo0(ptr %p) {
+; CHECK-LABEL: define void @foo0(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:  [[HEADER:.*:]]
+; CHECK-NEXT:    [[I_I2:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    [[I_I1:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    [[I_I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 4, ptr [[I_I]])
+; CHECK-NEXT:    call void @bar(ptr [[I_I]])
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 4, ptr [[I_I1]])
+; CHECK-NEXT:    call void @bar(ptr [[I_I1]])
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 4, ptr [[I_I2]])
+; CHECK-NEXT:    call void @bar(ptr [[I_I2]])
+; CHECK-NEXT:    call void @foo4(ptr [[P]]) #[[ATTR1:[0-9]+]]
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 4, ptr [[I_I2]])
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 4, ptr [[I_I1]])
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 4, ptr [[I_I]])
+; CHECK-NEXT:    br label %[[LOOP]]
+;
+header:
+  br label %loop
+
+loop:
+  call void @foo1(ptr %p)
+  br label %loop
+}
+
+define void @foo1(ptr %p) {
+; CHECK-LABEL: define void @foo1(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo2(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo2(ptr %p)
+  ret void
+}
+
+define void @foo2(ptr %p) {
+; CHECK-LABEL: define void @foo2(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo3(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo3(ptr %p)
+  ret void
+}
+
+define void @foo3(ptr %p) {
+; CHECK-LABEL: define void @foo3(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo4(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo4(ptr %p)
+  ret void
+}
+
+define void @foo4(ptr %p) {
+; CHECK-LABEL: define void @foo4(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @foo5(ptr [[P]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @foo5(ptr %p)
+  ret void
+}
+
+define void @foo5(ptr %p) {
+; CHECK-LABEL: define void @foo5(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    call void @bar(ptr [[I]])
+; CHECK-NEXT:    ret void
+;
+  %i = alloca i32
+  call void @bar(ptr %i)
+  call void @bar(ptr %i)
+  ret void
+}

dtcxzyw · 2024-10-06T03:16:39Z

llvm/test/Transforms/Inline/inline-hot-callsite-limit.ll

@@ -0,0 +1,109 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; This tests that a hot callsite gets the (higher) inlinehint-threshold even without
+; without inline hints and gets inlined because the cost is less than


Suggested change

; without inline hints and gets inlined because the cost is less than

; inline hints and gets inlined because the cost is less than ???

aeubanks · 2024-10-10T17:55:07Z

I believe part of the design of the new inliner was that it should be idempotent. As in rerunning the inliner should not result in any changes. If you accumulate inline costs between function calls, that breaks idempotency. I'd want more justification besides a vague "the inliner inlined too much stuff in one case" to break that.

dianqk · 2024-10-13T13:33:43Z

I think I've done something similar to the function-inline-cost-multiplier. However, I believe that idempotent makes sense here. Other approaches I can think of are:

Lowering the inline threshold when the function itself contains plenty of instructions.
Not applying the hot call site threshold when there is a loop inside the function at the call site.

In fact, I believe the main reason for the increased compile time is that we've inlined nested loops multiple times.

dianqk added 8 commits October 1, 2024 20:35

[Inline] Accumulate the cost of the inlined function to the new call …

558f121

…site

Only apply when cost is positive

d133af8

Reduce extra cost by half

2081f59

Continue to reduce by half

b8db1f5

Continue to reduce by half

5900ba5

Continue to reduce by half

5287a53

Rename Extra to Aditional

e6e7dd9

Add a test case

f3517f7

llvmbot added llvm:analysis llvm:transforms labels Oct 4, 2024

dianqk requested review from aeubanks, nikic and dtcxzyw October 4, 2024 07:22

dtcxzyw reviewed Oct 6, 2024

View reviewed changes

dtcxzyw requested a review from mtrofin October 6, 2024 03:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Inline] Accumulate the cost of the inlined function to the new call site #111104

[Inline] Accumulate the cost of the inlined function to the new call site #111104

dianqk commented Oct 4, 2024

llvmbot commented Oct 4, 2024

llvmbot commented Oct 4, 2024

dtcxzyw Oct 6, 2024

aeubanks commented Oct 10, 2024

dianqk commented Oct 13, 2024

	; without inline hints and gets inlined because the cost is less than
	; inline hints and gets inlined because the cost is less than ???

[Inline] Accumulate the cost of the inlined function to the new call site #111104

Are you sure you want to change the base?

[Inline] Accumulate the cost of the inlined function to the new call site #111104

Conversation

dianqk commented Oct 4, 2024

llvmbot commented Oct 4, 2024

llvmbot commented Oct 4, 2024

dtcxzyw Oct 6, 2024

Choose a reason for hiding this comment

aeubanks commented Oct 10, 2024

dianqk commented Oct 13, 2024