Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopInterchange] Prevent from undoing its own transformation #127473

Merged
merged 1 commit into from
Mar 21, 2025

Conversation

kasuga-fj
Copy link
Contributor

@kasuga-fj kasuga-fj commented Feb 17, 2025

LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops A and B, it calls isProfitable(A, B) to determine whether these loops should be exchanged. The result of isProfitable(A, B) should be conservative, that is, it should return true only when we are sure exchanging A and B will improve performance. If it's not conservative enough, it's possible that a subsequent isProfitable(B, A) will also return true, in which case LoopInterchange will undo its previous transformation. To avoid such cases, isProfitable(B, A) must not return true if isProfitable(A, B) returned true in the past. However, the current implementation can be in such a situation. This patch resolves it by correcting the handling of two loops that have the same cache cost.

This resolve the problem mentioned in #118267 (comment).

@llvmbot
Copy link
Member

llvmbot commented Feb 17, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

LoopInterchange uses the bubble-sort fashion algorithm to sort the loops, but the comparison function (called isProfitable) didn't satisfy asymmetry. This means that both isProfitable(a, b) and isProfitable(b, a) can return true, triggering an unprofitable interchange. This patch fixes the problem and prevents the interchange from performing unprofitable transformations.


Full diff: https://github.com/llvm/llvm-project/pull/127473.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+89-43)
  • (added) llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll (+80)
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index f45d90ff13e14..faa4e22d0d161 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -356,26 +356,24 @@ class LoopInterchangeLegality {
   SmallVector<PHINode *, 8> InnerLoopInductions;
 };
 
+using CostMapTy = DenseMap<const Loop *, std::pair<unsigned, CacheCostTy>>;
+
 /// LoopInterchangeProfitability checks if it is profitable to interchange the
 /// loop.
 class LoopInterchangeProfitability {
 public:
   LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
-                               OptimizationRemarkEmitter *ORE)
-      : OuterLoop(Outer), InnerLoop(Inner), SE(SE), ORE(ORE) {}
+                               OptimizationRemarkEmitter *ORE, const std::optional<CostMapTy> &CM)
+      : OuterLoop(Outer), InnerLoop(Inner), SE(SE), ORE(ORE), CostMap(CM) {}
 
   /// Check if the loop interchange is profitable.
   bool isProfitable(const Loop *InnerLoop, const Loop *OuterLoop,
                     unsigned InnerLoopId, unsigned OuterLoopId,
-                    CharMatrix &DepMatrix,
-                    const DenseMap<const Loop *, unsigned> &CostMap,
-                    std::unique_ptr<CacheCost> &CC);
+                    CharMatrix &DepMatrix);
 
 private:
   int getInstrOrderCost();
-  std::optional<bool> isProfitablePerLoopCacheAnalysis(
-      const DenseMap<const Loop *, unsigned> &CostMap,
-      std::unique_ptr<CacheCost> &CC);
+  std::optional<bool> isProfitablePerLoopCacheAnalysis();
   std::optional<bool> isProfitablePerInstrOrderCost();
   std::optional<bool> isProfitableForVectorization(unsigned InnerLoopId,
                                                    unsigned OuterLoopId,
@@ -388,6 +386,8 @@ class LoopInterchangeProfitability {
 
   /// Interface to emit optimization remarks.
   OptimizationRemarkEmitter *ORE;
+
+  const std::optional<CostMapTy> &CostMap;
 };
 
 /// LoopInterchangeTransform interchanges the loop.
@@ -497,11 +497,13 @@ struct LoopInterchange {
     // indicates the loop should be placed as the innermost loop.
     //
     // For the old pass manager CacheCost would be null.
-    DenseMap<const Loop *, unsigned> CostMap;
+    std::optional<CostMapTy> CostMap = std::nullopt;
     if (CC != nullptr) {
+      CostMap = CostMapTy();
       const auto &LoopCosts = CC->getLoopCosts();
       for (unsigned i = 0; i < LoopCosts.size(); i++) {
-        CostMap[LoopCosts[i].first] = i;
+        const auto &Cost = LoopCosts[i];
+        (*CostMap)[Cost.first] = std::make_pair(i, Cost.second);
       }
     }
     // We try to achieve the globally optimal memory access for the loopnest,
@@ -537,7 +539,7 @@ struct LoopInterchange {
   bool processLoop(Loop *InnerLoop, Loop *OuterLoop, unsigned InnerLoopId,
                    unsigned OuterLoopId,
                    std::vector<std::vector<char>> &DependencyMatrix,
-                   const DenseMap<const Loop *, unsigned> &CostMap) {
+                   const std::optional<CostMapTy> &CostMap) {
     LLVM_DEBUG(dbgs() << "Processing InnerLoopId = " << InnerLoopId
                       << " and OuterLoopId = " << OuterLoopId << "\n");
     LoopInterchangeLegality LIL(OuterLoop, InnerLoop, SE, ORE);
@@ -546,9 +548,8 @@ struct LoopInterchange {
       return false;
     }
     LLVM_DEBUG(dbgs() << "Loops are legal to interchange\n");
-    LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE);
-    if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId,
-                          DependencyMatrix, CostMap, CC)) {
+    LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE, CostMap);
+    if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId, DependencyMatrix)) {
       LLVM_DEBUG(dbgs() << "Interchanging loops not profitable.\n");
       return false;
     }
@@ -1127,29 +1128,60 @@ int LoopInterchangeProfitability::getInstrOrderCost() {
 }
 
 std::optional<bool>
-LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis(
-    const DenseMap<const Loop *, unsigned> &CostMap,
-    std::unique_ptr<CacheCost> &CC) {
+LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis() {
   // This is the new cost model returned from loop cache analysis.
   // A smaller index means the loop should be placed an outer loop, and vice
   // versa.
-  if (CostMap.contains(InnerLoop) && CostMap.contains(OuterLoop)) {
-    unsigned InnerIndex = 0, OuterIndex = 0;
-    InnerIndex = CostMap.find(InnerLoop)->second;
-    OuterIndex = CostMap.find(OuterLoop)->second;
-    LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
-                      << ", OuterIndex = " << OuterIndex << "\n");
-    if (InnerIndex < OuterIndex)
-      return std::optional<bool>(true);
-    assert(InnerIndex != OuterIndex && "CostMap should assign unique "
-                                       "numbers to each loop");
-    if (CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop))
-      return std::nullopt;
-    return std::optional<bool>(false);
-  }
-  return std::nullopt;
+  if (!CostMap.has_value())
+    return std::nullopt;
+
+  auto InnerIte = CostMap->find(InnerLoop);
+  auto OuterIte = CostMap->find(OuterLoop);
+  if (InnerIte == CostMap->end() || OuterIte == CostMap->end())
+    return std::nullopt;
+
+  const auto &[InnerIndex, InnerCost] = InnerIte->second;
+  const auto &[OuterIndex, OuterCost] = OuterIte->second;
+  LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
+                    << ", OuterIndex = " << OuterIndex << "\n");
+  assert(InnerIndex != OuterIndex && "CostMap should assign unique "
+                                     "numbers to each loop");
+
+  if (InnerCost == OuterCost)
+    return std::nullopt;
+
+  return InnerIndex < OuterIndex;
 }
 
+// This function doesn't satisfy transitivity. Consider the following case.
+//
+// ```
+// for (int k = 0; k < N; k++) {
+//   for (int j = 0; j < N; j++) {
+//     for (int i = 0; i < N; i++) {
+//       dst0[i][j][k] += aa[i][j] + bb[i][j] + cc[j][k];
+//       dst1[k][j][i] += dd[i][j] + ee[i][j] + ff[j][k];
+//     }
+//   }
+// }
+//
+// ```
+//
+// The getInstrOrderCost will return the following value.
+//
+//  Outer | Inner | Cost
+// -------+-------+------
+//    k   |   j   |  -2
+//    j   |   i   |  -4
+//    k   |   i   |   0
+//
+// This means that this function says interchanging (k, j) loops and (j, i)
+// loops are profitable, but not (k, i). The root cause of this is that the
+// getInstrOrderCost only see the loops we are checking. We can resolve this if
+// we also consider the order going through other inductions. As for the above
+// case, we can induce that interchanging `k` and `i` is profitable (it is
+// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
+// However, such accurate calculation is expensive, so that we don't do it.
 std::optional<bool>
 LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
   // Legacy cost model: this is rough cost estimation algorithm. It counts the
@@ -1184,11 +1216,25 @@ std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
   return std::optional<bool>(!DepMatrix.empty());
 }
 
+// The bubble-sort fashion algorithm is adopted to sort the loop nest, so the
+// comparison function should ideally induce a strict weak ordering required by
+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
+// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
+// isProfitable(a, c) is true.
+//
+// The most important thing is not to make unprofitable interchange. From this
+// point of view, asymmetry is important. This is because if both
+// isProfitable(a, b) and isProfitable(b, a) are true, then an unprofitable
+// transformation (one of them) will be performed. On the other hand, a lack of
+// transitivity might cause some optimization opportunities to be lost, but
+// won't trigger an unprofitable one. Moreover, guaranteeing transitivity is
+// expensive. Therefore, isProfitable only holds the asymmetry.
 bool LoopInterchangeProfitability::isProfitable(
     const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,
-    unsigned OuterLoopId, CharMatrix &DepMatrix,
-    const DenseMap<const Loop *, unsigned> &CostMap,
-    std::unique_ptr<CacheCost> &CC) {
+    unsigned OuterLoopId, CharMatrix &DepMatrix) {
   // isProfitable() is structured to avoid endless loop interchange.
   // If loop cache analysis could decide the profitability then,
   // profitability check will stop and return the analysis result.
@@ -1197,15 +1243,14 @@ bool LoopInterchangeProfitability::isProfitable(
   // profitable for InstrOrderCost. Likewise, if InstrOrderCost failed to
   // analysis the profitability then only, isProfitableForVectorization
   // will decide.
-  std::optional<bool> shouldInterchange =
-      isProfitablePerLoopCacheAnalysis(CostMap, CC);
-  if (!shouldInterchange.has_value()) {
-    shouldInterchange = isProfitablePerInstrOrderCost();
-    if (!shouldInterchange.has_value())
-      shouldInterchange =
+  std::optional<bool> ShouldInterchange = isProfitablePerLoopCacheAnalysis();
+  if (!ShouldInterchange.has_value()) {
+    ShouldInterchange = isProfitablePerInstrOrderCost();
+    if (!ShouldInterchange.has_value())
+      ShouldInterchange =
           isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
   }
-  if (!shouldInterchange.has_value()) {
+  if (!ShouldInterchange.has_value()) {
     ORE->emit([&]() {
       return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",
                                       InnerLoop->getStartLoc(),
@@ -1214,7 +1259,8 @@ bool LoopInterchangeProfitability::isProfitable(
                 "interchange.";
     });
     return false;
-  } else if (!shouldInterchange.value()) {
+  }
+  if (!ShouldInterchange.value()) {
     ORE->emit([&]() {
       return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",
                                       InnerLoop->getStartLoc(),
diff --git a/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll b/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll
new file mode 100644
index 0000000000000..55056dca15d3e
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll
@@ -0,0 +1,80 @@
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=1 -pass-remarks-output=%t -disable-output \
+;           -verify-dom-info -verify-loop-info
+; RUN: FileCheck -input-file %t %s
+
+
+; Test that the same pair of loops are not interchanged twice. This is the case
+; when the cost computed by CacheCost is the same for the loop of `j` and `k`.
+;
+; #define N 4
+; int a[N*N][N*N][N*N];
+; void f() {
+;   for (int i = 0; i < N; i++)
+;     for (int j = 1; j < 2*N; j++)
+;       for (int k = 1; k < 2*N; k++)
+;         a[i][k+1][j-1] -= a[i+N-1][k][j];
+; }
+
+; CHECK:      --- !Passed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Interchanged
+; CHECK-NEXT: Function:        f
+; CHECK-NEXT: Args:
+; CHECK-NEXT:    - String:          Loop interchanged with enclosing loop.
+; CHECK-NEXT: ...
+; CHECK-NEXT: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Dependence
+; CHECK-NEXT: Function:        f
+; CHECK-NEXT: Args:
+; CHECK-NEXT:  - String:       Cannot interchange loops due to dependences.
+; CHECK-NEXT: ...
+; CHECK-NEXT: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            InterchangeNotProfitable
+; CHECK-NEXT: Function:        f
+; CHECK-NEXT: Args:
+; CHECK-NEXT:   - String:          Interchanging loops is not considered to improve cache locality nor vectorization.
+; CHECK-NEXT: ...
+
+@a = dso_local local_unnamed_addr global [16 x [16 x [16 x i32]]] zeroinitializer, align 4
+
+define dso_local void @f() {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:
+  %indvars.iv46 = phi i64 [ 0, %entry ], [ %indvars.iv.next47, %for.cond.cleanup3 ]
+  %0 = add nuw nsw i64 %indvars.iv46, 3
+  br label %for.cond5.preheader
+
+for.cond5.preheader:
+  %indvars.iv41 = phi i64 [ 1, %for.cond1.preheader ], [ %indvars.iv.next42, %for.cond.cleanup7 ]
+  %1 = add nsw i64 %indvars.iv41, -1
+  br label %for.body8
+
+for.cond.cleanup3:
+  %indvars.iv.next47 = add nuw nsw i64 %indvars.iv46, 1
+  %exitcond50 = icmp ne i64 %indvars.iv.next47, 4
+  br i1 %exitcond50, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7:
+  %indvars.iv.next42 = add nuw nsw i64 %indvars.iv41, 1
+  %exitcond45 = icmp ne i64 %indvars.iv.next42, 8
+  br i1 %exitcond45, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8:
+  %indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
+  %arrayidx12 = getelementptr inbounds nuw [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %0, i64 %indvars.iv, i64 %indvars.iv41
+  %2 = load i32, ptr %arrayidx12, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %arrayidx20 = getelementptr inbounds [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %indvars.iv46, i64 %indvars.iv.next, i64 %1
+  %3 = load i32, ptr %arrayidx20, align 4
+  %sub21 = sub nsw i32 %3, %2
+  store i32 %sub21, ptr %arrayidx20, align 4
+  %exitcond = icmp ne i64 %indvars.iv.next, 8
+  br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
+
+for.cond.cleanup:
+  ret void
+}

Copy link

github-actions bot commented Feb 17, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kasuga-fj kasuga-fj force-pushed the loop-interchange-profitable branch from a08068a to 45a8373 Compare February 18, 2025 07:45
@@ -388,6 +387,8 @@ class LoopInterchangeProfitability {

/// Interface to emit optimization remarks.
OptimizationRemarkEmitter *ORE;

const std::optional<CostMapTy> &CostMap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: a comment explaining what this is exactly capturing would be good. I.e., it's a map of:

Loop -> [index, cost]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -497,11 +498,13 @@ struct LoopInterchange {
// indicates the loop should be placed as the innermost loop.
//
// For the old pass manager CacheCost would be null.
DenseMap<const Loop *, unsigned> CostMap;
std::optional<CostMapTy> CostMap = std::nullopt;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've also added a CostMap to the LoopInterchangeProfitability class, it's not yet clear to me if we need both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CostMap in LoopInterchangeProfitability is a reference to this object. I just changed the way it is passed: from the argument of isProfitable to the ctor.
I think we should clean up the code in this area. For example, the arguments InnerLoop and OuterLoop of isProfitable look redundant. They would be the same as what is passed by the ctor.

// we also consider the order going through other inductions. As for the above
// case, we can induce that interchanging `k` and `i` is profitable (it is
// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
// However, such accurate calculation is expensive, so that we don't do it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this example and comment, this really helps!

One remark about the last statement:

However, such accurate calculation is expensive, so that we don't do it.

Can you add where this leaves us, what we do instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this comment because it is not related to the purpose of this patch. If it is helpful, I will add it back as another PR.

// some standard C++ libraries. In particular, isProfitable should hold the
// following properties.
//
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question on the assymetry: does this really need to hold? Can it not be the case that isProfitable(a, b) and isProfitable(b, a) are both profitable, or both neutral, or both not really good? I haven't tried to come up with examples, but I thought about this as a way of saying the cost-model saying "don't know"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the asymmetry is necessary.

The isProfitable returns bool, but the profitability check is actually std::optional<bool>, which is expressed by ShouldInterchange. This means that the cost-model has three types of results: "should interchange them", "should NOT interchange them", and "don't know". The nullopt is squashed into false, so not isProfitable(a, b) could mean either "should NOT" or "don't know".

To answer your question, there are cases where both isProfitable(a, b) and isProfitable(b, a) become false. Asymmetry doesn't say anything about this case, it requires that they cannot both be true. In this case cost-model may be saying "I don't know both", or "I don't know one, but we should not perform the other". Also, I think we should design the cost-model so that it never says both are profitable, and that the asymmetry says. At least as far as we use the bubble sort fashion algorithm, it can cause the same pair of loops to be interchanged more than twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My use of mathematical terms was incorrect. What I wanted to argue here is the following part in the comment #127473 (review) by @Meinersbur .

LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough).

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the Summary:

This patch fixes the problem and prevents the interchange from performing unprofitable transformations.

Please eloborate on how this PR does this. Due to refactorings in the diff I haven't yet found what the actual change is.

LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough). However, this is a greedy algorithm on neighboring loops, properties that need to take the entire loop nest into consideration is beyond this scope. Feel free to replace the greedy algorithm, but you cannot ensure global properties with just local information1.

Footnotes

  1. Well, it is the case for some problems, such as minimal spanning tree. Program performance is propbably not one of them.

Comment on lines 1158 to 1186
// This function doesn't satisfy transitivity. Consider the following case.
//
// ```
// for (int k = 0; k < N; k++) {
// for (int j = 0; j < N; j++) {
// for (int i = 0; i < N; i++) {
// dst0[i][j][k] += aa[i][j] + bb[i][j] + cc[j][k];
// dst1[k][j][i] += dd[i][j] + ee[i][j] + ff[j][k];
// }
// }
// }
//
// ```
//
// The getInstrOrderCost will return the following value.
//
// Outer | Inner | Cost
// -------+-------+------
// k | j | -2
// j | i | -4
// k | i | 0
//
// This means that this function says interchanging (k, j) loops and (j, i)
// loops are profitable, but not (k, i). The root cause of this is that the
// getInstrOrderCost only see the loops we are checking. We can resolve this if
// we also consider the order going through other inductions. As for the above
// case, we can induce that interchanging `k` and `i` is profitable (it is
// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
// However, such accurate calculation is expensive, so that we don't do it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making your comments on of:

  1. Doygen comment at the function's declaration documenting its usage
  2. Comment inside the function definition documenting the implementation

Comment on lines 1227 to 1228
// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
// isProfitable(a, c) is true.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think transitivity needs to apply. Loop interchange's bubble sort is a greedy algorithm: Make local decision in the hopes that it improves the global result. In particular, it depends on the order of a,b,c in the loop nest. After exchanging a and b, exchanging b and c might not be profitable anymot

for c:
  for a:
    for b:
      A[b][a]
      B[c]

Interchanging a with b is profitable (making A[b][a] consecutive). Interchanging b with c as well (making B[c] consecutive)1. But after interchanging a and b, interchanging b and c is not necessarily profitable: it destroys that the entire A[b][a] is accessed consecutively.

Of course there are more concerns than conscecutive accesses, such as data reuse. But this is not how isProfitable: It consists of multiple single concern decision-making rules, applying the one with the higherst priority. They do not (and cannot) have a global view on what the eventual best loop order would be. If that would be the case, we should compute that order in advance, then apply bubble sort until we reach that order. That wouln't be a greedy algorithm anymore though.

Footnotes

  1. You may argue the question is malformed: b and c are not neighbors, they cannot even be interchanged.

Copy link
Contributor Author

@kasuga-fj kasuga-fj Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't intend to require transitivity for LoopInterchange. I just wrote a property that the comparison function of a comparison sort holds, but that LoopInterchangeProfitability::isProfitable does not. But I still think that the transitivity should be satisfied if it is possible, under the assumption that the profitability decisions don't change depending on the current order of the nest. Your case looks to me the case where the profitability changes after interchanging, where in my understanding the decision for b and c doesn't change before and after interchanging a and b in the current implementation.

// some standard C++ libraries. In particular, isProfitable should hold the
// following properties.
//
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that interchange(a, b) is the same as interchange(b, a), the relation whether the two should be exchanged is symmetric, not asymmetric. What you are looking for is "areLoopsInTheRightOrder", independent of what the current order is.

Also consider that isProfitable(a, b) and isProfitable(a, b) on the interchanged loops are not the same input. The first is a query on

for a:
  for b:

the second is a query on:

for b':
  for a':

a and a', as well as b, b' are not the same loops. They look very different to isProfitablePerLoopCacheAnalysis, isProfitablePerInstrOrderCost, and isProfitableForVectorization.

Copy link
Contributor Author

@kasuga-fj kasuga-fj Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I want to argue here is that for all f in the three single rules (isProfitablePerLoopCacheAnalysis, isProfitablePerInstrOrderCost, or isProfitableForVectorization) the following property holds regardless of the current loop order

If f(a, b) is true, then f(b, a) is false (or nullopt).

This patch fixes the problem that isProfitablePerLoopCacheAnalysis didn't satisfy it. I believe this is the same as what you call areLoopsInTheRightOrder.

Copy link
Member

@Meinersbur Meinersbur Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If f(a, b) is true, then f(b, a) is false (or nullopt).

The mathematical terms do not match what you want to say.

  1. isProfitable requires the inner loop to be the first argument, the second is the outer loop. If a is the outer loop, then isProfitable(b,a) is undefined.

  2. In order for f to be a function, all inputs must be in its arguments, namly the LLVM-IR, for f R ( a , b ) . After interchange having been applied, R has changed. You get a new function f R ( a , b ) . Even if the a (and b) are considered "equal" in some sense, the underlying LLVM-IR has changed, and with it the function result. One could say make relationship claims between f R and f R , but its not asymmetry or transitivity, since those are about a single relation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, my use of mathematical terms is incorrect. What I want is like

  • f R ( a , b ) is true then f R ( b , a ) is false, where
    • a is the inner loop and b is the outer one in R .
    • R is what some interchanges are applied to R so that the order of a and b is exchanged.

I will fix the comment. So, do you think the changes of this patch is correct to accomplish that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you want to express the "transivity" part? That is, that after a series of swaps you get back to the original order, without directly undoing a previous swap?

There is no need to use math for this, unless you want to get into formal proofs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the comments about transitivity. IIUIC, the transitivity doesn't affect what I want to fix in this patch.

The reason I mentioned the transitivity is that I was initially thinking about the properties that the comparison function of a comparison sort must have in general. It may affect the performance improvements, but is different from the topic of this PR.

@kasuga-fj kasuga-fj changed the title [LoopInterchange] Stop performing unprofitable interchange [LoopInterchange] Prevent from undoing its own transformation Feb 18, 2025
Copy link
Contributor Author

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description was really bad. This patch fixes the issue that LoopInterchange can undo its optimization, as I described in #118267 (comment). I don't intend to archive the global optimal order. Fixed the PR title and description.

// some standard C++ libraries. In particular, isProfitable should hold the
// following properties.
//
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
Copy link
Contributor Author

@kasuga-fj kasuga-fj Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I want to argue here is that for all f in the three single rules (isProfitablePerLoopCacheAnalysis, isProfitablePerInstrOrderCost, or isProfitableForVectorization) the following property holds regardless of the current loop order

If f(a, b) is true, then f(b, a) is false (or nullopt).

This patch fixes the problem that isProfitablePerLoopCacheAnalysis didn't satisfy it. I believe this is the same as what you call areLoopsInTheRightOrder.

Comment on lines 1227 to 1228
// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
// isProfitable(a, c) is true.
Copy link
Contributor Author

@kasuga-fj kasuga-fj Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't intend to require transitivity for LoopInterchange. I just wrote a property that the comparison function of a comparison sort holds, but that LoopInterchangeProfitability::isProfitable does not. But I still think that the transitivity should be satisfied if it is possible, under the assumption that the profitability decisions don't change depending on the current order of the nest. Your case looks to me the case where the profitability changes after interchanging, where in my understanding the decision for b and c doesn't change before and after interchanging a and b in the current implementation.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move the refactorings/cleanups out of this patch? I am still confused what CostMap actually does.

// some standard C++ libraries. In particular, isProfitable should hold the
// following properties.
//
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you want to express the "transivity" part? That is, that after a series of swaps you get back to the original order, without directly undoing a previous swap?

There is no need to use math for this, unless you want to get into formal proofs.

Copy link
Contributor Author

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. I erased the off topic comments on this patch.

The problem is the current behavior of isProfitablePerLoopCacheAnalysis when CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop). It returns true if InnerIndex < OuterIndex, but otherwise returns nullopt, which may trigger to undo a past interchange.

@@ -388,6 +387,8 @@ class LoopInterchangeProfitability {

/// Interface to emit optimization remarks.
OptimizationRemarkEmitter *ORE;

const std::optional<CostMapTy> &CostMap;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -497,11 +498,13 @@ struct LoopInterchange {
// indicates the loop should be placed as the innermost loop.
//
// For the old pass manager CacheCost would be null.
DenseMap<const Loop *, unsigned> CostMap;
std::optional<CostMapTy> CostMap = std::nullopt;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CostMap in LoopInterchangeProfitability is a reference to this object. I just changed the way it is passed: from the argument of isProfitable to the ctor.
I think we should clean up the code in this area. For example, the arguments InnerLoop and OuterLoop of isProfitable look redundant. They would be the same as what is passed by the ctor.

// some standard C++ libraries. In particular, isProfitable should hold the
// following properties.
//
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the comments about transitivity. IIUIC, the transitivity doesn't affect what I want to fix in this patch.

The reason I mentioned the transitivity is that I was initially thinking about the properties that the comparison function of a comparison sort must have in general. It may affect the performance improvements, but is different from the topic of this PR.

// some standard C++ libraries. In particular, isProfitable should hold the
// following properties.
//
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My use of mathematical terms was incorrect. What I wanted to argue here is the following part in the comment #127473 (review) by @Meinersbur .

LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough).

// we also consider the order going through other inductions. As for the above
// case, we can induce that interchanging `k` and `i` is profitable (it is
// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
// However, such accurate calculation is expensive, so that we don't do it.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this comment because it is not related to the purpose of this patch. If it is helpful, I will add it back as another PR.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move the refactorings/cleanups out of this patch?

Please still consider.

/// Map each loop to a pair of (index, cost). The index is the best position
/// of this loop, according to CacheCostAnalysis. The cost is the actual cost
/// as calculated by CacheCostAnalysis.
const std::optional<CostMapTy> &CostMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing a reference to memory not owned by this object is asking for problems (I am speaking out of experience). If LoopInterchangeProfitability cannot own it, it should still be passed when needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undid this change. I believe LoopInterchangeProfitability can own the CostMap, but if so, I'll do it in another patch.

Copy link
Contributor Author

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move the refactorings/cleanups out of this patch?

Please still consider.

Sorry, I've misread your comment...

/// Map each loop to a pair of (index, cost). The index is the best position
/// of this loop, according to CacheCostAnalysis. The cost is the actual cost
/// as calculated by CacheCostAnalysis.
const std::optional<CostMapTy> &CostMap;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undid this change. I believe LoopInterchangeProfitability can own the CostMap, but if so, I'll do it in another patch.

@kasuga-fj kasuga-fj requested a review from Meinersbur February 27, 2025 13:51
@kasuga-fj
Copy link
Contributor Author

@Meinersbur Could you please take a look again?

@kasuga-fj
Copy link
Contributor Author

Gentle ping

LoopInterchange uses the bubble-sort fashion algorithm to sort the loops
in a loop nest. For two loops A and B, it calls isProfitable(A, B) to
determine whether these loops should be exchanged. The result of
isProfitable(A, B) should be conservative, that is, it should return
true only when we are sure exchanging A and B will improve performance.
If it's not conservative enough, it's possible that a subsequent
isProfitable(B, A) will also return true, in which case LoopInterchange
will undo its previous transformation. To make isProfitable conservative
enough, isProfitable(B, A) must not return true if isProfitable(A, B)
returned true in the past. However, the current implementation can be in
such a situation.

isProfitable consists of several multiple rules, and they are applied
one by one. The first rule is isProfitablePerLoopCacheAnalysis, and the
problem is that there are cases where
isProfitablePerLoopCacheAnalysis(A, B) returns true but
isProfitablePerLoopCacheAnalysis(B, A) returns nullopt. This happens
when the cost computed by CacheCostAnalysis is the same for A and B. In
this case, the result of isProfitablePerLoopCacheAnalysis depends on the
value of CostMap, which maps a loop to an index that is pre-computed in
some way: If CostMap[A] is less than CostMap[B], it returns true,
otherwise it returns nullopt, where the above problem occurs. To avoid
such a situation, if the cost of CacheCostAnalysis is the same for A and
B, it should return nullopt regardless of CostMap.
@kasuga-fj kasuga-fj force-pushed the loop-interchange-profitable branch from d2424cf to 6a5ce1f Compare March 14, 2025 10:58
@kasuga-fj
Copy link
Contributor Author

Rebased to resolve conflicts. Also updated the description.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directly fixes the root cause. LGTM and thanks.

Don't forget to update the summary for the commit message.

@kasuga-fj
Copy link
Contributor Author

Thanks for the kind review!

@kasuga-fj kasuga-fj merged commit 281028e into llvm:main Mar 21, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 21, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/18115

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/gpupgo/pgo2.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# note: command had no output on stdout or stderr
# RUN: at line 2
env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 4
llvm-profdata show --all-functions --counts      pgo2.c.llvm.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c      --check-prefix="LLVM-HOST"
# executed command: llvm-profdata show --all-functions --counts pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-HOST
# note: command had no output on stdout or stderr
# RUN: at line 7
llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.pgo2.c.llvm.profraw      | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix="LLVM-DEVICE"
# executed command: llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-DEVICE
# note: command had no output on stdout or stderr
# RUN: at line 11
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate
# note: command had no output on stdout or stderr
# RUN: at line 12
env LLVM_PROFILE_FILE=pgo2.c.clang.profraw      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=pgo2.c.clang.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 14
llvm-profdata show --all-functions --counts      pgo2.c.clang.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c      --check-prefix="CLANG-HOST"
# executed command: llvm-profdata show --all-functions --counts pgo2.c.clang.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=CLANG-HOST
# note: command had no output on stdout or stderr
# RUN: at line 17
llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.pgo2.c.clang.profraw |      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix="CLANG-DEV"
# executed command: llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.pgo2.c.clang.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=CLANG-DEV
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c:101:15: error: CLANG-DEV: expected string not found in input
# | // CLANG-DEV: Block counts: [11]
# |               ^
# | <stdin>:5:19: note: scanning from here
# |  Function count: 0
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants