[LoopInterchange] Prevent from undoing its own transformation #127473

kasuga-fj · 2025-02-17T11:19:04Z

LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops A and B, it calls isProfitable(A, B) to determine whether these loops should be exchanged. The result of isProfitable(A, B) should be conservative, that is, it should return true only when we are sure exchanging A and B will improve performance. If it's not conservative enough, it's possible that a subsequent isProfitable(B, A) will also return true, in which case LoopInterchange will undo its previous transformation. To avoid such cases, isProfitable(B, A) must not return true if isProfitable(A, B) returned true in the past. However, the current implementation can be in such a situation. This patch resolves it by correcting the handling of two loops that have the same cache cost.

This resolve the problem mentioned in #118267 (comment).

llvmbot · 2025-02-17T11:19:42Z

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

LoopInterchange uses the bubble-sort fashion algorithm to sort the loops, but the comparison function (called isProfitable) didn't satisfy asymmetry. This means that both isProfitable(a, b) and isProfitable(b, a) can return true, triggering an unprofitable interchange. This patch fixes the problem and prevents the interchange from performing unprofitable transformations.

Full diff: https://github.com/llvm/llvm-project/pull/127473.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+89-43)
(added) llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll (+80)

diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index f45d90ff13e14..faa4e22d0d161 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -356,26 +356,24 @@ class LoopInterchangeLegality {
   SmallVector<PHINode *, 8> InnerLoopInductions;
 };
 
+using CostMapTy = DenseMap<const Loop *, std::pair<unsigned, CacheCostTy>>;
+
 /// LoopInterchangeProfitability checks if it is profitable to interchange the
 /// loop.
 class LoopInterchangeProfitability {
 public:
   LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
-                               OptimizationRemarkEmitter *ORE)
-      : OuterLoop(Outer), InnerLoop(Inner), SE(SE), ORE(ORE) {}
+                               OptimizationRemarkEmitter *ORE, const std::optional<CostMapTy> &CM)
+      : OuterLoop(Outer), InnerLoop(Inner), SE(SE), ORE(ORE), CostMap(CM) {}
 
   /// Check if the loop interchange is profitable.
   bool isProfitable(const Loop *InnerLoop, const Loop *OuterLoop,
                     unsigned InnerLoopId, unsigned OuterLoopId,
-                    CharMatrix &DepMatrix,
-                    const DenseMap<const Loop *, unsigned> &CostMap,
-                    std::unique_ptr<CacheCost> &CC);
+                    CharMatrix &DepMatrix);
 
 private:
   int getInstrOrderCost();
-  std::optional<bool> isProfitablePerLoopCacheAnalysis(
-      const DenseMap<const Loop *, unsigned> &CostMap,
-      std::unique_ptr<CacheCost> &CC);
+  std::optional<bool> isProfitablePerLoopCacheAnalysis();
   std::optional<bool> isProfitablePerInstrOrderCost();
   std::optional<bool> isProfitableForVectorization(unsigned InnerLoopId,
                                                    unsigned OuterLoopId,
@@ -388,6 +386,8 @@ class LoopInterchangeProfitability {
 
   /// Interface to emit optimization remarks.
   OptimizationRemarkEmitter *ORE;
+
+  const std::optional<CostMapTy> &CostMap;
 };
 
 /// LoopInterchangeTransform interchanges the loop.
@@ -497,11 +497,13 @@ struct LoopInterchange {
     // indicates the loop should be placed as the innermost loop.
     //
     // For the old pass manager CacheCost would be null.
-    DenseMap<const Loop *, unsigned> CostMap;
+    std::optional<CostMapTy> CostMap = std::nullopt;
     if (CC != nullptr) {
+      CostMap = CostMapTy();
       const auto &LoopCosts = CC->getLoopCosts();
       for (unsigned i = 0; i < LoopCosts.size(); i++) {
-        CostMap[LoopCosts[i].first] = i;
+        const auto &Cost = LoopCosts[i];
+        (*CostMap)[Cost.first] = std::make_pair(i, Cost.second);
       }
     }
     // We try to achieve the globally optimal memory access for the loopnest,
@@ -537,7 +539,7 @@ struct LoopInterchange {
   bool processLoop(Loop *InnerLoop, Loop *OuterLoop, unsigned InnerLoopId,
                    unsigned OuterLoopId,
                    std::vector<std::vector<char>> &DependencyMatrix,
-                   const DenseMap<const Loop *, unsigned> &CostMap) {
+                   const std::optional<CostMapTy> &CostMap) {
     LLVM_DEBUG(dbgs() << "Processing InnerLoopId = " << InnerLoopId
                       << " and OuterLoopId = " << OuterLoopId << "\n");
     LoopInterchangeLegality LIL(OuterLoop, InnerLoop, SE, ORE);
@@ -546,9 +548,8 @@ struct LoopInterchange {
       return false;
     }
     LLVM_DEBUG(dbgs() << "Loops are legal to interchange\n");
-    LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE);
-    if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId,
-                          DependencyMatrix, CostMap, CC)) {
+    LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE, CostMap);
+    if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId, DependencyMatrix)) {
       LLVM_DEBUG(dbgs() << "Interchanging loops not profitable.\n");
       return false;
     }
@@ -1127,29 +1128,60 @@ int LoopInterchangeProfitability::getInstrOrderCost() {
 }
 
 std::optional<bool>
-LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis(
-    const DenseMap<const Loop *, unsigned> &CostMap,
-    std::unique_ptr<CacheCost> &CC) {
+LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis() {
   // This is the new cost model returned from loop cache analysis.
   // A smaller index means the loop should be placed an outer loop, and vice
   // versa.
-  if (CostMap.contains(InnerLoop) && CostMap.contains(OuterLoop)) {
-    unsigned InnerIndex = 0, OuterIndex = 0;
-    InnerIndex = CostMap.find(InnerLoop)->second;
-    OuterIndex = CostMap.find(OuterLoop)->second;
-    LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
-                      << ", OuterIndex = " << OuterIndex << "\n");
-    if (InnerIndex < OuterIndex)
-      return std::optional<bool>(true);
-    assert(InnerIndex != OuterIndex && "CostMap should assign unique "
-                                       "numbers to each loop");
-    if (CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop))
-      return std::nullopt;
-    return std::optional<bool>(false);
-  }
-  return std::nullopt;
+  if (!CostMap.has_value())
+    return std::nullopt;
+
+  auto InnerIte = CostMap->find(InnerLoop);
+  auto OuterIte = CostMap->find(OuterLoop);
+  if (InnerIte == CostMap->end() || OuterIte == CostMap->end())
+    return std::nullopt;
+
+  const auto &[InnerIndex, InnerCost] = InnerIte->second;
+  const auto &[OuterIndex, OuterCost] = OuterIte->second;
+  LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
+                    << ", OuterIndex = " << OuterIndex << "\n");
+  assert(InnerIndex != OuterIndex && "CostMap should assign unique "
+                                     "numbers to each loop");
+
+  if (InnerCost == OuterCost)
+    return std::nullopt;
+
+  return InnerIndex < OuterIndex;
 }
 
+// This function doesn't satisfy transitivity. Consider the following case.
+//
+// ```
+// for (int k = 0; k < N; k++) {
+//   for (int j = 0; j < N; j++) {
+//     for (int i = 0; i < N; i++) {
+//       dst0[i][j][k] += aa[i][j] + bb[i][j] + cc[j][k];
+//       dst1[k][j][i] += dd[i][j] + ee[i][j] + ff[j][k];
+//     }
+//   }
+// }
+//
+// ```
+//
+// The getInstrOrderCost will return the following value.
+//
+//  Outer | Inner | Cost
+// -------+-------+------
+//    k   |   j   |  -2
+//    j   |   i   |  -4
+//    k   |   i   |   0
+//
+// This means that this function says interchanging (k, j) loops and (j, i)
+// loops are profitable, but not (k, i). The root cause of this is that the
+// getInstrOrderCost only see the loops we are checking. We can resolve this if
+// we also consider the order going through other inductions. As for the above
+// case, we can induce that interchanging `k` and `i` is profitable (it is
+// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
+// However, such accurate calculation is expensive, so that we don't do it.
 std::optional<bool>
 LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
   // Legacy cost model: this is rough cost estimation algorithm. It counts the
@@ -1184,11 +1216,25 @@ std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
   return std::optional<bool>(!DepMatrix.empty());
 }
 
+// The bubble-sort fashion algorithm is adopted to sort the loop nest, so the
+// comparison function should ideally induce a strict weak ordering required by
+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
+// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
+// isProfitable(a, c) is true.
+//
+// The most important thing is not to make unprofitable interchange. From this
+// point of view, asymmetry is important. This is because if both
+// isProfitable(a, b) and isProfitable(b, a) are true, then an unprofitable
+// transformation (one of them) will be performed. On the other hand, a lack of
+// transitivity might cause some optimization opportunities to be lost, but
+// won't trigger an unprofitable one. Moreover, guaranteeing transitivity is
+// expensive. Therefore, isProfitable only holds the asymmetry.
 bool LoopInterchangeProfitability::isProfitable(
     const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,
-    unsigned OuterLoopId, CharMatrix &DepMatrix,
-    const DenseMap<const Loop *, unsigned> &CostMap,
-    std::unique_ptr<CacheCost> &CC) {
+    unsigned OuterLoopId, CharMatrix &DepMatrix) {
   // isProfitable() is structured to avoid endless loop interchange.
   // If loop cache analysis could decide the profitability then,
   // profitability check will stop and return the analysis result.
@@ -1197,15 +1243,14 @@ bool LoopInterchangeProfitability::isProfitable(
   // profitable for InstrOrderCost. Likewise, if InstrOrderCost failed to
   // analysis the profitability then only, isProfitableForVectorization
   // will decide.
-  std::optional<bool> shouldInterchange =
-      isProfitablePerLoopCacheAnalysis(CostMap, CC);
-  if (!shouldInterchange.has_value()) {
-    shouldInterchange = isProfitablePerInstrOrderCost();
-    if (!shouldInterchange.has_value())
-      shouldInterchange =
+  std::optional<bool> ShouldInterchange = isProfitablePerLoopCacheAnalysis();
+  if (!ShouldInterchange.has_value()) {
+    ShouldInterchange = isProfitablePerInstrOrderCost();
+    if (!ShouldInterchange.has_value())
+      ShouldInterchange =
           isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
   }
-  if (!shouldInterchange.has_value()) {
+  if (!ShouldInterchange.has_value()) {
     ORE->emit([&]() {
       return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",
                                       InnerLoop->getStartLoc(),
@@ -1214,7 +1259,8 @@ bool LoopInterchangeProfitability::isProfitable(
                 "interchange.";
     });
     return false;
-  } else if (!shouldInterchange.value()) {
+  }
+  if (!ShouldInterchange.value()) {
     ORE->emit([&]() {
       return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",
                                       InnerLoop->getStartLoc(),
diff --git a/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll b/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll
new file mode 100644
index 0000000000000..55056dca15d3e
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll
@@ -0,0 +1,80 @@
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=1 -pass-remarks-output=%t -disable-output \
+;           -verify-dom-info -verify-loop-info
+; RUN: FileCheck -input-file %t %s
+
+
+; Test that the same pair of loops are not interchanged twice. This is the case
+; when the cost computed by CacheCost is the same for the loop of `j` and `k`.
+;
+; #define N 4
+; int a[N*N][N*N][N*N];
+; void f() {
+;   for (int i = 0; i < N; i++)
+;     for (int j = 1; j < 2*N; j++)
+;       for (int k = 1; k < 2*N; k++)
+;         a[i][k+1][j-1] -= a[i+N-1][k][j];
+; }
+
+; CHECK:      --- !Passed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Interchanged
+; CHECK-NEXT: Function:        f
+; CHECK-NEXT: Args:
+; CHECK-NEXT:    - String:          Loop interchanged with enclosing loop.
+; CHECK-NEXT: ...
+; CHECK-NEXT: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Dependence
+; CHECK-NEXT: Function:        f
+; CHECK-NEXT: Args:
+; CHECK-NEXT:  - String:       Cannot interchange loops due to dependences.
+; CHECK-NEXT: ...
+; CHECK-NEXT: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            InterchangeNotProfitable
+; CHECK-NEXT: Function:        f
+; CHECK-NEXT: Args:
+; CHECK-NEXT:   - String:          Interchanging loops is not considered to improve cache locality nor vectorization.
+; CHECK-NEXT: ...
+
+@a = dso_local local_unnamed_addr global [16 x [16 x [16 x i32]]] zeroinitializer, align 4
+
+define dso_local void @f() {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:
+  %indvars.iv46 = phi i64 [ 0, %entry ], [ %indvars.iv.next47, %for.cond.cleanup3 ]
+  %0 = add nuw nsw i64 %indvars.iv46, 3
+  br label %for.cond5.preheader
+
+for.cond5.preheader:
+  %indvars.iv41 = phi i64 [ 1, %for.cond1.preheader ], [ %indvars.iv.next42, %for.cond.cleanup7 ]
+  %1 = add nsw i64 %indvars.iv41, -1
+  br label %for.body8
+
+for.cond.cleanup3:
+  %indvars.iv.next47 = add nuw nsw i64 %indvars.iv46, 1
+  %exitcond50 = icmp ne i64 %indvars.iv.next47, 4
+  br i1 %exitcond50, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7:
+  %indvars.iv.next42 = add nuw nsw i64 %indvars.iv41, 1
+  %exitcond45 = icmp ne i64 %indvars.iv.next42, 8
+  br i1 %exitcond45, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8:
+  %indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
+  %arrayidx12 = getelementptr inbounds nuw [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %0, i64 %indvars.iv, i64 %indvars.iv41
+  %2 = load i32, ptr %arrayidx12, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %arrayidx20 = getelementptr inbounds [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %indvars.iv46, i64 %indvars.iv.next, i64 %1
+  %3 = load i32, ptr %arrayidx20, align 4
+  %sub21 = sub nsw i32 %3, %2
+  store i32 %sub21, ptr %arrayidx20, align 4
+  %exitcond = icmp ne i64 %indvars.iv.next, 8
+  br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
+
+for.cond.cleanup:
+  ret void
+}

github-actions · 2025-02-17T11:22:23Z

✅ With the latest revision this PR passed the C/C++ code formatter.

sjoerdmeijer · 2025-02-18T10:17:00Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

@@ -388,6 +387,8 @@ class LoopInterchangeProfitability {

  /// Interface to emit optimization remarks.
  OptimizationRemarkEmitter *ORE;
+
+  const std::optional<CostMapTy> &CostMap;


Nit: a comment explaining what this is exactly capturing would be good. I.e., it's a map of:

Loop -> [index, cost]?

sjoerdmeijer · 2025-02-18T10:25:17Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

@@ -497,11 +498,13 @@ struct LoopInterchange {
    // indicates the loop should be placed as the innermost loop.
    //
    // For the old pass manager CacheCost would be null.
-    DenseMap<const Loop *, unsigned> CostMap;
+    std::optional<CostMapTy> CostMap = std::nullopt;


You've also added a CostMap to the LoopInterchangeProfitability class, it's not yet clear to me if we need both.

The CostMap in LoopInterchangeProfitability is a reference to this object. I just changed the way it is passed: from the argument of isProfitable to the ctor.
I think we should clean up the code in this area. For example, the arguments InnerLoop and OuterLoop of isProfitable look redundant. They would be the same as what is passed by the ctor.

sjoerdmeijer · 2025-02-18T10:28:37Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// we also consider the order going through other inductions. As for the above
+// case, we can induce that interchanging `k` and `i` is profitable (it is
+// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
+// However, such accurate calculation is expensive, so that we don't do it.


Thanks for adding this example and comment, this really helps!

One remark about the last statement:

However, such accurate calculation is expensive, so that we don't do it.

Can you add where this leaves us, what we do instead?

I have removed this comment because it is not related to the purpose of this patch. If it is helpful, I will add it back as another PR.

sjoerdmeijer · 2025-02-18T10:36:12Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.


Question on the assymetry: does this really need to hold? Can it not be the case that isProfitable(a, b) and isProfitable(b, a) are both profitable, or both neutral, or both not really good? I haven't tried to come up with examples, but I thought about this as a way of saying the cost-model saying "don't know"?

I think the asymmetry is necessary.

The isProfitable returns bool, but the profitability check is actually std::optional<bool>, which is expressed by ShouldInterchange. This means that the cost-model has three types of results: "should interchange them", "should NOT interchange them", and "don't know". The nullopt is squashed into false, so not isProfitable(a, b) could mean either "should NOT" or "don't know".

To answer your question, there are cases where both isProfitable(a, b) and isProfitable(b, a) become false. Asymmetry doesn't say anything about this case, it requires that they cannot both be true. In this case cost-model may be saying "I don't know both", or "I don't know one, but we should not perform the other". Also, I think we should design the cost-model so that it never says both are profitable, and that the asymmetry says. At least as far as we use the bubble sort fashion algorithm, it can cause the same pair of loops to be interchanged more than twice.

My use of mathematical terms was incorrect. What I wanted to argue here is the following part in the comment #127473 (review) by @Meinersbur .

LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough).

Meinersbur

From the Summary:

This patch fixes the problem and prevents the interchange from performing unprofitable transformations.

Please eloborate on how this PR does this. Due to refactorings in the diff I haven't yet found what the actual change is.

LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough). However, this is a greedy algorithm on neighboring loops, properties that need to take the entire loop nest into consideration is beyond this scope. Feel free to replace the greedy algorithm, but you cannot ensure global properties with just local information¹.

Well, it is the case for some problems, such as minimal spanning tree. Program performance is propbably not one of them. ↩

Meinersbur · 2025-02-18T12:38:03Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// This function doesn't satisfy transitivity. Consider the following case.
+//
+// ```
+// for (int k = 0; k < N; k++) {
+//   for (int j = 0; j < N; j++) {
+//     for (int i = 0; i < N; i++) {
+//       dst0[i][j][k] += aa[i][j] + bb[i][j] + cc[j][k];
+//       dst1[k][j][i] += dd[i][j] + ee[i][j] + ff[j][k];
+//     }
+//   }
+// }
+//
+// ```
+//
+// The getInstrOrderCost will return the following value.
+//
+//  Outer | Inner | Cost
+// -------+-------+------
+//    k   |   j   |  -2
+//    j   |   i   |  -4
+//    k   |   i   |   0
+//
+// This means that this function says interchanging (k, j) loops and (j, i)
+// loops are profitable, but not (k, i). The root cause of this is that the
+// getInstrOrderCost only see the loops we are checking. We can resolve this if
+// we also consider the order going through other inductions. As for the above
+// case, we can induce that interchanging `k` and `i` is profitable (it is
+// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
+// However, such accurate calculation is expensive, so that we don't do it.


Consider making your comments on of:

Doygen comment at the function's declaration documenting its usage

Comment inside the function definition documenting the implementation

Meinersbur · 2025-02-18T13:02:22Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
+// isProfitable(a, c) is true.


I don't think transitivity needs to apply. Loop interchange's bubble sort is a greedy algorithm: Make local decision in the hopes that it improves the global result. In particular, it depends on the order of a,b,c in the loop nest. After exchanging a and b, exchanging b and c might not be profitable anymot

for c: for a: for b: A[b][a] B[c]

Interchanging a with b is profitable (making A[b][a] consecutive). Interchanging b with c as well (making B[c] consecutive)¹. But after interchanging a and b, interchanging b and c is not necessarily profitable: it destroys that the entire A[b][a] is accessed consecutively.

Of course there are more concerns than conscecutive accesses, such as data reuse. But this is not how isProfitable: It consists of multiple single concern decision-making rules, applying the one with the higherst priority. They do not (and cannot) have a global view on what the eventual best loop order would be. If that would be the case, we should compute that order in advance, then apply bubble sort until we reach that order. That wouln't be a greedy algorithm anymore though.

Footnotes

You may argue the question is malformed: b and c are not neighbors, they cannot even be interchanged. ↩

I don't intend to require transitivity for LoopInterchange. I just wrote a property that the comparison function of a comparison sort holds, but that LoopInterchangeProfitability::isProfitable does not. But I still think that the transitivity should be satisfied if it is possible, under the assumption that the profitability decisions don't change depending on the current order of the nest. Your case looks to me the case where the profitability changes after interchanging, where in my understanding the decision for b and c doesn't change before and after interchanging a and b in the current implementation.

Meinersbur · 2025-02-18T13:05:20Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.


Considering that interchange(a, b) is the same as interchange(b, a), the relation whether the two should be exchanged is symmetric, not asymmetric. What you are looking for is "areLoopsInTheRightOrder", independent of what the current order is.

Also consider that isProfitable(a, b) and isProfitable(a, b) on the interchanged loops are not the same input. The first is a query on

for a: for b:

the second is a query on:

for b': for a':

a and a', as well as b, b' are not the same loops. They look very different to isProfitablePerLoopCacheAnalysis, isProfitablePerInstrOrderCost, and isProfitableForVectorization.

What I want to argue here is that for all f in the three single rules (isProfitablePerLoopCacheAnalysis, isProfitablePerInstrOrderCost, or isProfitableForVectorization) the following property holds regardless of the current loop order

If f(a, b) is true, then f(b, a) is false ~~(or nullopt)~~.

This patch fixes the problem that isProfitablePerLoopCacheAnalysis didn't satisfy it. I believe this is the same as what you call areLoopsInTheRightOrder.

If f(a, b) is true, then f(b, a) is false (or nullopt).

The mathematical terms do not match what you want to say.

isProfitable requires the inner loop to be the first argument, the second is the outer loop. If a is the outer loop, then isProfitable(b,a) is undefined.

In order for $f$ to be a function, all inputs must be in its arguments, namly the LLVM-IR, for $f_{R} (a, b)$ . After interchange having been applied, R has changed. You get a new function $f_{R^{'}} (a, b)$ . Even if the a (and b) are considered "equal" in some sense, the underlying LLVM-IR has changed, and with it the function result. One could say make relationship claims between $f_{R}$ and $f_{R^{'}}$ , but its not asymmetry or transitivity, since those are about a single relation.

You are right, my use of mathematical terms is incorrect. What I want is like

$f_{R} (a, b)$ is true then $f_{R^{'}} (b, a)$ is false, where

a is the inner loop and b is the outer one in $R$ .

$R^{'}$ is what some interchanges are applied to $R$ so that the order of a and b is exchanged.

I will fix the comment. So, do you think the changes of this patch is correct to accomplish that?

How do you want to express the "transivity" part? That is, that after a series of swaps you get back to the original order, without directly undoing a previous swap?

There is no need to use math for this, unless you want to get into formal proofs.

Removed the comments about transitivity. IIUIC, the transitivity doesn't affect what I want to fix in this patch.

The reason I mentioned the transitivity is that I was initially thinking about the properties that the comparison function of a comparison sort must have in general. It may affect the performance improvements, but is different from the topic of this PR.

kasuga-fj

The description was really bad. This patch fixes the issue that LoopInterchange can undo its optimization, as I described in #118267 (comment). I don't intend to archive the global optimal order. Fixed the PR title and description.

kasuga-fj · 2025-02-18T15:07:09Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.


What I want to argue here is that for all f in the three single rules (isProfitablePerLoopCacheAnalysis, isProfitablePerInstrOrderCost, or isProfitableForVectorization) the following property holds regardless of the current loop order

If f(a, b) is true, then f(b, a) is false ~~(or nullopt)~~.

This patch fixes the problem that isProfitablePerLoopCacheAnalysis didn't satisfy it. I believe this is the same as what you call areLoopsInTheRightOrder.

kasuga-fj · 2025-02-18T15:39:49Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
+// isProfitable(a, c) is true.


I don't intend to require transitivity for LoopInterchange. I just wrote a property that the comparison function of a comparison sort holds, but that LoopInterchangeProfitability::isProfitable does not. But I still think that the transitivity should be satisfied if it is possible, under the assumption that the profitability decisions don't change depending on the current order of the nest. Your case looks to me the case where the profitability changes after interchanging, where in my understanding the decision for b and c doesn't change before and after interchanging a and b in the current implementation.

Meinersbur

Could you please move the refactorings/cleanups out of this patch? I am still confused what CostMap actually does.

Meinersbur · 2025-02-19T15:47:37Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.


How do you want to express the "transivity" part? That is, that after a series of swaps you get back to the original order, without directly undoing a previous swap?

There is no need to use math for this, unless you want to get into formal proofs.

kasuga-fj

Thanks for your review. I erased the off topic comments on this patch.

The problem is the current behavior of isProfitablePerLoopCacheAnalysis when CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop). It returns true if InnerIndex < OuterIndex, but otherwise returns nullopt, which may trigger to undo a past interchange.

kasuga-fj · 2025-02-20T06:42:47Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

@@ -388,6 +387,8 @@ class LoopInterchangeProfitability {

  /// Interface to emit optimization remarks.
  OptimizationRemarkEmitter *ORE;
+
+  const std::optional<CostMapTy> &CostMap;


kasuga-fj · 2025-02-20T06:50:35Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

@@ -497,11 +498,13 @@ struct LoopInterchange {
    // indicates the loop should be placed as the innermost loop.
    //
    // For the old pass manager CacheCost would be null.
-    DenseMap<const Loop *, unsigned> CostMap;
+    std::optional<CostMapTy> CostMap = std::nullopt;


The CostMap in LoopInterchangeProfitability is a reference to this object. I just changed the way it is passed: from the argument of isProfitable to the ctor.
I think we should clean up the code in this area. For example, the arguments InnerLoop and OuterLoop of isProfitable look redundant. They would be the same as what is passed by the ctor.

kasuga-fj · 2025-02-20T07:00:13Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.


Removed the comments about transitivity. IIUIC, the transitivity doesn't affect what I want to fix in this patch.

The reason I mentioned the transitivity is that I was initially thinking about the properties that the comparison function of a comparison sort must have in general. It may affect the performance improvements, but is different from the topic of this PR.

kasuga-fj · 2025-02-20T07:05:52Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.


My use of mathematical terms was incorrect. What I wanted to argue here is the following part in the comment #127473 (review) by @Meinersbur .

LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough).

kasuga-fj · 2025-02-20T07:09:34Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+// we also consider the order going through other inductions. As for the above
+// case, we can induce that interchanging `k` and `i` is profitable (it is
+// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
+// However, such accurate calculation is expensive, so that we don't do it.


I have removed this comment because it is not related to the purpose of this patch. If it is helpful, I will add it back as another PR.

Meinersbur

Could you please move the refactorings/cleanups out of this patch?

Please still consider.

Meinersbur · 2025-02-20T16:40:37Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+  /// Map each loop to a pair of (index, cost). The index is the best position
+  /// of this loop, according to CacheCostAnalysis. The cost is the actual cost
+  /// as calculated by CacheCostAnalysis.
+  const std::optional<CostMapTy> &CostMap;


Storing a reference to memory not owned by this object is asking for problems (I am speaking out of experience). If LoopInterchangeProfitability cannot own it, it should still be passed when needed.

Undid this change. I believe LoopInterchangeProfitability can own the CostMap, but if so, I'll do it in another patch.

kasuga-fj

Could you please move the refactorings/cleanups out of this patch?

Please still consider.

Sorry, I've misread your comment...

kasuga-fj · 2025-02-20T23:44:58Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+  /// Map each loop to a pair of (index, cost). The index is the best position
+  /// of this loop, according to CacheCostAnalysis. The cost is the actual cost
+  /// as calculated by CacheCostAnalysis.
+  const std::optional<CostMapTy> &CostMap;


Undid this change. I believe LoopInterchangeProfitability can own the CostMap, but if so, I'll do it in another patch.

kasuga-fj · 2025-03-05T13:59:36Z

@Meinersbur Could you please take a look again?

kasuga-fj · 2025-03-12T08:09:35Z

Gentle ping

LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops A and B, it calls isProfitable(A, B) to determine whether these loops should be exchanged. The result of isProfitable(A, B) should be conservative, that is, it should return true only when we are sure exchanging A and B will improve performance. If it's not conservative enough, it's possible that a subsequent isProfitable(B, A) will also return true, in which case LoopInterchange will undo its previous transformation. To make isProfitable conservative enough, isProfitable(B, A) must not return true if isProfitable(A, B) returned true in the past. However, the current implementation can be in such a situation. isProfitable consists of several multiple rules, and they are applied one by one. The first rule is isProfitablePerLoopCacheAnalysis, and the problem is that there are cases where isProfitablePerLoopCacheAnalysis(A, B) returns true but isProfitablePerLoopCacheAnalysis(B, A) returns nullopt. This happens when the cost computed by CacheCostAnalysis is the same for A and B. In this case, the result of isProfitablePerLoopCacheAnalysis depends on the value of CostMap, which maps a loop to an index that is pre-computed in some way: If CostMap[A] is less than CostMap[B], it returns true, otherwise it returns nullopt, where the above problem occurs. To avoid such a situation, if the cost of CacheCostAnalysis is the same for A and B, it should return nullopt regardless of CostMap.

kasuga-fj · 2025-03-14T11:04:52Z

Rebased to resolve conflicts. Also updated the description.

Meinersbur

This directly fixes the root cause. LGTM and thanks.

Don't forget to update the summary for the commit message.

kasuga-fj · 2025-03-21T12:41:03Z

Thanks for the kind review!

llvm-ci · 2025-03-21T12:48:26Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/18115

Here is the relevant piece of the build log for the reference

Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/gpupgo/pgo2.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# note: command had no output on stdout or stderr
# RUN: at line 2
env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 4
llvm-profdata show --all-functions --counts      pgo2.c.llvm.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c      --check-prefix="LLVM-HOST"
# executed command: llvm-profdata show --all-functions --counts pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-HOST
# note: command had no output on stdout or stderr
# RUN: at line 7
llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.pgo2.c.llvm.profraw      | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix="LLVM-DEVICE"
# executed command: llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-DEVICE
# note: command had no output on stdout or stderr
# RUN: at line 11
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate
# note: command had no output on stdout or stderr
# RUN: at line 12
env LLVM_PROFILE_FILE=pgo2.c.clang.profraw      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=pgo2.c.clang.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 14
llvm-profdata show --all-functions --counts      pgo2.c.clang.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c      --check-prefix="CLANG-HOST"
# executed command: llvm-profdata show --all-functions --counts pgo2.c.clang.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=CLANG-HOST
# note: command had no output on stdout or stderr
# RUN: at line 17
llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.pgo2.c.clang.profraw |      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix="CLANG-DEV"
# executed command: llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.pgo2.c.clang.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=CLANG-DEV
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c:101:15: error: CLANG-DEV: expected string not found in input
# | // CLANG-DEV: Block counts: [11]
# |               ^
# | <stdin>:5:19: note: scanning from here
# |  Function count: 0
...

llvmbot added the llvm:transforms label Feb 17, 2025

kasuga-fj requested review from Meinersbur, madhur13490, CongzheUalberta and sjoerdmeijer February 17, 2025 11:20

kasuga-fj force-pushed the loop-interchange-profitable branch from a08068a to 45a8373 Compare February 18, 2025 07:45

sjoerdmeijer reviewed Feb 18, 2025

View reviewed changes

Meinersbur requested changes Feb 18, 2025

View reviewed changes

kasuga-fj changed the title ~~[LoopInterchange] Stop performing unprofitable interchange~~ [LoopInterchange] Prevent from undoing its own transformation Feb 18, 2025

kasuga-fj commented Feb 18, 2025

View reviewed changes

Meinersbur reviewed Feb 19, 2025

View reviewed changes

kasuga-fj commented Feb 20, 2025

View reviewed changes

Meinersbur requested changes Feb 20, 2025

View reviewed changes

kasuga-fj commented Feb 20, 2025

View reviewed changes

kasuga-fj requested a review from Meinersbur February 27, 2025 13:51

kasuga-fj force-pushed the loop-interchange-profitable branch from d2424cf to 6a5ce1f Compare March 14, 2025 10:58

Meinersbur approved these changes Mar 21, 2025

View reviewed changes

kasuga-fj merged commit 281028e into llvm:main Mar 21, 2025
11 checks passed

		// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
		// isProfitable(a, c) is true.

[LoopInterchange] Prevent from undoing its own transformation #127473

[LoopInterchange] Prevent from undoing its own transformation #127473

Conversation

kasuga-fj commented Feb 17, 2025 • edited Loading

llvmbot commented Feb 17, 2025

github-actions bot commented Feb 17, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Meinersbur left a comment • edited Loading

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Footnotes

kasuga-fj Feb 18, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kasuga-fj Feb 18, 2025 • edited Loading

Choose a reason for hiding this comment

Meinersbur Feb 19, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kasuga-fj left a comment

Choose a reason for hiding this comment

kasuga-fj Feb 18, 2025 • edited Loading

Choose a reason for hiding this comment

kasuga-fj Feb 18, 2025 • edited Loading

Choose a reason for hiding this comment

Meinersbur left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kasuga-fj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Meinersbur left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kasuga-fj left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kasuga-fj commented Mar 5, 2025

kasuga-fj commented Mar 12, 2025

kasuga-fj commented Mar 14, 2025

Meinersbur left a comment

Choose a reason for hiding this comment

kasuga-fj commented Mar 21, 2025

llvm-ci commented Mar 21, 2025

kasuga-fj commented Feb 17, 2025 •

edited

Loading

github-actions bot commented Feb 17, 2025 •

edited

Loading

Meinersbur left a comment •

edited

Loading

kasuga-fj Feb 18, 2025 •

edited

Loading

kasuga-fj Feb 18, 2025 •

edited

Loading

Meinersbur Feb 19, 2025 •

edited

Loading

kasuga-fj Feb 18, 2025 •

edited

Loading

kasuga-fj Feb 18, 2025 •

edited

Loading

Meinersbur left a comment •

edited

Loading

kasuga-fj left a comment •

edited

Loading