-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoopInterchange] Prevent from undoing its own transformation #127473
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Ryotaro Kasuga (kasuga-fj) ChangesLoopInterchange uses the bubble-sort fashion algorithm to sort the loops, but the comparison function (called isProfitable) didn't satisfy asymmetry. This means that both isProfitable(a, b) and isProfitable(b, a) can return true, triggering an unprofitable interchange. This patch fixes the problem and prevents the interchange from performing unprofitable transformations. Full diff: https://github.com/llvm/llvm-project/pull/127473.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index f45d90ff13e14..faa4e22d0d161 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -356,26 +356,24 @@ class LoopInterchangeLegality {
SmallVector<PHINode *, 8> InnerLoopInductions;
};
+using CostMapTy = DenseMap<const Loop *, std::pair<unsigned, CacheCostTy>>;
+
/// LoopInterchangeProfitability checks if it is profitable to interchange the
/// loop.
class LoopInterchangeProfitability {
public:
LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
- OptimizationRemarkEmitter *ORE)
- : OuterLoop(Outer), InnerLoop(Inner), SE(SE), ORE(ORE) {}
+ OptimizationRemarkEmitter *ORE, const std::optional<CostMapTy> &CM)
+ : OuterLoop(Outer), InnerLoop(Inner), SE(SE), ORE(ORE), CostMap(CM) {}
/// Check if the loop interchange is profitable.
bool isProfitable(const Loop *InnerLoop, const Loop *OuterLoop,
unsigned InnerLoopId, unsigned OuterLoopId,
- CharMatrix &DepMatrix,
- const DenseMap<const Loop *, unsigned> &CostMap,
- std::unique_ptr<CacheCost> &CC);
+ CharMatrix &DepMatrix);
private:
int getInstrOrderCost();
- std::optional<bool> isProfitablePerLoopCacheAnalysis(
- const DenseMap<const Loop *, unsigned> &CostMap,
- std::unique_ptr<CacheCost> &CC);
+ std::optional<bool> isProfitablePerLoopCacheAnalysis();
std::optional<bool> isProfitablePerInstrOrderCost();
std::optional<bool> isProfitableForVectorization(unsigned InnerLoopId,
unsigned OuterLoopId,
@@ -388,6 +386,8 @@ class LoopInterchangeProfitability {
/// Interface to emit optimization remarks.
OptimizationRemarkEmitter *ORE;
+
+ const std::optional<CostMapTy> &CostMap;
};
/// LoopInterchangeTransform interchanges the loop.
@@ -497,11 +497,13 @@ struct LoopInterchange {
// indicates the loop should be placed as the innermost loop.
//
// For the old pass manager CacheCost would be null.
- DenseMap<const Loop *, unsigned> CostMap;
+ std::optional<CostMapTy> CostMap = std::nullopt;
if (CC != nullptr) {
+ CostMap = CostMapTy();
const auto &LoopCosts = CC->getLoopCosts();
for (unsigned i = 0; i < LoopCosts.size(); i++) {
- CostMap[LoopCosts[i].first] = i;
+ const auto &Cost = LoopCosts[i];
+ (*CostMap)[Cost.first] = std::make_pair(i, Cost.second);
}
}
// We try to achieve the globally optimal memory access for the loopnest,
@@ -537,7 +539,7 @@ struct LoopInterchange {
bool processLoop(Loop *InnerLoop, Loop *OuterLoop, unsigned InnerLoopId,
unsigned OuterLoopId,
std::vector<std::vector<char>> &DependencyMatrix,
- const DenseMap<const Loop *, unsigned> &CostMap) {
+ const std::optional<CostMapTy> &CostMap) {
LLVM_DEBUG(dbgs() << "Processing InnerLoopId = " << InnerLoopId
<< " and OuterLoopId = " << OuterLoopId << "\n");
LoopInterchangeLegality LIL(OuterLoop, InnerLoop, SE, ORE);
@@ -546,9 +548,8 @@ struct LoopInterchange {
return false;
}
LLVM_DEBUG(dbgs() << "Loops are legal to interchange\n");
- LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE);
- if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId,
- DependencyMatrix, CostMap, CC)) {
+ LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE, CostMap);
+ if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId, DependencyMatrix)) {
LLVM_DEBUG(dbgs() << "Interchanging loops not profitable.\n");
return false;
}
@@ -1127,29 +1128,60 @@ int LoopInterchangeProfitability::getInstrOrderCost() {
}
std::optional<bool>
-LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis(
- const DenseMap<const Loop *, unsigned> &CostMap,
- std::unique_ptr<CacheCost> &CC) {
+LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis() {
// This is the new cost model returned from loop cache analysis.
// A smaller index means the loop should be placed an outer loop, and vice
// versa.
- if (CostMap.contains(InnerLoop) && CostMap.contains(OuterLoop)) {
- unsigned InnerIndex = 0, OuterIndex = 0;
- InnerIndex = CostMap.find(InnerLoop)->second;
- OuterIndex = CostMap.find(OuterLoop)->second;
- LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
- << ", OuterIndex = " << OuterIndex << "\n");
- if (InnerIndex < OuterIndex)
- return std::optional<bool>(true);
- assert(InnerIndex != OuterIndex && "CostMap should assign unique "
- "numbers to each loop");
- if (CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop))
- return std::nullopt;
- return std::optional<bool>(false);
- }
- return std::nullopt;
+ if (!CostMap.has_value())
+ return std::nullopt;
+
+ auto InnerIte = CostMap->find(InnerLoop);
+ auto OuterIte = CostMap->find(OuterLoop);
+ if (InnerIte == CostMap->end() || OuterIte == CostMap->end())
+ return std::nullopt;
+
+ const auto &[InnerIndex, InnerCost] = InnerIte->second;
+ const auto &[OuterIndex, OuterCost] = OuterIte->second;
+ LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
+ << ", OuterIndex = " << OuterIndex << "\n");
+ assert(InnerIndex != OuterIndex && "CostMap should assign unique "
+ "numbers to each loop");
+
+ if (InnerCost == OuterCost)
+ return std::nullopt;
+
+ return InnerIndex < OuterIndex;
}
+// This function doesn't satisfy transitivity. Consider the following case.
+//
+// ```
+// for (int k = 0; k < N; k++) {
+// for (int j = 0; j < N; j++) {
+// for (int i = 0; i < N; i++) {
+// dst0[i][j][k] += aa[i][j] + bb[i][j] + cc[j][k];
+// dst1[k][j][i] += dd[i][j] + ee[i][j] + ff[j][k];
+// }
+// }
+// }
+//
+// ```
+//
+// The getInstrOrderCost will return the following value.
+//
+// Outer | Inner | Cost
+// -------+-------+------
+// k | j | -2
+// j | i | -4
+// k | i | 0
+//
+// This means that this function says interchanging (k, j) loops and (j, i)
+// loops are profitable, but not (k, i). The root cause of this is that the
+// getInstrOrderCost only see the loops we are checking. We can resolve this if
+// we also consider the order going through other inductions. As for the above
+// case, we can induce that interchanging `k` and `i` is profitable (it is
+// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`.
+// However, such accurate calculation is expensive, so that we don't do it.
std::optional<bool>
LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
// Legacy cost model: this is rough cost estimation algorithm. It counts the
@@ -1184,11 +1216,25 @@ std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
return std::optional<bool>(!DepMatrix.empty());
}
+// The bubble-sort fashion algorithm is adopted to sort the loop nest, so the
+// comparison function should ideally induce a strict weak ordering required by
+// some standard C++ libraries. In particular, isProfitable should hold the
+// following properties.
+//
+// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false.
+// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then
+// isProfitable(a, c) is true.
+//
+// The most important thing is not to make unprofitable interchange. From this
+// point of view, asymmetry is important. This is because if both
+// isProfitable(a, b) and isProfitable(b, a) are true, then an unprofitable
+// transformation (one of them) will be performed. On the other hand, a lack of
+// transitivity might cause some optimization opportunities to be lost, but
+// won't trigger an unprofitable one. Moreover, guaranteeing transitivity is
+// expensive. Therefore, isProfitable only holds the asymmetry.
bool LoopInterchangeProfitability::isProfitable(
const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,
- unsigned OuterLoopId, CharMatrix &DepMatrix,
- const DenseMap<const Loop *, unsigned> &CostMap,
- std::unique_ptr<CacheCost> &CC) {
+ unsigned OuterLoopId, CharMatrix &DepMatrix) {
// isProfitable() is structured to avoid endless loop interchange.
// If loop cache analysis could decide the profitability then,
// profitability check will stop and return the analysis result.
@@ -1197,15 +1243,14 @@ bool LoopInterchangeProfitability::isProfitable(
// profitable for InstrOrderCost. Likewise, if InstrOrderCost failed to
// analysis the profitability then only, isProfitableForVectorization
// will decide.
- std::optional<bool> shouldInterchange =
- isProfitablePerLoopCacheAnalysis(CostMap, CC);
- if (!shouldInterchange.has_value()) {
- shouldInterchange = isProfitablePerInstrOrderCost();
- if (!shouldInterchange.has_value())
- shouldInterchange =
+ std::optional<bool> ShouldInterchange = isProfitablePerLoopCacheAnalysis();
+ if (!ShouldInterchange.has_value()) {
+ ShouldInterchange = isProfitablePerInstrOrderCost();
+ if (!ShouldInterchange.has_value())
+ ShouldInterchange =
isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
}
- if (!shouldInterchange.has_value()) {
+ if (!ShouldInterchange.has_value()) {
ORE->emit([&]() {
return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",
InnerLoop->getStartLoc(),
@@ -1214,7 +1259,8 @@ bool LoopInterchangeProfitability::isProfitable(
"interchange.";
});
return false;
- } else if (!shouldInterchange.value()) {
+ }
+ if (!ShouldInterchange.value()) {
ORE->emit([&]() {
return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",
InnerLoop->getStartLoc(),
diff --git a/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll b/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll
new file mode 100644
index 0000000000000..55056dca15d3e
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/profitability-redundant-interchange.ll
@@ -0,0 +1,80 @@
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=1 -pass-remarks-output=%t -disable-output \
+; -verify-dom-info -verify-loop-info
+; RUN: FileCheck -input-file %t %s
+
+
+; Test that the same pair of loops are not interchanged twice. This is the case
+; when the cost computed by CacheCost is the same for the loop of `j` and `k`.
+;
+; #define N 4
+; int a[N*N][N*N][N*N];
+; void f() {
+; for (int i = 0; i < N; i++)
+; for (int j = 1; j < 2*N; j++)
+; for (int k = 1; k < 2*N; k++)
+; a[i][k+1][j-1] -= a[i+N-1][k][j];
+; }
+
+; CHECK: --- !Passed
+; CHECK-NEXT: Pass: loop-interchange
+; CHECK-NEXT: Name: Interchanged
+; CHECK-NEXT: Function: f
+; CHECK-NEXT: Args:
+; CHECK-NEXT: - String: Loop interchanged with enclosing loop.
+; CHECK-NEXT: ...
+; CHECK-NEXT: --- !Missed
+; CHECK-NEXT: Pass: loop-interchange
+; CHECK-NEXT: Name: Dependence
+; CHECK-NEXT: Function: f
+; CHECK-NEXT: Args:
+; CHECK-NEXT: - String: Cannot interchange loops due to dependences.
+; CHECK-NEXT: ...
+; CHECK-NEXT: --- !Missed
+; CHECK-NEXT: Pass: loop-interchange
+; CHECK-NEXT: Name: InterchangeNotProfitable
+; CHECK-NEXT: Function: f
+; CHECK-NEXT: Args:
+; CHECK-NEXT: - String: Interchanging loops is not considered to improve cache locality nor vectorization.
+; CHECK-NEXT: ...
+
+@a = dso_local local_unnamed_addr global [16 x [16 x [16 x i32]]] zeroinitializer, align 4
+
+define dso_local void @f() {
+entry:
+ br label %for.cond1.preheader
+
+for.cond1.preheader:
+ %indvars.iv46 = phi i64 [ 0, %entry ], [ %indvars.iv.next47, %for.cond.cleanup3 ]
+ %0 = add nuw nsw i64 %indvars.iv46, 3
+ br label %for.cond5.preheader
+
+for.cond5.preheader:
+ %indvars.iv41 = phi i64 [ 1, %for.cond1.preheader ], [ %indvars.iv.next42, %for.cond.cleanup7 ]
+ %1 = add nsw i64 %indvars.iv41, -1
+ br label %for.body8
+
+for.cond.cleanup3:
+ %indvars.iv.next47 = add nuw nsw i64 %indvars.iv46, 1
+ %exitcond50 = icmp ne i64 %indvars.iv.next47, 4
+ br i1 %exitcond50, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7:
+ %indvars.iv.next42 = add nuw nsw i64 %indvars.iv41, 1
+ %exitcond45 = icmp ne i64 %indvars.iv.next42, 8
+ br i1 %exitcond45, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8:
+ %indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
+ %arrayidx12 = getelementptr inbounds nuw [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %0, i64 %indvars.iv, i64 %indvars.iv41
+ %2 = load i32, ptr %arrayidx12, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %arrayidx20 = getelementptr inbounds [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %indvars.iv46, i64 %indvars.iv.next, i64 %1
+ %3 = load i32, ptr %arrayidx20, align 4
+ %sub21 = sub nsw i32 %3, %2
+ store i32 %sub21, ptr %arrayidx20, align 4
+ %exitcond = icmp ne i64 %indvars.iv.next, 8
+ br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
+
+for.cond.cleanup:
+ ret void
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
a08068a
to
45a8373
Compare
@@ -388,6 +387,8 @@ class LoopInterchangeProfitability { | |||
|
|||
/// Interface to emit optimization remarks. | |||
OptimizationRemarkEmitter *ORE; | |||
|
|||
const std::optional<CostMapTy> &CostMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: a comment explaining what this is exactly capturing would be good. I.e., it's a map of:
Loop
-> [index, cost]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -497,11 +498,13 @@ struct LoopInterchange { | |||
// indicates the loop should be placed as the innermost loop. | |||
// | |||
// For the old pass manager CacheCost would be null. | |||
DenseMap<const Loop *, unsigned> CostMap; | |||
std::optional<CostMapTy> CostMap = std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've also added a CostMap
to the LoopInterchangeProfitability
class, it's not yet clear to me if we need both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CostMap
in LoopInterchangeProfitability
is a reference to this object. I just changed the way it is passed: from the argument of isProfitable
to the ctor.
I think we should clean up the code in this area. For example, the arguments InnerLoop
and OuterLoop
of isProfitable
look redundant. They would be the same as what is passed by the ctor.
// we also consider the order going through other inductions. As for the above | ||
// case, we can induce that interchanging `k` and `i` is profitable (it is | ||
// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`. | ||
// However, such accurate calculation is expensive, so that we don't do it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this example and comment, this really helps!
One remark about the last statement:
However, such accurate calculation is expensive, so that we don't do it.
Can you add where this leaves us, what we do instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this comment because it is not related to the purpose of this patch. If it is helpful, I will add it back as another PR.
// some standard C++ libraries. In particular, isProfitable should hold the | ||
// following properties. | ||
// | ||
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question on the assymetry: does this really need to hold? Can it not be the case that isProfitable(a, b)
and isProfitable(b, a)
are both profitable, or both neutral, or both not really good? I haven't tried to come up with examples, but I thought about this as a way of saying the cost-model saying "don't know
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the asymmetry is necessary.
The isProfitable
returns bool
, but the profitability check is actually std::optional<bool>
, which is expressed by ShouldInterchange
. This means that the cost-model has three types of results: "should interchange them", "should NOT interchange them", and "don't know". The nullopt
is squashed into false
, so not isProfitable(a, b)
could mean either "should NOT" or "don't know".
To answer your question, there are cases where both isProfitable(a, b)
and isProfitable(b, a)
become false. Asymmetry doesn't say anything about this case, it requires that they cannot both be true. In this case cost-model may be saying "I don't know both", or "I don't know one, but we should not perform the other". Also, I think we should design the cost-model so that it never says both are profitable, and that the asymmetry says. At least as far as we use the bubble sort fashion algorithm, it can cause the same pair of loops to be interchanged more than twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My use of mathematical terms was incorrect. What I wanted to argue here is the following part in the comment #127473 (review) by @Meinersbur .
LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the Summary:
This patch fixes the problem and prevents the interchange from performing unprofitable transformations.
Please eloborate on how this PR does this. Due to refactorings in the diff I haven't yet found what the actual change is.
LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough). However, this is a greedy algorithm on neighboring loops, properties that need to take the entire loop nest into consideration is beyond this scope. Feel free to replace the greedy algorithm, but you cannot ensure global properties with just local information1.
Footnotes
-
Well, it is the case for some problems, such as minimal spanning tree. Program performance is propbably not one of them. ↩
// This function doesn't satisfy transitivity. Consider the following case. | ||
// | ||
// ``` | ||
// for (int k = 0; k < N; k++) { | ||
// for (int j = 0; j < N; j++) { | ||
// for (int i = 0; i < N; i++) { | ||
// dst0[i][j][k] += aa[i][j] + bb[i][j] + cc[j][k]; | ||
// dst1[k][j][i] += dd[i][j] + ee[i][j] + ff[j][k]; | ||
// } | ||
// } | ||
// } | ||
// | ||
// ``` | ||
// | ||
// The getInstrOrderCost will return the following value. | ||
// | ||
// Outer | Inner | Cost | ||
// -------+-------+------ | ||
// k | j | -2 | ||
// j | i | -4 | ||
// k | i | 0 | ||
// | ||
// This means that this function says interchanging (k, j) loops and (j, i) | ||
// loops are profitable, but not (k, i). The root cause of this is that the | ||
// getInstrOrderCost only see the loops we are checking. We can resolve this if | ||
// we also consider the order going through other inductions. As for the above | ||
// case, we can induce that interchanging `k` and `i` is profitable (it is | ||
// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`. | ||
// However, such accurate calculation is expensive, so that we don't do it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making your comments on of:
- Doygen comment at the function's declaration documenting its usage
- Comment inside the function definition documenting the implementation
// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then | ||
// isProfitable(a, c) is true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think transitivity needs to apply. Loop interchange's bubble sort is a greedy algorithm: Make local decision in the hopes that it improves the global result. In particular, it depends on the order of a,b,c in the loop nest. After exchanging a and b, exchanging b and c might not be profitable anymot
for c:
for a:
for b:
A[b][a]
B[c]
Interchanging a with b is profitable (making A[b][a]
consecutive). Interchanging b with c as well (making B[c]
consecutive)1. But after interchanging a and b, interchanging b and c is not necessarily profitable: it destroys that the entire A[b][a]
is accessed consecutively.
Of course there are more concerns than conscecutive accesses, such as data reuse. But this is not how isProfitable
: It consists of multiple single concern decision-making rules, applying the one with the higherst priority. They do not (and cannot) have a global view on what the eventual best loop order would be. If that would be the case, we should compute that order in advance, then apply bubble sort until we reach that order. That wouln't be a greedy algorithm anymore though.
Footnotes
-
You may argue the question is malformed: b and c are not neighbors, they cannot even be interchanged. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't intend to require transitivity for LoopInterchange. I just wrote a property that the comparison function of a comparison sort holds, but that LoopInterchangeProfitability::isProfitable
does not. But I still think that the transitivity should be satisfied if it is possible, under the assumption that the profitability decisions don't change depending on the current order of the nest. Your case looks to me the case where the profitability changes after interchanging, where in my understanding the decision for b and c doesn't change before and after interchanging a and b in the current implementation.
// some standard C++ libraries. In particular, isProfitable should hold the | ||
// following properties. | ||
// | ||
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that interchange(a, b)
is the same as interchange(b, a)
, the relation whether the two should be exchanged is symmetric, not asymmetric. What you are looking for is "areLoopsInTheRightOrder", independent of what the current order is.
Also consider that isProfitable(a, b)
and isProfitable(a, b)
on the interchanged loops are not the same input. The first is a query on
for a:
for b:
the second is a query on:
for b':
for a':
a
and a'
, as well as b
, b'
are not the same loops. They look very different to isProfitablePerLoopCacheAnalysis
, isProfitablePerInstrOrderCost
, and isProfitableForVectorization
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I want to argue here is that for all f
in the three single rules (isProfitablePerLoopCacheAnalysis
, isProfitablePerInstrOrderCost
, or isProfitableForVectorization
) the following property holds regardless of the current loop order
If f(a, b)
is true, then f(b, a)
is false (or nullopt).
This patch fixes the problem that isProfitablePerLoopCacheAnalysis
didn't satisfy it. I believe this is the same as what you call areLoopsInTheRightOrder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If f(a, b) is true, then f(b, a) is false (or nullopt).
The mathematical terms do not match what you want to say.
-
isProfitable
requires the inner loop to be the first argument, the second is the outer loop. Ifa
is the outer loop, thenisProfitable(b,a)
is undefined. -
In order for
to be a function, all inputs must be in its arguments, namly the LLVM-IR, for . After interchange having been applied, R
has changed. You get a new function. Even if the a (and b) are considered "equal" in some sense, the underlying LLVM-IR has changed, and with it the function result. One could say make relationship claims between and , but its not asymmetry or transitivity, since those are about a single relation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, my use of mathematical terms is incorrect. What I want is like
-
is true then is false, where - a is the inner loop and b is the outer one in
. -
is what some interchanges are applied to so that the order of a and b is exchanged.
- a is the inner loop and b is the outer one in
I will fix the comment. So, do you think the changes of this patch is correct to accomplish that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you want to express the "transivity" part? That is, that after a series of swaps you get back to the original order, without directly undoing a previous swap?
There is no need to use math for this, unless you want to get into formal proofs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the comments about transitivity. IIUIC, the transitivity doesn't affect what I want to fix in this patch.
The reason I mentioned the transitivity is that I was initially thinking about the properties that the comparison function of a comparison sort must have in general. It may affect the performance improvements, but is different from the topic of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description was really bad. This patch fixes the issue that LoopInterchange can undo its optimization, as I described in #118267 (comment). I don't intend to archive the global optimal order. Fixed the PR title and description.
// some standard C++ libraries. In particular, isProfitable should hold the | ||
// following properties. | ||
// | ||
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I want to argue here is that for all f
in the three single rules (isProfitablePerLoopCacheAnalysis
, isProfitablePerInstrOrderCost
, or isProfitableForVectorization
) the following property holds regardless of the current loop order
If f(a, b)
is true, then f(b, a)
is false (or nullopt).
This patch fixes the problem that isProfitablePerLoopCacheAnalysis
didn't satisfy it. I believe this is the same as what you call areLoopsInTheRightOrder
.
// Transitivity: If both isProfitable(a, b) and isProfitable(b, c) is true then | ||
// isProfitable(a, c) is true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't intend to require transitivity for LoopInterchange. I just wrote a property that the comparison function of a comparison sort holds, but that LoopInterchangeProfitability::isProfitable
does not. But I still think that the transitivity should be satisfied if it is possible, under the assumption that the profitability decisions don't change depending on the current order of the nest. Your case looks to me the case where the profitability changes after interchanging, where in my understanding the decision for b and c doesn't change before and after interchanging a and b in the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move the refactorings/cleanups out of this patch? I am still confused what CostMap
actually does.
// some standard C++ libraries. In particular, isProfitable should hold the | ||
// following properties. | ||
// | ||
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you want to express the "transivity" part? That is, that after a series of swaps you get back to the original order, without directly undoing a previous swap?
There is no need to use math for this, unless you want to get into formal proofs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. I erased the off topic comments on this patch.
The problem is the current behavior of isProfitablePerLoopCacheAnalysis
when CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop)
. It returns true if InnerIndex < OuterIndex
, but otherwise returns nullopt, which may trigger to undo a past interchange.
@@ -388,6 +387,8 @@ class LoopInterchangeProfitability { | |||
|
|||
/// Interface to emit optimization remarks. | |||
OptimizationRemarkEmitter *ORE; | |||
|
|||
const std::optional<CostMapTy> &CostMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -497,11 +498,13 @@ struct LoopInterchange { | |||
// indicates the loop should be placed as the innermost loop. | |||
// | |||
// For the old pass manager CacheCost would be null. | |||
DenseMap<const Loop *, unsigned> CostMap; | |||
std::optional<CostMapTy> CostMap = std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CostMap
in LoopInterchangeProfitability
is a reference to this object. I just changed the way it is passed: from the argument of isProfitable
to the ctor.
I think we should clean up the code in this area. For example, the arguments InnerLoop
and OuterLoop
of isProfitable
look redundant. They would be the same as what is passed by the ctor.
// some standard C++ libraries. In particular, isProfitable should hold the | ||
// following properties. | ||
// | ||
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the comments about transitivity. IIUIC, the transitivity doesn't affect what I want to fix in this patch.
The reason I mentioned the transitivity is that I was initially thinking about the properties that the comparison function of a comparison sort must have in general. It may affect the performance improvements, but is different from the topic of this PR.
// some standard C++ libraries. In particular, isProfitable should hold the | ||
// following properties. | ||
// | ||
// Asymmetry: If isProfitable(a, b) is true then isProfitable(b, a) is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My use of mathematical terms was incorrect. What I wanted to argue here is the following part in the comment #127473 (review) by @Meinersbur .
LoopInterchange should indeed not undo its own optimization, which indicates a problem with the cost model (which should be conservative, never apply an transformation we are not pretty sure it will improve performance; if we do we were not conservative enough).
// we also consider the order going through other inductions. As for the above | ||
// case, we can induce that interchanging `k` and `i` is profitable (it is | ||
// better to move the `k` loop to inner position) by `bb[i][j]` and `cc[j][k]`. | ||
// However, such accurate calculation is expensive, so that we don't do it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this comment because it is not related to the purpose of this patch. If it is helpful, I will add it back as another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move the refactorings/cleanups out of this patch?
Please still consider.
/// Map each loop to a pair of (index, cost). The index is the best position | ||
/// of this loop, according to CacheCostAnalysis. The cost is the actual cost | ||
/// as calculated by CacheCostAnalysis. | ||
const std::optional<CostMapTy> &CostMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing a reference to memory not owned by this object is asking for problems (I am speaking out of experience). If LoopInterchangeProfitability
cannot own it, it should still be passed when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undid this change. I believe LoopInterchangeProfitability
can own the CostMap
, but if so, I'll do it in another patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move the refactorings/cleanups out of this patch?
Please still consider.
Sorry, I've misread your comment...
/// Map each loop to a pair of (index, cost). The index is the best position | ||
/// of this loop, according to CacheCostAnalysis. The cost is the actual cost | ||
/// as calculated by CacheCostAnalysis. | ||
const std::optional<CostMapTy> &CostMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undid this change. I believe LoopInterchangeProfitability
can own the CostMap
, but if so, I'll do it in another patch.
@Meinersbur Could you please take a look again? |
Gentle ping |
LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops A and B, it calls isProfitable(A, B) to determine whether these loops should be exchanged. The result of isProfitable(A, B) should be conservative, that is, it should return true only when we are sure exchanging A and B will improve performance. If it's not conservative enough, it's possible that a subsequent isProfitable(B, A) will also return true, in which case LoopInterchange will undo its previous transformation. To make isProfitable conservative enough, isProfitable(B, A) must not return true if isProfitable(A, B) returned true in the past. However, the current implementation can be in such a situation. isProfitable consists of several multiple rules, and they are applied one by one. The first rule is isProfitablePerLoopCacheAnalysis, and the problem is that there are cases where isProfitablePerLoopCacheAnalysis(A, B) returns true but isProfitablePerLoopCacheAnalysis(B, A) returns nullopt. This happens when the cost computed by CacheCostAnalysis is the same for A and B. In this case, the result of isProfitablePerLoopCacheAnalysis depends on the value of CostMap, which maps a loop to an index that is pre-computed in some way: If CostMap[A] is less than CostMap[B], it returns true, otherwise it returns nullopt, where the above problem occurs. To avoid such a situation, if the cost of CacheCostAnalysis is the same for A and B, it should return nullopt regardless of CostMap.
d2424cf
to
6a5ce1f
Compare
Rebased to resolve conflicts. Also updated the description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This directly fixes the root cause. LGTM and thanks.
Don't forget to update the summary for the commit message.
Thanks for the kind review! |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/18115 Here is the relevant piece of the build log for the reference
|
LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops
A
andB
, it callsisProfitable(A, B)
to determine whether these loops should be exchanged. The result ofisProfitable(A, B)
should be conservative, that is, it should return true only when we are sure exchangingA
andB
will improve performance. If it's not conservative enough, it's possible that a subsequentisProfitable(B, A)
will also return true, in which case LoopInterchange will undo its previous transformation. To avoid such cases,isProfitable(B, A)
must not return true ifisProfitable(A, B)
returned true in the past. However, the current implementation can be in such a situation. This patch resolves it by correcting the handling of two loops that have the same cache cost.This resolve the problem mentioned in #118267 (comment).