New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate undefs wherever possible by finding def-free paths through loop. #19193
Conversation
@swift-ci please test tensorflow linux |
@swift-ci please test tensorflow macos |
I can't quite parse this sentence. Can you please clarify, and perhaps give an example (based on the unit test) on what that path could be? The code checks on "domination", but here we use the concept "def-free" -- can you elaborate on the relationship? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this PR is one of the steps, I'm approving it to unblock incremental progress. We can continue the discussion on the side.
|
||
using namespace swift; | ||
using namespace tf; | ||
|
||
static llvm::cl::opt<bool> TFEnsureSingleLoopExit( | ||
"tf-ensure-single-loop-exit", llvm::cl::init(true), | ||
llvm::cl::desc("Transform loops to have a single exit from header.")); | ||
static llvm::cl::opt<bool> TFNoUndefsInSESE( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the intended use of this flag?
are there possible bugs. if yes it'd be good to try addressing them before flipping to true.
Note if the bug causes wrong results, user may not be aware of the root cause and thus "turning off the flag" in that case would not be a useful workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly for debugging. I just needed a way to flip to the old behavior without recompiling the binary. (Will update the comment accordingly.) I don't expect the flag to be in the code base for long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG
@@ -361,8 +369,10 @@ class SingleExitLoopTransformer { | |||
// the new latchBlock as appropriate. | |||
SmallVector<std::pair<const SILBasicBlock *, const SILBasicBlock *>, 8> | |||
edgesToFix; | |||
/// Identify the set of values that escape the loop. | |||
llvm::SmallPtrSet<SILValue, 8> escapingValues; | |||
/// Identify the set of values that escape the loop. The keys represent the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: keys -> key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -386,13 +396,17 @@ void SingleExitLoopTransformer::initialize() { | |||
} | |||
} | |||
|
|||
computeEscapingValues(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be cleaner to:
-
make computeEscapingValues() return a dense map, and assign that to escapingValueSubstMap. this makes it clear what the (side)effects of computeEscapingValues() are. even better, mark computeEscapingValues() static if that's possible.
-
can we even make escapingValueSubstMap a const member, and set it in the c'tor initializer list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 is a good idea. Instead of static, I made this a const function.
2 is not possible. In subsequent PRs, I will do CFG transformations before computing the escaping values.
if (!TFNoUndefsInSESE) return; | ||
|
||
// Find a def-free path from the escaping value to the preheader for each | ||
// escaping value. If no such path is found, set the escaping value at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in what cases can we not find such a path? an example (or maybe unit test) could help.
it's useful to be able to characterize the cases that we can remove (or cannot remove) undefs, because then we can assess if the gained benefits outweigh the added complexity. also that'll give us a mental model of looking at user code, and predicting if undefs will be there.
I will add a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately, we should never have undefs when I finish all the PRs.
do..while is a simple case, where we cannot find a def-free path:
{
var count:Int32 = 0
var sum:Int32 = 0
repeat {
sum += count
count += 1
} while (count < 100)
return sum
}
In this case, the only way is to clone the body of the loop as follows:
{
var count:Int32 = 0
var sum:Int32 = 0
sum += count
count += 1
stayInLoop = (count < 100) ? true : false
while (stayInLoop) {
sum += count
count += 1
stayInLoop = (count < 100) ? true : false
}
return sum
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test case that is similar to the one outlined here. In this case, we will still produce undefs, but will eliminate it eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
kv.second = current; | ||
break; | ||
} | ||
} else if (auto *arg = dyn_cast<SILArgument>(current)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these the only two cases relevant here (are we guaranteed to never see an undef at this point)? if yes it'd be good to assert on the second case (or assert there is no other case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, these are the cases that will lead to a def-free path. I think this part will become clearer if I rewrite this in terms of computing equivalence classes induced by argument passing. It will be more efficient and cleaner. Let me rewrite this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote teh code to use equivalance classes which conveys the intention better and is more efficient.
@@ -470,7 +524,8 @@ SingleExitLoopTransformer::createNewHeader() { | |||
} | |||
header->dropAllArguments(); | |||
// Add phi arguments in the new header corresponding to the escaping values. | |||
for (const auto &escapingValue : escapingValues) { | |||
for (const auto &kv : escapingValueSubstMap) { | |||
const SILValue& escapingValue = kv.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe SILValue is a lightweight wrapper, and we need not take a reference. (making a copy is fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
for (const SILValue &escapingValue : escapingValues) { | ||
newArgs.push_back(getUndef(escapingValue->getType())); | ||
for (const auto &kv : escapingValueSubstMap) { | ||
newArgs.push_back(kv.second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a note for future consideration: if we really think that some undefs cannot be eliminated (may be useful to discuss that first), we may want to log the undefs and/or add stats counter on them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there should not be any undefs in the final version.
SmallVector<SILValue, 8> incomingValues; | ||
arg->getIncomingValues(incomingValues); | ||
for (const SILValue &incomingValue : incomingValues) { | ||
if (visited.count(incomingValue) > 0) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just write if (visited.count(incomingValue))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually write (visited.count(...) > 0) as it reduces some cognitive load when reading code. I don't have a strong preference though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. No strong preference either.
break; | ||
} | ||
// This is not usable. Add incoming values to worklist. | ||
SmallVector<SILValue, 8> incomingValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code here suggests if any incoming value is in a block that dominates preheader, we can use it.
but actually don't we need all, not any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me rewrite this code using equivalence classes and the idea will be come clear.
I have changed the implementation to use equivalence classes. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Equiv class based code does make it more readable. Great!
} | ||
|
||
llvm::DenseMap<SILValue, SILValue> | ||
SingleExitLoopTransformer::computeEscapingValues() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider passing llvm::DenseMap<SILValue, SILValue> as an output (pointer) param into the func.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am conflicted here. :) I generally prefer the functional style as it makes it easy to read code, but llvm and swift seems to mostly have inout param style. Is that the preferred style? If so, I will change it.
This should be as efficient as the compiler will optimize away copies with Return Value Optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference. I don't quite like the current closure syntax "&result" though, but it's not a big issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave it as is then. (Even if I pass an argument, I will need to add the inout parameter to the capture?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you pass the input by pointer/address into computeEscapingValues(), you can then pass that pointer to the closure by value. but it's not a big deal.
const SILValue &escapingValue = kv.first; | ||
// Find an equivalent value that dominates the header. | ||
for (auto equivalentValue : | ||
make_range(equivalentValues.findLeader(escapingValue), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that we have an equiv class [v1, v2], and v1 is chosen as the leader, but v2 dominates preheader, not v1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that can happen. The leader for a class depends upon a bunch of factors:
- order in which unification constraints are processed
- the implementation of the union-find
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it would seem to lead to suboptimal transformation -- if we choose v2 in the above example, the undef can be removed. Do we want to iterate over the members (as in your old code), rather than only checking on the "leader" picked by the underlying llvm infra? (pls feel free to scope it out of this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are actually iterating over all the elements of the equivalence class. For this example, we will chose v2 as we iterate over {v1, v2}. See the make_range(...) call.
(It seems to me that the code is not readable. Any suggestions to improve it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. So we are iterating between equivalentValues.findLeader(escapingValue)
and equivalentValues.member_end()
. say the container has a list of elems [v1, v2, v3]
, and findLeader()
returns v2
, does that then skip processing v1
?
I wonder if we should call member_begin()
instead of findLeader()
.
In any case, it might help to add a comment right before make_range()
to clarify the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see the source of confusion now. findLeader() returns an member iterator for the equivalence class.
Unfortunately, calling member_begin() on non-leaders returns null.
So, we are iterating over all the elements of the equivalence class. We won't be skipping v1 in your example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got on the same page with Gogul through offline discussion.
@@ -591,7 +641,8 @@ SingleExitLoopTransformer::patchEdges(SILBasicBlock *newHeader, | |||
getUserSourceLocation(src->getTerminator()->getDebugLocation())); | |||
// Find an appropriate value to use for each escaping value. | |||
unsigned argIndex = oldHeaderNumArgs; | |||
for (const SILValue &escapingValue : escapingValues) { | |||
for (const auto &kv : escapingValueSubstMap) { | |||
const SILValue &escapingValue = kv.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const SILValue &escapingValue -> SILValue escapingValue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@swift-ci please test TensorFlow linux |
Thanks for the review, Ming. I am merging this now. |
Eliminate undefs during canonicalization by finding an def-free path from the escaping value to a value defined in the node dominated that dominates the preheader.
This is a first in the series of PRs to completely eliminate undefs during SESE loop canoncialization.
Partially resolves SR-7765.