Eliminate undefs wherever possible by finding def-free paths through loop. #19193

bgogul · 2018-09-07T17:42:40Z

Eliminate undefs during canonicalization by finding a def-free path from the escaping value to a value defined in a node that dominates the preheader.

This is the first in a series of PRs to completely eliminate undefs during SESE loop canonicalization.

Partially resolves SR-7765.

bgogul · 2018-09-07T17:43:37Z

@swift-ci please test tensorflow linux

bgogul · 2018-09-07T17:43:47Z

@swift-ci please test tensorflow macos

mhong · 2018-09-08T04:30:49Z

Eliminate undefs during canonicalization by finding an def-free path from the escaping value to a value defined in the node dominated that dominates the preheader.

I can't quite parse this sentence. Can you please clarify, and perhaps give an example (based on the unit test) on what that path could be? The code checks on "domination", but here we use the concept "def-free" -- can you elaborate on the relationship?

mhong

Since this PR is one of the steps, I'm approving it to unblock incremental progress. We can continue the discussion on the side.

mhong · 2018-09-08T03:49:10Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp


 using namespace swift;
 using namespace tf;

 static llvm::cl::opt<bool> TFEnsureSingleLoopExit(
    "tf-ensure-single-loop-exit", llvm::cl::init(true),
    llvm::cl::desc("Transform loops to have a single exit from header."));
+static llvm::cl::opt<bool> TFNoUndefsInSESE(


what's the intended use of this flag?

are there possible bugs? if yes, it'd be good to try addressing them before flipping to true.

Note that if a bug causes wrong results, the user may not be aware of the root cause, so "turning off the flag" would not be a useful workaround in that case.

This is mostly for debugging. I just needed a way to flip to the old behavior without recompiling the binary. (Will update the comment accordingly.) I don't expect the flag to be in the code base for long.

mhong · 2018-09-08T03:50:26Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

@@ -361,8 +369,10 @@ class SingleExitLoopTransformer {
  // the new latchBlock as appropriate.
  SmallVector<std::pair<const SILBasicBlock *, const SILBasicBlock *>, 8>
      edgesToFix;
-  /// Identify the set of values that escape the loop.
-  llvm::SmallPtrSet<SILValue, 8> escapingValues;
+  /// Identify the set of values that escape the loop. The keys represent the


nit: keys -> key?

mhong · 2018-09-08T03:54:16Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

@@ -386,13 +396,17 @@ void SingleExitLoopTransformer::initialize() {
    }
  }

+  computeEscapingValues();


would it be cleaner to:

1. make computeEscapingValues() return a dense map, and assign that to escapingValueSubstMap. this makes it clear what the (side) effects of computeEscapingValues() are. even better, mark computeEscapingValues() static if that's possible.

2. can we even make escapingValueSubstMap a const member, and set it in the c'tor initializer list?

1 is a good idea. Instead of static, I made this a const function.
2 is not possible. In subsequent PRs, I will do CFG transformations before computing the escaping values.

mhong · 2018-09-08T03:55:59Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  if (!TFNoUndefsInSESE)  return;
+
+  // Find a def-free path from the escaping value to the preheader for each
+  // escaping value. If no such path is found, set the escaping value at


in what cases can we not find such a path? an example (or maybe unit test) could help.

it's useful to be able to characterize the cases where we can (or cannot) remove undefs, because then we can assess whether the gained benefits outweigh the added complexity. that will also give us a mental model for looking at user code and predicting whether undefs will be there.

I will add a unit test.

Ultimately, we should never have undefs when I finish all the PRs.

do..while (repeat-while in Swift) is a simple case where we cannot find a def-free path:
{
    var count: Int32 = 0
    var sum: Int32 = 0
    repeat {
        sum += count
        count += 1
    } while (count < 100)
    return sum
}
In this case, the only way is to clone the body of the loop as follows:
{
    var count: Int32 = 0
    var sum: Int32 = 0
    sum += count
    count += 1
    var stayInLoop = count < 100
    while (stayInLoop) {
        sum += count
        count += 1
        stayInLoop = count < 100
    }
    return sum
}

Added a test case that is similar to the one outlined here. In this case, we will still produce undefs, but will eliminate them eventually.
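
For readers who want to convince themselves that cloning the loop body preserves behavior, here is a minimal source-level sketch in C++. It is not the compiler's transformation. The function names sumRepeatWhile and sumPeeled and the limit parameter (generalizing the constant 100) are hypothetical and only mirror the two Swift snippets in this thread.

```cpp
#include <cstdint>

// Original shape: the body runs once before the first exit test (do..while).
// `limit` is a hypothetical parameter standing in for the constant 100.
std::int32_t sumRepeatWhile(std::int32_t limit) {
  std::int32_t count = 0;
  std::int32_t sum = 0;
  do {
    sum += count;
    count += 1;
  } while (count < limit);
  return sum;
}

// Peeled shape: one copy of the body runs up front, so every value the loop
// needs (including the exit condition) is defined before the loop header.
std::int32_t sumPeeled(std::int32_t limit) {
  std::int32_t count = 0;
  std::int32_t sum = 0;
  sum += count;
  count += 1;
  bool stayInLoop = count < limit;
  while (stayInLoop) {
    sum += count;
    count += 1;
    stayInLoop = count < limit;
  }
  return sum;
}
```

Both functions execute the body at least once, which is why the peeled copy cannot simply be dropped: with a limit of 0 each still adds the initial count to sum exactly once.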

mhong · 2018-09-08T03:58:45Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+          kv.second = current;
+          break;
+        }
+      } else if (auto *arg = dyn_cast<SILArgument>(current)) {


are these the only two cases relevant here (are we guaranteed to never see an undef at this point)? if yes it'd be good to assert on the second case (or assert there is no other case).

No, these are the cases that will lead to a def-free path. I think this part will become clearer if I rewrite this in terms of computing equivalence classes induced by argument passing. It will be more efficient and cleaner. Let me rewrite this code.

I rewrote the code to use equivalence classes, which conveys the intention better and is more efficient.

mhong · 2018-09-08T04:00:10Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

@@ -470,7 +524,8 @@ SingleExitLoopTransformer::createNewHeader() {
  }
  header->dropAllArguments();
  // Add phi arguments in the new header corresponding to the escaping values.
-  for (const auto &escapingValue : escapingValues) {
+  for (const auto &kv : escapingValueSubstMap) {
+    const SILValue& escapingValue = kv.first;


i believe SILValue is a lightweight wrapper, and we need not take a reference. (making a copy is fine)

mhong · 2018-09-08T04:01:51Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

-  for (const SILValue &escapingValue : escapingValues) {
-    newArgs.push_back(getUndef(escapingValue->getType()));
+  for (const auto &kv : escapingValueSubstMap) {
+    newArgs.push_back(kv.second);


a note for future consideration: if we really think that some undefs cannot be eliminated (may be useful to discuss that first), we may want to log the undefs and/or add stats counter on them

No, there should not be any undefs in the final version.

mhong · 2018-09-08T04:26:33Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+        SmallVector<SILValue, 8> incomingValues;
+        arg->getIncomingValues(incomingValues);
+        for (const SILValue &incomingValue : incomingValues) {
+          if (visited.count(incomingValue) > 0) continue;


just write if (visited.count(incomingValue))

I usually write (visited.count(...) > 0) as it reduces some cognitive load when reading code. I don't have a strong preference though.

SG. No strong preference either.

mhong · 2018-09-08T04:27:19Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+          break;
+        }
+        // This is not usable. Add incoming values to worklist.
+        SmallVector<SILValue, 8> incomingValues;


the code here suggests if any incoming value is in a block that dominates preheader, we can use it.

but actually don't we need all, not any?

Let me rewrite this code using equivalence classes and the idea will become clear.

bgogul · 2018-09-11T22:47:12Z

I have changed the implementation to use equivalence classes. PTAL.

mhong

LGTM

Equiv class based code does make it more readable. Great!

mhong · 2018-09-12T02:27:25Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+}
+
+llvm::DenseMap<SILValue, SILValue>
+SingleExitLoopTransformer::computeEscapingValues() const {


nit: consider passing llvm::DenseMap<SILValue, SILValue> as an output (pointer) param into the func.

I am conflicted here. :) I generally prefer the functional style as it makes the code easy to read, but llvm and swift seem to mostly use the inout param style. Is that the preferred style? If so, I will change it.

This should be just as efficient, as the compiler will optimize away the copies with Return Value Optimization.

I don't have a strong preference. I don't quite like the current closure syntax "&result" though, but it's not a big issue.

I will leave it as is then. (Even if I pass an argument, I will need to add the inout parameter to the capture?)

if you pass the input by pointer/address into computeEscapingValues(), you can then pass that pointer to the closure by value. but it's not a big deal.
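
The two styles under discussion can be compared in a small sketch. It uses std::map<int, int> as a stand-in for llvm::DenseMap<SILValue, SILValue>. The names SubstMap, computeByValue, and computeInto, and the -1 placeholder value, are hypothetical and not taken from the PR.

```cpp
#include <map>
#include <vector>

// Stand-in for llvm::DenseMap<SILValue, SILValue>; ints play the role of values.
using SubstMap = std::map<int, int>;

// Functional style: build the map locally and return it by value. Returning a
// named local lets the compiler elide the copy or fall back to a move.
SubstMap computeByValue(const std::vector<int> &escaping) {
  SubstMap result;
  for (int v : escaping)
    result[v] = -1; // -1 is a hypothetical "no substitute found yet" marker.
  return result;
}

// Out-parameter style: the caller owns the map and the function fills it in.
// The pointer itself is captured by value in the closure, so no "&result"
// capture is needed.
void computeInto(const std::vector<int> &escaping, SubstMap *result) {
  result->clear();
  auto record = [result](int v) { (*result)[v] = -1; };
  for (int v : escaping)
    record(v);
}
```

Either form produces the same map. The choice is mostly about call-site readability and matching the surrounding code base's conventions.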

mhong · 2018-09-12T02:29:24Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+    const SILValue &escapingValue = kv.first;
+    // Find an equivalent value that dominates the header.
+    for (auto equivalentValue :
+         make_range(equivalentValues.findLeader(escapingValue),


is it possible that we have an equiv class [v1, v2], and v1 is chosen as the leader, but v2 dominates preheader, not v1?

Yes, that can happen. The leader for a class depends upon a bunch of factors:

order in which unification constraints are processed

the implementation of the union-find

Then it would seem to lead to suboptimal transformation -- if we choose v2 in the above example, the undef can be removed. Do we want to iterate over the members (as in your old code), rather than only checking on the "leader" picked by the underlying llvm infra? (pls feel free to scope it out of this PR.)

We are actually iterating over all the elements of the equivalence class. For this example, we will choose v2 as we iterate over {v1, v2}. See the make_range(...) call.

(It seems to me that the code is not readable. Any suggestions to improve it?)

Thanks for clarifying. So we are iterating between equivalentValues.findLeader(escapingValue) and equivalentValues.member_end(). say the container has a list of elems [v1, v2, v3], and findLeader() returns v2, does that then skip processing v1?

I wonder if we should call member_begin() instead of findLeader().

In any case, it might help to add a comment right before make_range() to clarify the intention.

Ah, I see the source of confusion now. findLeader() returns a member iterator for the equivalence class.

Unfortunately, calling member_begin() on non-leaders returns null.

So, we are iterating over all the elements of the equivalence class. We won't be skipping v1 in your example.

Got on the same page with Gogul through offline discussion.
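
The point settled above, that the search scans every member of the escaping value's equivalence class rather than only its leader, can be illustrated with a minimal union-find sketch. This is not the LLVM data structure: EquivClasses, members, and findSubstitute are hypothetical stand-ins, integer ids play the role of SIL values, and the usable predicate stands for "is defined in a block that dominates the preheader".

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Minimal union-find over integer ids; a hypothetical stand-in for
// llvm::EquivalenceClasses<SILValue>.
class EquivClasses {
public:
  explicit EquivClasses(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }

  // Returns the representative (leader) of x's class, halving paths as it goes.
  int findLeader(int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  }

  // Merges the classes of a and b. Which id becomes the leader depends on the
  // order of the calls, so callers must not rely on it.
  void unionSets(int a, int b) { parent[findLeader(a)] = findLeader(b); }

  // Collects every member of x's class, not just the leader.
  std::vector<int> members(int x) {
    std::vector<int> out;
    int leader = findLeader(x);
    for (int i = 0; i < static_cast<int>(parent.size()); ++i)
      if (findLeader(i) == leader)
        out.push_back(i);
    return out;
  }

private:
  std::vector<int> parent;
};

// Scans the whole class of `escaping` and returns the first member accepted by
// `usable`, or -1 (a hypothetical sentinel) when no member qualifies, in which
// case an undef would still be needed.
int findSubstitute(EquivClasses &classes, int escaping,
                   const std::function<bool(int)> &usable) {
  for (int candidate : classes.members(escaping))
    if (usable(candidate))
      return candidate;
  return -1;
}
```

Because the scan covers the whole class, a usable member is found even when the union-find happened to pick a different member as the leader.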

mhong · 2018-09-12T02:30:28Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

@@ -591,7 +641,8 @@ SingleExitLoopTransformer::patchEdges(SILBasicBlock *newHeader,
        getUserSourceLocation(src->getTerminator()->getDebugLocation()));
    // Find an appropriate value to use for each escaping value.
    unsigned argIndex = oldHeaderNumArgs;
-    for (const SILValue &escapingValue : escapingValues) {
+    for (const auto &kv : escapingValueSubstMap) {
+      const SILValue &escapingValue = kv.first;


const SILValue &escapingValue -> SILValue escapingValue?

bgogul · 2018-09-12T20:14:11Z

@swift-ci please test TensorFlow linux

bgogul · 2018-09-13T04:47:39Z

Thanks for the review, Ming. I am merging this now.

bgogul added 4 commits September 7, 2018 10:34

Convert escapingValues to map that tracks arguments at preheader.

3e4a4f7

Eliminate undefs wherever possible by finding def-free paths.

48a2d94

Add a flag to control undef elimination.

4f0dbcf

Fixed the stale comment in def-free path detection.

bc2adfa

bgogul requested review from mhong and lattner September 7, 2018 17:43

mhong approved these changes Sep 8, 2018

bgogul added 6 commits September 9, 2018 01:38

Addressed high-level readability comments

013da21

Use equivalence classes to find values for escaping values.

740fef2

Merge branch 'tensorflow' of github.com:apple/swift into compute_esca…

e8715e4

…ping

Fix tests

80bab84

Recompute equivalence classes after every loop transformation as they…

5088a5f

… change.

Added a simple test where undef is still left.

3a44e2c

mhong approved these changes Sep 12, 2018

Minor style chagne.

3114184

Document the iteration over all values of equivalence classes.

e421a5f

bgogul merged commit 20e0cca into apple:tensorflow Sep 13, 2018

bgogul deleted the compute_escaping branch September 19, 2018 15:55

swift-ci mentioned this pull request Sep 24, 2018

[SR-7765] Lack of support for imperfect loop exits blocking simpleCounterLoop test #50304

Closed
