SESE canonicalization: unroll loop to eliminate undefs. #19811

bgogul · 2018-10-10T00:20:29Z

This patch handles the case where an undef can be eliminated only by unrolling the loop body once. See the doWhileLoop test in test/TensorFlow/sese_canonicalization.sil for an example.

The implementation is straightforward and simply uses the SILCloner.

bgogul · 2018-10-10T00:21:34Z

@swift-ci please test tensorflow

mhong · 2018-10-10T01:11:21Z

I left a few comments on the details, but here's a high level question: it seems this patch always unrolls a loop if there are some undefs -- but to unroll a loop without changing the semantics, do we need to guard the unrolling based on certain loop condition? e.g. if the original loop is over indices [0, 100), after unrolling we'll change the loop index range to be [1, 100).

Also, consider extending the PR description with a source-level example (no need to write a CFG if that's too complex) to show what happens before this PR (where the undef is) and why unrolling helps remove that undef. That would make the review much easier, and would also help us recall this knowledge in the future when needed.

bgogul · 2018-10-10T01:14:41Z

@swift-ci please test tensorflow

bgogul · 2018-10-10T01:27:54Z

> I left a few comments on the details, but here's a high level question: it seems this patch always unrolls a loop if there are some undefs -- but to unroll a loop without changing the semantics, do we need to guard the unrolling based on certain loop condition? e.g. if the original loop is over indices [0, 100), after unrolling we'll change the loop index range to be [1, 100).

No, we don't have to adjust the condition. Note that we are simply making a copy of the loop body. Consider the following canonicalized example.

  func loopSum() -> Int32 {
    var count: Int32 = 0
    var sum: Int32 = 0
    var stayInLoop = true
    while stayInLoop {
      sum += count
      count += 1
      stayInLoop = count < 100
    }
    return sum
  }

The unrolled version is as follows:

  func loopSumUnrolled() -> Int32 {
    var count: Int32 = 0
    var sum: Int32 = 0
    var stayInLoop = true
    // Copy of the loop body, executed at most once.
    if stayInLoop {
      sum += count
      count += 1
      stayInLoop = count < 100
    }
    // Original loop, with its condition unchanged.
    while stayInLoop {
      sum += count
      count += 1
      stayInLoop = count < 100
    }
    return sum
  }

Note that after the unrolled loop body executes, the state at the header would be equivalent to the state after one execution of the loop without unrolling.
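The equivalence claimed above can be checked with a small standalone program. The following is only a sketch in C++ (the language of the patch), not code from this PR; the function names and the `limit` parameter are illustrative assumptions, with `limit = 100` matching the example.

```cpp
#include <cstdint>

// Original canonicalized loop: the only exit decision is stayInLoop.
int32_t sumOriginal(int32_t limit) {
  int32_t count = 0, sum = 0;
  bool stayInLoop = true;
  while (stayInLoop) {
    sum += count;
    count += 1;
    stayInLoop = count < limit;
  }
  return sum;
}

// Same loop with one copy of the body placed in front of it.
// The loop condition itself is not adjusted.
int32_t sumUnrolledOnce(int32_t limit) {
  int32_t count = 0, sum = 0;
  bool stayInLoop = true;
  if (stayInLoop) {  // cloned body, runs at most once
    sum += count;
    count += 1;
    stayInLoop = count < limit;
  }
  while (stayInLoop) {  // unchanged loop
    sum += count;
    count += 1;
    stayInLoop = count < limit;
  }
  return sum;
}
```

Both functions return the same value for any `limit`, including small limits where the remaining loop never runs, because the cloned body leaves `count`, `sum`, and `stayInLoop` exactly as one iteration of the original loop would.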

> Also, consider extending the PR description with a source-level example (no need to write a CFG if that's too complex) to show what happens before this PR (where the undef is) and why unrolling helps remove that undef. That would make the review much easier, and would also help us recall this knowledge in the future when needed.

It would be easier to explain with a CFG. The unrollLoopBody implementation shows a detailed example in its comments. I will build on it and put it in the PR description.

mhong · 2018-10-10T00:54:39Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+    return Value;
+  }
+
+  void updateValueMap(SILValue oldValue, SILValue newValue)  {


should we find better names for the param names? like oldValue -> key, newValue -> value

I think oldValue and newValue are better. They say that, while cloning, occurrences of oldValue are replaced with newValue. Updated the function comment.
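For readers unfamiliar with the pattern, the oldValue/newValue naming can be illustrated with a toy value map. This is a hedged sketch, not the BasicBlockCloner from the patch: `ToyCloner` and the integer `ValueId` are hypothetical stand-ins, and leaving unmapped values unchanged is an assumption suggested by the `return Value;` line in the hunk above.

```cpp
#include <unordered_map>

// Hypothetical stand-in: values are identified by plain integer ids.
using ValueId = int;

class ToyCloner {
  std::unordered_map<ValueId, ValueId> valueMap;

public:
  // While cloning, replace occurrences of oldValue with newValue.
  void updateValueMap(ValueId oldValue, ValueId newValue) {
    valueMap[oldValue] = newValue;
  }

  // Look up the replacement for a value; values without an entry are
  // returned unchanged (an assumption of this sketch).
  ValueId remapValue(ValueId value) const {
    auto it = valueMap.find(value);
    return it == valueMap.end() ? value : it->second;
  }
};
```

With this shape, `updateValueMap(1, 10)` followed by `remapValue(1)` yields `10`, while `remapValue(2)` yields `2`.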

mhong · 2018-10-10T00:59:44Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

@@ -1150,6 +1189,128 @@ void SESERegionBuilder::ensureSingleExitFromLoops() {
  }
 }

+void SingleExitLoopTransformer::unrollLoopBody() {
+  BasicBlockCloner cloner(*currentFn);
+  // Setup cloner so that newHeader's argument's are replaced with values in


nit: argument's -> arguments?

mhong · 2018-10-10T01:01:16Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

@@ -421,6 +447,9 @@ class SingleExitLoopTransformer {
  /// we will get a single exit block.
  void ensureSingleExitBlock();

+  ///  Unroll the body of the loop once.
+  void unrollLoopBody();


consider renaming to unrollLoopBodyOnce(), and remove the comment.

mhong · 2018-10-10T17:13:58Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  // --Canonicalized CFG (not everything is shown)--
+  //   preheader: i0 = 0; br newHeader(i0, undef)
+  //
+  //   newHeader(i0, i3): cond stayInLoop, header, exit(i3)


i'm not able to follow this example overall. the entire textual representation is hard to read. for example, where is stayInLoop updated?

it might help if:

1. we give some high level textual description on why undef is present in the Canonicalized CFG (is it along the lines of: "the second bb arg of newHeader is the updated value i that we want to return; when we enter newHeader for the first time from preheader, we don't know that value, but we won't ever return it, so setting it to undef is safe"), and why the undef can be removed after unrolling.

   Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?

2. we add some comments on the semantics of the bb args. e.g. what do i4 and i5 represent in newLatch(i4, i5)
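The reading of undef suggested in this comment can be modeled outside of SIL. The sketch below is a toy model under stated assumptions, not the transformation implemented in TFCanonicalizeCFG.cpp: `MaybeUndef`, `HeaderArgs`, `runBody`, and the concrete loop `do { i += 1; if (i >= breakAt) break; i += 2 } while (i < limit)` are all hypothetical, and an `isDefined` flag stands in for SIL's `undef`.

```cpp
// Toy stand-in for a value that may be `undef`.
struct MaybeUndef {
  bool isDefined;
  int value;
};

// Arguments passed to the (new) loop header in this model.
struct HeaderArgs {
  int i;                 // loop-carried value
  MaybeUndef exitValue;  // state frozen for the exit block
  bool stayInLoop;       // false once any exit point was taken
};

// One trip through the body; returns the arguments for the next header entry.
HeaderArgs runBody(int i, int breakAt, int limit) {
  i += 1;
  if (i >= breakAt)
    return {i, {true, i}, false};  // break: freeze i and leave
  i += 2;
  return {i, {true, i}, i < limit};  // back edge through the latch
}

// Header logic shared by both variants; records whether the header was
// ever entered with an undefined exit value.
int runFromHeader(HeaderArgs args, int breakAt, int limit, bool &headerSawUndef) {
  headerSawUndef = false;
  for (;;) {
    if (!args.exitValue.isDefined)
      headerSawUndef = true;
    if (!args.stayInLoop)
      return args.exitValue.value;  // only read when leaving the loop
    args = runBody(args.i, breakAt, limit);
  }
}

// Canonicalized form: the preheader has no exit value yet, so it passes undef.
int runCanonicalized(int breakAt, int limit, bool &headerSawUndef) {
  HeaderArgs entry = {0, {false, 0}, true};
  return runFromHeader(entry, breakAt, limit, headerSawUndef);
}

// After unrolling once: a copy of the body runs first, so every entry into
// the header carries a defined exit value.
int runUnrolledOnce(int breakAt, int limit, bool &headerSawUndef) {
  return runFromHeader(runBody(0, breakAt, limit), breakAt, limit, headerSawUndef);
}
```

In this model both variants compute the same result, but only the canonicalized one ever enters the header with an undefined argument. The undefined value is never read there, because `stayInLoop` is true on that first entry.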

mhong · 2018-10-10T17:21:42Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  // these are not remapped when the loop body is unrolled (as we won't know
+  // what value to use in the unrolled body as it is undefined along that path).
+  // This following code patches these arguments by picking a value that
+  // dominates `pred` and is equivalent to the corresponding argument in the


mentioning pred here makes the comment block hard to read and confusing, as that name does not appear in the large block of CFG example below (I was initially wondering if there's a typo), but seems to instead refer to a var in the code that's many lines down below.

consider first explaining the rationale/benefit/mechanics (the why and what) of unrolling in terms of the example below. we can then add another comment block right above the code below, to describe how that is in general implemented in terms of variables in the code, so that the code-related comment would echo / reiterate on what the example has illustrated for the readers.

mhong · 2018-10-10T17:33:55Z

Thanks for clarifying. I left some more questions/comments on the core changes of this patch.

bgogul

Addressed some of the style and readability comments.

bgogul · 2018-10-10T18:54:47Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+    return Value;
+  }
+
+  void updateValueMap(SILValue oldValue, SILValue newValue)  {


I think oldValue and newValue are better. They say that, while cloning, occurrences of oldValue are replaced with newValue. Updated the function comment.

bgogul · 2018-10-10T18:56:12Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

@@ -421,6 +447,9 @@ class SingleExitLoopTransformer {
  /// we will get a single exit block.
  void ensureSingleExitBlock();

+  ///  Unroll the body of the loop once.
+  void unrollLoopBody();


bgogul · 2018-10-11T01:11:36Z

> it might help if:
>
> 1. we give some high level textual description on why undef is present in the Canonicalized CFG (is it along the lines of: "the second bb arg of newHeader is the updated value i that we want to return; when we enter newHeader for the first time from preheader, we don't know that value, but we won't ever return it, so setting it to undef is safe"), and why the undef can be removed after unrolling.
>
>    Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?
>
> 2. we add some comments on the semantics of the bb args. e.g. what do i4 and i5 represent in newLatch(i4, i5)

I have updated the comment block. Let me know what you think.

mhong

Thanks. The code is more readable now. I left some more comments, but LG this patch otherwise.

The following higher-level question has not been addressed yet (please feel free to move the discussion out of this patch):

> Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?

mhong · 2018-10-11T01:16:50Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  // Note that the dataflow between iterations of the loop is captured by
+  // argument `i1` of the header. If we need to exit from the header of the
+  // loop, we will need to "freeze" the state at the current exit points and
+  // propagate it to the new header. The canonicalization pass does that by


"new header" is not defined so far? i believe you are referring to the next CFG, but that's not clear from the text here.

I have updated the comment block a bit more. PTAL.

mhong · 2018-10-11T01:22:15Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  //   do {
+  //     i += 1
+  //     if (...) break;
+  //     i += 1


the example could be more readable if we use sth different from "i += 1" here, say "i += 2"

Good idea. Done.

mhong · 2018-10-11T01:26:13Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  // loop, we will need to "freeze" the state at the current exit points and
+  // propagate it to the new header. The canonicalization pass does that by
+  // adding an argument to the header and passing it from the exit points. See
+  // argument `i5` of the header. We also have a `stayInLoop` argument to


why citing "i5" here? We are passing "i4" to exit.

Sorry, it was a typo. Should have been i4.

mhong · 2018-10-11T01:27:05Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  // argument `i1` of the header. If we need to exit from the header of the
+  // loop, we will need to "freeze" the state at the current exit points and
+  // propagate it to the new header. The canonicalization pass does that by
+  // adding an argument to the header and passing it from the exit points. See


what is "from the exit points"? i believe you are not talking about the "exit" block below.

I meant exit points from the loop. Clarified the comment.

mhong · 2018-10-11T01:31:15Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  // dominates `break'`. (Note that it is enough to search for equivalent values
+  // among the immediate predecessors of the `newLatch'` block.)
+  //
+  // (1) Unroll the loop body once.


this comment is confusing:

the comment block above so far talks about an example. Here we start explaining what the code does. But it's not clear to reader if this is still talking about the above example.

The comment coincides with the function name unrollLoopBodyOnce() -- so then why should there be subsequent steps (2), etc?

mhong · 2018-10-11T01:37:59Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+      SmallVector<SILValue, 8> incomingValues;
+      destBBArg->getIncomingPhiValues(incomingValues);
+      for (auto value : incomingValues) {
+        if (value != arg && DI->properlyDominates(value, predTermInst)) {


what if we cannot find such a value? should we assert this won't ever happen?

It is guaranteed to find a value. SIL verification will fail otherwise. No need to add another check that will essentially replicate SIL verification.

Debugging a SIL verifier failure usually takes more work because the context is non-local. It is usually preferable to have a local check so that we can fail fast. The check also serves as documentation for this important invariant.

Good point. Added an assert to check that we patch such arguments.
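The fail-fast check discussed in this thread could look roughly like the following. This is a self-contained sketch under assumptions, not the code added in the patch: `Value`, `findReplacement`, and `patchArgument` are hypothetical names, and a boolean flag stands in for the `DI->properlyDominates(value, predTermInst)` query shown in the hunk above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a SIL value: an id plus a flag recording
// whether the value dominates the point where it would be used.
struct Value {
  int id;
  bool dominatesUse;
};

// Return the first incoming value that differs from `arg` and dominates
// the use, or nullptr if there is none.
const Value *findReplacement(const Value &arg, const std::vector<Value> &incoming) {
  for (const Value &candidate : incoming) {
    if (candidate.id != arg.id && candidate.dominatesUse)
      return &candidate;
  }
  return nullptr;
}

// Patch `arg`, asserting locally that a replacement exists instead of
// leaving the failure to a later, non-local verifier run.
Value patchArgument(const Value &arg, const std::vector<Value> &incoming) {
  const Value *replacement = findReplacement(arg, incoming);
  assert(replacement && "expected an equivalent value that dominates the use");
  return *replacement;
}
```

Keeping the search in a function that can return null and asserting in the caller keeps the invariant visible at the point where it matters.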

mhong · 2018-10-11T01:40:52Z

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp

+  }
+
+  // Get the clone for the original and new header.
+  SILBasicBlock *clonedHeader = cloner.remapBasicBlock(header);


given "newHeader", it'd be less confusing if we change this to "oldHeader", and same for clonedHeader.

Changed to clonedOldHeader. header is a class field.

bgogul · 2018-10-11T18:52:13Z

> Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?

Sorry, I forgot to answer your query earlier. It is possible, but won't be much simpler. In fact, I started that way, but this turned out to be simpler. If you are curious, you can look at the very first commit "working proto" in this PR to see what the other solution would look like.

Doing it after SESE canonicalization puts the loop in a form where there is only one exit point in the unrolled loop body. This makes hooking up the unrolled loop body to the loop easier. Also, it enables me to patch arguments without having to recompute dominator and post dominator information.

bgogul · 2018-10-11T18:54:45Z

@swift-ci please test tensorflow

bgogul · 2018-10-11T21:06:00Z

@swift-ci please test tensorflow

bgogul · 2018-10-12T00:02:36Z

@swift-ci please test tensorflow

bgogul added 4 commits October 9, 2018 17:00

Working proto

b1a5479

Unroll loop body after SESE canonicalization.

cca2adf

Update sese_loop_canonicalization test with cloning behavior.

12b8668

Added a runtime test with while loop.

778972f

bgogul requested review from mhong and lattner October 10, 2018 00:21

Rename function name appropriately after merge.

e1c5f1a

rxwei reviewed Oct 10, 2018

mhong reviewed Oct 10, 2018

Fix cosmetic and readability comments.

3f7e4e1

bgogul commented Oct 10, 2018

bgogul added 2 commits October 10, 2018 18:00

Updated the documentation

4e534df

Add cloned blocks to parent loop if nested.

9b6e361

mhong approved these changes Oct 11, 2018

Updated comments based on review.

472ee0d

mhong approved these changes Oct 11, 2018

Add an assert to check that we have patched up arguments in unrolled loop body.

83a9d73

Added documentation as to why we need to iterate over predecessors of loop's latch and not the cloned latch.

b111c5b

bgogul merged commit 4e9fafa into apple:tensorflow Oct 12, 2018

bgogul deleted the sese_unroll_loop branch October 12, 2018 00:50

swift-ci mentioned this pull request Oct 11, 2018

[SR-7765] Lack of support for imperfect loop exits blocking simpleCounterLoop test #50304

Closed
