Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SESE canonicalization: unroll loop to eliminate undefs. #19811

Merged
merged 11 commits into from Oct 12, 2018

Conversation

bgogul
Copy link
Collaborator

@bgogul bgogul commented Oct 10, 2018

This patch takes care of the case, where we can eliminate an undef only by unrolling the loop body once. The doWhileLoop example in test/TensorFlow/sese_canonicalization.sil has an example.

The implementation is straightforward and simply uses the SILCloner.

@bgogul
Copy link
Collaborator Author

bgogul commented Oct 10, 2018

@swift-ci please test tensorflow

@bgogul bgogul requested review from mhong and lattner October 10, 2018 00:21
@mhong
Copy link
Contributor

mhong commented Oct 10, 2018

I left a few comments on the details, but here's a high level question: it seems this patch always unrolls a loop if there are some undefs -- but to unroll a loop without changing the semantics, do we need to guard the unrolling based on certain loop condition? e.g. if the original loop is over indices [0, 100), after unrolling we'll change the loop index range to be [1, 100).

Also, consider extending the PR description with a src code level example (no need to write CFG if that's too complexity) to show what happens before this PR (where the undef is), and why unrolling helps remove that undef. That'd make the review much easier (and also help us retrieve/recall such knowledge in the future when needed).

@bgogul
Copy link
Collaborator Author

bgogul commented Oct 10, 2018

@swift-ci please test tensorflow

@bgogul
Copy link
Collaborator Author

bgogul commented Oct 10, 2018

I left a few comments on the details, but here's a high level question: it seems this patch always unrolls a loop if there are some undefs -- but to unroll a loop without changing the semantics, do we need to guard the unrolling based on certain loop condition? e.g. if the original loop is over indices [0, 100), after unrolling we'll change the loop index range to be [1, 100).

No, we don't have to adjust the condition. Note that we are simply making a copy of the loop body. Consider the following canonicalized example.

  var count:Int32 = 0
  var sum:Int32 = 0 
  var stayInLoop = true;
  while (stayInLoop) {
    sum += count
    count += 1
    stayInLoop = (count < 100) ? true : false
  }
  return sum
}

The unrolled version is as follows

  var count:Int32 = 0
  var sum:Int32 = 0
  stayInLoop = true;
  if (stayInLoop) {
    sum += count
    count += 1
    stayInLoop =  (count < 100) ? true : false
  }
  while (stayInLoop) {
    sum += count
    count += 1
    stayInLoop = (count < 100) ? true : false
  }
  return sum
}

Note that after the unrolled loop body executes, the state at the header would be equivalent to the state after one execution of the loop without unrolling.

Also, consider extending the PR description with a src code level example (no need to write CFG if that's too complexity) to show what happens before this PR (where the undef is), and why unrolling helps remove that undef. That'd make the review much easier (and also help us retrieve/recall such knowledge in the future when needed).

I would be easier to explain with a CFG. The unrollLoopBody implementation shows a detailed example in comments. I will build on it and put it in the PR description.

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
test/TensorFlowRuntime/sese_loop_canonicalization.swift Outdated Show resolved Hide resolved
test/TensorFlowRuntime/sese_loop_canonicalization.swift Outdated Show resolved Hide resolved
lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
return Value;
}

void updateValueMap(SILValue oldValue, SILValue newValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we find better names for the param names? like oldValue -> key, newValue -> value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think oldValue and newValue are better. It just says that when we are cloning replace occurrences of oldValue with new Value. Updated the function comment.

@@ -1150,6 +1189,128 @@ void SESERegionBuilder::ensureSingleExitFromLoops() {
}
}

void SingleExitLoopTransformer::unrollLoopBody() {
BasicBlockCloner cloner(*currentFn);
// Setup cloner so that newHeader's argument's are replaced with values in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: argument's -> arguments?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -421,6 +447,9 @@ class SingleExitLoopTransformer {
/// we will get a single exit block.
void ensureSingleExitBlock();

/// Unroll the body of the loop once.
void unrollLoopBody();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider renaming to unrollLoopBodyOnce(), and remove the comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
// --Canonicalized CFG (not everything is shown)--
// preheader: i0 = 0; br newHeader(i0, undef)
//
// newHeader(i0, i3): cond stayInLoop, header, exit(i3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not able to follow this example overall. the entire textual representation is hard to read. for example, where is stayInLoop updated?

it might help if:

  1. we give some high level textual description on why undef is present in the Canonicalized CFG (is it along the lines of: "the second bb arg of newHeader is the updated value i that we want to return; when we enter newHeader for the first time from preheader, we don't know that value, but we won't ever return it, so setting it to undef is safe"), and why the undef can be removed after unrolling.

Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?

  1. we add some comments on the semantics of the bb args. e.g. what do i4 and i5 represent in newLatch(i4, i5)

// these are not remapped when the loop body is unrolled (as we won't know
// what value to use in the unrolled body as it is undefined along that path).
// This following code patches these arguments by picking a value that
// dominates `pred` and is equivalent to the corresponding argument in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioning pred here makes the comment block hard to read and confusing, as that name does not appear in the large block of CFG example below (I was initially wondering if there's a typo), but seems to instead refer to a var in the code that's many lines down below.

consider first explaining the rationale/benefit/mechanics (the why and what) of unrolling in terms of the example below. we can then add another comment block right above the code below, to describe how that is in general implemented in terms of variables in the code, so that the code-related comment would echo / reiterate on what the example has illustrated for the readers.

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
@mhong
Copy link
Contributor

mhong commented Oct 10, 2018

Thanks for clarifying. I left some more questions/comments on the core changes of this patch.

Copy link
Collaborator Author

@bgogul bgogul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed some of the style and readability comments.

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
return Value;
}

void updateValueMap(SILValue oldValue, SILValue newValue) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think oldValue and newValue are better. It just says that when we are cloning replace occurrences of oldValue with new Value. Updated the function comment.

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
@@ -421,6 +447,9 @@ class SingleExitLoopTransformer {
/// we will get a single exit block.
void ensureSingleExitBlock();

/// Unroll the body of the loop once.
void unrollLoopBody();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
test/TensorFlowRuntime/sese_loop_canonicalization.swift Outdated Show resolved Hide resolved
test/TensorFlowRuntime/sese_loop_canonicalization.swift Outdated Show resolved Hide resolved
lib/SILOptimizer/Mandatory/TFCanonicalizeCFG.cpp Outdated Show resolved Hide resolved
@bgogul
Copy link
Collaborator Author

bgogul commented Oct 11, 2018

it might help if:

we give some high level textual description on why undef is present in the Canonicalized CFG (is it along the lines of: "the second bb arg of newHeader is the updated value i that we want to return; when we enter newHeader for the first time from preheader, we don't know that value, but we won't ever return it, so setting it to undef is safe"), and why the undef can be removed after unrolling.
Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?

we add some comments on the semantics of the bb args. e.g. what do i4 and i5 represent in newLatch(i4, i5)

I have updated the comment block. Let me know what you think.

Copy link
Contributor

@mhong mhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The code is more readable now. I left some more comments, but LG this patch otherwise.

This following higher level question is not yet addressed (please feel free to move the discussion out of this patch):

Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?

// Note that the dataflow between iterations of the loop is captured by
// argument `i1` of the header. If we need to exit from the header of the
// loop, we will need to "freeze" the state at the current exit points and
// propagate it to the new header. The canonicalization pass does that by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"new header" is not defined so far? i believe you are referring to the next CFG, but that's not clear from the text here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the comment block a bit more. PTAL.

// do {
// i += 1
// if (...) break;
// i += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the example could be more readable if we use sth different from "i += 1" here, say "i += 2"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

// loop, we will need to "freeze" the state at the current exit points and
// propagate it to the new header. The canonicalization pass does that by
// adding an argument to the header and passing it from the exit points. See
// argument `i5` of the header. We also have a `stayInLoop` argument to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why citing "i5" here? We are passing "i4" to exit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it was a typo. Should have been i4.

// argument `i1` of the header. If we need to exit from the header of the
// loop, we will need to "freeze" the state at the current exit points and
// propagate it to the new header. The canonicalization pass does that by
// adding an argument to the header and passing it from the exit points. See
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "from the exit points"? i believe you are not talking about the "exit" block below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant exit points from the loop. Clarified the comment.

// dominates `break'`. (Note that it is enough to search for equivalent values
// among the immediate predecessors of the `newLatch'` block.)
//
// (1) Unroll the loop body once.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is confusing:

  1. the comment block above so far talks about an example. Here we start explaining what the code does. But it's not clear to reader if this is still talking about the above example.

  2. The comment coincides with the function name unrollLoopBodyOnce() -- so then why should there be subsequent steps (2), etc?

SmallVector<SILValue, 8> incomingValues;
destBBArg->getIncomingPhiValues(incomingValues);
for (auto value : incomingValues) {
if (value != arg && DI->properlyDominates(value, predTermInst)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we cannot find such a value? should we assert this won't ever happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is guaranteed to find a value. SIL verification will fail otherwise. No need to add another check that will essentially replicate SIL verification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debugging SIL verifier failure usually takes more work because the context is non-local. It'd usually be preferrable to have a local check so that we can fail fast. Also, the check serves as a documentation for this important invariant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added an assert to check that we patch such arguments.

}

// Get the clone for the original and new header.
SILBasicBlock *clonedHeader = cloner.remapBasicBlock(header);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given "newHeader", it'd be less confusing if we change this to "oldHeader", and same for clonedHeader.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to clonedOldHeader. header is a class field.

@bgogul
Copy link
Collaborator Author

bgogul commented Oct 11, 2018

Also, to make things simpler, is it possible to avoid generating undefs in the first place, vs first generating it, and then try eliminating it? specifically, can we achieve that by moving loop rotation earlier?

Sorry, I forgot to answer your query earlier. It is possible, but won't be much simpler. In fact, I started that way, but this turned out to be simpler. If you are curious, you can look at the very first commit "working proto" in this PR to see what the other solution would look like.

Doing it after SESE canonicalization puts the loop in a form where there is only one exit point in the unrolled loop body. This makes hooking up the unrolled loop body to the loop easier. Also, it enables me to patch arguments without having to recompute dominator and post dominator information.

@bgogul
Copy link
Collaborator Author

bgogul commented Oct 11, 2018

@swift-ci please test tensorflow

@bgogul
Copy link
Collaborator Author

bgogul commented Oct 11, 2018

@swift-ci please test tensorflow

@bgogul
Copy link
Collaborator Author

bgogul commented Oct 12, 2018

@swift-ci please test tensorflow

@bgogul bgogul merged commit 4e9fafa into apple:tensorflow Oct 12, 2018
@bgogul bgogul deleted the sese_unroll_loop branch October 12, 2018 00:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants