
Force deabstraction of functions that take or return Tensors if referred to in code. #19052

Merged: 3 commits, Aug 30, 2018

Conversation

@bgogul (Collaborator) commented Aug 29, 2018

This is a somewhat hacky patch to deal with SR-8589, which affects the Jupyter tutorial. In REPL mode (both the built-in swift REPL and lldb -r), the SIL module for a REPL expression is thrown away after lowering to LLVM IR. Consequently, when processing subsequent lines, TFDeabstraction does not see the bodies of some functions being invoked, which causes those functions not to be partitioned.
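
For illustration, here is a hypothetical REPL session that triggers the failure (the function and values are made up, not taken from the PR):

  // REPL line 1:
  func addOne(_ x: Tensor<Float>) -> Tensor<Float> { return x + 1 }
  // REPL line 2: by now the SIL body of `addOne` from line 1 has been thrown
  // away, so TFDeabstraction never partitions it and this call fails.
  let y = addOne(Tensor<Float>([1, 2, 3]))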

This PR performs a second round of deabstraction (and subsequent partitioning) of functions that take or return Tensor values if they are still referred to in the code after the first round.

BTW, this is not only relevant to the Jupyter tutorial. We will encounter this if we allow linking against other swift-tensorflow binaries. We will need to think about how to deal with such cases in general.

@mhong, when I simply removed the condition that checks if a function operates on tensors, some tests started failing as attributes were no longer const-evaluable. That is why I am sending this instead of what we discussed.

@bgogul (Collaborator, Author) commented Aug 29, 2018

@swift-ci please test tensorflow linux

@bgogul (Collaborator, Author) commented Aug 29, 2018

@swift-ci please test tensorflow macos

@bgogul requested review from mhong, rxwei and lattner, and removed the review request for rxwei, August 29, 2018 18:32
@lattner (Collaborator) left a comment:

Agreed that this is not optimal, but I think it should work in the near term. Please file a bug to track improving this. A sketch:

  1. Make our existing pass over functions, deabstracting as we go.

  2. Do a second pass where we deabstract all functions, regardless of whether they have tensor args or results. Many would be dead, but the indirectly used ones would now work.

@rxwei (Member) left a comment:

The fix makes sense, unblocking lots of cross-module cases.

There is a more general problem though: whenever there is a call to an opaque function that calls a protocol method (e.g. Equatable.==) on Tensor arguments, execution will fail even if Tensor's implementation of that protocol is inlinable. This is because Tensor's protocol implementation never gets partitioned.

Resolving this bigger problem down the road will require supporting the partitioning of generic functions, which is a major architectural change.
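
A minimal Swift sketch of that failing pattern (hypothetical names; `check` stands in for any opaque function defined in another module):

  // In another module, not @inlinable, so its body is opaque to clients.
  public func check<T: Equatable>(_ a: T, _ b: T) -> Bool {
    // Calls Equatable.== through the protocol witness. When T is a Tensor
    // type, Tensor's == is never inlined here, hence never partitioned.
    return a == b
  }
  // In the client module:
  let ok = check(Tensor<Float>([1]), Tensor<Float>([1]))  // fails at run time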

// Return true if it is referenced somewhere.
// TODO: There might be a better check other than fn->getRefCount() > 0.
// See the clean up code in TFDeabstraction::inlineCalls() function.
return (fn->getRefCount() > 0);
Member:

Unnecessary parentheses.

Collaborator (Author):

Done.

@rxwei added the tensorflow label, Aug 30, 2018
@mhong (Contributor) left a comment:

Having this as a short-term patch sounds good, but please add some unit tests.

To avoid breaking unit tests, one option is to flag-control the value of forceTFFunctions, and only turn it on in REPL mode. This would buy us time to rewrite the relevant unit tests -- if we don't want a function to be partitioned (e.g. if the function takes a string/int/float param that is used in a tfop attr), we can mark it "inline always". Once we finish this on all tests, we can then retire the flag.
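
A sketch of that workaround (the helper, op, and attr names are illustrative, modeled on mhong's examples further down):

  // The Int/String params feed #tfop attrs, which must be compile-time
  // constants. Forcing inlining makes them constants at every call site,
  // so the compiler never needs to partition this helper itself.
  @inline(__always)
  func runOp(_ i: Int, _ s: String) -> Tensor<Float> {
    return #tfop("SomeOp", intAttr: i, strAttr: s)
  }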

@@ -277,7 +277,11 @@ class TensorFunctionClassifier {
   /// example) for inlined functions that take and return tensors, since we
   /// know that they are either unreachable or will be inlined into any
   /// clients that use them.
-  bool shouldBePartitioned(SILFunction *fn);
+  ///
+  /// If the flag forceTFFunctions is true, it forces partitioning of functions
Contributor:

The comment does not quite reflect the implementation, because there are conditions that cause the function to return false: e.g. if (isAvailableExternally(fn->getLinkage())), or when a function is marked inline always.

Collaborator (Author):

Done.

/// If the flag forceTFFunctions is true, it forces partitioning of functions
/// that operate on Tensor values irrespective of whether they are inlinable,
/// private, etc.
bool shouldBePartitioned(SILFunction *fn, bool forceTFFunctions = false);
Contributor:

Since this function does not have many call sites, it could be less error-prone not to use a default value for forceTFFunctions.

Collaborator (Author):

Done.

// are functions that are *still* referred to in the code and operate on
// tensor values, but have not been partitioned. It can happen in the
// following case, for instance, where `foo` is an external function that
// takes a for which we do not have the body:
Contributor:

what is "takes a for"?

Collaborator (Author):

Typo. Fixed the comment.

// following case, for instance, where `foo` is an external function that
// takes a for which we do not have the body:
// main() {
// foo() { $0 -= 0.1 * $1 }
Contributor:

can we turn something like this into a unit test?

Collaborator (Author):

Added a unit test in deabstraction_finished.swift

//
// (Note the body of a function may be missing when we are linking against a
// library or in the REPL context where the function was defined on a
// different line.)
Contributor:

nit: "different line" might not be the right description.

@marcrasi showed me earlier that you can copy/paste multi-line text into the REPL, and that'll be executed as one "cell".

Collaborator (Author):

Calling it a REPL line now. Not sure if that is the right terminology.

for (auto &fn : *module) {
// Skip if it is already partitioned, or if it was ignored only because it
// operated on tensor values.
if (partitionedFunctions.count(&fn) > 0 ||
Contributor:

can you add a comment block motivating this two-round / two-pass design?

IIUC, in your example above, if we just use tfc.shouldBePartitioned(&fn, /*forceTFFunctions=*/true) in the first round, and make it the only round, we'll be able to partition the private closure, so it achieves the same correctness.

the two-round design is an optimization to minimize the set of functions to partition (would be nice to have a unit test that shows its effects -- have a function with no caller left after the first round finishes, and confirm it's not partitioned in the second round).

Collaborator (Author):

Added a few lines motivating the two-round design.

@@ -631,7 +631,8 @@ tf::createTensorToInt1Inst(SILValue value, SILBuilder &builder,
/// example) for inlined functions that take and return tensors, since we
/// know that they are either unreachable or will be inlined into any
/// clients that use them.
@mhong (Contributor) commented Aug 30, 2018:

Consider removing this comment block, as it's now out of date (the new param is not documented).

(I'd also be fine with keeping it in sync with the corresponding comment block in the header file.)

Collaborator (Author):

Removed the block comment altogether.

@mhong (Contributor) commented Aug 30, 2018

@rxwei To help me understand this larger problem, can you provide a concrete example?

(It might be useful to create a bug to track that.)

@rxwei (Member) commented Aug 30, 2018

A common, concrete example is trying to use expectEqual() on two Tensors. It is guaranteed to fail, because expectEqual is opaque across modules, i.e. non-inlinable, and Tensor.== does not get partitioned because there is no inlining.
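
For instance (a hedged sketch; StdlibUnittest's expectEqual is the opaque, non-inlinable function here):

  import StdlibUnittest
  import TensorFlow

  let a = Tensor<Float>([1, 2, 3])
  let b = Tensor<Float>([1, 2, 3])
  // expectEqual's body lives in another module and is never inlined, so the
  // Tensor.== it invokes is never partitioned; per the discussion above,
  // this fails at run time.
  expectEqual(a, b)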

This is not a small bug, but a large hole in the architecture, in that there is no way for the compiler to even reject such code. These issues are not pressing for the initial GPE because the library ecosystem using Tensors and generics is not there yet.

I'll definitely file a bug soon. I was also planning to write a detailed document describing the issue, but that will have to wait until we hit the tutorial goal.

@bgogul (Collaborator, Author) commented Aug 30, 2018

@lattner, this PR already implements it in two passes as you outlined. Did I misunderstand your comment?

Agreed that this is not optimal, but I think it should work in the near term. Please file a bug to track improving this. A sketch:

Make our existing pass over functions, deabstracting as we go.

Do a second pass where we deabstract all functions, regardless of whether they have tensor args or results. Many would be dead, but the indirectly used ones would now work.

@mhong (Contributor) commented Aug 30, 2018

@bgogul, I believe the difference is: If we change the example from

main() {
  foo() { $0 -= 0.1 * $1 }
}

to

main() {
  foo() { #tfop("SomeOp", ...) }
}

then this patch probably won't work, because the private closure does not take or return tensors.

Another issue to take care of (both in Chris's sketch and this PR) is when the private closure takes a string/int/float param to be used in a tfop attr, as in:

main() {
  foo() { (s: String, i: Int) -> () in #tfop("SomeOp", intAttr: i, strAttr: s) }
}

The private closure cannot be partitioned due to the const tfop attr requirement (as you observed). In that case it'd be useful to issue a good diagnostic message asking the user to convert the closure to a function, so that it can be annotated with something like "inline always", and the compiler won't partition it.

@rxwei (Member) commented Aug 30, 2018

In that case it'd be useful to issue a good diagnostic message asking the user to convert the closure to a function, so that it can be annotated with something like "inline always", and the compiler won't partition it.

In cases like this, it's also possible to make the closure function serialized (inlinable) by simply adding a [serialized] modifier to the SIL function. That would be much simpler than asking the user to change their code.

@bgogul (Collaborator, Author) commented Aug 30, 2018

Added unit tests and addressed comments. PTAL.

@mhong (Contributor) left a comment:

Nice!
