Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay] [Virtual Device] Store function result virtual device in virtual_device_ field #9848

Merged
merged 9 commits into from Feb 19, 2022

Conversation

electriclilies
Copy link
Contributor

@electriclilies electriclilies commented Jan 6, 2022

In this PR, I store the virtual device for a function result in the virtual_device_ field instead of in the function attributes. I am splitting this out from #9666 to isolate issues, and there will be a follow up PR moving the virtual devices of the function parameters into the virtual_device_ fields of the parameters themselves.

@electriclilies
Copy link
Contributor Author

@mbs-octoml @mikepapadim @comaniac This is ready for review!

@electriclilies
Copy link
Contributor Author

Note it's forked from #9690, which is forked from #9874, so until those go in those changes will be in the diff as well.

@electriclilies
Copy link
Contributor Author

electriclilies commented Jan 21, 2022

Sorry, mistyped a bit above-- because the virtual device field doesn't store the virtual device for all expression types right now, func->body->virtual_device() will most likely be null in this implementation and therefore not be equivalent to func->virtual_device().

What will be true is that GetVirtualDevice(fn->body) is always equivalent to GetVirtualDevice(fn). (i.e., the 'conceptual' virtual device is the same even though it's not stored in the virtual device field of the body of the function).

@tqchen
Copy link
Member

tqchen commented Jan 21, 2022

Thanks @electriclilies let me try to dissect what you said:

  • C0: Right now device propagation is not fully completed(through WithFields or other mechanisms), as a result funcion->body may not be property setup with the right device choice.
  • C1: There might be a need to properly annotate the return device type of function, which then can be used to inform, or enforce the planning of functions.

Note that C0 and C1 are slightly different. I believe you are talking about C0, but seems there is also a slightly indication of C1(which will indeed imply storing something on the function node(e.g. like the annotation of return type) ).

In a fully heterogenous setting, the input and return value's device might be different. In such cases, "execute on" can become ambiguous and differ from the "result of return value". The current change proposes to use fn->virtual_device() as the result of return value. It might be good to document this choice in FunctionNode(or virtual_device member function) and revisit once device propagation is fully compeled. As the data structure is the common exchange among different passes and what pass writer will refer to for semantics informaton.

These are high-level discussions for your information and hope that they will help clarifying the semantics of virtual device field. They are not asks about specific kind of implementation. So feel free to take the information and make a call, while documenting the rationales that might help future refactors.

@mbs-octoml
Copy link
Contributor

mbs-octoml commented Jan 21, 2022

Hi @tqchen, yes things do indeed get subtle on function types! I tried to capture that in the header mega-comment in device_planner.cc.

Inside the device planner we can talk about the 'device domain' for any expression, including those of function type which describes the device domains for the args and result separately. Ie the domain for planning is currently D := (D,...,D) -> D | VirtualDevice. And I declare (by fiat!) that the device holding the function or closure is the same as the device on which the function is executed which is the same as the device holding the functions result.

But when it comes time to the virtual_device field I'm not convinced it's worth exposing this higher-order domain machinery, so the field just has plain-old first order VirtualDevice.

So now we need a convention for what fn->virtual_device should be:
a. Leave it undefined -- you can't ask that question and must instead look at the virtual_device for the function param vars and the body. I think that's your suggestion also.
b. Set it to be the function's result device (and if the function returns a function it's the result device of that function, etc. This corresponds to the device planner's notion of 'result device'.) This was my suggestion to Lily, hence this PR. This approach also allows us to assign a device to any expression of function type (even though it may not be a function literal): expr.virtual_device = result_device(expr.virtual_device_domain).
c. Bite the bullet and store some new VirtualDeviceDomain on every expression instead of just VirtualDevice.

Wdyt?

@tqchen
Copy link
Member

tqchen commented Jan 21, 2022

Thanks @mbs-octoml . these are accurate summary of the state. I agree with you that it is not worth to go with c(high order machinary). Like I said I want to mainly clarifying the choices and and suggest documenting the semantics also in the IR data structure (assuming there will be alternative versions of device planner or consumers of the IR who will need to make use of such information, they will the read comments on the data structure for semantics).

From what I can see the current set of discussions are great and already comprehensive and I trust @electriclilies and you to make the call here.

@mbs-octoml
Copy link
Contributor

Great, thanks. Ok my suggestion then is to stick with b) but beef up the comment

* For expressions that have the function type, the virtual device describes where the result of

to capture some of the 'hidden semantics' buried in the device planner commentary:
).

@tqchen
Copy link
Member

tqchen commented Jan 25, 2022

Thanks @electriclilies @mbs-octoml !

@electriclilies electriclilies force-pushed the virtual-device-rep-functions branch 3 times, most recently from 080c8b3 to 9fa7f83 Compare January 26, 2022 23:52
Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is indeed it, thanks. When we play the equivalent game for the arguments it may be that some FunctionOnDevice calls are unnecessary since the various WithFields will have already propagated the virtual_device_ on both params and function, but that will work itself out in the wash easily enough.

src/printer/relay_text_printer.cc Outdated Show resolved Hide resolved
src/relay/op/memory/on_device.cc Outdated Show resolved Hide resolved
@electriclilies
Copy link
Contributor Author

@mbs-octoml OK, so I ended up changing the parser to parse the virtual device in for the tests. I'm not married to the syntax-- right now I'm just putting the virtual device right after the return type of the function, separated by a comma. I'm thinking this will generalize well to parameter types.

So for now it''s like this:

fn main(%x: Tensor([1, 2, 3])) -> Tensor([1, 2, 3, 4]), virtual_device:meta[VirtualDevice][1] { body }

One thing I wasn't sure about was the approach for keywords in the parser-- I couldn't find where the keywords are checked, so I just am letting virtual_device be any identifier for now. @jroesch any insight on this?

@@ -1130,16 +1146,25 @@ class Parser {
ret_type = ParseType();
}

ObjectRef virtual_device;
if (WhenMatch(TokenType::kComma)) {
virtual_device = ParseVirtualDevice();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideal would be if we could parse the attributes as usual, then 'promote' the "virtual_device" attribute to the first class field. Since meta references are handled by ParseAttributeValue I think that would work out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can do that

@electriclilies
Copy link
Contributor Author

I realized that the representation of the program after device planning needs some more thought, and this influences how the device planned program is represented in relay script. Pausing work on this PR till I have done a more detailed design review.

@electriclilies
Copy link
Contributor Author

electriclilies commented Feb 15, 2022

I ended up just lifting the result virtual device out of the attributes, will discuss dealing with the text representation in another PR.

@masahi masahi merged commit cae2680 into apache:main Feb 19, 2022
@electriclilies electriclilies deleted the virtual-device-rep-functions branch February 19, 2022 00:13
@electriclilies
Copy link
Contributor Author

Thanks @masahi!

pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…ual_device_ field (apache#9848)

* VStore function result virtual devices in virtual_device_ field

* Address Mark's 'mega nit'

* Promote function result virtual device to first class

* Add kVirtualDevice

* move kVirtualDevice

* Fix annotation test

* Progress on parsing & printing

* Fix printing of virtual device attribute

* flake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants