Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wiring up the PrimFunc resource_handle #34

Closed
wants to merge 1 commit into from

Conversation

Mousius
Copy link
Member

@Mousius Mousius commented Sep 14, 2021

No description provided.

@Mousius
Copy link
Member Author

Mousius commented Sep 14, 2021

@tqchen / @areusch I've spun this out of the Device API RFC for independent consideration.

@tqchen tqchen added the status: need review RFC needs review label Sep 17, 2021
@manupak
Copy link
Contributor

manupak commented Sep 21, 2021

@junrushao1994 @tqchen,

A friendly ping!

[drawbacks]: #drawbacks

* This changes the intrinsic `call_packed` to assume the final argument is the `resource_handle`, making it incompatible with previous releases
* The lack of structure in the `call_packed` means it'll be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you finish the sentence?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the VM alloc_storage is represented as a 'call' using an opaque StorageHandle (or something like that) type. Can we fold the additional arg into the regular params using that trick to distinguish device handles from the normal TIR types? I'm wary of both hidden args and the conceptual overhead of more fields in PrimFunc.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

* Introduce another intrinsic and matching `call_cpacked` which have the suffix `_with_resource_handle` or similar - this means each code generator would have to implement additional intrinsics to support `resource_handle`s fully.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this kind of munges the definition of CallNode's args array--the idea is that this array directly forms the TVMValue passed to arg_values in the TVMBackendPackedCFunc. At the very least, is it possible we can wrap resource_handle with a sentinel PrimExpr or dtype? i'd like to avoid a secret convention that call_packed includes a special final arg.

cc @mbs-octoml @jroesch @junrushao1994 any suggestions? it's kind of awkward since call_packed doesn't have its own CallNode or object which can be extended.

@tqchen
Copy link
Member

tqchen commented Sep 22, 2021

Will comment more later. One thing I want to note is that resource handle is a special thing for PackedCFunc signature and do not generally exists in a normal packed func(that is backed by say a GPU driver closure).

As a result, we might want to think a bit more carefully on the calling convention here. My recommendation would still be try to hide this variables from the PrimFunc interface and leverage intrinsics(such as get resource handle) to obtain the handle from the environment.

We might be able to introduce call packed c func with handle intrinsic that passes the handle around packed c func

@areusch
Copy link
Contributor

areusch commented Sep 22, 2021

okay I've done a bit more research on this one. Here is one source of tension here:

  • at present, tir.call_packed codegens to TVMFuncCall and TVMFuncCall does not in fact accept resource_handle as an argument. it is not possible to supply resource_handle to tir.call_packed.
  • however, that is not actually the ultimate goal of this PR. the ultimate goal of this PR is to supply resource_handle to tir.call_cpacked, which doesn't use TVMFuncCall and instead invokes the TVMBackendPackedCFunc directly.
  • this basically works in a very narrow set of use cases which meet all of these conditions:
    a) when we don't need TVMFuncCall because we can get TVMBackendPackedCFunc* directly
    b) when we know that we have the resource_handle that PackedFunc expects.
    c) when the C interface API to AOT is used and therefore AOT Executor has a void* context to be supplied to device kernel launch PackedFunc

This really has quite a narrow use case: in the C runtime with the AOT executor and when the C interface API is used. Other use cases disqualify this usage pattern:

  • If the PackedFunc API is used we currently have no good way to pass the proposed TVM<Model>_Devices struct into AOTExecutor. We could pass it as OpaqueHandle if we had to.
  • If the C++ runtime is used, in most cases we cannot get the TVMBackendPackedCFunc* directly and must operate with the opaque TVMFunctionHandle and TVMFuncCall
  • It is possible that within a given compiled AOT module, we could employ tir.call_cpacked to refer to other DSO-exportable functions directly. However, this wouldn't necessarily extend to non-DSO-exportable functions, which in any case typically gain the context closure from sptr_to_self and/or a this capture. Also, doing so would hit the linker limitations discussed in https://discuss.tvm.apache.org/t/introduce-artifact-a-container-for-generated-code/11099/2.

Taken together, this means we need the following restrictions on this usage pattern:

  1. -runtime must be c due to the final bullet point above.
  2. -executor must be aot.
  3. -unpacked-api must be 0. When -unpacked-api is 1, we just append this to the extern call args, correct?
  4. -interface-api is likely c. As discussed above, this could work with packed, but the awkward OpaqueHandle dtype must be used.

Considering this altogether, this is essentially an attempt to simplify the multi-Module late-bound import system for use with resource-constrained embedded devices. The key piece of missing information needed without such a system is the context pointer. I see two routes forward here:

  1. We add resource_handle to tir.call_cpacked and resolve the incongruity between usage for a device kernel function and the current usage in the C Runtime as Module*.
  2. We just lower the extern schedule to a CallNode whose args includes an OpaqueHandle at a position which the driver is aware of. For instance, following the current suggestion, such OpaqueHandle would be the final arg. The question here is how to indicate to the executor that such argument should be filled from the TVMDevices struct. This could be done either as an attr of the CallNode or by introducing a new dtype (I sort of favor the former, as void* is exactly OpaqueHandle).

Since ultimately the driver must be written knowing where the "context" is going to be (either as this member or as resource_handle), the driver must also be designed around this choice. If portability between C and C++ runtimes is desired, it will need to choose an approach that works with both. While I appreciate that Module is generally overly bloated for many embedded applications, we do rely on it for the purposes of autotvm and providing the TVM RPC server. Moving away from it seems like it would add significant cognitive overhead to the design of microTVM. Given we may need resource_handle to be the Module* should TVMBackendGetFuncFromEnv need to be implemented on microTVM (admittedly, this means that the Module import mechanism is in use which is even more bloated), it kind of seems like we should avoid trying to repurpose resource_handle here and just adopt route 2 above.

Ultimately at tension here is TVMBackendGetFuncFromEnv and the need to provide a less bloated context for use with the TVM C runtime. I'm not sold on this opinion, just writing down my thoughts. Perhaps others could weigh in here. Supporting Module's import mechanism on the C runtime is indeed considerable extra complexity for questionable benefit. And, this RFC is part of a larger solution which supplants the need for Module import. Perhaps the outcome of the Article RFC (e.g. decision on what the long-term expectations of the compiler are on the module load process) also play in here as well.

@areusch
Copy link
Contributor

areusch commented Oct 12, 2021

@mbs-octoml @electriclillies and I discussed this at length and we followed up with @Mousius offline as well. While it is tempting to re-use resource_handle to pass a this context void* through to Device API impls, it seems like it ultimately amounts to carving out an exception case where, when targeting the C Runtime using the AOT Executor, TIR gets to control resource_handle; in any other case, it's either impossible to or requires significant runtime changes. A concern is that driver authors would ideally like a predictable programming model. Going the resource_handle route means that the programming model is more complex--two different ways to pass device_api_context depending on the targeted runtime and executor.

Given this, it seems like the best route is just to add it to the arguments list. @Mousius is going to try this and see if we run in to any other issues. According to @Mousius , past attempts to fully encapsulate the device_api_context parameter in tir::Call have maybe ran into issues with ArgBinder so we're going to see if this works and go from there.

@tqchen
Copy link
Member

tqchen commented Oct 12, 2021

Thanks @areusch for a great summary, indeed this is a situation where explicit could be better. Something in the middle ground might be allow packed_c function to have an intrinsic to get such context from the env.

@areusch
Copy link
Contributor

areusch commented Oct 14, 2021

@tqchen in this case we prefer not to pay the cost of retrieving such a pointer as tracking global state on an µC can be tricky. When the pointer is on the stack, it is easier to leverage argument-passing optimizations supplied by the architecture such as register-mapped arguments.

@tqchen
Copy link
Member

tqchen commented Oct 14, 2021

i agree, i mean that the intrinsic translate to the implicit ctx parameters as part of fn call

@areusch areusch added status: need update RFC needs update based on feedback and removed status: need review RFC needs review labels Nov 8, 2021
@areusch
Copy link
Contributor

areusch commented Mar 24, 2022

hey @Mousius what's the status on this PR? is it stale/shall we close it?

@Mousius
Copy link
Member Author

Mousius commented Apr 11, 2022

Eek! Forgot to close this, we merged a magic variable name version of this in https://github.com/apache/tvm/pull/9501/files#diff-a9ac007ffdd51c0bc3fb5780835e17ca92f76fdb775b09c391375247940548ec which didn't have the same impact on the IR.

@Mousius Mousius closed this Apr 11, 2022
@Mousius Mousius deleted the device-api-resource-handle branch April 11, 2022 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: need update RFC needs update based on feedback
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants