Module Based Model Runtime for AOT #46

Merged
merged 4 commits into from
Feb 15, 2022
348 changes: 348 additions & 0 deletions rfcs/0046-module-based-model-runtime-for-aot.md
@@ -0,0 +1,348 @@
# Module-based Model Runtime Interface for AOT

- Feature Name: module_based_model_runtime_for_aot
- Start Date: 2021-09-17
- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046)
- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)

# **Summary**

This RFC describes a [Module-based Model Runtime
interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for
the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), thereby
enabling its use from the TVM C++ Runtime.

# **Motivation**

The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled
Relay models. At the time of writing, it's now possible to codegen a TIR function which executes
Relay models that have known shapes, don't have graph-level control flow, and execute only on the
CPU device. Right now, the C runtime is the only such runtime environment which can interact with
this generated code. However, significant interest exists in enabling the C++ runtime to use the
Ahead-of-Time executor.

# **Guide-level explanation**

Users select the AOT executor at compile time through the traditional GraphExecutor compilation flow
(e.g. `tvm.relay.build`) by including `--executor=aot` in the Target
[1]. The return value of `tvm.relay.build` in this case is an `AotExecutorFactory` Module
object. Users instantiate the AOT executor via `AotExecutorFactory` as they do with `GraphExecutor`:

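```python
import tvm
from tvm import relay

ir_mod = tvm.parser.fromtext("""\
#[version = "0.0.5"]
def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) {
    %0 = %a + %b;
    %0
}"""
)

# Selecting the AOT executor in the target string makes relay.build return
# an AotExecutorFactory instead of a GraphExecutorFactory.
with tvm.transform.PassContext(opt_level=3):
    factory = relay.build(ir_mod, "llvm -executor=aot", mod_name="my_mod")

# Instantiate the AotExecutor on the CPU device.
aot_executor = factory["my_mod"](tvm.cpu(0))
```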

`AotExecutor` supports the traditional Module-Based Model Runtime Interface and can be used as a
user normally would `GraphExecutor`:

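```python
import numpy as np

# Inputs must match the declared (1, 2) uint8 tensor shape.
aot_executor.set_input("a", tvm.nd.array(np.array([[1, 2]], dtype="uint8")))
aot_executor.set_input("b", tvm.nd.array(np.array([[3, 5]], dtype="uint8")))
aot_executor.run()
output = aot_executor.get_output(0)
# Elementwise sum: [1 + 3, 2 + 5] == [4, 7].
assert (output.asnumpy() == np.array([[4, 7]], dtype="uint8")).all()
```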

[1] NOTE: The target string is not the final place this customization should be made. However, it
has so far been the place where runtime-related options are specified. A separate RFC will split the
Target string into Target options (which affect tuning) and runtime options.

# **Reference-level explanation**

Already committed to TVM is the AotExecutorCodegen. This module produces a TIR top-level function
which invokes the Relay operators (implemented in TIR) in a correct order. An example is given
below:

```bash
PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} {
// attr [(nullptr)] device_id = 0
// attr [(nullptr)] device_type = 1
tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output)
}
```

The AotExecutor then needs to accomplish the following to implement the Module-based Model Runtime
Interface:

1. Allocate input and output tensors as defined in the `run_model` function using the correct Device
API.
2. Provide a mapping from relay parameter name to positional argument.
3. Invoke the generated TIR function and provide profiling.

### Compiler ↔ Runtime Metadata

In order to implement (1) and (2) above, additional metadata about the `run_model` function needs to
be communicated from Compiler to Runtime:

- The mapping between Relay parameter name and TIR argument position
- The number of inputs and outputs
- The type of each parameter
- Information sufficient to choose a Device API to allocate memory for that data.

At present, Metadata is passed from Compiler to Runtime in several different ways:

1. Constant DLTensor can be bundled with code and supplied to `runtime::Module` via
`runtime::MetadataModule`
2. Many non-DSO-exportable backends (`cuda`, `hexagon`, `metal`, `opencl`, `sdaccel`, `rocm`,
`vulkan`) have adopted the convention of including a
[`runtime::FunctionInfo`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L106)
(NOTE: distinct from `tvm::relay::transform::FunctionInfo`) in their serialization:

```cpp
/*! \brief function information needed by device */
struct FunctionInfo {
  std::string name;
  std::vector<DLDataType> arg_types;
  std::vector<std::string> launch_param_tags;
};
```

3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of producing the
graph-level
[`runtime::MetadataNode`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L55):
Contributor

FYI, I did move this to become ExecutorCodegenMetadata now as it did not feel like a runtime concept.

Contributor Author

ah updated the RFC text to point to that. thanks!


```cpp
/*!
 * \brief Structure that can be optionally used by the executor codegen
 */
class MetadataNode : public Object {
 public:
  /*! \brief input information for the main function */
  Array<String> inputs;
  /*! \brief number of outputs of the main function */
  int num_outputs = 1;
  /*! \brief the executor to be used to run the model */
  String executor = kTvmExecutorGraph;

  String mod_name = "";
};
```

4. The recent AOTExecutor implementation has created `tvm::relay::transform::FunctionInfo` which
communicates statistics about memory usage and I/O operation for each TIR operator and aggregate
statistics for the top-level AOT function:

```cpp
struct FunctionInfoNode : public Object {
  Map<Target, Integer> workspace_sizes;
  Map<Target, Integer> io_sizes;
  Map<Target, Integer> constant_sizes;
  Map<Target, tir::PrimFunc> tir_primfuncs;
  Map<Target, Function> relay_primfuncs;
};
```


Some duplication of information is already present. Likely this is due in part to the existing
middle-end compiler design, in which a separate `IRModule` is produced for each backend. Another
Contributor

runtime.Module is produced per backend. As long as that exists, somewhere in the compiler per-backend IRModule have to exist -- that does the translation from IRModule --> runtime.Module.

However, we have hierarchical runtime.Module trees being built -- so what we can eventually do maybe create hierarchical IRModule structure. In that structure, it might make sense to keep this metadata.

cc : @tqchen @Mousius @jroesch

Contributor Author

a couple points:

Contributor

Agreed.

The requirement to mention this here seems like stemming from the fact it is aligned with other RFC/pre-RFCs that single (tree of) IRModule --> (list of) runtime.Modules -- is a proposed change -- therefore, this motivates the change proposed here.

Thus, would it be possible to add a reference to this proposal (if any)?

Contributor Author

hmm, i'm not quite sure i follow you here. I'm happy to add a reference to the Artifact proposal, but I'm not sure it's quite exactly what I'm stating here. Here, what I mean is that the TIR-to-Runtime interface is IRModule -> (tree of runtime.Module). The existing MetadataModule (which is proposed to rename to ConstLoaderModule here) seems to have arisen out of a desire to build common infrastructure to handle loading DLTensor from .text in the C++ runtime. Here what I'm trying to point out is that since the TIR-to-Runtime interface provides no facility for the TIR-to-runtime processes to return metadata outside of the runtime::Module, this leads to duplication of information should it be required by the compiler in any way. For example, constant_sizes could be deduced from the DLTensor passed to ConstLoaderModule, but ConstLoaderModule is not supported by all runtimes and not the de-facto way to load constant data or metadata at runtime because it doesn't support encoding structs and scalar values. You can also see some duplication in CudaModule.

I think this proposal is attempting to start down the path of unifying these different methods of providing data generated during lowering to the runtime as Metadata. I think that's mainly covered here, but happy to add a reference to the Artifact thing if it helps, it just seems a bit orthogonal to me.

Contributor

I think I dont dispute the claim that IRModule --> (tree of runtime.Module).

The current lowering flow creates IRModule per backend that get translated to runtime.Module(s).

I see that as (correct me if I am wrong) : Unified IRModule --> [IRModule per backend] --> tree of runtime.Module s
which is basically a host runtime.Module includes a flat array device runtime.Modules.

Now the proposal here want to attach model-level Metadata from the Unified IRModule to (root of) tree of runtime.Module s.

So the gap in the text here is that it assumes it is common knowledge how [IRModule per backend] disappears. Hypothetically, let's say it is not there -- then the text makes sense because the model-level metadata could be attached to the Unified IRModule and then passed on to the tree of runtime.Modules.

Now my question is: in the absence of a proposal for how to remove the IRModule per backend (or if it's already there -- please link the RFC/pre-RFC), this RFC needs to outline how this will be communicated from the Unified IRModule to the tree of runtime.Modules in the current lowering flow and/or the changes brought by this RFC.

Contributor Author (@areusch, Feb 15, 2022)

I see--so this is more about the wording a couple paragraphs down, no?

Work is currently ongoing to unify the pre-codegen IRModule into a single instance. After this work is completed, it will be much easier to produce a centralized module-level Metadata.

cc @jroesch i am actually not sure if there is an RFC describing this.

i'm hoping to describe my ambitions to "link" metadata into TIR via tir.load_metadata node in a following RFC, and I definitely would need consolidated metadata for this at the IRModule level. I'm not sure if there is anything in code-generation that strictly requires this--it's just cleaner in my book. Let me know if I'm missing anything here.

Contributor Author

per our offline discussion, clarified how the metadata is carried through the current compiler design, removed references to un-RFC'd design efforts and replaced with text to motivate them. also clarified some wording--ptal

factor may be: since each `runtime::Module` is responsible for its own serialization, and passing
`Node` across `PackedFunc` requires a cast, the lack of a centralized facility for
Contributor

What is meant by Node here ?

Contributor Author

tvm::Node; clarified

`runtime::Modules` to obtain module-level Metadata has led backend authors to roll their own. This
pattern means that it's very difficult to assess the full scope of metadata handed to the runtime,
particularly across all backends.

Work is currently ongoing to unify the pre-codegen `IRModule` into a single instance. After this
work is completed, it will be much easier to produce a centralized module-level Metadata. This RFC
argues for the expansion of `runtime::MetadataNode` in the following ways:
Contributor

I think the following arguments argues for re-structure (not just runtime::MetadataNode).

Contributor Author

how so?

Contributor

I dont see how 1.) is relevant to the expansion of runtime::MetadataNode.

My comment was to adjust the text here to say something like "modify the lowering flow" rather than "the expansion of runtime::MetadataNode" -- because latter is a sub-part of the former and I believe former is what is proposed here and covered by the following points.

Contributor Author

ah i see, changed the wording. i guess i was trying to say that the name MetadataNode should cover a more complex thing, but in practice you're right that this proposes a restructuring.


1. Rename `runtime::MetadataModule` to `runtime::ConstLoaderModule` to disambiguate the two and make
its purpose in life clearer.
2. Expand `input_args` in the existing `runtime::MetadataNode` to parity with `runtime::FunctionInfo`,
Contributor

I believe you are referring to runtime::MetadataNode here.

This is now : https://github.com/apache/tvm/blob/e6af87491eb250a3266b1f09b36055c6ee79146b/src/relay/backend/utils.h#L60-L84

Contributor Author

ah yeah. i think perhaps this is done now, then, since ExecutorCodegenMetadata exports tir::Var. However, to then fit this proposal we need to reduce it to something exportable. updated the text to clarify, see what you think.

plus include `_sizes` from `tvm::relay::transform::FunctionInfoNode` and the required `shape` and
`dtype` information from the beginning of this section.
3. Introduce `ModelMetadataModule` to contain this information for use with the C++ runtime.

```cpp
class ModelMetadataModule : public runtime::ModuleNode {
 public:
  PackedFunc GetFunction(const std::string& name,
                         const ObjectPtr<Object>& sptr_to_self) final {
    if (name == "get_model_metadata") {
      // Capture sptr_to_self so the module outlives the returned PackedFunc.
      return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
        *rv = ModelMetadata(metadata_);
      });
    } else {
      return PackedFunc();
    }
  }

 private:
  const struct ModelMetadata* metadata_;
};
```

4. Introduce an optional implementation for the C runtime.
5. Export runtime::Metadata to Model Library Format.

The proposed new definition of `runtime::Metadata` is as follows. NOTE that this is a C definition
because it will be made available to both the C and C++ runtimes. A C++ wrapper will be written.

```c
struct ParameterInfo {
  const char* relay_name_hint;
  const char* tir_name_hint;
  int64_t* shape;
  int64_t ndim;
  DLDataType dtype;
  TargetDevice target_device;  // NOTE: future addition; not covered in this RFC.
};

struct FunctionInfo {
  const char* function_name;
  struct ParameterInfo* params;
  int num_inputs;
  int num_outputs;
  int64_t workspace_size_bytes;
  int64_t io_size_bytes;
  int64_t constant_size_bytes;
};

typedef struct Metadata {
  int version;
  struct FunctionInfo* functions;
  const char* module_name;
} Metadata;
```
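
To make the proposed layout concrete, the following is a minimal Python mirror of the structs above, populated for the `my_mod` example from the guide-level section. It is only an illustrative sketch and not part of this RFC or of TVM: the Python class names, the `(code, bits, lanes)` dtype tuple (patterned on DLPack's `DLDataType`), the `version=1` value, the Relay name hint `output`, the "inputs first, then outputs" ordering of `params`, and the helpers `param_index` and `param_nbytes` are all assumptions made for illustration. The size fields and `target_device` are omitted for brevity.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple

# Stand-in for DLDataType: (type code, bits, lanes). Code 1 is kDLUInt in DLPack.
DType = Tuple[int, int, int]
UINT8: DType = (1, 8, 1)


@dataclass
class ParameterInfo:
    relay_name_hint: str
    tir_name_hint: str
    shape: List[int]
    dtype: DType


@dataclass
class FunctionInfo:
    function_name: str
    params: List[ParameterInfo]  # assumed order: inputs first, then outputs
    num_inputs: int
    num_outputs: int


@dataclass
class Metadata:
    version: int
    functions: List[FunctionInfo]
    module_name: str


def param_index(func: FunctionInfo, relay_name: str) -> int:
    """Map a Relay input name to its positional argument index."""
    for i, param in enumerate(func.params[: func.num_inputs]):
        if param.relay_name_hint == relay_name:
            return i
    raise KeyError(relay_name)


def param_nbytes(param: ParameterInfo) -> int:
    """Number of bytes needed to allocate a tensor for this parameter."""
    _, bits, lanes = param.dtype
    return prod(param.shape) * ((bits * lanes + 7) // 8)


# Hypothetical metadata for the two-input uint8 add model.
run_model = FunctionInfo(
    function_name="tvmgen_my_mod_run_model",
    params=[
        ParameterInfo("a", "input1", [1, 2], UINT8),
        ParameterInfo("b", "input2", [1, 2], UINT8),
        ParameterInfo("output", "output", [1, 2], UINT8),
    ],
    num_inputs=2,
    num_outputs=1,
)
metadata = Metadata(version=1, functions=[run_model], module_name="my_mod")

print(param_index(run_model, "b"))        # position of Relay input "b"
print(param_nbytes(run_model.params[0]))  # bytes to allocate for "a"
```

With this information an executor can resolve `set_input("b", ...)` to the second positional argument and size each allocation without inspecting the generated code.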

### Internal workings of AotExecutor (`--runtime=c++ --interface-api=packed`)

Given the above, we can now sketch out the way AotExecutor should behave (for C++ runtime).

Module initialization will:

1. Load the `ModelMetadata` using `get_model_metadata` PackedFunc.
2. Allocate space for the parameters to `tvmgen_<model_name>_run_model`.
3. Look up and load any linked parameters using the `--link-params` mechanism.

- `set_input`, `get_input`, `get_output` all work as they do in `GraphExecutor`.
- `run` assembles `TVMArgs` containing inputs + outputs and invokes `tvmgen_<model_name>_run_model`.
- `time_evaluator` is implemented in the same way as it is in `GraphExecutor`. Timing `run_model` is
done using the CPU timer.
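
The initialization and `run` steps above can be modeled with a small, self-contained Python mock. This is a sketch of the control flow only, not TVM code: `MockAotExecutor`, its constructor arguments, the use of `bytearray` as a tensor stand-in, and the Python `tvmgen_my_mod_run_model` function are hypothetical names introduced for illustration. In the real executor, the names and sizes would come from the `get_model_metadata` PackedFunc and the buffers from the Device API.

```python
from typing import Callable, Dict, List


class MockAotExecutor:
    """Toy model of the AotExecutor flow described above (not TVM code)."""

    def __init__(self, input_names: List[str], num_outputs: int,
                 nbytes: List[int], run_model: Callable[..., int]):
        # Step 1 stand-in: this information would be read from the model metadata.
        self._index: Dict[str, int] = {n: i for i, n in enumerate(input_names)}
        self._num_inputs = len(input_names)
        self._num_outputs = num_outputs
        # Step 2: allocate one buffer per positional argument (inputs, then outputs).
        self._args = [bytearray(n) for n in nbytes]
        self._run_model = run_model

    def set_input(self, name: str, data: bytes) -> None:
        buf = self._args[self._index[name]]
        if len(data) != len(buf):
            raise ValueError(f"expected {len(buf)} bytes for input {name!r}")
        buf[:] = data

    def get_input(self, name: str) -> bytes:
        return bytes(self._args[self._index[name]])

    def run(self) -> None:
        # Assemble inputs + outputs as positional arguments and invoke run_model.
        ret = self._run_model(*self._args)
        if ret != 0:
            raise RuntimeError(f"run_model failed with code {ret}")

    def get_output(self, index: int) -> bytes:
        if not 0 <= index < self._num_outputs:
            raise IndexError(index)
        return bytes(self._args[self._num_inputs + index])


def tvmgen_my_mod_run_model(a: bytearray, b: bytearray, out: bytearray) -> int:
    """Stand-in for the generated function: elementwise uint8 add."""
    for i in range(len(out)):
        out[i] = (a[i] + b[i]) & 0xFF
    return 0


executor = MockAotExecutor(["a", "b"], 1, [2, 2, 2], tvmgen_my_mod_run_model)
executor.set_input("a", bytes([1, 2]))
executor.set_input("b", bytes([3, 5]))
executor.run()
result = list(executor.get_output(0))
print(result)  # [4, 7]
```

The point of the sketch is that the executor itself holds no model-specific logic: everything it needs is the name-to-position mapping, the argument counts, and the allocation sizes.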

### Internal workings of AotExecutor (`--runtime=c --interface-api=packed`)

The C runtime version works in a very similar way with C accessor functions for the `ModelMetadata`.

### No AotExecutor implementation planned (`--runtime=c --interface-api=c`)

When `--interface-api=c` is present in the Target string, the `run_model` function no longer uses
the PackedFunc interface and instead accepts `arg_values` directly as positional args:

```c
TVM_DLL int32_t tvmgen_default_run_model(void* arg0, void* arg1, void* arg2) {
  void* input = arg0;
  void* input1 = arg1;
  void* output = arg2;
  (void)tvmgen_default_fused_multiply(input, input1, output);
  return 0;
}
```

Additional work is underway to wrap this in a firmware-friendly interface. A core design goal of
this interface is to offload all memory management tasks to the calling code to facilitate
integration with bare-metal embedded devices.

Therefore, it would go against the goals of the C interface to introduce a generic runtime wrapper
compatible with PackedFunc calling convention. It may be possible to do so in the future, but it
would be great to motivate such an implementation with rationale more related to the embedded
runtime setting.

### Operator Calling Convention

TVM uses 3 internal calling conventions:

1. `call_packed` - the traditional calling convention used in the C++ runtime
2. `call_cpacked` - similar to `call_packed`, but TVM presumes a symbol with that function name is
linked into the binary (i.e. `TVMBackendGetFuncFromEnv` is not used to look up the
PackedFunc)
3. `unpacked` - used with microTVM to avoid overhead of PackedFunc calls in statically-linked
binaries. See [AOT optimisations for Embedded Targets
RFC](https://discuss.tvm.apache.org/t/rfc-utvm-aot-optimisations-for-embedded-targets/9849).

The AOT `run_func` can use a different calling convention externally (e.g. `--interface-api`) than
that used internally with Implemented Operators (`--unpacked-args`). However, there are some
circumstances under which not all choices can be used:

- When targeting the C++ runtime: `call_packed` must be used when non-DSO-exportable modules exist;
otherwise `call_cpacked` may be used. `unpacked` may not be used with AOT Executor as the
interface has not settled.
- When targeting the C runtime: any calling convention may be selected for either the interface API
or the operator calling convention. However, when using `--interface-api=c` (i.e. the `unpacked`
`run_func` calling convention), you must also use the `unpacked` calling convention with
Implemented Operators.
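
The constraints above can be summarized as a small validity check. The function below is a hypothetical helper written for illustration, not an existing TVM API: the name `is_allowed`, its string arguments, and the `has_non_dso_modules` flag are assumptions. It encodes only the rules stated in the two bullets; in particular, it does not check the interface API for the C++ runtime, since the text constrains that choice only for the C runtime.

```python
OPERATOR_CONVENTIONS = ("call_packed", "call_cpacked", "unpacked")


def is_allowed(runtime: str, interface_api: str, operator_cc: str,
               has_non_dso_modules: bool = False) -> bool:
    """Return True if the combination satisfies the constraints listed above."""
    if operator_cc not in OPERATOR_CONVENTIONS:
        raise ValueError(f"unknown operator calling convention: {operator_cc}")
    if runtime == "c++":
        # `unpacked` may not be used with the AOT Executor on the C++ runtime.
        if operator_cc == "unpacked":
            return False
        # `call_packed` must be used when non-DSO-exportable modules exist.
        if has_non_dso_modules and operator_cc != "call_packed":
            return False
        return True
    if runtime == "c":
        # The unpacked C interface requires unpacked operators as well.
        if interface_api == "c":
            return operator_cc == "unpacked"
        return True
    raise ValueError(f"unknown runtime: {runtime}")


print(is_allowed("c++", "packed", "call_cpacked"))                            # True
print(is_allowed("c++", "packed", "call_cpacked", has_non_dso_modules=True))  # False
print(is_allowed("c", "c", "call_packed"))                                    # False
print(is_allowed("c", "c", "unpacked"))                                       # True
```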

# **Drawbacks**

Why should we *not* do this?

- This requires quite a bit of rework of the Metadata-passing mechanism, with potential for breakage.
- It also introduces yet another Executor to the runtime to maintain.
- It may introduce additional constraints on the `<C-runtime, C-interface>` implementation, which
may make it more difficult to make progress on microTVM.

# **Rationale and alternatives**

- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?

This RFC doesn't address the question of "why add an AOT executor?" The RFC which added it in the
first place is a better location to look for rationale to motivate that. In general, not following
through with this RFC would relegate the AOT executor to a C-runtime-only component. There is
significant interest in AOT from C++ runtime users, and maintaining compatibility with both
increases the chances that AOT executor will support all TVM runtime features.

The controversial questions this RFC addresses are as follows:

### Should we maintain a unified approach to code-generating the AOT executor?

An alternative approach could introduce an additional e.g. `aot_cpp_executor_codegen.cc` and create
a third pathway (in the Graph/AOT build flow). Doing this allows us to implement runtime-specific
compiler primitives, which may simplify both pipelines. However, soon those pipelines will grow more
complicated as features are added to leverage AOT, such as Unified Static Memory Planning. The
burden of double-maintenance of those features outweighs the advantage of a simplified
implementation. A single, unified pipeline also makes it easier for newcomers to understand the
compiler.

### Should we attempt to unify the Metadata?

Metadata could be left in the scattered form it is now. It may be that the implementation of this
RFC prioritizes expansion of `ModelMetadata` over propagating it to the various non-DSO-exportable
`runtime::Module`. Ultimately though, maintaining separate function-level metadata adds confusion
and code bloat. It also makes it harder to reason about the compiler as a whole. For these reasons,
this RFC advocates for centralizing the Metadata.

# **Prior art**

There is no known prior art of a C++-runtime-compatible AOT implementation.

# **Unresolved questions**

- Who will we break if we unify Model metadata?
- Will this play nicely with the VM compilation flow when it is unified?
- How will TargetDevice come into play here?

# **Future possibilities**

Not covered in this RFC, but particularly useful with the C++ runtime, is heterogeneous execution. In
the present PoC, AotExecutor will CHECK-fail if a non-CPU device is given. A future implementation
will annotate the parameters with one of:

- A `device_type` — in which case mapping from `device_type` to `tvm::Device` will be done in the
same way as the `GraphExecutor`
- A `target_device` — in which case a new mapping will be defined

Aside from that, the larger unresolved question that makes it difficult to add heterogeneous execution is:

- How should AOT codegen invoke the Device API?

Before this question can be answered, some progress needs to be made on the [C device
API](https://discuss.tvm.apache.org/t/pre-rfc-c-device-api/10874) and we need to define TIR
bindings.