
# BYOC Demo
**Author**: [Kuen-Wey Lin](https://github.com/aquapapaya)

We use a simple Relay graph to walk through the BYOC (Bring Your Own Codegen) workflow.


In [None]:
%%shell
# Installs the latest dev build of TVM from pip
pip install apache-tvm --pre

Collecting apache-tvm
  Downloading apache_tvm-0.14.dev273-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (69.2 MB)
Installing collected packages: apache-tvm
Successfully installed apache-tvm-0.14.dev273




In [None]:
import tvm
from tvm import relay
import tvm.relay.testing

Because the full Relay graph is large, we define a small helper based on a Relay expression visitor that reports the total number of operator calls, a per-operator breakdown, and how many distinct callees mention `byoc-target`.

In [None]:
def profile_graph(func):
    class OpProfiler(tvm.relay.ExprVisitor):
        def __init__(self):
            super().__init__()
            self.ops = {}

        def visit_call(self, call):
            op = call.op
            if op not in self.ops:
                self.ops[op] = 0
            self.ops[op] += 1
            super().visit_call(call)

        def get_byoc_graph_num(self):
            cnt = 0
            for op in self.ops:
                if "byoc-target" in str(op):
                    cnt += 1
            return cnt

    profiler = OpProfiler()
    profiler.visit(func)
    print("Total number of operators: %d" % sum(profiler.ops.values()))
    print("Detail breakdown")
    for op, count in profiler.ops.items():
        print("\t%s: %d" % (op, count))
    print("byoc-target subgraph #: %d" % profiler.get_byoc_graph_num())

Here we demonstrate how BYOC annotates a Relay graph.
Let's first define a simple Relay graph with supported and unsupported operators.



In [None]:
# Define the neural network
# Get the symbol definition and random weights of an 11-layer VGG network
mod, params = relay.testing.vgg.get_workload(batch_size=1, num_classes=1000,
    image_shape=(3, 224, 224), dtype='float32', num_layers=11
)
print(mod)
profile_graph(mod["main"])

def @main(%data: Tensor[(1, 3, 224, 224), float32] /* ty=Tensor[(1, 3, 224, 224), float32] */, %conv1_1_weight: Tensor[(64, 3, 3, 3), float32] /* ty=Tensor[(64, 3, 3, 3), float32] */, %conv1_1_bias: Tensor[(64), float32] /* ty=Tensor[(64), float32] */, %conv2_1_weight: Tensor[(128, 64, 3, 3), float32] /* ty=Tensor[(128, 64, 3, 3), float32] */, %conv2_1_bias: Tensor[(128), float32] /* ty=Tensor[(128), float32] */, %conv3_1_weight: Tensor[(256, 128, 3, 3), float32] /* ty=Tensor[(256, 128, 3, 3), float32] */, %conv3_1_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv3_2_weight: Tensor[(256, 256, 3, 3), float32] /* ty=Tensor[(256, 256, 3, 3), float32] */, %conv3_2_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv4_1_weight: Tensor[(512, 256, 3, 3), float32] /* ty=Tensor[(512, 256, 3, 3), float32] */, %conv4_1_bias: Tensor[(512), float32] /* ty=Tensor[(512), float32] */, %conv4_2_weight: Tensor[(512, 512, 3, 3), float32] /* ty=Tensor[(512, 512, 3, 3), flo

Then we define the annotation rules.
Developers can specify both operator-based and pattern-based annotation rules. Here, we declare the single operator `nn.dense` as supported. We also define two supported patterns: `Conv2D - Bias - ReLU` and `Conv2D - ReLU` (the bias is optional).



In [None]:
# Operator-based annotation rules
@tvm.ir.register_op_attr("nn.dense", "target.byoc-target")
def dense(expr):
    return True

# Pattern-based annotation rules
def make_pattern(with_bias=True):
    from tvm.relay.dataflow_pattern import is_op, wildcard
    data = wildcard()
    weight = wildcard()
    bias = wildcard()
    conv = is_op("nn.conv2d")(data, weight)
    if with_bias:
        conv_out = is_op("nn.bias_add")(conv, bias)
    else:
        conv_out = conv
    return is_op("nn.relu")(conv_out)

conv2d_bias_relu_pat = ("byoc-target.conv2d_relu_with_bias", make_pattern(with_bias=True))
conv2d_relu_pat = ("byoc-target.conv2d_relu_wo_bias", make_pattern(with_bias=False))
patterns = [conv2d_bias_relu_pat, conv2d_relu_pat]
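
To build intuition for what these two patterns capture, here is a minimal pure-Python sketch that does not use TVM. It assumes a network can be written as a flat list of operator names, which is a deliberate simplification: the real `MergeComposite` pass matches dataflow patterns on the Relay graph. The helper `fuse_patterns` and the `PATTERNS` table are hypothetical names introduced only for this illustration.

```python
# Toy model only: a network is a flat list of operator names, not a Relay graph.
PATTERNS = [
    ("byoc-target.conv2d_relu_with_bias", ["nn.conv2d", "nn.bias_add", "nn.relu"]),
    ("byoc-target.conv2d_relu_wo_bias", ["nn.conv2d", "nn.relu"]),
]

def fuse_patterns(ops, patterns=PATTERNS):
    """Replace each run of operators that equals a pattern with the pattern name.

    In this toy, patterns are tried in list order at every position.
    """
    fused = []
    i = 0
    while i < len(ops):
        for name, seq in patterns:
            if ops[i:i + len(seq)] == seq:
                fused.append(name)
                i += len(seq)
                break
        else:
            # No pattern starts at this position; keep the operator unchanged.
            fused.append(ops[i])
            i += 1
    return fused

ops = ["nn.conv2d", "nn.bias_add", "nn.relu", "nn.max_pool2d", "nn.dense"]
print(fuse_patterns(ops))
# ['byoc-target.conv2d_relu_with_bias', 'nn.max_pool2d', 'nn.dense']
```

The fused entry stands in for a composite function: several operators collapse into one unit that carries the pattern name.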

Now let's perform pattern-based annotation:

In [None]:
mod2 = relay.transform.MergeComposite(patterns)(mod)
print(mod2)
profile_graph(mod2["main"])

def @main(%data: Tensor[(1, 3, 224, 224), float32] /* ty=Tensor[(1, 3, 224, 224), float32] */, %conv1_1_weight: Tensor[(64, 3, 3, 3), float32] /* ty=Tensor[(64, 3, 3, 3), float32] */, %conv1_1_bias: Tensor[(64), float32] /* ty=Tensor[(64), float32] */, %conv2_1_weight: Tensor[(128, 64, 3, 3), float32] /* ty=Tensor[(128, 64, 3, 3), float32] */, %conv2_1_bias: Tensor[(128), float32] /* ty=Tensor[(128), float32] */, %conv3_1_weight: Tensor[(256, 128, 3, 3), float32] /* ty=Tensor[(256, 128, 3, 3), float32] */, %conv3_1_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv3_2_weight: Tensor[(256, 256, 3, 3), float32] /* ty=Tensor[(256, 256, 3, 3), float32] */, %conv3_2_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv4_1_weight: Tensor[(512, 256, 3, 3), float32] /* ty=Tensor[(512, 256, 3, 3), float32] */, %conv4_1_bias: Tensor[(512), float32] /* ty=Tensor[(512), float32] */, %conv4_2_weight: Tensor[(512, 512, 3, 3), float32] /* ty=Tensor[(512, 512, 3, 3), flo

A composite function has two special attributes, `PartitionedFromPattern` and `Composite`:
*   `PartitionedFromPattern`: Indicates the operators in the function body.
*   `Composite`: Indicates the name of the pattern we defined.

Next, let's continue to apply the operator-based annotation rules:

In [None]:
mod3 = relay.transform.AnnotateTarget("byoc-target")(mod2)
print(mod3)
profile_graph(mod3["main"])

def @main(%data: Tensor[(1, 3, 224, 224), float32] /* ty=Tensor[(1, 3, 224, 224), float32] */, %conv1_1_weight: Tensor[(64, 3, 3, 3), float32] /* ty=Tensor[(64, 3, 3, 3), float32] */, %conv1_1_bias: Tensor[(64), float32] /* ty=Tensor[(64), float32] */, %conv2_1_weight: Tensor[(128, 64, 3, 3), float32] /* ty=Tensor[(128, 64, 3, 3), float32] */, %conv2_1_bias: Tensor[(128), float32] /* ty=Tensor[(128), float32] */, %conv3_1_weight: Tensor[(256, 128, 3, 3), float32] /* ty=Tensor[(256, 128, 3, 3), float32] */, %conv3_1_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv3_2_weight: Tensor[(256, 256, 3, 3), float32] /* ty=Tensor[(256, 256, 3, 3), float32] */, %conv3_2_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv4_1_weight: Tensor[(512, 256, 3, 3), float32] /* ty=Tensor[(512, 256, 3, 3), float32] */, %conv4_1_bias: Tensor[(512), float32] /* ty=Tensor[(512), float32] */, %conv4_2_weight: Tensor[(512, 512, 3, 3), float32] /* ty=Tensor[(512, 512, 3, 3), flo

In [None]:
mod4 = relay.transform.MergeCompilerRegions()(mod3)
print(mod4)
profile_graph(mod4["main"])

def @main(%data: Tensor[(1, 3, 224, 224), float32] /* ty=Tensor[(1, 3, 224, 224), float32] */, %conv1_1_weight: Tensor[(64, 3, 3, 3), float32] /* ty=Tensor[(64, 3, 3, 3), float32] */, %conv1_1_bias: Tensor[(64), float32] /* ty=Tensor[(64), float32] */, %conv2_1_weight: Tensor[(128, 64, 3, 3), float32] /* ty=Tensor[(128, 64, 3, 3), float32] */, %conv2_1_bias: Tensor[(128), float32] /* ty=Tensor[(128), float32] */, %conv3_1_weight: Tensor[(256, 128, 3, 3), float32] /* ty=Tensor[(256, 128, 3, 3), float32] */, %conv3_1_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv3_2_weight: Tensor[(256, 256, 3, 3), float32] /* ty=Tensor[(256, 256, 3, 3), float32] */, %conv3_2_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv4_1_weight: Tensor[(512, 256, 3, 3), float32] /* ty=Tensor[(512, 256, 3, 3), float32] */, %conv4_1_bias: Tensor[(512), float32] /* ty=Tensor[(512), float32] */, %conv4_2_weight: Tensor[(512, 512, 3, 3), float32] /* ty=Tensor[(512, 512, 3, 3), flo

After `AnnotateTarget`, almost all nodes in the graph are wrapped in `compiler_begin` and `compiler_end` annotation nodes. Each `compiler_*` node has a `compiler` attribute that indicates which target the node should go to. In this example, it is either `default` or `byoc-target`.

Composite function calls are also annotated with `compiler=byoc-target`, indicating that this entire function can be offloaded.

We use the `MergeCompilerRegions` pass to merge neighboring regions assigned to the same target, which minimizes the number of subgraphs.
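
The annotate-then-merge idea can be sketched in pure Python, without TVM. This is a toy model under loud assumptions: the network is a flat list of operator names, and "merging" simply groups consecutive operators that share a target. The real passes work on a dataflow graph and have to respect dependencies between regions. The names `SUPPORTED`, `annotate`, and `merge_regions` are hypothetical.

```python
from itertools import groupby

# Toy model only: operators the hypothetical external codegen can handle.
SUPPORTED = {
    "nn.dense",
    "byoc-target.conv2d_relu_with_bias",
    "byoc-target.conv2d_relu_wo_bias",
}

def annotate(ops):
    """Tag each operator with 'byoc-target' if supported, else 'default'."""
    return [(op, "byoc-target" if op in SUPPORTED else "default") for op in ops]

def merge_regions(annotated):
    """Group consecutive operators that share a target into one region."""
    return [
        (target, [op for op, _ in group])
        for target, group in groupby(annotated, key=lambda pair: pair[1])
    ]

ops = [
    "byoc-target.conv2d_relu_with_bias",
    "byoc-target.conv2d_relu_with_bias",
    "nn.max_pool2d",
    "nn.dense",
    "nn.dense",
]
regions = merge_regions(annotate(ops))
for target, region_ops in regions:
    print(target, region_ops)
```

In this toy, five annotated operators collapse into three regions, two of which belong to `byoc-target`. Fewer regions means fewer transitions between the host and the external codegen.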

Finally, let's partition this graph:

In [None]:
mod5 = relay.transform.PartitionGraph()(mod4)
print(mod5)

def @main(%data: Tensor[(1, 3, 224, 224), float32] /* ty=Tensor[(1, 3, 224, 224), float32] */, %conv1_1_weight: Tensor[(64, 3, 3, 3), float32] /* ty=Tensor[(64, 3, 3, 3), float32] */, %conv1_1_bias: Tensor[(64), float32] /* ty=Tensor[(64), float32] */, %conv2_1_weight: Tensor[(128, 64, 3, 3), float32] /* ty=Tensor[(128, 64, 3, 3), float32] */, %conv2_1_bias: Tensor[(128), float32] /* ty=Tensor[(128), float32] */, %conv3_1_weight: Tensor[(256, 128, 3, 3), float32] /* ty=Tensor[(256, 128, 3, 3), float32] */, %conv3_1_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv3_2_weight: Tensor[(256, 256, 3, 3), float32] /* ty=Tensor[(256, 256, 3, 3), float32] */, %conv3_2_bias: Tensor[(256), float32] /* ty=Tensor[(256), float32] */, %conv4_1_weight: Tensor[(512, 256, 3, 3), float32] /* ty=Tensor[(512, 256, 3, 3), float32] */, %conv4_1_bias: Tensor[(512), float32] /* ty=Tensor[(512), float32] */, %conv4_2_weight: Tensor[(512, 512, 3, 3), float32] /* ty=Tensor[(512, 512, 3, 3), flo

We can see that 8 subgraphs have been partitioned for `byoc-target`.



1.   @tvmgen_default_byoc_target_main_0
2.   @tvmgen_default_byoc_target_main_3
3.   @tvmgen_default_byoc_target_main_6
4.   @tvmgen_default_byoc_target_main_11
5.   @tvmgen_default_byoc_target_main_16
6.   @tvmgen_default_byoc_target_main_21
7.   @tvmgen_default_byoc_target_main_23
8.   @tvmgen_default_byoc_target_main_25

Each partitioned function will be sent to the `byoc-target` codegen for code generation.

As a result, the customized codegen only needs to handle these subgraphs and does not need to worry about the rest of the graph.
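
As a rough illustration of this separation, the pure-Python sketch below lifts every `byoc-target` region out of a toy "main" program into its own named function. It assumes regions are given as `(target, operators)` pairs in execution order. The helper `partition` and its naming scheme are hypothetical and are not how `PartitionGraph` names or builds functions.

```python
def partition(regions, target="byoc-target"):
    """Split regions into a main program and per-region external functions.

    Toy model only: returns (main, functions), where `main` is a list of
    operator names and call markers, and `functions` maps a generated
    name to the operators of one offloaded region.
    """
    main = []
    functions = {}
    for index, (region_target, ops) in enumerate(regions):
        if region_target == target:
            # Hypothetical naming scheme, for illustration only.
            name = f"{target.replace('-', '_')}_region_{index}"
            functions[name] = list(ops)
            main.append(f"call {name}")
        else:
            main.extend(ops)
    return main, functions

regions = [
    ("byoc-target", ["conv2d_relu", "conv2d_relu"]),
    ("default", ["nn.max_pool2d"]),
    ("byoc-target", ["nn.dense", "nn.dense"]),
]
main, functions = partition(regions)
print(main)
print(functions)
```

A codegen for the external target would then iterate over `functions` only, while `main` stays with the default compilation flow.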

