<a target="_blank" href="https://colab.research.google.com/github/cyx-6/TVM-Demo/blob/main/tvmscript.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Introduction to Unity: TVMScript

## TVMScript Overview

TVMScript is a TVM frontend used to **build** and **inspect** TVM IRModules (for example operators, tensor programs, and tensor intrinsics) in Python.

![overview](TVMScript.svg)

### TVMScript Basic Concepts

TVMScript is composed of two components:
 - **Parser** for building TVM IRModules
 - **Printer** for inspecting TVM IRModules

Let us start with a simple matrix multiplication as an example of building and inspecting an initial `IRModule`.

![overview](TVMScript_1.svg)

#### TVMScript Parser

In [1]:
try:
  import tvm
except ImportError:
  # TVM is not installed yet: fetch the nightly wheel (notebook shell command)
  !python3 -m pip install mlc-ai-nightly -f https://mlc.ai/wheels

from tvm.script import ir as I, relax as R, tir as T


@I.ir_module
class Matmul:
    @R.function
    def main(
        a: R.Tensor((128, 128), "float32"), b: R.Tensor((128, 128), "float32")
    ) -> R.Tensor((128, 128), "float32"):
        out: R.Tensor((128, 128), "float32") = R.matmul(a, b)
        return out

Above, we first import the TVMScript modules for the different IRs:
 - `ir` for the base IR
 - `relax` for Relax IR
 - `tir` for Tensor IR

By convention, we abbreviate them as `I`, `R` and `T`.

Then we define a module `Matmul` with a single member function, `main`.

The `main` function takes two Relax `Tensor` inputs and returns one Relax `Tensor`, the result of the matrix multiplication.

Finally, the decorators `@I.ir_module` and `@R.function` tell the TVMScript parser to parse our Python class and function.

Let us check what the decorators return.

In [2]:
print(type(Matmul))
print(type(Matmul["main"]))

<class 'tvm.ir.module.IRModule'>
<class 'tvm.relax.expr.Function'>


The result types confirm that the TVMScript parser has successfully parsed the module `Matmul`.
 - The `Matmul` module is now a TVM `IRModule` instead of a Python class.
 - The `main` function is now a TVM Relax function instead of a Python function.

Note: To access a function in an `IRModule`, index it by name, as in `Matmul["main"]`, rather than using attribute access such as `Matmul.main`.

#### TVMScript Printer

To inspect a TVM `IRModule`, call `print(IRModule)` or `print(IRModule.script())`.

Both `str(IRModule)` and `IRModule.script()` invoke the TVMScript printer and return plain text.

Alternatively, call `IRModule.show()` to print the code highlighted and formatted, as below.

In [3]:
Matmul.show()

### TVMScript in TVM Unity

TVMScript is not a stand-alone component in TVM. Instead, it works closely with the other components of TVM Unity.

Next, we show how TVMScript is used in a typical TVM workflow.

![overview](TVMScript_2.svg)

Let us look into the `Matmul` IRModule built above, in particular its core part, the `R.matmul` operator.

Here, we apply our first transform, `LegalizeOps`, to the initial `Matmul` IRModule.

Note: `LegalizeOps` lowers high-level Relax operators into calls to newly generated lower-level Tensor IR functions.

In [4]:
from tvm.relax.transform import LegalizeOps

legalized_matmul = LegalizeOps()(Matmul)
legalized_matmul.show()

As the printed result shows, the `R.matmul` operator has been replaced with an `R.call_tir`.

The `R.call_tir` calls the newly added lower-level `matmul` TIR function.

The `matmul` function shows in detail how the matrix multiplication is computed:
 - `T.grid` for the loops of the computation
 - `T.block` for the computation block, the unit that further transforms operate on
 - `T.axis` for the properties of the block's iteration axes
 - `T.init` for the initial values of the accumulators
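As a rough mental model, the four constructs listed above can be mirrored in plain Python. This is only a sketch, not TVMScript: the function name `matmul_reference` and the tiny 2 x 2 inputs are assumptions made for illustration, and nested lists stand in for TVM buffers.

```python
def matmul_reference(a, b, n):
    """Plain-Python model of the loop nest in a legalized matmul function."""
    out = [[None] * n for _ in range(n)]
    # Counterpart of T.grid(n, n, n): three nested loops over i, j, k
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Counterpart of T.axis: i and j are spatial, k is the reduction axis
                if k == 0:
                    # Counterpart of T.init: runs once per (i, j), before the first update
                    out[i][j] = 0.0
                # Counterpart of the block body: accumulate one product
                out[i][j] += a[i][k] * b[k][j]
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
result = matmul_reference(a, b, 2)
print(result)
```

The `if k == 0` guard is the part that the later scheduling steps single out: it is the initialization that `T.init` expresses declaratively.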

To further transform our TIR function, we turn to `tir.Schedule`, which is designed to manipulate an IRModule.

In [5]:
from tvm.tir import Schedule

# Import the class directly so that `sch` does not shadow the schedule module
sch = Schedule(legalized_matmul)
block = sch.get_block("matmul", func_name="matmul")
sch.mod.show(black_format=False, obj_to_underline=[sch.get_sref(block).stmt.init])

The printed result shows that our IRModule has been loaded into the `tir.Schedule`.

The underlined part shows what we are about to transform next.

Note: The TVMScript printer can decorate code with underlines and annotations.

As an example transform, we can replace the initialization part, `T.init`, with `fill_fragment` from the [NVIDIA Warp Matrix Functions](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#id40).

To use a tensor intrinsic such as `fill_fragment` in TVM, we need a `TensorIntrin`.

A `TensorIntrin` consists of two parts:
 - a description: the IR pattern to match
 - an implementation: the IR, using the tensor intrinsic, that replaces the matched pattern

Both the description and the implementation are built with the TVMScript parser.

TVM also ships with some popular `TensorIntrin`s built in, and you can bring your own `TensorIntrin` as well.

![overview](TVMScript_1.svg)

In [6]:
from tvm.tir import TensorIntrin
import tvm.tir.tensor_intrin

intrinsic = TensorIntrin.get("wmma_fill_16x16x16_f32")
print("TensorIntrin.desc:")
intrinsic.desc.show()
print("TensorIntrin.impl:")
intrinsic.impl.show()


TensorIntrin.desc:


TensorIntrin.impl:


For our `matmul` TIR function, we apply the `TensorIntrin` named `"wmma_fill_16x16x16_f32"`.

The description of the `TensorIntrin` shows the pattern that `T.init` must match:
 - The tensor being written must have shape `16 x 16`.
 - The tensor being written must be in the `"wmma.accumulator"` scope.

So we need to transform our `T.init` into this pattern.

To get the right shape, we split and reorder our loops as below.

In [7]:
i, j, k = sch.get_loops(block)
i0, i1 = sch.split(i, [None, 16])
j0, j1 = sch.split(j, [None, 16])
sch.reorder(i0, j0, i1, j1, k)
sch.mod.show(
    black_format=False,
    obj_to_underline=[sch.get_sref(i1).stmt.extent, sch.get_sref(j1).stmt.extent],
)
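The effect of the split and reorder above can be modelled in plain Python. The sketch below is an illustration only, not the TVM API: the function names are hypothetical, the sizes are shrunk to `n = 8` with a tile factor of 4, the `k` loop is omitted for brevity, and `n` is assumed to be divisible by the factor.

```python
def original_order(n):
    """Iteration order of the untransformed i, j loops."""
    return [(i, j) for i in range(n) for j in range(n)]


def split_and_reordered(n, factor):
    """Iteration order after splitting i and j by `factor` and reordering
    the loops to (i0, j0, i1, j1). Assumes n is divisible by factor."""
    order = []
    for i0 in range(n // factor):
        for j0 in range(n // factor):
            for i1 in range(factor):
                for j1 in range(factor):
                    # Recover the original indices from the split loop variables
                    order.append((i0 * factor + i1, j0 * factor + j1))
    return order


n, factor = 8, 4
tiled = split_and_reordered(n, factor)
# Same set of iteration points, visited in a different order
same_points = sorted(tiled) == original_order(n)
# The first factor * factor iterations cover exactly one tile
first_tile = tiled[: factor * factor]
print(same_points, first_tile[:4])
```

The transform changes only the visiting order: the two inner loops now sweep one square tile at a time, which is the shape a tile-based intrinsic expects.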

To get the right scope, we cache the written tensor in `"wmma.accumulator"` as below.

In [8]:
frag = sch.cache_write(block, 0, "wmma.accumulator")
sch.mod.show(
    black_format=False, obj_to_underline=[sch.mod["matmul"].body.block.alloc_buffers[0]]
)
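The data flow that a write cache introduces can be sketched in plain Python. This is a simplified model under stated assumptions, not TVM code: the name `matmul_with_write_cache` is hypothetical, a nested list named `acc` stands in for the buffer in the `"wmma.accumulator"` scope, and memory scopes themselves have no Python equivalent.

```python
def matmul_with_write_cache(a, b, n):
    """Model of a write cache: accumulate into a staging buffer first,
    then copy the finished values to the real output."""
    # Staging buffer, standing in for the buffer in the accumulator scope
    acc = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if k == 0:
                    acc[i][j] = 0.0  # initialization now targets the cache
                acc[i][j] += a[i][k] * b[k][j]
    # Separate copy-back step that writes the cached results to the output
    out = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = acc[i][j]
    return out


a = [[1.0, 0.0], [2.0, 3.0]]
b = [[4.0, 5.0], [6.0, 7.0]]
cached_result = matmul_with_write_cache(a, b, 2)
print(cached_result)
```

The computation itself is unchanged; only the buffer that the reduction writes to differs, with one extra copy at the end.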

Then we lift `T.init` out into a `block` of its own, as below, so that it matches the pattern the `TensorIntrin` requires.

In [9]:
block_init = sch.decompose_reduction(block, i1)
sch.mod.show(black_format=False, obj_to_underline=[sch.get_sref(block_init).stmt])
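Decomposing the reduction separates the initialization from the update, so each tile is first zeroed by its own loop nest and then accumulated into. The plain-Python sketch below models that structure. It is an illustration with hypothetical names and shrunken sizes (`n = 4`, tile size 2), not the code TVM generates.

```python
def decomposed_matmul(a, b, n, tile):
    """Model of a tiled matmul whose initialization is a separate loop nest.
    Assumes n is divisible by tile."""
    out = [[None] * n for _ in range(n)]
    for i0 in range(n // tile):
        for j0 in range(n // tile):
            # Init block: zero the whole tile before any accumulation
            for i1 in range(tile):
                for j1 in range(tile):
                    out[i0 * tile + i1][j0 * tile + j1] = 0.0
            # Update block: pure accumulation, no initialization check needed
            for i1 in range(tile):
                for j1 in range(tile):
                    i, j = i0 * tile + i1, j0 * tile + j1
                    for k in range(n):
                        out[i][j] += a[i][k] * b[k][j]
    return out


n, tile = 4, 2
a = [[float(i + k) for k in range(n)] for i in range(n)]
b = [[float(k * j + 1) for j in range(n)] for k in range(n)]
decomposed = decomposed_matmul(a, b, n, tile)
# Straightforward reference to compare against
expected = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
print(decomposed == expected)
```

Because the init nest now writes a full tile in one self-contained step, it has the form that a tile-filling intrinsic can replace.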


Now our original `T.init` matches the pattern of the `TensorIntrin` description.

Finally, let us replace the matched pattern with the `TensorIntrin` implementation, as below.

In [10]:
sch.tensorize(sch.get_loops(block_init)[-2], "wmma_fill_16x16x16_f32")
sch.mod.show(
    black_format=False,
    obj_to_underline=[sch.get_sref(sch.get_block("matmul_init_o", "matmul")).stmt.body],
)

The underlined part shows that the `T.init` part of our `matmul` function has been transformed into tensor-core-based IR.

![overview](TVMScript.svg)

Let us return to the first overview of TVMScript in TVM Unity.

As the examples above demonstrate, TVMScript plays an important role in
 - building and inspecting the IRModules in TVM
 - helping us transform our IRModules interactively