# Apache TVM - an in-depth look

This notebook demonstrates the basics of TVM tensor expressions and schedules.

Let's start by importing TVM:

In [None]:
import tvm
from tvm import te

import difflib
import sys


def compute_diff(s1: str, s2: str):
    """
    Prints the differences between two strings, line by line.
    
    Parameters
    ----------
    s1: str
        First sequence to compare
    s2: str
        Second sequence to compare
    """
    s1split = s1.split('\n')
    s2split = s2.split('\n')
    delta = difflib.ndiff(
        s1split,
        s2split
    )
    for line in delta:
        print(line)
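
For reference, `difflib.ndiff` prefixes every output line with a two-character marker: `- ` for a line present only in the first input, `+ ` for a line present only in the second, two spaces for a line common to both, and `? ` for hint lines pointing at intra-line changes. The sketch below uses two made-up loop listings (they are illustrative strings, not real TVM output) and keeps only the changed lines:

```python
import difflib

# Two hypothetical three-line listings that differ only in the middle line.
before = "for i in range(m):\n  for j in range(n):\n    body"
after = "for i in range(m):\n  for k in range(n):\n    body"

delta = list(difflib.ndiff(before.split("\n"), after.split("\n")))

# Keep only removed/added lines; unchanged lines and '? ' hints are skipped.
changed = [line for line in delta if line.startswith(("- ", "+ "))]
for line in changed:
    print(line)
```

When reading the diffs later in this notebook, the `- ` lines come from the first function and the `+ ` lines from the second.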

## Defining schedules

A **schedule** is a set of transformations applied to a computation.

`tvm.te` provides Tensor Expressions, which are used both by Relay to represent the operations in model functions and by schedules (optimization strategies) to organize those operations.

### Creating computation

Let's perform element-wise matrix multiplication.

* `te.var` defines a single-value (scalar) variable.
* `te.placeholder` declares a symbolic input tensor of a given shape; the actual data is supplied when the compiled function is called.
* `te.compute` constructs a new tensor by evaluating the given function over the shape domain.

In [None]:
n = te.var('n')
m = te.var('m')

A = te.placeholder((m, n), name='A')
B = te.placeholder((m, n), name='B')
C = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name='C')

schedule = te.create_schedule([C.op])

### Lowering computations

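`tvm.lower` transforms the computation definition, together with its schedule, into a lowered loop-level IR that can be printed and inspected. `tvm.build` (used later in this notebook) turns it into a callable function.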

In [None]:
base_function = str(tvm.lower(schedule, [A, B, C], simple_mode=True))

print(base_function)

### Splitting and tiling computations

https://tvm.apache.org/docs/reference/api/python/te.html#tvm.te.Stage.split

`split` splits a given axis by `factor` into an outer and an inner axis, where the inner axis has length `factor`.
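
To make the effect of a split concrete, here is a plain-Python sketch of the resulting loop structure. It is an illustration only, not TVM's generated code, and `split_indices` is a hypothetical helper introduced for this example. The guard handles extents that are not a multiple of `factor`; since `m` is symbolic in this notebook, the lowered TVM code typically contains a comparable bounds check.

```python
def split_indices(extent, factor):
    """Visit 0..extent-1 with an outer/inner loop pair, as a split by `factor` would."""
    visited = []
    outer_extent = (extent + factor - 1) // factor  # ceil(extent / factor)
    for outer in range(outer_extent):
        for inner in range(factor):  # the inner axis always has length `factor`
            i = outer * factor + inner
            if i < extent:  # guard for extents not divisible by `factor`
                visited.append(i)
    return visited


# Every original index is still visited exactly once, in the same order.
print(split_indices(5, 2))
```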

In [None]:
n = te.var('n')
m = te.var('m')

A = te.placeholder((m, n), name='A')
B = te.placeholder((m, n), name='B')
C = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name='C')

schedule = te.create_schedule([C.op])

xo, xi = schedule[C].split(C.op.axis[0], factor=32)

split_function = str(tvm.lower(schedule, [A, B, C], simple_mode=True))

print(split_function)

In [None]:
compute_diff(base_function, split_function)

https://tvm.apache.org/docs/reference/api/python/te.html#tvm.te.Stage.tile

`tile` works like `split`, but in 2D: it tiles the computation along the two given axes.
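
The following plain-Python sketch shows the loop nest that tiling corresponds to conceptually, with the outer tile loops outermost and the within-tile loops innermost. `tiled_order` is a hypothetical helper for illustration, and the small sizes are chosen only to keep the output readable.

```python
def tiled_order(rows, cols, x_factor, y_factor):
    """Return the (i, j) visiting order of a rows x cols domain tiled into blocks."""
    order = []
    for xo in range((rows + x_factor - 1) // x_factor):      # tile row
        for yo in range((cols + y_factor - 1) // y_factor):  # tile column
            for xi in range(x_factor):                       # row inside the tile
                for yi in range(y_factor):                   # column inside the tile
                    i = xo * x_factor + xi
                    j = yo * y_factor + yi
                    if i < rows and j < cols:  # guard for partial tiles
                        order.append((i, j))
    return order


order = tiled_order(4, 4, 2, 2)
print(order[:4])  # the first 2x2 tile is finished before the next one starts
```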

In [None]:
n = te.var('n')
m = te.var('m')

A = te.placeholder((m, n), name='A')
B = te.placeholder((m, n), name='B')
C = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name='C')

schedule = te.create_schedule([C.op])

# tile returns the axes in loop order: x_outer, y_outer, x_inner, y_inner
xo, yo, xi, yi = schedule[C].tile(C.op.axis[0], C.op.axis[1], x_factor=16, y_factor=8)

tile_function = str(tvm.lower(schedule, [A, B, C], simple_mode=True))

print(tile_function)

In [None]:
compute_diff(base_function, tile_function)

### Fusing axes

https://tvm.apache.org/docs/reference/api/python/te.html#tvm.te.Stage.fuse

`fuse` merges two consecutive axes into a single axis.
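
As a rough plain-Python illustration (not TVM output), fusing the two loops of a 2D domain yields a single loop over `rows * cols` iterations, from which the original indices are recovered by integer division and remainder. `fused_order` is a hypothetical helper used only here.

```python
def fused_order(rows, cols):
    """Visit a rows x cols domain with one fused loop instead of two nested loops."""
    order = []
    for fused in range(rows * cols):
        i, j = divmod(fused, cols)  # i = fused // cols, j = fused % cols
        order.append((i, j))
    return order


# The visiting order is identical to the original nested loops.
nested = [(i, j) for i in range(3) for j in range(4)]
print(fused_order(3, 4) == nested)
```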

In [None]:
n = te.var('n')
m = te.var('m')

A = te.placeholder((m, n), name='A')
B = te.placeholder((m, n), name='B')
C = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name='C')

schedule = te.create_schedule([C.op])

fusedaxis = schedule[C].fuse(C.op.axis[0], C.op.axis[1])

fuse_function = str(tvm.lower(schedule, [A, B, C], simple_mode=True))

print(fuse_function)

In [None]:
compute_diff(base_function, fuse_function)

### Binding thread axis

Parallel execution across many threads is common in GEMM and other linear algebra computations. `bind` attaches a specified axis to a thread index, e.g. a CUDA thread block (`blockIdx.x`) or a thread within a block (`threadIdx.x`).
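
The sketch below simulates, sequentially and in plain Python, how a row index can be derived from a block index and a thread index once the outer axis is bound to blocks and the inner axis to threads. On a real GPU these loops do not exist, because every (block, thread) pair runs concurrently. `simulate_bind` is a hypothetical helper and the sizes are arbitrary.

```python
def simulate_bind(rows, threads_per_block):
    """Map each row index to the (block, thread) pair that would process it."""
    num_blocks = (rows + threads_per_block - 1) // threads_per_block
    handled = {}
    for block_idx in range(num_blocks):              # stands in for blockIdx.x
        for thread_idx in range(threads_per_block):  # stands in for threadIdx.x
            i = block_idx * threads_per_block + thread_idx
            if i < rows:  # threads past the end of the data do nothing
                handled[i] = (block_idx, thread_idx)
    return handled


mapping = simulate_bind(130, 64)
print(mapping[129])
```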

In [None]:
n = te.var('n')
m = te.var('m')

A = te.placeholder((m, n), name='A')
B = te.placeholder((m, n), name='B')
C = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name='C')

schedule = te.create_schedule([C.op])

co, ci = schedule[C].split(C.op.axis[0], factor=64)

schedule[C].bind(co, te.thread_axis('blockIdx.x'))
schedule[C].bind(ci, te.thread_axis('threadIdx.x'))

bind_function = str(tvm.lower(schedule, [A, B, C], simple_mode=True))

print(bind_function)

In [None]:
compute_diff(base_function, bind_function)

### Reordering computation of axes

https://tvm.apache.org/docs/reference/api/python/te.html#tvm.te.Stage.reorder

`reorder` changes the order in which the axes are iterated. Let's test it on the tiled example.
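
To see why loop order matters, the plain-Python sketch below traverses the same tiled 4x4 domain with two different axis orders. The names `xo`, `yo`, `xi`, `yi` are just loop labels in this sketch, `iterate` is a hypothetical helper, and the two orders are illustrative choices.

```python
from itertools import product


def iterate(order, extents):
    """Yield (i, j) for a tiled 2D loop nest; the first name in `order` is outermost."""
    x_factor, y_factor = extents["xi"], extents["yi"]
    for values in product(*(range(extents[name]) for name in order)):
        idx = dict(zip(order, values))
        yield (idx["xo"] * x_factor + idx["xi"], idx["yo"] * y_factor + idx["yi"])


extents = {"xo": 2, "yo": 2, "xi": 2, "yi": 2}  # a 4x4 domain in 2x2 tiles
tile_by_tile = list(iterate(["xo", "yo", "xi", "yi"], extents))
row_by_row = list(iterate(["xo", "xi", "yo", "yi"], extents))

print(tile_by_tile[:4])
print(row_by_row[:4])
```

Both orders visit the same elements; only the memory access pattern changes, which is what a schedule tunes.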

In [None]:
n = te.var('n')
m = te.var('m')

A = te.placeholder((m, n), name='A')
B = te.placeholder((m, n), name='B')
C = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name='C')

schedule = te.create_schedule([C.op])

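# tile returns the axes in loop order: x_outer, y_outer, x_inner, y_inner
xo, yo, xi, yi = schedule[C].tile(C.op.axis[0], C.op.axis[1], x_factor=16, y_factor=8)

# Move xi outside yo, so each row is traversed in full before the next row starts
schedule[C].reorder(xo, xi, yo, yi)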

reordered_function = str(tvm.lower(schedule, [A, B, C], simple_mode=True))

print(reordered_function)

In [None]:
compute_diff(tile_function, reordered_function)

### Shifting computations

Let's define a schedule with multiple operations in it.

In [None]:
m = te.var('m')

A = te.placeholder((m,), name="A")
B = te.compute((m,), lambda i: A[i] + 1, name="B")
C = te.compute((m,), lambda i: B[i] * 2, name="C")

schedule = te.create_schedule(C.op)
base_op_chain = str(tvm.lower(schedule, [A, B, C], simple_mode=True))
print(base_op_chain)

Each computation is handled separately.
However, it is possible to move computations so they can share the same loop.
For this, we can use `compute_at`.

https://tvm.apache.org/docs/reference/api/python/te.html#tvm.te.Stage.compute_at
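
The difference can be pictured with a plain-Python sketch of the same `B = A + 1`, `C = B * 2` chain. This is an illustration of the loop structure only, not TVM output; `separate_loops` and `shared_loop` are hypothetical names.

```python
def separate_loops(a):
    """Each stage has its own loop; B is materialized as a full intermediate buffer."""
    b = [0] * len(a)
    c = [0] * len(a)
    for i in range(len(a)):
        b[i] = a[i] + 1
    for i in range(len(a)):
        c[i] = b[i] * 2
    return c


def shared_loop(a):
    """B is computed inside C's loop, so only one element of B is live at a time."""
    c = [0] * len(a)
    for i in range(len(a)):
        b_i = a[i] + 1
        c[i] = b_i * 2
    return c


data = [1, 2, 3]
print(separate_loops(data), shared_loop(data))
```

Both variants produce the same result; the shared loop trades the intermediate buffer for better locality.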

In [None]:
m = te.var('m')

A = te.placeholder((m,), name="A")
B = te.compute((m,), lambda i: A[i] + 1, name="B")
C = te.compute((m,), lambda i: B[i] * 2, name="C")

schedule = te.create_schedule(C.op)

schedule[B].compute_at(schedule[C], C.op.axis[0])

computeshift_op_chain = str(tvm.lower(schedule, [A, B, C], simple_mode=True))
print(computeshift_op_chain)

In [None]:
compute_diff(base_op_chain, computeshift_op_chain)

## Axis reduction

In neural network models, one of the most common scenarios is a reduction along a given axis, using operations such as sum, product, minimum or maximum.

In TVM, the axes along which a reduction occurs are created with the `tvm.te.reduce_axis` constructor and stored in `tvm.te.Tensor.op.reduce_axis` (regular axes are stored in `tvm.te.Tensor.op.axis`).
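
As a plain-Python point of reference (a sketch of the loop structure, not TVM code), a sum over the second axis of a 2D input uses one regular loop for the output index and one reduction loop that accumulates into it. `row_sums` is a hypothetical helper.

```python
def row_sums(a):
    """Sum each row of a 2D list: `i` is the regular axis, `k` the reduce axis."""
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    b = [0] * n_rows  # the accumulator is initialized before reducing
    for i in range(n_rows):
        for k in range(n_cols):
            b[i] += a[i][k]
    return b


print(row_sums([[1, 2, 3], [4, 5, 6]]))
```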

In [None]:
n = te.var("n")
m = te.var("m")

A = te.placeholder((n, m), name="A")

k = te.reduce_axis((0, m), "k")

B = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name="B")

schedule = te.create_schedule(B.op)
reduced_axis = str(tvm.lower(schedule, [A, B], simple_mode=True))

print(reduced_axis)

It is also possible to perform `split` and `bind` on a reduce axis.

## Lowering of operations in TVM

The `tvm.te` module provides many of the typical functions occurring in linear algebra and neural networks.

See [`tvm.te` documentation](https://tvm.apache.org/docs/reference/api/python/te.html) for more details.

When the model is built, those functions (so-called **unified intrinsic calls**) are replaced with target-specific functions and/or implementations.

### Sample implementation of operation

Let's create a schedule computing the sigmoid function and check its OpenCL implementation.

*Note: in the generated OpenCL code, `blockIdx.x` and `threadIdx.x` correspond to GPU work-groups and the individual threads within them; they are obtained via `get_group_id` and `get_local_id`.*

In [None]:
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute(A.shape, lambda i: te.sigmoid(A[i]), name="B")
schedule = te.create_schedule(B.op)
num_thread = 64
bx, tx = schedule[B].split(B.op.axis[0], factor=num_thread)
schedule[B].bind(bx, te.thread_axis("blockIdx.x"))
schedule[B].bind(tx, te.thread_axis("threadIdx.x"))

print(tvm.lower(schedule, [A, B], simple_mode=True))

As can be observed, sigmoid is represented here as the TIR function `tir.sigmoid`. Let's see how it is implemented in OpenCL.

In [None]:
fopencl = tvm.build(schedule, [A, B], "opencl", name="mysigm")
print(fopencl.imported_modules[0].get_source())

### Creating custom implementation of the operation

Adding a new operation/function from the Python level is relatively easy, as long as the necessary computation blocks are provided (lower-level implementations of kernels need to be handled in C++).

For a new operation/function, we need to create and register it, and provide a lowering rule that converts the operation to its implementation for each supported target.

*Note: the lowering rules demonstrated here can also be used for existing operations, to substitute a custom implementation of a certain function. Which rule is selected is controlled by the `level` parameter, which determines the rule's priority.*

Let's add our custom `log` implementation:

In [None]:
def mylog(x):
    """customized log intrinsic function"""
    return tvm.tir.call_intrin(x.dtype, "tir.mylog", x)


def opencl_mylog_rule(op):
    """OpenCL lowering rule for log"""
    if op.dtype == "float32":
        return tvm.tir.call_pure_extern("float32", "log", op.args[0])
    else:
        return op


tvm.ir.register_op_attr("tir.mylog", "TCallEffectKind", tvm.tir.CallEffectKind.Pure)
tvm.ir.register_intrin_lowering("tir.mylog", target="opencl", f=opencl_mylog_rule, level=99)


In [None]:
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute(A.shape, lambda i: mylog(A[i]), name="B")
schedule = te.create_schedule(B.op)
num_thread = 64
bx, tx = schedule[B].split(B.op.axis[0], factor=num_thread)
schedule[B].bind(bx, te.thread_axis("blockIdx.x"))
schedule[B].bind(tx, te.thread_axis("threadIdx.x"))

In [None]:
fopencl = tvm.build(schedule, [A, B], "opencl", name="mykernel")
print(fopencl.imported_modules[0].get_source())

## Analyzing model's code

To build whole models imported from frontends, we use `relay.build`.

In [None]:
import onnx
import tvm.relay as relay

onnxmodel = onnx.load('../models/test-delegate-one-input.onnx')
mod, params = relay.frontend.from_onnx(
    onnxmodel,
    freeze_params=True,
    dtype='float32'
)

with tvm.transform.PassContext(opt_level=3):
    graph, lib, params = relay.build(
        mod['main'],
        target='c'
    )
    
    print(lib.get_source())

## Model compilation, evaluation and fine-tuning

The above aspects of TVM are covered in homework tasks.

## References

* [Schedule primitives in TVM](https://tvm.apache.org/docs/how_to/work_with_schedules/schedule_primitives.html#sphx-glr-how-to-work-with-schedules-schedule-primitives-py)
* [Reduction in TVM](https://tvm.apache.org/docs/how_to/work_with_schedules/reduction.html#sphx-glr-how-to-work-with-schedules-reduction-py)
* [TVM intrinsics and math functions](https://tvm.apache.org/docs/how_to/work_with_schedules/intrin_math.html#sphx-glr-how-to-work-with-schedules-intrin-math-py)

## Useful additional resources

* [TVM User how-to guides](https://tvm.apache.org/docs/how_to/index.html)