## Vector Addition

In [2]:
import numpy as np

In [3]:
np.random.seed(0)

n = 100
a = np.random.normal(size=n).astype(np.float32)
b = np.random.normal(size=n).astype(np.float32)
c = a + b

In [6]:
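def vector_add(a, b, c):
    """Element-wise add of a and b, written into c."""
    # Use the length of the input instead of relying on the global n.
    for idx in range(len(a)):
        c[idx] = a[idx] + b[idx]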

d = np.empty(shape=n, dtype=np.float32)
vector_add(a, b, d)
np.testing.assert_array_equal(c, d)

In [7]:
import tvm
from tvm import te


def vector_add(n):
    """TVM expression for vector add"""
    A = te.placeholder((n,), name='a')
    B = te.placeholder((n,), name='b')
    C = te.compute(A.shape, lambda i: A[i] + B[i], name='c')

    return A, B, C

A, B, C = vector_add(n)

type(A), type(C)

(tvm.te.tensor.Tensor, tvm.te.tensor.Tensor)

In [8]:
type(A.op), type(C.op)

(tvm.te.tensor.PlaceholderOp, tvm.te.tensor.ComputeOp)

## Creating the Schedule

### Placeholder Creation

When we create placeholders A and B using te.placeholder((n,), name='a') and te.placeholder((n,), name='b'), we're not allocating actual memory or providing real data values. Instead, we're creating symbolic representations of the input tensors.

### Computation Definition

The te.compute() function defines symbolically how the output C is computed, without actually performing the computation.

This symbolic definition is what lets TVM build a computation graph, which it can then optimize.
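
To make the idea of a symbolic definition concrete, here is a toy sketch in plain Python. The classes `Placeholder` and `Compute` are hypothetical stand-ins invented for this illustration. They are not how TVM implements te.placeholder or te.compute; they only mimic the idea that a definition stores a shape and a rule, while the data arrives later.

```python
# Toy stand-ins for te.placeholder / te.compute (hypothetical, not TVM code).
class Placeholder:
    """A named input with a shape but no data."""
    def __init__(self, shape, name):
        self.shape = shape
        self.name = name


class Compute:
    """An output described by a per-index rule; nothing is evaluated yet."""
    def __init__(self, shape, rule, name):
        self.shape = shape
        self.rule = rule  # callable: (data, i) -> value
        self.name = name

    def evaluate(self, data):
        """Apply the stored rule once real data is bound to the input names."""
        (n,) = self.shape
        return [self.rule(data, i) for i in range(n)]


A = Placeholder((4,), name='a')
B = Placeholder((4,), name='b')
# Defining C only stores the rule; no addition happens on this line.
C = Compute(A.shape, lambda data, i: data['a'][i] + data['b'][i], name='c')

# Data is supplied later, at evaluation time.
result = C.evaluate({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]})
print(result)  # [11, 22, 33, 44]
```

The same `C` could be evaluated again with different lists of length 4, which is the separation between definition and execution described here.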

In [9]:
s = te.create_schedule(C.op)

### Delayed Data Provision

You don't need to have the actual input data available when defining the computation graph. This allows for a separation between the computation definition and its execution.

### Runtime Data Feeding

At runtime, when you're ready to execute the compiled function, you can provide the actual data for A and B.

### Flexibility
This approach allows the same compiled function to be used with different input data of the specified shape, without needing to redefine or recompile the computation.

### What does schedule mean here?

Scheduling in TVM is about defining how the computation should be executed, optimized, and mapped to the target hardware. The subsections below cover what is being scheduled and why we pass C.op to create_schedule.

#### What We're Scheduling
When we create a schedule, we're essentially planning out:
- Execution Order: How the operations should be ordered and nested.
- Memory Access Patterns: How data should be loaded and stored.
- Parallelization: How to distribute work across threads or vectorize operations.
- Hardware-Specific Optimizations: How to best utilize the target hardware's features.
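
As a rough illustration of the first item, the sketch below writes the same vector add with two different loop structures in plain Python. The function names and the `factor` parameter are made up for this example. In TVM such a change would be requested on a stage of the schedule (for instance with a split primitive) rather than by rewriting the loop by hand; the point here is only that the loop organization changes while the result does not.

```python
def add_default(a, b):
    """Single loop over all indices, similar in spirit to a default schedule."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def add_split(a, b, factor=4):
    """Same computation with the loop split into an outer and an inner loop."""
    n = len(a)
    c = [0.0] * n
    n_outer = (n + factor - 1) // factor  # ceil(n / factor)
    for outer in range(n_outer):
        for inner in range(factor):
            i = outer * factor + inner
            if i < n:  # guard the last, possibly partial, block
                c[i] = a[i] + b[i]
    return c


a = [float(i) for i in range(10)]
b = [float(2 * i) for i in range(10)]
# Both loop structures produce identical results.
assert add_default(a, b) == add_split(a, b, factor=4)
```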

#### Why We Pass C.op to create_schedule

We pass C.op to create_schedule for the following reasons:

##### Root of Computation Graph
C is our output tensor, and C.op represents the operation that produces C. By passing this to create_schedule, we're telling TVM to create a schedule for the entire computation graph that leads to C.

##### Backwards Traversal
TVM starts from the output operation (C.op) and works backwards through all the operations it depends on (in this case, the placeholder operations that produce A and B) to create a complete schedule.
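
One way to picture this backwards walk is a depth-first traversal over a small dependency graph. The `Op` class and `collect_ops` function below are hypothetical and written only for this illustration; TVM's own traversal is more involved.

```python
class Op:
    """Hypothetical operation node: a name plus the ops it reads from."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def collect_ops(output_op):
    """Walk backwards from the output and return op names in dependency order."""
    order = []
    seen = set()

    def visit(op):
        if id(op) in seen:
            return
        seen.add(id(op))
        for dep in op.inputs:  # visit producers before the consumer
            visit(dep)
        order.append(op.name)

    visit(output_op)
    return order


a_op = Op('a')
b_op = Op('b')
c_op = Op('c', inputs=[a_op, b_op])

ops = collect_ops(c_op)
print(ops)  # ['a', 'b', 'c']
```

Starting from the single output node is enough to reach every operation the result depends on.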

##### Optimization Scope
This approach allows TVM to consider the entire computation when making scheduling decisions, potentially leading to more globally optimal schedules.

By passing C.op to create_schedule, we're initiating the process of planning how the entire computation (from inputs A and B to output C) should be executed, while keeping this execution plan separate from the definition of what is being computed.

In [10]:
type(s), type(s[C])

(tvm.te.schedule.Schedule, tvm.te.schedule.Stage)

In [12]:
tvm.lower(s, [A, B, C], simple_mode=True)

#[version = "0.0.5"]
@main = primfn(a_1: handle, b_1: handle, c_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {a: Buffer(a_2: Pointer(float32), float32, [100], []),
             b: Buffer(b_2: Pointer(float32), float32, [100], []),
             c: Buffer(c_2: Pointer(float32), float32, [100], [])}
  buffer_map = {a_1: a, b_1: b, c_1: c}
  preflattened_buffer_map = {a_1: a_3: Buffer(a_2, float32, [100], []), b_1: b_3: Buffer(b_2, float32, [100], []), c_1: c_3: Buffer(c_2, float32, [100], [])} {
  for (i: int32, 0, 100) {
    c[i] = (a[i] + b[i])
  }
}

### Look at the generated loop

```
for (i: int32, 0, 100) {
  c[i] = (a[i] + b[i])
}
```

### Compile the above code to a module, and then execute

Once both the computation and the schedule are defined, we can compile them into an executable module with tvm.build. It accepts the same arguments as tvm.lower. In fact, it first calls tvm.lower to generate the program and then compiles that program to machine code.
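
The two-step flow (lower to a loop-level program, then compile it) can be mimicked in plain Python as a loose analogy. The `toy_lower` and `toy_build` functions below are stand-ins written for this sketch: Python source text plays the role of the lowered program, and Python's built-in compile and exec play the role of code generation. They are not TVM APIs.

```python
def toy_lower(n):
    """Produce loop-level source text for an n-element vector add."""
    return (
        "def main(a, b, c):\n"
        f"    for i in range({n}):\n"
        "        c[i] = a[i] + b[i]\n"
    )


def toy_build(n):
    """Lower first, then turn the program text into a callable."""
    source = toy_lower(n)
    namespace = {}
    exec(compile(source, "<lowered>", "exec"), namespace)
    return namespace["main"]


fn = toy_build(3)
out = [0, 0, 0]
fn([1, 2, 3], [4, 5, 6], out)  # writes the result into out
print(out)  # [5, 7, 9]
```

Note that the length is baked into the generated loop, much like the constant 100 in the lowered TVM program.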

In [14]:
mod = tvm.build(s, [A, B, C])
type(mod)

tvm.driver.build_module.OperatorModule

In [15]:
def get_abc(shape, constructor=None):
    """Return random a, b and empty c with the same shape.
    """
    np.random.seed(0)
    a = np.random.normal(size=shape).astype(np.float32)
    b = np.random.normal(size=shape).astype(np.float32)
    c = np.empty_like(a)

    if constructor:
        a, b, c = [constructor(x) for x in (a, b, c)]

    return a, b, c

In [16]:
a, b, c = get_abc(100, tvm.nd.array)

In [17]:
type(a)

tvm.runtime.ndarray.NDArray

In [18]:
mod(a, b, c)
# numpy() replaces the deprecated asnumpy() in recent TVM releases.
np.testing.assert_array_equal(a.numpy() + b.numpy(), c.numpy())

##### Note

When we created and compiled the module, we passed ```n``` as the fixed value 100.

```
n = 100

def vector_add(n):
    """TVM expression for vector add"""
    A = te.placeholder((n,), name='a')
    B = te.placeholder((n,), name='b')
    C = te.compute(A.shape, lambda i: A[i] + B[i], name='c')

    return A, B, C

A, B, C = vector_add(n)
```

As a result, the module only accepts arrays whose size matches the placeholders it was built with. The call below, which passes arrays of length 200, fails with an error.

In [19]:
a, b, c = get_abc(200, tvm.nd.array)
mod(a, b, c)

TVMError: Traceback (most recent call last):
  [bt] (3) 4   libtvm.dylib                        0x000000012f14f07e TVMFuncCall + 62
  [bt] (2) 3   libtvm.dylib                        0x000000012f16af12 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 418
  [bt] (1) 2   libtvm.dylib                        0x000000012d3ad3f9 tvm::runtime::detail::LogFatal::Entry::Finalize() + 89
  [bt] (0) 1   libtvm.dylib                        0x000000012f16c038 tvm::runtime::Backtrace() + 24
  File "/Users/runner/work/tlcpack/tlcpack/tvm/src/runtime/library_module.cc", line 80
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------

  Check failed: ret == 0 (-1 vs. 0) : Assert fail: (100 == int32(arg.a.shape[0])), Argument arg.a.shape[0] has an unsatisfied constraint: (100 == int32(arg.a.shape[0]))
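
The failed check at the end of the traceback behaves like a guard generated with the length fixed at 100. A toy sketch in plain Python, assuming a hypothetical `build_vector_add` helper that is not part of TVM, shows the same behavior:

```python
def build_vector_add(n):
    """Return an adder specialized to length n (toy analogy, not TVM)."""
    def main(a, b, c):
        # Shape guard, analogous to the constraint in the TVM error message.
        for name, arr in (("a", a), ("b", b), ("c", c)):
            if len(arr) != n:
                raise ValueError(
                    f"{name}.shape[0] must be {n}, got {len(arr)}")
        for i in range(n):
            c[i] = a[i] + b[i]
    return main


add_100 = build_vector_add(100)

# Inputs of the expected length work.
c = [0.0] * 100
add_100([1.0] * 100, [2.0] * 100, c)

# Inputs of any other length are rejected before the loop runs.
message = None
try:
    add_100([0.0] * 200, [0.0] * 200, [0.0] * 200)
except ValueError as err:
    message = str(err)
    print(message)
```

Supporting another length in this sketch means calling `build_vector_add` again with that length, which mirrors having to rebuild the module for a different fixed ```n```.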