# SIGMA

This notebook reproduces the salient characteristics of the [SIGMA](https://ieeexplore.ieee.org/document/9065523) accelerator.

## Imports

Import the necessary modules.

In [None]:
# HiFiber boilerplate

from fibertree_bootstrap import *

fibertree_bootstrap(style="tree", animation='movie')

# Compilation boilerplate

import os
import sys
sys.path.insert(0, "..")

from src import utils

## Initialization

Initialize the input tensors. Tensor shapes and densities can be modified below.

**Warning:** Large tensors will overwhelm the video generation. Either:
1. Use small tensors; as a rule of thumb, fewer than 60 computes (e.g., multiplications) should be required.
2. Do not generate a video; remove the `spacetime` specification from the `mapping` before compiling.
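
To apply the rule of thumb above, the number of multiplications can be estimated before building the tensors. The following is a minimal sketch in plain Python and is not part of the notebook's pipeline. `random_sparse` and `count_multiplies` are hypothetical helpers, and a single per-element density is assumed, which may differ from how `Tensor.fromRandom` interprets its per-rank `density` list.

```python
import random


def random_sparse(rows, cols, density, seed):
    """Build a rows x cols list-of-lists matrix; each entry is nonzero
    with probability `density` (an assumption made for this sketch)."""
    rng = random.Random(seed)
    return [[rng.randint(1, 9) if rng.random() < density else 0
             for _ in range(cols)]
            for _ in range(rows)]


def count_multiplies(a_km, b_kn):
    """Count the nonzero-by-nonzero products in
    Z[m, n] = sum_k A[k, m] * B[k, n]."""
    total = 0
    for a_row, b_row in zip(a_km, b_kn):
        nnz_a = sum(1 for v in a_row if v != 0)
        nnz_b = sum(1 for v in b_row if v != 0)
        # Every nonzero in row k of A meets every nonzero in row k of B.
        total += nnz_a * nnz_b
    return total


# Same shapes as the cell below (K = 4, M = 5, N = 6).
a = random_sparse(4, 5, density=0.5, seed=0)
b = random_sparse(4, 6, density=0.5, seed=1)
print("multiplications:", count_multiplies(a, b))
```

If the printed count is well above 60, consider shrinking the shapes or lowering the densities before generating a video.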

In [None]:
K = 4
M = 5
N = 6

K0 = 2
KM0 = 4

density = [0.9, 0.5]
seed = 0

A_KM = Tensor.fromRandom(rank_ids=["K", "M"], shape=[K, M], seed=seed, density=density, name="A")
B_KN = Tensor.fromRandom(rank_ids=["K", "N"], shape=[K, N], seed=seed + 1, density=density, name="B")

## Compile and Run

Below is the TeAAL specification for SIGMA. To simulate the accelerator:
1. Run the cell below to compile the specification to HiFiber; this inserts a new cell.
2. Run the new cell, which will
    - Execute the kernel, multiplying the matrices defined above
    - Generate visualizations of the kernel's actions

#### Notes

- Small tensors are required for video generation. If you are using large tensors, remove the spacetime specification to generate a kernel that does not produce videos. Outputs can still be checked below.
- The partition sizes set above are reduced for visualization purposes. The real SIGMA uses `K0 = 128` and `KM0 = 16384`.
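
The three partitioning steps in the specification (`uniform_shape(K0)` on `K`, `flatten()` on `(M, K0)`, and `uniform_occupancy(T.KM0)` on `MK0`) can be illustrated with a minimal plain-Python sketch. This is not the fibertree implementation: the dictionary representation, the `partition` helper, the example data, and the coordinate conventions (a `K1` coordinate is the first `k` in its block, and flattened coordinates are `(m, k)` tuples) are assumptions made for illustration.

```python
K0, KM0 = 2, 4

# Hypothetical sparse T, stored as {(k, m): value} with nonzeros only.
t = {(0, 0): 1, (0, 2): 2, (1, 0): 3, (1, 1): 4, (1, 2): 5,
     (2, 1): 6, (3, 4): 7}


def partition(t, k0_shape, occupancy):
    """Return {k1: [chunk, ...]}, each chunk a list of ((m, k), value)."""
    # Step 1, uniform_shape(K0): group nonzeros by the K0-wide block
    # of K coordinates they fall into.
    by_k1 = {}
    for (k, m), val in t.items():
        k1 = (k // k0_shape) * k0_shape
        by_k1.setdefault(k1, []).append(((m, k), val))

    result = {}
    for k1, elems in sorted(by_k1.items()):
        # Step 2, flatten (M, K0): a single fiber ordered by (m, k).
        elems.sort(key=lambda e: e[0])
        # Step 3, uniform_occupancy(KM0): chunks holding at most
        # `occupancy` nonzeros each, regardless of their coordinates.
        result[k1] = [elems[i:i + occupancy]
                      for i in range(0, len(elems), occupancy)]
    return result


for k1, chunks in partition(t, K0, KM0).items():
    for mk01_pos, chunk in enumerate(chunks):
        print(f"K1={k1} chunk {mk01_pos}: {chunk}")
```

Because the last split is by occupancy rather than by shape, every chunk except possibly the last in each `K1` block holds exactly `KM0` nonzeros. In the specification, the elements within a chunk (`MK00`) are the ones mapped across space.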

In [None]:
yaml = """
einsum:
  declaration:
    A: [K, M]
    B: [K, N]
    S: [K, M]
    T: [K, M]
    Z: [M, N]
  expressions:
    - S[k, m] = take(A[k, m], B[k, n], 0)
    - T[k, m] = take(A[k, m], S[k, m], 0)
    - Z[m, n] = T[k, m] * B[k, n]
mapping:
  rank-order:
    A: [K, M]
    B: [K, N]
    S: [K, M]
    T: [K, M]
    Z: [M, N]
  partitioning:
    Z:
      K: [uniform_shape(K0)]
      (M, K0): [flatten()]
      MK0: [uniform_occupancy(T.KM0)]
  loop-order:
    S: [K, M, N]
    T: [K, M]
    Z: [K1, MK01, MK00, N]
  spacetime:
    S:
      space: []
      time: [K, M, N]
    T:
      space: []
      time: [K, M]
    Z:
      space: [MK00]
      time: [K1, MK01, N.coord]
"""

utils.compile(yaml)

In [None]:
# Autogenerated HiFiber

S_KM = Tensor(rank_ids=["K", "M"], name="S")
s_k = S_KM.getRoot()
a_k = A_KM.getRoot()
b_k = B_KN.getRoot()
canvas = createCanvas(A_KM, B_KN, S_KM)
for k_pos, (k, (s_m, (a_m, b_n))) in enumerate(s_k << (a_k & b_k)):
    for m_pos, (m, (s_ref, a_val)) in enumerate(s_m << a_m):
        for n_pos, (n, b_val) in enumerate(b_n):
            s_ref += a_val
            canvas.addActivity((k, m), (k, n), (k, m), spacetime=((), (k_pos, m_pos, n_pos)))
displayCanvas(canvas)
T_KM = Tensor(rank_ids=["K", "M"], name="T")
t_k = T_KM.getRoot()
a_k = A_KM.getRoot()
s_k = S_KM.getRoot()
canvas = createCanvas(A_KM, S_KM, T_KM)
for k_pos, (k, (t_m, (a_m, s_m))) in enumerate(t_k << (a_k & s_k)):
    for m_pos, (m, (t_ref, (a_val, s_val))) in enumerate(t_m << (a_m & s_m)):
        t_ref += a_val
        canvas.addActivity((k, m), (k, m), (k, m), spacetime=((), (k_pos, m_pos)))
displayCanvas(canvas)
Z_MN = Tensor(rank_ids=["M", "N"], name="Z")
tmp0 = T_KM
tmp1 = tmp0.splitUniform(K0, depth=0)
T_K1K0M = tmp1
T_K1K0M.setRankIds(rank_ids=["K1", "K0", "M"])
tmp2 = B_KN
tmp3 = tmp2.splitUniform(K0, depth=0)
B_K1K0N = tmp3
B_K1K0N.setRankIds(rank_ids=["K1", "K0", "N"])
z_m = Z_MN.getRoot()
T_K1MK0 = T_K1K0M.swizzleRanks(rank_ids=["K1", "M", "K0"])
tmp4 = T_K1MK0
tmp5 = tmp4.flattenRanks(depth=1, levels=1, coord_style="tuple")
T_K1MK0_flat = tmp5
T_K1MK0_flat.setRankIds(rank_ids=["K1", "MK0"])
b_k1 = B_K1K0N.getRoot()
t_k1 = T_K1MK0_flat.getRoot()
canvas = createCanvas(T_K1MK0_flat, B_K1K0N, Z_MN)
for k1_pos, (k1, (t_mk0, b_k0)) in enumerate(t_k1 & b_k1):
    T_MK0 = Tensor.fromFiber(rank_ids=["MK0"], fiber=t_mk0, name="T")
    tmp6 = T_MK0
    tmp7 = tmp6.splitEqual(KM0)
    T_MK01MK00 = tmp7
    T_MK01MK00.setRankIds(rank_ids=["MK01", "MK00"])
    t_mk01 = T_MK01MK00.getRoot()
    for mk01_pos, (mk01, t_mk00) in enumerate(t_mk01):
        for mk00_pos, ((m, k0), t_val) in enumerate(t_mk00):
            z_n = z_m.getPayloadRef(m)
            b_n = b_k0.getPayload(k0)
            for n, (z_ref, b_val) in z_n << b_n:
                z_ref += t_val * b_val
                canvas.addActivity((k1, (m, k0)), (k1, k0, n), (m, n), spacetime=((mk00_pos,), (k1_pos, mk01_pos, n)))
displayCanvas(canvas)

## Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling and running the kernel (the cell above).
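
For reference, the result being checked is `Z[m, n] = sum over k of A[k, m] * B[k, n]`. The sketch below computes it on plain nested lists. It is an independent illustration under that assumption, not the implementation of `utils.check_matmul`, and `reference_matmul` is a hypothetical helper.

```python
def reference_matmul(a_km, b_kn):
    """Compute Z[m][n] = sum_k A[k][m] * B[k][n] on dense nested lists."""
    k_dim = len(a_km)
    m_dim = len(a_km[0])
    n_dim = len(b_kn[0])
    z_mn = [[0] * n_dim for _ in range(m_dim)]
    for k in range(k_dim):
        for m in range(m_dim):
            if a_km[k][m] == 0:
                continue  # a zero in A contributes nothing
            for n in range(n_dim):
                z_mn[m][n] += a_km[k][m] * b_kn[k][n]
    return z_mn


# Tiny hypothetical inputs: A is K x M = 2 x 2, B is K x N = 2 x 3.
a_example = [[1, 2],
             [0, 3]]
b_example = [[4, 0, 5],
             [6, 7, 0]]
print(reference_matmul(a_example, b_example))
```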

In [None]:
utils.check_matmul(A_KM, B_KN, Z_MN)