# DSTC

This notebook reproduces the salient characteristics of the [DSTC](https://dl.acm.org/doi/10.1109/ISCA52012.2021.00088) accelerator.

## Imports

Import the necessary modules.

In [1]:
# HiFiber boilerplate

from fibertree_bootstrap import *

fibertree_bootstrap(style="tree", animation='movie')

# Compilation boilerplate

import os
import sys
sys.path.insert(0, "..")

from src import utils

Running bootstrap
The fibertree module is already installed and available to import



## Initialization

Initialize the input tensors. Tensor shapes and densities can be modified below.

**Warning:** Large tensors will overwhelm the video generation. Either:
1. Use small tensors; as a rule of thumb, the kernel should require fewer than 60 compute operations (e.g., multiplications).
2. Do not generate a video; remove the `spacetime` specification from the `mapping` before compiling.
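
To judge whether a pair of inputs is small enough for video generation, the number of multiplications can be estimated before compiling. The sketch below is not part of the notebook: `count_multiplies` is a hypothetical helper, it takes plain nested lists rather than fibertree tensors, and it assumes the kernel multiplies only pairs of nonzeros that share a `k` coordinate.

```python
def count_multiplies(a_km, b_kn):
    """Count scalar multiplications in Z[m, n] = A[k, m] * B[k, n].

    Both arguments are dense lists of rows indexed by k. Only pairs of
    nonzero values that share the same k are counted (an assumption
    about how the sparse kernel behaves).
    """
    total = 0
    for a_row, b_row in zip(a_km, b_kn):
        nnz_a = sum(1 for v in a_row if v != 0)
        nnz_b = sum(1 for v in b_row if v != 0)
        total += nnz_a * nnz_b
    return total


# Toy inputs with K = 2, M = 3, N = 2 (hypothetical values).
a = [[1, 0, 2], [0, 0, 3]]
b = [[4, 5], [0, 6]]
n_mults = count_multiplies(a, b)
print(n_mults, "multiplications;",
      "small enough for a video" if n_mults < 60 else "consider dropping spacetime")
```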

In [58]:
K = 4
M = 12
N = 8

density = [0.9, 0.5]
seed = 0

A_KM = Tensor.fromRandom(rank_ids=["K", "M"], shape=[K, M], seed=seed, density=density, name="A")
B_KN = Tensor.fromRandom(rank_ids=["K", "N"], shape=[K, N], seed=seed + 1, density=density, name="B")

## Compile and Run

Below is the TeAAL specification for DSTC. To simulate the accelerator:
1. Compile it to HiFiber by running the cell, which inserts a new cell containing the generated code.
2. Run the new cell, which will
    - Execute the kernel, multiplying the matrices defined above
    - Generate visualizations of the kernel's actions
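
For orientation, the Einsum that the kernel evaluates, `Z[m, n] = A[k, m] * B[k, n]`, sums over `k` for every `(m, n)` pair. A minimal dense sketch in plain Python follows; `reference_matmul` is a hypothetical name, and nested lists stand in for the fibertree tensors used by the notebook.

```python
def reference_matmul(a_km, b_kn):
    """Dense reference for Z[m, n] = sum over k of A[k, m] * B[k, n]."""
    num_k = len(a_km)
    num_m = len(a_km[0])
    num_n = len(b_kn[0])
    z_mn = [[0] * num_n for _ in range(num_m)]
    for k in range(num_k):
        for m in range(num_m):
            for n in range(num_n):
                z_mn[m][n] += a_km[k][m] * b_kn[k][n]
    return z_mn


# Toy inputs with K = 2, M = 3, N = 2 (hypothetical values).
a = [[1, 0, 2], [0, 0, 3]]
b = [[4, 5], [0, 6]]
z = reference_matmul(a, b)
print(z)
```

Because both inputs are indexed by `K` first, each value of `k` contributes the outer product of one row of `A` with one row of `B`.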

#### Notes
- Small tensors are required for video generation. Partition shapes are decreased accordingly for visualization purposes.
- If you are using large tensors, remove the spacetime specification to generate a kernel that does not produce videos. Outputs can still be checked below.
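
The partition shapes mentioned above are given in the specification as `uniform_occupancy(A.4)` and similar entries. As we understand it, such an entry groups the nonzeros of the named tensor into partitions of the given size, regardless of how far apart their coordinates are. The sketch below illustrates that idea on a single fiber; `split_by_occupancy` is a hypothetical helper operating on a list of `(coordinate, value)` pairs, not the fibertree API.

```python
def split_by_occupancy(fiber, size):
    """Group a sparse fiber's (coord, value) pairs into chunks of `size` nonzeros.

    The last chunk may hold fewer than `size` elements.
    """
    return [fiber[i:i + size] for i in range(0, len(fiber), size)]


# A fiber with 5 nonzeros spread over coordinates 0..11 (hypothetical values).
fiber = [(0, 5), (3, 1), (4, 7), (9, 2), (11, 6)]
chunks = split_by_occupancy(fiber, 2)
for chunk in chunks:
    print(chunk)
```

Each chunk holds the same number of nonzeros even though the coordinate ranges differ, which is the property that balances work across partitions.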

In [62]:
yaml = """
einsum:
  declaration:
    A: [K, M]
    B: [K, N]
    T: [K, M, N]
    Z: [M, N]
  expressions:
    # - T[k, m, n] = A[k, m] * B[k, n]
    - Z[m, n] = A[k, m] * B[k, n]
mapping:
  rank-order:
    A: [K, M]
    B: [K, N]
    # T: [M, K, N]
    Z: [M, N]
  partitioning:
    Z:
      M: [uniform_occupancy(A.4), uniform_occupancy(A.8), uniform_occupancy(A.4)]
      N: [uniform_occupancy(A.4), uniform_occupancy(B.16), uniform_occupancy(B.4)]
    # Z:
    #   M: [uniform_occupancy(T.4)]
    #   N: [uniform_occupancy(T.4)]
  loop-order:
    Z: [K, M3, N3, M2, N2, M1, N1, M0, N0]
    # Z: [M1, N1, M0, N0, K]
  spacetime:
    Z:
      space: [M2, N2, M1, N1, N0]
      time: [K, M3, N3, M0]
    # Z:
    #   space: [M0, N0]
    #   time: [M1, N1, K]
"""

utils.compile(yaml)

In [63]:
# Autogenerated HiFiber

T_KM2N2M1N1M0N0 = Tensor(rank_ids=["K", "M2", "N2", "M1", "N1", "M0", "N0"], name="T")
t_k = T_KM2N2M1N1M0N0.getRoot()
a_k = A_KM.getRoot()
b_k = B_KN.getRoot()
canvas = createCanvas(A_KM, B_KN, T_KM2N2M1N1M0N0)
for k_pos, (k, (t_m2, (a_m, b_n))) in enumerate(t_k << (a_k & b_k)):
    A_M = Tensor.fromFiber(rank_ids=["M"], fiber=a_m, name="A")
    B_N = Tensor.fromFiber(rank_ids=["N"], fiber=b_n, name="B")
    tmp0 = A_M
    tmp1 = tmp0.splitEqual(4)
    A_M2M1I = tmp1
    A_M2M1I.setRankIds(rank_ids=["M2", "M1I"])
    tmp2 = B_N
    tmp3 = tmp2.splitEqual(4)
    B_N2N1I = tmp3
    B_N2N1I.setRankIds(rank_ids=["N2", "N1I"])
    a_m2 = A_M2M1I.getRoot()
    b_n2 = B_N2N1I.getRoot()
    for m2_pos, (m2, (t_n2, a_m1i)) in enumerate(t_m2 << a_m2):
        A_M1I = Tensor.fromFiber(rank_ids=["M1I"], fiber=a_m1i, name="A")
        tmp4 = A_M1I
        tmp5 = tmp4.splitEqual(2)
        A_M1M0 = tmp5
        A_M1M0.setRankIds(rank_ids=["M1", "M0"])
        a_m1 = A_M1M0.getRoot()
        for n2_pos, (n2, (t_m1, b_n1i)) in enumerate(t_n2 << b_n2):
            B_N1I = Tensor.fromFiber(rank_ids=["N1I"], fiber=b_n1i, name="B")
            tmp6 = B_N1I
            tmp7 = tmp6.splitEqual(2)
            B_N1N0 = tmp7
            B_N1N0.setRankIds(rank_ids=["N1", "N0"])
            b_n1 = B_N1N0.getRoot()
            for m1_pos, (m1, (t_n1, a_m0)) in enumerate(t_m1 << a_m1):
                for n1_pos, (n1, (t_m0, b_n0)) in enumerate(t_n1 << b_n1):
                    for m0_pos, (m0, (t_n0, a_val)) in enumerate(t_m0 << a_m0):
                        for n0_pos, (n0, (t_ref, b_val)) in enumerate(t_n0 << b_n0):
                            t_ref += a_val * b_val
                            canvas.addActivity((k, m0), (k, n0), (k, m2, n2, m1, n1, m0, n0), spacetime=((m1_pos, n1_pos, n0_pos), (k_pos, m2_pos, n2_pos, m0_pos)))
tmp8 = T_KM2N2M1N1M0N0
tmp9 = tmp8.swizzleRanks(rank_ids=["M2", "M1", "M0", "K", "N2", "N1", "N0"])
tmp10 = tmp9.mergeRanks(depth=0, levels=2, coord_style="absolute")
tmp11 = tmp10.mergeRanks(depth=2, levels=2, coord_style="absolute")
tmp11.setRankIds(rank_ids=["M", "K", "N"])
T_MKN = tmp11
displayCanvas(canvas)
Z_M1N1M0N0 = Tensor(rank_ids=["M1", "N1", "M0", "N0"], name="Z")
T_MNK = T_MKN.swizzleRanks(rank_ids=["M", "N", "K"])
z_m1 = Z_M1N1M0N0.getRoot()
t_m = T_MNK.getRoot()
canvas = createCanvas(T_MNK, Z_M1N1M0N0)
T_MNK = Tensor.fromFiber(rank_ids=["M", "N", "K"], fiber=t_m, name="T")
tmp12 = T_MNK
tmp13 = tmp12.splitEqual(2)
T_M1M0NK = tmp13
T_M1M0NK.setRankIds(rank_ids=["M1", "M0", "N", "K"])
T_M1NM0K = T_M1M0NK.swizzleRanks(rank_ids=["M1", "N", "M0", "K"])
t_m1 = T_M1NM0K.getRoot()
for m1_pos, (m1, (z_n1, t_n)) in enumerate(z_m1 << t_m1):
    T_NM0K = Tensor.fromFiber(rank_ids=["N", "M0", "K"], fiber=t_n, name="T")
    tmp14 = T_NM0K
    tmp15 = tmp14.splitEqual(2)
    T_N1N0M0K = tmp15
    T_N1N0M0K.setRankIds(rank_ids=["N1", "N0", "M0", "K"])
    T_N1M0N0K = T_N1N0M0K.swizzleRanks(rank_ids=["N1", "M0", "N0", "K"])
    t_n1 = T_N1M0N0K.getRoot()
    for n1_pos, (n1, (z_m0, t_m0)) in enumerate(z_n1 << t_n1):
        for m0_pos, (m0, (z_n0, t_n0)) in enumerate(z_m0 << t_m0):
            for n0_pos, (n0, (z_ref, t_k)) in enumerate(z_n0 << t_n0):
                for k_pos, (k, t_val) in enumerate(t_k):
                    z_ref += t_val
                    canvas.addActivity((m0, n0, k), (m1, n1, m0, n0), spacetime=((m0_pos, n0_pos), (m1_pos, n1_pos, k_pos)))
tmp16 = Z_M1N1M0N0
tmp17 = tmp16.swizzleRanks(rank_ids=["M1", "M0", "N1", "N0"])
tmp18 = tmp17.mergeRanks(depth=0, levels=1, coord_style="absolute")
tmp19 = tmp18.mergeRanks(depth=1, levels=1, coord_style="absolute")
tmp19.setRankIds(rank_ids=["M", "N"])
Z_MN = tmp19
displayCanvas(canvas)


## Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling and running the kernel (the cell above).
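
The internals of `utils.check_matmul` are not shown in this notebook. As a rough illustration of what such a check might involve, the sketch below recomputes the product from sparse inputs and compares it with the result. `check_sparse_matmul` and the dictionary representation (coordinate tuples mapped to nonzero values) are assumptions made for this example only.

```python
def check_sparse_matmul(a_km, b_kn, z_mn):
    """Return True if z_mn equals the sum over k of a_km[k, m] * b_kn[k, n].

    All three arguments are dicts mapping coordinate tuples to values.
    """
    expected = {}
    for (k_a, m), a_val in a_km.items():
        for (k_b, n), b_val in b_kn.items():
            if k_a == k_b:
                expected[(m, n)] = expected.get((m, n), 0) + a_val * b_val
    # Drop explicit zeros so both sides follow the same sparse convention.
    expected = {c: v for c, v in expected.items() if v != 0}
    actual = {c: v for c, v in z_mn.items() if v != 0}
    return expected == actual


# Toy sparse inputs keyed by (k, m) and (k, n); result keyed by (m, n).
a = {(0, 0): 1, (0, 2): 2, (1, 2): 3}
b = {(0, 0): 4, (0, 1): 5, (1, 1): 6}
z = {(0, 0): 4, (0, 1): 5, (2, 0): 8, (2, 1): 28}
print("Result correct?", check_sparse_matmul(a, b, z))
```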

In [64]:
utils.check_matmul(A_KM, B_KN, Z_MN)

Result correct? True
