# Flexagon

This notebook reproduces the salient characteristics of the [Flexagon](https://dl.acm.org/doi/10.1145/3582016.3582069) accelerator. 

Flexagon supports six dataflows:
- Inner-Product(M)
- Outer-Product(M)
- Gustavson(M)
- Inner-Product(N)
- Outer-Product(N)
- Gustavson(N)

Without loss of generality, this notebook contains only the TeAAL specifications for the (M) variants.
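As a rough illustration of how the three (M) dataflows differ, the sketch below writes each one as a dense loop nest over plain Python lists. It assumes the commonly used loop orders (M, N, K for inner product; K, M, N for outer product; M, K, N for Gustavson). The function names and the dense-list layout are illustrative only and are not part of TeAAL or HiFiber. The (N) variants would swap the roles of `m` and `n`.

```python
def inner_product_m(A, B, M, K, N):
    """Loop order M, N, K: each Z[m][n] is fully reduced before moving on."""
    Z = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):
                Z[m][n] += A[m][k] * B[k][n]
    return Z


def outer_product_m(A, B, M, K, N):
    """Loop order K, M, N: each k contributes a rank-1 update to all of Z."""
    Z = [[0] * N for _ in range(M)]
    for k in range(K):
        for m in range(M):
            for n in range(N):
                Z[m][n] += A[m][k] * B[k][n]
    return Z


def gustavson_m(A, B, M, K, N):
    """Loop order M, K, N: each row of Z is built from scaled rows of B."""
    Z = [[0] * N for _ in range(M)]
    for m in range(M):
        for k in range(K):
            for n in range(N):
                Z[m][n] += A[m][k] * B[k][n]
    return Z


# Small example: A is M x K, B is K x N.
A_example = [[1, 0, 2],
             [0, 3, 0]]
B_example = [[0, 1],
             [4, 0],
             [5, 6]]

results = [f(A_example, B_example, 2, 3, 2)
           for f in (inner_product_m, outer_product_m, gustavson_m)]
```

All three loop nests compute the same product; they differ only in the order in which operands are visited and partial sums are produced.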

## Imports

Import the necessary modules.

In [None]:
# HiFiber boilerplate

from fibertree_bootstrap import *

fibertree_bootstrap(style="tree", animation='movie')

# Compilation boilerplate

import os
import sys
sys.path.insert(0, "..")

from src import utils

## Initialization

Initialize the input tensors. Tensor shapes and densities can be modified below.

**Warning:** Large tensors will overwhelm the video generation. Either:
1. Use small tensors; as a rule of thumb, fewer than 60 computes (e.g., multiplications) should be required.
2. Do not generate a video; remove the `spacetime` specification from the `mapping` before compiling.
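To get a feel for the rule of thumb above, the sketch below estimates the number of multiplications for a pair of small matrices. It assumes that "computes" means multiplications where both operands are nonzero, and it uses plain nested lists rather than fibertree tensors. `count_multiplies` is a hypothetical helper, not part of this repository.

```python
def count_multiplies(A_mk, B_nk):
    """Count products A[m][k] * B[n][k] in which both operands are nonzero.

    A_mk is an M x K nested list and B_nk is an N x K nested list.
    """
    K = len(A_mk[0])
    total = 0
    for k in range(K):
        nnz_a = sum(1 for row in A_mk if row[k] != 0)
        nnz_b = sum(1 for row in B_nk if row[k] != 0)
        # Every nonzero in column k of A meets every nonzero in column k of B.
        total += nnz_a * nnz_b
    return total


A_small = [[1, 0, 2],
           [0, 3, 0]]
B_small = [[0, 4, 5],
           [1, 0, 6]]

num_computes = count_multiplies(A_small, B_small)
fits_in_video = num_computes < 60
```

The count depends only on the nonzero pattern, so it can be checked before deciding whether to keep the `spacetime` specification.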

In [None]:
K = 10
M = 5
N = 6

num_multipliers = 4

density = [0.9, 0.5]
seed = 2

A_MK = Tensor.fromRandom(rank_ids=["M", "K"], shape=[M, K], seed=seed, density=density, name="A")
B_NK = Tensor.fromRandom(rank_ids=["N", "K"], shape=[N, K], seed=seed + 1, density=density, name="B")

A_KM = A_MK.swizzleRanks(["K", "M"])
B_KN = B_NK.swizzleRanks(["K", "N"])

## Compile from TeAAL Specification and Run

Below are the TeAAL specifications for Flexagon. To simulate a dataflow:
1. Run the cell containing its specification. This compiles the specification to HiFiber and inserts the generated code in a new cell.
2. Run the new cell, which will
    - Execute the kernel, multiplying the matrices defined above
    - Generate visualizations of the kernel's actions

#### Notes

- Small tensors are required for video generation. If you are using large tensors, remove the spacetime specification to generate a kernel that does not produce videos. Outputs can still be checked below.
- `num_multipliers` is set to a small value above so that the partitions stay small enough to visualize. The real Flexagon uses `num_multipliers = 64`.
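The specifications that follow partition the flattened ranks with `flatten()` and `uniform_occupancy(...)`. The sketch below gives an intuition for that combination on a plain nested list: first list the nonzeros under a single combined coordinate, then split them into groups that each hold the same number of nonzeros. It is an approximation for illustration only. `flatten_and_partition` is a hypothetical helper and does not reproduce the actual fibertree implementation.

```python
def flatten_and_partition(A_mk, chunk):
    """Flatten the (m, k) coordinates of the nonzeros, then group them.

    Returns a list of groups; each group holds up to `chunk` entries of
    the form ((m, k), value). Only the last group may be smaller.
    """
    # Flatten: one combined (m, k) coordinate per nonzero, in M-major order.
    flat = [((m, k), v)
            for m, row in enumerate(A_mk)
            for k, v in enumerate(row)
            if v != 0]
    # Uniform occupancy: equal numbers of nonzeros per group.
    return [flat[i:i + chunk] for i in range(0, len(flat), chunk)]


A_demo = [[1, 0, 2],
          [0, 3, 0],
          [4, 5, 0]]

groups = flatten_and_partition(A_demo, chunk=2)
group_sizes = [len(g) for g in groups]
```

Because groups are sized by occupancy rather than by coordinate range, a group can span more than one row of the original matrix, which keeps every multiplier busy regardless of how the nonzeros are distributed.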

### Inner-Product(M)

In [None]:
yaml = """
einsum:
    declaration:
        A: [K, M]
        B: [K, N]
        Z: [M, N]
    expressions:
        - Z[m, n] = A[k, m] * B[k, n]
mapping:
    rank-order:
        A: [M, K]
        B: [N, K]
        Z: [N, M]
    partitioning:
        Z:
            (M, K): [flatten()]
            MK: [uniform_occupancy(A.num_multipliers)]
    loop-order:
        Z: [MK1, N, MK0]
    spacetime:
        Z:
            space: [MK0]
            time: [MK1, N]
"""

utils.compile(yaml)

#### Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling and running the kernel (the cells above).

In [None]:
utils.check_matmul(A_MK, B_NK, Z_NM)

### Outer-Product(M)

In [None]:
yaml = """
einsum:
    declaration:
        A: [K, M]
        B: [K, N]
        T: [K, M, N]
        Z: [M, N]
    expressions:
        - T[k, m, n] = A[k, m] * B[k, n]
        - Z[m, n] = T[k, m, n]
mapping:
    rank-order:
        A: [K, M]
        B: [K, N]
        T: [M, K, N]
        Z: [M, N]
    partitioning:
        T:
            (K, M): [ flatten() ]
            KM: [ uniform_occupancy(A.num_multipliers) ]
        Z:
            K: [ uniform_occupancy(T.num_multipliers) ]
    loop-order:
        T: [KM1, KM0, N]
        Z: [M, K1, K0, N]
    spacetime:
        T:
            space: [KM0]
            time: [KM1, N]
        Z:
            space: [K0]
            time: [M, K1, N]
"""

utils.compile(yaml)
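The outer-product specification above is a cascade of two Einsums: the first produces the partial products `T[k, m, n]`, and the second reduces them over `k` into `Z[m, n]`. A minimal sketch of that two-phase structure, using dictionaries as sparse tensors, might look as follows. `outer_product_two_phase` is a hypothetical helper; it ignores the partitioning and spacetime mapping entirely.

```python
def outer_product_two_phase(A_km, B_kn):
    """Compute Z = A^T B in two phases, mirroring the two Einsums.

    A_km is a K x M nested list and B_kn is a K x N nested list.
    Returns (T, Z) as dictionaries keyed by coordinate tuples.
    """
    # Phase 1 (multiply): T[k, m, n] = A[k, m] * B[k, n]
    T = {}
    for k, (a_row, b_row) in enumerate(zip(A_km, B_kn)):
        for m, a in enumerate(a_row):
            if a == 0:
                continue
            for n, b in enumerate(b_row):
                if b != 0:
                    T[(k, m, n)] = a * b

    # Phase 2 (merge): Z[m, n] = sum over k of T[k, m, n]
    Z = {}
    for (k, m, n), value in T.items():
        Z[(m, n)] = Z.get((m, n), 0) + value
    return T, Z


A_km_demo = [[1, 0],
             [0, 3],
             [2, 0]]
B_kn_demo = [[0, 1],
             [4, 0],
             [5, 6]]

T_demo, Z_demo = outer_product_two_phase(A_km_demo, B_kn_demo)
```

Keeping `T` explicit makes the cost of the outer-product dataflow visible: every partial product is materialized before any reduction happens.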

#### Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling and running the kernel (the cells above).

In [None]:
utils.check_matmul(A_KM, B_KN, Z_MN)

### Gustavson(M)

In [None]:
yaml = """
einsum:
    declaration:
        A: [K, M]
        B: [K, N]
        Z: [M, N]
    expressions:
        - Z[m, n] = A[k, m] * B[k, n]
mapping:
    rank-order:
        A: [M, K]
        B: [K, N]
        Z: [M, N]
    partitioning:
        Z:
            (M, K): [flatten()]
            MK: [uniform_occupancy(A.num_multipliers)]
    loop-order:
        Z: [MK1, MK0, N]
    spacetime:
        Z:
            space: [MK0]
            time: [MK1, N]
"""

utils.compile(yaml)

#### Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling and running the kernel (the cells above).

In [None]:
utils.check_matmul(A_MK, B_KN, Z_MN)