# Position space tiling of A-stationary row-major spMspM

When using an m/k/n dataflow for matrix multiply, there is an interesting opportunity for position-space tiling in the K-rank of the **A** matrix. This notebook illustrates two examples of such tiling.

First, include some libraries.

In [None]:
# Begin - startup boilerplate code

import pkgutil

if 'fibertree_bootstrap' not in [pkg.name for pkg in pkgutil.iter_modules()]:
  !python3 -m pip  install git+https://github.com/Fibertree-project/fibertree-bootstrap --quiet

# End - startup boilerplate code


from fibertree_bootstrap import *
fibertree_bootstrap(style="tree", animation="movie")

## Read matrices


In [None]:
a = Tensor(os.path.join(data_dir, "sparse-matrix-a.yaml"))
b = Tensor(os.path.join(data_dir, "sparse-matrix-b.yaml"))

# Transpose the "a" matrix as desired by the outer product traversal order
at = Tensor.fromFiber(["K", "M"], a.getRoot().swapRanks())

print("Input A")
displayTensor(a.setColor("blue"))

#print("Input A - transposed")
#displayTensor(at.setColor("blue"))

print("Input B")
displayTensor(b.setColor("green"))
    


## Ordinary position-space tiling

Split the **A** matrix in position space in the K-rank and then swap ranks. This results in a position-space tiling of the **A** matrix, so that the work can be divided temporally (or spatially) between the top-rank tiles.


In [None]:
displayTensor(a)

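# Split each K fiber into tiles of 3 elements (by position), then relabel
# each new K.1 coordinate with its position in the fiber
a1 = a.splitEqual(3, depth=1).updateCoords(lambda n, c, p: n, depth=1)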
displayTensor(a1)

a2 = a1.swapRanks()
displayTensor(a2)
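
As a rough illustration of the split-then-swap steps above, here is a plain-Python sketch that does not use the fibertree API. A matrix is modeled as a dict of row fibers, each fiber a dict from K coordinate to value. The helper names (`split_equal`, `tile_and_swap`) and the example data are hypothetical and are not the contents of `sparse-matrix-a.yaml`.

```python
def split_equal(fiber, size):
    """Split a fiber into tiles of `size` consecutive nonzeros (by position)."""
    items = sorted(fiber.items())
    return [dict(items[i:i + size]) for i in range(0, len(items), size)]


def tile_and_swap(matrix, size):
    """Return {k1: {m: tile}}, where k1 is the tile's position within row m."""
    tiled = {}
    for m, a_k in sorted(matrix.items()):
        for k1, tile in enumerate(split_equal(a_k, size)):
            tiled.setdefault(k1, {})[m] = tile
    return tiled


# Hypothetical example data: {m: {k: value}}
a_example = {
    0: {0: 1, 2: 2, 5: 3, 7: 4},
    1: {1: 5, 3: 6},
    2: {0: 7, 4: 8, 6: 9, 8: 10, 9: 11},
}

a_tiled = tile_and_swap(a_example, 3)
for k1, a_m in a_tiled.items():
    print(k1, a_m)
```

In this sketch every row contributes to tile 0, but only rows with more than three nonzeros contribute to tile 1, so the top-rank tiles shrink as the tile position grows.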


## A-stationary/row-major spMspM

This dataflow traverses the position-space tiled **A** tensor.

Observations:

- K-rank coordinates in **B** matrix fibers are reused when computing different **A** matrix tiles. This would not happen if the tiling were in coordinate space.

- Fewer **Z** tensor values are touched while working on the larger-coordinate (rightmost) K.1-rank tiles. This is part of the motivation for the reversed K.1-rank fiber traversal in [Sparch](./sparch.ipnb).
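
The first observation can be checked with a small plain-Python sketch (again not the fibertree API). It collects the K coordinates that each tile would look up in **B**, once for position-space tiling and once for coordinate-space tiling. All helper names and the example data are hypothetical.

```python
from itertools import combinations


def position_tiles(fiber, size):
    """Tile by position: each tile holds `size` consecutive nonzeros."""
    items = sorted(fiber.items())
    return {i // size: dict(items[i:i + size]) for i in range(0, len(items), size)}


def coordinate_tiles(fiber, step):
    """Tile by coordinate: each tile covers a fixed range of `step` coordinates."""
    tiles = {}
    for k, v in sorted(fiber.items()):
        tiles.setdefault(k // step, {})[k] = v
    return tiles


def k_coords_by_tile(matrix, tiler, arg):
    """Collect the K coordinates that each top-rank tile would look up in B."""
    touched = {}
    for a_k in matrix.values():
        for k1, tile in tiler(a_k, arg).items():
            touched.setdefault(k1, set()).update(tile)
    return touched


def shared_coords(touched):
    """K coordinates that appear in more than one tile."""
    shared = set()
    for s, t in combinations(touched.values(), 2):
        shared |= s & t
    return shared


# Hypothetical example data: a dense-ish row and a sparser row
a_example = {
    0: {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
    1: {2: 7, 5: 8, 8: 9},
}

pos = k_coords_by_tile(a_example, position_tiles, 3)
coord = k_coords_by_tile(a_example, coordinate_tiles, 3)

print("position-space shared K:", sorted(shared_coords(pos)))
print("coordinate-space shared K:", sorted(shared_coords(coord)))
```

With this data, K coordinate 5 lands in tile 0 for the sparser row and in tile 1 for the denser row, so the corresponding **B** fiber is needed by both position-space tiles. Coordinate-space tiles cover disjoint K ranges by construction, so no **B** fiber is shared between them.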

In [None]:
z = Tensor(rank_ids=["M", "N"])
z.setName("Z")

a_k1 = a2.getRoot()
b_k = b.getRoot()
z_m = z.getRoot()

canvas = createCanvas(a2, b, z)

for k1, a_m in a_k1:
    for m, (z_n, a_k) in z_m << a_m:
        for k, (a_val, b_n) in a_k & b_k:
            for n, (z_ref, b_val) in z_n << b_n:
                z_ref += a_val * b_val
                canvas.addFrame((k1, m, k), (k, n), (m, n))

displayTensor(z)
displayCanvas(canvas)
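
For readers without fibertree installed, the same loop nest can be approximated with nested dicts. This is a minimal sketch under stated assumptions: the tiled **A** is `{k1: {m: {k: value}}}`, **B** is `{k: {n: value}}`, and the function name `spmspm_tiled` and the example data are hypothetical. It also records which rows of **Z** are touched while each K.1 tile is processed.

```python
from collections import defaultdict


def spmspm_tiled(a_tiled, b):
    """A-stationary loop nest over position-space tiles of A."""
    z = defaultdict(dict)
    rows_touched = {}
    for k1, a_m in sorted(a_tiled.items()):
        rows_touched[k1] = set()
        for m, a_k in sorted(a_m.items()):
            for k, a_val in sorted(a_k.items()):
                if k not in b:
                    continue  # intersection: skip K coordinates absent from B
                for n, b_val in b[k].items():
                    z[m][n] = z[m].get(n, 0) + a_val * b_val
                    rows_touched[k1].add(m)
    return dict(z), rows_touched


# Hypothetical example data
a_tiled = {
    0: {0: {0: 1, 1: 2, 2: 3}, 1: {2: 7, 5: 8, 8: 9}},
    1: {0: {3: 4, 4: 5, 5: 6}},
}
b = {0: {0: 1}, 2: {1: 2}, 5: {0: 3, 2: 4}}

z, rows_touched = spmspm_tiled(a_tiled, b)
print(z)
print(rows_touched)
```

In this example tile 0 updates both rows of **Z** while tile 1 updates only row 0, which matches the second observation above.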

## Tile in position space and merge top ranks

Given that position-space tiling of the K-rank does not partition accesses of the **B** matrix, one can flatten the elements of the tiles into a single rank. 

In [None]:
displayTensor(a)

a3 = a.splitEqual(3, depth=1)
displayTensor(a3)

a4 = a3.flattenRanks()
displayTensor(a4)
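
The split-and-flatten step can be sketched in plain Python as follows. This is not the fibertree API. The sketch labels each tile with its smallest K coordinate, which is an assumption made here for illustration, and the helper names and data are hypothetical.

```python
def split_equal(fiber, size):
    """Split a fiber into tiles of `size` consecutive nonzeros (by position)."""
    items = sorted(fiber.items())
    return [dict(items[i:i + size]) for i in range(0, len(items), size)]


def split_and_flatten(matrix, size):
    """Return a flat list of ((m, k1), tile) pairs, ordered by m and then k1."""
    flat = []
    for m, a_k in sorted(matrix.items()):
        for tile in split_equal(a_k, size):
            k1 = min(tile)  # assumption: a tile is labeled by its first coordinate
            flat.append(((m, k1), tile))
    return flat


# Hypothetical example data: {m: {k: value}}
a_example = {
    0: {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
    1: {2: 7, 5: 8, 8: 9},
}

a_flat = split_and_flatten(a_example, 3)
for coord, tile in a_flat:
    print(coord, tile)
```

Each entry of the flat list is an independent work item of at most three nonzeros, so the items can be handed out in any order.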

## A-stationary/row-major spMspM

This dataflow traverses the flattened M/K1 ranks, which could be reordered (see [dynamic reordering notebook](./reordered-spMspM.ipynb)) to optimize the pattern of **B** tensor accesses, e.g., for better cache locality. However, that would result in more discordant access to the M-rank of **Z**. It also means that if all of **Z** cannot be buffered, then multiple shards of N-rank fibers in **Z** would be created and would need to be reduced.

In [None]:
z = Tensor(rank_ids=["M", "N"])
z.setName("Z")

a_m = a4.getRoot()
b_k = b.getRoot()
z_m = z.getRoot()

canvas = createCanvas(a4, b, z)

for (m, k1), a_k in a_m:
    z_n = z_m.getPayloadRef(m)
    for k, (a_val, b_n) in a_k & b_k:
        for n, (z_ref, b_val) in z_n << b_n:
            z_ref += a_val * b_val
            canvas.addFrame([((m, k1), k)], [(k, n)], [(m, n)])

displayTensor(z)
displayCanvas(canvas)
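
The trade-off of reordering the flattened rank can be illustrated with a plain-Python sketch. It runs the flattened loop nest over a list of `((m, k1), tile)` work items in two orders and counts how often the traversal moves to a different row of **Z**. The function name, the `row_visits` metric, the sort key, and the data are all hypothetical choices for this sketch, not part of the notebook or of fibertree.

```python
def spmspm_flat(work_items, b):
    """Flattened A-stationary loop nest; also counts changes of the Z row."""
    z = {}
    row_visits = 0
    prev_m = None
    for (m, k1), a_k in work_items:
        if m != prev_m:
            row_visits += 1  # the traversal moves to a different Z row
            prev_m = m
        z_n = z.setdefault(m, {})
        for k, a_val in sorted(a_k.items()):
            if k not in b:
                continue  # intersection with B's K coordinates
            for n, b_val in b[k].items():
                z_n[n] = z_n.get(n, 0) + a_val * b_val
    return z, row_visits


# Hypothetical example data
a_flat = [
    ((0, 0), {0: 1, 1: 2, 2: 3}),
    ((0, 3), {3: 4, 4: 5, 5: 6}),
    ((1, 2), {2: 7, 5: 8, 8: 9}),
]
b = {0: {0: 1}, 2: {1: 2}, 5: {0: 3, 2: 4}}

# Original order: concordant in M
z_in_order, visits_in_order = spmspm_flat(a_flat, b)

# One possible reordering: sort work items by k1 to group similar B accesses
reordered = sorted(a_flat, key=lambda item: item[0][1])
z_reordered, visits_reordered = spmspm_flat(reordered, b)

print(visits_in_order, visits_reordered)
print(z_in_order == z_reordered)
```

Both orders produce the same **Z**, but the reordered traversal returns to row 0 after leaving it. If **Z** could not be fully buffered, each such return would correspond to a separate partial fiber that needs to be reduced later.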

## Testing area

For running alternative algorithms