# Illustration of Dynamic Co-Tiling (DCoT)

## Introduction

This notebook contains an example of using **dynamic co-tiling** (**DCoT**) to perform balanced parallel element-wise multiplication. DCoT is distinguished from ordinary **dynamic tiling**, which performs a data-dependent, runtime splitting of a _single_ tensor into non-uniform, coordinate-space tiles. In contrast, DCoT simultaneously performs a data-dependent, runtime splitting of _two_ tensors into non-uniform, coordinate-space tiles. In addition, DCoT preserves the invariant that corresponding tiles of the two tensors occupy the same coordinate ranges. This allows straightforward co-traversal of the two tensors, e.g., for intersection. In the examples below, the DCoT splitting strives to keep the sum of the occupancies of each corresponding pair of tiles constant, with the objective of achieving similar execution times in parallel execution units working on different pairs of tiles.
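As a rough illustration of the idea, the following sketch co-tiles two sparse vectors using plain Python dicts in place of fibertree fibers. The function name `dcot_split` and the dict representation are assumptions made for this sketch only; it is not the implementation used later in this notebook, and it assumes a target size of at least 2.

```python
def dcot_split(a, b, size):
    """Co-tile two sparse vectors given as {coordinate: value} dicts.

    Returns a list of (start_coordinate, a_tile, b_tile) triples whose
    combined occupancy len(a_tile) + len(b_tile) never exceeds `size`.
    Assumes size >= 2 so a coordinate present in both inputs always fits.
    """
    tiles = []
    start, a_tile, b_tile = 0, {}, {}
    last = None

    # Walk the union of the coordinates in increasing order
    for coord in sorted(a.keys() | b.keys()):
        needed = (coord in a) + (coord in b)

        # Close the current pair of tiles if this coordinate would overflow it
        if len(a_tile) + len(b_tile) + needed > size:
            tiles.append((start, a_tile, b_tile))
            start, a_tile, b_tile = last + 1, {}, {}

        if coord in a:
            a_tile[coord] = a[coord]
        if coord in b:
            b_tile[coord] = b[coord]
        last = coord

    # Emit the final (possibly partial) pair of tiles
    tiles.append((start, a_tile, b_tile))
    return tiles


a = {0: 1, 2: 2, 3: 3, 7: 4, 9: 5}
b = {2: 6, 4: 7, 7: 8, 8: 9}
tiles = dcot_split(a, b, size=4)

starts = [start for start, _, _ in tiles]
occupancies = [len(a_tile) + len(b_tile) for _, a_tile, b_tile in tiles]
print(starts)       # [0, 4, 9]
print(occupancies)  # [4, 4, 1]
```

Both tiles in each pair share the same starting coordinate, and every pair except possibly the last holds close to `size` elements in total.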

To illustrate DCoT, we first include some libraries and provide some dropdown lists to select the display style and type of animation.

In [None]:
# Begin - startup boilerplate code

import pkgutil

if 'fibertree_bootstrap' not in [pkg.name for pkg in pkgutil.iter_modules()]:
  !python3 -m pip  install git+https://github.com/Fibertree-project/fibertree-bootstrap --quiet

# End - startup boilerplate code
  


from fibertree_bootstrap import *
fibertree_bootstrap(style="tree", animation="movie")

## Creating two rank-1 tensors

To start we will create two sparse input tensors ```a``` and ```b```, which we will use in the examples below. The following cells provide the parameterization, creation and display of these tensors as well as their intersection.

Note how the intersection fiber has *coordinates* only where a *coordinate* exists in both the ```a``` and ```b``` input tensors. Also, the *payloads* of the intersection fiber are tuples whose elements are the *payloads* from the matching *coordinates* in the ```a``` and ```b``` tensors. Each tuple *payload* is displayed as a vertical red rectangle with two numbers in it.
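The shape of the intersection result can be sketched without fibertree by letting a dict of `{coordinate: payload}` stand in for a fiber. The helper `intersect` below is a hypothetical name for this sketch and is not the library's `&` operator.

```python
def intersect(a, b):
    """Return {coord: (a_payload, b_payload)} for coordinates in both inputs."""
    return {c: (a[c], b[c]) for c in sorted(a.keys() & b.keys())}


a = {1: 5, 4: 2, 6: 7}
b = {0: 3, 4: 8, 6: 1, 9: 4}

ab = intersect(a, b)
print(ab)  # {4: (2, 8), 6: (7, 1)}
```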


In [None]:
#
# Set default shape, density and seed for the input tensors
#
shape0 = 50
density0 = 0.4
seed=10

def set_params(input_shape, input_density, input_seed):
    global shape0
    global density0
    global seed
    
    shape0 = input_shape
    density0 = input_density/100
    seed = input_seed

print("Run the next cell to create the input tensors after changing these sliders")

interactive(set_params,
            input_shape=widgets.IntSlider(min=10, max=100, step=1, value=shape0),
            input_density=widgets.IntSlider(min=1, max=100, step=1, value=(100*density0)),
            input_seed=widgets.IntSlider(min=0, max=100, step=1, value=seed))    

In [None]:
#
# Show parameters
#
print(f"Shape: {shape0}")
print(f"Density: {density0}")
print(f"Seed: {seed}")
print("")

#
# Create tensor a
#
a = Tensor.fromRandom(rank_ids=['I'],
                      shape=[shape0],
                      density=[density0],
                      interval=9,
                      seed=seed)

a.setColor("blue")
a_i = a.getRoot()

print("Tensor a")
displayTensor(a)

#
# Create tensor b
#
b = Tensor.fromRandom(rank_ids=['I'],
                      shape=[shape0],
                      density=[density0],
                      interval=9,
                      seed=seed+1)

b.setColor("green")
b_i = b.getRoot()

print("Tensor b")
displayTensor(b)

#
# Create intersection of tensors a and b
#
ab = a_i & b_i

print("Intersection of a and b")
displayTensor(ab)
print(f"There are {len(ab)} elements in the intersection of a and b")

## Simple Element-wise Multiplication

The following cell illustrates untiled element-wise multiplication of the ```a``` and ```b``` tensors created above. This corresponds to the Einsum expression:

$$
Z_i = A_i \times B_i
$$

Note how the computation skips along the two input tensors while sequentially generating the output tensor.
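The skipping behavior can be sketched as a two-pointer walk over two coordinate-sorted lists of `(coordinate, value)` pairs. This is a minimal stand-alone illustration; the function name `multiply` and the list representation are assumptions of the sketch, not how fibertree implements intersection.

```python
def multiply(a, b):
    """Element-wise product of two sparse vectors.

    `a` and `b` are lists of (coordinate, value) pairs sorted by coordinate.
    """
    z = []
    ia = ib = 0
    while ia < len(a) and ib < len(b):
        ca, va = a[ia]
        cb, vb = b[ib]
        if ca == cb:
            # Coordinate present in both inputs: emit a product
            z.append((ca, va * vb))
            ia += 1
            ib += 1
        elif ca < cb:
            # Skip ahead in a
            ia += 1
        else:
            # Skip ahead in b
            ib += 1
    return z


a = [(1, 5), (4, 2), (6, 7)]
b = [(0, 3), (4, 8), (6, 1), (9, 4)]

z = multiply(a, b)
print(z)  # [(4, 16), (6, 7)]
```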


In [None]:
#
# Create output tensor z and get root
#
z = Tensor(rank_ids=["I"])
z_i = z.getRoot()

canvas = createCanvas(a, b, z)

#
# Traverse intersection of tensors a and b
#
for i, (z_ref, (a_val, b_val)) in z_i << (a_i & b_i):
    #
    # Compute output value
    #
    z_ref <<= a_val * b_val
    #
    # Animation bookkeeping...
    #
    canvas.addFrame((i,), (i,), (i,))
        

#
# Print result
#
print("Tensor z - after")
displayTensor(z)

displayCanvas(canvas)

## Uniform Coordinate Space Tiling

In the following cells, we illustrate splitting both the input tensors **uniformly** in **coordinate space**.

But first we set the target tile size, measured in coordinates, which is used in multiple cells below.

In [None]:
#
# Set default parameter for uniform coordinate-space tiling
#
tile_size = 8

def set_tile_size(tile_size_input):
    global tile_size
    
    tile_size = tile_size_input

print("Run the cell below to create the split tensors after changing these sliders")

interactive(set_tile_size,
            tile_size_input=widgets.IntSlider(min=1, max=shape0, step=1, value=tile_size))


## Create uniform coordinate-space tiled tensors

The cell below creates the split ```a``` and ```b``` tensors. 

Note how the upper rank (I1) coordinates of both split tensors increment uniformly by the tile size (measured in coordinates) and how the total number of children under each I1 coordinate varies greatly.
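A uniform coordinate-space split can be sketched on a plain `{coordinate: value}` dict by grouping each coordinate under the start of its tile. The helper `split_uniform` is a hypothetical stand-in for this sketch (it is not the library's `splitUniform`), and in this sketch empty tiles are simply omitted.

```python
def split_uniform(fiber, tile_size):
    """Group a {coord: value} dict into tiles keyed by tile start coordinate."""
    tiles = {}
    for coord, value in sorted(fiber.items()):
        start = (coord // tile_size) * tile_size
        tiles.setdefault(start, {})[coord] = value
    return tiles


a = {0: 1, 1: 2, 2: 3, 5: 4, 9: 5, 17: 6, 20: 7, 21: 8, 22: 9, 23: 1}

tiles = split_uniform(a, tile_size=8)
occupancy = {start: len(tile) for start, tile in tiles.items()}
print(occupancy)  # {0: 4, 8: 1, 16: 5}
```

The tile starts advance by exactly `tile_size`, while the occupancy of each tile depends entirely on where the nonzero elements happen to fall.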

In [None]:
#
# Split tensor "a" and get root
#
a_split = a.splitUniform(tile_size)
a_i1 = a_split.getRoot()

print("Tensor a_split")
displayTensor(a_split)

#
# Split tensor "b" and get root
#
b_split = b.splitUniform(tile_size)
b_i1 = b_split.getRoot()

print("Tensor b_split")
displayTensor(b_split)

## Uniform coordinate-space tiles element-wise multiplication

Below we show element-wise multiplication for each tile.

In the animations, the currently active tiles in the ```a``` and ```b``` tensors are highlighted for a set of cycles, and the scalar values that are currently being read or written are also highlighted. 

In [None]:
#
# Create output tensor z and get root
#
z = Tensor(rank_ids=["I1", "I0"])
z_i1 = z.getRoot()

canvas = createCanvas(a_split, b_split, z)

#
# Traverse the upper rank of the intersection of the tiled a and b tensors
#
for i1, (z_i0, (a_i0, b_i0)) in z_i1 << (a_i1 & b_i1):
    #
    # Traverse the intersection of each lower rank of the tiled a and b tensors
    #
    for i0, (z_ref, (a_val, b_val)) in z_i0 << (a_i0 & b_i0):
        #
        # Compute the output product
        #
        z_ref <<= a_val * b_val

        #
        # Animation bookkeeping...
        #
        canvas.addActivity(
                    [(i1,)], [(i1,)], [(i1,)],
                    worker="tile")
            
        canvas.addFrame(
                 [(i1, i0)], [(i1, i0)], [(i1,i0)])
        
#
# Display the results
#
print("Tensor z - after")
displayTensor(z)

displayCanvas(canvas)

## Parallel computation of uniform coordinate space tiling

A key issue with tiling is the load balance between the activities in different processing elements (PEs), which are typically working on different tiles. In the following cells, there is an animation of a system with two PEs running in parallel on separate tiles from the uniform coordinate-space tiled ```a``` and ```b``` tensors.

Since there are two PEs, when both PEs are active there will be two values highlighted in a cycle, but only one will be highlighted when there is load imbalance and only one PE is active. 

Note that we assume the activity on distinct tiles in different PEs is synchronized, so work on new tiles always starts in both PEs in the same cycle.
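Under this synchronization assumption, the cost of load imbalance can be estimated with a simple model: tiles are handed out in rounds of one tile per PE, and each round lasts as long as its slowest tile. The sketch below assumes one cycle per intersecting element; the function name `parallel_cycles` and the sample work list are illustrative only.

```python
def parallel_cycles(work_per_tile, num_pes=2):
    """Estimate total cycles and PE utilization for synchronized rounds.

    `work_per_tile` lists the cycles each tile needs (for example, the number
    of intersecting elements in the tile), in the order tiles are issued.
    """
    total_cycles = 0
    for first in range(0, len(work_per_tile), num_pes):
        group = work_per_tile[first:first + num_pes]
        # A round ends only when its slowest tile finishes
        total_cycles += max(group)

    busy = sum(work_per_tile)
    utilization = busy / (total_cycles * num_pes) if total_cycles else 0.0
    return total_cycles, utilization


cycles, utilization = parallel_cycles([3, 1, 0, 4, 2])
print(cycles, round(utilization, 2))  # 9 0.56
```

A perfectly balanced tiling would give a utilization close to 1.0, so the gap is a rough measure of the time PEs spend idle.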

In [None]:
#
# Utility function to display the active tiles in two PEs
#
def addTile(canvas, tiles, max_skew):

    for c in range(max_skew):
        canvas.addActivity(
                    tiles, tiles, tiles,
                    worker="tile",
                    skew=c)


In [None]:
#
# Create an empty z tensor and get its root fiber
#
z = Tensor(rank_ids=["I1", "I0"])
z_i1 = z.getRoot()

print("Tensor z - before")
displayTensor(z)

canvas = createCanvas(a_split, b_split, z)

#
# Initialization
#
pe=0
#
# Animation bookkeeping...
#
tiles = []
max_skew = 0
skew = 0

#
# Traverse the upper rank of the intersection of the tiled a and b tensors
#
for i1, (z_i0, (a_i0, b_i0)) in z_i1 << (a_i1 & b_i1):
    #
    # Animation bookkeeping...
    #
    tiles.append((i1,))
    
    #
    # Traverse elements in the intersection of the lower rank fibers
    #
    for i0, (z_ref, (a_val, b_val)) in z_i0 << (a_i0 & b_i0):
        #
        # Compute the product
        #
        z_ref <<= a_val * b_val

        #
        # Animation bookkeeping...
        #
        print(f"Skew = {skew}, i1 = {i1} and i0 = {i0}")
        canvas.addActivity(
                    [(i1, i0)], [(i1, i0)], [(i1,i0)],
                    worker=f"PE{pe}",
                    skew=skew)
            
        skew += 1

    #
    # Determine next PE
    #
    pe = (pe+1)%2
    
    #
    # Animation bookkeeping...
    #
    max_skew = max(skew, max_skew)    
    skew = 0

    #
    # Animation bookkeeping...
    # 
    if pe == 0:
        addTile(canvas, tiles, max_skew)
        for c in range(max_skew):
            canvas.addFrame()
        tiles = []
        max_skew = 0

#
# Animation finalization
#
if len(tiles) > 0:
    addTile(canvas, tiles, max_skew)
    canvas.addFrame()

#
# Show results
#

print("Tensor z - after")
displayTensor(z)

displayCanvas(canvas)


## DCoT Splitting Class

The following cell defines a class to create a **dynamic co-tiled (DCoT)** split of two tensors.

Note that the code assumes each of the two tensors has only a single rank and that the two ranks share a **rank id**. Thus, for the case handled by this code, whenever a coordinate exists in both original tensors it must land in corresponding tiles of the two split tensors. Other cases, where the input tensors have more ranks or where the **rank ids** are not all the same in both tensors, are beyond the scope of this notebook.
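This requirement can be checked mechanically. The sketch below represents each split tensor as a dict of `{tile_start: {coordinate: value}}` and verifies that every coordinate present in both tensors sits under the same tile start. The representation and the name `shared_coords_aligned` are assumptions of this sketch and are not part of the class defined below.

```python
def shared_coords_aligned(a_tiles, b_tiles):
    """Check that coordinates present in both splits fall in matching tiles."""
    # Map each coordinate to the start coordinate of the tile that holds it
    a_home = {c: start for start, tile in a_tiles.items() for c in tile}
    b_home = {c: start for start, tile in b_tiles.items() for c in tile}
    return all(a_home[c] == b_home[c] for c in a_home.keys() & b_home.keys())


a_tiles = {0: {0: 1, 2: 2, 3: 3}, 4: {7: 4}}
b_tiles = {0: {2: 6}, 4: {4: 7, 7: 8, 8: 9}}
ok = shared_coords_aligned(a_tiles, b_tiles)
print(ok)  # True

# A split of b that uses different tile boundaries breaks the property:
# coordinate 7 is in tile 4 of a but tile 5 of b
b_bad = {0: {2: 6, 4: 7}, 5: {7: 8, 8: 9}}
bad = shared_coords_aligned(a_tiles, b_bad)
print(bad)  # False
```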

In [None]:
class DCoT():
    """
    DCoT
    """
    
    def __init__(self, a, b, size=2):
        """
        __init__
        
        Accept two tensors to be co-tiled and a target size
        for the occupancy of each corresponding pair of tiles
    
        """

        self.a_i = a.getRoot()
        self.b_i = b.getRoot()
        self.size = size
        
        # TBD: Get rank names for a, b
        a_dcot = Tensor(rank_ids=["I1", "I0"]).setMutable(False)
        a_dcot.setColor(a.getColor())
        self.a_dcot = a_dcot
        
        b_dcot = Tensor(rank_ids=["I1", "I0"]).setMutable(False)
        b_dcot.setColor(b.getColor())
        self.b_dcot = b_dcot


    def getA(self):
        """
        getA
        
        Return the split tensor created from the "a" input fiber
        
        """
        return self.a_dcot


    def getB(self):
        """
        getB
        
        Return the split tensor created from the "b" input fiber
        
        """
        return self.b_dcot


    def __iter__(self):
        """
        __iter__
        
        An iterator that dynamically co-tiles the "a" and "b" input fibers 
        
        TBD: This iterator can only be called once...
        
        """
        
        #
        # Initialization
        #
        # 1) Create the first coordinate in the top ranks of the output tensors
        # 2) Create an empty fiber as the payload of those coordinates
        #
        a_i = self.a_i
        b_i = self.b_i
        
        i1_coord = 0
        
        a_dcot = self.a_dcot
        a_dcot_i1 = a_dcot.getRoot()
        a_dcot_i0 = Fiber()
        a_dcot_i1.append(i1_coord, a_dcot_i0)

        b_dcot = self.b_dcot
        b_dcot_i1 = b_dcot.getRoot()
        b_dcot_i0 = Fiber()
        b_dcot_i1.append(i1_coord, b_dcot_i0)
        
        cur_size = 0
        
        #
        # Co-iterate through the union of the input fibers
        #
        for i, (ab, a_val, b_val) in a_i | b_i:
            
            # TBD: Generalize so cur_size is subtensor size...
            
            #
            # Check if current pair of tiles are "full"
            #
            if cur_size == self.size or ("AB" in ab and cur_size == self.size-1):
                #
                # Return element of co-tiled tensor
                #
                yield i1_coord, (a_dcot_i0, b_dcot_i0)
                
                #
                # Create next coordinate in top rank of each output tensor
                #
                i1_coord = max(a_dcot_i0[-1].coord, b_dcot_i0[-1].coord)+1
                cur_size = 0
                
                a_dcot_i0 = Fiber()
                a_dcot_i1.append(i1_coord, a_dcot_i0)
                
                b_dcot_i0 = Fiber()
                b_dcot_i1.append(i1_coord, b_dcot_i0)

            #
            # If there was a non-empty element in the "a" tensor add it to the output
            #
            if "A" in ab:
                a_dcot_i0.append(i, a_val)
                cur_size += 1

            #
            # If there was a non-empty element in the "b" tensor add it to the output
            #
            if "B" in ab:
                b_dcot_i0.append(i, b_val)
                cur_size += 1


        #
        # Return the final element of the split tensors
        #
        yield i1_coord, (a_dcot_i0, b_dcot_i0)
        
    def getDefault(self):
        #
        # Since this is a rank-1 tensor, just return 0 as the default value
        #
        return 0
    

                

## Set the DCoT tile size

Set the target total tile size for the DCoT splitting, which is used in multiple cells below.

In [None]:
#
# Set default parameter for DCoT combined tile size
#
dcot_size = 6


def set_dcot_size(dcot_size_input):
    global dcot_size
    
    dcot_size = dcot_size_input

print("Run the cell below to create the split tensors after changing these sliders")

interactive(set_dcot_size,
            dcot_size_input=widgets.IntSlider(min=1, max=shape0, step=1, value=dcot_size))


## DCoT Splitting of two tensors

The cell below illustrates the result of DCoT splitting of the ```a``` and ```b``` tensors.

Note how the sum of the occupancies of the payloads of matching coordinates in the two split tensors is nearly constant.

In [None]:
#
# Get the a_i fiber
#
a_i = a.getRoot()

print("Tensor a")
displayTensor(a)

#
# Get the b_i fiber
#
b_i = b.getRoot()

print("Tensor b")
displayTensor(b)

#
# Create a DCoT object and fully populate the split tensors
#
# TBD: Allow manifestation of DCoT output as a fiber (like &) 
#
ab_dcot = DCoT(a, b, size=dcot_size)

cplist = [ e for e in ab_dcot]
ab_fiber = Fiber.fromCoordPayloadList(*cplist)

print("\nResult of DCoT as a fiber - text print because payloads are a tuple of fibers\n")
print(f"{ab_fiber:n*}")
print("\n")

print("Tensor a - split")
displayTensor(ab_dcot.getA())

print("Tensor b - split")
displayTensor(ab_dcot.getB())


## DCoT tiled element-wise multiplication

Iterate over the DCoT split to perform element-wise multiplication.

Note that the tile being worked on is highlighted.

In [None]:
#
# Get the a_i fiber
#
a_i = a.getRoot()

print("Tensor a")
displayTensor(a)

#
# Get the b_i fiber
#
b_i = b.getRoot()

print("Tensor b")
displayTensor(b)
   
#
# Create an empty z tensor and get its root fiber
#
z = Tensor(rank_ids=["I1", "I0"])
z_i1 = z.getRoot()

print("Tensor z - before")
displayTensor(z)

#
# Create a DCoT object to traverse
#
ab_dcot = DCoT(a, b, size=dcot_size)

a_dcot = ab_dcot.getA()
b_dcot = ab_dcot.getB()

canvas = createCanvas(a_dcot, b_dcot, z)

#
# Traverse the elements in the upper rank of the DCoT object
#
for i1, (z_i0, (a_i0, b_i0)) in z_i1 << ab_dcot:
    #
    # Traverse elements in the intersection of the lower rank fibers
    #
    for i0, (z_ref, (a_val, b_val)) in z_i0 << (a_i0 & b_i0):
        #
        # Compute the product
        #        
        z_ref <<= a_val * b_val
            

        #
        # Animation bookkeeping...
        #
        canvas.addActivity(
                    [(i1,)], [(i1,)], [(i1,)],
                    worker="tile")
            
        canvas.addFrame(
                 [(i1, i0)], [(i1, i0)], [(i1,i0)])

#
# Show results
#
print("Tensor a_split")
displayTensor(a_dcot)

print("Tensor b_split")
displayTensor(b_dcot)

print("Tensor z - after")
displayTensor(z)

displayCanvas(canvas)


## Parallel DCoT

The code below models two PEs running in parallel, each processing its own tile.

In [None]:
#
# Utility function to display the active tiles in two PEs
#
def addTile(canvas, tiles, max_skew):

    for c in range(max_skew):
        canvas.addActivity(
                    tiles, tiles, tiles,
                    worker="tile",
                    skew=c)


In [None]:
#
# Get the a_i fiber
#
a_i = a.getRoot()

print("Tensor a")
displayTensor(a)

#
# Get the b_i fiber
#
b_i = b.getRoot()

print("Tensor b")
displayTensor(b)
   
#
# Create an empty z tensor and get its root fiber
#
z = Tensor(rank_ids=["I1", "I0"])
z_i1 = z.getRoot()

print("Tensor z - before")
displayTensor(z)

#
# Create a DCoT object to traverse
#
ab_dcot = DCoT(a, b, size=dcot_size)

a_dcot = ab_dcot.getA()
b_dcot = ab_dcot.getB()

canvas = createCanvas(a_dcot, b_dcot, z)

#
# Initialization
#
pe=0
#
# Animation bookkeeping...
#
tiles = []
max_skew = 0
skew = 0

#
# Traverse the elements in the upper rank of the DCoT object
#
for i1, (z_i0, (a_i0, b_i0)) in z_i1 << ab_dcot:
    #
    # Animation bookkeeping...
    #
    tiles.append((i1,))
    
    #
    # Traverse elements in the intersection of the lower rank fibers
    #
    for i0, (z_ref, (a_val, b_val)) in z_i0 << (a_i0 & b_i0):
        #
        # Compute the product
        #
        z_ref <<= a_val * b_val

        #
        # Animation bookkeeping...
        #
        print(f"Skew = {skew}, i1 = {i1} and i0 = {i0}")
        canvas.addActivity(
                    [(i1, i0)], [(i1, i0)], [(i1,i0)],
                    worker=f"PE{pe}",
                    skew=skew)
            
        skew += 1

    #
    # Determine next PE
    #
    pe = (pe+1)%2
    
    #
    # Animation bookkeeping...
    #
    max_skew = max(skew, max_skew)    
    skew = 0

    #
    # Animation bookkeeping...
    # 
    if pe == 0:
        addTile(canvas, tiles, max_skew)
        for c in range(max_skew):
            canvas.addFrame()
        tiles = []
        max_skew = 0

#
# Animation finalization
#
if len(tiles) > 0:
    addTile(canvas, tiles, max_skew)
    canvas.addFrame()

#
# Show results
#
print("Tensor a_split")
displayTensor(a_dcot)

print("Tensor b_split")
displayTensor(b_dcot)

print("Tensor z - after")
displayTensor(z)

displayCanvas(canvas)


## Testing area

For running alternative algorithms