# EIE

Following are some representations of my understanding of the EIE fully-connected layer accelerator.

First, include some libraries.

In [None]:
# Begin - startup boilerplate code

import pkgutil

if 'fibertree_bootstrap' not in [pkg.name for pkg in pkgutil.iter_modules()]:
  !python3 -m pip  install git+https://github.com/Fibertree-project/fibertree-bootstrap --quiet

# End - startup boilerplate code


from fibertree_bootstrap import *
fibertree_bootstrap()

## Read matrices

Since EIE works on fully-connected layers, the input (for a single input channel) is flattened into a single rank, CHW. Each filter's weights (CRS) are the same size as the input, so the weight tensor has shape [M, CHW].
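
As a cross-check that does not depend on fibertree, the fully-connected layer can be sketched in plain Python as a matrix-vector product over the flattened CHW rank. The names `inputs`, `weights`, and `dense_fc` are illustrative only and are not part of the fibertree API; the values are the ones used in the cell that follows.

```python
# Dense reference: O[m] = sum over chw of F[m][chw] * I[chw].
inputs = [1, 2, 0, 3, 4, 0, 0, 7, 8]           # rank CHW

weights = [                                     # ranks M, CHW
    [1, 2, 0, 3, 4, 0, 0, 7, 8],
    [0, 1, 0, 0, 8, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 3, 0, 0, 0, 2, 0],
    [0, 0, 0, 2, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 4, 0, 6, 0, 0, 0, 14, 16],
    [2, 0, 0, 2, 0, 0, 3, 0, 8],
]


def dense_fc(weights, inputs):
    """Return one output value per filter row (rank M)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


outputs = dense_fc(weights, inputs)
print(outputs)  # [143, 83, 0, 24, 34, 0, 254, 72]
```

The result matches the `o_verify` tensor, including the zero outputs for the two all-zero filters.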



In [None]:
i = Tensor.fromUncompressed(["CHW"], 
                            [ 1, 2, 0, 3, 4, 0, 0, 7, 8])
i.setColor("blue").setName("I")

f = Tensor.fromUncompressed(["M", "CHW"],
                            [[ 1, 2, 0, 3, 4, 0, 0, 7, 8],
                            [ 0, 1, 0, 0, 8, 0, 0, 7, 0],
                            [ 0, 0, 0, 0, 0, 0, 0, 0, 0],
                            [ 1, 0, 0, 3, 0, 0, 0, 2, 0],
                            [ 0, 0, 0, 2, 0, 0, 0, 4, 0],
                            [ 0, 0, 0, 0, 0, 0, 0, 0, 0],
                            [ 2, 4, 0, 6, 0, 0, 0, 14, 16],
                            [ 2, 0, 0, 2, 0, 0, 3, 0, 8]
                            ])
f.setColor("green").setName("F")

o_verify = Tensor.fromUncompressed(["M"],
                                   [143, 83, 0, 24, 34, 0, 254, 72])
o_verify.setName("O-Verify")


displayTensor(i)
displayTensor(f)
displayTensor(o_verify)
    

## Fully connected - Naive

Following is an input-stationary dataflow for a fully-connected computation, which is the base dataflow of the EIE design.
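
The same loop order can be sketched without fibertree, using dictionaries as a stand-in for compressed fibers. This is a minimal illustration under that assumption, not the notebook's implementation; `sparse_inputs`, `sparse_columns`, and `macs` are hypothetical names. Each nonzero input is visited once and reused against every nonzero weight in its column, which is what makes the dataflow input stationary.

```python
inputs = [1, 2, 0, 3, 4, 0, 0, 7, 8]
weights = [
    [1, 2, 0, 3, 4, 0, 0, 7, 8],
    [0, 1, 0, 0, 8, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 3, 0, 0, 0, 2, 0],
    [0, 0, 0, 2, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 4, 0, 6, 0, 0, 0, 14, 16],
    [2, 0, 0, 2, 0, 0, 3, 0, 8],
]

# Keep only nonzero inputs, keyed by their CHW coordinate.
sparse_inputs = {chw: x for chw, x in enumerate(inputs) if x != 0}

# Store nonzero weights with CHW as the outer key and M as the inner key,
# i.e. the rank order obtained after swapping the ranks of F.
sparse_columns = {}
for m, row in enumerate(weights):
    for chw, w in enumerate(row):
        if w != 0:
            sparse_columns.setdefault(chw, {})[m] = w

outputs = {}
macs = 0
for chw, x in sparse_inputs.items():                    # input held stationary
    for m, w in sparse_columns.get(chw, {}).items():    # reused for every m
        outputs[m] = outputs.get(m, 0) + x * w
        macs += 1

print(outputs)
print(macs)
```

Only multiplications where both operands are nonzero are performed, so the all-zero filters never produce an output entry.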

In [None]:
swapped_f = f.swapRanks()

o = Tensor(rank_ids=["M"])

canvas = createCanvas(i, swapped_f, o)

i_chw = i.getRoot()
f_chw = swapped_f.getRoot()
o_m = o.getRoot()

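# Intersect on CHW: visit only coordinates present in both the input and the weights
for chw, (i_val, f_m) in i_chw & f_chw:
    # Populate the output from the nonzero weights in this column
    for m, (o_ref, f_val) in o_m << f_m:
        o_ref += i_val * f_val
        canvas.addFrame((chw,), (chw, m), (m,))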

displayTensor(o)
displayCanvas(canvas)


## Check result

In [None]:
o_verify == o

## EIE Dataflow

The EIE design splits the filter weights among multiple PEs, which results in a filter weight tensor with rank ids M1, CHW, and M0, where the M1 rank is the partitioning of output-channel filter weights among PEs. Following is the transformation of the original weights into the form used by a 2-PE instantiation of EIE.
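
The partitioning step can be sketched in plain Python, assuming that the split groups the occupied M coordinates into chunks of equal occupancy and labels each chunk with its first coordinate. That is an assumption about how `splitEqual` behaves, not something confirmed here, and `split_equal`, `num_pes`, and `partitions` are hypothetical names.

```python
weights = [
    [1, 2, 0, 3, 4, 0, 0, 7, 8],
    [0, 1, 0, 0, 8, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 3, 0, 0, 0, 2, 0],
    [0, 0, 0, 2, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 4, 0, 6, 0, 0, 0, 14, 16],
    [2, 0, 0, 2, 0, 0, 3, 0, 8],
]


def split_equal(coords, size):
    """Group sorted coordinates into chunks of `size` (assumed semantics).

    Each chunk is keyed by its first coordinate, which plays the role of M1.
    """
    return {coords[p]: coords[p:p + size] for p in range(0, len(coords), size)}


num_pes = 2
# Only filters with at least one nonzero weight occupy a coordinate in M.
nonempty_rows = [m for m, row in enumerate(weights) if any(row)]
split_size = (len(nonempty_rows) + num_pes - 1) // num_pes  # ceiling division
partitions = split_equal(nonempty_rows, split_size)

print(partitions)  # {0: [0, 1, 3], 4: [4, 6, 7]}
```

Because the all-zero filters are skipped, each PE receives three occupied filters even though the M coordinates they cover are not contiguous.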

In [None]:
PEs = 2
# Ceiling division, so the occupied M coordinates fit into at most `PEs` partitions
split_size = (len(f.getRoot()) + PEs - 1) // PEs

eie_f = f.splitEqual(split_size).swapRanks(depth=1)
displayTensor(eie_f)

eie_o_verify = o_verify.splitEqual(split_size)
displayTensor(eie_o_verify)

## EIE serial dataflow

Following is an EIE-like dataflow, but with the parallel units running serially: with two PEs, the first one runs and then the other.

In [None]:
o = Tensor(rank_ids=["M1", "M0"]).setName("O")

canvas = createCanvas(i, eie_f, o)

i_chw = i.getRoot()
f_m1 = eie_f.getRoot()
o_m1 = o.getRoot()

for chw, i_val in i_chw:
    for m1, (o_m0, f_chw) in o_m1 << f_m1:   # parallel
        f_m = f_chw.getPayload(chw)
        for m, (o_ref, f_val) in o_m0 << f_m:
            o_ref += i_val * f_val
            canvas.addFrame((chw,), (m1, chw, m), (m1, m))

displayTensor(o)
displayCanvas(canvas)


## Check result

In [None]:
displayTensor(eie_o_verify)
displayTensor(o)
eie_o_verify.getRoot() == o.getRoot()

## EIE parallel dataflow

Following is a representation of the EIE-like dataflow with parallel workers. The current cycle in each PE is tracked with the CycleManager class.
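
To get a feel for what the per-PE cycle tracking reveals, here is a rough plain-Python cycle model. It rests on several simplifying assumptions that are mine, not statements about `CycleManager` or the EIE hardware: each multiply-accumulate takes one cycle, all PEs wait for the slowest one before the next input value is broadcast, and the two PEs hold filters {0, 1, 3} and {4, 6, 7} respectively. The names `partitions`, `pe_work`, and `cycles` are hypothetical.

```python
inputs = [1, 2, 0, 3, 4, 0, 0, 7, 8]
weights = [
    [1, 2, 0, 3, 4, 0, 0, 7, 8],
    [0, 1, 0, 0, 8, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 3, 0, 0, 0, 2, 0],
    [0, 0, 0, 2, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 4, 0, 6, 0, 0, 0, 14, 16],
    [2, 0, 0, 2, 0, 0, 3, 0, 8],
]

# Assumed assignment of occupied filters to PEs (M1 coordinate -> M0 coordinates).
partitions = {0: [0, 1, 3], 4: [4, 6, 7]}

pe_work = [0] * len(partitions)   # total MACs performed by each PE
cycles = 0                        # elapsed cycles with a barrier per input value

for chw, x in enumerate(inputs):
    if x == 0:
        continue                  # zero inputs are never broadcast
    step = []
    for pe, rows in enumerate(partitions.values()):
        n = sum(1 for m in rows if weights[m][chw] != 0)
        pe_work[pe] += n
        step.append(n)
    cycles += max(step)           # the slowest PE sets the pace for this input

serial_cycles = sum(pe_work)
print(pe_work, cycles, serial_cycles)  # [12, 10] 14 22
```

Under these assumptions the two PEs finish in 14 cycles rather than the 22 a single PE would need, and the gap from the ideal 11 comes from input values whose nonzero weights are unevenly spread across the PEs.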

In [None]:
o = Tensor(rank_ids=["M1", "M0"]).setName("O")

canvas = createCanvas(i, eie_f, o)

i_chw = i.getRoot()
f_m1 = eie_f.getRoot()
o_m1 = o.getRoot()

cycle = CycleManager()

for chw, i_val in i_chw:

    cycle.startParallel()
    
    for pe, (m1, (o_m0, f_chw)) in enumerate(o_m1 << f_m1):   # parallel
        cycle.startWorker()
        
        f_m = f_chw.getPayload(chw)
        for m, (o_ref, f_val) in o_m0 << f_m:
            o_ref += i_val * f_val
            canvas.addActivity((chw,), (m1, chw, m), (m1, m),
                               worker=f"PE{pe}",
                               skew=cycle())

        cycle.finishWorker()

    cycle.finishParallel()
    
displayTensor(o)
displayCanvas(canvas)


## Check result

In [None]:
displayTensor(eie_o_verify)
displayTensor(o)
eie_o_verify.getRoot() == o.getRoot()

## Testing area

For running alternative algorithms.