# Eyeriss

This notebook reproduces the salient characteristics of the [Eyeriss](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7738524) accelerator.

## Imports

Import the necessary modules.

In [28]:
# HiFiber boilerplate

from fibertree_bootstrap import *

fibertree_bootstrap(style="tree", animation='movie')

# Compilation boilerplate

import os
import sys
sys.path.insert(0, "..")

from src import utils


## Initialization

Initialize the input tensors. Tensor shapes and densities can be modified below.

**Warning:** Large tensors will overwhelm the video generation. Either:
1. Use small tensors; as a rule of thumb, the kernel should require fewer than 60 compute operations (e.g., multiplications).
2. Do not generate a video; remove the `spacetime` specification from the `mapping` before compiling.

In [29]:
# Filter
M = 2
C = 2
R = 2
S = 2

# Input
N = 2
H = 3
W = 3

# Stride
Stride = 1

# Output (valid convolution, no padding)
E = (H - R) // Stride + 1
F = (W - S) // Stride + 1

# Partition parameters
N1 = 2
N0 = 1
C2 = 2
C1 = 2
C0 = 2
M2 = 2
M1 = 1
M0 = 1
E2 = 2
E1 = 2
E0 = 2

# Random Input Tensors
I_NCHW = Tensor.fromRandom(rank_ids=["N", "C", "H", "W"], density=1.0, shape=[N, C, H, W])
F_MCRS = Tensor.fromRandom(rank_ids=["M", "C", "R", "S"], density=1.0, shape=[M, C, R, S])

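You can check the rule of thumb from the warning above before compiling. In a dense convolution (`density=1.0`), each of the N·M·E·F outputs accumulates C·R·S products. The helpers below are a hypothetical plain-Python sketch and are not part of `src.utils` or fibertree.

```python
def conv_output_size(in_size: int, filter_size: int, stride: int = 1) -> int:
    """Number of valid (unpadded) filter positions along one spatial dimension."""
    return (in_size - filter_size) // stride + 1


def dense_conv_multiplications(N, M, C, H, W, R, S, stride=1):
    """Multiplications performed by a dense convolution."""
    E = conv_output_size(H, R, stride)
    F = conv_output_size(W, S, stride)
    # Each of the N*M*E*F outputs accumulates C*R*S products.
    return N * M * C * E * F * R * S


# Default shapes from the cell above: N = M = C = R = S = 2, H = W = 3, stride 1
print(dense_conv_multiplications(N=2, M=2, C=2, H=3, W=3, R=2, S=2))  # 128
```

With the default shapes this gives 128 multiplications, which exceeds the suggested limit of 60. If you want a video, shrink a rank or drop `spacetime` from the mapping.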
## Compile and Run

Below is the TeAAL specification for Eyeriss. To simulate the accelerator:
1. Run the cell to compile the specification to HiFiber; this inserts a new cell containing the generated code.
2. Run the new cell, which will
    - Execute the kernel, convolving the input tensor `I_NCHW` with the filters `F_MCRS` defined above
    - Generate visualizations of the actions of the kernel

Our reasoning for the Einsum and mapping in the TeAAL specification of Eyeriss is as follows:
1. Partition
    - The M rank is partitioned (uniform shape) twice (M2, M1, M0), and the C rank is partitioned (uniform shape) twice (C2, C1, C0).

      M is the rank that represents the number of filters, and C is the rank that represents the number of channels.
      The Eyeriss PE array is a spatial array of 168 PEs organized as a 12 × 14 rectangle.
      The dimensions of a PE set are a function of the shape of a layer and are independent of the physical dimensions of the PE array.
      A PE set can handle multiple 2-D convolutions.
      Efficiently mapping multiple (M × C) 2-D convolutions onto the 2-D PE array (the hardware) is a nontrivial task.
      
      The Eyeriss paper describes the partitioning as follows:
      "The PE array can run multiple 2-D convolutions from up to (q × r) channels of (p × t) filters simultaneously."
      "The PE array fits r × t PE sets in parallel that run r different channels of t different filters simultaneously."
      "Each PE runs p × q primitives simultaneously from q different channels of p different filters."      
      
      The M and C ranks are first partitioned into (t x p) and (r x q) chunks to be mapped to the 2D PE array.
      Within the 2D PE array, a single PE handles q channels and p filters.

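      As a result, M is broken as: M2 = M, M1 = t x p, M0 = p.
      C is broken as: C2 = C, C1 = q x r, C0 = q.
      Here M1, M0, C1, and C0 are chunk sizes (the arguments of `uniform_shape` in the mapping), while M2 = M and C2 = C denote the full, unpartitioned extents.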
      
    - N rank is partitioned (uniform shape) once: N1, N0
  
      N is the rank that represents the batch size.
      According to Figure 7 in the Eyeriss paper, a single scheduling pass does not cover all the ifmaps.
      "A pass is assumed to process three channels (q × r) and four filters (p × t). Also, the number of ifmaps that a pass processes, denoted as n, is assumed to be 2."

      As a result, Eyeriss partitions N into (N1, N0), where in Figure 7, N1 = N, N0 = 2.
      
    - E rank is partitioned (uniform shape) twice: E2, E1, E0

      E is the rank that represents the ofmap height.
      "the height and the width of the PE set are equal to the number of filter rows (R) and ofmap rows (E)."

      The hardware (PE array) has fixed dimensions (12 × 14).
      When the PE set does not fit in the 12 × 14 array, Eyeriss resorts to "strip mining the 2-D convolution, i.e., the PE set only processes e rows of ofmap at a time, where e ≤ E. The dimensions of the strip-mined PE set then becomes R × e and can fit into the PE array."
      In addition, the PE array can hold e1 strip-mined PE sets side by side, occupying R × (e × e1) PEs.

      As a result, Eyeriss partitions E into (E2, E1, E0), where E2 = E, E1 = e1 x e (e1 = 2 for AlexNet CONV layer 1), E0 = e (e = 7 for AlexNet CONV layer 1).
       
2. Space and Time
   - M and C ranks

     M0 and C0 are time because the computation is serial: a PE set can run p × q 2-D convolutions in a time-interleaved fashion (see the subsection "Multiple 2-D Convolutions in a PE Set" for details).

     M1 and C1 are space because "the PE array can fit more than one PE set if the set is small enough", which implies that the r and t chunks are processed in parallel.

     M2 and C2 are time because the PE array has a fixed capacity, so chunks that do not fit must be processed one after another.
     
   - N rank

     Throughout the paper, each PE's local scratchpad (spad) stores only data of width S (the filter width).
     When filter reuse is exploited (Figure 6(a)), different ifmaps are processed sequentially, which implies that N0 is time.

     N1 is also time as implied by Figure 7. 
     
   - E rank

     E1 and E0 are space because, after strip mining the 2-D convolution, the PE array can hold e1 PE sets of dimension R × e. The PE array handles E1 and E0 in parallel.

     E2 is time because the PE array has a fixed capacity, so strips that do not fit must be processed one after another.

   - R rank

     R is space because all rows of a filter are mapped onto the PE array at once: the height of a PE set equals R, as quoted above.
     "Eyeriss currently does not support the mapping of a PE set that is taller than the height of the PE array. Therefore, the maximum natively supported filter height is 12."
     This sentence implies that there is no partition along the R rank.

   - S rank

     S is time because each PE processes only one element per cycle (Figure 3), so a single filter row takes S cycles to finish.

   - F rank

     F is time because each PE's local scratchpad holds data of width S only, so it cannot store an entire ifmap row.
     
3. Loop order
   
   According to Figure 7:
   - N1 and N0 are the outermost.
   - C2 follows.
   - M2 follows C2.

   E2 comes next, because it is the only other partition that arises from the limited PE array capacity.

   Then, E1, E0, C1, and M1 all happen simultaneously in space, so there is no strict order among them.

   Then, C0 and M0 happen in a time-interleaved fashion, and there is no strict order between them.

   Then comes F, because within a 1-D primitive the ifmap row is streamed in using a sliding window.

   Finally come R (filter height) and S (filter width), with no strict order between them.
   
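The two-level `uniform_shape` split of M (and, in the same way, of C and E) can be pictured as three nested loops over chunk starting coordinates. These loops resemble the `iterRangeShapeRef` loops in the generated HiFiber below. The sketch is plain Python, not fibertree's implementation. The parameters `outer` and `inner` play the roles of M1 = t × p and M0 = p. The example assumes 96 filters (the usual AlexNet CONV1 count) with the chunk sizes 32 and 16 used in the CONV1 spec further down.

```python
def uniform_shape_split(extent, outer, inner):
    """Yield (m2, m1, m0) for a two-level uniform-shape split of range(extent).

    m2 and m1 are the starting coordinates of the enclosing outer and inner
    chunks; m0 is the original coordinate.
    """
    for m2 in range(0, extent, outer):            # M2: time (PE-array passes)
        outer_end = min(m2 + outer, extent)
        for m1 in range(m2, outer_end, inner):    # M1: space (PE sets)
            for m0 in range(m1, min(m1 + inner, outer_end)):  # M0: time (interleaved)
                yield m2, m1, m0


triples = list(uniform_shape_split(96, 32, 16))
print(len({m2 for m2, _, _ in triples}))  # 3 outer chunks
print(len({m1 for _, m1, _ in triples}))  # 6 inner chunks
print(sorted(m0 for _, _, m0 in triples) == list(range(96)))  # True: each filter once
```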
   
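The loop order above only reorders the summation, so it must produce the same result as a direct convolution. The plain-Python sketch below uses hypothetical helper names and does not involve fibertree. It iterates in the order N1, N0, C2, M2, E2, C1, M1, E1, E0, C0, M0, F, R, S and compares the result against a direct loop nest. Space ranks are simply executed sequentially, so the sketch says nothing about timing or PE utilization.

```python
import random


def zeros(*shape):
    """Nested Python lists of zeros with the given shape."""
    if len(shape) == 1:
        return [0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def dims(I, Fw, stride):
    N, C, H, W = len(I), len(I[0]), len(I[0][0]), len(I[0][0][0])
    M, R, S = len(Fw), len(Fw[0][0]), len(Fw[0][0][0])
    return N, C, M, R, S, (H - R) // stride + 1, (W - S) // stride + 1


def reference_conv(I, Fw, stride=1):
    """O[n][m][e][f] = sum over c, r, s of I[n][c][stride*e+r][stride*f+s] * Fw[m][c][r][s]."""
    N, C, M, R, S, E, F = dims(I, Fw, stride)
    O = zeros(N, M, E, F)
    for n in range(N):
        for m in range(M):
            for e in range(E):
                for f in range(F):
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                O[n][m][e][f] += I[n][c][stride * e + r][stride * f + s] * Fw[m][c][r][s]
    return O


def eyeriss_loop_nest(I, Fw, N0, C1, C0, M1, M0, E1, E0, stride=1):
    """Same sum in the Eyeriss loop order; space ranks (C1, M1, E1, E0, R) run sequentially."""
    N, C, M, R, S, E, F = dims(I, Fw, stride)
    O = zeros(N, M, E, F)
    for n1 in range(0, N, N0):
        for n0 in range(n1, min(n1 + N0, N)):
            for c2 in range(0, C, C1):
                for m2 in range(0, M, M1):
                    for e2 in range(0, E, E1):
                        for c1 in range(c2, min(c2 + C1, C), C0):
                            for m1 in range(m2, min(m2 + M1, M), M0):
                                for e1 in range(e2, min(e2 + E1, E), E0):
                                    for e0 in range(e1, min(e1 + E0, e2 + E1, E)):
                                        for c0 in range(c1, min(c1 + C0, c2 + C1, C)):
                                            for m0 in range(m1, min(m1 + M0, m2 + M1, M)):
                                                for f in range(F):
                                                    for r in range(R):
                                                        for s in range(S):
                                                            O[n0][m0][e0][f] += (
                                                                I[n0][c0][stride * e0 + r][stride * f + s]
                                                                * Fw[m0][c0][r][s]
                                                            )
    return O


random.seed(0)
N, C, H, W, M, R, S = 2, 2, 3, 3, 2, 2, 2
I = [[[[random.randint(-3, 3) for _ in range(W)] for _ in range(H)] for _ in range(C)] for _ in range(N)]
Fw = [[[[random.randint(-3, 3) for _ in range(S)] for _ in range(R)] for _ in range(C)] for _ in range(M)]
# Partition parameters from the initialization cell
O = eyeriss_loop_nest(I, Fw, N0=1, C1=2, C0=2, M1=1, M0=1, E1=2, E0=2)
print(O == reference_conv(I, Fw))  # True
```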

In [30]:
yaml = """
einsum:
  declaration:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  expressions:
    - O[n, m, e, f] = I[n, c, e+r, f+s]*F[m, c, r, s]
mapping:
  rank-order:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  partitioning:
    O:
      M: [uniform_shape(M1), uniform_shape(M0)]
      N: [uniform_shape(N0)]
      C: [uniform_shape(C1), uniform_shape(C0)]
      E: [uniform_shape(E1), uniform_shape(E0)]
  loop-order:
    O: [N1, N0, C2, M2, E2, C1, M1, E1, E0, C0, M0, F, R, S]
  spacetime:
    O:
      space: [C1, M1, E1, E0, R]
      time: [N1, N0, C2, M2, E2, C0, M0, F, S]
"""

utils.compile(yaml)

In [31]:
# Autogenerated HiFiber

O_N1N0M2E2M1E1E0M0F = Tensor(rank_ids=["N1", "N0", "M2", "E2", "M1", "E1", "E0", "M0", "F"], name="O", shape=[N, N, M, E, M, E, E, M, F])
tmp0 = I_NCHW
tmp1 = tmp0.splitUniform(C1, depth=1)
tmp2 = tmp1.splitUniform(C0, depth=2)
I_NC2C1C0HW = tmp2
I_NC2C1C0HW.setRankIds(rank_ids=["N", "C2", "C1", "C0", "H", "W"])
tmp3 = I_NC2C1C0HW
tmp4 = tmp3.splitUniform(N0, depth=0)
I_N1N0C2C1C0HW = tmp4
I_N1N0C2C1C0HW.setRankIds(rank_ids=["N1", "N0", "C2", "C1", "C0", "H", "W"])
tmp5 = F_MCRS
tmp6 = tmp5.splitUniform(C1, depth=1)
tmp7 = tmp6.splitUniform(C0, depth=2)
F_MC2C1C0RS = tmp7
F_MC2C1C0RS.setRankIds(rank_ids=["M", "C2", "C1", "C0", "R", "S"])
tmp8 = F_MC2C1C0RS
tmp9 = tmp8.splitUniform(M1, depth=0)
tmp10 = tmp9.splitUniform(M0, depth=1)
F_M2M1M0C2C1C0RS = tmp10
F_M2M1M0C2C1C0RS.setRankIds(rank_ids=["M2", "M1", "M0", "C2", "C1", "C0", "R", "S"])
o_n1 = O_N1N0M2E2M1E1E0M0F.getRoot()
F_C2M2C1M1C0M0RS = F_M2M1M0C2C1C0RS.swizzleRanks(rank_ids=["C2", "M2", "C1", "M1", "C0", "M0", "R", "S"])
i_n1 = I_N1N0C2C1C0HW.getRoot()
f_c2 = F_C2M2C1M1C0M0RS.getRoot()
canvas = createCanvas(I_N1N0C2C1C0HW, F_C2M2C1M1C0M0RS, O_N1N0M2E2M1E1E0M0F)
for n1_pos, (n1, (o_n0, i_n0)) in enumerate(o_n1 << i_n1):
    for n0_pos, (n0, (o_m2, i_c2)) in enumerate(o_n0 << i_n0):
        for c2_pos, (c2, (i_c1, f_m2)) in enumerate(i_c2 & f_c2):
            for m2_pos, (m2, (o_e2, f_c1)) in enumerate(o_m2 << f_m2):
                for e2_pos, (e2, o_m1) in enumerate(o_e2.iterRangeShapeRef(0, E, E1)):
                    for c1_pos, (c1, (i_c0, f_m1)) in enumerate(i_c1 & f_c1):
                        for m1_pos, (m1, (o_e1, f_c0)) in enumerate(o_m1 << f_m1):
                            for e1_pos, (e1, o_e0) in enumerate(o_e1.iterRangeShapeRef(int(e2 + 1) - 1, int(min(e2 + E1, E) + 1) - 1, E0)):
                                for e0_pos, (e0, o_m0) in enumerate(o_e0.iterRangeShapeRef(int(e1 + 1) - 1, int(min(e1 + E0, E) + 1) - 1, 1)):
                                    for c0_pos, (c0, (i_h, f_m0)) in enumerate(i_c0 & f_c0):
                                        for m0_pos, (m0, (o_f, f_r)) in enumerate(o_m0 << f_m0):
                                            for f_pos, (f, o_ref) in enumerate(o_f.iterRangeShapeRef(0, F, 1)):
                                                for r_pos, (r, (i_w, f_s)) in enumerate(i_h.project(trans_fn=lambda h: h + -1 * e0, interval=(0, R)) & f_r):
                                                    for s_pos, (s, (i_val, f_val)) in enumerate(i_w.project(trans_fn=lambda w: w + -1 * f, interval=(0, S)) & f_s):
                                                        o_ref += i_val * f_val
                                                        canvas.addActivity((n1, n0, c2, c1, c0, e0 + r, f + s), (c2, m2, c1, m1, c0, m0, r, s), (n1, n0, m2, e2, m1, e1, e0, m0, f), spacetime=((c1_pos, m1_pos, e1_pos, e0_pos, r_pos), (n1_pos, n0_pos, c2_pos, m2_pos, e2_pos, c0_pos, m0_pos, f_pos, s_pos)))
tmp11 = O_N1N0M2E2M1E1E0M0F
tmp12 = tmp11.swizzleRanks(rank_ids=["N1", "N0", "M2", "M1", "M0", "E2", "E1", "E0", "F"])
tmp13 = tmp12.mergeRanks(depth=2, levels=2, coord_style="absolute")
tmp14 = tmp13.mergeRanks(depth=0, levels=1, coord_style="absolute")
tmp15 = tmp14.mergeRanks(depth=2, levels=2, coord_style="absolute")
tmp15.setRankIds(rank_ids=["N", "M", "E", "F"])
O_NMEF = tmp15
#displayCanvas(canvas)

TypeError: 'float' object cannot be interpreted as an integer

In [27]:
utils.check_conv(I_NCHW, F_MCRS, O_NMEF)

AssertionError: 

## Additional specifications: Eyeriss mappings targeting the AlexNet convolutional layers



![Eyeriss AlexNet detail](./images/alexnet-eyeriss.png)

### AlexNet Convolutional Layer 1


In [21]:
yaml = """
einsum:
  declaration:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  expressions:
    - O[n, m, e, f] = I[n, c, 4*e+r, 4*f+s]*F[m, c, r, s]
mapping:
  rank-order:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  partitioning:
    O:
      M: [uniform_shape(32), uniform_shape(16)]
      E: [uniform_shape(14), uniform_shape(7)]
  loop-order:
    O: [N, C, M2, E2, M1, E1, E0, M0, F, R, S]
  spacetime:
    O:
      space: [M1, E1, E0, R]
      time: [N, C, M2, E2, M0, S, F]
"""

utils.compile(yaml)

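In this spec, the E partition sizes 14 and 7 correspond to e1 × e and e, with e1 = 2 and e = 7 as stated in the reasoning above. The sketch below checks that two strip-mined R × e PE sets fit in the 12 × 14 array and counts the resulting E2 iterations. It assumes the standard AlexNet CONV1 shape (R = 11, E = 55). The function `strip_mined_fit` is a hypothetical helper.

```python
import math

PE_ROWS, PE_COLS = 12, 14  # Eyeriss PE array: 168 PEs


def strip_mined_fit(R, E, e, e1, rows=PE_ROWS, cols=PE_COLS):
    """Return (fits, E2_iterations) for e1 PE sets of size R x e placed side by side."""
    fits = R <= rows and e * e1 <= cols       # height R, total width e1 * e
    E2_iterations = math.ceil(E / (e1 * e))   # time steps over the E2 rank
    return fits, E2_iterations


print(strip_mined_fit(R=11, E=55, e=7, e1=2))   # (True, 4)
print(strip_mined_fit(R=11, E=55, e=55, e1=1))  # (False, 1)
```

Without strip mining (e = E = 55), the PE set would be 11 × 55, far wider than the 14-column array.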
### AlexNet Convolutional Layer 2

In [None]:
yaml = """
einsum:
  declaration:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  expressions:
    - O[n, m, e, f] = I[n, c, e+r, f+s]*F[m, c, r, s]
mapping:
  rank-order:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  partitioning:
    O:
      M: [uniform_shape(16)]
      C: [uniform_shape(2)]
  loop-order:
    O: [N, C1, M1, E, C0, M0, F, R, S]
  spacetime:
    O:
      space: [E, R]
      time: [N, C1, M1, C0, M0, F, S]
"""

utils.compile(yaml)

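Each spec in this notebook assigns every rank in `loop-order` to exactly one of `space` or `time`. The hypothetical helper below checks that property on plain lists copied from the YAML, here the CONV2 spec above. It is a sketch, not TeAAL's own validation.

```python
from collections import Counter


def check_spacetime(loop_order, space, time):
    """Return a list of problems; empty if space and time partition loop_order."""
    problems = []
    assigned = Counter(space) + Counter(time)
    for rank in loop_order:
        if assigned[rank] != 1:
            problems.append(f"{rank} assigned {assigned[rank]} times")
    for rank in assigned:
        if rank not in loop_order:
            problems.append(f"{rank} is not in loop-order")
    return problems


# CONV2 spec from the cell above
loop_order = ["N", "C1", "M1", "E", "C0", "M0", "F", "R", "S"]
space = ["E", "R"]
time = ["N", "C1", "M1", "C0", "M0", "F", "S"]
print(check_spacetime(loop_order, space, time))  # []
```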
### AlexNet Convolutional Layer 3


In [None]:
yaml = """
einsum:
  declaration:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  expressions:
    - O[n, m, e, f] = I[n, c, e+r, f+s]*F[m, c, r, s]
mapping:
  rank-order:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  partitioning:
    O:
      M: [uniform_shape(64), uniform_shape(16)]
      N: [uniform_shape(4)]
      C: [uniform_shape(4)]
  loop-order:
    O: [N1, N0, C1, M2, M1, E, C0, M0, F, R, S]
  spacetime:
    O:
      space: [M1, E, R]
      time: [N1, N0, C1, M2, C0, M0, F, S]
"""

utils.compile(yaml)

### AlexNet Convolutional Layer 4 & 5


In [22]:
yaml = """
einsum:
  declaration:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  expressions:
    - O[n, m, e, f] = I[n, c, e+r, f+s]*F[m, c, r, s]
mapping:
  rank-order:
    I: [N, C, H, W]
    F: [M, C, R, S]
    O: [N, M, E, F]
  partitioning:
    O:
      M: [uniform_shape(32), uniform_shape(16)]
      N: [uniform_shape(4)]
      C: [uniform_shape(6), uniform_shape(3)]
  loop-order:
    O: [N1, N0, C2, M2, C1, M1, E, C0, M0, F, R, S]
  spacetime:
    O:
      space: [C1, M1, E, R]
      time: [N1, N0, C2, M2, C0, M0, F, S]
"""

utils.compile(yaml)

In [None]:
# Autogenerated HiFiber

O_N1N0M2M1EM0F = Tensor(rank_ids=["N1", "N0", "M2", "M1", "E", "M0", "F"], name="O", shape=[N, N, M, M, E, M, F])
tmp0 = I_NCHW
tmp1 = tmp0.splitUniform(4, depth=0)
I_N1N0CHW = tmp1
I_N1N0CHW.setRankIds(rank_ids=["N1", "N0", "C", "H", "W"])
tmp2 = I_N1N0CHW
tmp3 = tmp2.splitUniform(6, depth=2)
tmp4 = tmp3.splitUniform(3, depth=3)
I_N1N0C2C1C0HW = tmp4
I_N1N0C2C1C0HW.setRankIds(rank_ids=["N1", "N0", "C2", "C1", "C0", "H", "W"])
tmp5 = F_MCRS
tmp6 = tmp5.splitUniform(6, depth=1)
tmp7 = tmp6.splitUniform(3, depth=2)
F_MC2C1C0RS = tmp7
F_MC2C1C0RS.setRankIds(rank_ids=["M", "C2", "C1", "C0", "R", "S"])
tmp8 = F_MC2C1C0RS
tmp9 = tmp8.splitUniform(32, depth=0)
tmp10 = tmp9.splitUniform(16, depth=1)
F_M2M1M0C2C1C0RS = tmp10
F_M2M1M0C2C1C0RS.setRankIds(rank_ids=["M2", "M1", "M0", "C2", "C1", "C0", "R", "S"])
o_n1 = O_N1N0M2M1EM0F.getRoot()
F_C2M2C1M1C0M0RS = F_M2M1M0C2C1C0RS.swizzleRanks(rank_ids=["C2", "M2", "C1", "M1", "C0", "M0", "R", "S"])
i_n1 = I_N1N0C2C1C0HW.getRoot()
f_c2 = F_C2M2C1M1C0M0RS.getRoot()
canvas = createCanvas(I_N1N0C2C1C0HW, F_C2M2C1M1C0M0RS, O_N1N0M2M1EM0F)
for n1_pos, (n1, (o_n0, i_n0)) in enumerate(o_n1 << i_n1):
    for n0_pos, (n0, (o_m2, i_c2)) in enumerate(o_n0 << i_n0):
        for c2_pos, (c2, (i_c1, f_m2)) in enumerate(i_c2 & f_c2):
            for m2_pos, (m2, (o_m1, f_c1)) in enumerate(o_m2 << f_m2):
                for c1_pos, (c1, (i_c0, f_m1)) in enumerate(i_c1 & f_c1):
                    for m1_pos, (m1, (o_e, f_c0)) in enumerate(o_m1 << f_m1):
                        for e_pos, (e, o_m0) in enumerate(o_e.iterRangeShapeRef(0, E, 1)):
                            for c0_pos, (c0, (i_h, f_m0)) in enumerate(i_c0 & f_c0):
                                for m0_pos, (m0, (o_f, f_r)) in enumerate(o_m0 << f_m0):
                                    for f_pos, (f, o_ref) in enumerate(o_f.iterRangeShapeRef(0, F, 1)):
                                        for r_pos, (r, (i_w, f_s)) in enumerate(i_h.project(trans_fn=lambda h: h + -1 * e, interval=(0, R)) & f_r):
                                            for s_pos, (s, (i_val, f_val)) in enumerate(i_w.project(trans_fn=lambda w: w + -1 * f, interval=(0, S)) & f_s):
                                                o_ref += i_val * f_val
                                                canvas.addActivity((n1, n0, c2, c1, c0, e + r, f + s), (c2, m2, c1, m1, c0, m0, r, s), (n1, n0, m2, m1, e, m0, f), spacetime=((c1_pos, m1_pos, e_pos, r_pos), (n1_pos, n0_pos, c2_pos, m2_pos, c0_pos, m0_pos, f_pos, s_pos)))
tmp11 = O_N1N0M2M1EM0F
tmp12 = tmp11.swizzleRanks(rank_ids=["N1", "N0", "M2", "M1", "M0", "E", "F"])
tmp13 = tmp12.mergeRanks(depth=2, levels=2, coord_style="absolute")
tmp14 = tmp13.mergeRanks(depth=0, levels=1, coord_style="absolute")
tmp14.setRankIds(rank_ids=["N", "M", "E", "F"])
O_NMEF = tmp14
displayCanvas(canvas)