## COMP.CE.320 High-level Synthesis

## Hierarchical Design Exercise

Joonas Ikonen, 150244761, joonas.ikonen@tuni.fi

## Question 1

The area score is 1.77 times that of the previous solution, which has a throughput of 64. The only changes between the two are the introduction of the hierarchy and the pipelining of two outer loops with an initiation interval (II) of 8, instead of pipelining only the main loop with an II of 64. The previous solution's pipelining and unrolling options, made possible by memory interleaving, allow it to use fewer multiplier blocks. In addition, the interconnect memories add area to the latest solution.

## Question 2

The three implemented blocks, two transposeMat blocks and one multiply block, are visible on the right side of the schematic. The A and B input arrays are connected to the transposeMat blocks. After the matrices are transposed, the larger control blocks operate the smaller, light brown interconnect memory blocks. From the interconnect memories, A and B are fed to the multiply block, which produces the output C.
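The block structure described above can be sketched in plain C++. This is only an illustration, not the code used in the exercise: the matrix size `N = 8`, the name `top`, the `int` element type, and the way `multiply` consumes its transposed inputs are all assumptions, and the tool-specific hierarchy and pipelining directives are left out. The local arrays `At` and `Bt` stand in for the interconnect memories between the blocks.

```cpp
#include <cassert>
#include <cstddef>

// Assumed matrix dimension; the report does not state it.
constexpr std::size_t N = 8;

// Stand-in for one transposeMat block: out = in^T.
void transposeMat(int in[N][N], int out[N][N]) {
    // This loop nest is not an in-place transpose, so the buffers must differ.
    assert(in != out);
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            out[j][i] = in[i][j];
}

// Stand-in for the multiply block: c = a * b (ordinary matrix product).
// How the real block indexes its transposed inputs is an assumption here.
void multiply(int a[N][N], int b[N][N], int c[N][N]) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            int acc = 0;
            for (std::size_t k = 0; k < N; ++k)
                acc += a[i][k] * b[k][j];
            c[i][j] = acc;
        }
}

// Hypothetical top-level function wiring the three blocks together.
void top(int A[N][N], int B[N][N], int C[N][N]) {
    int At[N][N];  // models the interconnect memory for A
    int Bt[N][N];  // models the interconnect memory for B
    transposeMat(A, At);
    transposeMat(B, Bt);
    multiply(At, Bt, C);
}
```

In a hierarchical design each of the three functions would become its own block, so the two buffers between them are what the synthesis tool maps to the interconnect memories and their control logic.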

The schematic shows that the multiply block accounts for over 14300 of the total area score of 15653. Each interconnect memory is counted with an area score of 1, and the control logic for each takes about 60. The interconnect overhead is therefore small, and the difference in loop pipelining and unrolling accounts for most of the area difference discussed in Question 1.
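As a rough check of this conclusion, the figures can be put side by side. The calculation assumes that 15653 is the total area score of the hierarchical solution, that the 1.77 ratio from Question 1 refers to the same total, and that there are two interconnect memories (one each for A and B). None of these is stated explicitly, so the results are approximate.

$$
\begin{aligned}
\text{multiply share} &> \frac{14300}{15653} \approx 0.91 \\
\text{previous area} &\approx \frac{15653}{1.77} \approx 8840 \\
\text{area increase} &\approx 15653 - 8840 \approx 6810 \\
\text{interconnect overhead} &\approx 2 \times (1 + 60) = 122 \approx 1.8\,\% \text{ of the increase}
\end{aligned}
$$

Under these assumptions the interconnect memories and their control explain only a couple of percent of the added area, which is consistent with attributing most of it to the configuration of the multiply block.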



RTL schematic of the top-level function.