COMP.CE.320 High-level Synthesis

## Circular Buffer Exercise

Joonas Ikonen, 150244761

### Task 1

**Question 1:**

This question explores pipelining and unrolling options for the FIR filter implemented with a shift register. The memory read and write operations that the shift register requires in this function take 8 clock cycles to complete, so the initiation interval (II) cannot be lower than 8.

| Solution | Pipelining | Unrolling | Throughput (cycles) |
| --- | --- | --- | --- |
| 1 | None | None | 42 |
| 2 | None | SHIFT and MAC | 9 |
| 3 | Main II=8 | SHIFT and MAC | 8 |

Solution 3 provides the best throughput: the main loop is pipelined with an initiation interval of 8, and the SHIFT and MAC loops are unrolled. The throughput values come from Catapult's scheduling phase; the throughput of solution 3 was confirmed to be unchanged after RTL generation.

### Task 2

**Question 2:**

| Solution | Pipelining | Unrolling | Throughput (cycles) |
| --- | --- | --- | --- |
| 4 | None | None | 19 |
| 5 | None | MAC | 10 |
| 6 | Main II=5 | MAC | 5 |

The best solution for the circular-buffer FIR implementation was pipelining the main loop with an initiation interval of 5 and unrolling the MAC loop, which achieved a throughput of 5 cycles. As before, solution 6 was confirmed at the RTL phase, while the other solutions were explored only up to the scheduling step.

**Question 3:**

Inserting a value into the shift register requires moving every stored value one step forward. When the array holding the shift register values is implemented as dual-port RAM, this becomes a bottleneck: with an array size of 8, inserting a value takes 8 memory write operations.
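The shifting behaviour described above can be sketched as follows. This is a minimal illustration, not the exercise source: the names `fir_shift`, `regs`, `coeffs` and `N_TAPS` are hypothetical, and plain integer types stand in for whatever fixed-point types the actual design uses.

```cpp
#include <cstdint>

constexpr int N_TAPS = 8;  // assumed: matches the array size of 8 in the text

// Hypothetical shift-register FIR. The caller owns `regs` so the
// sketch stays easy to test; an HLS design might use a static array.
int32_t fir_shift(int16_t input, const int16_t coeffs[N_TAPS],
                  int16_t regs[N_TAPS]) {
    // SHIFT loop: every stored sample moves one position forward.
    // This costs N_TAPS - 1 writes, one per moved sample.
    for (int i = N_TAPS - 1; i > 0; --i) {
        regs[i] = regs[i - 1];
    }
    regs[0] = input;  // one more write for the new sample: 8 in total

    // MAC loop: multiply each sample by its coefficient and accumulate.
    int32_t acc = 0;
    for (int i = 0; i < N_TAPS; ++i) {
        acc += static_cast<int32_t>(regs[i]) * coeffs[i];
    }
    return acc;
}
```

Counting the writes in the SHIFT loop plus the final insertion gives the 8 memory writes per sample that limit the schedule when `regs` is mapped to RAM.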

The circular buffer avoids this by eliminating all but one write operation: instead of moving the data in memory, it changes pointer values.

This can be confirmed in the Gantt chart of the schedule. The shift register needs 8 memory writes while the circular buffer needs only one; this is the main difference between the two functions.
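For comparison, a circular-buffer version might look like the sketch below. Again, `fir_circular`, `buf`, `head` and `N_TAPS` are hypothetical names chosen for illustration, and plain integers replace the fixed-point types of a real design.

```cpp
#include <cstdint>

constexpr int N_TAPS = 8;  // assumed: matches the array size of 8 in the text

// Hypothetical circular-buffer FIR. `head` is the index where the next
// sample is written; the caller keeps it between calls.
int32_t fir_circular(int16_t input, const int16_t coeffs[N_TAPS],
                     int16_t buf[N_TAPS], int &head) {
    // The only memory write: overwrite the oldest sample.
    buf[head] = input;

    // MAC loop: walk backwards in time from the newest sample,
    // wrapping the index instead of moving any data.
    int32_t acc = 0;
    for (int i = 0; i < N_TAPS; ++i) {
        int idx = (head - i + N_TAPS) % N_TAPS;
        acc += static_cast<int32_t>(buf[idx]) * coeffs[i];
    }

    // Advance the write pointer for the next call.
    head = (head + 1) % N_TAPS;
    return acc;
}
```

The function produces the same outputs as a shift-register FIR for the same input sequence, but each call performs 8 reads and a single write.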

With the reduced number of memory operations required by the circular-buffer implementation, the main function can be pipelined with an II of 5: 8 read operations and 1 write operation make 9 accesses, and a dual-port memory serving two accesses per cycle can perform them in 5 clock cycles.
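The arithmetic behind this bound can be written as a small helper. This is a simplified model that considers only memory port usage and ignores other scheduling constraints; the function name `min_ii` is hypothetical.

```cpp
// Lower bound on the initiation interval when all accesses of one
// iteration share a single memory with `ports` accesses per cycle.
constexpr int min_ii(int reads, int writes, int ports) {
    return (reads + writes + ports - 1) / ports;  // ceiling division
}

// Circular buffer: 8 reads + 1 write on a dual-port RAM.
static_assert(min_ii(8, 1, 2) == 5, "9 accesses need 5 cycles on 2 ports");

// Assumption for illustration only: if the shift-register version is
// taken to need 8 reads alongside its 8 writes, the same model gives 8.
static_assert(min_ii(8, 8, 2) == 8, "16 accesses need 8 cycles on 2 ports");
```

Under this model the II of 5 follows directly from 9 accesses shared over 2 ports, rounded up to a whole cycle.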