# AAHLS (Lab 2)

# 1. Introduction

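This report covers AAHLS Lab 2 in two parts. Part 1 implements an FIR filter with data transferred over AXI-M (AXI-Master); Part 2 implements the same FIR with data transferred over AXI-Stream. All other interfaces are implemented over AXI-Lite. The lab is carried out on the KV260 (xck26-sfvc784-2LV-c); details follow in the sections below.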

# 2. Report Content

## #1:

#### (1) Content

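Lab2 implements an 11-tap FIR filter. Part #1 uses AXI-Master as the interface for reading and writing data. In hardware, each input sample is shifted into a shift register, and the FIR output is obtained by convolving the tap parameters with the data held in the shift register. Because this lab focuses on I/O configuration, the hardware optimizations (PIPELINE, UNROLL, ARRAY PARTITION) are addressed in the QUESTION section.

The HLS code has two main loops, SHIFT\_ACC\_LOOP and XFER\_LOOP. SHIFT\_ACC\_LOOP produces the output: each complete pass through it yields one output value. XFER\_LOOP iterates over all input data, so finishing XFER\_LOOP means every input sample has been processed. Unlike Lab1, a port=return must also be defined and bound to AXI-Lite, so that the host can control the kernel over AXI-Lite.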
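The shift-register structure described above can be sketched in plain Python. This is only a software model for intuition, not the HLS source: the function name `fir_shift_register` is hypothetical, the tap values are the ones used in the host script, and the mapping of the two Python loops onto XFER\_LOOP and SHIFT\_ACC\_LOOP is an assumption based on the description.

```python
from collections import deque

# Tap values taken from the host script in this report.
TAPS = [0, -10, -9, 23, 56, 63, 56, 23, -9, -10, 0]


def fir_shift_register(samples, taps=TAPS):
    """Software model of a shift-register FIR: one output per input sample."""
    # The shift register starts out filled with zeros.
    shift_reg = deque([0] * len(taps), maxlen=len(taps))
    outputs = []
    for x in samples:  # assumed to correspond to XFER_LOOP
        # The newest sample enters at index 0; the oldest one drops off the end.
        shift_reg.appendleft(x)
        acc = 0
        for tap, value in zip(taps, shift_reg):  # assumed to correspond to SHIFT_ACC_LOOP
            acc += tap * value
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # A unit impulse reproduces the tap values at the output.
    print(fir_shift_register([1] + [0] * 10))
```

Feeding a unit impulse is a quick sanity check, since the output of an FIR filter for an impulse is its tap sequence.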

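Because #1 reads and writes data over AXI-Master, the Slave HP (High-Performance) port of the Zynq MPSoC must be enabled so that the kernel can access DDR. This port has to be enabled manually in the Block Diagram. Once the configuration is complete, the .hwh and .bit files can be generated for the host. The .bit file is the bitstream, which configures the FPGA through fpga\_manager; the .hwh file carries information such as the MMIO map and IP details, so that Python can call the kernel.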

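In the Python code, unlike Lab1, the data is first placed in main memory through allocate, and the kernel is configured afterwards. During configuration the DC gain is also computed, so that the output can be normalized later. The data in main memory is then transferred to the kernel over AXI-Master; once the computation finishes, the result is written back to main memory, again over AXI-Master, and plotted.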
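The DC-gain normalization mentioned above can be illustrated without any hardware. The sketch below mirrors the logic of the host script (sum the taps, take the absolute value, divide the output unless the gain is zero); the helper names `dc_gain` and `normalize` are hypothetical and do not appear in the lab code.

```python
def dc_gain(taps):
    """Absolute value of the sum of the taps, as computed in the host script."""
    gain = sum(taps)
    return -gain if gain < 0 else gain


def normalize(outputs, gain):
    """Scale the filter output by the DC gain; leave it untouched if the gain is 0."""
    if gain == 0:
        return list(outputs)
    return [y / gain for y in outputs]


if __name__ == "__main__":
    taps = [0, -10, -9, 23, 56, 63, 56, 23, -9, -10, 0]
    gain = dc_gain(taps)
    print(gain)
    print(normalize([gain, 2 * gain], gain))
```

With these taps the DC gain is 183, so a constant input settles at 183 times its value at the raw filter output and returns to the input level after normalization.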

#### (2) Related Screenshots

Vitis HLS report for `fir_n11_maxi` (Fri Feb 28 02:04:01 2025).

Timing summary:

| Clock | Target | Estimated | Uncertainty |
|-------|--------|-----------|-------------|
| `ap_clk` | 10.00 ns | 7.300 ns | 2.70 ns |

Latency summary: all latency and interval fields are reported as "?", with pipeline type "no". The same holds for the instance `grp_fir_n11_maxi_Pipeline_XFER_LOOP_fu_242` (module `fir_n11_maxi_Pipeline_XFER_LOOP`).

Utilization estimates:

| Name | DSP | FF | LUT |
|------|-----|----|-----|
| Expression | - | 0 | 40 |
| Instance | 33 | 1467 | 2466 |
| Multiplexer | - | - | 175 |
| Register | - | 650 | - |
| Total | 33 | 2117 | 2681 |
| Available | (unreadable) | 234240 | 117120 |

Performance and resource estimates:

| Modules & Loops | Slack | DSP | FF | LUT |
|-----------------|-------|-----|----|-----|
| + `fir_n11_maxi` | 0.00 | 33 (2%) | 2117 (~0%) | 2681 (2%) |
| + `fir_n11_maxi_Pipeline_XFER_LOOP` | 0.00 | 33 (2%) | 463 (~0%) | 756 (~0%) |
| o `XFER_LOOP` | 7.30 | - | - | - |

HW interfaces:

| Interface | Data Width | Address Width |
|-----------|------------|---------------|
| `m_axi_gmem` | 32 -> 32 | 64 |

S\_AXILITE registers:

| Register | Bit fields |
|----------|------------|
| Control signals | 0=AP\_START, 1=AP\_DONE, 2=AP\_IDLE, 3=AP\_READY, 7=AUTO\_RESTART, 9=INTERRUPT |
| Global Interrupt Enable Register | 0=Enable |
| IP Interrupt Enable Register | 0=CHAN0\_INT\_EN, 1=CHAN1\_INT\_EN |
| IP Interrupt Status Register | 0=CHAN0\_INT\_ST, 1=CHAN1\_INT\_ST |
| Data signals | input pointer, output pointer, transfer length |

Top-level control:

| Interface | Type | Ports |
|-----------|------|-------|
| `ap_clk` | clock | `ap_clk` |
| `ap_rst_n` | reset | `ap_rst_n` |
| `interrupt` | interrupt | `interrupt` |
| `ap_ctrl` | `ap_ctrl_hs` | |





```
# coding: utf-8
from __future__ import print_function

import sys, os
import numpy as np
from time import time
import matplotlib.pyplot as plt

sys.path.append('/home/xilinx')
from pynq import Overlay
from pynq import allocate

# Banner lines, matching the captured output shown after this listing.
print("Entry:", sys.argv[0])
print("System argument(s):", len(sys.argv))
print("Start of \"" + sys.argv[0] + "\"")

# Load the bitstream and get a handle to the FIR kernel.
ol = Overlay("/home/root/jupyter_notebooks/FIRN11MAXI.bit")
ipFIRN11 = ol.fir_n11_maxi_0

# Count the samples so the buffers can be sized.
fiSamples = open("samples_triangular_wave.txt", "r+")
numSamples = 0
line = fiSamples.readline()
while line:
    numSamples = numSamples + 1
    line = fiSamples.readline()

# Buffers in main memory that the kernel reaches over AXI-Master.
inBuffer0 = allocate(shape=(numSamples,), dtype=np.int32)
outBuffer0 = allocate(shape=(numSamples,), dtype=np.int32)
fiSamples.seek(0)
for i in range(numSamples):
    line = fiSamples.readline()
    inBuffer0[i] = int(line)
fiSamples.close()

numTaps = 11
n32Taps = [0, -10, -9, 23, 56, 63, 56, 23, -9, -10, 0]
n32DCGain = 0
timeKernelStart = time()
# Write the taps over AXI-Lite and accumulate the DC gain.
# NOTE: offsets 0x40 and 0x10 are reconstructed from a partly unreadable
# screenshot; check them against the register map generated by Vitis HLS.
for i in range(numTaps):
    n32DCGain = n32DCGain + n32Taps[i]
    ipFIRN11.write(0x40 + i * 4, n32Taps[i])
if n32DCGain < 0:
    n32DCGain = 0 - n32DCGain
ipFIRN11.write(0x28, len(inBuffer0) * 4)          # transfer length in bytes
ipFIRN11.write(0x10, inBuffer0.device_address)    # input buffer address
ipFIRN11.write(0x1c, outBuffer0.device_address)   # output buffer address
ipFIRN11.write(0x00, 0x01)                        # set AP_START
while (ipFIRN11.read(0x00) & 0x4) == 0x0:         # wait for AP_IDLE
    continue
timeKernelEnd = time()
print("Kernel execution time: " + str(timeKernelEnd - timeKernelStart) + " s")

plt.title("FIR Response")
plt.xlabel("Sample Point")
plt.ylabel("Magnitude")
xSeq = range(len(inBuffer0))
if n32DCGain == 0:
    plt.plot(xSeq, inBuffer0, 'b.', xSeq, outBuffer0, 'r.')
else:
    plt.plot(xSeq, inBuffer0, 'b.', xSeq, outBuffer0 / n32DCGain, 'r.')
plt.grid(True)
plt.show() # In Jupyter, press Tab + Shift keys to show plot then redo run
print("Exit process")
```

Entry: /usr/local/share/pynq-venv/lib/python3.8/site-packages/ipykernel\_launcher.py

System argument(s): 3

Start of "/usr/local/share/pynq-venv/lib/python3.8/site-packages/ipykernel\_launcher.py"

Kernel execution time: 0.000274658203125 s

[Figure: "FIR Response" plot, Magnitude vs. Sample Point]

Exit process
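The host script starts the kernel by writing 0x01 to offset 0x00 and then polls the same register with the mask 0x4. Using the control-register bit fields listed in the HLS report (0=AP\_START, 1=AP\_DONE, 2=AP\_IDLE, 3=AP\_READY, 7=AUTO\_RESTART, 9=INTERRUPT), the small sketch below decodes a raw register value. The helper names `decode_ap_ctrl` and `is_idle` are hypothetical and are not part of PYNQ or the lab code.

```python
# Bit positions of the control register, as listed in the HLS report.
AP_CTRL_BITS = {
    "AP_START": 0,
    "AP_DONE": 1,
    "AP_IDLE": 2,
    "AP_READY": 3,
    "AUTO_RESTART": 7,
    "INTERRUPT": 9,
}


def decode_ap_ctrl(value):
    """Return a dict mapping each control flag name to True or False."""
    return {name: bool((value >> bit) & 1) for name, bit in AP_CTRL_BITS.items()}


def is_idle(value):
    """Same test as the polling loop in the host script: mask 0x4 is AP_IDLE."""
    return (value & 0x4) != 0


if __name__ == "__main__":
    print(decode_ap_ctrl(0x4))
    print(is_idle(0x1))
```

On hardware the value would come from a register read at offset 0x00; here it is passed in directly so the decoding can be checked on any machine.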

## #2:

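Part #2 is almost identical to #1; the only change is that the data read/write interface switches from AXI-Master to AXI-Stream. Only the differences are described below.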

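When connecting the block diagram in Vivado, the DMA IP must be instantiated, with one instance configured for read and the other for write. s\_ss2m and s\_m2ss are connected manually beforehand: s\_ss2m goes to axi\_dma\_1 and s\_m2ss goes to axi\_dma\_0. The remaining connections are then made by connection automation. The Slave HP port must be enabled here as well.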

In the Python code, because DMA links the PS-side AXI-Master to the kernel's AXI-Stream, data is read and written through ipDMAIn and ipDMAOut. Everything else is the same as in #1.
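To make the difference from the memory-mapped version concrete, the sketch below models a stream as a sequence of (data, last) beats, where the last flag plays the role of TLAST, one of the side-channel signals listed for the stream ports in the HLS report. This is a plain-Python analogy rather than the kernel or the DMA driver: the names `to_beats` and `fir_stream` are hypothetical, and how the real kernel uses TLAST is an assumption.

```python
def to_beats(samples):
    """Pack samples into (data, last) beats; only the final beat has last=True."""
    n = len(samples)
    return [(x, i == n - 1) for i, x in enumerate(samples)]


def fir_stream(beats, taps):
    """Consume beats one at a time and yield one (output, last) beat per input."""
    shift_reg = [0] * len(taps)
    for data, last in beats:
        # Shift the new sample in at index 0 and drop the oldest one.
        shift_reg = [data] + shift_reg[:-1]
        acc = sum(t * v for t, v in zip(taps, shift_reg))
        yield acc, last
        if last:  # stop once the final beat has been forwarded
            break


if __name__ == "__main__":
    taps = [0, -10, -9, 23, 56, 63, 56, 23, -9, -10, 0]
    for beat in fir_stream(to_beats([1, 0, 0]), taps):
        print(beat)
```

The point of the model is that the consumer never sees addresses: it receives samples in order and produces results in order, which is why the host needs a DMA engine on each side to convert between memory buffers and streams.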

#### (2) Related Screenshots

Synthesis Summary Report of `fir_n11_strm` (Vitis HLS 2022.1).

Performance and resource estimates:

| Modules & Loops | Slack | Pipelined | DSP | FF | LUT |
|-----------------|-------|-----------|-----|----|-----|
| + `fir_n11_strm` | 1.01 | no | 33 (2%) | 952 (~0%) | 1082 (~0%) |
| + `fir_n11_strm_Pipeline_XFER_LOOP` | 1.01 | no | 33 (2%) | 762 (~0%) | 825 (~0%) |
| o `XFER_LOOP` | 7.30 | yes | - | - | - |

S\_AXILITE interface: `s_axi_control`, address width 7.

S\_AXILITE registers:

| Register | Bit fields |
|----------|------------|
| Control signals | 0=AP\_START, 1=AP\_DONE, 2=AP\_IDLE, 3=AP\_READY, 7=AUTO\_RESTART, 9=INTERRUPT |
| Global Interrupt Enable Register | 0=Enable |
| IP Interrupt Enable Register | 0=CHAN0\_INT\_EN, 1=CHAN1\_INT\_EN |
| IP Interrupt Status Register | 0=CHAN0\_INT\_ST, 1=CHAN1\_INT\_ST |

AXIS interfaces:

| Interface | Register Mode | TDATA | TDEST | TID | TKEEP | TLAST | TREADY | TSTRB | TUSER | TVALID |
|-----------|---------------|-------|-------|-----|-------|-------|--------|-------|-------|--------|
| `pstrmInput` | both | 32 | 1 | 1 | 4 | 1 | 1 | 4 | 1 | 1 |
| `pstrmOutput` | both | 32 | 1 | 1 | 4 | 1 | 1 | 4 | 1 | 1 |

Top-level control:

| Interface | Type | Ports |
|-----------|------|-------|
| `ap_clk` | clock | `ap_clk` |
| `ap_rst_n` | reset | `ap_rst_n` |
| `interrupt` | interrupt | `interrupt` |



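```
# Fragment of the #2 host script. The overlay, ipFIRN11, ipDMAIn, ipDMAOut,
# inBuffer0, outBuffer0, numTaps, n32Taps, n32DCGain and timeKernelStart
# are set up beforehand, in the same way as in #1.
for i in range(numTaps):
    n32DCGain = n32DCGain + n32Taps[i]
    ipFIRN11.write(0x40 + i * 4, n32Taps[i])
if n32DCGain < 0:
    n32DCGain = 0 - n32DCGain
ipFIRN11.write(0x10, len(inBuffer0) * 4)      # transfer length in bytes
ipFIRN11.write(0x00, 0x01)                    # set AP_START
# NOTE: the DMA calls are partly unreadable in the screenshot; this is the
# usual PYNQ pattern of one send channel and one receive channel.
ipDMAIn.sendchannel.transfer(inBuffer0)       # main memory -> kernel stream
ipDMAOut.recvchannel.transfer(outBuffer0)     # kernel stream -> main memory
ipDMAIn.sendchannel.wait()
ipDMAOut.recvchannel.wait()
timeKernelEnd = time()
print("Kernel execution time: " + str(timeKernelEnd - timeKernelStart) + " s")

plt.title("FIR Response")
plt.xlabel("Sample Point")
plt.ylabel("Magnitude")
xSeq = range(len(inBuffer0))
if n32DCGain == 0:
    plt.plot(xSeq, inBuffer0, 'b.', xSeq, outBuffer0, 'r.')
else:
    plt.plot(xSeq, inBuffer0, 'b.', xSeq, outBuffer0 / n32DCGain, 'r.')
plt.grid(True)
plt.show() # In Jupyter, press Tab + Shift keys to show plot then redo run
print("Exit process")
```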

Entry: /usr/local/share/pynq-venv/lib/python3.8/site-packages/ipykernel\_launcher.py

System argument(s): 3

Start of "usr/local/share/pynq-venv/lib/python3.8/site-packages/ipykernel\_launcher.py"

Kernel execution time: 0.0008769035339355469 s

Exit process