# **Spring 2023**

**EE 382N-4: Advanced Micro-Controller Systems** 

Lab Assignment #2

**DUE FEB 24TH, 2023** 

#### Lab Goals:

This lab focuses on the instrumentation of AXI transfers and Interrupt Service Routines (ISR). The first part will measure the time it takes to DMA data to/from the OCM and from/to the BRAM. The second part measures the time it takes to respond to an interrupt from the PL.

## **Initial Setup:**

Generate a new AXI peripheral: **CAPTURE\_TIMER**. The block diagram is shown below. The capture timer is a counter that is gated by the CDMA interrupt signal. The counter starts when the transfer starts and stops when the CDMA unit interrupts the processor. The counter is read in the CDMA ISR and sent to the main routine for processing.



NOTE: The **interrupt\_out** pin needs to be configured as an interrupt pin during editing in the IP Packager so that the DTB assigns an interrupt number to the pin. The interrupt is asserted when **slv\_reg1[0]** is set to 0x1. The counter and the interrupt are disabled when the ISR exits.

The **CAPTURE\_TIMER** block will be implemented in Verilog. It will have 4 registers that will be used for this Lab.

Reading register port slv\_reg0[31:0] will return the following:

```
slv_reg0[0]    = capture_gate;       // CDMA interrupt out signal
                                     // to the GIC. Used to halt counter.
slv_reg0[1]    = 1'b0;
slv_reg0[2]    = 1'b0;
slv_reg0[3]    = 1'b0;
slv_reg0[4]    = capture_complete;   // Flag to indicate that the
                                     // capture is complete
slv_reg0[5]    = 1'b0;
slv_reg0[6]    = 1'b0;
slv_reg0[7]    = timer_enable;       // Timer enable signal
slv_reg0[31:8] = {16'hBEAD,8'h0};    // Debug

// Writing to register port slv_reg1[0] will be used to assert or negate
// the "interrupt_out" port pin.
assign interrupt_out = slv_reg1[0];  // Active high asserts an
                                     // interrupt to the GIC.

// Writing to register port slv_reg1[1] is used to enable counting.
assign timer_enable = slv_reg1[1];   // Active high enables
                                     // capture timer to count.

// Reading register slv_reg1[31:0] will return the following:
slv_reg1[0]    = interrupt_out;
slv_reg1[1]    = timer_enable;
slv_reg1[31:2] = {16'hFEED,14'h0};   // Debug

// Reading register port slv_reg2[31:0] reads the capture timer counter value.
slv_reg2[31:0] = Cap_Timer_Out[31:0];

// Reading register port slv_reg3[31:0] reads the following:
slv_reg3[2:0]  = state[2:0];         // This is the current state of
                                     // the timer counter state machine.
slv_reg3[3]    = 1'b0;
slv_reg3[31:4] = 28'h5555CAB;        // Debug

// The state[2:0] data is for debug. A typical state assignment for a
// timer counter would be:
parameter RESET = 3'b000;
parameter COUNT = 3'b010;
parameter WAIT  = 3'b011;
parameter IDLE  = 3'b100;
```
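On the software side, the read-back layout of slv\_reg0 can be decoded with a few bit masks. The following is a minimal sketch, assuming the register value has already been read from the peripheral; the `ct_status` struct, the macro names, and `ct_decode_status` are hypothetical helpers and not part of the lab handout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the slv_reg0 read-back description. */
#define CT_STATUS_CAPTURE_GATE     (1u << 0)
#define CT_STATUS_CAPTURE_COMPLETE (1u << 4)
#define CT_STATUS_TIMER_ENABLE     (1u << 7)
#define CT_STATUS_DEBUG_SHIFT      8
#define CT_STATUS_DEBUG_PATTERN    0xBEAD00u /* {16'hBEAD, 8'h0} */

/* Hypothetical helper type used only for this illustration. */
struct ct_status {
    bool capture_gate;     /* CDMA interrupt seen by the counter gate  */
    bool capture_complete; /* capture has finished                     */
    bool timer_enable;     /* counter is enabled                       */
    bool debug_ok;         /* bits [31:8] hold the expected pattern    */
};

/* Split a raw slv_reg0 value into its documented fields. */
static struct ct_status ct_decode_status(uint32_t reg0)
{
    struct ct_status s;

    s.capture_gate     = (reg0 & CT_STATUS_CAPTURE_GATE) != 0;
    s.capture_complete = (reg0 & CT_STATUS_CAPTURE_COMPLETE) != 0;
    s.timer_enable     = (reg0 & CT_STATUS_TIMER_ENABLE) != 0;
    s.debug_ok = (reg0 >> CT_STATUS_DEBUG_SHIFT) == CT_STATUS_DEBUG_PATTERN;
    return s;
}
```

Checking the debug pattern first is a cheap way to confirm that the mapped address really points at the Capture Timer before trusting the other bits.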

The state diagram for this assignment would look like this:



# Schematic design procedure:

Referring to the block diagram below:

- Use the Vivado schematic from Lab 1 and insert the **CAPTURE\_TIMER** as shown in the block diagram below.
- Modify the CONCAT module to add an additional interrupt input pin. Connect the interrupt\_out signal from the CAPTURE\_TIMER module to the CONCAT module.



The next step is to generate a bit file that can be dynamically downloaded into the FPGA. Refer to this guide on how to generate the FPGA bit file: BIT File Generation Flow

You may see the following warning messages when you generate the bit file:



The updated Address Map must look like this:

| Name                                                    | Interface  | Slave Segment | Master Base<br>Address | Range | Master High<br>Address |
|---------------------------------------------------------|------------|---------------|------------------------|-------|------------------------|
| axi_cdma_0                                              |            |               |                        |       |                        |
| /axi_cdma_0/Data (40 address bits : 1T)                 |            |               |                        |       |                        |
| /zynq_ultra_ps_e_0/SAXIGP6                              | S_AXI_LPD  | LPD_DDR_LOW   | 0x00_0000_0000         | 2G    | 0x00_7FFF_FFFF         |
| /axi_bram_ctrl_CDMA/S_AXI                               | S_AXI      | Mem0          | 0x00_B002_8000         | 8K    | 0x00_B002_9FFF         |
| /zynq_ultra_ps_e_0/SAXIGP6                              | S_AXI_LPD  | LPD_LPS_OCM   | 0x00_FF00_0000         | 16M   | 0x00_FFFF_FFFF         |
| zynq_ultra_ps_e_0                                       |            |               |                        |       |                        |
| /ULTRA96_IO/Low_Speed_MEZZ/axi_uart16550_0/S_AXI        | S_AXI      | Reg           | 0x00_A000_0000         | 64K   | 0x00_A000_FFFF         |
| /ULTRA96_IO/Low_Speed_MEZZ/axi_uart16550_1/S_AXI        | S_AXI      | Reg           | 0x00_A001_0000         | 64K   | 0x00_A001_FFFF         |
| /ULTRA96_IO/BD_CTL_GPIO/axi_gpio_0/S_AXI                | S_AXI      | Reg           | 0x00_A002_0000         | 4K    | 0x00_A002_0FFF         |
| /ULTRA96_IO/BD_CTL_GPIO/axi_gpio_1/S_AXI                | S_AXI      | Reg           | 0x00_A002_1000         | 4K    | 0x00_A002_1FFF         |
| /ULTRA96_IO/Low_Speed_MEZZ/axi_gpio_2/S_AXI             | S_AXI      | Reg           | 0x00_A002_2000         | 4K    | 0x00_A002_2FFF         |
| /ULTRA96_IO/SYS_MGMT/axi_gpio_3/S_AXI                   | S_AXI      | Reg           | 0x00_A002_5000         | 4K    | 0x00_A002_5FFF         |
| /ULTRA96_IO/SYS_MGMT/system_management_wiz_0/S_AXI_LITE | S_AXI_LITE | Reg           | 0x00_A002_6000         | 8K    | 0x00_A002_7FFF         |
| /axi_bram_ctrl_PS/S_AXI                                 | S_AXI      | Mem0          | 0x00_A002_8000         | 8K    | 0x00_A002_9FFF         |
| /ULTRA96_IO/Low_Speed_MEZZ/PWM_w_Int_0/S00_AXI          | S00_AXI    | S00_AXI_reg   | 0x00_A003_0000         | 64K   | 0x00_A003_FFFF         |
| /ULTRA96_IO/Low_Speed_MEZZ/PWM_w_Int_1/S00_AXI          | S00_AXI    | S00_AXI_reg   | 0x00_A004_0000         | 64K   | 0x00_A004_FFFF         |
| /Capture_Timer_0/S00_AXI                                | S00_AXI    | S00_AXI_reg   | 0x00_A005_0000         | 4K    | 0x00_A005_0FFF         |
| /axi_cdma_0/S_AXI_LITE                                  | S_AXI_LITE | Reg           | 0x00_B000_0000         | 4K    | 0x00_B000_0FFF         |

Note that the Capture Timer can only be accessed by the PS. Make sure to exclude it from the CDMA unit.
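The three address columns of the map are related by simple arithmetic: the high address of a segment is its base address plus its range minus one. The sketch below shows that relation and a range check that can be handy when validating a physical address before mapping it. The function names `high_address` and `in_segment` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Last byte address covered by a segment of range_bytes starting at base. */
static uint64_t high_address(uint64_t base, uint64_t range_bytes)
{
    assert(range_bytes > 0);
    return base + range_bytes - 1;
}

/* True when addr falls inside the segment [base, base + range_bytes - 1]. */
static bool in_segment(uint64_t addr, uint64_t base, uint64_t range_bytes)
{
    return addr >= base && addr <= high_address(base, range_bytes);
}
```

For example, a 4K segment based at 0xA005\_0000 ends at 0xA005\_0FFF, and an 8K segment based at 0xB002\_8000 ends at 0xB002\_9FFF.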

The system.dtb for this lab is located here:

http://projects.ece.utexas.edu/courses/spring\_23/ee382n4-17685/arch/labs/SP23\_LAB\_2/

#### **CDMA Data Transfer Measurement Procedure:**

This program will use the code that was developed in LAB\_1 as a baseline. The code needs to be converted from polling to interrupt driven.

The **CAPTURE\_TIMER** must be configured in capture mode. The timer is enabled when the DMA transfer starts. The timer is halted when the **cdma\_introut** signal from the **axi\_cdma** module is asserted. The value in the **cap\_timer\_out[31:0]** register contains the number of **axi\_aclk** clock cycles that the DMA transfer took. The **cdma\_introut** signal is connected to the GIC interrupt pin on the PS. The interrupt handler will acknowledge the interrupt and set the **det\_int** flag for the test program.
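One possible shape for the timed transfer is sketched below. It assumes both register blocks are already mapped as arrays of 32-bit words, that the CDMA is used in simple (non scatter-gather) mode, and that source and destination fit in 32 bits so the upper address registers can stay at zero. The CDMA offsets and the IOC interrupt bit reflect my reading of the Xilinx AXI CDMA manual and should be verified there; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* AXI CDMA simple-mode register byte offsets (verify against the manual). */
#define CDMA_CR      0x00u       /* control register                     */
#define CDMA_SR      0x04u       /* status register                      */
#define CDMA_SA      0x18u       /* source address, low 32 bits          */
#define CDMA_DA      0x20u       /* destination address, low 32 bits     */
#define CDMA_BTT     0x28u       /* bytes to transfer; a write starts it */
#define CDMA_IOC_IRQ (1u << 12)  /* IOC enable in CR, IOC flag in SR     */

/* Capture timer word indices and bits from the register description. */
#define CT_REG1         1
#define CT_REG2         2
#define CT_TIMER_ENABLE (1u << 1)

static void reg_write(volatile uint32_t *base, uint32_t byte_off, uint32_t v)
{
    base[byte_off / 4] = v;
}

/* Program the CDMA, enable the counter, then start the transfer. */
static void start_timed_transfer(volatile uint32_t *cdma,
                                 volatile uint32_t *ct,
                                 uint32_t src, uint32_t dst, uint32_t nbytes)
{
    assert(nbytes > 0);
    reg_write(cdma, CDMA_CR, CDMA_IOC_IRQ); /* interrupt on completion  */
    reg_write(cdma, CDMA_SA, src);
    reg_write(cdma, CDMA_DA, dst);
    ct[CT_REG1] = CT_TIMER_ENABLE;          /* counter starts counting  */
    reg_write(cdma, CDMA_BTT, nbytes);      /* transfer starts here     */
}

/* After cdma_introut: read the cycle count, clear the flag, stop timer. */
static uint32_t finish_timed_transfer(volatile uint32_t *cdma,
                                      volatile uint32_t *ct)
{
    uint32_t cycles = ct[CT_REG2];

    reg_write(cdma, CDMA_SR, CDMA_IOC_IRQ); /* write 1 to clear IOC flag */
    ct[CT_REG1] = 0;                        /* disable the counter       */
    return cycles;
}
```

Enabling the counter immediately before the write that starts the transfer keeps the software overhead in the measurement down to a single register write. In the lab, the work in `finish_timed_transfer` would be split between the kernel ISR and the test program.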

The test program will report the following information:

```
Test status --- # loops and # words transferred
Minimum Latency
Maximum Latency
Average Latency
Standard Deviation
```
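The statistics in this report can be computed from the collected cycle counts with two passes over the samples. The sketch below is one way to do it; the `latency_stats` struct and function names are hypothetical, and it assumes the population standard deviation (divide by N) is wanted, which the handout does not specify. A small Newton iteration stands in for `sqrt` so that the sketch builds without linking libm; in the application, `sqrt` from `<math.h>` is the simpler choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for one test run. */
struct latency_stats {
    uint32_t min;
    uint32_t max;
    double   mean;
    double   stddev; /* population standard deviation (assumption) */
};

/* Square root by Newton's method; avoids a libm dependency. */
static double sqrt_newton(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Two-pass min/max/mean/standard deviation over n cycle counts. */
static struct latency_stats compute_stats(const uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    struct latency_stats st = { samples[0], samples[0], 0.0, 0.0 };
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < st.min)
            st.min = samples[i];
        if (samples[i] > st.max)
            st.max = samples[i];
        sum += (double)samples[i];
    }
    st.mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)samples[i] - st.mean;
        sq += d * d;
    }
    st.stddev = sqrt_newton(sq / (double)n);
    return st;
}
```

Accumulating in `double` keeps the sum of squares from overflowing even when a single sample reaches the millions of cycles seen under load.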

Here is an example output of the CDMA transfer latency test, including debug information:

```
*******************
Memory test passed --- 500 loops and 1000 words
Minimum DMA Latency: 2689
Maximum DMA Latency: 9067
Average DMA Latency: 3166.000000
Standard Deviation: 372.000000
Number of samples: 500
Number of interrupts detected = 500
Interrupt #53: 9400000 GICv2 125 Edge cdma-controller
Memory test passed --- 500 loops and 1000 words
Minimum DMA Latency: 2653
Maximum DMA Latency: 6765
Average DMA Latency: 3224.000000
Standard Deviation: 433.000000
Number of samples: 500
Number of interrupts detected = 500
Interrupt 53: 9400500 GICv2 125 Edge cdma-controller
```

These values are for the kernel when it is not busy doing other tasks. You will need to open additional terminal windows and start other tasks to see what the impact is on the "Maximum Latency". An interesting task is to do a continuous recursive directory listing of the flash drive to stress the OS and the AMBA bus:

```
while (cd /); do ls -algR; done
```

The maximum latency and standard deviation will both increase:

```
*********************
Memory test passed --- 500 loops and 1000 words
Minimum DMA Latency: 2663
Maximum DMA Latency: 170834
Average DMA Latency: 4496.000000
Standard Deviation: 8762.000000
Number of samples: 500
Number of interrupts detected = 500
53: 9424619 GICv2 125 Edge cdma-controller
Memory test passed --- 500 loops and 1000 words
Minimum DMA Latency: 2578
Maximum DMA Latency: 1312215
Average DMA Latency: 7919.000000
Standard Deviation: 61545.000000
Number of samples: 500
Number of interrupts detected = 500
53: 9424119 GICv2 125 Edge cdma-controller
```

Notice that the maximum latency has increased by a factor of about 194 (from 6765 to 1312215 cycles in the second run of each test).

#### **Interrupt Latency Measurement Procedure.**

You can add this measurement to the CDMA code or generate a new routine to measure the interrupt latency. The steps to follow are:

- 1) Assert the "interrupt\_out" pin (slv\_reg1[0]) on the Capture Timer module. This pin is connected to the GIC interrupt input on the PS.
- 2) At the same time assert the slv\_reg1[1] pin. This will start the timer counter.
- 3) Wait for the interrupt to be detected and handled in the kernel. Read the timer counter value.
- 4) Negate the interrupt and disable the timer counter.
- 5) The value in the timer is read by the application program and the following data needs to be displayed:

```
Minimum Latency
Maximum Latency
Average Latency
Standard Deviation
Number of Samples
Number of Interrupts registered in the Kernel using /proc/interrupts
```

6) Here is an example output showing some debug information:

48: 433717

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

Minimum Latency: 17
Maximum Latency: 1071
Average Latency: 27.470000
Standard Deviation: 11.150000
Number of samples: 10000

48: 443717

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

Minimum Latency: 20
Maximum Latency: 335
Average Latency: 27.440000
Standard Deviation: 4.960000
Number of samples: 10000

48: 453717

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

7) The number of samples per test needs to be programmable. The maximum number of samples is approximately 10,000. Here is what you should see if you ran around 30 million interrupts:



## Lab procedure:

In this Lab you will reuse the code from Lab #1. The CDMA code needs to be converted from polling to interrupt driven. Leave the frequency dithering code in place. Use the 9 combinations of PS and PL clock frequencies (3 PS × 3 PL) from the table below:

| PS (CPU)<br>Clock Freq | PL (FPGA)<br>Clock Freq |
|------------------------|-------------------------|
| 1499 MHz               | 300 MHz                 |
| 999 MHz                | 187.5 MHz               |
| 416.6 MHz              | 100 MHz                 |

The capture timer block will be used to measure DMA transfer time and interrupt latency using a counter clocked by the PL clock. Theoretically, the number of clock cycles should be nearly the same regardless of the PL frequency. Minor differences may be caused by the clock-domain-crossing logic in the AMBA bus.
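Because the counter runs on the PL clock, results taken at different PL frequencies are easier to compare as elapsed time than as raw cycle counts. A minimal conversion sketch follows; the function name `cycles_to_ns` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a PL-clock cycle count to nanoseconds.
 * One cycle lasts 1000 / f_MHz nanoseconds. */
static double cycles_to_ns(uint32_t cycles, double pl_clock_mhz)
{
    assert(pl_clock_mhz > 0.0);
    return (double)cycles * 1000.0 / pl_clock_mhz;
}
```

As an illustration, a count of 3166 cycles would correspond to 31.66 µs if the PL clock were 100 MHz, and to roughly a third of that at 300 MHz.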

You will need to write two kernel modules for this lab. The first kernel module will handle the interrupts for the CDMA unit and the second will handle the interrupts for the Capture-Timer unit.

TEST #1: Refer to CDMA Transfer Measurement Procedure outlined above for this test. The memory will be tested using the following sequence:

- 1. Load a memory page (i.e., 4K bytes) in the OCM with 2048 random 32-bit data values using the Linux **srand(time(0))** and **rand()** routines. Use 0xFFFC\_0000 as the starting address of this page.
- 2. Transfer the random data in the OCM (0xFFFC\_0000) to the BRAM (0xB002\_8000) using the CDMA unit.
- 3. Measure the number of cycles it took to do the CDMA transfer using the Capture-Timer unit as described above.
- 4. Transfer the random data in the BRAM (0xB002\_8000) back to the OCM at a different address (0xFFFC\_2000) using the CDMA unit.
- 5. Measure the number of cycles it took to do the CDMA transfer using the Capture-Timer unit as described above.
- 6. Once the steps above are complete, the OCM data at address 0xFFFC\_0000 is compared to the OCM data at address 0xFFFC\_2000. This confirms that DMA traffic to/from the OCM & BRAM works.
- 7. Change the PS and PL clock frequencies and repeat until all 9 combinations have been tested.
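The fill and compare steps of the sequence above can be sketched as two small helpers that work on already-mapped OCM regions. This is an illustration only: the function names are hypothetical, the word count is left as a parameter, and the caller is assumed to have called `srand(time(0))` once at program start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill nwords 32-bit words with pseudo-random data.
 * rand() may return as few as 15 random bits per call, so two
 * calls are combined to vary the upper half of each word too. */
static void fill_random(volatile uint32_t *dst, size_t nwords)
{
    assert(dst != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Count the words that differ between the source and round-trip copies. */
static size_t count_mismatches(const volatile uint32_t *a,
                               const volatile uint32_t *b,
                               size_t nwords)
{
    assert(a != NULL && b != NULL);

    size_t bad = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (a[i] != b[i])
            bad++;
    }
    return bad;
}
```

Returning a mismatch count rather than a pass/fail flag makes it easier to tell a completely failed transfer from a single corrupted word when debugging.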

TEST #2: Refer to the Interrupt Latency Measurement Procedure outlined above to measure interrupt latency in the kernel.

- 1. Measure the interrupt latency for a statistically large number of interrupts using all 9 frequency combinations.
- 2. Generate a plot similar to what is shown above.
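The register accesses behind one interrupt latency sample, as listed in the Interrupt Latency Measurement Procedure, can be sketched as follows. The sketch assumes the Capture Timer registers are mapped as an array of 32-bit words; the function names are hypothetical, and in the lab the read-and-stop part would run in the kernel interrupt handler.

```c
#include <assert.h>
#include <stdint.h>

/* Capture timer word indices and bits from the register description. */
#define CT_REG1          1
#define CT_REG2          2
#define CT_INTERRUPT_OUT (1u << 0)
#define CT_TIMER_ENABLE  (1u << 1)

/* Steps 1 and 2: a single write sets both bits, so the interrupt is
 * asserted and the counter starts on the same register access. */
static void ct_fire(volatile uint32_t *ct)
{
    ct[CT_REG1] = CT_INTERRUPT_OUT | CT_TIMER_ENABLE;
}

/* Steps 3 and 4: read the cycle count, then negate the interrupt and
 * disable the counter by clearing both bits. */
static uint32_t ct_read_and_stop(volatile uint32_t *ct)
{
    uint32_t cycles = ct[CT_REG2];

    ct[CT_REG1] = 0;
    return cycles;
}
```

Setting both bits in one write matters here: two separate writes would add the time between them to every sample.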

Do NOT use the Vivado SDK development tools. They are for bare-metal implementations. We will not be doing any bare-metal implementations in this class.

#### **General Tutorials:**

Setting up Baseline Ultra96 Xilinx Environment
BIT File Generation Flow

#### **Ultra-96 Documentation:**

Setting up Ultra-96 Board
Getting Started
Ultra-96 HW User Guide
Ultra-96 Base TRD
Ultra-96 Building the Base TRD
Ultra-96 Schematic
Ultra96 Assembly Drawings

# Xilinx Zynq UltraScale+ Tutorials and Documentation

ZYNQ UltraScale+ Register Map

ZYNQ UltraScale+ MPSoC Base Targeted Reference Design

Zynq UltraScale+ All Programmable SoC Technical Reference Manual
Repository of useful Vivado, Zynq & Petalinux Documentation

## **DMA Documentation & Tutorials**

Xilinx AXI CDMA Manual
Xilinx AXI Timer Manual
Xilinx Wiki page on DMA
Using the AXI DMA in Vivado