## ActiveCore

Laboratory work manual

# Using UDM bus transactor in FPGA designs

Author:

Alexander Antonov

antonov.alex.alex@gmail.com

### Contents

- 1. Target skills
- 2. Overview
- 3. Prerequisites
- 4. Task
- 5. Guidance
  - 1. Examine UDM baseline project
  - 2. (if FPGA board available) Implement UDM project in FPGA device and verify correctness of the baseline
  - 3. Design RTL module in synthesizable SystemVerilog HDL
  - 4. Integrate your design with UDM bus master module
  - 5. Write the testbench and simulate to verify correctness of your design
  - 6. Implement your design, collect, and analyze metrics of the implementation
  - 7. (if FPGA board available) Write HW validation test matching the testbench
  - 8. (if FPGA board available) Validate the design in FPGA

#### 1. TARGET SKILLS

- Developing observable and controllable hardware designs using UDM bus transactor
- Verifying UDM-managed designs in simulation environment
- Using Xilinx FPGA and Vivado Design Suite for implementation of UDM-managed designs
- Validating UDM-managed designs in FPGA from PC programming environment

#### 2. OVERVIEW

This laboratory work introduces the design flow of an FPGA project managed by the UDM bus transactor, the project structure, and the role of its components. It covers how to add custom RTL written in the SystemVerilog hardware description language, write testbenches, verify the design in a simulation environment, validate the design in an FPGA device, and collect metrics of the resulting implementation.

#### 3. PREREQUISITES

- 1. Xilinx Vivado 2019.1 HLx Edition (free for target board, available at https://www.xilinx.com/support/download.html).
- 2. ActiveCore baseline distribution (available at https://github.com/AntonovAlexander/activecore)
- 3. (for FPGA validation) Digilent Nexys A7 (Nexys 4 DDR) FPGA board (https://digilent.com/shop/nexys-a7-fpga-trainer-board-recommended-for-ece-curriculum/)
- 4. (for FPGA validation) working Python 3 installation with pyserial package

#### 4. TASK

- 1. Examine UDM baseline project
- 2. (if FPGA board available) Implement UDM project in FPGA device and validate correctness of the baseline
- 3. Design RTL module in synthesizable SystemVerilog HDL according to your variant
- 4. Integrate your design with UDM bus master module
- 5. Write the testbench and simulate to verify correctness of your design
- 6. Implement your design, collect, and analyze metrics of the implementation
- 7. (if FPGA board available) Write HW validation test matching the testbench
- 8. (if FPGA board available) Program the design in FPGA board and make sure the design operates correctly

#### 5. GUIDANCE

Detailed guidance is provided using the example of a custom pipelined module that searches for the maximum value in a 16-element array and returns this value together with its index in the array.

#### 1. Examine UDM baseline project

UDM (UART-based **D**ebug **M**odule) is a bus master module that executes bus transactions controlled via a serial port interface. It provides basic initialization, communication, and debug capabilities for custom cores in FPGA fabric, allowing a PC to "emulate" a CPU host in a System-on-Chip design. The UDM block requires minimal setup, can be implemented in minutes, consumes minimal resources (< 1% of LUTs and FFs on the target board), and requires no additional HW except for default serial port connectivity.

UDM block diagram is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/udm/doc/udm_baseline_struct.png

**NOTE**: only 4-byte aligned accesses are allowed.

The project is located at: activecore/designs/rtl/udm/syn/NEXYS4\_DDR. Open NEXYS4\_DDR.xpr file using Xilinx Vivado.

**NOTE**: avoid spaces and non-English characters in the project location path. Also avoid very long project location paths.

#### 2. (if FPGA board available) Implement UDM project in FPGA device and validate correctness of the baseline

Press "Generate Bitstream" button in Vivado to generate bitstream. Upload the bitstream to FPGA device.

Find out the name of the COM port associated with the board (COM<number> on Windows hosts, or a tty device such as /dev/ttyUSB<number> on Linux hosts).
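If pyserial is installed, the candidate ports can also be listed from Python. The following is a minimal sketch, not part of the ActiveCore distribution: `pick_port` and `list_serial_ports` are hypothetical helper names, and matching on a description substring is an assumption about how the host reports the board's USB-UART bridge, so adjust the hint for your system.

```python
def pick_port(ports, hint):
    """Return device names whose description contains `hint` (case-insensitive).

    `ports` is an iterable of (device, description) pairs.
    """
    hint = hint.lower()
    return [device for device, description in ports if hint in description.lower()]


def list_serial_ports():
    """List (device, description) pairs for the serial ports visible to pyserial."""
    # Imported lazily so that pick_port stays usable without pyserial installed.
    from serial.tools import list_ports
    return [(p.device, p.description) for p in list_ports.comports()]


# Example with made-up port data; on a real host use list_serial_ports() instead.
example_ports = [
    ("COM3", "USB Serial Port (COM3)"),
    ("COM1", "Communications Port (COM1)"),
]
print(pick_port(example_ports, "usb serial"))
```

On a real host, `pick_port(list_serial_ports(), "<hint>")` narrows the list down; unplugging and replugging the board and comparing the two listings is a reliable fallback.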

Open test Python script (located at activecore/designs/rtl/udm/sw/udm\_test.py) and fill the correct COM port name in line 7:

```
udm = udm("<correct COM port name>", 921600)
```

Run UDM test using udm\_test.py Python script. The script will connect to the board and check response. The console output should be:

```
Connecting COM port...

COM port connected

Connection established, response: 0x55

SW read: <value on switches>
---- memtest32 started, word size: 1024 ----
---- memtest32 PASSED ----
```

The script does the following:

- 1) Writes the value 0xaa55 to the CSR mapped on LEDs using the udm.wr32(addr, wdata) function
- 2) Reads the CSR mapped on switches and prints this value
- 3) Tests the testmem memory block using the udm.memtest32(addr, wsize) function

Type help(udm) in the Python console for the full API reference.

#### 3. Design RTL module in synthesizable SystemVerilog HDL

Example implementation is a 4-stage pipeline. The pipeline schedule is shown in Table 1:

| C-step number | Operation                                                               |
|---------------|-------------------------------------------------------------------------|
| 0             | compare elements in pairs: 0-1; 2-3; 4-5; 6-7; 8-9; 10-11; 12-13; 14-15 |
| 1             | compare pairing results from stage 0 in pairs: 0-1; 2-3; 4-5; 6-7       |
| 2             | compare pairing results from stage 1 in pairs: 0-1; 2-3                 |
| 3             | compare pairing results from stage 2                                    |

Table 1 Schedule for pipelined implementation
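The schedule in Table 1 can be checked against a small software reference model before writing any RTL. The sketch below is an illustration only: the function names are hypothetical, and the tie rule (the second operand of a pair wins when values are equal) is an assumption that mirrors an `if (a > b) ... else ...` comparator.

```python
def compare_pairs(candidates):
    """One C-step: keep the larger (value, index) candidate of each adjacent pair.

    On a tie the second candidate wins, mirroring an if (a > b) ... else ... comparator.
    """
    return [
        left if left[0] > right[0] else right
        for left, right in zip(candidates[0::2], candidates[1::2])
    ]


def find_max_tree(elems):
    """Reduce the elements pairwise until one candidate remains.

    Returns (max_value, index_of_max, number_of_c_steps).
    """
    if len(elems) == 0 or len(elems) & (len(elems) - 1):
        raise ValueError("number of elements must be a power of two")
    # Values are treated as unsigned 32-bit words.
    candidates = [(value & 0xFFFFFFFF, index) for index, value in enumerate(elems)]
    steps = 0
    while len(candidates) > 1:
        candidates = compare_pairs(candidates)
        steps += 1
    value, index = candidates[0]
    return value, index, steps


data = [9, 2, 7, 7, 1, 12, 3, 4, 0, 5, 12, 6, 8, 8, 2, 1]
print(find_max_tree(data))  # -> (12, 10, 4)
```

For 16 elements the reduction takes four C-steps, matching the four pipeline stages. With this tie rule, a duplicated maximum resolves to its last occurrence (index 10 rather than 5 in the example).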

Microarchitectural diagram (in terms of combinational clouds, registers, and memories) is shown in Figure 1.



Figure 1 Microarchitectural diagram for fully pipelined implementation

Source code for the example module is shown in Listing 1:

```
module FindMaxVal_pipelined (
    input clk_i
    , input rst_i
    , input [31:0] elem_bi [15:0]
    , output logic [31:0] max_elem_bo
    , output logic [3:0] max_index_bo
);

//// stage 0 ////
// intermediate signals declaration
logic [31:0] max_elem_stage0 [7:0];
logic [31:0] max_index_stage0 [7:0];
logic [31:0] max_elem_stage0_next [7:0];
logic [31:0] max_index_stage0_next [7:0];

// combinational logic
always @*
    begin
    for (integer i=0; i<8; i++)
        begin
        max_elem_stage0_next[i] = 0;
        max_index_stage0_next[i] = 0;
        if (elem_bi[(i<<1)] > elem_bi[(i<<1)+1])
            begin
            max_elem_stage0_next[i] = elem_bi[(i<<1)];
            max_index_stage0_next[i] = i<<1;
            end
        else
            begin
            max_elem_stage0_next[i] = elem_bi[(i<<1)+1];
            max_index_stage0_next[i] = (i<<1)+1;
            end
        end
    end

// writing to registers
always @(posedge clk_i)
    begin
    if (rst_i)
        begin
        for (integer i=0; i<8; i++) max_elem_stage0[i] <= 0;
        for (integer i=0; i<8; i++) max_index_stage0[i] <= 0;
        end
    else
        begin
        for (integer i=0; i<8; i++) max_elem_stage0[i] <= max_elem_stage0_next[i];
        for (integer i=0; i<8; i++) max_index_stage0[i] <= max_index_stage0_next[i];
        end
    end

//// stage 1 ////
// intermediate signals declaration
logic [31:0] max_elem_stage1 [3:0];
logic [31:0] max_index_stage1 [3:0];
logic [31:0] max_elem_stage1_next [3:0];
logic [31:0] max_index_stage1_next [3:0];

// combinational logic
always @*
    begin
    for (integer i=0; i<4; i++)
        begin
        max_elem_stage1_next[i] = 0;
        max_index_stage1_next[i] = 0;
        if (max_elem_stage0[(i<<1)] > max_elem_stage0[(i<<1)+1])
            begin
            max_elem_stage1_next[i] = max_elem_stage0[(i<<1)];
            max_index_stage1_next[i] = max_index_stage0[(i<<1)];
            end
        else
            begin
            max_elem_stage1_next[i] = max_elem_stage0[(i<<1)+1];
            max_index_stage1_next[i] = max_index_stage0[(i<<1)+1];
            end
        end
    end

// writing to registers
always @(posedge clk_i)
    begin
    if (rst_i)
        begin
        for (integer i=0; i<4; i++) max_elem_stage1[i] <= 0;
        for (integer i=0; i<4; i++) max_index_stage1[i] <= 0;
        end
    else
        begin
        for (integer i=0; i<4; i++) max_elem_stage1[i] <= max_elem_stage1_next[i];
        for (integer i=0; i<4; i++) max_index_stage1[i] <= max_index_stage1_next[i];
        end
    end

//// stage 2 ////
// intermediate signals declaration
logic [31:0] max_elem_stage2 [1:0];
logic [31:0] max_index_stage2 [1:0];
logic [31:0] max_elem_stage2_next [1:0];
logic [31:0] max_index_stage2_next [1:0];

// combinational logic
always @*
    begin
    for (integer i=0; i<2; i++)
        begin
        max_elem_stage2_next[i] = 0;
        max_index_stage2_next[i] = 0;
        if (max_elem_stage1[(i<<1)] > max_elem_stage1[(i<<1)+1])
            begin
            max_elem_stage2_next[i] = max_elem_stage1[(i<<1)];
            max_index_stage2_next[i] = max_index_stage1[(i<<1)];
            end
        else
            begin
            max_elem_stage2_next[i] = max_elem_stage1[(i<<1)+1];
            max_index_stage2_next[i] = max_index_stage1[(i<<1)+1];
            end
        end
    end

// writing to registers
always @(posedge clk_i)
    begin
    if (rst_i)
        begin
        for (integer i=0; i<2; i++) max_elem_stage2[i] <= 0;
        for (integer i=0; i<2; i++) max_index_stage2[i] <= 0;
        end
    else
        begin
        for (integer i=0; i<2; i++) max_elem_stage2[i] <= max_elem_stage2_next[i];
        for (integer i=0; i<2; i++) max_index_stage2[i] <= max_index_stage2_next[i];
        end
    end

//// stage 3 ////
// intermediate signals declaration
logic [31:0] max_elem_next;
logic [3:0] max_index_next;

// combinational logic
always @*
    begin
    max_elem_next = 0;
    max_index_next = 0;
    if (max_elem_stage2[0] > max_elem_stage2[1])
        begin
        max_elem_next = max_elem_stage2[0];
        max_index_next = max_index_stage2[0];
        end
    else
        begin
        max_elem_next = max_elem_stage2[1];
        max_index_next = max_index_stage2[1];
        end
    end

// writing to registers
always @(posedge clk_i)
    begin
    if (rst_i)
        begin
        max_elem_bo <= 0;
        max_index_bo <= 0;
        end
    else
        begin
        max_elem_bo <= max_elem_next;
        max_index_bo <= max_index_next;
        end
    end

endmodule
```

Listing 1 Source code of the FindMaxVal\_pipelined module in SystemVerilog HDL

#### 4. Integrate your design with UDM bus master module

Add your created design file to the project using Vivado GUI.

UDM exposes a system bus into FPGA fabric for custom logic integration and testing. The UDM bus has a simple, RAM-like protocol, supports pipelined transactions, and can easily be converted to various standard protocols (AMBA AHB, Avalon, Wishbone, etc.).

UDM write transaction waveform is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/udm/doc/udm_bus_wr_waveform.svg

UDM read transaction waveform is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/udm/doc/udm_bus_rd_waveform.svg

UDM has several predefined addresses where LED and switches control and status registers (CSRs) are mapped, as well as test memory. Address map of UDM baseline is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/udm/doc/udm_baseline_addr_map.md

Now we add custom CSRs to manage operation of the designed logic in top wrapper module. We need 16 CSRs for input data and 2 CSRs for output data. We should map these CSRs on free addresses, not overlapping with other CSRs and memories.

Here we map the CSRs on the following addresses:

Input data CSRs:

- csr\_elem\_in: 0x10000000-0x1000003C (16 elements with 4-byte stride)

Output data CSRs:

- csr\_max\_elem\_out: 0x20000000
- csr\_max\_index\_out: 0x20000004
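The address arithmetic behind this map can be sketched in a few lines of Python. The constant and function names below are hypothetical (they do not come from the ActiveCore sources); the index decode uses address bits [5:2], the same slice the wrapper in Listing 2 uses to select an input CSR.

```python
CSR_ELEM_IN_BASE = 0x10000000   # first input CSR
CSR_MAX_ELEM_OUT = 0x20000000   # output CSR: maximum value
CSR_MAX_INDEX_OUT = 0x20000004  # output CSR: index of the maximum
NUM_ELEMS = 16


def elem_addr(index):
    """Bus address of input element `index` (4-byte stride)."""
    if not 0 <= index < NUM_ELEMS:
        raise IndexError("element index out of range")
    return CSR_ELEM_IN_BASE + 4 * index


def elem_index(addr):
    """Element index selected by a bus address: bits [5:2] of the address."""
    if addr % 4:
        raise ValueError("only 4-byte aligned accesses are allowed")
    return (addr >> 2) & 0xF


print([hex(elem_addr(i)) for i in (0, 1, 15)])
```

The last element lands at 0x1000003c, which is the upper bound of the input CSR range given above.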

Instantiate the CSRs and the designed module in top wrapper module (NEXYS4\_DDR.sv) and connect it to custom CSRs. Resulting code is shown in Listing 2 (modified parts are highlighted in cyan).

```
module NEXYS4_DDR
#( parameter SIM = "NO" )
(
    input CLK100MHZ
    , input CPU_RESETN

    , input [15:0] SW
    , output logic [15:0] LED

    , input UART_TXD_IN
    , output UART_RXD_OUT
);

localparam UDM_BUS_TIMEOUT = (SIM == "YES") ? 100 : (1024*1024*100);
localparam UDM_RTX_EXTERNAL_OVERRIDE = (SIM == "YES") ? "YES" : "NO";

logic clk_gen;
logic pll_locked;

sys_clk sys_clk
(
    .clk_in1(CLK100MHZ)
    , .reset(!CPU_RESETN)
    , .clk_out1(clk_gen)
    , .locked(pll_locked)
);

logic arst;
assign arst = !(CPU_RESETN & pll_locked);

logic srst;
reset_cntrl reset_cntrl
(
  .clk_i(clk_gen),
  .arst_i(arst),
  .srst_o(srst)
);

logic udm_reset;

MemSplit32 udm_bus();

udm
#(
    .BUS_TIMEOUT(UDM_BUS_TIMEOUT)
    , .RTX_EXTERNAL_OVERRIDE(UDM_RTX_EXTERNAL_OVERRIDE)
) udm (
  .clk_i(clk_gen)
  , .rst_i(srst)

  , .rx_i(UART_TXD_IN)
  , .tx_o(UART_RXD_OUT)

  , .rst_o(udm_reset)

  , .bus_req_o(udm_bus.req)
  , .bus_we_o(udm_bus.we)
  , .bus_addr_bo(udm_bus.addr)
  , .bus_be_bo(udm_bus.be)
  , .bus_wdata_bo(udm_bus.wdata)
  , .bus_ack_i(udm_bus.ack)
  , .bus_resp_i(udm_bus.resp)
  , .bus_rdata_bi(udm_bus.rdata)
);

localparam CSR_LED_ADDR      = 32'h00000000;
localparam CSR_SW_ADDR       = 32'h00000004;
localparam TESTMEM_ADDR      = 32'h80000000;
localparam TESTMEM_WSIZE_POW = 10;
localparam TESTMEM_WSIZE     = 2**TESTMEM_WSIZE_POW;

logic testmem_udm_enb;
assign testmem_udm_enb = (!(udm_bus.addr < TESTMEM_ADDR) && (udm_bus.addr < (TESTMEM_ADDR + (TESTMEM_WSIZE*4))));

logic testmem_udm_we;
logic [TESTMEM_WSIZE_POW-1:0] testmem_udm_addr;
logic [31:0] testmem_udm_wdata;
logic [31:0] testmem_udm_rdata;

logic testmem_p1_we;
logic [TESTMEM_WSIZE_POW-1:0] testmem_p1_addr;
logic [31:0] testmem_p1_wdata;
logic [31:0] testmem_p1_rdata;

// testmem's port1 is inactive
assign testmem_p1_we = 1'b0;
assign testmem_p1_addr = 0;
assign testmem_p1_wdata = 0;

ram_dual #(
    .init_type("none")
    , .init_data("nodata.hex")
    , .dat_width(32)
    , .adr_width(TESTMEM_WSIZE_POW)
    , .mem_size(TESTMEM_WSIZE)
) testmem (
    .clk(clk_gen)

    , .dat0_i(testmem_udm_wdata)
    , .adr0_i(testmem_udm_addr)
    , .we0_i(testmem_udm_we)
    , .dat0_o(testmem_udm_rdata)

    , .dat1_i(testmem_p1_wdata)
    , .adr1_i(testmem_p1_addr)
    , .we1_i(testmem_p1_we)
    , .dat1_o(testmem_p1_rdata)
);

assign udm_bus.ack = udm_bus.req;   // bus always ready to accept request
logic csr_resp, testmem_resp, testmem_resp_dly;
logic [31:0] csr_rdata;

// CSR instantiation
logic [31:0] csr_elem_in [15:0];
logic [31:0] csr_max_elem_out;
logic [3:0] csr_max_index_out;

// module instantiation
FindMaxVal_pipelined FindMaxVal_inst (
    .clk_i(clk_gen)
    , .rst_i(srst)

    , .elem_bi(csr_elem_in)
    , .max_elem_bo(csr_max_elem_out)
    , .max_index_bo(csr_max_index_out)
);

// bus request
always @(posedge clk_gen)
    begin

    testmem_udm_we <= 1'b0;
    testmem_udm_addr <= 0;
    testmem_udm_wdata <= 0;

    csr_resp <= 1'b0;
    testmem_resp_dly <= 1'b0;
    testmem_resp <= testmem_resp_dly;

    if (srst) LED <= 16'hffff;

    // asserting default values to input CSRs on reset
    if (srst)
        begin
        for (int i=0; i<16; i++)
            begin
            csr_elem_in[i] <= 0;
            end
        end

    if (udm_bus.req && udm_bus.ack)
        begin

        if (udm_bus.we)     // writing
            begin
            if (udm_bus.addr == CSR_LED_ADDR) LED <= udm_bus.wdata;
            if (udm_bus.addr[31:28] == 4'h1) csr_elem_in[udm_bus.addr[5:2]] <= udm_bus.wdata;
            if (testmem_udm_enb)
                begin
                testmem_udm_we <= 1'b1;
                testmem_udm_addr <= udm_bus.addr[31:2];     // 4-byte aligned access only
                testmem_udm_wdata <= udm_bus.wdata;
                end
            end

        else                // reading
            begin
            if (udm_bus.addr == CSR_LED_ADDR)
                begin
                csr_resp <= 1'b1;
                csr_rdata <= LED;
                end
            if (udm_bus.addr == CSR_SW_ADDR)
                begin
                csr_resp <= 1'b1;
                csr_rdata <= SW;
                end
            if (udm_bus.addr == 32'h20000000)
                begin
                csr_resp <= 1'b1;
                csr_rdata <= csr_max_elem_out;
                end
            if (udm_bus.addr == 32'h20000004)
                begin
                csr_resp <= 1'b1;
                csr_rdata <= csr_max_index_out;
                end
            if (testmem_udm_enb)
                begin
                testmem_udm_we <= 1'b0;
                testmem_udm_addr <= udm_bus.addr[31:2];     // 4-byte aligned access only
                testmem_udm_wdata <= udm_bus.wdata;
                testmem_resp_dly <= 1'b1;
                end
            end
        end
    end

// bus response
always @*
    begin
    udm_bus.resp = csr_resp | testmem_resp;
    udm_bus.rdata = 0;
    if (csr_resp) udm_bus.rdata = csr_rdata;
    if (testmem_resp) udm_bus.rdata = testmem_udm_rdata;
    end

endmodule
```

Listing 2 Source code of the updated NEXYS4\_DDR.sv module

#### 5. Write the testbench and simulate to verify correctness of your design

The basic testbench functionality consists of the following operations:

- write the input data (stimulus) to the target synthesizable module (Design Under Test, DUT);
- start the computation (not needed here);
- read and verify the result.

Go to the testbench file (tb.sv) and find the main test procedure (the initial block at the end of the file). Fill the input data with test values and retrieve the result. The resulting initial block for our example is shown in Listing 3 (modified parts are highlighted in cyan).

```
initial
  begin
  logic [31:0] wrdata [];
  integer ARRSIZE=10;

  $display ("### SIMULATION STARTED ###");

  SW = 8'h30;
  RESET_ALL();
  WAIT(100);

  udm.cfg(`DIVIDER_115200, 2'b00);
  udm.check();
  udm.hreset();

  // test data initialization
  udm.wr32(32'h10000000, 32'h112233cc);
  udm.wr32(32'h10000004, 32'h55aa55aa);
  udm.wr32(32'h10000008, 32'h01010202);
  udm.wr32(32'h1000000C, 32'h44556677);
  udm.wr32(32'h10000010, 32'h00000003);
  udm.wr32(32'h10000014, 32'h00000004);
  udm.wr32(32'h10000018, 32'h00000005);
  udm.wr32(32'h1000001C, 32'h00000006);
  udm.wr32(32'h10000020, 32'h00000007);
  udm.wr32(32'h10000024, 32'hdeadbeef);
  udm.wr32(32'h10000028, 32'hfefe8800);
  udm.wr32(32'h1000002C, 32'h23344556);
  udm.wr32(32'h10000030, 32'h05050505);
  udm.wr32(32'h10000034, 32'h07070707);
  udm.wr32(32'h10000038, 32'h99999999);
  udm.wr32(32'h1000003C, 32'hbadc0ffe);

  // fetching results
  udm.rd32(32'h20000000);
  udm.rd32(32'h20000004);

  WAIT(1000);

  $display ("### TEST PROCEDURE FINISHED ###");
  $stop;
  end
```

Listing 3 Test procedure for the designed module

Note that maximum value is 0xfefe8800 at index 10 (0xa).
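Typing sixteen write calls by hand is error-prone, so the stimulus lines can be generated from a single data list. This is a minimal sketch: `sv_stimulus` is a hypothetical helper, not part of the ActiveCore flow, and the expected-result check at the end assumes the maximum occurs only once in the data.

```python
def sv_stimulus(values, base=0x10000000):
    """Return SystemVerilog udm.wr32 calls that load `values` at a 4-byte stride."""
    return [
        "udm.wr32(32'h{:08x}, 32'h{:08x});".format(base + 4 * i, v & 0xFFFFFFFF)
        for i, v in enumerate(values)
    ]


TEST_DATA = [
    0x112233cc, 0x55aa55aa, 0x01010202, 0x44556677,
    0x00000003, 0x00000004, 0x00000005, 0x00000006,
    0x00000007, 0xdeadbeef, 0xfefe8800, 0x23344556,
    0x05050505, 0x07070707, 0x99999999, 0xbadc0ffe,
]

lines = sv_stimulus(TEST_DATA)
print("\n".join(lines))

# Expected result (valid as written only when the maximum is unique).
expected_max = max(TEST_DATA)
expected_index = TEST_DATA.index(expected_max)
print(hex(expected_max), expected_index)  # -> 0xfefe8800 10
```

Keeping the data in one list also makes it easy to reuse the same values in the Python HW validation test later on.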

Now run the simulation. Add all interesting signals (including system bus interface and your module internals) to the waveform. The signals can be added using context menu on signals listed in Vivado GUI (see Figure 2).



Figure 2 Adding signals to waveform

If needed, change waveform style for selected signals (digital/analog) and radix (binary, hexadecimal, decimal, etc), see Figure 3.



Figure 3 Waveform configuration

**NOTE**: To speed up UDM simulation, serial connection is bypassed. Keep in mind that UART is a low-speed interface, and transactions will take more time to complete in hardware than shown in simulation.
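To get a feeling for the difference, the serial transfer time can be estimated with simple arithmetic. The sketch below assumes 8N1 framing (10 bit times per byte) and counts only the four data bytes of a 32-bit word; the UDM protocol adds command and address bytes that are not modelled here, so the result is a lower bound rather than a measured figure.

```python
def uart_transfer_time_s(n_bytes, baud=921600, bits_per_byte=10):
    """Time to shift `n_bytes` over a UART; 10 bits per byte assumes 8N1 framing."""
    return n_bytes * bits_per_byte / baud


payload_s = uart_transfer_time_s(4)   # data bytes of one 32-bit word only
clock_period_s = 10e-9                # 100 MHz system clock

print("4-byte payload: {:.1f} us".format(payload_s * 1e6))
print("equivalent clock cycles: {:.0f}".format(payload_s / clock_period_s))
```

Even this lower bound corresponds to thousands of system clock cycles per word, which is why the 4-cycle pipeline latency is invisible from the PC side.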

Console output for simulation is shown in Listing 4.

```
UDM WR32: addr: 0x10000000, data: 0x112233cc
UDM WR32: addr: 0x10000004, data: 0x55aa55aa
UDM WR32: addr: 0x10000008, data: 0x01010202
UDM WR32: addr: 0x1000000c, data: 0x44556677
UDM WR32: addr: 0x10000010, data: 0x00000003
UDM WR32: addr: 0x10000014, data: 0x00000004
UDM WR32: addr: 0x10000018, data: 0x00000005
UDM WR32: addr: 0x1000001c, data: 0x00000006
UDM WR32: addr: 0x10000020, data: 0x00000007
UDM WR32: addr: 0x10000024, data: 0xdeadbeef
UDM WR32: addr: 0x10000028, data: 0xfefe8800
UDM WR32: addr: 0x1000002c, data: 0x23344556
UDM WR32: addr: 0x10000030, data: 0x05050505
UDM WR32: addr: 0x10000034, data: 0x07070707
UDM WR32: addr: 0x10000038, data: 0x99999999
UDM WR32: addr: 0x1000003c, data: 0xbadc0ffe
UDM RD32: addr: 0x20000000, data: 0xfefe8800
UDM RD32: addr: 0x20000004, data: 0x0000000a
### TEST PROCEDURE FINISHED ###
```

Listing 4 Console output of simulation

Note that max element and its index have been read correctly at addresses 0x20000000 and 0x20000004 respectively (highlighted in cyan). Waveform for the simulation is shown in Figure 4.



Figure 4 Waveform of simulation

The simulation results are correct: the DUT works as intended.

#### 6. Implement your design, collect, and analyze metrics of the implementation

Press "Generate Bitstream" to run implementation and obtain the image for FPGA device.

Metric values are the following:

- Timing:
  - WNS (worst negative slack): 4.883 ns (fine)
  - TNS (total negative slack): 0 ns (fine)
- Performance:
  - Clock frequency: 100 MHz (period: 10 ns)
  - Initiation interval: 1 clock cycle; 10 ns
  - Throughput: 1 op/cycle; 100 Mop/second
  - Latency: 4 clock cycles; 40 ns
- HW resources (Implementation → Open Implemented Design → Report Utilization):
  - LUTs: 498
  - FFs (registers): 506
The timing closure is **successful**.
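The performance figures follow from the clock frequency, the initiation interval, and the pipeline depth, and the slack gives a rough idea of the remaining frequency headroom. The sketch below reproduces that arithmetic; the function name is hypothetical, and the Fmax figure is only a first-order estimate (period minus WNS), not a value reported by Vivado.

```python
def pipeline_metrics(clock_mhz, initiation_interval, latency_cycles, wns_ns):
    """Derive performance figures from basic implementation parameters."""
    period_ns = 1000.0 / clock_mhz
    # Positive slack means the critical path is shorter than the clock period.
    critical_path_ns = period_ns - wns_ns
    return {
        "period_ns": period_ns,
        "throughput_mops": clock_mhz / initiation_interval,
        "latency_ns": latency_cycles * period_ns,
        "critical_path_ns": critical_path_ns,
        "fmax_estimate_mhz": 1000.0 / critical_path_ns,
    }


m = pipeline_metrics(clock_mhz=100, initiation_interval=1, latency_cycles=4, wns_ns=4.883)
for name, value in m.items():
    print("{}: {:.3f}".format(name, value))
```

With the values above, the critical path is about 5.1 ns, so the design would be expected to close timing at a noticeably higher clock frequency; confirming that requires re-running implementation with a tighter constraint.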

#### 7. (if FPGA board available) Write HW validation test matching the testbench

Open test Python script (located at activecore/designs/rtl/udm/sw/udm\_test.py) and write the test program matching SystemVerilog testbench. This program is needed for HW testing in FPGA board. Source code for the program is shown in Listing 5.

```
from __future__ import division

import udm
from udm import *

udm = udm('<your COM port name>', 921600)

# test data initialization
udm.wr32(0x10000000, 0x112233cc)
udm.wr32(0x10000004, 0x55aa55aa)
udm.wr32(0x10000008, 0x01010202)
udm.wr32(0x1000000C, 0x44556677)
udm.wr32(0x10000010, 0x00000003)
udm.wr32(0x10000014, 0x00000004)
udm.wr32(0x10000018, 0x00000005)
udm.wr32(0x1000001C, 0x00000006)
udm.wr32(0x10000020, 0x00000007)
udm.wr32(0x10000024, 0xdeadbeef)
udm.wr32(0x10000028, 0xfefe8800)
udm.wr32(0x1000002C, 0x23344556)
udm.wr32(0x10000030, 0x05050505)
udm.wr32(0x10000034, 0x07070707)
udm.wr32(0x10000038, 0x99999999)
udm.wr32(0x1000003C, 0xbadc0ffe)

# fetching results
print("csr_max_elem_out:", hex(udm.rd32(0x20000000)))
print("csr_max_index_out:", hex(udm.rd32(0x20000004)))
```

Listing 5 HW test program in Python
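Before the board is connected, the write/read sequence can be dry-run against a software stand-in. The class below is a hypothetical sketch and not part of the ActiveCore udm package: it models only the example's input and output CSRs (no LEDs, switches, test memory, or bus latency), and it assumes that on equal values the last occurrence of the maximum is reported, as a comparator tree with an `if (a > b) ... else ...` rule would do.

```python
class FakeUdm:
    """Hypothetical stand-in for the udm object; models only the example's CSRs."""

    ELEM_WINDOW = 0x1            # addr[31:28] of the input CSR window
    MAX_ELEM_ADDR = 0x20000000
    MAX_INDEX_ADDR = 0x20000004

    def __init__(self):
        self.elems = [0] * 16    # reset value of the input CSRs

    @staticmethod
    def _check_aligned(addr):
        if addr % 4:
            raise ValueError("only 4-byte aligned accesses are allowed")

    def wr32(self, addr, wdata):
        self._check_aligned(addr)
        if (addr >> 28) == self.ELEM_WINDOW:
            # Element index comes from address bits [5:2].
            self.elems[(addr >> 2) & 0xF] = wdata & 0xFFFFFFFF

    def rd32(self, addr):
        self._check_aligned(addr)
        # Largest value wins; among equal values the highest index wins.
        index = max(range(16), key=lambda i: (self.elems[i], i))
        if addr == self.MAX_ELEM_ADDR:
            return self.elems[index]
        if addr == self.MAX_INDEX_ADDR:
            return index
        raise ValueError("address not modelled: " + hex(addr))


udm = FakeUdm()
test_data = [
    0x112233cc, 0x55aa55aa, 0x01010202, 0x44556677,
    0x00000003, 0x00000004, 0x00000005, 0x00000006,
    0x00000007, 0xdeadbeef, 0xfefe8800, 0x23344556,
    0x05050505, 0x07070707, 0x99999999, 0xbadc0ffe,
]
for i, value in enumerate(test_data):
    udm.wr32(0x10000000 + 4 * i, value)

print("csr_max_elem_out:", hex(udm.rd32(0x20000000)))
print("csr_max_index_out:", hex(udm.rd32(0x20000004)))
```

A dry run like this catches address and data typos in the test script; it says nothing about the hardware itself, which still has to be validated on the board.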

#### 8. (if FPGA board available) Validate the design in FPGA

Program the design in FPGA board and make sure the design operates correctly. Output of Python program for our example is shown in Listing 6.

```
Connecting COM port...

COM port connected

Connection established, response: 0x55

csr_max_elem_out: 0xfefe8800
csr_max_index_out: 0xa
```

Listing 6 Output of HW test program in Python

Ensure that output of the HW validation test program matches simulation results. In our case, HW appears to work as intended.