# ActiveCore

Laboratory work manual

# Using Sigma MCU in FPGA designs

Author:

Alexander Antonov

antonov.alex.alex@gmail.com

# Contents

- 1. Target skills
- 2. Overview
- 3. Prerequisites
- 4. Task
- 5. Guidance

### 1. TARGET SKILLS

- Implementation of Sigma MCU in hardware projects
- Building and implementation of embedded software for Sigma MCU
- Choosing optimal CPU configuration of Sigma MCU
- Integration of custom logic with Sigma MCU using its expansion interface
- Using Xilinx FPGA and Vivado Design Suite for implementation of Sigma MCU

### 2. OVERVIEW

This laboratory work covers software (firmware) based implementation of functionality using an embedded programmable processor core. Using programmable processors, though less efficient than direct hardware implementation, offers multiple advantages: simpler programming, faster compilation, software update capability, better availability of engineers, etc. In this lab, a basic open-source MCU with a RISC-V central processing unit (CPU) core will be used. RISC-V is an open instruction set architecture that has been widely used in both academia and industry in recent years.

### 3. PREREQUISITES

- 1. Xilinx Vivado 2019.2 HLx Edition (free for the target board, available at https://www.xilinx.com/support/download.html)
- 2. ActiveCore baseline distribution (available at https://github.com/AntonovAlexander/activecore)
- 3. Generated RISC-V CPU HDL sources
- 4. Working RISC-V GNU toolchain (available at https://github.com/riscv/riscv-gnu-toolchain)
  - **NOTE:** pre-built binaries for various hosts can be downloaded from https://www.sifive.com/software. Do not forget to update the PATH variable after downloading. Consider using Cygwin on Windows hosts.
- 5. (for FPGA prototyping) Digilent Nexys 4 DDR FPGA board (https://store.digilentinc.com/nexys-4-ddr-artix-7-fpga-trainer-board-recommended-for-ece-curriculum/)
- 6. (for FPGA prototyping) Working Python 3 installation with the pyserial package

### 4. TASK

- 1. Examine Sigma MCU baseline project
- 2. (if FPGA board available) Implement Sigma MCU in FPGA device and verify correctness of the baseline
- 3. Write software implementation of functionality for eCPU according to your variant
- 4. Verify functional correctness in simulation
- 5. Implement the design and collect metrics of the implementation
- 6. (if FPGA board available) Upload your program to Sigma MCU and make sure it works correctly
- 7. Analyze performance of implementations
- 8. (optional) Integrate any UDM-compatible module in Sigma MCU

### 5. GUIDANCE

Detailed guidance is provided using the example of a program that searches for the maximum value in a 16-element array and returns this value and its index in the array.

### 1. Examine Sigma MCU baseline project

Sigma MCU is a basic microcontroller unit soft core consisting of the sigma\_tile processing module, UDM, and a general-purpose input/output (GPIO) controller. The GPIO controller is mapped to the LEDs and switches on the FPGA board.

Block diagram of Sigma MCU is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma/doc/sigma\_struct.png

Sigma\_tile module contains embedded CPU (eCPU) core with RISC-V ISA, tightly coupled on-chip RAM with single-cycle delay, interrupt controller, timer, Host InterFace (HIF), and eXpansion InterFace (XIF). Multiple sigma\_tile modules can fit in a single FPGA device. HIF and XIF have the same bus protocol as UDM block. Address maps are identical for UDM and eCPU. Working with UDM can be learned from the corresponding lab work:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/udm/doc/udm\_lab\_manual.pdf

Block diagram of sigma\_tile module is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma\_tile/doc/sigma\_tile\_struct.png

Address map of Sigma MCU is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma/doc/sigma\_addr\_map.md

Address map of sigma\_tile module is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma\_tile/doc/sigma\_tile\_addr\_map.md

**NOTE**: Only 4-byte aligned accesses are supported.
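Because every bus access is a full 32-bit word at a 4-byte aligned address, changing a single byte means reading the containing word, patching it, and writing it back. The sketch below models this in plain Python. The helper names and the dictionary standing in for memory are illustrative assumptions and are not part of the ActiveCore API; little-endian byte order is assumed, as in RISC-V.

```python
WORD_BYTES = 4

def is_word_aligned(addr):
    """Return True if addr is a multiple of 4."""
    return addr % WORD_BYTES == 0

def write_byte(mem, addr, value):
    """Update one byte using only aligned 32-bit reads and writes.

    mem is a dict {aligned address: 32-bit word} standing in for the bus.
    """
    base = addr & ~(WORD_BYTES - 1)      # aligned address of the containing word
    shift = 8 * (addr - base)            # bit offset of the byte (little-endian)
    word = mem.get(base, 0)              # aligned 32-bit read
    word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)
    mem[base] = word & 0xFFFFFFFF        # aligned 32-bit write

mem = {0x100: 0x11223344}
write_byte(mem, 0x102, 0xAB)
print(hex(mem[0x100]))   # 0x11ab3344
```

The same read-modify-write pattern applies whether the access is issued by the eCPU or through UDM, since the text states that both share the same address map.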

The RISC-V eCPU supports basic bare-metal programming (base RV32I ISA, without FPU, MMU, etc.). The ActiveCore distribution provides six Sigma MCU projects with different eCPU configurations (1-6 pipeline stages). Longer pipelines can operate at higher frequencies and deliver better performance, but tend to consume more hardware resources and power.

The projects are located at: activecore/designs/rtl/sigma/syn/syn \*\*xstage/NEXYS4-DDR

Generate RISC-V eCPU HDL sources or unpack the provided coregen archive in the following directory:

```
activecore/designs/rtl/sigma_tile/hw/riscv
```

E.g. the riscv\_5stage.sv file should be located at:

activecore/designs/rtl/sigma\_tile/hw/riscv/coregen/riscv\_5stage/sverilog

Open the NEXYS4\_DDR.xpr file using Xilinx Vivado.

### 2. (if FPGA board available) Implement Sigma MCU in FPGA device and verify correctness of the baseline

Go to activecore/designs/rtl/sigma/sw/benchmarks directory and build eCPU software using make command.

Implement the design, generate the bitstream and upload it to FPGA. LEDs should start blinking with variable speed, depending on value on switches.

Find out the name of the COM port associated with the board (COM<number> on Windows hosts or tty<number> on Linux hosts). Open the hw\_test.py Python test script and fill in the correct COM port name in line 14:

```
udm = udm("<correct COM port name>", 921600)
```
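One way to find the port name is to list the serial ports that pyserial can see. This is a minimal sketch: list_serial_ports is a helper name introduced here, and which entry belongs to the board has to be judged from the printed description.

```python
def list_serial_ports():
    """Return (device, description) pairs for the serial ports pyserial detects."""
    try:
        from serial.tools import list_ports   # part of the pyserial package
    except ImportError:
        return []                             # pyserial is not installed
    return [(port.device, port.description) for port in list_ports.comports()]

for device, description in list_serial_ports():
    print(device, "-", description)
```

Running it once with the board unplugged and once with it connected shows which entry appears, and that device string is the one to pass to udm.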

Run eCPU tests using the hw\_test.py Python script. The script will upload five test programs for eCPU and verify correctness of their operation. The last line of console output should be:

```
Total tests PASSED: 5 , FAILED: 0
```

Type help(sigma) and help(sigma\_tile) in Python console for full API reference of Sigma MCU and sigma\_tile module respectively.

### 3. Write software implementation of functionality for eCPU according to your variant

Sigma MCU distribution provides several demo applications that can be used as reference (see Table 1).

| Demo application   | Description |
|--------------------|-------------|
| heartbeat_variable | A counter that is output to LED register. The period is continuously read from Switches register. Period is implemented as CPU busy waiting. |
| irq_counter        | A counter that is output to LED register. Increment is triggered by interrupt 3 that is mapped on button on FPGA board. |
| median             | Three-element median filter operating on 400-element array of integers. |
| mul_sw             | Software multiplication of two integers producing an integer. |
| qsort              | Quick sort operating on 1024-element array of integers. |
| rsort              | Bucket sort operating on 1024-element array of integers. |
| timer_test         | A counter that is output to LED register. Utilizes the timer to count the period. The period is read from Switches register on reset. |

Table 1 Demo applications provided in Sigma MCU distribution

Write software implementation of your functionality and check its correctness. You can use your standard local gcc installation or an online tool (e.g. cplayground.com) for this task. Test result for our example is shown in Listing 1.

Listing 1 Testing software implementation using cplayground.com

Go to activecore/designs/rtl/sigma/sw/benchmarks directory and add new directory for your software. In our example, the new directory is called findmaxval.

Create a new C source file in the new directory. In our example, the file is called findmaxval.c. Write your program in this file. Source code for the example program is shown in Listing 2:

```
// ARR_SIZE: assumed to be 16, matching the datain array in main
#define ARR_SIZE 16

// IO_LED (the LED output register) is assumed to be defined by the
// support code that the other benchmarks in this directory use

typedef struct {
  unsigned int max_elem;
  unsigned int max_index;
} maxval_data_t;

// Returns the largest element of x and the index of its first occurrence
maxval_data_t FindMaxVal(unsigned int x[ARR_SIZE]) {
  maxval_data_t ret_data;
  ret_data.max_elem = 0;
  ret_data.max_index = 0;
  for (int i=0; i<ARR_SIZE; i++) {
    if (x[i] > ret_data.max_elem) {
      ret_data.max_elem = x[i];
      ret_data.max_index = i;
    }
  }
  return ret_data;
}

// Main
int main( int argc, char* argv[] ) {
  maxval_data_t maxval_data;
  unsigned int datain[16] = { 0x112233cc, 0x55aa55aa, 0x01010202, 0x44556677,
                              0x00000003, 0x00000004, 0x00000005, 0x00000006,
                              0x00000007, 0xdeadbeef, 0xfefe8800, 0x23344556,
                              0x05050505, 0x07070707, 0x99999999, 0xbadc0ffe };

  IO_LED = 0x55aa55aa;             // end of startup, start of FindMaxVal
  maxval_data = FindMaxVal(datain);
  IO_LED = maxval_data.max_index;
  IO_LED = maxval_data.max_elem;
  while (1) {}
}
```

Listing 2 C source code in findmaxval.c

**NOTE:** we output the 0x55aa55aa value to LEDs to mark the end of the startup sequence and the start of the target function FindMaxVal. At the end of the program, we output the max\_index and max\_elem values and send eCPU into an infinite loop.

**NOTE:** since Sigma MCU does not have standard output, we use LEDs to output resulting values.
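The values expected on the LEDs can be worked out beforehand with a small host-side reference model. The sketch below is written in Python rather than C purely for convenience; find_max_val is a hypothetical helper mirroring FindMaxVal, and the data are the sixteen input words of the example program.

```python
def find_max_val(values):
    """Return (max_elem, max_index), keeping the first maximum like the C loop."""
    max_elem, max_index = 0, 0
    for index, value in enumerate(values):
        if value > max_elem:
            max_elem, max_index = value, index
    return max_elem, max_index

datain = [
    0x112233cc, 0x55aa55aa, 0x01010202, 0x44556677,
    0x00000003, 0x00000004, 0x00000005, 0x00000006,
    0x00000007, 0xdeadbeef, 0xfefe8800, 0x23344556,
    0x05050505, 0x07070707, 0x99999999, 0xbadc0ffe,
]

max_elem, max_index = find_max_val(datain)
print(hex(max_elem), max_index)      # value and index expected on IO_LED
print(hex(max_elem & 0xFFFF))        # 16 least significant bits of the value
```

Comparing these numbers with the sequence written to IO_LED gives a quick check that is independent of the C toolchain.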

Prepare the executable image for eCPU. Open the Makefile in the activecore/designs/rtl/sigma/sw/benchmarks directory and add a reference to the new directory to the bmarks variable (the added line is findmaxval). The updated bmarks assignment is shown in Listing 3:

```
bmarks = \
    <available applications>
    rsort \
    findmaxval \
    <commented lines>
```

Listing 3 Source code of the updated bmarks assignment in Makefile

Call the make command from the activecore/designs/rtl/sigma/sw/benchmarks directory to build the program image.

**NOTE:** since Sigma MCU does not support hardware multiplication, consider using a software implementation if multiplication is needed. The example program mul\_sw is included in the ActiveCore distribution.
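For reference, the classic shift-and-add scheme is one way to multiply using only shifts, additions, and bit tests, all of which the base RV32I instruction set provides. The Python model below illustrates the idea with 32-bit wraparound. It is not the code of mul\_sw, and mul32_shift_add is a name chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF

def mul32_shift_add(a, b):
    """Multiply two unsigned 32-bit values, returning the low 32 bits of the product."""
    a &= MASK32
    b &= MASK32
    product = 0
    while b:
        if b & 1:                         # lowest multiplier bit set: add the shifted multiplicand
            product = (product + a) & MASK32
        a = (a << 1) & MASK32             # next multiplier bit weighs twice as much
        b >>= 1
    return product

print(mul32_shift_add(1234, 5678))        # 7006652
```

The loop runs once per significant bit of the multiplier, so passing the smaller operand as b shortens the computation.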

### 4. Verify functional correctness in simulation

Open the testbench file activecore/designs/rtl/sigma/tb/riscv\_tb.sv, set up the CPU configuration, and make the mem\_data parameter of the sigma instance refer to your program image. For our example, code updates are shown in Listing 4.

```
sigma
#(
   //.CPU("riscv_1stage"),
   //.CPU("riscv_2stage"),
   //.CPU("riscv_3stage"),
   //.CPU("riscv_4stage"),
   .CPU("riscv_5stage"),
   //.CPU("riscv_6stage"),
   .delay_test_flag(0),
   //.mem_data("../../sw/benchmarks/heartbeat_variable.riscv.hex"),
   //.mem_data("../../sw/benchmarks/median.riscv.hex"),
   //.mem_data("../../sw/benchmarks/qsort.riscv.hex"),
   //.mem_data("../../sw/benchmarks/rsort.riscv.hex"),
   .mem_data("<PATH_TO_ACTIVECORE>/activecore/designs/rtl/sigma/sw/benchmarks/findmaxval.riscv.hex"),
   .mem_size(8192)
) sigma
(
   .clk_i(CLK_100MHZ)
   , .arst_i(RST)
   , .irq_btn_i(irq_btn)
   , .rx_i(rx)
   //, .tx_o()
   , .gpio_bi(SW)
   , .gpio_bo(LED)
);
```

Listing 4 Updated module instantiation in riscv\_tb.sv testbench

Simulation waveform for 5-stage eCPU configuration is shown in Figure 1.



Figure 1 Simulation waveform

The values on LEDs are correct, so the program works as intended.

**NOTE:** if the resulting values do not appear in simulation, check that the program is placed in RAM. Compare the first several values of the /riscv\_tb/sigma/sigma\_tile/ram/ram\_dual/ram array to the program binary.

Measure the number of clock cycles needed to execute the program on each eCPU configuration. To switch eCPU configurations for simulation, uncomment the corresponding CPU parameter of the sigma instance in the riscv\_tb.sv testbench of the corresponding Vivado project. The testbench generates a 100 MHz clock (10 ns period), so 2440 ns equals 244 clock cycles. For our example, results are summarized in Table 2.

| eCPU configuration | Latency, clock cycles |
|--------------------|-----------------------|
| riscv_1stage       | 206                   |
| riscv_2stage       | 190                   |
| riscv_3stage       | 217                   |
| riscv_4stage       | 244                   |
| riscv_5stage       | 244                   |
| riscv_6stage       | 271                   |

Table 2 Performance (in clock cycles) of software implementations based on various eCPU configurations
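The conversion from a time interval measured on the waveform to clock cycles is a division by the clock period. A small sketch, with the function name chosen here for illustration:

```python
def ns_to_cycles(interval_ns, clock_mhz=100):
    """Convert a simulated time interval in ns to clock cycles."""
    period_ns = 1000 / clock_mhz          # 100 MHz -> 10 ns
    return round(interval_ns / period_ns)

print(ns_to_cycles(2440))                 # 244
```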

### 5. Implement the design and collect metrics of the implementation

Characteristics of the provided sigma\_tile configurations are shown in Table 3:

| eCPU configuration | Frequency, MHz | LUTs | FFs  |
|--------------------|----------------|------|------|
| riscv_1stage       | 75             | 2504 | 1706 |
| riscv_2stage       | 70             | 1966 | 1322 |
| riscv_3stage       | 100            | 1929 | 1474 |
| riscv_4stage       | 140            | 2330 | 1741 |
| riscv_5stage       | 160            | 2195 | 1782 |
| riscv_6stage       | 180            | 2253 | 1884 |

Table 3 Characteristics of provided sigma\_tile implementations

### 6. (if FPGA board available) Upload your program to Sigma MCU and make sure it works correctly

To upload your program, add a loadelf command to the end of the hw\_test.py script. For our example, the line is the following:

sigma.tile.loadelf('<PATH\_TO\_ACTIVECORE>/activecore/designs/rtl/sigma/sw/benchmarks/findmaxval.riscv')

In our example, the LEDs show 0x8800 (16 least significant bits of 0xfefe8800 value). The program works as intended.

### 7. Analyze performance of implementations

Now we can analyze the absolute performance of the target functionality on the various eCPU configurations. The cycle counts were measured with a 10 ns clock period in simulation, while each configuration actually runs at its own maximum frequency. To get the latency in ns for each implementation, multiply the latency in clock cycles by 1000 and divide by the actual frequency in MHz. For our example, these values are shown in Table 4.

| eCPU configuration | Latency, ns |
|--------------------|-------------|
| riscv_1stage       | 2747        |
| riscv_2stage       | 2714        |
| riscv_3stage       | 2170        |
| riscv_4stage       | 1743        |
| riscv_5stage       | 1525        |
| riscv_6stage       | 1506        |

Table 4 Absolute performance of target functionality implementations based on various eCPU configurations
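The calculation behind Table 4 can be reproduced with a few lines of Python. The sketch below takes the cycle counts from Table 2 and the frequencies from Table 3; the variable names are chosen here for illustration.

```python
# Latency in clock cycles (Table 2) and achieved frequency in MHz (Table 3)
cycles = {"riscv_1stage": 206, "riscv_2stage": 190, "riscv_3stage": 217,
          "riscv_4stage": 244, "riscv_5stage": 244, "riscv_6stage": 271}
freq_mhz = {"riscv_1stage": 75, "riscv_2stage": 70, "riscv_3stage": 100,
            "riscv_4stage": 140, "riscv_5stage": 160, "riscv_6stage": 180}

# latency_ns = cycles * 1000 / frequency_MHz, rounded to the nearest ns
latency_ns = {cfg: round(cycles[cfg] * 1000 / freq_mhz[cfg]) for cfg in cycles}

for cfg, ns in latency_ns.items():
    print(f"{cfg}: {ns} ns")

fastest = min(latency_ns, key=latency_ns.get)
print("fastest:", fastest)                # riscv_6stage
```

For this example the 6-stage configuration comes out fastest even though it needs the most clock cycles, because its higher frequency outweighs the extra cycles.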

### 8. (optional) Integrate any UDM-compatible module in Sigma MCU

Since the sigma\_tile XIF protocol is identical to the UDM system bus protocol, previously designed modules can be seamlessly integrated in Sigma MCU.

Integrate one of the previously designed UDM-compatible modules in Sigma MCU (modify sigma.sv module) and feed this module with data from eCPU.