# ActiveCore

Laboratory work manual

# Using Sigma MCU in FPGA designs

Author:

Alexander Antonov

antonov.alex.alex@gmail.com

### Contents

- 1. Target skills
- 2. Overview
- 3. Prerequisites
- 4. Task
- 5. Guidance
  - 1. Examine Sigma MCU baseline project
  - 2. (if FPGA board available) Implement Sigma MCU in FPGA device and verify correctness of the baseline
  - 3. Write software application for eCPU
  - 4. Verify functional correctness in simulation
  - 5. Implement the designs and collect metrics of the implementations
  - 6. (if FPGA board available) Upload your program to Sigma MCU and make sure it works correctly
  - 7. Analyze performance of implementations
  - 8. Integrate any UDM-compatible module in Sigma MCU

### 1. TARGET SKILLS

- Implementation of Sigma MCU in hardware projects
- Building and implementation of embedded software for Sigma MCU
- Choosing optimal CPU configuration of Sigma MCU
- Integration of custom logic with Sigma MCU using its expansion interface
- Using Xilinx FPGA and Vivado Design Suite for implementation of Sigma MCU

### 2. OVERVIEW

This laboratory work covers software (firmware) based implementation of functionality using an embedded programmable processor core. Programmable processors, though less efficient than direct hardware implementation, offer multiple advantages: simpler programming, faster compilation, software update capability, better availability of engineers, etc. In this lab, a basic open-source MCU with a RISC-V central processing unit (CPU) core is used. RISC-V is an open instruction set architecture that has been widely used in both academia and industry in recent years.

### 3. PREREQUISITES

- 1. Xilinx Vivado 2019.1 HLx Edition (free for target board, available at https://www.xilinx.com/support/download.html)
- 2. ActiveCore baseline distribution (available at https://github.com/AntonovAlexander/activecore)
- 3. Generated RISC-V CPU HDL sources
- 4. Working RISC-V GNU toolchain (available at https://github.com/riscv/riscv-gnu-toolchain)
  - **NOTE:** pre-built binaries for various hosts can be downloaded from SiFive (https://www.sifive.com/software). Do not forget to update the PATH variable after downloading. Consider using Cygwin (with the make utility) or WSL for RISC-V software compilation on Windows hosts.
- 5. (for FPGA prototyping) Digilent Nexys A7 FPGA board (https://digilent.com/shop/nexys-a7-fpga-trainer-board-recommended-for-ece-curriculum/)
- 6. (for FPGA prototyping) Working Python 3 installation with the pyserial package

### 4. TASK

- 1. Examine Sigma MCU baseline project
- 2. (if FPGA board available) Implement Sigma MCU in FPGA device and verify correctness of the baseline
- 3. Write software implementation of functionality for eCPU according to your variant
- 4. Verify functional correctness in simulation
- 5. Implement the design and collect metrics of the implementation
- 6. (if FPGA board available) Upload your program to Sigma MCU and make sure it works correctly
- 7. Analyze performance of implementations
- 8. (optional) Integrate any UDM-compatible module in Sigma MCU

### 5. GUIDANCE

Detailed guidance is provided using the example of a program that searches for the maximum value in a 16-element array and returns this value together with its index in the array.

### 1. Examine Sigma MCU baseline project

Sigma MCU is a basic microcontroller unit soft core consisting of the sigma\_tile processing module, UDM, and a general-purpose input/output (GPIO) controller. The GPIO controller is mapped to the LEDs and switches on the FPGA board.

Block diagram of Sigma MCU is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma/doc/sigma\_struct.png

The sigma\_tile module contains an embedded CPU (eCPU) core with RISC-V ISA, tightly coupled on-chip RAM with single-cycle delay, an interrupt controller, a timer, a Host InterFace (HIF), and an eXpansion InterFace (XIF). Multiple sigma\_tile modules can fit in a single FPGA device. HIF and XIF use the same bus protocol as the UDM block. Address maps are identical for UDM and eCPU. Working with UDM is covered in the corresponding lab work:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/udm/doc/udm\_lab\_manual.pdf

Block diagram of sigma\_tile module is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma\_tile/doc/sigma\_tile\_struct.png

Address map of Sigma MCU is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma/doc/sigma\_addr\_map.md

Address map of sigma\_tile module is located at:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma\_tile/doc/sigma\_tile\_addr\_map.md

Pipeline structures of various RISC-V eCPU configurations can be found here:

https://github.com/AntonovAlexander/activecore/blob/master/designs/rtl/sigma\_tile/doc/aquaris\_pipeline\_structs

The RISC-V eCPU supports basic bare-metal programming (base RV32I ISA, without FPU, MMU, etc.). The ActiveCore distribution provides six Sigma MCU projects with different eCPU configurations (1 to 6 pipeline stages). Longer pipelines can run at higher clock frequencies and can deliver better performance, but consume more hardware resources and power.

The projects are located at: activecore/designs/rtl/sigma/syn/syn\_xstage/NEXYS4\_DDR (where x is the number of pipeline stages)

Generate RISC-V eCPU HDL sources or unpack the provided coregen archive in the following directory:

```
activecore/designs/rtl/sigma_tile/hw/riscv
```

E.g. riscv\_5stage.sv file should be located at:

```
activecore/designs/rtl/sigma_tile/hw/riscv/coregen/riscv_5stage/sverilog
```

Open NEXYS4\_DDR.xpr file using Xilinx Vivado.

**NOTE**: avoid non-English characters in the project location path. Also, avoid a very long project location path.

### 2. (if FPGA board available) Implement Sigma MCU in FPGA device and verify correctness of the baseline

Go to the following directories and build eCPU software using make command:

- compliance tests: activecore/designs/rtl/sigma/sw/riscv-compliance
- demo applications: activecore/designs/rtl/sigma/sw/apps

Implement the design, generate the bitstream and upload it to FPGA device. LEDs should start blinking with variable speed, depending on value on switches.

Find out the name of the COM port associated with the board (COM<number> on Windows hosts or tty<number> on Linux hosts). Go one directory up, open the hw\_test\_bechmarks.py test Python script, and fill in the correct COM port name on line 14:

```
udm = udm("<correct COM port name>", 921600)
```

Run eCPU compliance tests using hw\_test\_compliance.py Python script. The script will upload 44 test programs for eCPU and verify correctness of their operation. The last line of console output should be:

```
Total tests PASSED: 44 , FAILED: 0
```

Run eCPU application tests using hw\_test\_apps.py Python script. The script will upload 9 test programs for eCPU and verify correctness of their operation. The last line of console output should be:

```
Total tests PASSED: 9 , FAILED: 0
```

You can type help(sigma) and help(sigma\_tile) in the Python console for the full API reference of Sigma MCU and the sigma\_tile module, respectively.

### 3. Write software application for eCPU

Sigma MCU distribution provides several demo applications that can be used as reference (see Table 1).

| Demo application   | Description                                                                                                                                  |
|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| heartbeat_variable | A counter that is output to LED register. The period is continuously read from Switches register. Period is implemented as CPU busy waiting. |
| irq_counter        | A counter that is output to LED register. Increment is triggered by interrupt 3 that is mapped on button on FPGA board.                      |
| dhrystone          | Dhrystone synthetic benchmark                                                                                                                |
| median             | Three-element median filter operating on 400-element array of integers.                                                                      |
| mul_sw             | Software multiplication of two integers producing an integer.                                                                                |
| qsort              | Quick sort operating on 1024-element array of integers.                                                                                      |
| rsort              | Radix sort operating on 1024-element array of integers.                                                                                      |
| crc32              | CRC32 hash calculation                                                                                                                       |
| md5                | MD5 hash calculation                                                                                                                         |
| timer_test         | A counter that is output to LED register. Utilizes the timer to count the period. The period is read from Switches register on reset.        |
| bootloader         | Bootloader of programs in binary (ELF) format from the memory buffer                                                                         |

Table 1 Demo applications provided in Sigma MCU distribution

Write a software application for eCPU and check its correctness. You can use either a local gcc installation or an online service (e.g. https://cplayground.com/ or https://ideone.com/) for this task. The test result for our example is shown in Listing 1.

**NOTE**: PC and online programming environments don't provide the same peripherals as those included in Sigma MCU. Thus, consider testing only "algorithmic" part of your program in these environments.

```
#include <stdio.h>
```

Listing 1 Testing software implementation using cplayground.com
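The algorithmic part of the example can be exercised on a PC with a few lines of C. The sketch below is an illustration, not the content of Listing 1: it reuses FindMaxVal and the input array of the example program, leaves out the IO\_LED writes (a PC has no such register), and adds a hypothetical helper, check\_findmaxval, that prints the result and compares it with the values expected for this input (maximum 0xfefe8800 at index 10).

```c
#include <stdio.h>

#define ARR_SIZE 16

typedef struct {
    unsigned int max_elem;
    unsigned int max_index;
} maxval_data_t;

/* Same search as in the example program, without any peripheral access. */
maxval_data_t FindMaxVal(const unsigned int x[ARR_SIZE])
{
    maxval_data_t ret_data;
    ret_data.max_elem = 0;
    ret_data.max_index = 0;
    for (unsigned int i = 0; i < ARR_SIZE; i++) {
        if (x[i] > ret_data.max_elem) {
            ret_data.max_elem = x[i];
            ret_data.max_index = i;
        }
    }
    return ret_data;
}

/* Hypothetical helper: returns 0 if the result matches the expected values. */
int check_findmaxval(void)
{
    static const unsigned int datain[ARR_SIZE] = {
        0x112233cc, 0x55aa55aa, 0x01010202, 0x44556677,
        0x00000003, 0x00000004, 0x00000005, 0x00000006,
        0x00000007, 0xdeadbeef, 0xfefe8800, 0x23344556,
        0x05050505, 0x07070707, 0x99999999, 0xbadc0ffe
    };
    maxval_data_t r = FindMaxVal(datain);

    printf("max_elem = 0x%08x, max_index = %u\n", r.max_elem, r.max_index);
    return (r.max_elem == 0xfefe8800u && r.max_index == 10u) ? 0 : 1;
}
```

Calling check\_findmaxval() from main on a PC (local gcc or an online service) is enough to confirm the search logic before the program is moved to the eCPU.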

Go to activecore/designs/rtl/sigma/sw/apps directory and add new directory for your software. In our example, the new directory is called findmaxval.

Create a new C source file in the new directory. In our example, the file is called findmaxval.c. Write your program in this file. Source code for the example program is shown in Listing 2:

```
// Memory-mapped GPIO registers of Sigma MCU
#define IO_LED (*(volatile unsigned int *)(0x80000000))
#define IO_SW  (*(volatile unsigned int *)(0x80000004))

#define ARR_SIZE 16

typedef struct
{
  unsigned int max_elem;
  unsigned int max_index;
} maxval_data_t;

// Returns the maximum element of x and its index
maxval_data_t FindMaxVal(unsigned int x[ARR_SIZE])
{
  maxval_data_t ret_data;
  ret_data.max_elem = 0;
  ret_data.max_index = 0;
  for (int i=0; i<ARR_SIZE; i++) {
    if (x[i] > ret_data.max_elem) {
      ret_data.max_elem = x[i];
      ret_data.max_index = i;
    }
  }
  return ret_data;
}

// Main
int main( int argc, char* argv[] )
{
  maxval_data_t maxval_data;
  unsigned int datain[16] = {
    0x112233cc, 0x55aa55aa, 0x01010202, 0x44556677,
    0x00000003, 0x00000004, 0x00000005, 0x00000006,
    0x00000007, 0xdeadbeef, 0xfefe8800, 0x23344556,
    0x05050505, 0x07070707, 0x99999999, 0xbadc0ffe };

  IO_LED = 0x55aa55aa;               // marks the end of the startup sequence
  maxval_data = FindMaxVal(datain);
  IO_LED = maxval_data.max_index;
  IO_LED = maxval_data.max_elem;
  while (1) {}                       // park the eCPU
}
```

Listing 2 C source code in findmaxval.c

**NOTE:** the value 0x55aa55aa is written to the LEDs to mark the end of the startup sequence and the start of the target function FindMaxVal. At the end of the program, the max\_index and max\_elem values are written to the LEDs and the eCPU is sent into an infinite loop.

**NOTE:** since Sigma MCU does not have standard output, we use LEDs to output resulting values.

Prepare the executable image for eCPU. Open the Makefile in activecore/designs/rtl/sigma/sw/apps directory and add the reference to the new directory to the bmarks variable. Source code for the updated bmarks assignment is shown in Listing 3:

```
bmarks = \
    <available applications>
    rsort \
    findmaxval \
    <commented lines>
```

Listing 3 Source code of the updated bmarks assignment in Makefile

Call make command from activecore/designs/rtl/sigma/sw/apps directory to build the program image.

**NOTE:** since Sigma MCU does not support hardware multiplication, consider using software multiplication if needed. The example program mul\_sw is included in ActiveCore distribution.
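One common way to multiply without a hardware multiplier is the shift-and-add method. The sketch below illustrates that technique only; it is not taken from the mul\_sw demo, and the function name mul\_shift\_add is hypothetical. It returns the low 32 bits of the product, using only additions and shifts that the base RV32I ISA provides.

```c
#include <stdint.h>

/* Shift-and-add multiplication: for every set bit k of b, add (a << k). */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u) {
            product += a;   /* current bit of b is set: accumulate shifted a */
        }
        a <<= 1;            /* weight of the next bit of b */
        b >>= 1;            /* move on to the next bit */
    }
    return product;         /* wraps modulo 2^32, like unsigned C arithmetic */
}
```

The loop runs once per significant bit of b, so passing the smaller operand as b shortens execution.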

### 4. Verify functional correctness in simulation

Open the testbench file activecore/designs/rtl/sigma/tb/riscv\_tb.sv, select the desired clock frequency (needed in Section 7), choose the eCPU configuration, and make the mem\_init\_data parameter of the sigma instance refer to your ELF program image. For our example, code updates are shown in Listing 4.

```
//`define CLK_HALF_PERIOD 5000 // external 100 MHZ
//`define CLK_HALF_PERIOD 7143 // external 70 MHZ
//`define CLK_HALF_PERIOD 6250 // external 80 MHZ
//`define CLK_HALF_PERIOD 3571 // external 140 MHZ
`define CLK_HALF_PERIOD 3333 // external 150 MHZ
//`define CLK_HALF_PERIOD 3125 // external 160 MHZ
...
sigma
#(
  //.CPU("riscv_1stage")
  //.CPU("riscv_2stage")
  //.CPU("riscv_3stage")
  //.CPU("riscv_4stage")
  .CPU("riscv_5stage")
  //.CPU("riscv_6stage")

  , .UDM_RTX_EXTERNAL_OVERRIDE("YES")
  , .delay_test_flag(0)

  , .mem_init_type("elf")
  , .mem_init_data("<PATH_TO_ACTIVECORE>/designs/rtl/sigma/sw/apps/findmaxval.riscv")
  , .mem_size(8192)
) sigma
(
  .clk_i(CLK_100MHZ)
  , .arst_i(RST)
  , .irq_btn_i(irq_btn)
  , .rx_i(rx)
  //, .tx_o()
  , .gpio_bi(SW)
  , .gpio_bo(LED)
);
```

Listing 4 Updated module instantiation in riscv\_tb.sv testbench
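The CLK\_HALF\_PERIOD values in Listing 4 are consistent with half of the clock period expressed in picoseconds (for example, 7143 for 70 MHz). The unit is inferred from the listed numbers, not stated in the testbench excerpt. Under that assumption, the value for any other frequency can be computed with the small helper below; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Half of the clock period in picoseconds, rounded to the nearest integer.
 * period_ps = 1e6 / f_MHz, so half_period_ps = 1e6 / (2 * f_MHz).
 * Adding f_MHz (half of the divisor) before dividing rounds to nearest.
 */
uint32_t clk_half_period_ps(uint32_t freq_mhz)
{
    return (1000000u + freq_mhz) / (2u * freq_mhz);
}
```

For the frequencies used in this lab, the helper reproduces the listed constants: 7143 (70 MHz), 6250 (80 MHz), 3571 (140 MHz), 3333 (150 MHz), and 3125 (160 MHz).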

Once simulation starts, Tcl console should show notification of successful program image upload (see Figure 1).



Figure 1 Notification of successful program image upload

Simulation waveform for 5-stage eCPU configuration is shown in Figure 2.



Figure 2 Simulation waveform of program working on eCPU

The values on LEDs are correct, the program works as intended.

**NOTE:** if resulting values do not appear in simulation, try the following:

- Check the program is placed in sigma\_tile RAM. Compare the content of RAM (RAM array is located at /riscv\_tb/sigma/sigma\_tile/ram/ram\_dual/ram) to the program binary. Consider specifying absolute path in case the image is not loaded.
- Write intermediate values to LED register.
- Trace program execution.

The program can be traced in simulation using the 1-stage eCPU configuration. To switch eCPU configurations for simulation, open the corresponding Vivado project and change the CPU parameter of the sigma instance in riscv\_tb.sv testbench. Display the following signals in eCPU (located in /riscv\_tb/sigma/sigma\_tile/genblk1.riscv, see Figure 3):

- genpstage\_EXEC\_TRX\_LOCAL.curinstr\_addr: instruction address
- genpstage\_EXEC\_TRX\_LOCAL.instr\_code: instruction code
- genpsticky\_glbl\_regfile: general-purpose registers

**NOTE:** you can use the provided riscv\_tb\_behav.wcfg waveform configuration file to display the eCPU state.



Figure 3 Tracing program execution using 1-stage eCPU configuration

Listing 5 Fragment of findmaxval.riscv.dump program dump file

Analyze the dumped representation of the program (findmaxval.riscv.dump in our case, see Listing 5) using the RISC-V Assembly Programmer's Manual: https://github.com/riscv/riscv-asm-manual/blob/master/riscv-asm.md. E.g., in our example, the instruction at address 0x520 (li a1,1412) writes the immediate value 1412 (0x584) to register a1. This operation is marked in Figure 3.
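When comparing the instruction code signal with the dump, it can help to split a code word into its fields by hand. The sketch below decodes the I-type format of the base RV32I ISA. For a small immediate such as 1412, li is a pseudo-instruction that the assembler emits as addi with the zero register as source, which is an I-type instruction. The code word 0x58400593 mentioned below was derived from the ISA encoding for addi a1, zero, 1412 and is not copied from the dump file; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of an RV32I I-type instruction (e.g. addi, loads, jalr). */
typedef struct {
    uint32_t opcode;  /* bits  6:0  */
    uint32_t rd;      /* bits 11:7  */
    uint32_t funct3;  /* bits 14:12 */
    uint32_t rs1;     /* bits 19:15 */
    int32_t  imm;     /* bits 31:20, sign-extended */
} itype_fields_t;

itype_fields_t decode_itype(uint32_t instr)
{
    itype_fields_t f;

    f.opcode = instr & 0x7fu;
    f.rd     = (instr >> 7) & 0x1fu;
    f.funct3 = (instr >> 12) & 0x7u;
    f.rs1    = (instr >> 15) & 0x1fu;

    /* The 12-bit immediate is signed: subtract 4096 when bit 11 is set. */
    f.imm = (int32_t)(instr >> 20);
    if (f.imm & 0x800) {
        f.imm -= 0x1000;
    }
    return f;
}
```

Decoding 0x58400593 yields opcode 0x13 (OP-IMM), funct3 0 (addi), rs1 = 0 (zero), rd = 11 (a1) and imm = 1412, which matches the register write described above.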

Identify and fix inconsistencies in program execution.

### 5. Implement the designs and collect metrics of the implementations

Characteristics of provided sigma\_tile configurations are shown in Table 2:

| eCPU configuration | Frequency, MHz | LUTs | FFs  |
|--------------------|----------------|------|------|
| riscv_1stage       | 70             | 2144 | 1180 |
| riscv_2stage       | 70             | 2263 | 1279 |
| riscv_3stage       | 80             | 2293 | 1422 |
| riscv_4stage       | 140            | 2284 | 1686 |
| riscv_5stage       | 150            | 2385 | 1731 |
| riscv_6stage       | 160            | 2314 | 1830 |

Table 2 Characteristics of provided sigma\_tile implementations

### 6. (if FPGA board available) Upload your program to Sigma MCU and make sure it works correctly

To upload your program, add loadelf command to the end of hw\_test.py script. For our example, the line is the following:

```
sigma.tile.loadelf('<PATH_TO_ACTIVECORE>/designs/rtl/sigma/sw/apps/findmaxval.riscv')
```

In our example, the LEDs show 0x8800 (16 least significant bits of 0xfefe8800 value). The program works as intended.
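Because only the 16 least significant bits of the LED register are visible on the board, a 32-bit result has to be inspected in two halves. The helpers below are a minimal sketch of that arithmetic; the function names are hypothetical, and writing the upper half to the LED register in a second step is a suggestion, not something the example program does.

```c
#include <stdint.h>

/* Part of a 32-bit value that appears on the 16 LEDs. */
uint32_t led_visible(uint32_t value)
{
    return value & 0xFFFFu;
}

/* Upper 16 bits, shifted down so they can be shown on the LEDs as well. */
uint32_t led_upper_half(uint32_t value)
{
    return value >> 16;
}
```

For the example result 0xfefe8800, led\_visible gives 0x8800 (the value observed on the board) and led\_upper\_half gives 0xfefe.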

### 7. Analyze performance of implementations

Now we can compare the performance of the implementations based on the various eCPU configurations. Set the actual clock period for each eCPU configuration as described in Section 4. For our example, the resulting latencies are shown in Table 3.

| eCPU configuration | Latency, ns |
|--------------------|-------------|
| riscv_1stage       | 2943        |
| riscv_2stage       | 1586        |
| riscv_3stage       | 1938        |
| riscv_4stage       | 1179        |
| riscv_5stage       | 1100        |
| riscv_6stage       | 1200        |

Table 3 Performance of implementations based on various eCPU configurations
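Latency in nanoseconds combines two effects: the clock frequency and the number of clock cycles the program needs. To separate them, the latencies of Table 3 can be converted to clock cycles. The sketch below assumes that each latency was measured at the frequency listed for the same configuration in Table 2; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Number of clock cycles spent during latency_ns at freq_mhz,
 * rounded to the nearest integer: cycles = latency_ns * f_MHz / 1000.
 */
uint32_t latency_to_cycles(uint32_t latency_ns, uint32_t freq_mhz)
{
    return (latency_ns * freq_mhz + 500u) / 1000u;
}
```

Under that assumption, the 1-stage configuration needs about 206 cycles (2943 ns at 70 MHz), while the 5-stage configuration needs about 165 cycles (1100 ns at 150 MHz) and the 6-stage configuration about 192 cycles (1200 ns at 160 MHz). This suggests why the 6-stage configuration is slower than the 5-stage one in this example despite its higher clock frequency: it spends more cycles on the same program.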

### 8. Integrate any UDM-compatible module in Sigma MCU

Since sigma\_tile XIF protocol is identical to UDM system bus protocol, UDM-compatible modules can be seamlessly integrated in Sigma MCU.

**NOTE:** Beware that XIF address space starts from 0x80000000.

Integrate one of such modules in Sigma MCU (modify sigma.sv module) and feed this module with data from eCPU.
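On the software side, a module attached to XIF is accessed through memory-mapped registers, in the same way as the LED and switch registers of the example program. The sketch below shows one possible set of access helpers. The module offset, the register indices, and all names are hypothetical and must be replaced with the address decoding actually implemented in sigma.sv. Note that the example program already uses 0x80000000 and 0x80000004 for the LED and switch registers, so a new module needs a different offset.

```c
#include <stdint.h>

#define XIF_BASE_ADDR 0x80000000u  /* start of XIF address space */
#define MOD_OFFSET    0x00000100u  /* hypothetical offset of the new module */

/* Pointer to the module registers, for use on the eCPU only. */
#define MOD_REGS ((volatile uint32_t *)(uintptr_t)(XIF_BASE_ADDR + MOD_OFFSET))

/* Hypothetical register indices (32-bit words) inside the module. */
enum { MOD_REG_DATA = 0, MOD_REG_RESULT = 1 };

void xif_write(volatile uint32_t *regs, unsigned int reg, uint32_t value)
{
    regs[reg] = value;
}

uint32_t xif_read(volatile uint32_t *regs, unsigned int reg)
{
    return regs[reg];
}

/* Feeds count words to one register, one bus write per word. */
void xif_feed(volatile uint32_t *regs, unsigned int reg,
              const uint32_t *data, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        xif_write(regs, reg, data[i]);
    }
}
```

On the eCPU, a call such as xif\_feed(MOD\_REGS, MOD\_REG\_DATA, datain, 16) would stream the example array into the module. Because the helpers take the register pointer as a parameter, they can also be tried on a PC against an ordinary array before running on hardware.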