

# The NEORV32 Processor

by Dipl.-Ing. Stephan Nolting

A customizable, lightweight and open-source 32-bit RISC-V soft-core microcontroller.





# **Proprietary and Legal Notice**

- "Vivado" and "Artix" are trademarks of Xilinx Inc.
- "AXI" and "AXI4-Lite" are trademarks of Arm Holdings plc.
- "ModelSim" is a trademark of Mentor Graphics, a Siemens Business.
- "Quartus Prime" and "Cyclone" are trademarks of Intel Corporation.
- "iCE40", "UltraPlus" and "Radiant" are trademarks of Lattice Semiconductor Corporation.
- "Windows" is a trademark of Microsoft Corporation.
- "Tera Term" copyright by T. Teranishi.
- "Travis CI" is a trademark of Travis CI GmbH.

Timing diagrams were made with WaveDrom Editor.

#### Limitation of Liability for External Links

Our website contains links to the websites of third parties ("external links"). As the content of these websites is not under our control, we cannot assume any liability for such external content. In all cases, the provider of information of the linked websites is liable for the content and accuracy of the information provided. At the point in time when the links were placed, no infringements of the law were recognisable to us. As soon as an infringement of the law becomes known to us, we will immediately remove the link in question.

#### Disclaimer

This project is released under the BSD 3-Clause license. No copyright infringement intended. Other implied or used projects might have different licensing – see their documentation to get more information.

This project is not affiliated with or endorsed by the Open Source Initiative.

https://www.oshwa.org https://opensource.org

RISC-V – Instruction Sets Want To Be Free ♥

https://riscv.org/ https://github.com/riscv



The NEORV32 Processor Project

©2020 Dipl.-Ing. Stephan Nolting, Hannover, Germany

https://github.com/stnolting/neorv32

Contact: stnolting@gmail.com

#### **BSD 3-Clause License**

Copyright (c) 2020, Stephan Nolting. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# **Table of Contents**

- Proprietary and Legal Notice
- BSD 3-Clause License
- **1. Overview**
  - 1.1. Design Principles
  - 1.2. Citing
  - 1.3. Processor Key Features
  - 1.4. RISC-V Compliance
    - 1.4.1 RISC-V Non-Compliance
  - 1.5. Project Folder Structure
  - 1.6. VHDL File Hierarchy
  - 1.7. Processor Top Entity – Signals
  - 1.8. Processor Top Entity – Configuration Generics
    - 1.8.1 General
    - 1.8.2. RISC-V CPU Extensions
    - 1.8.3. Memory Configuration: Instruction Memory
    - 1.8.4. Memory Configuration: Data Memory
    - 1.8.5. Memory Configuration: External Memory Interface
    - 1.8.6. Processor Peripherals
  - 1.9. FPGA Implementation Results
    - 1.9.1. CPU
    - 1.9.2. Peripherals
    - 1.9.3. Exemplary FPGA Results
  - 1.10. CPU Performance
    - 1.10.1. CoreMark Benchmark
    - 1.10.2. Instruction Timing
- **2. Central Processing Unit**
  - 2.1. Instruction Set and CPU Extensions
  - 2.2. Instruction Timing
  - 2.3. Control and Status Registers (CSRs)
    - 2.3.1. Machine Trap Setup
    - 2.3.2. Machine Trap Handling
    - 2.3.3. Counters and Timers
    - 2.3.4. Machine Information Registers
  - 2.4. Exceptions and Interrupts
  - 2.5. Address Space
    - 2.5.1. Processor-Internal Instruction Memory (IMEM)
    - 2.5.2. Processor-Internal Data Memory (DMEM)
    - 2.5.3. Processor-Internal Bootloader ROM (BOOTROM)
    - 2.5.4. Processor-External Memory Interface (WISHBONE)
    - 2.5.5. Processor-Internal Peripheral/IO Devices
  - 2.6. Bus Interface
    - 2.6.1. Interface Signals
    - 2.6.2. Protocol
- **3. Peripheral/IO Devices**
  - 3.1. General Purpose Input and Output Port (GPIO)
  - 3.2. Core-Local Interrupt Controller (CLIC)
  - 3.3. Watchdog Timer (WDT)
  - 3.4. Machine System Timer (MTIME)
  - 3.5. Universal Asynchronous Receiver and Transmitter (UART)
  - 3.6. Serial Peripheral Interface Controller (SPI)
  - 3.7. Two Wire Serial Interface Controller (TWI)
  - 3.8. Pulse Width Modulation Controller (PWM)
  - 3.9. True Random Number Generator (TRNG)
  - 3.10. Dummy Device (DEVNULL)
  - 3.11. System Configuration Information Memory (SYSINFO)
- **4. Software Architecture**
  - 4.1. Toolchain
  - 4.2. Core Software Libraries
  - 4.3. Application Makefile
    - 4.3.1. Makefile Targets
    - 4.3.2. Makefile Configuration
  - 4.4. Executable Image Format
  - 4.5. Bootloader
    - 4.5.1. Auto Boot Sequence
    - 4.5.2. External SPI Flash for Booting
    - 4.5.3. Bootloader Error Codes
    - 4.5.4. Final Notes
  - 4.6. NEORV32 Runtime Environment
- **5. Let's Get It Started!**
  - 5.1. Toolchain Setup
    - 5.1.1. Making the Toolchain from Scratch
    - 5.1.2. Downloading and Installing the Prebuilt Toolchain
    - 5.1.3. Installation
    - 5.1.4. Testing the Installation
  - 5.2. General Hardware Setup
  - 5.3. General Software Framework Configuration
  - 5.4. Building the Software Documentation
  - 5.5. Application Program Compilation
  - 5.6. Uploading and Starting of a Binary Executable Image via UART
  - 5.7. Setup of a New Application Program Project
  - 5.8. Enabling RISC-V CPU Extensions
  - 5.9. Building a Non-Volatile Application (Program Fixed in IMEM)
  - 5.10. Re-Building the Internal Bootloader
  - 5.11. Programming the Bootloader SPI Flash
  - 5.12. Simulating the Processor
  - 5.13. Continuous Integration
- **6. Troubleshooting**
- **7. Change Log**

#### 1. Overview



Figure 1: NEORV32 processor block diagram

The **NEORV32**<sup>1</sup> is a customizable, full-scale, microcontroller-like processor system based on a RISC-V-compliant rv32i CPU with optional M, E, C, Zicsr and Zifencei extensions. The CPU was built from scratch and is compliant with the *Unprivileged ISA Specification Version 2.2* and a subset of the *Privileged Architecture Specification Version 1.12-draft*. A copy of these specifications can be found in the project's docs folder. The currently known compliance issues are listed in chapter 1.4.1 RISC-V Non-Compliance.

The NEORV32 is intended as an auxiliary processor within larger SoC designs or as a stand-alone custom microcontroller. Its top entity can be directly synthesized for any FPGA without modifications and provides a full-scale RISC-V based microcontroller.

The processor provides common peripherals and interfaces like input and output ports, serial interfaces for UART, I<sup>2</sup>C and SPI, an interrupt controller, timers and embedded memories. External memories, peripherals and custom IP can be attached via a Wishbone-based external memory interface. All optional features beyond the base CPU can be enabled and configured via VHDL generics.

This project comes with a complete software ecosystem that features core libraries for high-level usage of the provided functions and peripherals, application makefiles, a runtime environment and several example programs. All software source files provide doxygen-based documentation.

The project is intended to work "out of the box". Just synthesize the test setup from this project, upload it to your FPGA board of choice and start playing with the NEORV32. If you do not want to compile the GCC toolchain yourself, you can also download pre-compiled binaries for Linux.



Quickstart: Jump directly to chapter 5. Let's Get It Started!

1 Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.

#### 1.1. Design Principles

I've worked on and with several soft-core architectures, and I have studied even more of them. There are so many good projects on GitHub: great processor designs and projects with the best of intentions. Unfortunately, many of them lack good documentation that covers everything from the RTL level up to the software libraries and how all the parts fit together.

→ From zero to main(): Completely open source and documented.

Everyone uses different FPGAs and evaluation boards. This variety is a good thing. However, it can be quite frustrating to dig into the deepest corners of an HDL project when you just want to do a quick test synthesis for your FPGA board. This project comes in a technology-independent form. Nevertheless, it also provides *optional alternative* components that are tailored to specific FPGAs.

→ Plain VHDL without technology-specific parts like attributes, macros or primitives.

As mentioned above, sometimes you just want to check out a project. This project therefore also tries to be useful for beginners.

**→** Easy to use – working out of the box.

Some of the open-source processors out there have nice CPI and benchmark parameters. But at least some of these architectures rely on quite unrealistic memory interfaces (read response within the same cycle) that most memories cannot provide. Also, I do not like asynchronous interfaces (some of them even require latches) or clock gating to halt certain parts of a circuit.

→ Clean synchronous design, no wacky combinatorial interfaces.

Probably we are all fanboys/girls/you-name-it of specific FPGA architectures, toolchains or manufacturers. All FPGA vendors out there have their individual benefits, but I really like the Lattice iCE40 FPGAs. They are so tiny and have a very clean and simple architecture (no fancy multiplexers and stuff like that). The large embedded memory blocks are a nice extra.

→ The processor has to fit in a Lattice iCE40 UltraPlus 5k FPGA running at 20+ MHz.

#### 1.2. Citing

If you are using the NEORV32 in some kind of publication, please cite it as follows:

S. Nolting, "The NEORV32 Processor", github.com/stnolting/neorv32

#### 1.3. Processor Key Features

- 32-bit rv32i RISC-V-compliant base CPU (→ chapter 2)
  - Optional C extension for compressed instructions
  - Optional E extension for embedded CPU version
  - Optional M extension for multiplication and division instructions
  - Optional Zicsr extension for control and status register access and exception/interrupt system
  - Optional Zifencei extension for instruction stream synchronization
- Toolchain based on free RISC-V GCC port; prebuilt toolchains available (→ chapter 4.1)
- Application compilation based on GNU makefiles (→ chapter 4.3)
- Doxygen-based documentation of the software framework (→ chapter 5.4); the automatically deployed version is available at https://stnolting.github.io/neorv32/files.html
- Completely described in behavioral, platform-independent VHDL; no primitives, macros, etc. are used
- Fully synchronous design, no latches, no gated clocks
- Small hardware footprint and high operating frequency (→ chapter 1.9)
- Highly customizable processor configuration (→ chapter 1.8)
- Optional processor-internal data and instruction memories (DMEM/IMEM → chapters 2.5.1 and 2.5.2)
- Optional internal bootloader with UART console and automatic SPI flash boot option (→ chapter 4.5)
- Optional machine system timer (MTIME → chapter 3.4), RISC-V-compliant
- Optional universal asynchronous receiver and transmitter (UART → chapter 3.5)
- Optional 8/16/24/32-bit serial peripheral interface controller (SPI → chapter 3.6) with 8 dedicated CS lines
- Optional two wire serial interface controller (TWI → chapter 3.7), compatible with the I<sup>2</sup>C standard
- Optional general purpose parallel IO port (GPIO → chapter 3.1), 16xOut, 16xIn
- Optional 32-bit external bus interface, Wishbone b4 compliant (WISHBONE → chapter 2.5.4)
- Optional watchdog timer (WDT → chapter 3.3)
- Optional PWM controller with 4 channels and 8-bit duty cycle resolution (PWM → chapter 3.8)
- Optional GARO-based true random number generator (TRNG → chapter 3.9)
- Optional core-local interrupt controller with 8 channels (CLIC → chapter 3.2)
- Optional dummy device; provides advanced simulation console output (DEVNULL → chapter 3.10)
- System configuration information memory to check HW config. by software (SYSINFO → chapter 3.11)

# 1.4. RISC-V Compliance

The processor passes the official RISC-V compliance test. The port of this test suite for the NEORV32 can be found in the neorv32 compliance test GitHub repository.

#### RISC-V rv32i Tests

```
Check I-ADD-01 ... OK
Check I-ADDI-01 ... OK
Check I-AND-01 ... OK
Check I-ANDI-01 ... OK
Check I-AUIPC-01 ... OK
Check I-BEQ-01 ... OK
Check I-BGE-01 ... OK
Check I-BGEU-01 ... OK
Check I-BLT-01 ... OK
Check I-BLTU-01 ... OK
Check I-BNE-01 ... OK
Check I-DELAY_SLOTS-01 ... OK
Check I-EBREAK-01 ... OK
Check I-ECALL-01 ... OK
Check I-ENDIANESS-01 ... OK
Check I-IO-01 ... OK
Check I-JAL-01 ... OK
Check I-JALR-01 ... OK
Check I-LB-01 ... OK
Check I-LBU-01 ... OK
Check I-LH-01 ... OK
Check I-LHU-01 ... OK
Check I-LUI-01 ... OK
Check I-LW-01 ... OK
Check I-MISALIGN_JMP-01 ... OK
Check I-MISALIGN_LDST-01 ... OK
Check I-NOP-01 ... OK
Check I-OR-01 ... OK
Check I-ORI-01 ... OK
Check I-RF_size-01 ... OK
Check I-RF_width-01 ... OK
Check I-RF_x0-01 ... OK
Check I-SB-01 ... OK
Check I-SH-01 ... OK
Check I-SLL-01 ... OK
Check I-SLLI-01 ... OK
Check I-SLT-01 ... OK
Check I-SLTI-01 ... OK
Check I-SLTIU-01 ... OK
Check I-SLTU-01 ... OK
Check I-SRA-01 ... OK
Check I-SRAI-01 ... OK
Check I-SRL-01 ... OK
Check I-SRLI-01 ... OK
Check I-SUB-01 ... OK
Check I-SW-01 ... OK
Check I-XOR-01 ... OK
Check I-XORI-01 ... OK
OK: 48/48 RISCV_TARGET=neorv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
```

#### RISC-V rv32im Tests

```
Check DIV ... OK
Check DIVU ... OK
Check MUL ... OK
Check MULH ... OK
Check MULHSU ... OK
Check MULHU ... OK
Check REM ... OK
Check REMU ... OK
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
```

#### RISC-V rv32imc Tests

```
Check C-ADD ... OK
Check C-ADDI ... OK
Check C-ADDI16SP ... OK
Check C-ADDI4SPN ... OK
Check C-AND ... OK
Check C-ANDI ... OK
Check C-BEQZ ... OK
Check C-BNEZ ... OK
Check C-J ... OK
Check C-JAL ... OK
Check C-JALR ... OK
Check C-JR ... OK
Check C-LI ... OK
Check C-LUI ... OK
Check C-LW ... OK
Check C-LWSP ... OK
Check C-MV ... OK
Check C-OR ... OK
Check C-SLLI ... OK
Check C-SRAI ... OK
Check C-SRLI ... OK
Check C-SUB ... OK
Check C-SW ... OK
Check C-SWSP ... OK
Check C-XOR ... OK
OK: 25/25 RISCV_TARGET=neorv32 RISCV_DEVICE=rv32imc RISCV_ISA=rv32imc
```

#### **RISC-V rv32Zicsr Tests**

#### RISC-V rv32Zifencei Tests

Check I-FENCE.I-01 ... OK

OK: 1/1 RISCV\_TARGET=neorv32 RISCV\_DEVICE=rv32Zifencei RISCV\_ISA=rv32Zifencei
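When several test suites are run in a row, it can be convenient to check the summary lines automatically. The following is a minimal host-side sketch, assuming the log contains summary lines of the form shown above (`OK: <passed>/<total> RISCV_TARGET=... RISCV_DEVICE=... RISCV_ISA=...`). The function names `parse_summary` and `all_suites_passed` are hypothetical helpers and not part of the NEORV32 project or the compliance framework. Any status word other than `OK` is treated as a failure, which is an assumption.

```python
import re

# Summary line printed at the end of each test suite run, for example:
#   OK: 48/48 RISCV_TARGET=neorv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
# Leading dashes (from a separator line glued to the summary) are tolerated.
SUMMARY_RE = re.compile(
    r"^-*\s*(?P<status>\w+):\s*(?P<passed>\d+)/(?P<total>\d+)"
    r"\s+RISCV_TARGET=(?P<target>\S+)"
    r"\s+RISCV_DEVICE=(?P<device>\S+)"
    r"\s+RISCV_ISA=(?P<isa>\S+)\s*$"
)


def parse_summary(line):
    """Return a dict describing a summary line, or None for any other line."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["passed"] = int(info["passed"])
    info["total"] = int(info["total"])
    return info


def all_suites_passed(log_text):
    """True if the log has at least one summary and every suite passed fully."""
    summaries = [s for s in map(parse_summary, log_text.splitlines()) if s]
    return bool(summaries) and all(
        s["status"] == "OK" and s["passed"] == s["total"] for s in summaries
    )
```

Individual `Check ... OK` lines are ignored by this sketch; only the per-suite totals are evaluated.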

### 1.4.1 RISC-V Non-Compliance

This list shows the *currently known* issues regarding full RISC-V-compliance.



No exception is triggered yet when using registers above x15 in embedded mode (CPU E extension enabled via the CPU\_EXTENSION\_RISCV\_E generic).



The misa CSR is read-only. It reflects the *synthesized* CPU extensions. Hence, all implemented CPU extensions are always active and cannot be enabled/disabled dynamically during runtime. Any write access to it is ignored and will not cause any exception or side-effects.
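Since misa reflects the synthesized extensions, software can decode it to find out what the CPU provides. The following host-side sketch decodes a 32-bit misa value according to the standard RISC-V layout (MXL field in bits 31:30, one bit per single-letter extension in bits 25:0). The function names and the example value are illustrative assumptions; the value was not read from a real NEORV32 device. Multi-letter extensions such as Zicsr and Zifencei have no bit in misa and therefore do not show up here.

```python
def decode_misa(misa):
    """Decode a 32-bit misa value into (xlen, list of extension letters)."""
    mxl = (misa >> 30) & 0x3                  # MXL field, bits 31:30 on RV32
    xlen = {1: 32, 2: 64, 3: 128}.get(mxl)    # None if the field reads as zero
    # Bit n of the extension field corresponds to letter chr(ord("A") + n).
    letters = [chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1]
    return xlen, letters


def has_extension(misa, letter):
    """True if the single-letter extension (e.g. "M") is flagged in misa."""
    return bool((misa >> (ord(letter.upper()) - ord("A"))) & 1)


# Hypothetical example value for an rv32imc configuration:
# MXL = 1 (32-bit), I = bit 8, M = bit 12, C = bit 2.
EXAMPLE_MISA = (1 << 30) | (1 << 8) | (1 << 12) | (1 << 2)
```

For the example value, `decode_misa(EXAMPLE_MISA)` yields an XLEN of 32 and the extension letters C, I and M.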



The performance counter CSRs [m]cycleh and [m]instreth are only 20 bits wide (in contrast to the 32 bits defined by the RISC-V specification).
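To get a feeling for what the reduced counter width means in practice, the following sketch estimates how long the cycle counter runs before it wraps. It assumes that the low words ([m]cycle and [m]instret) are full 32-bit registers, which gives a total counter width of 32 + 20 = 52 bits, and that the cycle counter increments once per clock cycle. The helper names and the 100 MHz clock are illustrative assumptions only.

```python
LOW_WORD_BITS = 32    # assumption: the low words are full 32-bit registers
HIGH_WORD_BITS = 20   # implemented width of [m]cycleh / [m]instreth

COUNTER_BITS = LOW_WORD_BITS + HIGH_WORD_BITS  # 52-bit counter in total


def combine(high, low):
    """Build the counter value from the high and low CSR words."""
    high &= (1 << HIGH_WORD_BITS) - 1   # only the lower 20 bits are implemented
    low &= (1 << LOW_WORD_BITS) - 1
    return (high << LOW_WORD_BITS) | low


def seconds_until_wrap(clock_hz):
    """Time until a counter that increments every clock cycle overflows."""
    return (1 << COUNTER_BITS) / clock_hz


# Example with a hypothetical 100 MHz clock.
days = seconds_until_wrap(100_000_000) / 86400
```

With the assumed 100 MHz clock, the 52-bit cycle counter wraps after roughly 521 days. Software that relies on a full 64-bit counter should take this into account.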


The *machine software interrupt* msi is implemented, but there is currently no mechanism available to trigger this interrupt.

### 1.5. Project Folder Structure

```
neorv32               Project home folder.
├─.ci                 Scripts for continuous integration.
├─docs                Project documentation: RISC-V specifications implemented in this
│ │                   project, Wishbone bus specification, NEORV32 data sheet, doxygen
│ │                   makefiles.
│ ├─doxygen_build     Software documentation HTML files (generated by doxygen).
│ └─figures           Images mainly for the GitHub front page.
├─rtl                 Processor's VHDL source files.
│ ├─core              This folder contains all the rtl (VHDL) core files of the NEORV32
│ │                   processor. Make sure to add ALL of them to your FPGA EDA project.
│ ├─top_templates     Here you can find alternative top entities of the NEORV32.
│ └─fpga_specific     This folder provides FPGA technology-specific optimized HW modules.
├─sim                 The sim folder contains the default VHDL testbench and additional
│ │                   simulation files.
│ ├─ghdl              Simulation script for GHDL.
│ └─Vivado            Default Xilinx Vivado simulation waveform configuration.
└─sw                  The software folder contains the processor's core libraries,
  │                   makefiles, linker scripts, start-up codes and example programs.
  ├─bootloader        Source and compilation script of the NEORV32-internal bootloader.
  ├─common            Application & bootloader linker scripts and startup codes.
  ├─example           Here you can find several example programs. Each project folder
  │                   includes the program's C sources and a makefile. Add your own
  │                   projects to this folder.
  ├─image_gen         Helper program to generate executables for the NEORV32.
  └─lib               This folder contains the processor's core libraries.
    ├─include         NEORV32 hardware driver library C source files and the according
    └─source          header/include files.
```



There are further files and folders starting with a dot which – for example – contain data/configurations only relevant for git or for the continuous integration framework (.ci). These files and folders are not relevant for the actual checked-out NEORV32 project.

# 1.6. VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's rtl/core folder. The top entity of the entire processor, including all the required configuration generics, is neorv32\_top.vhd.



All processor core VHDL files have to be assigned to a new library called neorv32.

| File                               | Description                                               |
|------------------------------------|-----------------------------------------------------------|
| neorv32_top.vhd                    | Processor core top entity                                 |
| ├ neorv32_boot_rom.vhd             | Bootloader ROM                                            |
| │ └ neorv32_bootloader_image.vhd   | Boot ROM initialization image for the bootloader          |
| ├ neorv32_busswitch.vhd            | Bus switch to mux CPU's I & D interfaces to processor bus |
| ├ neorv32_clic.vhd                 | Core-local interrupt controller                           |
| ├ neorv32_cpu.vhd                  | NEORV32 CPU top entity                                    |
| │ ├ neorv32_cpu_alu.vhd            | Arithmetic/logic unit                                     |
| │ ├ neorv32_cpu_bus.vhd            | Bus interface unit                                        |
| │ ├ neorv32_cpu_control.vhd        | CPU control, exception/IRQ system and CSRs                |
| │ │ └ neorv32_cpu_decompressor.vhd | Compressed instructions decoder                           |
| │ ├ neorv32_cpu_cp_muldiv.vhd      | Multiplication/division co-processor                      |
| │ └ neorv32_reg_file.vhd           | Data register file                                        |
| ├ neorv32_devnull.vhd              | Dummy device                                              |
| ├ neorv32_dmem.vhd                 | Processor-internal data memory                            |
| ├ neorv32_gpio.vhd                 | General purpose input/output port unit                    |
| ├ neorv32_imem.vhd                 | Processor-internal instruction memory                     |
| │ └ neorv32_application_image.vhd  | IMEM application initialization image                     |
| ├ neorv32_mtime.vhd                | Machine system timer                                      |
| ├ neorv32_package.vhd              | Processor VHDL package file                               |
| ├ neorv32_pwm.vhd                  | Pulse-width modulation controller                         |
| ├ neorv32_spi.vhd                  | Serial peripheral interface controller                    |
| ├ neorv32_sysinfo.vhd              | System configuration information memory                   |
| ├ neorv32_trng.vhd                 | True random number generator                              |
| ├ neorv32_twi.vhd                  | Two wire serial interface controller                      |
| ├ neorv32_uart.vhd                 | Universal asynchronous receiver/transmitter               |
| ├ neorv32_wdt.vhd                  | Watchdog timer                                            |
| └ neorv32_wb_interface.vhd         | External Wishbone bus gateway                             |

# 1.7. Processor Top Entity - Signals

The following table shows all interface ports of the processor top entity (neorv32\_top.vhd). The type of all signals is std\_ulogic or std\_ulogic\_vector, respectively – except for the TWI signals, which are of type std\_logic.

| Signal Name | Width | Direction | Function                                                       | HW Module |
|-------------|-------|-----------|----------------------------------------------------------------|-----------|
| **Global Control** | | | | |
| clk_i       | 1     | Input     | Global clock line, all registers triggering on rising edge     | global    |
| rstn_i      | 1     | Input     | Global reset, low-active                                       | global    |
| **External bus interface (Wishbone-compatible)** | | | | |
| wb_adr_o    | 32    | Output    | Destination address                                            | WISHBONE  |
| wb_dat_i    | 32    | Input     | Read data                                                      | WISHBONE  |
| wb_dat_o    | 32    | Output    | Write data                                                     | WISHBONE  |
| wb_we_o     | 1     | Output    | Write enable ('0' = read transfer)                             | WISHBONE  |
| wb_sel_o    | 4     | Output    | Byte enable                                                    | WISHBONE  |
| wb_stb_o    | 1     | Output    | Strobe                                                         | WISHBONE  |
| wb_cyc_o    | 1     | Output    | Valid cycle                                                    | WISHBONE  |
| wb_ack_i    | 1     | Input     | Transfer acknowledge                                           | WISHBONE  |
| wb_err_i    | 1     | Input     | Transfer error                                                 | WISHBONE  |
| **Advanced memory control signals** | | | | |
| fence_o     | 1     | Output    | Indicates an executed fence instruction                        | CPU       |
| fencei_o    | 1     | Output    | Indicates an executed fence.i instruction                      | CPU       |
| **General Purpose Inputs & Outputs (GPIO)** | | | | |
| gpio_o      | 16    | Output    | General purpose parallel output <sup>2</sup>                   | GPIO      |
| gpio_i      | 16    | Input     | General purpose parallel input                                 | GPIO      |
| **Universal Asynchronous Receiver/Transmitter (UART)** | | | | |
| uart_txd_o  | 1     | Output    | UART serial transmitter                                        | UART      |
| uart_rxd_i  | 1     | Input     | UART serial receiver                                           | UART      |
| **Serial Peripheral Interface Controller (SPI)** | | | | |
| spi_sck_o   | 1     | Output    | SPI controller clock line                                      | SPI       |
| spi_sdo_o   | 1     | Output    | SPI serial data output                                         | SPI       |
| spi_sdi_i   | 1     | Input     | SPI serial data input                                          | SPI       |
| spi_csn_o   | 8     | Output    | SPI dedicated chip select lines 0..7 <sup>3</sup> (low-active) | SPI       |
| **Two-Wire Interface Controller (TWI)** | | | | |
| twi_sda_io  | 1     | InOut     | TWI serial data line                                           | TWI       |
| twi_scl_io  | 1     | InOut     | TWI serial clock line                                          | TWI       |
| **Pulse-Width Modulation Channels (PWM)** | | | | |
| pwm_o       | 4     | Output    | Pulse-width modulated channels                                 | PWM       |
| **External Interrupts** | | | | |
| ext_irq_i   | 2     | Input     | Interrupt request signals, high-active                         | CLIC      |
| ext_ack_o   | 2     | Output    | Interrupt request acknowledges, single-shot                    | CLIC      |

Table 1: neorv32\_top.vhd – processor's top entity interface ports

- <sup>2</sup> Bit #0 is used by the bootloader to drive a high-active status LED.
- <sup>3</sup> Chip select #0 is used by the bootloader to access the external boot SPI flash.

#### 1.8. Processor Top Entity - Configuration Generics

This is a list of all configuration generics of the NEORV32 processor top entity rtl/neorv32\_top.vhd. Each entry gives the generic name, followed by its type and its default value. Most of the configured settings can be determined by software via the SYSINFO IO module (see chapter 3.11. System Configuration Information Memory (SYSINFO)).

#### 1.8.1. General

#### **CLOCK\_FREQUENCY** natural 0

The clock frequency of the processor's clk\_i input port in Hertz (Hz).

#### **BOOTLOADER\_USE** boolean true

Implement the boot ROM, pre-initialized with the bootloader image when true. This will also change the processor's boot address from MEM\_ISPACE\_BASE to the base address of the boot ROM. See chapter 4.5. Bootloader for more information.

#### CSR\_COUNTERS\_USE boolean true

Implement the standard RISC-V (performance) counter CSRs [m]cycle[h], [m]instret[h] and time[h]. The counters can only be accessed when CPU\_EXTENSION\_RISCV\_Zicsr is true. If this generic is disabled, any access to one of those CSRs will cause an exception. The time[h] CSRs are only available if the MTIME unit is also available (IO\_MTIME\_USE is true).

#### USER\_CODE std\_ulogic\_vector(31 downto 0) x"00000000"

Custom user code that can be read by software via the SYSINFO module.

#### 1.8.2. RISC-V CPU Extensions

See chapter 2. Central Processing Unit for more information.

#### CPU\_EXTENSION\_RISCV\_C boolean false

Implement the CPU extension for compressed instructions when true.

#### CPU\_EXTENSION\_RISCV\_E boolean false

Implement the embedded CPU extension (only implement the first 16 data registers) when true.

#### CPU\_EXTENSION\_RISCV\_M boolean false

Implement the integer multiplication and division instructions when true.

#### CPU\_EXTENSION\_RISCV\_Zicsr boolean true

Implement the control and status register (CSR) access instructions when true. Note: When this option is disabled, the complete exception system will be excluded from synthesis. Hence, no interrupts and no exceptions can be detected.



The CPU\_EXTENSION\_RISCV\_Zicsr generic should **always be enabled**. The bootloader and the default application start-up code (crt0.S) rely on system information provided by (custom) CSRs.

#### CPU\_EXTENSION\_RISCV\_Zifencei boolean true

Implement the instruction fetch synchronization instruction fence.i when true. For example, this option is required for self-modifying code.

#### 1.8.3. Memory Configuration: Instruction Memory

See chapter 2.5.1. Processor-Internal Instruction Memory (IMEM) for more information.

#### MEM\_ISPACE\_BASE std\_ulogic\_vector(31 downto 0) x"00000000"

Base address of the instruction memory space. This is also the default boot address, if the bootloader is not implemented.

#### MEM\_ISPACE\_SIZE natural 16\*1024

Size of the instruction memory space in bytes. Starts at MEM\_ISPACE\_BASE.

#### MEM\_INT\_IMEM\_USE boolean true

Implement processor internal instruction memory (IMEM) when true.

#### MEM\_INT\_IMEM\_SIZE natural 16\*1024

Size in bytes of the processor-internal instruction memory (IMEM). Has no effect when MEM\_INT\_IMEM\_USE is false. Must not be greater than MEM\_ISPACE\_SIZE.

#### MEM\_INT\_IMEM\_ROM boolean false

Implement processor-internal instruction memory as read-only memory, which will be initialized with the application image at synthesis time. Has no effect when MEM\_INT\_IMEM\_USE is false.

#### 1.8.4. Memory Configuration: Data Memory

See chapter 2.5.2. Processor-Internal Data Memory (DMEM) for more information.

#### MEM\_DSPACE\_BASE std\_ulogic\_vector(31 downto 0) x"80000000"

Base address of the data memory space.

#### MEM\_DSPACE\_SIZE natural 8\*1024

Size of the data memory space in bytes. Starts at MEM\_DSPACE\_BASE.

#### MEM\_INT\_DMEM\_USE boolean true

Implement processor internal data memory (DMEM) when true.

#### MEM\_INT\_DMEM\_SIZE natural 8\*1024

Size in bytes of the processor-internal data memory (DMEM). Has no effect when MEM\_INT\_DMEM\_USE is false. Must not be greater than MEM\_DSPACE\_SIZE.
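
The size constraints stated for the internal memories can be checked before synthesis. The following is a minimal sketch, not part of the NEORV32 project: `check_memory_config` and the dictionary layout are hypothetical, and the values are the defaults listed in this chapter. It assumes that a size generic is irrelevant when the corresponding memory is not implemented.

```python
KIB = 1024

# Default values of the memory generics as listed in this chapter.
DEFAULTS = {
    "MEM_ISPACE_SIZE": 16 * KIB,
    "MEM_INT_IMEM_USE": True,
    "MEM_INT_IMEM_SIZE": 16 * KIB,
    "MEM_DSPACE_SIZE": 8 * KIB,
    "MEM_INT_DMEM_USE": True,
    "MEM_INT_DMEM_SIZE": 8 * KIB,
}


def check_memory_config(cfg):
    """Hypothetical helper: return a list of violated size constraints."""
    problems = []
    # The internal memory sizes have no effect when the memory is not implemented.
    if cfg["MEM_INT_IMEM_USE"] and cfg["MEM_INT_IMEM_SIZE"] > cfg["MEM_ISPACE_SIZE"]:
        problems.append("MEM_INT_IMEM_SIZE exceeds MEM_ISPACE_SIZE")
    if cfg["MEM_INT_DMEM_USE"] and cfg["MEM_INT_DMEM_SIZE"] > cfg["MEM_DSPACE_SIZE"]:
        problems.append("MEM_INT_DMEM_SIZE exceeds MEM_DSPACE_SIZE")
    return problems


print(check_memory_config(DEFAULTS))
print(check_memory_config(dict(DEFAULTS, MEM_INT_IMEM_SIZE=32 * KIB)))
```

The first call reports no problems for the default configuration; the second reports the oversized IMEM.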

#### 1.8.5. Memory Configuration: External Memory Interface

See chapter 2.5.4. Processor-External Memory Interface (WISHBONE) for more information.

#### MEM\_EXT\_USE boolean false

Implement external bus interface (WISHBONE) when true.

#### MEM\_EXT\_REG\_STAGES natural 2

Defines the number of register stages inside the external bus gateway. Allowed configurations: 0, 1 or 2. Adding register stages increases the bus access latency but will also improve timing.

#### MEM\_EXT\_TIMEOUT natural 15

Maximum length of bus access in main clock cycles. If a bus access is not acknowledged within the specified time, the access is aborted and a load/store/instruction access fault is triggered.
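
Because the timeout is given in main clock cycles, its duration depends on CLOCK\_FREQUENCY. The sketch below converts it to nanoseconds and checks the allowed register stage values. The function names are hypothetical, and the 100 MHz clock is only an example value (the default of CLOCK\_FREQUENCY is 0 and has to be set by the user).

```python
def ext_bus_timeout_ns(clock_frequency_hz, timeout_cycles):
    """Duration of the external bus timeout in nanoseconds."""
    if clock_frequency_hz <= 0:
        raise ValueError("CLOCK_FREQUENCY must be set to the actual clock in Hz")
    return timeout_cycles * 1e9 / clock_frequency_hz


def reg_stages_allowed(stages):
    """MEM_EXT_REG_STAGES only allows 0, 1 or 2 register stages."""
    return stages in (0, 1, 2)


# Example: default MEM_EXT_TIMEOUT of 15 cycles at an assumed 100 MHz clock.
print(ext_bus_timeout_ns(100_000_000, 15))  # 150.0
print(reg_stages_allowed(2), reg_stages_allowed(3))
```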

#### 1.8.6. Processor Peripherals

See chapter 2.5.5. Processor-Internal Peripheral/IO Devices for more information.

#### IO\_GPIO\_USE boolean true

Implement general purpose input/output port unit (GPIO) when true. When disabled, the gpio\_i signal is unconnected and the gpio\_o signal is always low. See chapter 3.1. General Purpose Input and Output Port (GPIO) for more information.

#### **IO\_MTIME\_USE** boolean true

Implement machine system timer (MTIME) when true. When disabled, the CPU's machine timer interrupt is not available. The CPU\_EXTENSION\_RISCV\_Zicsr has to be enabled if you want to use the machine system timer's interrupt. See chapter 3.4. Machine System Timer (MTIME) for more information.

#### IO\_UART\_USE boolean true

Implement universal asynchronous receiver/transmitter (UART) when true. When disabled, the uart\_rxd\_i signal is unconnected and the uart\_txd\_o signal is always low. The UART interrupt can only be used when CPU\_EXTENSION\_RISCV\_Zicsr and IO\_CLIC\_USE are enabled. See chapter 3.5. Universal Asynchronous Receiver and Transmitter (UART) for more information.

#### IO\_SPI\_USE boolean true

Implement serial peripheral interface controller (SPI) when true. When disabled, the spi\_sdi\_i signal is unconnected, the spi\_sck\_o and spi\_sdo\_o signals are always low and the spi\_csn\_o signal is always high. The SPI interrupt can only be used when CPU\_EXTENSION\_RISCV\_Zicsr and IO\_CLIC\_USE are enabled. See chapter 3.6. Serial Peripheral Interface Controller (SPI) for more information.

#### IO\_TWI\_USE boolean true

Implement two-wire interface controller (TWI) when true. When disabled, the twi\_sda\_io and twi\_scl\_io signals are unconnected. The TWI interrupt can only be used when CPU\_EXTENSION\_RISCV\_Zicsr and IO\_CLIC\_USE are enabled. See chapter 3.7. Two Wire Serial Interface Controller (TWI) for more information.

#### **IO\_PWM\_USE** boolean true

Implement pulse-width modulation controller (PWM) when true. When disabled, the pwm\_o signal is always low. See chapter 3.8. Pulse Width Modulation Controller (PWM) for more information.

#### IO\_WDT\_USE boolean true

Implement watchdog timer (WDT) when true. The WDT interrupt can only be used when CPU\_EXTENSION\_RISCV\_Zicsr and IO\_CLIC\_USE are enabled. See chapter 3.3. Watchdog Timer (WDT) for more information.

#### IO\_CLIC\_USE boolean true

Implement core-local interrupt controller (CLIC) when true. When disabled, the CPU's machine external interrupt is not available. The CPU\_EXTENSION\_RISCV\_Zicsr has to be enabled to use the CLIC interrupt. See chapter 3.2. Core-Local Interrupt Controller (CLIC) for more information.

#### IO\_TRNG\_USE boolean false

Implement true-random number generator (TRNG) when true. See chapter 3.9. True Random Number Generator (TRNG) for more information.

#### IO\_DEVNULL\_USE boolean true

Implement dummy device (DEVNULL) when true. This device can also be used for fast simulation console out. See chapter 3.10. Dummy Device (DEVNULL) and 5.12. Simulating the Processor for more information.
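
The interrupt and CSR dependencies stated for the generics above can be summarized in a few lines. This is only a sketch of the rules as worded in this chapter: the `available_features` helper and its feature names are hypothetical, and the dictionary holds the documented default values.

```python
# Documented default values of the relevant generics.
DEFAULTS = {
    "CPU_EXTENSION_RISCV_Zicsr": True,
    "CSR_COUNTERS_USE": True,
    "IO_MTIME_USE": True,
    "IO_UART_USE": True,
    "IO_SPI_USE": True,
    "IO_TWI_USE": True,
    "IO_WDT_USE": True,
    "IO_CLIC_USE": True,
}


def available_features(cfg):
    """Hypothetical helper: which interrupts and counter CSRs are usable."""
    zicsr = cfg["CPU_EXTENSION_RISCV_Zicsr"]
    # The machine external interrupt needs the CLIC and the Zicsr extension.
    clic_irq = cfg["IO_CLIC_USE"] and zicsr
    features = {
        "machine_timer_interrupt": cfg["IO_MTIME_USE"] and zicsr,
        "machine_external_interrupt": clic_irq,
        "counter_csrs": cfg["CSR_COUNTERS_USE"] and zicsr,
        "time_csrs": cfg["CSR_COUNTERS_USE"] and zicsr and cfg["IO_MTIME_USE"],
    }
    # UART, SPI, TWI and WDT interrupts are routed via the CLIC.
    for unit in ("UART", "SPI", "TWI", "WDT"):
        features[unit.lower() + "_interrupt"] = cfg["IO_%s_USE" % unit] and clic_irq
    return features


for name, usable in available_features(dict(DEFAULTS, IO_CLIC_USE=False)).items():
    print(name, usable)
```

With IO\_CLIC\_USE disabled, the sketch reports the peripheral interrupts as unavailable while the machine timer interrupt and the counter CSRs remain usable.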

#### 1.9. FPGA Implementation Results

This chapter shows exemplary implementation results of the NEORV32 processor for an **Intel Cyclone IV EP4CE22F17C6N** FPGA on a *Terasic© DE0-Nano* board. The design was synthesized using **Intel Quartus Prime Lite 19.1** ("balanced implementation"). The timing information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration of the processor's generics is assumed. No constraints were used.

The first section shows the implementation results for different CPU configurations (via the CPU\_EXTENSION\_\* generics only) while the second section shows the implementation results for each of the available peripherals. The results were taken from the fitter report (Resource Section / Resource Utilization by Entity) and reflect the resource utilization by the CPU only.

Please note that the provided results are just a relative measure, as logic functions of different modules might be merged across entity boundaries, so the actual utilization results might vary a bit.

#### 1.9.1. CPU

Hardware Version: 1.2.0.0

| CPU                        | CPU Configuration Generics | LEs  | FFs  | MEM bits | DSPs | F<sub>max</sub> |
|----------------------------|----------------------------|------|------|----------|------|------------------|
| rv32i                      | CPU_EXTENSION_RISCV_C = false<br>CPU_EXTENSION_RISCV_E = false<br>CPU_EXTENSION_RISCV_M = false<br>CPU_EXTENSION_RISCV_Zicsr = false<br>CPU_EXTENSION_RISCV_Zifencei = false | 1065 | 477  | 2048     | 0    | 112 MHz          |
| rv32i + Zicsr + Zifencei   | CPU_EXTENSION_RISCV_C = false<br>CPU_EXTENSION_RISCV_E = false<br>CPU_EXTENSION_RISCV_M = false<br>CPU_EXTENSION_RISCV_Zicsr = true<br>CPU_EXTENSION_RISCV_Zifencei = true | 1914 | 837  | 2048     | 0    | 100 MHz          |
| rv32im + Zicsr + Zifencei  | CPU_EXTENSION_RISCV_C = false<br>CPU_EXTENSION_RISCV_E = false<br>CPU_EXTENSION_RISCV_M = true<br>CPU_EXTENSION_RISCV_Zicsr = true<br>CPU_EXTENSION_RISCV_Zifencei = true | 2542 | 1085 | 2048     | 0    | 100 MHz          |
| rv32imc + Zicsr + Zifencei | CPU_EXTENSION_RISCV_C = true<br>CPU_EXTENSION_RISCV_E = false<br>CPU_EXTENSION_RISCV_M = true<br>CPU_EXTENSION_RISCV_Zicsr = true<br>CPU_EXTENSION_RISCV_Zifencei = true | 2806 | 1102 | 2048     | 0    | 100 MHz          |
| rv32emc + Zicsr + Zifencei | CPU_EXTENSION_RISCV_C = true<br>CPU_EXTENSION_RISCV_E = true<br>CPU_EXTENSION_RISCV_M = true<br>CPU_EXTENSION_RISCV_Zicsr = true<br>CPU_EXTENSION_RISCV_Zifencei = true | 2783 | 1102 | 1024     | 0    | 100 MHz          |

Table 2: Hardware utilization for different CPU configurations
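
To see roughly what each extension costs, the rows of Table 2 can be subtracted from each other. The sketch below does this for the LE and FF columns. As noted above, logic may be merged across entity boundaries, so these differences are only an approximation; the helper name is hypothetical.

```python
# (configuration, LEs, FFs) taken from Table 2.
TABLE_2 = [
    ("rv32i", 1065, 477),
    ("rv32i + Zicsr + Zifencei", 1914, 837),
    ("rv32im + Zicsr + Zifencei", 2542, 1085),
    ("rv32imc + Zicsr + Zifencei", 2806, 1102),
]


def incremental_cost(rows):
    """Difference in LEs and FFs between each row and the row before it."""
    return [
        (cur[0], cur[1] - prev[1], cur[2] - prev[2])
        for prev, cur in zip(rows, rows[1:])
    ]


for config, d_les, d_ffs in incremental_cost(TABLE_2):
    print(f"{config}: +{d_les} LEs, +{d_ffs} FFs")
```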

#### 1.9.2. Peripherals

Hardware Version: 1.2.0.0

| Module   | Description                                  | LEs | FFs | MEM bits | DSPs |
|----------|----------------------------------------------|-----|-----|----------|------|
| Boot ROM | Bootloader ROM (4kB)                         | 3   | 1   | 32 768   | 0    |
| DEVNULL  | Dummy device                                 | 3   | 1   | 0        | 0    |
| DMEM     | Processor-internal data memory (8kB)         | 12  | 2   | 65 536   | 0    |
| GPIO     | General purpose input/output ports           | 38  | 33  | 0        | 0    |
| IMEM     | Processor-internal instruction memory (16kB) | 7   | 2   | 131 072  | 0    |
| MTIME    | Machine system timer                         | 269 | 166 | 0        | 0    |
| PWM      | Pulse-width modulation controller            | 76  | 69  | 0        | 0    |
| SPI      | Serial peripheral interface                  | 206 | 125 | 0        | 0    |
| SYSINFO  | System configuration information memory      | 7   | 7   | 0        | 0    |
| TRNG     | True random number generator                 | 104 | 93  | 0        | 0    |
| TWI      | Two-wire interface                           | 78  | 44  | 0        | 0    |
| UART     | Universal asynchronous receiver/transmitter  | 151 | 108 | 0        | 0    |
| WDT      | Watchdog timer                               | 57  | 45  | 0        | 0    |

Table 3: Hardware utilization by the different peripheral modules

#### 1.9.3. Exemplary FPGA Results

The following table shows exemplary implementation results for different FPGA platforms. The processor setup uses **all provided peripherals**, all CPU extensions (but not the E extension), no external memory interface and only internal instruction and data memories. IMEM uses 16kB and DMEM uses 8kB memory space. The setup top entity connects most of the processor's top entity signals to FPGA pins – except for the Wishbone bus and the external interrupt signals.

Hardware Version: 1.2.0.6

CPU Configuration: rv32imc + Zicsr + Zifencei

| Vendor  | FPGA                               | Board               | Toolchain                     | Impl.<br>strategy | LUT /<br>LE   | FF /<br>REG   | DSP    | Embedded<br>memory               | f<br>[MHz] |
|---------|------------------------------------|---------------------|-------------------------------|-------------------|---------------|---------------|--------|----------------------------------|------------|
| Intel   | Cyclone IV<br>EP4CE22F17C6N        | Terasic<br>DE0-Nano | Quartus<br>Prime Lite<br>19.1 | balanced          | 4035<br>(18%) | 1860<br>(8%)  | 0 (0%) | Memory bits: 231424 (38%)        | 101        |
| Lattice | iCE40 UltraPlus<br>iCE40UP5K-SG48I | Upduino<br>v2.0     | Radiant 2.1<br>(LSE)          | timing            | 5001<br>(95%) | 1694<br>(37%) | 0 (0%) | EBR: 12 (40%)<br>SPRAM: 4 (100%) | 22.5*      |
| Xilinx  | Artix-7<br>XC7A35TICSG324<br>-1L   | Arty A7-<br>35T     | Vivado<br>2019.2              | default           | 2509<br>(12%) | 1914<br>(5%)  | 0 (0%) | BRAM: 8 (16%)                    | 100*       |

Table 4: Hardware utilization for different FPGA platforms
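
The memory bit counts in the tables follow directly from the memory sizes. The sketch below converts the sizes from Table 3 to bits and adds the 2048 register file bits from Table 2. That this sum equals the Cyclone IV "Memory bits" entry in Table 4 is an observation made here, assuming no other block memories are used in that setup; the source does not state the breakdown.

```python
BITS_PER_BYTE = 8
KIB = 1024

# Memory sizes in bytes as given in Table 3.
memories_bytes = {
    "Boot ROM": 4 * KIB,
    "DMEM": 8 * KIB,
    "IMEM": 16 * KIB,
}
memory_bits = {name: size * BITS_PER_BYTE for name, size in memories_bytes.items()}

# Register file bits of the rv32imc configuration (Table 2).
REGFILE_BITS = 2048

total_bits = sum(memory_bits.values()) + REGFILE_BITS
print(memory_bits)
print(total_bits)  # 231424
```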

#### Notes

The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kb). The corresponding FPGA-specific memory components for the IMEM and DMEM can be found in the rtl/fpga\_specific folder. Also, the Lattice setup does not implement the M extension.

The clock frequencies marked with an asterisk (\*) are constrained clocks. The remaining ones are "f\_max" results from the place and route timing reports.

The Upduino and the Arty boards have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32 bootloader to store and automatically boot an application program after reset (both tested successfully).

#### 1.10. CPU Performance

Hardware Version: 1.2.0.0

#### 1.10.1. CoreMark Benchmark

#### Configuration

Hardware: 32kB IMEM, 16kB DMEM, 100MHz clock

CoreMark: 2000 iterations, MEM\_METHOD is MEM\_STACK

Compiler: RISCV32-GCC 9.2.0

Peripherals: **UART** for printing the results

The performance of the NEORV32 was tested and evaluated using the CoreMark CPU benchmark. This benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole system. The corresponding source code and the software project can be found in the sw/example/coremark folder. All NEORV32-specific modifications were made in the port-me files, "outside" of the time-critical benchmark core.

The resulting CoreMark score is defined as CoreMark iterations per second:

$$\text{CoreMark Score} = \frac{\text{CoreMark iterations}}{\text{Time in seconds}}$$

The execution time is determined via the RISC-V-compliant [m]cycle[h] CSRs. The relative CoreMark score is defined as the CoreMark score divided by the clock frequency [MHz]:

$$\text{Relative CoreMark Score} = \frac{\text{CoreMark Score}}{\text{Clock frequency [MHz]}}$$

#### Results

| CPU                        | Optimization | CoreMark Score | CoreMarks/MHz |
|----------------------------|--------------|----------------|---------------|
| rv32i + Zicsr + Zifencei   | -O2          | 25.97          | 0.2597        |
| rv32imc + Zicsr + Zifencei | -O2          | 55.55          | 0.5555        |
| rv32im + Zicsr + Zifencei  | -O2          | 54.05          | 0.5405        |

Table 5: NEORV32 CoreMark results
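
The two score definitions above translate directly into code. The following sketch recomputes the CoreMarks/MHz column of Table 5 from the absolute scores and the 100 MHz clock of the test setup; the function names are chosen here for illustration only.

```python
def coremark_score(iterations, seconds):
    """CoreMark score: iterations per second."""
    return iterations / seconds


def relative_score(score, clock_mhz):
    """Relative CoreMark score: score per MHz of clock frequency."""
    return score / clock_mhz


# Absolute scores from Table 5, measured at 100 MHz.
TABLE_5 = {
    "rv32i + Zicsr + Zifencei": 25.97,
    "rv32imc + Zicsr + Zifencei": 55.55,
    "rv32im + Zicsr + Zifencei": 54.05,
}
CLOCK_MHZ = 100

for cpu, score in TABLE_5.items():
    print(cpu, round(relative_score(score, CLOCK_MHZ), 4))
```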

#### 1.10.2. Instruction Timing

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of several consecutive micro operations. Hence, each instruction requires several clock cycles to execute. The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on the available CPU extensions. The following table shows the performance results for successfully (!) running 2000 CoreMark iterations. The average CPI is computed by dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions ([m]instret[h] CSRs). The executables were generated using optimization -O2.

| CPU                        | Required Clock Cycles | Executed<br>Instructions | Average CPI |
|----------------------------|-----------------------|--------------------------|-------------|
| rv32i + Zicsr + Zifencei   | 7 754 927 850         | 1 492 843 669            | 5.2         |
| rv32im + Zicsr + Zifencei  | 3 684 015 850         | 626 274 115              | 5.9         |
| rv32imc + Zicsr + Zifencei | 3 788 220 853         | 626 274 115              | 6.0         |



More information regarding the execution time of each implemented instruction can be found in chapter 2.2. Instruction Timing.
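
The average CPI values in the table above can be reproduced from the two raw counter columns. A minimal sketch (the variable and function names are illustrative):

```python
# (required clock cycles, executed instructions) from the table above.
RESULTS = {
    "rv32i + Zicsr + Zifencei": (7_754_927_850, 1_492_843_669),
    "rv32im + Zicsr + Zifencei": (3_684_015_850, 626_274_115),
    "rv32imc + Zicsr + Zifencei": (3_788_220_853, 626_274_115),
}


def average_cpi(cycles, instructions):
    """Average cycles per instruction, rounded to one decimal place."""
    return round(cycles / instructions, 1)


cpi = {cpu: average_cpi(c, i) for cpu, (c, i) in RESULTS.items()}
print(cpi)
```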

# 2. Central Processing Unit

This chapter takes a detailed look at the NEORV32 CPU. The CPU itself consists of the following VHDL files from the project's rtl/core folder:

neorv32\_cpu.vhd CPU top entity

neorv32\_cpu\_alu.vhd Arithmetic/logic unit

neorv32\_cpu\_bus.vhd Bus interface unit

neorv32\_cpu\_cp\_muldiv.vhd MULDIV co-processor (for CPU M ext.)

neorv32\_cpu\_decompressor.vhd Compressed instructions decoder (for CPU C ext.)

neorv32\_cpu\_regfile.vhd Data register file

neorv32\_package.vhd Processor/CPU package files

#### **CPU Key Features**

- RISC-V 32-bit base integer ISA rv32[i/e][c][m]
- Compliant with the RISC-V user specifications; passes the RISC-V compliance tests
- Mostly compliant with the RISC-V privileged specifications
- Optional privileged architecture (Zicsr) extension supporting RISC-V-compliant control and status registers (CSRs), access instructions, exceptions and interrupts
- Zifencei extension for instruction stream synchronization via the fence.i instruction
- Privilege levels: Machine mode (M-mode)
- Modified Von-Neumann architecture with unified instruction and data address space
- Little-endian byte order
- No hardware support for unaligned data/instruction accesses; they will trigger an exception. When the C extension is enabled, instructions can also be 16-bit aligned and a misaligned instruction address exception is no longer possible
- Two-stage pipelined multi-cycle in-order instruction execution
- NEORV32-specific custom CSRs are mapped to the official RISC-V custom address spaces

#### Architecture

The CPU of the NEORV32 processor was designed completely from scratch and is based only on the official ISA and privileged architecture specifications.



Figure 2: Simplified architecture of the NEORV32 CPU

The NEORV32 CPU uses a two-stage pipelined architecture. The first stage (IF) is responsible for fetching new instructions from memory via the *fetch engine*. Compressed instructions are also decompressed in this stage. The second stage (EX) is responsible for actually executing the fetched instructions via the *execute engine*. Both stages are connected via an instruction prefetch buffer.

These two pipeline stages are based on a multi-cycle processing engine. Hence, the optimal CPI (cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores or multi-cycle operations like shifts or multiplications, or when the instruction fetch engine has to reload the prefetch buffers due to a taken branch.

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes every single instruction in a series of consecutive micro-operations.

The combination of these two classical design paradigms allows an increased instruction throughput (due to the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach). This seems to be quite a good trade-off, at least for me. Furthermore, the multi-cycle instruction execution of the second pipeline stage cannot generate any kind of pipeline hazards.

The CPU provides independent interfaces for instruction fetch and data access. These two busses are merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann Architecture.

#### 2.1. Instruction Set and CPU Extensions

#### 32-bit Base ISA (I extension)

The CPU supports the complete RV32I base integer instruction set:

- Immediates: LUI AUIPC
- Jumps: JAL JALR
- Branches: BEQ BNE BLT BGE BLTU BGEU
- Memory: LB LH LW LBU LHU SB SH SW
- ALU: ADDI SLTI SLTIU XORI ORI ANDI SLLI SRLI SRAI ADD SUB SLL SLT SLTU XOR SRL SRA OR AND
- Environment: ECALL EBREAK FENCE



The FENCE instruction does not perform any processor-internal operation at all and behaves like a NOP. However, the top entity's fence\_o signal is set high for one cycle to inform the memory system.

#### **Embedded CPU Architecture (E extension)**

This extension does not feature additional instructions. However, the embedded CPU version only implements the lower 16 registers and uses a specific ABI (ilp32e). Also, no performance and timer CSRs are available ([m]cycle[h] and [m]instret[h]).

#### **Compressed Instructions (C extension)**

Compressed 16-bit instructions are available when the CPU\_EXTENSION\_RISCV\_C configuration generic is true. In this case the following instructions are available:

- C.ADDI4SPN C.LW C.SW C.NOP C.ADDI C.JAL C.LI C.ADDI16SP C.LUI C.SRLI C.SRAI C.ANDI C.SUB C.XOR C.OR C.AND C.J C.BEQZ C.BNEZ C.SLLI C.LWSP C.JR C.MV C.EBREAK C.JALR C.ADD C.SWSP

#### **Integer Multiplication and Division (M extension)**

Hardware-accelerated multiplication and division instructions are available when the CPU\_EXTENSION\_RISCV\_M configuration generic is true. In this case the following instructions are available:

- Multiplication: MUL MULH MULHSU MULHU
- Division: DIV DIVU REM REMU

Multiplication and division operations are executed in a bit-serial approach. The execution time is fixed and not affected by the actual operand values.
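
To illustrate why a bit-serial multiplier has a fixed execution time, the following sketch models an unsigned 32-bit shift-and-add multiplication that always performs 32 iterations, one per multiplier bit. It is a behavioral illustration only and not a description of the actual NEORV32 MULDIV co-processor; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul_bit_serial(a, b):
    """Unsigned 32x32 -> 64-bit multiply, one multiplier bit per iteration.

    Returns (product, iterations). The loop always runs 32 times,
    independent of the operand values.
    """
    a &= MASK32
    b &= MASK32
    product = 0
    iterations = 0
    for bit in range(32):
        if (b >> bit) & 1:
            product += a << bit  # add the shifted multiplicand
        iterations += 1
    return product, iterations


product, iterations = mul_bit_serial(7, 6)
print(product, iterations)        # 42 32
print(product & MASK32)           # low word, as returned by MUL
print((product >> 32) & MASK32)   # high word, as returned by MULHU
```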

#### Control and Status Register Access (Zicsr extension)

The CSR access instructions as well as the exception and interrupt system are implemented when the CPU\_EXTENSION\_RISCV\_Zicsr configuration generic is true. In this case the following instructions are available:

- CSR access: CSRRW CSRRS CSRRC CSRRWI CSRRSI CSRRCI
- Environment: MRET WFI



#### Instruction Coherency Operations (Zifencei extension)

The Zifencei CPU extension is implemented if the CPU\_EXTENSION\_RISCV\_Zifencei configuration generic is true. It allows manual synchronization of the instruction stream.

FENCE.I



The FENCE.I instruction resets the CPU's instruction fetch engine and flushes all prefetch buffers. This allows a clean re-fetch of modified data from memory. Also, the top entity's fencei\_o signal is set high for one cycle to inform the memory system.

#### 2.2. Instruction Timing

The following table shows the required clock cycles for executing a certain instruction. The execution cycles assume a bus access without additional wait states (like bus accesses to processor-internal memories or peripherals) and a filled pipeline.

| Class          | Instructions | Execution Cycles |
|----------------|--------------|------------------|
| ALU            | ADDI SLTI SLTIU XORI ORI ANDI ADD SUB SLT SLTU XOR OR AND LUI AUIPC<br>C.ADDI4SPN C.NOP C.ADDI C.LI C.ADDI16SP C.LUI C.ANDI C.SUB C.XOR C.OR C.AND C.ADD C.MV | |
| ALU - Shifts   | SLLI SRLI SRAI SLL SRA<br>C.SRLI C.SRAI C.SLLI | |
| Branches       | BEQ BNE BLT BGE BLTU BGEU<br>C.BEQZ C.BNEZ | Taken: 4 + 6<br>Not taken: 3 |
| Jumps          | JAL JALR<br>C.JAL C.J C.JR C.JALR | |
| Memory         | LB LH LW LBU LHU SB SH SW<br>C.LW C.SW C.LWSP C.SWSP | |
| Multiplication | MUL MULH MULHSU MULHU | 2 + 32 + 4 |
| Division       | DIV DIVU REM REMU | 2 + 32 + 6 |
| CSR Access     | CSRRW CSRRS CSRRC CSRRWI CSRRSI CSRRCI | 3 |
| System         | ECALL EBREAK MRET WFI FENCE FENCE.I C.EBREAK | 3 |

Table 6: Required clock cycles per instruction
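
The cycle counts from Table 6 can be used for rough execution time estimates. The sketch below only covers the instruction classes for which a cycle count is given in the table; the example instruction sequence is hypothetical, and the estimate assumes no additional bus wait states, as stated above.

```python
# Execution cycles per instruction class as given in Table 6.
CYCLES = {
    "mul": 2 + 32 + 4,
    "div": 2 + 32 + 6,
    "csr": 3,
    "system": 3,
    "branch_taken": 4 + 6,
    "branch_not_taken": 3,
}


def total_cycles(trace):
    """Sum the documented cycle counts over a list of instruction classes."""
    return sum(CYCLES[kind] for kind in trace)


# Hypothetical loop: 10 iterations, each with one MUL and one branch
# (taken 9 times, not taken on the final iteration).
trace = ["mul", "branch_taken"] * 9 + ["mul", "branch_not_taken"]
print(total_cycles(trace))  # 473
```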

#### **Iterative Operations**

Shift operations as well as multiplications and divisions (M extension) are processed in a bit-serial approach.

#### **Branches and Jumps**

Jumps and taken branches are quite painful due to the pipelined architecture and the prefetch buffers, which require reloading. However, this architecture greatly accelerates the execution of all other instruction types. Overall, this acceleration outweighs the high branch penalty.



The average CPI (cycles per instruction) for executing the CoreMark benchmark for different CPU configurations is presented in chapter 1.10.2. Instruction Timing.

4 Shift amount: 0..31

#### 2.3. Control and Status Registers (CSRs)



The CSRs, the CSR-related instructions as well as the complete exception and interrupt processing system are only available when the CPU\_EXTENSION\_RISCV\_Zicsr generic is true.

The following table shows a summary of all available CSRs. The address defines the CSR address for the CSR access instructions. The [ASM] name can be used for (inline) assembly code and is directly understood by the assembler/compiler. The [C] names are defined by the NEORV32 core library and can be used as immediates in plain C code. The "R/W" column shows whether the CSR can be read or written.

According to the RISC-V specs, writing an implemented read-only CSR does not trigger an exception. The CSRs marked in **red** are not (fully) compliant with the official RISC-V specifications. For instance, the read/write access might be different (marked with a **red** and underlined "**r/w**") or not all of the specified bits are implemented.

When accessing a CSR that is not available or that is not implemented due to the actual processor configuration, the CSR access will trigger an invalid instruction exception.

The NEORV32-specific CSRs (if available at all) are mapped to the official "custom CSR" CSR address space.
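
The RISC-V privileged specification encodes the nominal access type and the lowest required privilege level in the upper bits of the 12-bit CSR address: bits 11:10 equal to 0b11 mark a read-only CSR, and bits 9:8 give the privilege level. The sketch below decodes a few addresses from the table that follows. The helper names are illustrative; the R/W column of the table remains the reference for the actual NEORV32 behavior (misa, for example, is listed as read-only although its address is in a read/write range).

```python
PRIVILEGE_NAMES = {0b00: "user", 0b01: "supervisor", 0b10: "hypervisor/reserved", 0b11: "machine"}


def csr_is_read_only(addr):
    """True if address bits 11:10 mark the CSR as read-only."""
    return ((addr >> 10) & 0b11) == 0b11


def csr_min_privilege(addr):
    """Lowest privilege level encoded in address bits 9:8."""
    return PRIVILEGE_NAMES[(addr >> 8) & 0b11]


for name, addr in [("mstatus", 0x300), ("mcycle", 0xB00), ("cycle", 0xC00), ("mvendorid", 0xF11)]:
    access = "r/-" if csr_is_read_only(addr) else "r/w"
    print(f"{name}: 0x{addr:03x} {access} {csr_min_privilege(addr)}")
```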

| Address                                | Name [ASM] | Name [C]              | R/W        | Function                                          |  |
|----------------------------------------|------------|-----------------------|------------|---------------------------------------------------|--|
| **Machine Trap Setup (RISC-V compliant)** |         |                       |            |                                                   |  |
| 0x300                                  | mstatus    | CSR_MSTATUS           | r/w        | Machine status register                           |  |
| 0x301                                  | misa       | CSR_MISA              | <u>r/-</u> | Machine CPU ISA and extensions                    |  |
| 0x304                                  | mie        | CSR_MIE               | r/w        | Machine interrupt enable register                 |  |
| 0x305                                  | mtvec      | CSR_MTVEC             | r/w        | Machine trap-handler base address (for ALL traps) |  |
| **Machine Trap Handling (RISC-V compliant)** |      |                       |            |                                                   |  |
| 0x340                                  | mscratch   | CSR_MSCRATCH          | r/w        | Machine scratch register                          |  |
| 0x341                                  | mepc       | CSR_MEPC              | r/w        | Machine exception program counter                 |  |
| 0x342                                  | mcause     | CSR_MCAUSE            | r/w        | Machine trap cause                                |  |
| 0x343                                  | mtval      | CSR_MTVAL             | r/w        | Machine bad address or instruction                |  |
| 0x344                                  | mip        | CSR_MIP               | r/w        | Machine interrupt pending register                |  |
| **Counters and Timers (RISC-V compliant)** |        |                       |            |                                                   |  |
| 0xb00                                  | mcycle     | CSR_MCYCLE            | r/w        | Machine cycle counter low word                    |  |
| 0xb02                                  | minstret   | CSR_MINSTRET          | r/w        | Machine instructions-retired counter low word     |  |
| 0xb80                                  | mcycleh    | CSR_MCYCLEH           | r/w        | Machine cycle counter high word                   |  |
| 0xb82                                  | minstreth  | CSR_MINSTRETH         | r/w        | Machine instructions-retired counter high word    |  |
| 0xc00                                  | cycle      | CSR_CYCLE             | r/-        | Cycle counter low word                            |  |
| 0xc01                                  | time       | CSR_TIME              | r/-        | System time (from MTIME) low word                 |  |
| 0xc02                                  | instret    | CSR_INSTRET           | r/-        | Instructions-retired counter low word             |  |
| 0xc80                                  | cycleh     | CSR_CYCLEH            | r/-        | Cycle counter high word                           |  |
| 0xc81                                  | timeh      | CSR_TIMEH             | r/-        | System time (from MTIME) high word                |  |
| 0xc82                                  | instreth   | CSR_INSTRETH          | r/-        | Instructions-retired counter high word            |  |
| **Machine Information Registers, read-only (RISC-V compliant)** |  |  |       |                                                   |  |
| 0xf11                                  | mvendorid  | CSR_MVENDORID         | r/-        | Vendor ID                                         |  |
| 0xf12                                  | marchid    | CSR_MARCHID           | r/-        | Architecture ID                                   |  |
| 0xf13                                  | mimpid     | CSR_MIMPID            | r/-        | Machine implementation ID / version               |  |
| 0xf14                                  | mhartid    | CSR_MHARTID           | r/-        | Machine thread ID                                 |  |
|                                        |            | NEORV32               | 2-Specif   | fic (Custom CSRs)                                 |  |
| -                                      | -          | -                     | -          | -                                                 |  |

Table 7: NEORV32 Control and Status Registers (CSRs)

### 2.3.1. Machine Trap Setup

0x300: CSR\_MSTATUS

The MSTATUS register is compliant to the RISC-V mstatus CSR. The following bits are implemented (all remaining bits are always zero and are read-only):

| Bit#  | Name | R/W | Function                                                |
|-------|------|-----|---------------------------------------------------------|
| 12:11 | MPP  | r/- | CPU operation mode, always "11" (machine mode / M-mode) |
| 7     | MPIE | r/w | Previous machine interrupt enable flag                  |
| 3     | MIE  | r/w | Machine interrupt enable flag                           |

When entering an exception/interrupt, the MIE flag is copied to MPIE and cleared afterwards. When leaving the exception/interrupt (via the MRET instruction), MPIE is copied back to MIE.
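The MIE/MPIE handling described above can be sketched as a small software model. This is only an illustration of the two bit transfers, not the hardware implementation; the helper names mstatus\_on\_trap\_entry and mstatus\_on\_mret are hypothetical and not part of the NEORV32 software library.

```c
#include <stdint.h>

/* Bit positions taken from the mstatus table above. */
#define MSTATUS_MIE  (UINT32_C(1) << 3)
#define MSTATUS_MPIE (UINT32_C(1) << 7)

/* Hypothetical model: mstatus value after the CPU has entered a trap. */
uint32_t mstatus_on_trap_entry(uint32_t mstatus)
{
    if (mstatus & MSTATUS_MIE) {
        mstatus |= MSTATUS_MPIE;    /* MPIE <= MIE (MIE was set) */
    } else {
        mstatus &= ~MSTATUS_MPIE;   /* MPIE <= MIE (MIE was clear) */
    }
    return mstatus & ~MSTATUS_MIE;  /* MIE is cleared afterwards */
}

/* Hypothetical model: mstatus value after executing MRET. */
uint32_t mstatus_on_mret(uint32_t mstatus)
{
    if (mstatus & MSTATUS_MPIE) {
        mstatus |= MSTATUS_MIE;     /* MIE <= MPIE (MPIE was set) */
    } else {
        mstatus &= ~MSTATUS_MIE;    /* MIE <= MPIE (MPIE was clear) */
    }
    return mstatus;
}
```

Because MIE is cleared on trap entry, further interrupts stay globally disabled inside the handler until MRET restores the saved state.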

0x301: CSR\_MISA



This CSR is not fully RISC-V-compliant yet as it is read-only. Hence, implemented CPU extensions cannot be switched on/off during runtime.

The MISA register is compliant to the RISC-V misa CSR. The lowest 26 bits (25:0) show the implemented CPU extensions. The following bits are implemented (all remaining bits are always zero and are read-only):

| Bit#  | R/W | Function                                                                                         |
|-------|-----|--------------------------------------------------------------------------------------------------|
| 31:30 | r/- | 32-bit indicator ("01")                                                                          |
| 25    | r/- | Z CPU extension, set when CPU_EXTENSION_RISCV_Zicsr and CPU_EXTENSION_RISCV_Zifencei are enabled |
| 12    | r/- | M CPU extension, set when CPU_EXTENSION_RISCV_M is enabled                                       |
| 8     | r/- | I CPU extension, always set, cleared when CPU_EXTENSION_RISCV_E is enabled                       |
| 4     | r/- | E CPU extension, set when CPU_EXTENSION_RISCV_E is enabled                                       |
| 2     | r/- | C CPU extension, set when CPU_EXTENSION_RISCV_C is enabled                                       |
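As an illustration of the bit layout above, the following sketch turns a misa value into a list of extension letters (bit 0 corresponds to "A", bit 25 to "Z"). The function names are hypothetical; on real hardware the input value would come from a CSR read of misa.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the letters of all extensions flagged in misa[25:0] to out
 * (ascending bit order) and returns the number of letters written.
 * out must provide room for 26 letters plus the terminating zero. */
size_t misa_extensions(uint32_t misa, char out[27])
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t n = 0;

    for (unsigned bit = 0; bit < 26; bit++) {
        if (misa & (UINT32_C(1) << bit)) {
            out[n++] = letters[bit];
        }
    }
    out[n] = '\0';
    return n;
}

/* Returns 1 if misa[31:30] holds the 32-bit indicator "01". */
int misa_is_32bit(uint32_t misa)
{
    return ((misa >> 30) & 0x3u) == 0x1u;
}
```

For a configuration with the I, M, C and Z bits set, the resulting string would be "CIMZ".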

0x304: CSR\_MIE

The MIE register is compliant to the RISC-V mie CSR. The following bits are implemented (all remaining bits are always zero and are read-only):

| Bit# | Name | R/W | Function                                      |
|------|------|-----|-----------------------------------------------|
| 11   | MEIE | r/w | Machine external interrupt enable (from CLIC) |
| 7    | MTIE | r/w | Machine timer interrupt enable (from MTIME)   |
| 3    | MSIE | r/w | Machine software interrupt enable             |

0x305: CSR\_MTVEC

The MTVEC register is compliant to the RISC-V mtvec CSR. This register stores the base address for the machine trap handler. The CPU jumps to this address, regardless of the trap source. The lowest two bits of this register are always zero and cannot be altered.

### 2.3.2. Machine Trap Handling

0x340: CSR\_MSCRATCH

The MSCRATCH register is compliant to the RISC-V mscratch CSR. It is a general purpose scratch register that can be used by the exception/interrupt handler.

0x341: CSR\_MEPC

The MEPC register is compliant to the RISC-V mepc CSR. For exceptions (like an illegal instruction), this register provides the address of the exception-causing instruction. On return (via the MRET instruction), the mepc CSR has to be increased by 4 to get to the next instruction. For interrupts (like a machine timer interrupt), this register provides the address of the next not-yet-executed instruction. On return (via the MRET instruction), the mepc CSR must not be modified.
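The rule above can be written as a small helper that a trap handler might apply before returning. This is a minimal sketch: the function name is hypothetical, and it assumes the most significant bit of mcause marks an interrupt (as in the mcause values listed in chapter 2.4). It also assumes a 32-bit instruction; with the C extension enabled, a compressed instruction is only 2 bytes long, which this sketch does not handle.

```c
#include <stdint.h>

/* The most significant mcause bit is set for interrupts. */
#define MCAUSE_INTERRUPT_FLAG UINT32_C(0x80000000)

/* Hypothetical helper: value to write back to mepc before MRET. */
uint32_t trap_return_address(uint32_t mcause, uint32_t mepc)
{
    if (mcause & MCAUSE_INTERRUPT_FLAG) {
        return mepc;      /* interrupt: mepc already points to the next instruction */
    }
    return mepc + 4u;     /* exception: skip the exception-causing instruction */
}
```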

0x342: CSR\_MCAUSE

The MCAUSE register is compliant to the RISC-V mcause CSR. It shows the cause of the current exception (see chapter 2.4. Exceptions and Interrupts).

0x343: CSR\_MTVAL

The MTVAL register is compliant to the RISC-V mtval CSR. When a trap is triggered, the CSR shows either the faulting address (for misaligned/faulting load/stores/fetch) or the faulting instruction itself (for illegal instructions). For interrupts the CSR is set to zero.

0x344: CSR\_MIP

The MIP register is compliant to the RISC-V mip CSR. The following bits are implemented (all remaining bits are always zero and are read-only):

| Bit# | Name | R/W | Function                                       |
|------|------|-----|------------------------------------------------|
| 11   | MEIP | r/- | Machine external interrupt pending (from CLIC) |
| 7    | MTIP | r/- | Machine timer interrupt pending (from MTIME)   |
| 3    | MSIP | r/- | Machine software interrupt pending             |
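To illustrate how the mie and mip bits interact, the sketch below picks the interrupt that would be serviced next. It assumes the standard RISC-V behavior that an interrupt is only taken when the global MIE flag in mstatus is set and both the enable and the pending bit of the source are set; the priority order (external, timer, software) follows the table in chapter 2.4. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MSTATUS_MIE_BIT 3u  /* global machine interrupt enable in mstatus */

/* Returns the mie/mip bit number (11, 7 or 3) of the interrupt that would
 * be taken next, or -1 if no interrupt can be taken. */
int next_interrupt(uint32_t mstatus, uint32_t mie, uint32_t mip)
{
    /* Highest priority first: MEI (11), MTI (7), MSI (3). */
    static const int priority_order[] = { 11, 7, 3 };
    const uint32_t active = mie & mip;  /* enabled AND pending */

    if ((mstatus & (UINT32_C(1) << MSTATUS_MIE_BIT)) == 0) {
        return -1;                      /* interrupts globally disabled */
    }
    for (size_t i = 0; i < sizeof priority_order / sizeof priority_order[0]; i++) {
        if (active & (UINT32_C(1) << priority_order[i])) {
            return priority_order[i];
        }
    }
    return -1;
}
```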

### 2.3.3. Counters and Timers

0xc01: CSR\_TIME

0xc81: CSR\_TIMEH

The TIME[H] registers show the current system time. The system time is generated by the MTIME system timer unit. If the MTIME unit is not implemented (IO\_MTIME\_USE is false) these CSRs do not exist. The time CSRs are read-only. Change the system time via the MTIME unit.

0xb00: CSR\_MCYCLE

0xb80: CSR\_MCYCLEH

0xc00: CSR\_CYCLE

0xc80: CSR\_CYCLEH



The [M]CYCLEH and [M]INSTRETH CSRs only implement the lowest 20 bits (in contrast to the 32 bits specified by RISC-V). The remaining bits are always zero.

The MCYCLE[H] and CYCLE[H] CSRs count the executed clock cycles. These registers are cleared during reset and increment with the primary clock. The CYCLE[H] CSR is read-only and is a shadowed copy of the MCYCLE[H] CSR, which can also be written by the user. These registers are not available for embedded CPUs (CPU\_EXTENSION\_E generic is true) and are only implemented if CSR\_COUNTERS\_USE is true. The [m]cycle[h] counters stop counting when the CPU is in sleep mode.
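Since the counter is split into a low and a high word, software has to combine two separate reads. A common RISC-V idiom, sketched below, reads the high word before and after the low word and retries if the low word overflowed in between. The CSR accesses are abstracted as function pointers, and a simulated counter (sim\_counter and its read functions, all hypothetical) stands in for the hardware so that the sketch runs on any host.

```c
#include <stdint.h>

typedef uint32_t (*csr_read_fn)(void);

/* Reads a 64-bit counter from its two 32-bit halves without tearing. */
uint64_t read_counter64(csr_read_fn read_low, csr_read_fn read_high)
{
    uint32_t hi, lo, hi_check;

    do {
        hi       = read_high();
        lo       = read_low();
        hi_check = read_high();   /* changed if the low word overflowed */
    } while (hi != hi_check);

    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter for illustration only: advances on every low-word read. */
uint64_t sim_counter = 0;

uint32_t sim_read_low(void)
{
    uint32_t value = (uint32_t)sim_counter;
    sim_counter += 1;
    return value;
}

uint32_t sim_read_high(void)
{
    return (uint32_t)(sim_counter >> 32);
}
```

As only the lowest 20 bits of the high word are implemented, the combined value is effectively a 52-bit counter.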

0xb02: CSR\_MINSTRET

0xb82: CSR\_MINSTRETH

0xc02: CSR\_INSTRET

0xc82: CSR\_INSTRETH

The MINSTRET[H] and INSTRET[H] CSRs count the executed instructions. These registers are cleared during reset and increment when the CPU control state machine is in EXECUTE state. The INSTRET[H] CSR is read-only and is a shadowed copy of the MINSTRET[H] CSR, which can also be written by the user. These registers are not available for embedded CPUs (CPU\_EXTENSION\_E generic is true) and are only implemented if CSR\_COUNTERS\_USE is true.

### 2.3.4. Machine Information Registers

0xf11: CSR\_MVENDORID

0xf12: CSR\_MARCHID

The MVENDORID and MARCHID registers are compliant to the RISC-V mvendorid and marchid CSRs. They are always read as zero and are implemented for compatibility only.

0xf13: CSR\_MIMPID

The MIMPID register is compliant to the RISC-V mimpid CSR. It shows the version of the NEORV32.

0xf14: CSR\_MHARTID

The MHARTID register is compliant to the RISC-V mhartid CSR. It is always read as zero.

# 2.4. Exceptions and Interrupts

The NEORV32 supports the following exceptions and interrupts. The identifier codes and the priority are compliant to the RISC-V specifications. Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the mtvec CSR. The cause of the according interrupt or exception can be determined via the content of the mcause CSR.

| Priority    | mcause     | ID [C]             | Function                                         |
|-------------|------------|--------------------|--------------------------------------------------|
| 1 (highest) | 0x8000000B | EXCID_MEI          | Machine external interrupt (via CLIC)            |
| 2           | 0x80000007 | EXCID_MTI          | Machine timer interrupt (via MTIME)              |
| 3           | 0x80000003 | EXCID_MSI          | Machine software interrupt                       |
| 4           | 0x00000001 | EXCID_I_ACCESS     | Instruction access fault                         |
| 5           | 0x00000002 | EXCID_I_ILLEGAL    | Illegal instruction                              |
| 6           | 0x00000000 | EXCID_I_MISALIGNED | Instruction address misaligned                   |
| 7           | 0x0000000B | EXCID_MENV_CALL    | Environment call from M-mode (ECALL instruction) |
| 8           | 0x00000003 | EXCID_BREAKPOINT   | Breakpoint (EBREAK instruction)                  |
| 9           | 0x00000006 | EXCID_S_MISALIGNED | Store address misaligned                         |
| 10          | 0x00000004 | EXCID_L_MISALIGNED | Load address misaligned                          |
| 11          | 0x00000007 | EXCID_S_ACCESS     | Store access fault                               |
| 12 (lowest) | 0x00000005 | EXCID_L_ACCESS     | Load access fault                                |



The [C] names are defined by the NEORV32 core library and can be used as immediates in plain C code.
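A trap handler typically starts by decoding mcause. The following sketch encodes the table above as a lookup table so that it can be run on any host. The type and function names are hypothetical, and the identifiers are stored as plain strings here; in an actual application the EXCID\_\* constants from the core library would be compared directly.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mcause;    /* value of the mcause CSR */
    int         priority;  /* 1 = highest, 12 = lowest */
    const char *id;        /* identifier name from the table above */
} trap_info_t;

static const trap_info_t trap_table[] = {
    { 0x8000000Bu,  1, "EXCID_MEI"          },
    { 0x80000007u,  2, "EXCID_MTI"          },
    { 0x80000003u,  3, "EXCID_MSI"          },
    { 0x00000001u,  4, "EXCID_I_ACCESS"     },
    { 0x00000002u,  5, "EXCID_I_ILLEGAL"    },
    { 0x00000000u,  6, "EXCID_I_MISALIGNED" },
    { 0x0000000Bu,  7, "EXCID_MENV_CALL"    },
    { 0x00000003u,  8, "EXCID_BREAKPOINT"   },
    { 0x00000006u,  9, "EXCID_S_MISALIGNED" },
    { 0x00000004u, 10, "EXCID_L_MISALIGNED" },
    { 0x00000007u, 11, "EXCID_S_ACCESS"     },
    { 0x00000005u, 12, "EXCID_L_ACCESS"     },
};

/* Returns the table entry for a mcause value, or NULL if it is not listed. */
const trap_info_t *trap_lookup(uint32_t mcause)
{
    for (size_t i = 0; i < sizeof trap_table / sizeof trap_table[0]; i++) {
        if (trap_table[i].mcause == mcause) {
            return &trap_table[i];
        }
    }
    return NULL;
}

/* Returns 1 for interrupts (most significant mcause bit set), 0 for exceptions. */
int trap_is_interrupt(uint32_t mcause)
{
    return (int)((mcause >> 31) & 1u);
}
```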

# 2.5. Address Space

The CPU is a 32-bit architecture with separate instruction and data interfaces. Hence, each of these interfaces can access an address space of up to $2^{32}$ bytes (4 GB). For the NEORV32 processor, these two interfaces are multiplexed to a single processor-internal bus, making it a modified von-Neumann architecture.

The resulting (shared) address space is divided into four main regions: the instruction memory space for instructions, the data memory space for application runtime data, the bootloader address space for the processor-internal bootloader and the IO address space for the processor-internal peripheral/IO devices.

The beginning of the memory space for instructions is defined via the MEM\_MISPACEBASE generic. This generic must be 4-byte aligned. The complete size of the instruction memory space is defined via the MEM\_MISPACESIZE generic (in bytes). Analogously, the beginning of the memory space for data is defined via the MEM\_MDSPACEBASE generic. This generic must be 4-byte aligned. The complete size of the data memory space is defined via the MEM\_MDSPACESIZE generic (in bytes). The instruction and data memory spaces may overlap.

The base address of the bootloader and the IO region for the peripheral devices are fixed. These address regions cannot be used for other applications – even if the bootloader or all IO devices are not implemented.



Figure 3: General NEORV32 address space layout

The processor can implement internal memories for instructions (IMEM) and data (DMEM), which will be mapped to FPGA block RAMs. The implementation of these memories is controlled via the boolean MEM\_INT\_IMEM\_USE and MEM\_INT\_DMEM\_USE generics. The sizes of these memories are configured via the MEM\_INT\_IMEM\_SIZE and MEM\_INT\_DMEM\_SIZE generics (in bytes). The processor-internal instruction memory (IMEM) can be implemented as ROM (MEM\_INT\_IMEM\_ROM), which is initialized with the application code during synthesis.

If the processor-internal IMEM is implemented, it is located at the base address of the instruction address space. Likewise, the processor-internal data memory is located at the beginning of the data address space if implemented. If the configured instruction/data memory space is larger than the IMEM/DMEM, accesses to the remaining addresses are forwarded to the external bus interface (when the MEM\_EXT\_USE generic is true), which connects processor-external memories and peripheral devices.

### 2.5.1. Processor-Internal Instruction Memory (IMEM)

A processor-internal instruction memory (rtl/core/neorv32\_imem.vhd) can be enabled via the processor's MEM\_INT\_IMEM\_USE generic. The size in bytes is defined via the MEM\_INT\_IMEM\_SIZE generic. If the IMEM is implemented, the memory is mapped into the instruction memory space (defined via the MEM\_MISPACESIZE generic) and located at the beginning of the instruction memory space (defined via the MEM\_MISPACEBASE generic).

By default, the IMEM is implemented as RAM, so the content can be modified during run time. This is required when using a bootloader that can update the content of the IMEM at any time. If you do not need the bootloader anymore — since your application development is done and you want the program to permanently reside in the internal instruction memory — the IMEM can also be implemented as true read-only memory. In this case set the MEM\_INT\_IMEM\_ROM generic of the processor's top entity to true.

When the IMEM is implemented as ROM, it will be initialized during synthesis with the actual application program image. Based on your application the toolchain will automatically generate a VHDL initialization file rtl/core/neorv32\_application\_image.vhd, which is automatically inserted into the IMEM. If the IMEM is implemented as RAM, the memory will not be initialized at all.

### 2.5.2. Processor-Internal Data Memory (DMEM)

A processor-internal data memory (rtl/core/neorv32\_dmem.vhd) can be enabled via the processor's MEM\_INT\_DMEM\_USE generic. The size in bytes is defined via the MEM\_INT\_DMEM\_SIZE generic. If the DMEM is implemented, the memory is mapped into the data memory space (defined via the MEM\_MDSPACESIZE generic) and located at the beginning of the data memory space (defined via the MEM\_MDSPACEBASE generic). The DMEM is always implemented as RAM.

### 2.5.3. Processor-Internal Bootloader ROM (BOOTROM)

As the name already suggests, the boot ROM (rtl/core/neorv32\_boot\_rom.vhd) contains the read-only bootloader image. When the bootloader is enabled via the BOOTLOADER\_USE generic, it is executed directly after system reset.

The bootloader ROM is located at address 0xFFFF0000. This location is fixed and the bootloader ROM size must not exceed 32 kB. The bootloader read-only memory is automatically initialized during synthesis via the rtl/core/neorv32\_bootloader\_image.vhd file, which is generated when compiling and installing the bootloader sources.

The bootloader ROM address space cannot be used for other applications even when the bootloader is not implemented.

### **Boot Configuration**

When the bootloader is implemented, the CPU starts execution after reset right at the beginning of the boot ROM. If the bootloader is not implemented, the CPU starts execution at the beginning of the instruction memory space defined via MEM\_MISPACEBASE generic. In this case, the instruction memory has to contain a valid executable – either by using the internal IMEM with an initialization during synthesis or by a user-defined initialization process.

### 2.5.4. Processor-External Memory Interface (WISHBONE)

#### Overview

|                          |                      |                                            |
|--------------------------|----------------------|--------------------------------------------|
| Hardware source file(s): | neorv32_wishbone.vhd |                                            |
| Software driver file(s): | none                 | Implicitly used                            |
| Top entity ports:        | wb_adr_o             | Address output (32-bit)                    |
|                          | wb_dat_i             | Data input (32-bit)                        |
|                          | wb_dat_o             | Data output (32-bit)                       |
|                          | wb_we_o              | Write enable                               |
|                          | wb_sel_o             | Byte enable (4-bit)                        |
|                          | wb_stb_o             | Strobe                                     |
|                          | wb_cyc_o             | Valid cycle                                |
|                          | wb_ack_i             | Acknowledge                                |
|                          | wb_err_i             | Bus error                                  |
|                          | fence_o              | Indicates an executed fence instruction    |
|                          | fencei_o             | Indicates an executed fencei instruction   |
| Configuration generics:  | MEM_EXT_USE          | Enable external memory interface when true |
|                          | MEM_EXT_REG_STAGES   | Number of interface register stages        |
|                          | MEM_EXT_TIMEOUT      | Maximum length of bus accesses             |

The external memory interface uses the Wishbone interface protocol. The external interface port is available when the MEM\_EXT\_USE generic is true. This interface can be used to attach external memories, custom hardware accelerators, additional IO devices or all other kinds of IP blocks.

All memory accesses from the CPU that do not target the internal bootloader ROM, the internal IO region or the internal data/instruction memories (if implemented at all) are forwarded to the Wishbone gateway and thus to the external memory interface.
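The forwarding rule above can be sketched as a simple address decoder. This is a software model for illustration only and not derived from the VHDL sources. All type and function names are hypothetical. The sketch assumes that the IO region occupies the last 128 bytes of the address space, that the whole range from 0xFFFF0000 up to the IO region is treated as bootloader space, and that the IMEM is checked before the DMEM if both overlap. It does not model the MEM\_EXT\_USE generic or the configured instruction/data space sizes.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOTROM_BASE UINT32_C(0xFFFF0000)  /* fixed bootloader ROM base address */
#define IO_BASE      UINT32_C(0xFFFFFF80)  /* last 128 bytes of the address space */

typedef enum {
    TARGET_IMEM, TARGET_DMEM, TARGET_BOOTROM, TARGET_IO, TARGET_EXTERNAL
} bus_target_t;

typedef struct {
    bool     imem_used;  /* MEM_INT_IMEM_USE */
    uint32_t imem_base;  /* MEM_MISPACEBASE */
    uint32_t imem_size;  /* MEM_INT_IMEM_SIZE, in bytes */
    bool     dmem_used;  /* MEM_INT_DMEM_USE */
    uint32_t dmem_base;  /* MEM_MDSPACEBASE */
    uint32_t dmem_size;  /* MEM_INT_DMEM_SIZE, in bytes */
} mem_config_t;

static bool in_range(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) < size;
}

/* Returns the module that would respond to an access to addr. */
bus_target_t decode_target(const mem_config_t *cfg, uint32_t addr)
{
    if (addr >= IO_BASE) {
        return TARGET_IO;
    }
    if (addr >= BOOTROM_BASE) {
        return TARGET_BOOTROM;
    }
    if (cfg->imem_used && in_range(addr, cfg->imem_base, cfg->imem_size)) {
        return TARGET_IMEM;
    }
    if (cfg->dmem_used && in_range(addr, cfg->dmem_base, cfg->dmem_size)) {
        return TARGET_DMEM;
    }
    return TARGET_EXTERNAL;  /* forwarded to the Wishbone gateway */
}
```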

#### Latency

The Wishbone gateway can be configured to provide additional register stages to ease timing closure. The MEM\_EXT\_REG\_STAGES generic defines the number of register stages:

- 0: No register stages; no additional latency
- 1: Processor-outgoing signals are buffered; 1 cycle additional latency
- 2: Processor-outgoing and -incoming signals are buffered; 2 cycles additional latency

### **Bus Access Timeout**

Whenever the CPU starts a memory access, an internal timer is started. If the accessed address (the memory or peripheral device) does not acknowledge the transfer within a certain time, the bus access is canceled and a load/store/instruction fetch bus access fault exception is raised – depending on the bus access type.

The processor-internal memories and peripherals will always acknowledge the transfers within two cycles. Of course, a bus timeout will occur when accessing unused address locations. For example, a bus timeout, and thus a load/store bus access fault, will occur when trying to access an IO device that has not been implemented.

The maximum bus cycle time, after which an exception will be raised, is defined via the MEM\_EXT\_TIMEOUT generic of the NEORV32 processor.

Bus accesses via the external memory interface are acknowledged via the Wishbone-compliant wb\_ack\_i signal. The external bus accesses can be terminated/aborted at any time by an accessed device/memory via the Wishbone-compliant wb\_err\_i signal.



The bus timeout value is defined for the external memory interface but also applies when accessing processor-internal modules like memories or IO devices. Hence, this parameter must not be less than one cycle.

#### Wishbone Bus Protocol

The external memory interface uses <u>Classic Pipelined Wishbone Transactions</u>. There is always a delay of at least one clock cycle between issuing a bus access and read-back / acknowledge. The transactions are always in order and cannot overlap.

A detailed description of the implemented Wishbone bus protocol and the according interface signals can be found in the data sheet "Wishbone B4 – WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores". A copy of this document can be found in the docs folder of this project.

### 2.5.5. Processor-Internal Peripheral/IO Devices

The processor-internal peripheral/IO devices are located at the end of the 32-bit address space at base address 0xFFFFFF80. A region of 128 bytes is reserved for these devices. All peripherals are accessed using a memory-mapped scheme. A special linker script as well as the NEORV32 core software library abstract the specific memory layout for the user.

The peripheral/IO address space should not be used for other applications, even when none of the devices are implemented.



When accessing an IO device that has not been implemented (e.g., via the IO\_xxx\_USE generics), a load/store access fault is triggered.

#### **Internal Reset Generator**

Most processor-internal modules – except for the CPU and the watchdog timer – do not require a dedicated reset signal. However, all devices can be reset by software by clearing the corresponding unit's control register. The automatically included application start-up code will perform such a software-reset of all modules to ensure a clean system reset state.

The hardware reset signal of the processor can either be triggered via the external reset pin (rstn\_i, low-active) or by the internal watchdog timer (if implemented). Before the external reset signal is applied to the system, it is filtered (so no spike can generate a reset, a minimum active reset period of one clock cycle is required) and extended to have a minimal duration of four clock cycles.

### **Internal Clock Divider**

An internal clock divider generates 8 clock signals derived from the processor's main clock input clk\_i. These derived clock signals are not actual *clock signals*. Instead, they are derived from a simple counter and are used as "clock enable" signals by the different processor modules. Thus, the whole design operates using only the main clock signal (single clock domain). Some of the processor peripherals like the Watchdog or the UART can select one of the derived clock enable signals for their internal operation. If none of the connected modules requires a clock signal from the divider, it is automatically deactivated to reduce dynamic power.

The peripheral devices that provide a time-based configuration feature a 3-bit prescaler select in their according control register to select 1 out of the 8 available clocks. The mapping of the prescaler select bits to the actually obtained clock is shown in the table below. Here, f represents the processor main clock from the top entity clk\_i signal.

| Prescaler bits:  | 000 | 001 | 010 | 011  | 100   | 101    | 110    | 111    |
|------------------|-----|-----|-----|------|-------|--------|--------|--------|
| Resulting clock: | f/2 | f/4 | f/8 | f/64 | f/128 | f/1024 | f/2048 | f/4096 |
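The prescaler table can be expressed as a lookup of clock divisors. The sketch below computes the resulting clock enable rate for a given main clock frequency; the function names are hypothetical and not part of the core library.

```c
#include <stdint.h>

/* Returns the clock divisor for a 3-bit prescaler select value
 * (only the lowest three bits of select are evaluated). */
uint32_t prescaler_divisor(unsigned select)
{
    static const uint32_t divisors[8] = { 2, 4, 8, 64, 128, 1024, 2048, 4096 };
    return divisors[select & 0x7u];
}

/* Returns the resulting clock in Hz (integer division, rounded down). */
uint32_t prescaled_clock_hz(uint32_t f_main_hz, unsigned select)
{
    return f_main_hz / prescaler_divisor(select);
}
```

For example, with a 100 MHz main clock, the prescaler setting "101" (f/1024) yields roughly 97.6 kHz.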

# 2.6. Bus Interface

The CPU provides two independent bus interfaces: One for fetching instructions (i\_bus\_\*) and one for accessing data (d\_bus\_\*) via load and store operations. Both interfaces use the same interface protocol. The two CPU busses are multiplexed by a bus switch (rtl/core/neorv32\_busswitch.vhd) in the processor so both can access the processor-internal bus. Again, the processor-internal bus provides the same interface protocol as the original CPU interfaces.



The bus switch gives data accesses higher priority than instruction fetch accesses. Since the CPU's fetch unit provides prefetch buffers, it is not a problem if an instruction fetch has to wait for bus access.



All processor-internal peripherals and memories are connected to the processor-bus driven by the bus switch. Also, the Wishbone-based external memory interface is connected to this bus. Hence, the processor uses a von-Neumann approach since data and instructions are accessed via the same bus and address space.

### 2.6.1. Interface Signals

The following table shows the signals of the interfaces seen from the CPU (a \*\_o signal is driven by the CPU and a \*\_i signal is read by the CPU).

| Signal       | Size | Function                                                                                  |
|--------------|------|-------------------------------------------------------------------------------------------|
| bus_addr_o   | 32   | The access address                                                                        |
| bus_rdata_i  | 32   | Data input for read operations                                                            |
| bus_wdata_o  | 32   | Data output for write operations                                                          |
| bus_ben_o    | 4    | Byte enable signal for write operations                                                   |
| bus_we_o     | 1    | Bus write access                                                                          |
| bus_re_o     | 1    | Bus read access                                                                           |
| bus_cancel_o | 1    | Indicates that the current bus access is terminated by the controller (the CPU)           |
| bus_ack_i    | 1    | Accessed peripheral indicates a successful completion of the bus transaction              |
| bus_err_i    | 1    | Accessed peripheral indicates an error during the bus transaction                         |
| bus_fence_o  | 1    | This signal is set for one cycle when the CPU executes a data/instruction fence operation |



Currently, there are no pipelined operations implemented, so only a single transfer request can be "on the fly" at a time. This also means that there can only be either an active read transaction or an active write transaction; read and write transactions in parallel are not yet implemented.

### 2.6.2. Protocol

A bus request is triggered either by the bus\_re\_o signal (for reading data) or by the bus\_we\_o signal (for writing data). These signals are active for one cycle and initiate a new bus transaction. The transaction is completed when the accessed peripheral either sets the bus\_ack\_i signal ( $\rightarrow$  successful completion) or the bus\_err\_i signal to indicate an error during the transaction. All these control signals are only active (= high) for one single cycle.

An error during a transfer will trigger the according *instruction bus access fault* or *load/store bus access fault* exception. The CPU can also terminate a transfer (when an error during transfer is encountered) via the bus\_cancel\_o signal.

The transfer can be completed directly in the next cycle after it was initiated (via the bus\_re\_o or bus\_we\_o signal) if the peripheral sets bus\_ack\_i or bus\_err\_i high for one cycle.



There is no problem if the accessed peripheral takes longer to process the request. However, the bus transaction **has to be completed** within the number of cycles specified via the top entity **MEM\_EXT\_TIMEOUT** generic (for example within 15 cycles). If not, the according *instruction bus access fault* or *load/store bus access fault* exception is triggered and the CPU cancels the transaction via the bus\_cancel\_o signal.
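The completion and timeout rules described above can be summarized in a small cycle-based model. This is a behavioral sketch under stated assumptions, not a description of the actual hardware: the device model, its fields and the function name are hypothetical, and the exact boundary (a response in cycle N still counts for a timeout of N cycles) is an assumption.

```c
#include <stdint.h>

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

typedef struct {
    uint32_t response_cycle;       /* cycle after the request in which the device
                                      responds (1 = next cycle, 0 = never) */
    int      responds_with_error;  /* nonzero: device sets bus_err_i instead of bus_ack_i */
} device_model_t;

/* Models one bus transaction: the request is issued in cycle 0 and the CPU
 * waits for bus_ack_i or bus_err_i for at most timeout_cycles cycles. */
bus_result_t bus_transaction(const device_model_t *dev, uint32_t timeout_cycles)
{
    for (uint32_t cycle = 1; cycle <= timeout_cycles; cycle++) {
        if (cycle == dev->response_cycle) {
            return dev->responds_with_error ? BUS_ERROR : BUS_OK;
        }
    }
    return BUS_TIMEOUT;  /* CPU cancels the transfer and raises an access fault */
}
```

Both BUS\_ERROR and BUS\_TIMEOUT correspond to a bus access fault exception in the CPU; the model only distinguishes them to show where the fault originates.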

#### Read Access

For a read access, the accessed address (bus\_addr\_o) is set when bus\_re\_o goes high. The address is kept stable until the transaction is completed. In the example below the accessed peripheral cannot answer directly in the next cycle after issuing. The peripheral has to apply the read data in the same cycle in which the bus transaction is completed (here, the transaction is successful and the peripheral sets the bus\_ack\_i signal).



Figure 4: CPU interface read access

### **Write Access**

For a write access, the accessed address (bus\_addr\_o), the data to be written (bus\_wdata\_o) and the byte enable signals (bus\_ben\_o) are set when bus\_we\_o goes high. These three signals are kept stable until the transaction is completed. In the example below the accessed peripheral cannot answer directly in the next cycle after issuing. Here, the transaction is successful and the peripheral sets the bus\_ack\_i signal several cycles after issuing.



Figure 5: CPU interface write access

### **Memory Barriers**

Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle (d\_bus\_fence\_o for a fence instruction; i\_bus\_fence\_o for a fencei instruction). It is the task of the memory system to perform the necessary operations (like a cache flush and refill).

# 3. Peripheral/IO Devices

This chapter gives detailed information about the available processor-internal peripheral / IO devices. You do not need to worry about peripheral device registers and register bits when writing an application for the NEORV32. The core software library completely abstracts the underlying hardware via high-level C functions.



You should use the provided core software library to interact with the peripheral devices. This prevents incompatibilities with future versions, since the hardware driver functions handle all the register and register bit accesses.



Most of the IO devices do not have a hardware reset. Instead, the devices are reset via software by writing zero to the unit's control register. A general software-based reset of all devices is done by the application start-up code crt0.asm.

#### Nomenclature

Each peripheral device chapter features a register map showing accessible control and data registers of the according device including the implemented control and status bits. You can directly interact with these registers/bits via the provided <u>C-code defines</u>. These defines are located in the main processor core library file <u>sw/lib/include/neorv32.h</u>. The registers and/or register bits, which can be directly accessed using plain C-code, are marked with a **[C]**.

Not all registers or register bits can be arbitrarily read/written. The following read/write access types are available:

- r/w Registers / register bits can be read and written.
- r/- Registers / register bits are read-only. Any write access to them has no effect.
- 0/w These registers / register bits are write-only. They auto-clear in the next cycle and are always read as zero.
- Bits / registers that are not listed in the register map tables are not (yet) implemented. These registers / register bits are always read as zero. A write access to them has no effect, but user programs should only write zero to them to remain compatible with future extensions.
- When writing to read-only registers, the access is nevertheless acknowledged, but no actual data is written. When reading data from a write-only register the result is undefined.

# 3.1. General Purpose Input and Output Port (GPIO)

#### Overview

Hardware source file(s): neorv32\_gpio.vhd

Software driver file(s): neorv32\_gpio.c

neorv32\_gpio.h

Top entity ports: gpio\_o 16-bit parallel output port

gpio\_i 16-bit parallel input port

Configuration generics: IO\_GPIO\_USE Implement GPIO port unit when true

CPU interrupts: none

CLIC interrupts: CLIC channel 2 Pin-change interrupt

### **Theory of Operation**

The general purpose parallel IO port unit provides a simple 16-bit parallel input port and a 16-bit parallel output port. These ports can be used chip-externally (for example to drive status LEDs, connect buttons, etc.) or system-internally to provide control signals for other IP modules. When the module is excluded from implementation, the GPIO output port is tied to zero.

The parallel input port features a single pin-change interrupt. Whenever an input pin has a low-to-high or high-to-low transition, the interrupt is triggered. When the module is excluded from implementation, the pin-change interrupt is permanently disabled.

| Address    | Name [C]    | Bit(s) (Name) [C] | R/W | Function             |
|------------|-------------|-------------------|-----|----------------------|
| 0xFFFFFF80 | GPIO_INPUT  | 15:0              | r/- | Parallel input port  |
| 0xFFFFFF84 | GPIO_OUTPUT | 15:0              | r/w | Parallel output port |

Table 8: GPIO port unit register map
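
The pin-change condition described above can be modelled in software by comparing two consecutive samples of the input port. The following is a minimal sketch, not part of the NEORV32 software library; the helper names `gpio_changed_pins` and `gpio_pin_change` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: returns a mask with a 1 for every input pin whose
// level differs between two consecutive samples of the 16-bit input port.
static inline uint16_t gpio_changed_pins(uint16_t previous, uint16_t current) {
    return (uint16_t)(previous ^ current);
}

// Pin-change condition: at least one pin had a low-to-high or
// high-to-low transition between the two samples.
static inline bool gpio_pin_change(uint16_t previous, uint16_t current) {
    return gpio_changed_pins(previous, current) != 0;
}
```

Because the hardware offers only a single pin-change interrupt, an interrupt handler would typically keep the previous sample and use such a mask to find out which pin actually toggled.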

# 3.2. Core-Local Interrupt Controller (CLIC)

#### Overview

Hardware source file(s): neorv32\_clic.vhd

Software driver file(s): neorv32\_clic.c

neorv32\_clic.h

Top entity ports: ext\_irq\_i 2-bit external interrupt request

ext\_ack\_o 2-bit external acknowledge

Configuration generics: IO\_CLIC\_USE Implement CLIC when true

CPU interrupts: MEI Machine external interrupt

CLIC interrupts: none

### **Theory of Operation**

The core-local interrupt controller implements a simple interrupt controller for the processor-internal peripherals. It features eight independent IRQ channels. Whenever a channel is triggered, the interrupt request is stored and forwarded to the CPU's machine external interrupt.

The CLIC features a single control register. The unit is globally enabled by setting the CLIC\_CT\_EN bit. If this bit is cleared during operation, all buffered interrupt requests are deleted. Each interrupt request channel features a unique enable bit (CLIC\_CT\_IRQx\_EN), which activates the respective channel when set. Channel 0 has the highest priority, while channel 7 has the lowest. If several interrupt requests arise at the same time, the one with the highest priority is processed while the remaining ones are buffered internally. Note that all interrupt request channels **trigger on high-level**.

The following table shows the CLIC interrupt channels and according sources.

| Channel # | Channel ID [C] | Priority | Source   | Function                  |
|-----------|----------------|----------|----------|---------------------------|
| 0         | CLIC_CH_WDT    | highest  | WDT      | Watchdog timeout          |
| 1         |                |          | _        | reserved                  |
| 2         | CLIC_CH_GPIO   |          | GPIO     | GPIO input pin-change     |
| 3         | CLIC_CH_UART   |          | UART     | RX or TX done             |
| 4         | CLIC_CH_SPI    |          | SPI      | Transmission done         |
| 5         | CLIC_CH_TWI    |          | TWI      | Transmission done         |
| 6         | CLIC_CH_EXT0   |          | external | External via ext_irq_i(0) |
| 7         | CLIC_CH_EXT1   | lowest   | external | External via ext_irq_i(1) |

Table 9: CLIC interrupt request channels
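
The fixed priority scheme (channel 0 highest, channel 7 lowest) can be illustrated with a small software model. This is only a sketch of the arbitration rule stated above, not a description of the VHDL implementation. The function name is hypothetical, and the model assumes that requests on disabled channels are simply ignored.

```c
#include <stdint.h>

// Software model of the CLIC priority rule. 'pending' and 'enabled' are
// 8-bit masks where bit n corresponds to channel n.
// Returns the channel that is served next (0..7), or -1 if no enabled
// channel has a pending request.
static inline int clic_next_channel(uint8_t pending, uint8_t enabled) {
    uint8_t active = (uint8_t)(pending & enabled);
    for (int ch = 0; ch < 8; ch++) {       // lowest channel number wins
        if (active & (1u << ch)) {
            return ch;
        }
    }
    return -1;
}
```

For example, if the GPIO (channel 2) and the UART (channel 3) request an interrupt at the same time, the model selects channel 2 first.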

When an interrupt is signaled to the CPU, the application program can determine which channel caused the request by reading the CLIC\_CT\_SRCx bits. A "000" indicates channel 0, a "001" channel 1 and so on. The current interrupt is acknowledged by writing a 1 to the CLIC\_CT\_ACK control register bit.
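
A handler could decode the source channel and prepare the acknowledge as sketched below. The functions work on a plain copy of the control register value, so the actual register access is left to the caller. The `*_POS` constants are local stand-ins that mirror the bit positions of the CLIC register map; they are assumptions and not the defines from neorv32.h.

```c
#include <stdint.h>

// Local stand-ins for the bit positions of the CLIC control register.
enum {
    CLIC_CT_SRC0_POS = 0,  // IRQ source, bits 2:0
    CLIC_CT_ACK_POS  = 3   // acknowledge current IRQ when set
};

// Extracts the 3-bit channel number from a value read from CLIC_CT.
static inline unsigned clic_source(uint32_t clic_ct) {
    return (clic_ct >> CLIC_CT_SRC0_POS) & 0x7u;
}

// Returns the value to write back to CLIC_CT to acknowledge the current
// interrupt while leaving all other bits of the given value unchanged.
static inline uint32_t clic_with_ack(uint32_t clic_ct) {
    return clic_ct | (1u << CLIC_CT_ACK_POS);
}
```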

Each of the eight interrupt channels can also be triggered by software. For that, the SW IRQ enable bit CLIC\_CT\_SW\_IRQ\_EN has to be set while the CLIC\_CT\_SW\_IRQ\_SRCx bits define the channel to be triggered (e.g., "000" for channel 0). A software trigger is only possible when the respective IRQ channel is enabled and the CLIC itself is globally enabled.

The CLIC also provides two processor-external interrupt request lines with according acknowledge via the top entity's ext\_irq\_i and ext\_ack\_o ports.

When the CLIC is disabled from implementation, the MEI CPU interrupt is permanently disabled.

| Address    | Name [C] | Bit(s) | Bit name [C]     | R/W | Function                    |
|------------|----------|----|---------------------|-----|-----------------------------|
| 0xFFFFFF88 | CLIC_CT  | 0  | CLIC_CT_SRC0        | r/- | IRQ source bit 0            |
|            |          | 1  | CLIC_CT_SRC1        | r/- | IRQ source bit 1            |
|            |          | 2  | CLIC_CT_SRC2        | r/- | IRQ source bit 2            |
|            |          | 3  | CLIC_CT_ACK         | 0/w | ACK current IRQ when set    |
|            |          | 4  | CLIC_CT_EN          | r/w | CLIC enable                 |
|            |          | 8  | CLIC_CT_IRQ0_EN     | r/w | Enable CLIC channel 0       |
|            |          | 9  | CLIC_CT_IRQ1_EN     | r/w | Enable CLIC channel 1       |
|            |          | 10 | CLIC_CT_IRQ2_EN     | r/w | Enable CLIC channel 2       |
|            |          | 11 | CLIC_CT_IRQ3_EN     | r/w | Enable CLIC channel 3       |
|            |          | 12 | CLIC_CT_IRQ4_EN     | r/w | Enable CLIC channel 4       |
|            |          | 13 | CLIC_CT_IRQ5_EN     | r/w | Enable CLIC channel 5       |
|            |          | 14 | CLIC_CT_IRQ6_EN     | r/w | Enable CLIC channel 6       |
|            |          | 15 | CLIC_CT_IRQ7_EN     | r/w | Enable CLIC channel 7       |
|            |          | 16 | CLIC_CT_SW_IRQ_SRC0 | 0/w | SW IRQ trigger select bit 0 |
|            |          | 17 | CLIC_CT_SW_IRQ_SRC1 | 0/w | SW IRQ trigger select bit 1 |
|            |          | 18 | CLIC_CT_SW_IRQ_SRC2 | 0/w | SW IRQ trigger select bit 2 |
|            |          | 19 | CLIC_CT_SW_IRQ_EN   | 0/w | SW IRQ trigger enable       |

Table 10: CLIC register map
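
Using the bit positions from the register map, the control word for a software-triggered interrupt could be assembled as follows. This is a minimal sketch: the `*_POS` constants and the function name are hypothetical, and the result still has to be written to CLIC\_CT by the caller.

```c
#include <stdint.h>

// Local stand-ins for the bit positions of the SW IRQ fields in CLIC_CT.
enum {
    CLIC_CT_SW_IRQ_SRC0_POS = 16,  // channel select, bits 18:16
    CLIC_CT_SW_IRQ_EN_POS   = 19   // software trigger enable
};

// Takes the current control register value (so that the enable bits are
// preserved) and returns the word that triggers 'channel' by software.
static inline uint32_t clic_sw_trigger_word(uint32_t clic_ct, unsigned channel) {
    clic_ct &= ~(0xFu << CLIC_CT_SW_IRQ_SRC0_POS);                  // clear bits 19:16
    clic_ct |= (uint32_t)(channel & 0x7u) << CLIC_CT_SW_IRQ_SRC0_POS;
    clic_ct |= 1u << CLIC_CT_SW_IRQ_EN_POS;
    return clic_ct;
}
```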

# 3.3. Watchdog Timer (WDT)

#### Overview

Hardware source file(s): neorv32\_wdt.vhd

Software driver file(s): neorv32\_wdt.c

neorv32\_wdt.h

Top entity ports: none

Configuration generics: IO\_WDT\_USE Implement Watchdog timer when true

CPU interrupts: none

CLIC interrupts: CLIC channel 0 Watchdog timer overflow

### **Theory of Operation**

The watchdog (WDT) provides a last resort for safety-critical applications. The WDT has a free-running 20-bit counter that has to be reset periodically by the user program. If the counter overflows, either a system reset or an interrupt is generated.

The watchdog is enabled by setting the WDT\_CT\_EN bit. The clock used to increment the internal counter is selected via the 3-bit WDT\_CT\_CLK\_SELx prescaler:

| WDT_CT_CLK_SELx                 | 000       | 001       | 010       | 011        | 100         | 101           | 110           | 111           |
|---------------------------------|-----------|-----------|-----------|------------|-------------|---------------|---------------|---------------|
| Main clock prescaler:           | 2         | 4         | 8         | 64         | 128         | 1024          | 2048          | 4096          |
| Timeout period in clock cycles: | 2 097 152 | 4 194 304 | 8 388 608 | 67 108 864 | 134 217 728 | 1 073 741 824 | 2 147 483 648 | 4 294 967 296 |
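
The timeout values in the table follow from the 20-bit counter: the period is the prescaler multiplied by 2^20 main clock cycles. The sketch below reproduces this arithmetic and converts it to seconds for a given main clock; the function names are hypothetical.

```c
#include <stdint.h>

// Prescaler values selected by the 3-bit clock select field.
static const uint32_t wdt_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

// Timeout period in main clock cycles: prescaler * 2^20.
static inline uint64_t wdt_timeout_cycles(unsigned clk_sel) {
    return (uint64_t)wdt_prescaler[clk_sel & 7u] << 20;
}

// Timeout period in seconds for a main clock of f_main_hz.
static inline double wdt_timeout_seconds(unsigned clk_sel, double f_main_hz) {
    return (double)wdt_timeout_cycles(clk_sel) / f_main_hz;
}
```

With a 100 MHz main clock, for instance, the longest setting ("111") gives 4 294 967 296 / 100 000 000, which is roughly 43 seconds.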

Whenever the internal timer overflows, the watchdog executes one of two possible actions: either a hard processor reset or an interrupt request to the CLIC. The WDT\_CT\_MODE bit defines the action to take on overflow: when cleared, the watchdog triggers an IRQ; when set, the watchdog causes a system reset.

The cause of the last watchdog action can be determined via the WDT\_CT\_CAUSE flag. If this flag is zero, the processor has been reset via the external reset pin. If this flag is set, the last action (reset or interrupt) was caused by a watchdog timer overflow. The WDT\_CT\_PWFAIL flag is set when the last watchdog action was triggered by an illegal access to the watchdog control register.

The Watchdog control register can only be accessed when the access password is present in bits 15:8 of the written data. The default Watchdog password is: **0x47** 

The watchdog is reset whenever a valid write access to the unit's control register is performed.

| Address    | Name [C] | Bit(s) | Bit name [C]    | R/W | Function                            |
|------------|----------|------|-------------------|-----|-------------------------------------|
| 0xFFFFFF8C | WDT_CT   | 0    | WDT_CT_CLK_SEL0   | r/w | Clock prescaler select bit 0        |
|            |          | 1    | WDT_CT_CLK_SEL1   | r/w | Clock prescaler select bit 1        |
|            |          | 2    | WDT_CT_CLK_SEL2   | r/w | Clock prescaler select bit 2        |
|            |          | 3    | WDT_CT_EN         | r/w | Watchdog enable                     |
|            |          | 4    | WDT_CT_MODE       | r/w | Overflow action: 1: reset, 0: IRQ   |
|            |          | 5    | WDT_CT_CAUSE      | r/- | Cause of last WDT action            |
|            |          | 6    | WDT_CT_PWFAIL     | r/- | Last WDT action caused by wrong pwd |
|            |          | 15:8 | WDT_CT_PASSWORD   | 0/w | Watchdog access password            |

Table 11: WDT register map
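
Since every write to the control register has to carry the password in bits 15:8, it can be convenient to build the complete control word in one place. The following sketch uses the bit positions from the register map; the `*_POS` constants, `WDT_PASSWORD` and the function name are hypothetical and not taken from the software library.

```c
#include <stdbool.h>
#include <stdint.h>

// Local stand-ins for the bit positions of the WDT control register.
enum {
    WDT_CT_CLK_SEL0_POS = 0,  // prescaler select, bits 2:0
    WDT_CT_EN_POS       = 3,  // watchdog enable
    WDT_CT_MODE_POS     = 4,  // 1: reset on overflow, 0: IRQ on overflow
    WDT_CT_PASSWORD_POS = 8   // access password, bits 15:8
};

#define WDT_PASSWORD 0x47u    // default password as stated above

// Assembles a control word that includes the access password.
static inline uint32_t wdt_control_word(unsigned clk_sel, bool enable,
                                        bool reset_on_overflow) {
    uint32_t word = 0;
    word |= (uint32_t)(clk_sel & 7u) << WDT_CT_CLK_SEL0_POS;
    word |= (uint32_t)enable << WDT_CT_EN_POS;
    word |= (uint32_t)reset_on_overflow << WDT_CT_MODE_POS;
    word |= (uint32_t)WDT_PASSWORD << WDT_CT_PASSWORD_POS;
    return word;
}
```

Writing the same word again later is a valid control register access and therefore also resets the watchdog counter.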

# 3.4. Machine System Timer (MTIME)

#### Overview

Hardware source file(s): neorv32\_mtime.vhd

Software driver file(s): neorv32\_mtime.c

neorv32\_mtime.h

Top entity ports: none

Configuration generics: IO\_MTIME\_USE Implement MTIME when true

CPU interrupts: MTI Machine timer interrupt

CLIC interrupts: none

### **Theory of Operation**

The MTIME machine system timer implements the memory-mapped mtime timer from the official RISC-V specifications. This unit features a 64-bit system timer incremented with the primary processor clock.

The 64-bit system time can be accessed via the MTIME\_LO and MTIME\_HI registers. A 64-bit time compare register – accessible via MTIMECMP\_LO and MTIMECMP\_HI – can be used to trigger an interrupt to the CPU whenever MTIME >= MTIMECMP. This interrupt is directly forwarded to the CPU's MTI interrupt. The time and compare registers can also be accessed as single 64-bit registers via the MTIME and MTIMECMP defines.
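
When the registers are accessed as two 32-bit words, a 64-bit value has to be split and re-assembled by software. The sketch below shows this together with the interrupt condition MTIME >= MTIMECMP. The helper names are hypothetical, and the functions operate on plain values rather than on the memory-mapped registers.

```c
#include <stdbool.h>
#include <stdint.h>

// Low and high 32-bit words of a 64-bit time value (for the *_LO / *_HI registers).
static inline uint32_t mtime_low_word(uint64_t value) {
    return (uint32_t)(value & 0xFFFFFFFFu);
}

static inline uint32_t mtime_high_word(uint64_t value) {
    return (uint32_t)(value >> 32);
}

// Re-assembles a 64-bit value from the two 32-bit register words.
static inline uint64_t mtime_join(uint32_t high, uint32_t low) {
    return ((uint64_t)high << 32) | low;
}

// Condition under which the timer interrupt is requested.
static inline bool mtime_irq_condition(uint64_t mtime, uint64_t mtimecmp) {
    return mtime >= mtimecmp;
}
```

A new compare value is typically computed as the current time plus the desired interval in clock cycles; the addition is done in 64 bits so that a carry from the low word into the high word is handled correctly.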



There is no need to acknowledge the MTIME interrupt. The interrupt request is a single-shot signal, so the CPU is triggered <u>once</u> if the system time is greater than or equal to the compare time. Hence, another MTIME IRQ is only possible when increasing the compare time.

The 64-bit counter and the 64-bit comparator are implemented as $2\times32$-bit counters and comparators with a registered carry to avoid a 64-bit carry chain and thus to simplify timing closure.

### Register Map

| Address    | Name [C]    | R/W | Function                       |
|------------|-------------|-----|--------------------------------|
| 0xFFFFFF90 | MTIME_LO    | r/w | Machine system time, low word  |
| 0xFFFFFF94 | MTIME_HI    | r/w | Machine system time, high word |
| 0xFFFFFF98 | MTIMECMP_LO | r/w | Time compare, low word         |
| 0xFFFFFF9C | MTIMECMP_HI | r/w | Time compare, high word        |

Table 12: MTIME register map



All registers of the MTIME system timer can only be written in full word mode (using sw instruction). All other write accesses will have no effect.

# 3.5. Universal Asynchronous Receiver and Transmitter (UART)

#### Overview

Hardware source file(s): neorv32\_uart.vhd

Software driver file(s): neorv32\_uart.c

neorv32\_uart.h

Top entity ports: uart\_txd\_o Serial transmitter output

uart\_rxd\_i Serial receiver input

Configuration generics: IO\_UART\_USE Implement UART when true

CPU interrupts: none

CLIC interrupts: CLIC channel 3 TX done or RX done

### **Theory of Operation**

In most cases, the UART is a standard interface used to establish a communication channel between the computer/user and an application running on the processor platform. The NEORV32 UART features a standard frame configuration: 8 data bits, 1 stop bit and no parity bit. These values are fixed. The actual Baudrate is configurable by software.

The UART is enabled when the UART\_CT\_EN bit in the UART control register is set. The actual transmission Baudrate (like "19200") is configured via the 12-bit UART\_CT\_BAUDxx value (control register bits 11:0) and the 3-bit UART\_CT\_PRSCx clock prescaler.

| UART_CT_PRSCx        | 000 | 001 | 010 | 011 | 100 | 101  | 110  | 111  |
|----------------------|-----|-----|-----|-----|-----|------|------|------|
| Resulting prescaler: | 2   | 4   | 8   | 64  | 128 | 1024 | 2048 | 4096 |

$$Baudrate = \frac{f_{main}[Hz]}{Prescaler \cdot UART\_CT\_BAUD}$$
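
Solving the formula for UART\_CT\_BAUD gives the value to program for a desired Baudrate. The sketch below searches for the smallest prescaler whose divider fits into the BAUD field, assuming the field is 12 bits wide as given by bits 11:0 of the register map. The type and function names are hypothetical, and the code follows the formula exactly as printed above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned prsc;   // 3-bit prescaler select value
    uint32_t baud;   // value for the BAUD field
} uart_cfg_t;

static const uint32_t uart_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

// Finds the smallest prescaler for which f_main / (prescaler * baudrate)
// fits into 12 bits. Returns false if no setting is possible.
static inline bool uart_find_config(uint32_t f_main_hz, uint32_t baudrate,
                                    uart_cfg_t *cfg) {
    if (baudrate == 0) {
        return false;
    }
    for (unsigned p = 0; p < 8; p++) {
        uint64_t div = (uint64_t)f_main_hz /
                       ((uint64_t)uart_prescaler[p] * baudrate);
        if (div >= 1 && div <= 0xFFFu) {
            cfg->prsc = p;
            cfg->baud = (uint32_t)div;
            return true;
        }
    }
    return false;
}
```

Choosing the smallest usable prescaler keeps the divider as large as possible and therefore minimizes the rounding error of the integer division.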

A new transmission is started by writing the data byte to the lowest byte of the UART\_DATA register. The transfer is completed when the UART\_CT\_TX\_BUSY control register flag returns to zero. A new received byte is available when the UART\_DATA\_AVAIL flag of the UART\_DATA register is set. If a new byte is received before the previous one has been read by the CPU, the receiver overrun flag UART\_CT\_RXOR is set.

The UART has a single interrupt, which can be triggered by two sources: the interrupt is triggered when a transmission has finished and the UART\_CT\_TX\_IRQ flag is set. Additionally, the interrupt can be triggered when a data byte has been received and the UART\_CT\_RX\_IRQ flag is set.

If the UART is not implemented, the UART's serial output port is tied to zero and the UART's interrupt is unavailable.

| Address    | Name [C]  | Bit(s) | Bit name [C]     | R/W | Function                              |
|------------|-----------|------|--------------------|-----|---------------------------------------|
| 0xFFFFFFA0 | UART_CT   | 11:0 | UART_CT_BAUDxx     | r/w | 12-bit BAUD configuration value       |
|            |           | 24   | UART_CT_PRSC0      | r/w | Baudrate clock prescaler select bit 0 |
|            |           | 25   | UART_CT_PRSC1      | r/w | Baudrate clock prescaler select bit 1 |
|            |           | 26   | UART_CT_PRSC2      | r/w | Baudrate clock prescaler select bit 2 |
|            |           | 27   | UART_CT_RXOR       | r/- | UART receiver overrun                 |
|            |           | 28   | UART_CT_EN         | r/w | UART enable                           |
|            |           | 29   | UART_CT_RX_IRQ     | r/w | RX complete IRQ enable                |
|            |           | 30   | UART_CT_TX_IRQ     | r/w | TX done IRQ enable                    |
|            |           | 31   | UART_CT_TX_BUSY    | r/- | Transceiver busy flag                 |
| 0xFFFFFFA4 | UART_DATA | 7:0  | UART_DATA_LSB/MSB  | r/w | Receive/transmit data (8-bit)         |
|            |           | 31   | UART_DATA_AVAIL    | r/- | RX data available when set            |

Table 13: UART register map
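
Because the data-available flag and the received byte share the UART\_DATA register, both can be evaluated from a single read. The following sketch decodes such a read value; the `UART_DATA_AVAIL_POS` constant and the function name are hypothetical stand-ins based on the register map.

```c
#include <stdint.h>

// Local stand-in for the bit position of the data-available flag.
enum { UART_DATA_AVAIL_POS = 31 };

// Decodes one value read from UART_DATA.
// Returns the received byte (0..255) if the available flag is set,
// or -1 if no new data was available at the time of the read.
static inline int uart_decode_rx(uint32_t uart_data) {
    if (uart_data & (1u << UART_DATA_AVAIL_POS)) {
        return (int)(uart_data & 0xFFu);
    }
    return -1;
}
```

Evaluating flag and data from the same read value avoids the situation where a second read returns a different byte than the one the flag referred to.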

# 3.6. Serial Peripheral Interface Controller (SPI)

#### Overview

Hardware source file(s): neorv32\_spi.vhd

Software driver file(s): neorv32\_spi.c

neorv32\_spi.h

Top entity ports: spi\_sck\_o 1-bit serial controller clock output

spi\_sdo\_o 1-bit serial controller data output

spi\_sdi\_i 1-bit serial controller data input

spi\_csn\_o 8-bit chip select port (low-active)

Configuration generics: IO\_SPI\_USE Implement SPI when true

CPU interrupts: none

CLIC interrupts: CLIC channel 4 Transmission done interrupt

### **Theory of Operation**

SPI is a synchronous serial transmission protocol. The NEORV32 SPI transceiver allows 8-, 16-, 24- and 32-bit wide transmissions. The unit provides 8 dedicated chip select signals via the top entity's spi\_csn\_o signal.

The SPI unit is enabled via the SPI\_CT\_EN bit. The idle clock polarity is configured via the SPI\_CT\_CPHA bit and can be low (0) or high (1). Data is shifted in/out MSB-first when the SPI\_CT\_DIR bit is cleared and LSB-first when it is set. The amount of data transferred within a single transmission is defined via the SPI\_CT\_SIZEx bits. The unit supports 8-bit ("00"), 16-bit ("01"), 24-bit ("10") and 32-bit ("11") transfers. Whenever a transfer is completed, an interrupt is triggered if the SPI\_CT\_IRQ\_EN bit is set. A transmission is in progress as long as the SPI\_CT\_BUSY flag is set.

The SPI controller features 8 dedicated chip-select lines. These lines are controlled via the control register's SPI\_CT\_CSx bits. When a CSx bit is set, the respective chip select line spi\_csn\_o(x) goes low (low-active chip select lines).

The SPI clock frequency is defined via the 3 SPI\_CT\_PRSCx clock prescaler bits. The following prescalers are available:

| SPI_CT_PRSCx         | 000 | 001 | 010 | 011 | 100 | 101  | 110  | 111  |
|----------------------|-----|-----|-----|-----|-----|------|------|------|
| Resulting prescaler: | 2   | 4   | 8   | 64  | 128 | 1024 | 2048 | 4096 |

Based on the SPI\_CT\_PRSCx configuration, the actual SPI clock frequency  $f_{SPI}$  is determined by:

$$f_{SPI} = \frac{f_{main}[Hz]}{2 \cdot Prescaler}$$
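
The formula can be used to pick a prescaler for a given peripheral. The sketch below computes the SPI clock for each setting and selects the fastest one that does not exceed a maximum frequency; the function names are hypothetical.

```c
#include <stdint.h>

static const uint32_t spi_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

// SPI clock frequency in Hz: f_main / (2 * prescaler).
static inline uint32_t spi_clock_hz(uint32_t f_main_hz, unsigned prsc) {
    return f_main_hz / (2u * spi_prescaler[prsc & 7u]);
}

// Returns the prescaler select value that yields the fastest SPI clock
// not exceeding max_hz, or -1 if even the largest prescaler is too fast.
static inline int spi_select_prescaler(uint32_t f_main_hz, uint32_t max_hz) {
    for (unsigned p = 0; p < 8; p++) {   // prescalers are sorted ascending
        if (spi_clock_hz(f_main_hz, p) <= max_hz) {
            return (int)p;
        }
    }
    return -1;
}
```

For a 100 MHz main clock and a device limited to 10 MHz, the sketch selects "010" (prescaler 8), which results in 6.25 MHz.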

A transmission is started by writing data to the SPI\_DATA register. The data must be LSB-aligned: if the SPI transceiver is configured for transfers of less than 32 bits, the transmit data must be placed into the lowest 8/16/24 bits of SPI\_DATA. Likewise, the received data is always LSB-aligned.

| Address    | Name [C] | Bit(s) | Bit name [C]   | R/W | Function                                          |
|------------|----------|----|-------------------|-----|---------------------------------------------------|
| 0xFFFFFFA8 | SPI_CT   | 0  | SPI_CT_CS0        | r/w | Direct chip select 0, csn(0) is low when set      |
|            |          | 1  | SPI_CT_CS1        | r/w | Direct chip select 1, csn(1) is low when set      |
|            |          | 2  | SPI_CT_CS2        | r/w | Direct chip select 2, csn(2) is low when set      |
|            |          | 3  | SPI_CT_CS3        | r/w | Direct chip select 3, csn(3) is low when set      |
|            |          | 4  | SPI_CT_CS4        | r/w | Direct chip select 4, csn(4) is low when set      |
|            |          | 5  | SPI_CT_CS5        | r/w | Direct chip select 5, csn(5) is low when set      |
|            |          | 6  | SPI_CT_CS6        | r/w | Direct chip select 6, csn(6) is low when set      |
|            |          | 7  | SPI_CT_CS7        | r/w | Direct chip select 7, csn(7) is low when set      |
|            |          | 8  | SPI_CT_EN         | r/w | SPI enable                                        |
|            |          | 9  | SPI_CT_CPHA       | r/w | Idle clock polarity                               |
|            |          | 10 | SPI_CT_PRSC0      | r/w | Clock prescaler select bit 0                      |
|            |          | 11 | SPI_CT_PRSC1      | r/w | Clock prescaler select bit 1                      |
|            |          | 12 | SPI_CT_PRSC2      | r/w | Clock prescaler select bit 2                      |
|            |          | 13 | SPI_CT_DIR        | r/w | Shift direction (0: MSB first, 1: LSB first)      |
|            |          | 14 | SPI_CT_SIZE0      | r/w | Transfer size select bit 0 (00: 8-bit, 01: 16-bit, 10: 24-bit, 11: 32-bit) |
|            |          | 15 | SPI_CT_SIZE1      | r/w | Transfer size select bit 1                        |
|            |          | 16 | SPI_CT_IRQ_EN     | r/w | Transfer done interrupt enable                    |
|            |          | 31 | SPI_CT_BUSY       | r/- | Ongoing transfer when set                         |
| 0xFFFFFFAC | SPI_DATA | 31:0 |                 | r/w | Receive/transmit data, LSB-aligned                |

Table 14: SPI transceiver register map
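
The LSB alignment of the data register can be handled with a mask derived from the configured transfer size. The following sketch decodes the size field at bits 15:14 of the control register and masks a data word accordingly. All names are hypothetical, and masking received data is a defensive assumption, since the text does not state what the unused upper bits contain.

```c
#include <stdint.h>

// Local stand-in for the position of the 2-bit transfer size field.
enum { SPI_CT_SIZE0_POS = 14 };

// Number of bits per transfer: 00 -> 8, 01 -> 16, 10 -> 24, 11 -> 32.
static inline unsigned spi_transfer_bits(uint32_t spi_ct) {
    return 8u * (((spi_ct >> SPI_CT_SIZE0_POS) & 3u) + 1u);
}

// Mask covering the lowest 'bits' bits of the data register.
static inline uint32_t spi_data_mask(unsigned bits) {
    return (bits >= 32u) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

// Keeps only the LSB-aligned part of a data word that belongs to the transfer.
static inline uint32_t spi_align_data(uint32_t spi_ct, uint32_t data) {
    return data & spi_data_mask(spi_transfer_bits(spi_ct));
}
```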

# 3.7. Two Wire Serial Interface Controller (TWI)

#### Overview

Hardware source file(s): neorv32\_twi.vhd

Software driver file(s): neorv32\_twi.c

neorv32\_twi.h

Top entity ports: twi\_sda\_io Bi-directional serial data line

twi\_scl\_io Bi-directional serial clock line

Configuration generics: IO\_TWI\_USE Implement TWI when true

CPU interrupts: none

CLIC interrupts: CLIC channel 5 Transmission done interrupt

### **Theory of Operation**

The two wire interface – actually called I<sup>2</sup>C – is a widely used interface for connecting several on-board components. Since this interface only needs two signals (the serial data line SDA and the serial clock line SCL), regardless of the number of connected devices, it allows easy interconnection of several peripheral nodes. The NEORV32 TWI implements a TWI controller. It features "clock stretching", so a slow peripheral can halt the transmission by pulling the SCL line low. Currently no multi-controller support is available. Also, the TWI unit cannot operate in peripheral mode.

The TWI is enabled via the control register TWI\_CT\_EN bit. The user program can start / terminate a transmission by issuing a START or STOP condition. These conditions are generated by setting the according bit (TWI\_CT\_START or TWI\_CT\_STOP) in the control register.

Data is sent by writing a byte to the TWI\_DATA register. Received data can also be obtained from this register. The TWI controller is busy (transmitting or performing a START or STOP condition) as long as the TWI\_CT\_BUSY bit in the control register is set.

An accessed peripheral has to acknowledge each transferred byte. When the TWI\_CT\_ACK bit is set after a completed transmission, the accessed peripheral has sent an acknowledge. If it is cleared after a transmission, the peripheral has sent a not-acknowledge (NACK). The NEORV32 TWI controller can also send an ACK (→ controller acknowledge "MACK") after a transmission by pulling SDA low during the ACK time slot. Set the TWI\_CT\_MACK bit to activate this feature. If this bit is cleared, the ACK/NACK of the peripheral is sampled in this time slot (normal mode).

In summary, the following independent TWI operations can be triggered by the application program:

- send START condition (also as REPEATED START condition)
- send STOP condition
- send (at least) one byte while also sampling one byte from the bus



The serial clock (SCL) and the serial data (SDA) lines can only be actively driven low by the controller. Hence, external pull-up resistors are required for the SDA and SCL lines.

The TWI clock frequency is defined via the 3 TWI\_CT\_PRSCx clock prescaler bits. The following prescalers are available:

| TWI_CT_PRSCx         | 000 | 001 | 010 | 011 | 100 | 101  | 110  | 111  |
|----------------------|-----|-----|-----|-----|-----|------|------|------|
| Resulting prescaler: | 2   | 4   | 8   | 64  | 128 | 1024 | 2048 | 4096 |

Based on the TWI\_CT\_PRSCx configuration, the actual TWI clock frequency  $f_{SCL}$  is determined by:

$$f_{SCL} = \frac{f_{main}[Hz]}{4 \cdot Prescaler}$$
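
As a worked example of this formula, the sketch below computes the SCL frequency for each prescaler setting and picks the fastest one that stays at or below a target bus speed. The function names are hypothetical. Note that clock stretching by a peripheral can lower the effective rate on the bus.

```c
#include <stdint.h>

static const uint32_t twi_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

// SCL frequency in Hz: f_main / (4 * prescaler).
static inline uint32_t twi_clock_hz(uint32_t f_main_hz, unsigned prsc) {
    return f_main_hz / (4u * twi_prescaler[prsc & 7u]);
}

// Returns the prescaler select value giving the fastest SCL clock that
// does not exceed max_hz, or -1 if no setting is slow enough.
static inline int twi_select_prescaler(uint32_t f_main_hz, uint32_t max_hz) {
    for (unsigned p = 0; p < 8; p++) {   // prescalers are sorted ascending
        if (twi_clock_hz(f_main_hz, p) <= max_hz) {
            return (int)p;
        }
    }
    return -1;
}
```

With a 100 MHz main clock, a 400 kHz limit leads to "011" (390.625 kHz), while a 100 kHz limit requires "101" (about 24.4 kHz), because the intermediate setting "100" still yields about 195 kHz.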

| Address    | Name [C] | Bit(s) | Bit name [C]   | R/W | Function                                      |
|------------|----------|-----|-------------------|-----|-----------------------------------------------|
| 0xFFFFFFB0 | TWI_CT   | 0   | TWI_CT_EN         | r/w | TWI enable                                    |
|            |          | 1   | TWI_CT_START      | 0/w | Generate START condition                      |
|            |          | 2   | TWI_CT_STOP       | 0/w | Generate STOP condition                       |
|            |          | 3   | TWI_CT_IRQ_EN     | r/w | Transmission-done interrupt enable            |
|            |          | 4   | TWI_CT_PRSC0      | r/w | Clock prescaler select bit 0                  |
|            |          | 5   | TWI_CT_PRSC1      | r/w | Clock prescaler select bit 1                  |
|            |          | 6   | TWI_CT_PRSC2      | r/w | Clock prescaler select bit 2                  |
|            |          | 7   | TWI_CT_MACK       | r/w | Generate controller ACK for each transmission |
|            |          | 30  | TWI_CT_ACK        | r/- | ACK received when set                         |
|            |          | 31  | TWI_CT_BUSY       | r/- | Transfer in progress when set                 |
| 0xFFFFFFB4 | TWI_DATA | 7:0 | TWI_DATA          | r/w | Receive/transmit data                         |

Table 15: TWI register map

# 3.8. Pulse Width Modulation Controller (PWM)

#### Overview

Hardware source file(s): neorv32\_pwm.vhd

Software driver file(s): neorv32\_pwm.c

neorv32\_pwm.h

Top entity ports: pwm\_o 4-channel PWM output

Configuration generics: IO\_PWM\_USE Implement PWM controller when true

CPU interrupts: none

CLIC interrupts: none

### **Theory of Operation**

The PWM controller implements a pulse-width modulation controller with four independent channels and 8-bit resolution per channel. It is based on an 8-bit counter with four programmable threshold comparators that control the actual duty cycle of each channel. The controller can be used to drive a fancy RGB-LED with 24-bit true color, to dim LCD backlights or even for motor control. An external integrator (RC low-pass filter) can be used to smooth the generated "analog" signals.

The PWM controller is activated by setting the PWM\_CT\_EN bit in the module's control register. When this flag is cleared, the unit is reset and all PWM output channels are set to zero. The base clock for the PWM generation is defined via the 3 PWM\_CT\_PRSCx bits. The 8-bit duty cycle for each channel, which represents the channel's "intensity", is defined via the according 8-bit PWM\_DUTY\_CHx byte in the PWM\_DUTY register.

Based on the duty cycle PWM\_DUTY\_CHx, the resulting analog output voltage of each channel (relative to the IO supply voltage) can be computed by the following formula:

$$Intensity_{x} = \frac{PWM\_DUTY\_CHx}{2^8} \cdot 100\,\%$$

The frequency of the generated PWM signals is defined by the PWM operating clock. This clock is derived from the main processor clock and divided by a prescaler via the 3 PWM\_CT\_PRSCx bits in the unit's control register. The following prescalers are available:

| PWM_CT_PRSCx         | 000 | 001 | 010 | 011 | 100 | 101  | 110  | 111  |
|----------------------|-----|-----|-----|-----|-----|------|------|------|
| Resulting prescaler: | 2   | 4   | 8   | 64  | 128 | 1024 | 2048 | 4096 |

The resulting PWM frequency is defined by:

$$f_{PWM} = \frac{f_{main}}{2^8 \cdot Prescaler}$$
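
Both PWM formulas are simple enough to evaluate directly. The sketch below computes the PWM frequency for a given prescaler setting and the relative intensity (in percent of the IO supply voltage) for a given 8-bit duty cycle; the function names are hypothetical.

```c
#include <stdint.h>

static const uint32_t pwm_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

// PWM frequency in Hz: f_main / (2^8 * prescaler).
static inline double pwm_frequency_hz(double f_main_hz, unsigned prsc) {
    return f_main_hz / (256.0 * (double)pwm_prescaler[prsc & 7u]);
}

// Relative intensity in percent: duty / 2^8 * 100.
static inline double pwm_intensity_percent(uint8_t duty) {
    return 100.0 * (double)duty / 256.0;
}
```

Since the duty cycle is divided by 2^8 and the largest programmable value is 255, the maximum intensity is about 99.6 % rather than a full 100 %.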

| Address    | Name [C] | Bit(s) | Bit name [C]   | R/W | Function                       |
|------------|----------|-------|------------------|-----|--------------------------------|
| 0xFFFFFFB8 | PWM_CT   | 0     | PWM_CT_EN        | r/w | PWM controller enable          |
|            |          | 1     | PWM_CT_PRSC0     | r/w | Clock prescaler select bit 0   |
|            |          | 2     | PWM_CT_PRSC1     | r/w | Clock prescaler select bit 1   |
|            |          | 3     | PWM_CT_PRSC2     | r/w | Clock prescaler select bit 2   |
| 0xFFFFFFBC | PWM_DUTY | 7:0   | PWM_DUTY_CH0     | r/w | 8-bit duty cycle for channel 0 |
|            |          | 15:8  | PWM_DUTY_CH1     | r/w | 8-bit duty cycle for channel 1 |
|            |          | 23:16 | PWM_DUTY_CH2     | r/w | 8-bit duty cycle for channel 2 |
|            |          | 31:24 | PWM_DUTY_CH3     | r/w | 8-bit duty cycle for channel 3 |

Table 16: PWM controller register map
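
The four 8-bit duty cycles share the single 32-bit PWM\_DUTY register, one byte per channel. The following sketch shows how such a register value could be packed and how a single channel could be updated without touching the others; the function names are hypothetical.

```c
#include <stdint.h>

// Packs four 8-bit duty cycles into one 32-bit word:
// channel 0 in bits 7:0, channel 1 in bits 15:8, and so on.
static inline uint32_t pwm_pack_duty(uint8_t ch0, uint8_t ch1,
                                     uint8_t ch2, uint8_t ch3) {
    return (uint32_t)ch0 | ((uint32_t)ch1 << 8) |
           ((uint32_t)ch2 << 16) | ((uint32_t)ch3 << 24);
}

// Replaces the duty cycle of one channel (0..3) in an existing register value.
static inline uint32_t pwm_set_channel(uint32_t pwm_duty, unsigned channel,
                                       uint8_t duty) {
    unsigned shift = 8u * (channel & 3u);
    pwm_duty &= ~(0xFFu << shift);          // clear the channel's byte
    pwm_duty |= (uint32_t)duty << shift;    // insert the new duty cycle
    return pwm_duty;
}
```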

# 3.9. True Random Number Generator (TRNG)

#### Overview

Hardware source file(s): neorv32\_trng.vhd

Software driver file(s): neorv32\_trng.c

neorv32\_trng.h

Top entity ports: none

Configuration generics: IO\_TRNG\_USE Implement TRNG when true

CPU interrupts: none
CLIC interrupts: none



This device is still highly experimental!

### **Theory of Operation**

The NEORV32 true random number generator provides true random numbers for your application. Instead of using a pseudo RNG like an LFSR, the TRNG of the processor uses a simple, straightforward ring oscillator as physical entropy source. Hence, voltage and thermal fluctuations are used to provide true physical random data. It features a platform independent architecture based on two papers, which are cited in the footnotes.

When the TRNG\_CT\_EN bit is set, the TRNG starts operation. Make sure to configure the GARO taps using the TRNG\_CT\_TAPx bits in advance. As soon as the TRNG\_DATA\_VALID bit in the TRNG\_DATA register is set, the currently sampled 16-bit random data can be obtained from the lowest 16 bits of the TRNG\_DATA register. Note that the TRNG needs at least 16 clock cycles to generate a new random data word. During this sampling time, the current random data is kept in the output register until sampling of the new data word has completed.

#### Architecture

The NEORV32 TRNG is based on the *GARO Galois Ring Oscillator TRNG*<sup>5</sup>. Basically, this architecture is an asynchronous LFSR constructed from a chain of inverters. Before the output signal of one oscillator is passed to the input of the next one, the signal can be XORed with the final output signal of the inverter chain (see image below) using a switching mask (f).

The default setup of the TRNG uses a total of 16 inverters and a software configurable GARO tap mask. To prevent the synthesis tool from doing logic optimization and thus removing all but one inverter, the TRNG uses simple latches to decouple an inverter and its actual output. The latches are reset when the TRNG is disabled and are enabled one by one by a simple shift register when the TRNG is activated. By this, the TRNG provides a platform independent architecture<sup>6</sup> since no specific VHDL attributes are required.

The single-bit output signal of the GARO array is fed through flip flops to eliminate any metastability beyond this point. Afterwards, a von Neumann de-biasing is applied to get rid of any bias introduced by the GARO array. If the de-biasing fails, an additional cycle is required to obtain a new random sample. This process might repeat depending on the quality of the GARO oscillation.

This de-biased signal is used as input for a simple chaos machine post-processing to provide a 'better' uniform distribution. This chaos machine is implemented as a 16-bit LFSR. As soon as 16 <u>valid</u> bits (so no errors during the de-biasing) have been sampled, the resulting data is moved to the output register and is available for fetching by the CPU bus.

<sup>5</sup> "Enhancing the Randomness of a Combined True Random Number Generator Based on the Ring Oscillator Sampling Method" by Mieczyslaw Jessa and Lukasz Matuszewski

<sup>6</sup> "Extended Abstract: The Butterfly PUF Protecting IP on every FPGA" by Sandeep S. Kumar, Jorge Guajardo, Roel

| Address    | Name [C]  | Bit(s)            | Bit name [C]     | R/W | Function                           |
|------------|-----------|-------------------|------------------|-----|------------------------------------|
| 0xFFFFFFC0 | TRNG_CT   | 15:0              | TRNG_CT_TAP      | r/w | 16-bit GARO tap mask configuration |
|            |           | 31                | TRNG_CT_EN       | r/w | TRNG enable                        |
| 0xFFFFFFC4 | TRNG_DATA | 15:0              | TRNG_DATA        | r/- | Random data output                 |
|            |           | 31                | TRNG_DATA_VALID  | r/- | Random data valid when set         |

Table 17: TRNG register map
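The register layout in Table 17 suggests a simple polling scheme for reading random data. The following is a minimal sketch with hypothetical function names; it is not the driver from neorv32\_trng.h. The registers are passed as pointers so the code can be tried against ordinary variables. On hardware, the pointers would be the addresses from the table (0xFFFFFFC0 and 0xFFFFFFC4).

```c
#include <stdbool.h>
#include <stdint.h>

#define TRNG_CT_EN_BIT      31u      /* enable bit of TRNG_CT (Table 17) */
#define TRNG_DATA_VALID_BIT 31u      /* valid flag of TRNG_DATA (Table 17) */
#define TRNG_DATA_MASK      0xFFFFu  /* random data in bits 15:0 */

/* Enable the TRNG and set the 16-bit GARO tap mask in a single write. */
void trng_enable(volatile uint32_t *trng_ct, uint16_t tap_mask) {
    *trng_ct = (UINT32_C(1) << TRNG_CT_EN_BIT) | (uint32_t)tap_mask;
}

/* Read the data register once. Returns true and stores the 16-bit sample
 * in *out if the valid flag is set, otherwise returns false. */
bool trng_try_get(const volatile uint32_t *trng_data, uint16_t *out) {
    uint32_t value = *trng_data;  /* one read: flag and data belong together */
    if (((value >> TRNG_DATA_VALID_BIT) & 1u) == 0u) {
        return false;
    }
    *out = (uint16_t)(value & TRNG_DATA_MASK);
    return true;
}
```

Reading the register only once per attempt matters: the valid flag and the data are delivered in the same word, so a second read could already return a different sample.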

# 3.10. Dummy Device (DEVNULL)

#### Overview

Hardware source file(s): neorv32\_devnull.vhd

Software driver file(s): none
Top entity ports: none

Configuration generics: IO\_DEVNULL\_USE Implement DEVNULL when true

CPU interrupts: none CLIC interrupts: none

# **Theory of Operation**

Just like the device file /dev/null, the DEVNULL unit discards any data written to it and always returns zero when read. Bus accesses to this unit always succeed (when the unit is implemented). It is primarily meant for testing, or for cases where you need to perform a read access (e.g., from a peripheral device) because the read itself triggers some action and you do not actually need the returned data.

```
// not so good:
volatile uint32_t dst = SOME_IO_REG;
// better:
DEVNULL_DATA = SOME_IO_REG;
```

### Register Map

| Address    | Name [C]     | Bit(s) | R/W | Function                                                                  |
|------------|--------------|--------|-----|---------------------------------------------------------------------------|
| 0xFFFFFFC8 | DEVNULL_DATA | 31:0   | r/w | Write has no effect, read always returns 0; the bus access always succeeds |
|            | _            | 7:0    | -/w | Write ASCII data to simulator console and logging file                    |

Table 18: DEVNULL register map

### **Simulation Usage**

When writing data to DEVNULL\_DATA in a simulation, the lowest 8 bits of the written data are printed as an ASCII character to the simulator's console. Additionally, this ASCII data is written to a file called neorv32.devnull.out in the simulation home folder. All written data is also dumped as 32-bit hexadecimal values into a file called neorv32.devnull.data.out, also in the simulation home folder.
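As an illustration of this simulation-only behavior, a string can be sent to the simulator console with one bus write per character. This is a sketch with a hypothetical helper name; the register is passed as a pointer so the function can be tried on a plain variable. On hardware or in simulation the pointer would refer to DEVNULL\_DATA (0xFFFFFFC8 according to Table 18).

```c
#include <stdint.h>

/* Write a NUL-terminated string to a DEVNULL-style data register,
 * one character per bus write. Only bits 7:0 are printed by the simulator. */
void devnull_print(volatile uint32_t *devnull_data, const char *s) {
    while (*s != '\0') {
        *devnull_data = (uint32_t)(unsigned char)*s;
        s++;
    }
}
```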



More information regarding the simulation-only usage of the DEVNULL device can be found in chapter <u>5.12</u>. <u>Simulating the Processor</u>.

# 3.11. System Configuration Information Memory (SYSINFO)

### Overview

Hardware source file(s): neorv32\_sysinfo.vhd

Software driver file(s): none
Top entity ports: none

Configuration generics: \* Shows the settings of most config. generics

CPU interrupts: none CLIC interrupts: none

### **Theory of Operation**

The SYSINFO allows the application software to determine the settings of most of the processor's top entity generics. All registers of this unit are read-only.

This device is always implemented, regardless of the actual hardware configuration. The bootloader as well as the NEORV32 software runtime environment require this information (like the memory layout) for correct operation.

| Address    | Name [C]            | R/W | Function                                                              |
|------------|---------------------|-----|-----------------------------------------------------------------------|
| 0xFFFFFFE0 | SYSINFO_CLK         | r/- | Clock speed in Hz (via CLOCK_FREQUENCY generic)                       |
| 0xFFFFFFE4 | SYSINFO_USER_CODE   | r/- | Custom user code, assigned via the USER_CODE generic                  |
| 0xFFFFFFE8 | SYSINFO_FEATURES    | r/- | Implemented hardware (see next table)                                 |
| 0xFFFFFFEC | -                   | r/- | reserved                                                              |
| 0xFFFFFFF0 | SYSINFO_ISPACE_BASE | r/- | Instruction address space base (via MEM_ISPACE_BASE generic)          |
| 0xFFFFFFF4 | SYSINFO_ISPACE_SIZE | r/- | Instruction address space size in bytes (via MEM_ISPACE_SIZE generic) |
| 0xFFFFFFF8 | SYSINFO_DSPACE_BASE | r/- | Data address space base (via MEM_DSPACE_BASE generic)                 |
| 0xFFFFFFFC | SYSINFO_DSPACE_SIZE | r/- | Data address space size in bytes (via MEM_DSPACE_SIZE generic)        |

Table 19: SYSINFO register map

# SYSINFO\_FEATURES

| Bit# | Name [C]                          | Function                                                                                   |
|------|-----------------------------------|--------------------------------------------------------------------------------------------|
| 25   | SYSINFO_FEATURES_IO_DEVNULL       | Set when DEVNULL is implemented (via the IO_DEVNULL_USE generic)                           |
| 24   | SYSINFO_FEATURES_IO_TRNG          | Set when the TRNG is implemented (via the IO_TRNG_USE generic)                             |
| 23   | SYSINFO_FEATURES_IO_CLIC          | Set when the CLIC is implemented (via the IO_CLIC_USE generic)                             |
| 22   | SYSINFO_FEATURES_IO_WDT           | Set when the WDT is implemented (via the IO_WDT_USE generic)                               |
| 21   | SYSINFO_FEATURES_IO_PWM           | Set when the PWM is implemented (via the IO_PWM_USE generic)                               |
| 20   | SYSINFO_FEATURES_IO_TWI           | Set when the TWI is implemented (via the IO_TWI_USE generic)                               |
| 19   | SYSINFO_FEATURES_IO_SPI           | Set when the SPI is implemented (via the IO_SPI_USE generic)                               |
| 18   | SYSINFO_FEATURES_IO_UART          | Set when the UART is implemented (via the IO_UART_USE generic)                             |
| 17   | SYSINFO_FEATURES_IO_MTIME         | Set when the MTIME is implemented (via the IO_MTIME_USE generic)                           |
| 16   | SYSINFO_FEATURES_IO_GPIO          | Set when the GPIO is implemented (via the IO_GPIO_USE generic)                             |
| 4    | SYSINFO_FEATURES_MEM_INT_DMEM     | Set when the processor-internal DMEM is implemented (via the MEM_INT_DMEM_USE generic)     |
| 3    | SYSINFO_FEATURES_MEM_INT_IMEM_ROM | Set when the processor-internal IMEM is read-only (via the MEM_INT_IMEM_ROM generic)       |
| 2    | SYSINFO_FEATURES_MEM_INT_IMEM     | Set when the processor-internal IMEM is implemented (via the MEM_INT_IMEM_USE generic)     |
| 1    | SYSINFO_FEATURES_MEM_EXT          | Set when the external Wishbone bus interface is implemented (via the MEM_EXT_USE generic)  |
| 0    | SYSINFO_FEATURES_BOOTLOADER       | Set when the processor-internal bootloader is implemented (via the BOOTLOADER_USE generic) |

### 4. Software Architecture

To make actual use of the processor, the NEORV32 project comes with a complete software ecosystem. This ecosystem consists of the following elementary parts.

| File(s) / folder(s)                                   | Description                                       |
|-------------------------------------------------------|---------------------------------------------------|
| sw/common/bootloader_crt0.S, sw/common/crt0.S         | Application and bootloader start-up codes         |
| sw/common/bootloader_neorv32.ld, sw/common/neorv32.ld | Application and bootloader linker scripts         |
| sw/lib/include/, sw/lib/source/                       | Core hardware driver libraries                    |
| E.g. sw/example/blink_led/makefile                    | Makefiles                                         |
| sw/image_gen/                                         | Auxiliary tool for generating NEORV32 executables |
| sw/bootloader/bootloader.c                            | Default bootloader                                |

The complete software ecosystem is based on the RISC-V port of GCC (the GNU Compiler Collection).

Last but not least, the NEORV32 ecosystem provides some example programs for testing the hardware, for illustrating the usage of the peripherals and for getting started with the project.

### 4.1. Toolchain

The toolchain for this project is based on the free RISC-V GCC port. You can find the compiler sources and build instructions on the official RISC-V GNU toolchain GitHub page: https://github.com/riscv/riscv-gnu-toolchain. The NEORV32 uses a 32-bit base integer architecture (rv32i) and a 32-bit integer and soft-float ABI (ilp32), so make sure you build a matching toolchain.

Alternatively, you can download a prebuilt rv32i/e toolchain for 64-bit x86 Linux from: https://github.com/stnolting/riscv\_gcc\_prebuilt



More information regarding the toolchain (building from scratch or downloading the prebuilt ones) can be found in chapter 5.1. Toolchain Setup.

### 4.2. Core Software Libraries

The NEORV32 project provides a set of C libraries that allow an easy usage of all of the core's peripheral and CPU features. All you need to do is to include the main NEORV32 library file in your application's source file(s):

#include <neorv32.h>

Together with the makefile, this will automatically include all the processor's header files located in sw/lib/include into your application. The actual source files of the core libraries are located in sw/lib/source and are automatically included into the source list of your software project. The following files are currently part of the NEORV32 core library:

| C source file   | C header file   | Function                                      |
|-----------------|-----------------|-----------------------------------------------|
| -               | neorv32.h       | Main NEORV32 definitions and library file.    |
| neorv32_clic.c  | neorv32_clic.h  | HW driver functions for the CLIC.             |
| neorv32_cpu.c   | neorv32_cpu.h   | HW driver functions for the NEORV32 CPU.      |
| neorv32_gpio.c  | neorv32_gpio.h  | HW driver functions for the GPIO.             |
| neorv32_mtime.c | neorv32_mtime.h | HW driver functions for the MTIME.            |
| neorv32_pwm.c   | neorv32_pwm.h   | HW driver functions for the PWM.              |
| neorv32_rte.c   | neorv32_rte.h   | NEORV32 runtime environment helper functions. |
| neorv32_spi.c   | neorv32_spi.h   | HW driver functions for the SPI.              |
| neorv32_trng.c  | neorv32_trng.h  | HW driver functions for the TRNG.             |
| neorv32_twi.c   | neorv32_twi.h   | HW driver functions for the TWI.              |
| neorv32_uart.c  | neorv32_uart.h  | HW driver functions for the UART.             |
| neorv32_wdt.c   | neorv32_wdt.h   | HW driver functions for the WDT.              |

### **Documentation**

All core library functions are thoroughly documented using <u>doxygen</u>. To generate the HTML-based documentation, navigate to the project's docs folder and execute doxygen using the provided doxygen makefile:

neorv32/docs\$ doxygen doxygen\_makefile\_sw

This will generate (or update) the docs/doxygen\_build folder. To view the documentation, open the docs/doxygen\_build/html/index.html file with your browser of choice. Click on the "files" tab to see a list of all documented files.



The SW documentation is automatically built and deployed to GitHub pages by Travis CI. The online documentation is available at: <a href="https://stnolting.github.io/neorv32/files.html">https://stnolting.github.io/neorv32/files.html</a>

# 4.3. Application Makefile

Application compilation is based on a single GNU makefile. Each project in the sw/example folder features a makefile. All these makefiles are identical. When creating a new project, copy an existing project folder or at least the makefile to your new project folder. I suggest creating new projects in sw/example as well to keep the file dependencies intact. Of course, these dependencies can be configured manually via makefile variables when your project is located somewhere else.

Before you can use the makefiles, you need to install the RISC-V GCC toolchain. Also, you have to add the installation folder of the compiler to your system's PATH variable. More information can be found in chapter 5. Let's Get It Started!.

The makefile is invoked by simply executing make in your console:

neorv32/sw/example/blink\_led\$ make

### 4.3.1. Makefile Targets

Executing make without a target shows the help menu listing all available targets. The following targets are available:

| Target     | Description |
|------------|-------------|
| help       | Show a short help text explaining all available targets. |
| check      | Check the toolchain. You should run this target at least once after installation. |
| info       | Show the makefile configuration (see next chapter). |
| compile    | Compile all sources and generate the executable for upload via the bootloader. |
| install    | Compile all sources, generate the executable (via the compile target) for upload via the bootloader, and generate and install the IMEM VHDL initialization image file rtl/core/neorv32_application_image.vhd. |
| all        | Execute compile and install. |
| clean      | Remove all generated files in the current folder. |
| clean_all  | Remove all generated files in the current folder and also remove the compiled core libraries and the compiled image generator tool. |
| bootloader | Compile all sources, generate the executable, and generate and install the BOOTROM VHDL initialization image file rtl/core/neorv32_bootloader_image.vhd. This target uses the bootloader-specific start-up code (sw/common/bootloader_crt0.S) and linker script (sw/common/bootloader_neorv32.ld) instead. |

### 4.3.2. Makefile Configuration

The compilation flow is configured via variables right at the beginning of the makefile:

### **Description of Makefile Configuration Variables**

| Variable        | Description |
|-----------------|-------------|
| EFFORT          | Optimization level; optimize for size (-Os) is the default; legal values: -O0 -O1 -O2 -O3 -Os |
| APP_SRC         | The \*.c source files of the application. \*.c files in the current folder are automatically added via wildcard. Additional files can be added, separated by white spaces. |
| APP_INC         | Include file folders; separated by white spaces; must be defined with -I prefix |
| RISCV_TOOLCHAIN | The toolchain to be used; follows the naming convention architecture-vendor-output |
| MARCH           | The architecture of the RISC-V CPU. Only RV32 is supported by the NEORV32. Enable compiler support for optional CPU extensions by adding the corresponding extension letter (e.g. rv32im for the M CPU extension). |
| MABI            | The default 32-bit integer ABI. Do not change. |
| LIBC_PATH       | Location of the standard C library. |
| LIBGCC_PATH     | Location of the standard GCC C library. |
| NEORV32_HOME    | Relative or absolute path to the NEORV32 project home folder. Adapt this if the makefile/project is not in the project's sw/example folder. |



The makefile configuration variables can be redefined directly when invoking the makefile:

\$ make MARCH=-march=rv32ic clean\_all compile

### 4.4. Executable Image Format

When all the application sources have been compiled and linked, a final executable file has to be generated. For this purpose, the makefile uses the NEORV32-specific linker script sw/common/neorv32.ld to map all the sections into only four final sections: .text, .rodata, .data and .bss. These four sections contain everything required for the application to run:

| Section | Content                                                                                        |
|---------|------------------------------------------------------------------------------------------------|
| .text   | Executable instructions generated from the start-up code and all application sources           |
| .rodata | Constants (like strings) from the application; also the initial data for initialized variables |
| .data   | This section is required for the address generation of fixed (= global) variables only         |
| .bss    | This section is required for the address generation of dynamic memory constructs only          |

The .text and .rodata sections are mapped to processor's instruction memory space and the .data and .bss sections are mapped to the processor's data memory space.

Finally, the .text, .rodata and .data sections are extracted and concatenated into a single file main.bin. This file is parsed by the NEORV32 image generator (sw/image\_gen) to generate the final executable. The image generator can generate three types of executables, selected by a flag when calling the generator:

| Flag     | Generated output                                                                                                                                                    |
|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| -app_bin | Generates an executable binary file neorv32_exe.bin (for UART uploading via the bootloader)                                                                         |
| -app_img | Generates an executable VHDL memory initialization image for the processor-internal IMEM. This option generates the rtl/core/neorv32_application_image.vhd file.    |
| -bld_img | Generates an executable VHDL memory initialization image for the processor-internal BOOT ROM. This option generates the rtl/core/neorv32_bootloader_image.vhd file. |

All these options are managed by the makefile, so you don't actually have to think about them. The normal application compilation flow generates the neorv32\_exe.bin file in the current software project folder, ready for upload via the UART to the NEORV32 bootloader.

This executable version has a very small header consisting of three 32-bit words located right at the beginning of the file. This header is generated by the image generator (sw/image\_gen). The image generator is automatically compiled when invoking the makefile.

The first word of the executable is the signature word and is always 0x4788CAFE. Based on this word, the bootloader can identify a valid image file. The next word represents the size in bytes of the <u>actual program image</u>. A simple "complement" checksum of the actual program image is given by the third word. This provides a simple protection against data transmission or storage errors.
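The header check can be sketched as follows. This is not the bootloader's code, and two points are assumptions: the "complement" checksum is taken to be the two's complement of the 32-bit sum of all program words (so that sum plus checksum equals zero), and the image size is assumed to be a multiple of four bytes. The function names are hypothetical. The function works on already decoded 32-bit words, so the byte order inside the file is deliberately left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXE_SIGNATURE    0x4788CAFEu  /* first header word (from the text) */
#define EXE_HEADER_WORDS 3u           /* signature, size in bytes, checksum */

/* ASSUMPTION: checksum = two's complement of the sum of all program words. */
uint32_t exe_checksum(const uint32_t *image, size_t n_words) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n_words; i++) {
        sum += image[i];              /* wraps modulo 2^32 by definition */
    }
    return (uint32_t)(0u - sum);
}

/* Check signature, size and checksum of an executable given as 32-bit words. */
bool exe_is_valid(const uint32_t *exe, size_t n_words) {
    if (n_words < EXE_HEADER_WORDS || exe[0] != EXE_SIGNATURE) {
        return false;
    }
    uint32_t size_bytes = exe[1];
    if ((size_bytes % 4u) != 0u) {    /* assumed: whole words only */
        return false;
    }
    size_t image_words = size_bytes / 4u;
    if (image_words > n_words - EXE_HEADER_WORDS) {
        return false;                 /* file shorter than announced */
    }
    return exe_checksum(exe + EXE_HEADER_WORDS, image_words) == exe[2];
}
```

With this scheme, a receiver can simply add up all received program words plus the checksum word and compare the result with zero, which detects single-word corruption during a UART transfer.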

#### 4.5. Bootloader

The default bootloader (sw/bootloader/bootloader.c) of the NEORV32 processor allows you to upload a new program executable at any time. If you have an external SPI flash connected to the processor (for example the FPGA configuration memory), you can store the program executable in it, and the system can boot it directly after reset without any user interaction.



The bootloader is only implemented when the BOOTLOADER\_USE generic is true and requires the CSR access CPU extension (CPU\_EXTENSION\_RISCV\_Zicsr generic is true).



The bootloader requires the UART for manual executable upload or SPI flash programming (IO\_UART\_USE generic is true).



For the automatic boot from an SPI flash, the SPI controller has to be implemented (IO\_SPI\_USE generic is true) and the machine system timer MTIME has to be implemented (IO\_MTIME\_USE generic is true), too.

To interact with the bootloader, attach the UART signals (uart\_txd\_o and uart\_rxd\_i) of the processor's top entity via a COM port (adapter) to a computer, configure your terminal program using the following settings and perform a reset of the processor.

Terminal console settings (19200-8-N-1):

- 19200 Baud
- 8 data bits
- No parity bit
- 1 stop bit
- Newline on \r\n (carriage return, newline)
- No transfer protocol for sending data, just the raw bytes

The bootloader uses the LSB of the top entity's gpio\_o output port as a high-active status LED (all other output pins are set to low level by the bootloader). After reset, this LED will start blinking at ~2Hz and the following intro screen should show up in your terminal:

```
<< NEORV32 Bootloader >>
BLDV: Jul 2 2020
HWV: 0.1.0.1
CLK: 0x05F5E100 Hz
USER: 0x00000000
MISA: 0x42801104
CONF: 0x01FF0015
IMEM: 0x00008000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000
Autoboot in 8s. Press key to abort.
```



The uploaded executables are always stored to the instruction space starting at the base address of the instruction space.

This start-up screen also gives some brief information about the bootloader version and several system parameters:

| Field | Description |
|-------|-------------|
| BLDV  | Bootloader version (build time). |
| HWV   | Processor hardware version (from the mimpid CSR). |
| USER  | Custom user code (from the USER_CODE generic). |
| CLK   | Processor clock speed in Hz (via the mclock CSR from the CLOCK_FREQUENCY generic). |
| MISA  | CPU extensions (from the misa CSR). |
| CONF  | Processor configuration (via the mfeatures CSR from the IO and MEM config. generics). |
| IMEM  | Instruction memory base address and size in bytes (via the mispacebase & mispacesize CSRs from the MEM_ISPACE_BASE & MEM_ISPACE_SIZE generics). |
| DMEM  | Data memory base address and size in bytes (via the mdspacebase & mdspacesize CSRs from the MEM_DSPACE_BASE & MEM_DSPACE_SIZE generics). |

Now you have 8 seconds to press any key. Otherwise, the bootloader starts the auto boot sequence. When you press any key within the 8 seconds, the actual bootloader user console starts:

```
<< NEORV32 Bootloader >>
BLDV: Jul 2 2020
HWV: 0.1.0.1
CLK: 0x05F5E100 Hz
USER: 0x00000000
MISA: 0x42801104
CONF: 0x01FF0015
IMEM: 0x00008000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000
Autoboot in 8s. Press key to abort.
Aborted.
Available commands:
 h: Help
 r: Restart
 u: Upload
 s: Store to flash
 l: Load from flash
 e: Execute
CMD:>
```

The auto-boot countdown is stopped and now you can enter a command from the list to perform the corresponding operation:

- h: Show the help text (again)
- **r**: Restart the bootloader and the auto-boot sequence
- u: Upload new program executable (neorv32\_exe.bin) via UART into the instruction memory
- s: Store executable to SPI flash at spi\_csn\_o(0)
- l: Load executable from SPI flash at spi\_csn\_o(0)
- e: Start the application, which is currently stored in the instruction memory

A new executable can be uploaded via UART by executing the u command. The executable can then be executed directly via the e command. To store the recently uploaded executable to an attached SPI flash, press s. To load an executable directly from the SPI flash, press l. The bootloader and the auto-boot sequence can be restarted manually via the r command.

### 4.5.1. Auto Boot Sequence

When you reset the NEORV32 processor, the bootloader waits 8 seconds for user console input before it starts the automatic boot sequence. This sequence tries to fetch a valid boot image from the external SPI flash connected to SPI chip select <code>spi\_csn\_o(0)</code>. If a valid boot image is found and can be successfully transferred into the instruction memory, it is started automatically. If no SPI flash was detected or if no valid boot image was found, the bootloader stalls and the status LED is permanently activated.

### 4.5.2. External SPI Flash for Booting

If you want the NEORV32 bootloader to automatically fetch and execute an application at system start, you can store it to an external SPI flash. The advantage of the external memory is to have a non-volatile program storage, which can be re-programmed at any time just by executing some bootloader commands. Thus, no FPGA bitstream recompilation is required at all.

### **SPI Flash Requirements**

The bootloader can access an SPI-compatible flash that is connected to the processor top entity's SPI port using chip select <code>spi\_csn\_o(0)</code>. The flash must be capable of operating at a minimum of 1/8 of the processor's main clock. Only single-byte read and write operations are used. The address has to be 24 bits long. Furthermore, the SPI flash has to support at least the following commands:

```
    READ (0x03)
    READ STATUS (0x05)
    WRITE ENABLE (0x06)
    PAGE PROGRAM (0x02)
    SECTOR ERASE (0xD8)
    READ ID (0x9E)
```

Compatible (FPGA configuration) SPI flash memories are for example the Winbond W25Q64FV or the Micron N25Q032A.
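To illustrate the 24-bit addressing requirement, the following sketch builds the byte sequence of a READ access: the command byte 0x03 followed by the three address bytes, most significant byte first, as is usual for SPI flash memories of this kind. The function name is hypothetical and the code is not taken from the bootloader sources.

```c
#include <stdint.h>

enum { SPI_FLASH_CMD_READ = 0x03 };  /* READ command from the list above */

/* Build the 4-byte READ command frame: opcode, then the 24-bit address
 * with the most significant byte first. Address bits 31:24 are ignored. */
void spi_flash_read_frame(uint32_t addr, uint8_t frame[4]) {
    frame[0] = (uint8_t)SPI_FLASH_CMD_READ;
    frame[1] = (uint8_t)(addr >> 16);
    frame[2] = (uint8_t)(addr >> 8);
    frame[3] = (uint8_t)addr;
}
```

After these four bytes have been shifted out with chip select held active, each further SPI transfer returns one data byte from the flash.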

### **SPI Flash Configuration**

The base address SPI\_FLASH\_BOOT\_ADR for the executable image inside the SPI flash is defined in the "user configuration" section of the bootloader source code (sw/bootloader/bootloader.c). Most FPGAs that use an external configuration flash store the golden configuration bitstream at base address 0. Make sure there is no address collision between the FPGA bitstream and the application image. You need to change the default sector size if your flash has a sector size other than 64kB.



For any change you make inside the bootloader, you have to recompile the bootloader (see 5.10. Re-Building the Internal Bootloader) and run a new synthesis of the processor.

### 4.5.3. Bootloader Error Codes

If something goes wrong during bootloader operation, the processor stalls, a bell command and one of the following error codes are sent to the terminal, the status LED is permanently activated and the system must be reset manually.

| Error | Description |
|-------|-------------|
| ERR_0 | If you try to transfer an invalid executable (via UART or from the external SPI flash), this error message shows up. Also, if no SPI flash was found during a boot attempt, this message will be displayed. |
| ERR_1 | Your program is too big for the processor's internal instruction memory. Increase the memory size or reduce (optimize!) your application code. |
| ERR_2 | This indicates a checksum error. Something went wrong during the transfer of the program image (upload via UART or loading from the external SPI flash). If the error was caused by a UART upload, just try again. If the error occurred during a flash access, the stored image might be corrupted. |
| ERR_3 | This error occurs if the attached SPI flash cannot be accessed. Make sure you have the right type of flash and that it is properly connected to the NEORV32 SPI port using chip select #0. |
| ERR_4 | The instruction memory is marked as read-only. Set the MEM_INT_IMEM_ROM generic to false to allow write accesses. |
| ERR_5 | This error pops up when an exception was triggered. Such an error with exception code "0x00000002" will be generated when you try to boot from the instruction memory, but no valid executable has been loaded yet. |

### 4.5.4. Final Notes



The bootloader is intended to work independently of the actual hardware (configuration). Hence, it should be compiled with the minimal base ISA only. The current version of the bootloader uses the rv32i ISA, so it will not work on rv32e architectures. To make the bootloader work on an embedded CPU, recompile it using the rv32e ISA (see chapter 5.10. Re-Building the Internal Bootloader).

### 4.6. NEORV32 Runtime Environment

The software architecture of the NEORV32 comes with a minimal runtime environment that takes care of clean application start and also of all interrupts and exceptions during execution.

The runtime environment is implemented in the sw/common/crt0.S application start-up code. This piece of code is automatically linked with every application program and represents the starting point for every application. Hence, it is executed directly after reset. The start-up code performs the following operations:

- Initialize all data registers x1 to x31 (x1 to x15 only for embedded CPU mode) with zero.
- Set up the stack pointer (x2): The stack always starts at the very end of the data address space (DSPACEBASE + DSPACESIZE - 4).
- Initialize the global pointer (x3) according to the .data segment layout provided by the linker script.
- Initialize the NEORV32 runtime environment for catching exceptions and interrupts.
- Clear IO area: Write zero to all memory-mapped registers in the IO region. If certain devices have not been implemented, a bus access fault exception will occur. This exception is also processed by the start-up code.
- Clear the .bss section defined by the linker script.
- Copy read-only data from the .text section to the .data section to set up the initialized variables.
- Call the application's main function (with no arguments).
- If the main function returns, the processor enters an endless sleep mode (using a simple loop or the WFI instruction if implemented).
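The stack-pointer rule from the list above can be checked with a one-line calculation. The sketch below reads the rule as "base plus size minus 4", i.e. the stack pointer initially points to the last 32-bit word of the data address space; the function name is hypothetical.

```c
#include <stdint.h>

/* Initial stack pointer: last 32-bit word of the data address space. */
uint32_t initial_stack_pointer(uint32_t dspace_base, uint32_t dspace_size) {
    return dspace_base + dspace_size - 4u;
}
```

For a data memory of 0x00002000 bytes at base address 0x80000000, this yields 0x80001FFC.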

The most interesting part of this chapter is the NEORV32 runtime environment neorv32\_rte. This minimal runtime environment is executed right after booting the application. It initializes the RISC-V-compliant mtvec CSR, which provides the base address for <u>all</u> interrupt and exception handlers. The address stored in this register points to the *first-level exception handler* implemented in the sw/common/crt0.S file. Whenever an exception or interrupt is triggered, this *first-level handler* is called.

The *first-level handler* performs a complete context save, analyzes the source of the exception/interrupt and calls the corresponding *second-level handler*, which actually takes care of the exception/interrupt. For this, the first-level exception handler uses an interrupt/exception vector table located right at the beginning of the data memory space. This vector table is not available for "normal" application data storage, which is enforced by the NEORV32 linker script (sw/common/neorv32.ld).

The vector table has 32 x 4-byte entries. The first 16 entries are reserved for exception handlers, the second 16 entries are reserved for interrupt handlers. While executing the start-up code, each entry is initialized with a default handler that does nothing but return (via ret). This way, every exception and every interrupt is safely caught. Of course, this default handler cannot handle system-critical events. Hence, these default handlers are only relevant for an early system state. The actual application program should replace these dummy entries in the vector table with "real" handlers.
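The table-based dispatch can be pictured with a small host-side model. This is only a sketch: the real first-level handler is written in assembly and also saves the context, and the index mapping used here (exception code n in entry n, interrupt code n in entry 16 + n) is an assumption derived from the description above. All names, including the counting demo handler, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define VT_ENTRIES     32u  /* total number of table entries */
#define VT_EXC_ENTRIES 16u  /* first 16: exceptions, second 16: interrupts */

typedef void (*handler_t)(void);

handler_t vector_table[VT_ENTRIES];
int demo_hits = 0;

/* Default handler: does nothing but return. */
void default_handler(void) { }

/* Demo second-level handler: counts how often it was called. */
void demo_handler(void) { demo_hits++; }

/* Fill every entry with the default handler (done by the start-up code). */
void vt_init(void) {
    for (size_t i = 0; i < VT_ENTRIES; i++) {
        vector_table[i] = default_handler;
    }
}

/* ASSUMPTION: exceptions use entries 0..15, interrupts use entries 16..31. */
size_t vt_index(bool is_interrupt, unsigned code) {
    return (is_interrupt ? VT_EXC_ENTRIES : 0u) + (code & 0xFu);
}

/* Model of the first-level handler: look up and call the second-level one. */
void first_level_dispatch(bool is_interrupt, unsigned code) {
    vector_table[vt_index(is_interrupt, code)]();
}
```

Because every entry always holds a valid function pointer, the dispatch never needs a null check; an event without a "real" handler simply runs the default handler.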

### **Using the NEORV32 Runtime Environment (RTE)**

The actual application should not directly mess with the exception vector table. Instead, the NEORV32 runtime environment (sw/lib/include/neorv32\_rte.h) provides functions to install the real *second-level handlers* for each of the implemented exceptions and interrupts:

```
int neorv32_rte_exception_install(uint8_t exc_id, void (*handler)(void));
```

The following exc\_id exception IDs are available:

| ID name [C]        | Description / exception or interrupt causing event     |
|--------------------|--------------------------------------------------------|
| EXCID_I_MISALIGNED | Instruction address misaligned                         |
| EXCID_I_ACCESS     | Instruction (bus) access fault                         |
| EXCID_I_ILLEGAL    | Illegal instruction                                    |
| EXCID_BREAKPOINT   | Breakpoint (EBREAK instruction)                        |
| EXCID_L_MISALIGNED | Load address misaligned                                |
| EXCID_L_ACCESS     | Load (bus) access fault                                |
| EXCID_S_MISALIGNED | Store address misaligned                               |
| EXCID_S_ACCESS     | Store (bus) access fault                               |
| EXCID_MENV_CALL    | Environment call from machine mode (ECALL instruction) |
| EXCID_MTI          | Machine timer interrupt (via MTIME)                    |
| EXCID_MEI          | Machine external interrupt (via CLIC)                  |
| EXCID_MSI          | Machine software interrupt                             |

When installing a custom handler function for any of these exception/interrupts, make sure the function uses no attributes, has no arguments and no return value like in the following example:

```
void handler_xyz(void) {
}
```



Do <u>NOT</u> use the ((interrupt)) attribute for the application exception handler functions! This would place an mret instruction at the end of the function, making it impossible to return to the first-level exception handler, which causes stack corruption.

Example: Installation of the MTIME interrupt handler:

```
neorv32_rte_exception_install(EXCID_MTI, handler_xyz);
```

To remove a previously installed exception handler, call the corresponding uninstall function from the NEORV32 runtime environment. This replaces the previously installed handler with the initial default dummy handler, so even uninstalled exceptions and interrupts are still caught.

```
int neorv32_rte_exception_uninstall(uint8_t exc_id);
```

**Example:** Removing the MTIME interrupt handler:

```
neorv32_rte_exception_uninstall(EXCID_MTI);
```
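The install/uninstall behavior described above can be pictured as a table of function pointers whose slots fall back to a default handler. The following is only a host-side sketch in C, not the NEORV32 RTE source: all `mock_*` names, the slot number used for the timer and the return-value convention (0 = success) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of a second-level handler table.
 * This is NOT the NEORV32 RTE source; every mock_* name is made up. */
enum { MOCK_NUM_IDS = 12 };   /* the ID table above lists twelve entries */
enum { MOCK_ID_TIMER = 9 };   /* arbitrary slot chosen for this sketch */

typedef void (*mock_handler_t)(void);

int mock_dummy_hits = 0;      /* calls that ended up in the default handler */
int mock_user_hits  = 0;      /* calls that ended up in the user handler */

/* Default handler: catches every event that has no custom handler. */
static void mock_dummy_handler(void) { mock_dummy_hits++; }

/* Example user handler: no attributes, no arguments, no return value. */
void mock_user_handler(void) { mock_user_hits++; }

static mock_handler_t mock_table[MOCK_NUM_IDS];

/* Point every slot at the default handler. */
void mock_rte_init(void) {
  for (size_t i = 0; i < MOCK_NUM_IDS; i++) {
    mock_table[i] = mock_dummy_handler;
  }
}

/* Install a handler. Returns 0 on success, 1 on a bad ID or NULL pointer
 * (assumed convention; the manual does not specify the return values). */
int mock_rte_install(uint8_t id, mock_handler_t handler) {
  if (id >= MOCK_NUM_IDS || handler == NULL) {
    return 1;
  }
  mock_table[id] = handler;
  return 0;
}

/* Uninstalling restores the default handler instead of leaving a hole. */
int mock_rte_uninstall(uint8_t id) {
  if (id >= MOCK_NUM_IDS) {
    return 1;
  }
  mock_table[id] = mock_dummy_handler;
  return 0;
}

/* Stand-in for the first-level handler: look up the slot and call it. */
void mock_rte_dispatch(uint8_t id) {
  if (id < MOCK_NUM_IDS && mock_table[id] != NULL) {
    mock_table[id]();
  }
}
```

Because the user handler is called like an ordinary function from the dispatcher, it has to return normally. This is the reason the ((interrupt)) attribute must not be used on it.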

### **Debugging**

For debugging purposes, the NEORV32 runtime environment features a "debug handler" that can be installed for all available exceptions and interrupts. This debug handler prints detailed information about the triggered exception/interrupt via UART. After that, it tries to resume normal application execution. This attempt might fail – but at least the debug handler gives detailed information for digging into the faulty code. The installation of the debug handler is triggered via the following function:

```
void neorv32_rte_enable_debug_mode(void);
```



Installing the debug handler will override all previous interrupt/exception handler installations. Hence, enable the NEORV32 RTE debug mode right at program start and install your custom exception/interrupt handler after that.



More information regarding the NEORV32 runtime environment can be found in the doxygen software documentation (also available online).

### 5. Let's Get It Started!

To make your NEORV32 project run, follow the guides from the upcoming sections. Follow these guides step by step and in the presented order.

### 5.1. Toolchain Setup

At first, we need to get the RISC-V GCC toolchain. There are two possibilities to do this:

- Download and compile the official RISC-V GNU toolchain
- Download and install a prebuilt version of the toolchain

Compilation of the toolchain is done using the guide from the official https://github.com/riscv/riscv-gnu-toolchain GitHub page. I have done that on my computer and you can download my prebuilt version from https://github.com/stnolting/riscv\_gcc\_prebuilt.

The default toolchain for this project is riscv32-unknown-elf.

Of course you can use any other RISC-V toolchain. Just change the RISCV\_TOOLCHAIN variable in the application makefile(s) according to your needs.

Besides the RISC-V GCC, you will need a native GCC to compile the NEORV32 image generator.

### 5.1.1. Making the Toolchain from Scratch

The official RISC-V repository uses submodules. You need the --recursive option to fetch the submodules automatically:

```
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
```

Download and install the prerequisite standard packages:

\$ sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev

To build the Newlib (bare-metal, elf) cross-compiler, pick an install path. If you choose, say, /opt/riscv, then add /opt/riscv/bin to your PATH now.

```
$ export PATH=$PATH:/opt/riscv/bin
```

Then, simply run the following commands in the RISC-V GNU toolchain source folder (for rv32i):

```
riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32
riscv-gnu-toolchain$ make
```

After a while (hours!) you will get riscv32-unknown-elf-gcc and all of its friends in your /opt/riscv/bin folder.

### 5.1.2. Downloading and Installing the Prebuilt Toolchain

Alternatively, you can download a prebuilt version of the toolchain. I have compiled the toolchain on a 64-bit x86 Ubuntu (Ubuntu on Windows, actually):

\$ git clone https://github.com/stnolting/riscv\_gcc\_prebuilt.git

Alternatively, you can directly download the corresponding toolchain archive from: https://github.com/stnolting/riscv\_gcc\_prebuilt

Unpack the archive and copy the content to a location in your file system (e.g. /opt/riscv).



Of course, you can also use any other prebuilt version of the toolchain. Make sure it supports the rv32i/e architecture and uses the ilp32 or ilp32e ABI.

#### 5.1.3. Installation

Now you have the binaries. The last step is to add them to your PATH environment variable (if you have not already done so). Make sure to add the <u>binaries folder</u> (bin) of your toolchain.

\$ export PATH=\$PATH:/opt/riscv/bin

You should add this command to your .bashrc (if you are using bash) to automatically add the RISC-V toolchain at every console start.

### 5.1.4. Testing the Installation

To make sure everything works fine, navigate to an example project in the NEORV32 example folder and execute the following command:

neorv32/sw/example/blink\_led\$ make check

This will test all the tools required for the NEORV32. Everything is working fine if Toolchain check OK appears at the end.

# 5.2. General Hardware Setup

The following steps are required to generate a bitstream for your FPGA board. If you want to run the NEORV32 in simulation only, the following steps might also apply.

In this tutorial we will use a test implementation of the processor – using most of the processor's optional modules but just propagating the minimal signals to the outer world. Hence, this guide is intended as evaluation or "hello world" project to check out the NEORV32. A little note: The order of the following steps might be a little different for your specific EDA tool.

- 1. Create a new project with your FPGA EDA tool of choice.
- 2. Add all VHDL files from the project's rtl/core folder to your project. Make sure to *reference* the files only – do not copy them.
- 3. Make sure to add all the rtl files to a new **library** called neorv32. If your FPGA tool does not provide a field to enter the library name, check out the "properties" menu of the rtl files.
- 4. The rtl/core/neorv32\_top.vhd VHDL file is the top entity of the NEORV32 processor. If you already have a design, instantiate this unit into your design and proceed. If you do not have a design yet and just want to check out the NEORV32 – no problem! In this guide we will use a simplified top entity: Add the rtl/core/top\_templates/neorv32\_test\_setup.vhd VHDL file to your project too, and select it as top entity. This test setup provides a minimal test hardware setup:



Figure 6: Hardware configuration of the NEORV32 test setup

5. This test setup only implements some very basic processor and CPU features. Also, only the minimum number of signals is propagated to the outer world.

6. The configuration of the NEORV32 processor is done using the generics of the instantiated processor top entity. Let's keep things simple at first and use the default configuration (see below). But there is one generic that has to be set according to your FPGA / board: the clock frequency of the top's clock input signal (clk\_i). Use the CLOCK\_FREQUENCY generic to specify your clock source's frequency in Hertz (Hz). The default value that you need to adapt is 100000000 (100 MHz).

```
neorv32_top_inst: neorv32_top
generic map (
  -- General --
  CLOCK_FREQUENCY              => 100000000,   -- in Hz
  BOOTLOADER_USE               => true,
  CSR_COUNTERS_USE             => true,
  USER_CODE                    => x"00000000",
  -- RISC-V CPU Extensions --
  CPU_EXTENSION_RISCV_C        => false,
  CPU_EXTENSION_RISCV_E        => false,
  CPU_EXTENSION_RISCV_M        => false,
  CPU_EXTENSION_RISCV_Zicsr    => true,
  CPU_EXTENSION_RISCV_Zifencei => true,
  -- Memory configuration: Instruction memory --
  MEM_ISPACE_BASE              => x"00000000",
  MEM_ISPACE_SIZE              => 16*1024,     -- in BYTES
  MEM_INT_IMEM_USE             => true,
  MEM_INT_IMEM_SIZE            => 16*1024,     -- in BYTES
  MEM_INT_IMEM_ROM             => false,
  -- Memory configuration: Data memory --
  MEM_DSPACE_BASE              => x"80000000",
  MEM_DSPACE_SIZE              => 8*1024,      -- in BYTES
  MEM_INT_DMEM_USE             => true,
  MEM_INT_DMEM_SIZE            => 8*1024,      -- in BYTES
  -- Memory configuration: External memory interface --
  MEM_EXT_USE                  => false,
  MEM_EXT_REG_STAGES           => 2,
  MEM_EXT_TIMEOUT              => 15,
  -- Processor peripherals --
  IO_GPIO_USE                  => true,
  IO_MTIME_USE                 => true,
  IO_UART_USE                  => true,
  IO_SPI_USE                   => false,
  IO_TWI_USE                   => false,
  IO_PWM_USE                   => false,
  IO_WDT_USE                   => true,
  IO_CLIC_USE                  => true,
  IO_TRNG_USE                  => false,
  IO_DEVNULL_USE               => true
)
```

7. If you feel like it – or if your FPGA does not provide enough resources – you can modify the memory sizes (MEM\_INT\_IMEM\_SIZE and MEM\_INT\_DMEM\_SIZE) or exclude certain peripheral modules from implementation. But as mentioned above, let's keep things simple and use the standard configuration for now. We will use the processor-internal data and instruction memories for the test setup. So make sure the instruction and data space sizes are always equal to the sizes of the internal memories (i.e. MEM\_INT\_IMEM\_SIZE == MEM\_ISPACE\_SIZE and MEM\_INT\_DMEM\_SIZE == MEM\_DSPACE\_SIZE).



Keep the internal instruction and data memory sizes in mind as these values will be required for setting up the software framework in the next chapter.

8. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to the corresponding pins of your FPGA board. All the signals can be found in the entity declaration:

```
entity neorv32_test_setup is
  port (
    -- Global control --
    clk_i : in std_ulogic := '0'; -- global clock, rising edge
    rstn_i : in std_ulogic := '0'; -- global reset, low-active, async
    -- GPIO --
    gpio_o : out std_ulogic_vector(7 downto 0); -- parallel output
    -- UART --
    uart_txd_o : out std_ulogic; -- UART send data
    uart_rxd_i : in std_ulogic := '0' -- UART receive data
    );
end neorv32_test_setup;
```

- 9. Attach the clock input clk\_i to your clock source and connect the reset line rstn\_i to a button of your FPGA board. Check whether the button is low-active or high-active: the reset signal of the processor must be low-active, so you may need to invert the input signal. If possible, connect at least bit #0 of the GPIO output port gpio\_o to a high-active LED (invert the signal if your LEDs are low-active). Finally, connect the UART signals uart\_txd\_o and uart\_rxd\_i to your serial host interface (dedicated pins, USB-to-serial converter, etc.).
- 10. Perform the project HDL compilation (synthesis, mapping, bitstream generation).
- 11. Download the generated bitstream into your FPGA ("program" it) and press the reset button (just to make sure everything is in sync).
- 12. Done! If you have assigned the bootloader status LED (bit #0 of the GPIO output port), it should be flashing now and you should receive the bootloader start prompt via the UART.

## 5.3. General Software Framework Configuration

While your synthesis tool is crunching the NEORV32 HDL files, it is time to configure the project's software framework for your processor hardware.

- 1. You need to tell the linker the size of the processor's instruction and data memories. We have just configured the test setup so you should remember the memory configuration.
- 2. Open the application linker script sw/common/neorv32.ld with a text editor. Right at the beginning of the linker script you will find the memory configuration:

3. There are four parameters that are relevant here: the origin and the length of the instruction memory (named rom) and the origin and the length of the data memory (named ram). These four parameters must always be in sync with your hardware memory configuration:



The rom ORIGIN parameter has to be equal to the configuration of the MEM\_ISPACE\_BASE generic. The rom LENGTH parameter has to be equal to the configuration of the MEM\_ISPACE\_SIZE generic.



The ram ORIGIN parameter has to be equal to the configuration of the MEM\_DSPACE\_BASE generic. The ram LENGTH parameter has to be equal to the configuration of the MEM\_DSPACE\_SIZE generic.



Make sure you **do not** delete the + 2\*16\*4 right after the origin of the RAM! This offset is required to reserve space for the exception vector table managed by the NEORV32 runtime environment.
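As a small worked example of this offset: 2\*16\*4 evaluates to 128 bytes, so with the default MEM\_DSPACE\_BASE of 0x80000000 the first RAM address the linker hands out to the application is 0x80000080. The C sketch below only reproduces this arithmetic; the names `RTE_RESERVED_BYTES` and `app_ram_origin` are hypothetical and do not appear in the NEORV32 sources.

```c
#include <stdint.h>

/* Bytes reserved at the start of the RAM, copied from the "+ 2*16*4"
 * term in the linker script (hypothetical constant name). */
enum { RTE_RESERVED_BYTES = 2 * 16 * 4 };   /* = 128 */

/* First RAM address available to the application, given the value of
 * the MEM_DSPACE_BASE generic (hypothetical helper for illustration). */
uint32_t app_ram_origin(uint32_t dspace_base) {
  return dspace_base + (uint32_t)RTE_RESERVED_BYTES;
}
```

For the default configuration, `app_ram_origin(0x80000000u)` yields 0x80000080.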

# 5.4. Building the Software Documentation

If you wish, you can generate the documentation of the NEORV32 software framework. This <u>doxygen</u>-based documentation illustrates the core libraries as well as all the example programs. A deployed version of the documentation can be found online at <u>GitHub pages</u>.

1. Make sure doxygen is installed. Navigate to the docs folder and generate the documentation files using the provided doxygen makefile:

```
neorv32/docs$ doxygen doxygen_makefile_sw
```

2. Doxygen will generate an HTML-based documentation. The output files are placed in (a new folder) docs/doxygen\_build/html. Move to this folder and open index.html with your browser. Click on the "files" tab to see an overview of all documented files.

# 5.5. Application Program Compilation

- 1. Open a terminal console and navigate to one of the project's example programs, for example the simple sw/example/blink\_led program. This program uses the NEORV32 GPIO unit to display an 8-bit counter on the lowest eight bits of the gpio\_o port.
- 2. To compile the project, execute:

```
neorv32/sw/example/blink_led$ make compile
```

3. This will compile and link the application sources together with all the included libraries. At the end, your application is put into an ELF file (main.elf). The image generator process takes this file and creates a final executable. The makefile will show the resulting memory utilization and the executable size in the console:

```
neorv32/sw/example/blink_led$ make compile

Memory utilization:

text data bss dec hex filename

852 0 0 852 354 main.elf

Executable (neorv32_exe.bin) size in bytes:

864
```

4. That's it. The compile target has created the actual executable (neorv32\_exe.bin) in the current folder, which is ready to be uploaded to the processor via the bootloader and a UART interface.

### 5.6. Uploading and Starting of a Binary Executable Image via UART

We have just created the executable. Now it is time to upload it to the processor. In this tutorial, we will use **TeraTerm** as an exemplary serial terminal program for **Windows**, but the general procedure is the same for other terminal programs, build environments or operating systems.



Make sure your terminal program can transfer the executable in raw byte mode without any protocol stuff around it.

- 1. Connect the UART interface of your FPGA (board) to a COM port of your computer or use a USB-to-serial adapter.
- 2. Start a terminal program. In this tutorial, I am using TeraTerm for Windows. You can download it from <a href="https://ttssh2.osdn.jp/index.html.en">https://ttssh2.osdn.jp/index.html.en</a>
- 3. Open a connection to the corresponding COM port. Configure the terminal according to the following parameters:
  - 19200 Baud
  - 8 data bits
  - 1 stop bit
  - No parity bits
  - No transmission/flow control protocol (just raw byte mode)
  - Newline on \r\n = carriage return & newline (if configurable at all)



Figure 7: Serial configuration of TeraTerm
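If you are working on a POSIX host instead of using TeraTerm, the same line settings (19200 baud, 8 data bits, 1 stop bit, no parity, raw byte mode) can be expressed with the standard termios interface. The following is a minimal sketch under that assumption; the function name `host_uart_setup_19200_8n1` is hypothetical, and opening the serial device is left out because its path depends on your system.

```c
#include <termios.h>

/* Fill a termios structure with the settings listed above:
 * 19200 baud, 8 data bits, 1 stop bit, no parity, raw byte mode.
 * Hypothetical helper for a POSIX host; returns 0 on success. */
int host_uart_setup_19200_8n1(struct termios *tio) {
  /* Raw input: no break handling, no parity marking, no CR/NL
   * translation, no software (XON/XOFF) flow control. */
  tio->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON | IXOFF);
  /* Raw output: no post-processing such as NL -> CR NL. */
  tio->c_oflag &= ~(tcflag_t)OPOST;
  /* No echo, no line buffering, no signal characters. */
  tio->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
  /* 8 data bits, no parity, 1 stop bit; enable the receiver. */
  tio->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);
  tio->c_cflag |= (tcflag_t)(CS8 | CLOCAL | CREAD);

  if (cfsetispeed(tio, B19200) != 0) {
    return -1;
  }
  if (cfsetospeed(tio, B19200) != 0) {
    return -1;
  }
  return 0;
}
```

On such a host this helper would typically be called between `tcgetattr()` and `tcsetattr(fd, TCSANOW, ...)` on the opened serial device. Hardware flow control is controlled by a non-POSIX flag on most systems and is therefore not touched here.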

- 4. Also make sure that single characters are transmitted without any trailing "new line" or "carriage return" commands (this is highly dependent on your terminal application of choice; TeraTerm only sends the raw characters by default).
- 5. Press the NEORV32 reset button to restart the bootloader. The status LED starts blinking and the bootloader intro screen appears in your console. Hurry up and press any key (hit space!) to abort the automatic boot sequence and to start the actual bootloader user interface console.


```
<< NEORV32 Bootloader >>
BLDV: Jul 2 2020
HWV: 0.1.0.1
CLK: 0x05F5E100 Hz
USER: 0x00000000
MISA: 0x42801104
CONF: 0x01FF0015
IMEM: 0x00008000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000
Autoboot in 8s. Press key to abort.
Aborted.
Available commands:
 h: Help
 r: Restart
 u: Upload
 s: Store to flash
 l: Load from flash
 e: Execute
CMD:>
```

6. Execute the "Upload" command by typing u. Now the bootloader is waiting for a binary executable to be sent.

```
CMD:> u
Awaiting neorv32_exe.bin...
```

- 7. Use the "send file" option of your terminal program to transmit the previously generated binary executable neorv32\_exe.bin.
- 8. Again, make sure to transmit the executable in **raw binary mode** (no transfer protocol, no additional header stuff). When using TeraTerm, select the "binary" option in the send file dialog:



Figure 8: Transfer executable in binary mode (German version of TeraTerm)

9. If everything went fine, OK will appear in your terminal:

```
CMD:> u
Awaiting neorv32_exe.bin... OK
```

10. The executable now resides in the instruction memory of the processor. To run the program right away, issue the "Execute" command by typing e.

```
CMD:> u
Awaiting neorv32_exe.bin... OK
CMD:> e
Booting...
Blinking LED demo program
```

11. Now you should see the LEDs counting.

# 5.7. Setup of a New Application Program Project

Done with all the introduction tutorials and those example programs? Then it is time to start your own application project!

- 1. The easiest way of creating a new project is to make a copy of an existing project (like the blink\_led project) inside the example folder. By this, all file dependencies are kept and you can start coding and compiling.
- 2. If you want to have the project folder somewhere else, you need to adapt the project's makefile. In the makefile you will find a variable that keeps the relative or absolute path to the NEORV32 home folder. Just modify this variable according to your project's location:

```
# Relative or absolute path to the NEORV32 home folder (use default if not set by user)
NEORV32_HOME ?= ../../..
```

3. If your project contains additional source files outside of the project folder, you can add them to the APP\_SRC variable:

```
# User's application sources (add additional files here)
APP_SRC = $(wildcard *.c) ../somewhere/some_file.c
```

4. You also need to add the folder containing the include files of your new project to the APP\_INC variable (do not forget the -I prefix):

```
# User's application include folders (don't forget the '-I' before each entry)

APP_INC = -I . -I ../somewhere/include_stuff_folder
```

5. If you feel like it, you can change the default optimization level:

```
# Compiler effort
EFFORT = -Os
```

## 5.8. Enabling RISC-V CPU Extensions

Whenever you enable a RISC-V compliant CPU extension via the CPU\_EXTENSION\_RISCV\_\* generics, you need to adapt the toolchain configuration so the compiler can actually make use of the extension.

To do so, open the makefile of your project (e.g., sw/example/blink\_led/makefile) and scroll to the "USER CONFIGURATION" section right at the beginning of the file. You need to modify the MARCH and MABI variables according to your CPU hardware configuration.

```
# CPU architecture and ABI
MARCH = -march=rv32i
MABI = -mabi=ilp32
```

The following table shows the different combinations of CPU extensions and the corresponding configuration for the MARCH and MABI variables. Of course you can also just use a subset of the available extensions (e.g. march=rv32im for a rv32imc CPU).

| Enabled CPU Extension(s)                                                  | Toolchain MARCH      | Toolchain MABI    |
|---------------------------------------------------------------------------|----------------------|-------------------|
| none                                                                      | MARCH=-march=rv32i   | MABI=-mabi=ilp32  |
| CPU_EXTENSION_RISCV_C                                                     | MARCH=-march=rv32ic  | MABI=-mabi=ilp32  |
| CPU_EXTENSION_RISCV_M                                                     | MARCH=-march=rv32im  | MABI=-mabi=ilp32  |
| CPU_EXTENSION_RISCV_C<br>CPU_EXTENSION_RISCV_M                            | MARCH=-march=rv32imc | MABI=-mabi=ilp32  |
| CPU_EXTENSION_RISCV_E                                                     | MARCH=-march=rv32e   | MABI=-mabi=ilp32e |
| CPU_EXTENSION_RISCV_E<br>CPU_EXTENSION_RISCV_C                            | MARCH=-march=rv32ec  | MABI=-mabi=ilp32e |
| CPU_EXTENSION_RISCV_E<br>CPU_EXTENSION_RISCV_M                            | MARCH=-march=rv32em  | MABI=-mabi=ilp32e |
| CPU_EXTENSION_RISCV_E<br>CPU_EXTENSION_RISCV_C<br>CPU_EXTENSION_RISCV_M   | MARCH=-march=rv32emc | MABI=-mabi=ilp32e |
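The table follows a simple pattern: the base letter is e or i depending on the embedded mode, followed by m and c if those extensions are enabled, and the ABI is ilp32e exactly when the E extension is used. A minimal C sketch of this mapping is shown below; the helper names `march_for` and `mabi_for` are hypothetical and are not part of the NEORV32 software framework.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Build the -march value (without the "-march=" prefix) for a given
 * combination of the E, M and C generics. Returns the string length
 * as reported by snprintf. Hypothetical helper for illustration. */
int march_for(char *buf, size_t len, bool ext_e, bool ext_m, bool ext_c) {
  return snprintf(buf, len, "rv32%s%s%s",
                  ext_e ? "e" : "i",   /* embedded mode replaces the base "i" */
                  ext_m ? "m" : "",    /* hardware mul/div */
                  ext_c ? "c" : "");   /* compressed instructions */
}

/* The ABI only depends on whether the embedded mode is enabled. */
const char *mabi_for(bool ext_e) {
  return ext_e ? "ilp32e" : "ilp32";
}
```

For a CPU with the M and C extensions and no E extension, `march_for` produces rv32imc and `mabi_for` returns ilp32, matching the fourth row of the table.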



The toolchain always supports the privileged instructions (like ECALL or CSR access instructions) regardless of the MARCH or MABI configuration. However, the compiler will not generate these instructions by itself. Privileged instructions are only used when explicitly coded as inline assembly or when using corresponding libraries. When the CPU\_EXTENSION\_RISCV\_Zicsr extension is not synthesized, all these instructions will behave like NOPs. Instructions returning data will always return zero.



When using the embedded CPU mode (CPU\_EXTENSION\_RISCV\_E generic set true) a C-library, which was compiled for the ilp32e ABI, is required.



The CSR access and the instruction fence extensions (CPU\_EXTENSION\_RISCV\_Zicsr and CPU\_EXTENSION\_RISCV\_Zifencei) are supported by all toolchain configurations and all ABIs and need no further makefile configuration.

### 5.9. Building a Non-Volatile Application (Program Fixed in IMEM)

The purpose of the bootloader is to allow an easy and fast update of the application being currently executed. But maybe at some point your project has become mature and you want to actually embed your processor including the application. Of course you can store the executable to the SPI flash and let the bootloader fetch and execute it at system start. But if you don't have an SPI flash available or you want a really fast start of your application, you can directly implement your executable within the processor-internal instruction memory. When using this approach, the bootloader is no longer required. To have your application permanently reside in the internal instruction memory, follow the upcoming steps.



This works only for the internal instruction memory. Also make sure that the memory components the IMEM is mapped to support an initialization via the bitstream.

1. At first, compile your application code by running the make install command (the memory utilization is not shown again when your code has already been compiled):

```
neorv32/sw/example/blink_led$ make install

Memory utilization:
   text data bss dec hex filename
   852 0 0 852 354 main.elf

Executable (neorv32_exe.bin) size in bytes:
864

Installing application image to ../../../rtl/core/neorv32_application_image.vhd
```

- 2. The install target has created an executable, too, but this time in the form of a VHDL memory initialization file. At synthesis, this initialization will become part of the final FPGA bitstream, which in turn initializes the IMEM's block RAM.
- 3. You need the processor to directly execute the code in the IMEM. Deactivate the implementation of the bootloader via the top entity's generic:

```
BOOTLOADER_USE => false, -- implement processor-internal bootloader?
```

- 4. When the bootloader is deactivated, the corresponding ROM is removed and the CPU will start booting at the base address of the instruction memory space. Thus, the CPU directly executes your application code after reset.
- 5. The IMEM could still be modified, since it is implemented as RAM. This might corrupt your executable. To prevent this and to implement the IMEM as true ROM (and possibly save some more hardware resources), activate the IMEM-as-ROM feature using the processor's top entity generic:

```
MEM_INT_IMEM_ROM => true, -- implement processor-internal instruction memory as ROM
```

6. Perform a synthesis and upload your new bitstream. Your application code resides now unchangeable in the processor's IMEM and is directly executed after reset.

# 5.10. Re-Building the Internal Bootloader

If you have modified any of the configuration parameters of the default bootloader (in sw/bootloader.c), if you have added additional features or if you have implemented your own bootloader, you need to re-compile and re-install the bootloader.

- 1. The NEORV32 default bootloader uses 4kB of boot ROM space. This is also the default boot ROM size. If your new/modified bootloader exceeds this size, you need to modify the boot ROM configurations.
- 2. Open the processor's main package file rtl/core/neorv32\_package.vhd and edit the boot\_size\_c constant according to your requirements. The boot ROM size **must not exceed 32kB** and should be a power of two (for optimal hardware mapping).

```
-- Bootloader ROM --
constant boot_size_c : natural := 4*1024; -- bytes
```

3. Now open the bootloader linker script sw/common/bootloader\_neorv32.ld and adapt the LENGTH parameter of the bootrom according to your new memory size. boot\_size\_c and LENGTH must always be identical.

```
MEMORY
{
  bootrom (rx) : ORIGIN = 0xffff0000, LENGTH = 4*1024
}
```

4. Compile and install the bootloader using the explicit bootloader makefile target. This target uses the bootloader-specific start-up code and linker script instead of the regular application files.

```
neorv32/sw/bootloader$ make bootloader
```

5. Now perform a new synthesis / HDL compilation to update the bitstream with the new bootloader image.
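The two size rules from step 2 (hard limit of 32kB, power of two recommended for optimal hardware mapping) are easy to check before starting a synthesis run. The following C sketch shows one way to do so on the host; the helper names are hypothetical and are not part of the NEORV32 sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper limit for the boot ROM size as stated in the text above. */
enum { BOOT_ROM_MAX_BYTES = 32 * 1024 };

/* Hard requirement: non-zero and not larger than 32kB. */
bool boot_rom_size_allowed(uint32_t bytes) {
  return bytes > 0u && bytes <= (uint32_t)BOOT_ROM_MAX_BYTES;
}

/* Recommendation only: a power of two has exactly one bit set, so
 * clearing the lowest set bit must leave zero. */
bool boot_rom_size_is_power_of_two(uint32_t bytes) {
  return bytes != 0u && (bytes & (bytes - 1u)) == 0u;
}
```

The default of 4\*1024 bytes passes both checks. A value such as 6\*1024 would be allowed but is not a power of two, and 64\*1024 exceeds the limit.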



The bootloader is intended to work regardless of the actual NEORV32 hardware configuration – especially when it comes to CPU extensions. Hence, the bootloader should be built using the minimal rv32i ISA only (rv32e would be even better).



See chapter <u>4.5. Bootloader</u> for more information regarding the bootloader.

# 5.11. Programming the Bootloader SPI Flash

- 1. At first, reset the NEORV32 processor and wait until the bootloader start screen appears in your terminal program.
- 2. Abort the auto boot sequence and start the user console by pressing any key.
- 3. Press u to upload the program image, that you want to store to the external flash:

```
CMD:> u
Awaiting neorv32_exe.bin...
```

4. Send the binary in raw binary mode via your terminal program. When the upload is completed and OK appears, press p to trigger the programming of the flash (do not execute the image via the e command as this might corrupt the image):

```
CMD:> u
Awaiting neorv32_exe.bin... OK
CMD:> p
Write 0x000013FC bytes to SPI flash @ 0x00800000? (y/n)
```

5. The bootloader shows the size of the executable and the base address inside the SPI flash where the executable is going to be stored. A prompt appears: Type y to start the programming or type n to abort.

```
CMD:> u
Awaiting neorv32_exe.bin... OK
CMD:> p
Write 0x000013FC bytes to SPI flash @ 0x00800000? (y/n) y
Flashing... OK
CMD:>
```

6. If OK appears in the terminal line, the programming process was successful. Now you can use the auto boot sequence to automatically boot your application from the flash at system start-up without any user interaction.



See chapter 4.5. Bootloader for more information regarding the bootloader.

### 5.12. Simulating the Processor

The NEORV32 project features a simple testbench (sim/neorv32\_tb.vhd) that can be used to simulate and test the processor and the CPU itself. This testbench features a 100MHz clock and enables all optional peripheral devices and all optional CPU extensions (but not the embedded CPU mode).



Please note that the true-random number generator (TRNG) <u>CANNOT</u> be simulated due to its combinatorial oscillator architecture.

The simulated NEORV32 does not use the bootloader and directly boots the current application image (from the rtl/core/neorv32\_application\_image.vhd image file). Make sure to use the all target of the makefile to **install** your application as VHDL image after compilation:

sw/example/blink\_led\$ make clean\_all all

### **Simulation Console Output**

Data written to the NEORV32 UART transmitter is sent to a virtual UART receiver implemented within the testbench. This receiver uses the default (bootloader) UART configuration. Received chars are sent to the simulator console and are also stored to a file (neorv32.testbench\_uart.out) in the simulator home folder.

### **Faster Simulation Console Output**

When printing data via the UART the communication will always be based on the configured BAUD rate. For a simulation this will take a very long time. To have a faster output you can send data to the DEVNULL device (see chapter 3.10. Dummy Device (DEVNULL)). ASCII data written to this device will be immediately printed to the simulator console. Additionally, the ASCII data is logged in a file (neorv32.devnull.out) in the simulator home folder. All written data is also dumped as 32-bit hexadecimal value into a file called neorv32.devnull.data.out in the simulation home folder.
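To get a feeling for how slow UART output is in simulation, consider the testbench clock of 100MHz and the default UART configuration from section 5.6 (19200 baud, 8 data bits, no parity, 1 stop bit, i.e. 10 bit times per character including the start bit). The C sketch below reproduces this arithmetic; the function names are hypothetical and the 4ms window is simply the example stop time of the GHDL script mentioned later in this section.

```c
#include <stdint.h>

/* Clock cycles needed to shift out one UART frame.
 * baud must be non-zero. Result is rounded down. */
uint64_t uart_cycles_per_frame(uint32_t clk_hz, uint32_t baud,
                               uint32_t bits_per_frame) {
  return ((uint64_t)clk_hz * bits_per_frame) / baud;
}

/* Number of complete frames that fit into a simulation window of
 * sim_us microseconds. Assumes clk_hz is a multiple of 1 MHz. */
uint64_t uart_frames_in_window(uint32_t clk_hz, uint32_t baud,
                               uint32_t bits_per_frame, uint64_t sim_us) {
  uint64_t window_cycles = ((uint64_t)clk_hz / 1000000u) * sim_us;
  return window_cycles / uart_cycles_per_frame(clk_hz, baud, bits_per_frame);
}
```

With these numbers a single character costs about 52083 clock cycles, so only 7 complete characters fit into a 4ms simulation run. This is why the DEVNULL output is the better choice for simulation.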

You can also redirect all data written to the UART transmitter directly to the DEVNULL simulation output. In this case, the UART transmitter is not used at all. To enable this redirection just compile and install your application and <u>add DEVNULL\_UART\_OVERRIDE</u> to the compiler USER\_FLAGS variable (do not forget the -D prefix):

sw/example/blink\_led\$ make USER\_FLAGS+=-DDEVNULL\_UART\_OVERRIDE clean\_all all

#### Xilinx Vivado

The project features a Vivado simulation waveform configuration in sim/vivado.

### **GHDL**

To simulate the processor using GHDL navigate to the sim folder and run the provided shell script. The simulation time can be configured in the script via the --stop-time=4ms argument.

neorv32/sim\$ sh ghdl\_sim.sh

# 5.13. Continuous Integration

This project uses continuous integration provided by <u>Travis CI</u>. The project includes a .travis.yml file for configuring Travis CI. This configuration file uses the continuous integration scripts located in .ci.

What the continuous integration does so far:

- Builds the doxygen-based software documentation and deploys it to GitHub pages
- Downloads, unpacks and installs the <u>pre-built GCC toolchain</u>
- Tests the toolchain
- Compiles all example projects from the sw/example folder
- Compiles the bootloader from the sw/bootloader folder
- Compiles and installs the CPU test code from the sw/bootloader/cpu\_test folder; the generated executable uses the DEVNULL simulation output
- Simulates the processor using its default testbench (sim/neorv32\_tb.vhd) with GHDL
- Searches the DEVNULL output for a reference string; if the string is found, the test was successful

# 6. Troubleshooting

If your setup does not work as expected, go through the following points. Maybe you have missed something during setup. If you are still encountering problems, open a new issue on GitHub or contact me.

- Check the correct installation of the toolchain with a \$ make check.
- Check the synthesis tool for errors and warnings. Double-check the timing report.
- Does the processor simulate correctly?
- Synthesis tools can be a little bit obscure sometimes. Use the default synthesis options and do not start with a target frequency of 800MHz for your first setup.
- If your application does not run, make a clean rebuild: \$ make clean\_all compile
- If it still does not run, enable the debug feature of the NEORV32 runtime environment.
- Make sure your hardware configuration (like memory sizes, CPU extensions,...) is always in sync with the toolchain (linker script configuration, targeted architecture (march), ABI (mabi),...).
- Make sure to have the latest version of the project (do a git pull) or a stable release.
- More to come...
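One of the points above, keeping the CPU extensions in sync with the targeted architecture (march), can be checked mechanically. The sketch below compares a march string with a misa value, using the standard RISC-V encoding in which each single-letter extension occupies bit (letter - 'A') of misa. The function names are hypothetical, only RV32 strings are accepted, and multi-letter extensions (anything after an underscore or starting with z, s or x) are deliberately ignored.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Bit mask of a single-letter extension in misa: 'A' -> bit 0 ... 'Z' -> bit 25. */
static uint32_t misa_bit(char ext) {
    return (uint32_t)1 << (toupper((unsigned char)ext) - 'A');
}

/* Return 1 if every single-letter extension named in march (e.g. "rv32imc")
 * is flagged in misa, else 0. Parsing stops at '_' or at a letter that
 * starts a multi-letter extension name ('z', 's', 'x'). */
int march_matches_misa(const char *march, uint32_t misa) {
    if (strncmp(march, "rv32", 4) != 0) {
        return 0; /* this sketch only handles RV32 targets */
    }
    for (const char *p = march + 4; *p != '\0' && strchr("_zsx", *p) == NULL; ++p) {
        if (!isalpha((unsigned char)*p)) {
            return 0; /* malformed march string */
        }
        if ((misa & misa_bit(*p)) == 0) {
            return 0; /* toolchain targets an extension the CPU does not report */
        }
    }
    return 1;
}
```

For example, a misa value with the I, M and C bits set matches "rv32imc" but not "rv32imac", because the A bit (bit 0) is missing. On the target, the misa value would come from reading the CSR; here it is passed in as a plain argument so the check can be tried on a host machine.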

# 7. Change Log

| Date (DD.MM.YYYY) | HW version | Modifications                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|-------------------|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 23.06.2020        | 0.0.2.3    | Publication                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| 25.06.2020        | 0.0.2.5    | Added DEVNULL device; added chapter regarding processor simulation; fixed/added links; fixed typos; added FPGA implementation results for iCE40 UP                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| 05.07.2020        | 1.0.0.0    | New CPU architecture: Fetch and execute engines; increased CPI; timer and counter CSRs are now all 64-bit wide; fixed CSR access errors; fixed C.LW decompression logic; misa flags C and M are now r/w – compressed mode and multiplier/divider support can be switched on/off during runtime; PC(0) is now always zero; fixed bug in multiplier/divider co-processor; renamed SPI signals; added RISC-V compliance check information – processor now passes the official RISC-V compliance tests                                                                                                           |
| 06.07.2020        | 1.0.0.1    | Removed instret CSR since it is not yet ratified by the RISC-V spec.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| 06.07.2020        | 1.0.1.0    | Added missing fence instruction; added new generic to enable optional Zifencei CPU extension for instruction stream synchronization                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| 09.07.2020        | 1.0.5.0    | X flag of misa CSR is zero now; the default SPI flash boot address of the bootloader is now 0x0080000; new exemplary FPGA utilization results for Intel, Lattice and Xilinx; misa CSR is read-only again, switching compressed extension on/off is pretty bad for the fetch engine; mtval and mcause CSRs now allow write accesses and are finally RISC-V-compliant; time low and high registers of MTIME peripheral can now also be written by user; MTIME registers only allow full-word write accesses                                                                                                    |
| 10.07.2020        | 1.0.6.0    | Non-taken branches are now 1 cycle faster; the time[h] CSR now correctly reflects the system time from the MTIME unit; fixed WFI instruction permanently stalling the CPU; [m]cycle[h] counters now stop counting when CPU is in sleep mode; minstret[h] and mcycle[h] now also allow write access |
| 14.07.2020        | 1.1.0.0    | Added fence_o and fencei_o signals to top entity to show if a fence or fencei instruction is executed; added mvendorid and marchid CSRs (both are always zero); ALU shift unit is faster now; two lowest bits of mtvec are always zero; fixed wrong instruction exception priority; removed HART_ID generic – mhartid CSR is always read as zero; performance counters ([m]cycle[h], [m]instret[h] and time[h]) are also available in embedded mode – but can be explicitly disabled via the CSR_COUNTERS_USE generic; mcause CSR only allows write access to bit 31 and bits 3:0; updated synthesis reports |
| 19.07.2020        | 1.2.0.0    | CPU bus unit now has independent busses for instruction fetch and data access – merged into single processor bus via new bus switch unit; doubled speed of ALU shifter unit again; all bits of mcause CSR can now be modified by application program (full RISC-V-compliant); performance counters CSRs [m]cycleh and [m]instreth are only 20-bit wide; removed NEORV32-specific custom CSRs – all processor-related information can be obtained from the new SYSINFO IO module (CPU is now more independent from processor configuration); changed IO address of DEVNULL; fixed bug in bootloader's trap handler; added USER_CODE generic to assign a custom user code that can be read by software (from SYSINFO) |
| 20.07.2020        | 1.2.0.5    | Less penalty for taken branches and jumps (2 cycles faster) |
| 21.07.2020        | 1.2.0.6    | Added section regarding the CPU's data and instruction interfaces; optimized CPU fetch engine; updated iCE40 synthesis results |