# The NEO430 Processor

by Dipl.-Ing. Stephan Nolting

A small, powerful and highly customizable open-source 16-bit soft-core microcontroller, compatible with TI's MSP430 ISA

Processor Hardware Version: 0x0184





## **Proprietary Notice**

- "MSP430" is a trademark of Texas Instruments Corporation.
- "ISE", "Vivado", "ISIM", "Artix" and "Virtex" are trademarks of Xilinx Inc.
- "AXI" and "AXI-Lite" are trademarks of Arm Holdings plc.
- "ModelSim" is a trademark of Mentor Graphics.
- "Quartus", "Cyclone" and "Avalon Bus" are trademarks of Intel Corporation.
- "Lattice Diamond" and "MachXO2" are trademarks of Lattice Semiconductor Corporation.
- "Windows" is a trademark of Microsoft Corporation.
- "Cygwin" © by Red Hat Inc.
- "Tera Term" © by T. Teranishi.

This document was created using LibreOffice.

#### License / Disclaimer

This file is part of the NEO430 Project by Stephan Nolting.

This source file may be used and distributed without restriction provided that this copyright statement is not removed from the file and that any derivative work contains the original copyright notice and the associated disclaimer.

This source file is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This source is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this source; if not, download it from https://www.gnu.org/licenses/lgpl-3.0.en.html.

## The **NEO430 Project**

© Dipl.-Ing. Stephan Nolting, Hannover, Germany

For any kind of feedback, feel free to drop me a line: stnolting@gmail.com

The most recent version of the NEO430 project and the corresponding documentation can be found at https://github.com/stnolting/neo430

This project is published under the GNU Lesser General Public License (LGPL)

June 22, 2018

## **Table of Contents**

- 1. Introduction
  - 1.1. Processor Features
  - 1.2. Main Differences to TI's Original MSP430 Architecture
  - 1.3. Project Folder Structure
  - 1.4. Processor VHDL File Hierarchy
  - 1.5. Processor Top Entity – Signals
  - 1.6. Processor Top Entity – Configuration Generics
  - 1.7. Alternative Top Entities
  - 1.8. FPGA Implementation Results
    - 1.8.1. Full Implementation (Default)
    - 1.8.2. Minimal Configuration
    - 1.8.3. Resource Utilization by Entity
- 2. Hardware Architecture
  - 2.1. NEO430 CPU
    - 2.1.1. Status Register
    - 2.1.2. Interrupts
    - 2.1.3. Instruction Set
    - 2.1.4. Instruction Timing
    - 2.1.5. System Bus
  - 2.2. Internal Instruction Memory (IMEM)
  - 2.3. Internal Data Memory (DMEM)
  - 2.4. Boot ROM
  - 2.5. Wishbone Bus Interface Adapter (WB32)
  - 2.6. General Purpose IO (GPIO)
  - 2.7. USART / USI
    - 2.7.1. Universal Asynchronous Receiver/Transmitter (UART)
    - 2.7.2. Serial Peripheral Interface (SPI)
  - 2.8. High-Precision Timer (TIMER)
  - 2.9. Watchdog Timer (WDT)
  - 2.10. System Configuration Module (SYSCONFIG)
  - 2.11. Multiplier and Divider Unit (MULDIV)
  - 2.12. Cyclic Redundancy Checksum Computation Unit (CRC)
  - 2.13. Custom Functions Unit (CFU)
  - 2.14. Pulse Width Modulation Controller (PWM)
  - 2.15. True Random Number Generator (TRNG)
- 3. Software Architecture
  - 3.1. Executable Program Image
    - 3.1.1. Image Sections
    - 3.1.2. Dynamic Memory
    - 3.1.3. Application Start-Up Code
    - 3.1.4. Executable Image Formats
  - 3.2. Internal Bootloader
    - 3.2.1. Auto Boot Sequence
    - 3.2.2. Error Codes
- 4. Let's Get It Started!
  - 4.1. General Hardware Setup
  - 4.2. General Software Setup
  - 4.3. Application Program Compilation using Windows CMD Batch File
  - 4.4. Application Program Compilation using Cygwin/Linux Makefile
  - 4.5. Uploading and Starting of a Binary Executable Image via UART
  - 4.6. Programming an External SPI Boot EEPROM
  - 4.7. Setup of a New Application Program Project
  - 4.8. Simulating the Processor
  - 4.9. Changing the Compiler's Optimization Goal
  - 4.10. Re-Building the Internal Bootloader
  - 4.11. Building a Non-Volatile Application (Program Fixed in IMEM)
  - 4.12. Alternative Top Entities / Avalon Bus / AXI4 Lite Connectivity
  - 4.13. Troubleshooting
- 5. Change Log

## 1. Introduction



Figure 1: NEO430 processor block diagram, optional modules are marked using dashed lines

#### Welcome to the **NEO430 Processor** project!

Do you need a small but still powerful, customizable, microcontroller-like processor system for your next FPGA design? Then the NEO430 is the perfect choice for you!

This processor is based on the Texas Instruments MSP430 ISA and provides 100% compatibility with the original instruction set. The NEO430 is not an MSP430 clone; it is a completely new implementation built from the ground up. The processor has a very small footprint and already implements standard features like a timer, a watchdog, UART and SPI serial interfaces, general purpose IO ports, an internal bootloader and of course internal memory for program code and data. All of the peripheral modules are optional, so if you do not need them, you can exclude them from the implementation to reduce the size of the system. Additional modules for a more customized system can be connected via a Wishbone-compatible bus interface or added to the processor core as a custom functions unit. This way, you can build a system that perfectly fits your needs.

High-level software development is based on the free TI msp430-gcc compiler tool chain. You can use either Windows or Linux/Cygwin as the build environment for your applications; the project comes with build scripts for both worlds. The example folder of this project contains several demo programs from which you can start creating your own NEO430 applications.

This project is intended to work "out of the box". Just synthesize the test setup from this project, upload it to your FPGA board of choice and start exploring the capabilities of the NEO430 processor. Application program generation (and even installation) works by executing a single "make" command. Jump to the "Let's Get It Started!" chapter, which provides guides and tutorials to get your first NEO430 setup running.

## 1.1. Processor Features

- ✓ 16-bit open source soft-core microcontroller-like processor system
- ✓ Code-efficient RISC-like instruction amount with powerful CISC-like addressing capabilities
- ✓ Full support of the original MSP430 instruction set architecture
- ✓ Tool chain based on free TI msp430-gcc compiler
- ✓ Application compilation scripts for Windows and Linux/Cygwin
- ✓ Completely described in behavioral, platform-independent VHDL
- ✓ Fully synchronous design, no latches, no gated clocks
- ✓ Very small footprint and high operating frequency (~200 MHz¹) compared to other implementations ;)
- ✓ Internal DMEM (RAM, for data) and IMEM (RAM or ROM, for code), configurable sizes
- ✓ One external interrupt line with acknowledge signal
- ✓ Customizable processor hardware configuration
- ✓ Optional multiplier and divider unit (MULDIV)
- ✓ Optional high-precision timer (TIMER)
- ✓ Optional USART interface; UART and SPI (USART)
- ✓ Optional general purpose parallel IO port (GPIO), 16 inputs, 16 outputs, with pin-change interrupt
- ✓ Optional 32-bit Wishbone bus interface adapter (WB32) including bridges to Avalon and AXI-Lite
- ✓ Optional watchdog timer (WDT)
- ✓ Optional custom functions unit (CFU)
- ✓ Optional cyclic redundancy checksum computation unit (CRC16/32)
- ✓ Optional 3-channel PWM controller with 8-bit resolution (PWM)
- ✓ Optional true random number generator (TRNG)
- ✓ Optional internal bootloader (2kB ROM):
  - Upload new application image via UART, program external SPI EEPROM, boot from external SPI EEPROM
  - Memory hex viewer
  - Automatic boot sequence, no user input required

¹ Xilinx Virtex-6, speed-optimized implementation, minimal processor configuration


## 1.2. Main Differences to TI's Original MSP430 Architecture

Since the NEO430 is not intended as an MSP430 processor clone, there are several differences to TI's original product lines. The *main* differences are:

- ✗ Completely different processor modules with different functionality
- ✗ No "compiler support" for the hardware multiplier/divider (the multiplier/divider unit can be used via specific C functions from the NEO430 library)
- ✗ Maximum of 32kB instruction memory and 28kB of data memory
- ✗ Specific memory map: the NEO430 tool chain (makefiles, boot code and linker script) is required
- ✗ Custom binary executable format
- ✗ Just 4 CPU interrupt channels (instead of 16+)
- ✗ Single clock domain for the complete processor
- ✗ Different numbers of instruction execution cycles
- ✗ Only one power-down (sleep) mode
- ✗ Internal bootloader with user interface (via UART serial port)

## 1.3. Project Folder Structure





Do not change the project's folder structure unless you really know what you are doing. Changing the structure might corrupt the file dependencies.

## 1.4. Processor VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's rtl/core folder. The top entity of the entire processor including all the required configuration generics is neo430\_top.vhd. Make sure to add all files to your project and assign them to a library called "neo430".

| File                         | Description |
|------------------------------|-------------|
| neo430_top.vhd               | Processor core top entity |
| neo430_boot_rom.vhd          | Bootloader ROM |
| neo430_bootloader_image.vhd  | Boot ROM initialization image for the bootloader. During synthesis, the image is copied to the boot ROM. This file is automatically generated and copied when compiling the bootloader's sources |
| neo430_cfu.vhd               | Custom functions unit for user-defined extensions |
| neo430_crc.vhd               | Checksum computation unit (CRC16/32) |
| neo430_dmem.vhd              | DMEM: Internal RAM for storing data |
| neo430_gpio.vhd              | General purpose parallel IO port |
| neo430_imem.vhd              | IMEM: Internal RAM/ROM for the application code |
| neo430_application_image.vhd | IMEM initialization image. During synthesis, the initialization image is copied to the IMEM. This file is automatically generated and copied when compiling an application |
| neo430_muldiv.vhd            | Multiplier and divider unit |
| neo430_package.vhd           | Processor VHDL package file |
| neo430_pwm.vhd               | PWM controller |
| neo430_sysconfig.vhd         | IRQ vector configuration and system information |
| neo430_timer.vhd             | High-precision timer |
| neo430_trng.vhd              | True random number generator |
| neo430_usart.vhd             | USART serial transceiver (SPI and UART) |
| neo430_wb_interface.vhd      | Wishbone bus adapter |
| neo430_wdt.vhd               | Watchdog timer |
| neo430_cpu.vhd               | CPU's top entity |
| neo430_addr_gen.vhd          | Address generator unit |
| neo430_alu.vhd               | Arithmetic/logic unit |
| neo430_control.vhd           | CPU control finite state machine |
| neo430_reg_file.vhd          | Register file |

## 1.5. Processor Top Entity – Signals

The following table shows all interface ports of the processor top entity (neo430\_top.vhd). The type of all signals is std\_ulogic or std\_ulogic\_vector, respectively.

| Signal name    | Width | Direction | Function                                                   | Module     |
|----------------|-------|-----------|------------------------------------------------------------|------------|
| **Global Control** |   |           |                                                            |            |
| clk_i          | 1     | Input     | Global clock line, all registers triggering on rising edge | all        |
| rst_i          | 1     | Input     | Global reset, asynchronous, low-active                     | all        |
| **General Purpose Inputs & Outputs** | | |                                                   |            |
| gpio_o         | 16    | Output    | General purpose parallel output ²                          | GPIO       |
| gpio_i         | 16    | Input     | General purpose parallel input                             | GPIO       |
| **PWM Channels** |     |           |                                                            |            |
| pwm_o          | 3     | Output    | Pulse Width Modulation channels                            | PWM        |
| **Serial Communication** | |        |                                                            |            |
| uart_txd_o     | 1     | Output    | UART serial transmitter                                    | USART.UART |
| uart_rxd_i     | 1     | Input     | UART serial receiver                                       | USART.UART |
| spi_sclk_o     | 1     | Output    | SPI master clock line                                      | USART.SPI  |
| spi_mosi_o     | 1     | Output    | SPI serial data output                                     | USART.SPI  |
| spi_miso_i     | 1     | Input     | SPI serial data input                                      | USART.SPI  |
| spi_cs_o       | 6     | Output    | SPI intrinsic chip select lines 0..5 ³                     | USART.SPI  |
| **Wishbone Bus** |     |           |                                                            |            |
| wb_adr_o       | 32    | Output    | Slave address                                              | WISHBONE   |
| wb_dat_i       | 32    | Input     | Read data                                                  | WISHBONE   |
| wb_dat_o       | 32    | Output    | Write data                                                 | WISHBONE   |
| wb_we_o        | 1     | Output    | Write enable ('0' = read transfer)                         | WISHBONE   |
| wb_sel_o       | 4     | Output    | Byte enable                                                | WISHBONE   |
| wb_stb_o       | 1     | Output    | Strobe                                                     | WISHBONE   |
| wb_cyc_o       | 1     | Output    | Valid cycle                                                | WISHBONE   |
| wb_ack_i       | 1     | Input     | Transfer acknowledge                                       | WISHBONE   |
| **External Interrupt Request** | | |                                                           |            |
| irq_i          | 1     | Input     | Interrupt request signal, high-active, single-shot         | CPU        |
| irq_ack_o      | 1     | Output    | Interrupt request acknowledge, single-shot                 | CPU        |

Table 1: neo430\_top.vhd – processor's top entity interface ports



Of course, you can instantiate the processor top entity (neo430\_top.vhd) within another entity (your actual system's top entity) and connect only the signals that you really need to the outside world. Set all unused input signals to logical zero and leave all unused outputs 'open'.

² Bit #0 is used by the bootloader to drive a high-active status LED.

³ Chip select #0 is used by the bootloader to access the boot SPI EEPROM.

## 1.6. Processor Top Entity – Configuration Generics

The following table shows the configuration generics of the processor top entity (neo430\_top.vhd).

| Generic name              | Type              | Default   | Function                                                  |
|---------------------------|-------------------|-----------|-----------------------------------------------------------|
| **General Configuration** |                   |           |                                                           |
| CLOCK_SPEED               | natural           | 100000000 | Clock speed of signal clk_i in Hz (Hertz)                 |
| IMEM_SIZE                 | natural           | 4*1024    | Size of internal instruction memory (max 32kB)            |
| DMEM_SIZE                 | natural           | 2*1024    | Size of internal data memory (max 28kB)                   |
| **Additional Configuration** |                |           |                                                           |
| USER_CODE                 | std_ulogic_vector | x"0000"   | Custom user code, can be checked by application software  |
| **Module Configuration**  |                   |           |                                                           |
| DADD_USE                  | boolean           | true      | Implement decimal addition (DADD) CPU instruction         |
| MULDIV_USE                | boolean           | true      | Implement multiplier/divider unit                         |
| WB32_USE                  | boolean           | true      | Implement Wishbone interface adapter                      |
| WDT_USE                   | boolean           | true      | Implement watchdog timer                                  |
| GPIO_USE                  | boolean           | true      | Implement parallel GPIO port                              |
| TIMER_USE                 | boolean           | true      | Implement high-precision timer                            |
| USART_USE                 | boolean           | true      | Implement UART/SPI serial communication unit              |
| CRC_USE                   | boolean           | true      | Implement checksum computation unit                       |
| CFU_USE                   | boolean           | false     | Implement custom functions unit                           |
| PWM_USE                   | boolean           | true      | Implement pulse width modulation controller               |
| TRNG_USE                  | boolean           | false     | Implement true random number generator                    |
| **Boot Configuration**    |                   |           |                                                           |
| BOOTLD_USE                | boolean           | true      | Implement and use internal bootloader                     |
| IMEM_AS_ROM               | boolean           | false     | Implement internal instruction memory as read-only memory |
Table 2: neo430\_top.vhd – processor's top entity configuration generics

## 1.7. Alternative Top Entities

Besides the actual top entity of the processor (rtl/core/neo430\_top.vhd), there are several other entities that can be used instead. These alternative top entities are located in rtl/top\_templates:

| Top entity file          | Description |
|--------------------------|-------------|
| neo430_test.vhd          | This setup is meant as a "hello world" test project for your first contact with the NEO430 processor. It is used in the "Let's Get It Started!" tutorial. |
| neo430_top_avm.vhd       | This top entity features an Avalon master interface, which is generated by converting the processor's Wishbone bus. All signals are **std_logic**. |
| neo430_top_std_logic.vhd | Same as the original top entity, but using only **std_logic** signal types. |
| neo430_top_axi_lite.vhd  | This top entity features an AXI-Lite-compatible master interface, which is generated by converting the processor's Wishbone bus. All signals are **std_logic**. |

Table 3: Alternative top entities

## 1.8. FPGA Implementation Results

This chapter shows example implementation results of the NEO430 processor for different FPGA platforms, EDA tool chains and configurations.

## 1.8.1. Full Implementation (Default)

#### General information:

- Hardware version: 0x0180
- Top entity file: neo430\_top.vhd

#### Configuration generics:

| Generic     | Value  |
|-------------|--------|
| IMEM_SIZE   | 4*1024 |
| DMEM_SIZE   | 2*1024 |
| DADD_USE    | true   |
| MULDIV_USE  | true   |
| WB32_USE    | true   |
| WDT_USE     | true   |
| GPIO_USE    | true   |
| TIMER_USE   | true   |
| USART_USE   | true   |
| CRC_USE     | true   |
| CFU_USE     | false  |
| PWM_USE     | true   |
| TRNG_USE    | false  |
| BOOTLD_USE  | true   |
| IMEM_AS_ROM | false  |

#### FPGA tools and settings:

- Intel Quartus Prime Lite 17.0 ("balanced implementation")
- Xilinx ISE 14.7 ("speed-optimized"), synthesis results

Utilization for the default full configuration:

| Resource                       | Altera Cyclone IV EP4CE22F17C6N | Xilinx Virtex-6 XC6VLX240T-2FF1156 |
|--------------------------------|---------------------------------|------------------------------------|
| LUTs/LEs                       | 1588 / 22320 = 7%               | 1074 / 150720 < 1%                 |
| FFs/Registers                  | 868 / 22320 = 3%                | 825 / 301440 < 1%                  |
| Total memory bits / Block RAMs | 65792 / 608256 = 11%            | 5 / 832 < 1%                       |
| DSP blocks                     | 0                               | 0                                  |
| Maximum frequency              | 115 MHz (slow 1200mV 0°C model) | 156 MHz (synthesis result)         |

Table 4: Hardware utilization

## 1.8.2. Minimal Configuration

This is the minimal configuration of the NEO430 processor that is still able to do "useful" work.

#### General information:

- Hardware version: 0x0180
- Top entity file: neo430\_top.vhd

#### Configuration generics:

| Generic     | Value  |
|-------------|--------|
| IMEM_SIZE   | 4*1024 |
| DMEM_SIZE   | 2*1024 |
| DADD_USE    | false  |
| MULDIV_USE  | false  |
| WB32_USE    | false  |
| WDT_USE     | false  |
| GPIO_USE    | true   |
| TIMER_USE   | false  |
| USART_USE   | false  |
| CRC_USE     | false  |
| CFU_USE     | false  |
| PWM_USE     | false  |
| TRNG_USE    | false  |
| BOOTLD_USE  | false  |
| IMEM_AS_ROM | true   |

#### FPGA tools and settings:

- Intel Quartus Prime Lite 17.0 ("balanced implementation")
- Xilinx ISE 14.7 ("speed-optimized"), synthesis results

Utilization for the minimal configuration:

| Resource                       | Altera Cyclone IV EP4CE22F17C6N | Xilinx Virtex-6 XC6VLX240T-2FF1156 |
|--------------------------------|---------------------------------|------------------------------------|
| LUTs/LEs                       | 614 / 22320 = 3%                | 402 / 150720 < 1%                  |
| FFs/Registers                  | 232 / 22320 = 1%                | 241 / 301440 < 1%                  |
| Total memory bits / Block RAMs | 49408 / 608256 = 8%             | 4 / 832 < 1%                       |
| DSP blocks                     | 0                               | 0                                  |
| Maximum frequency              | 120 MHz (slow 1200mV 0°C model) | 204 MHz (synthesis result)         |

Table 5: Hardware utilization

## 1.8.3. Resource Utilization by Entity

This table shows the required resources for each entity of the processor system. Logic functions of different modules might be merged across entity boundaries, so the totals might vary a bit.

#### General information:

- Hardware version: 0x0180
- Top entity file: neo430\_top.vhd

#### Configuration generics:

| Generic     | Value  |
|-------------|--------|
| IMEM_SIZE   | 4*1024 |
| DMEM_SIZE   | 2*1024 |
| DADD_USE    | true   |
| MULDIV_USE  | true   |
| WB32_USE    | true   |
| WDT_USE     | true   |
| GPIO_USE    | true   |
| TIMER_USE   | true   |
| USART_USE   | true   |
| CRC_USE     | true   |
| CFU_USE     | false  |
| PWM_USE     | true   |
| TRNG_USE    | true   |
| BOOTLD_USE  | true   |
| IMEM_AS_ROM | false  |

#### FPGA tools and settings:

- Intel/Altera Quartus Prime Lite 17.0 ("balanced implementation")

Altera Cyclone IV EP4CE22F17C6N:

| Entity/Module                   | Function                          | LEs | FFs | MEM bits | DSPs |
|---------------------------------|-----------------------------------|-----|-----|----------|------|
| CPU                             | Central processing unit           | 614 | 188 | 256      | 0    |
| IMEM (4kB)                      | Instruction memory (RAM)          | 6   | 1   | 32768    | 0    |
| DMEM (2kB)                      | Data memory (RAM)                 | 6   | 1   | 16384    | 0    |
| Boot ROM (2kB)                  | Bootloader ROM                    | 3   | 1   | 16384    | 0    |
| SYSCONFIG                       | System information                | 14  | 12  | 0        | 0    |
| GPIO                            | GPIO parallel in/out ports        | 51  | 45  | 0        | 0    |
| MULDIV                          | Multiplier/divider unit           | 186 | 131 | 0        | 0    |
| WDT                             | Watchdog timer                    | 46  | 34  | 0        | 0    |
| TIMER                           | High-precision timer              | 86  | 55  | 0        | 0    |
| USART                           | Serial interfaces (UART + SPI)    | 185 | 121 | 0        | 0    |
| WB32                            | Wishbone bus interface            | 147 | 117 | 0        | 0    |
| CRC                             | Cyclic redundancy checksum unit   | 113 | 94  | 0        | 0    |
| CFU                             | Custom Functions Unit             | -   | -   | -        | -    |
| PWM                             | Pulse Width Modulation Controller | 43  | 38  | 0        | 0    |
| TRNG                            | True Random Number Generator      | 44  | 31  | 0        | 0    |

Table 6: Hardware utilization by entity

#### 2. Hardware Architecture

The NEO430 processor system is constructed from several different modules. This chapter takes a closer look at these modules and their specific functionality.

#### **Address Space**

Although the NEO430 is fully compatible with the original TI MSP430 instruction set architecture, the implemented modules are completely different from the original design. Hence, the provided modules and the resulting address space layout are completely new. The figure below shows the general layout of the 16-bit address space.



Figure 2: General NEO430 address space layout

In general, the address space is separated into four groups: The instruction memory (IMEM) is located at the beginning of the address space. This memory (which can be implemented as RAM or ROM) stores the instructions of the actual application. The data memory (DMEM) starts in the middle of the address space. This memory stores global variables, the stack and the heap. Additionally, the interrupt vectors are located at the beginning of the DMEM. The custom NEO430 linker script ensures that these locations are not used by the application program. The bootloader ROM is located towards the end of the address space. This optional memory contains the image of the interactive bootloader. Finally, all the IO devices (timer, USART, GPIO, ...) are located at the very end of the address space.

## Peripheral/IO Devices

In contrast to the original MSP430, the NEO430 does not have any special function registers at the beginning of the memory space. Instead, all 'special functions' (like peripheral/IO devices and interrupt enable configurations) are located inside the according hardware units. These units (devices) are located at the end of the memory space in the so-called *IO region*. This region is 128 bytes large. A special linker script as well as dedicated NEO430 include and definition files abstract the specific memory layout for the user.
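The layout described above can be sketched as a small address classifier. This is only an illustration: the base addresses are the ones given in this document (IMEM at 0x0000, DMEM at 0x8000, boot ROM at 0xF000), the IO base 0xFF80 is derived from "128 bytes at the end of the address space", and all macro, type and function names are hypothetical. The sketch classifies by address window only and ignores the configured memory sizes.

```c
#include <stdint.h>

/* Region base addresses as stated in this document; names are illustrative. */
#define IMEM_BASE     0x0000u  /* instruction memory window (up to 32kB) */
#define DMEM_BASE     0x8000u  /* data memory window (up to 28kB) */
#define BOOT_ROM_BASE 0xF000u  /* bootloader ROM window */
#define IO_BASE       0xFF80u  /* derived: last 128 bytes of the 64kB space */

typedef enum {
    REGION_IMEM,
    REGION_DMEM,
    REGION_BOOT_ROM,
    REGION_IO
} region_t;

/* Classify a 16-bit address by the window it falls into.
   The implemented memory inside a window may be smaller. */
static region_t neo430_region(uint16_t addr)
{
    if (addr >= IO_BASE)       return REGION_IO;
    if (addr >= BOOT_ROM_BASE) return REGION_BOOT_ROM;
    if (addr >= DMEM_BASE)     return REGION_DMEM;
    return REGION_IMEM;
}
```

For example, the Wishbone control register at 0xFF90 falls into the IO region, while the interrupt vectors at 0x8000 fall into the DMEM window.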

## Separated Instruction (IMEM) and Data (DMEM) Memories

Just like the original MSP430, the NEO430 uses separate memories for storing data and instructions. The DMEM is implemented as normal RAM. The IMEM is also implemented as RAM, but it can only be written when a special bit in the CPU's status register is set. Normally, this bit is only set by the bootloader for transferring an image from an external flash/EEPROM or via the UART into the IMEM. Alternatively, the IMEM can be implemented as true ROM (via the IMEM\_AS\_ROM generic). In this case, the actual executable application is included during the synthesis process and persists as non-volatile image in the IMEM. Thus, a bootloader ROM is no longer necessary (except perhaps during development).

## **Word and Byte Accesses**

All internal memories (IMEM, DMEM, bootloader ROM) can be accessed in byte and word mode. Most of the peripheral devices in the IO region can only be accessed using full-word mode.

#### **Internal Reset Generator**

All processor-internal modules – except for the CPU – do not require a dedicated reset signal. However, to ensure correct operation, all devices are reset by software by clearing the corresponding control register before using the unit. The automatically included application start-up code will perform this minimal-required system initialization. The hardware reset signal of the processor can either be triggered via the external reset pin (asynchronous & LOW-active) or by the internal watchdog timer (if implemented).

#### **Internal Clock Generator**

An internal clock divider generates 8 signals derived from the main clock input. These derived signals are not actual *clock signals*. Instead, they are generated by a simple counter and are used as "clock enable" by the different modules. Thus, the whole design operates using only the main clock. Some of the processor modules (like the timer or the USART) can select one of the derived clock enable signals for their internal operation. If none of the connected modules requires an active clock, the clock divider is automatically deactivated to save dynamic power.
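The clock-enable idea can be illustrated with a small behavioral model. This is a sketch under assumptions, not the actual VHDL: it assumes a free-running counter in which enable signal i pulses for one main clock cycle whenever counter bit i rises (power-of-two dividers), and that the counter simply holds its value while no module requests a clock. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the clock divider state. */
typedef struct {
    uint16_t counter;   /* prescaler counter, advanced with the main clock */
} clkgen_t;

/* Advance the model by one main clock cycle.
   Returns the 8 clock-enable bits; each bit is high for exactly one cycle
   when the corresponding counter bit changes from 0 to 1. */
static uint8_t clkgen_tick(clkgen_t *g, int clock_requested)
{
    assert(g != 0);
    if (!clock_requested) {
        return 0;       /* assumption: divider idle, counter holds its value */
    }
    uint16_t prev = g->counter;
    g->counter = (uint16_t)(prev + 1u);
    /* select the low 8 bits that were 0 before and are 1 now */
    return (uint8_t)(~prev & g->counter & 0xFFu);
}
```

In this model, a module using enable bit 0 is clocked every second main clock cycle, bit 1 every fourth cycle, and so on. All modules still share the single main clock.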

#### 2.1. NEO430 CPU

The CPU is the heart of the NEO430 processor. It implements all the instructions, emulated instructions and addressing modes of the original TI MSP430 instruction set architecture (ISA). There are small differences to the original architecture when it comes to instruction execution cycles, status register bits, power down modes and interrupts.

#### **Data and Control Path**

Instruction execution is conducted by performing several tiny steps, so-called *micro operations*. Thus, the NEO430 is a multi-cycle architecture: The CPU needs several consecutive cycles to complete a single instruction. An accurate listing of the required processing cycles for each instruction is given in the following chapter. The execution of the micro operations is controlled by the central control arbiter, which implements a complex finite state machine (FSM). This FSM generates the control signals for the data path, which processes the data. The data path is constructed from the register file, the primary data ALU and the address generator unit. The image below shows the simplified architecture of this data path.



Figure 3: Data path of the NEO430 CPU

## 2.1.1. Status Register

The status register (SR = R2) represents the ALU execution status flags and CPU control flags. The carry C, zero Z, negative N and the overflow V flags correspond to the result of the last ALU operation.

Via the I flag, interrupts can be globally activated or deactivated. If this flag is cleared, all further interrupt requests to the CPU are queued and finally executed when the I flag is set again. When setting the Q flag, all pending interrupts in the CPU's interrupt request queue are deleted. This flag is write-only (read as zero) and automatically clears after being set.

The S flag is used to bring the CPU into power-down (sleep) mode. When this flag is set, the CPU is completely deactivated while all processor modules – like the timer – keep operating. An interrupt request from any IRQ channel will reactivate the CPU and clear the S flag again.

The **R** flag is used to control write access to the internal instruction memory (IMEM). When set, the IMEM behaves as a RAM, otherwise the IMEM behaves like a true read-only memory. If the IMEM is implemented as true ROM (IMEM\_AS\_ROM generic), the state of this flag is irrelevant and no write accesses to the IMEM are possible at all.

All other bits of the status register do not have a specific function yet. Hence, they are reserved for future use and should not be used. However, they are always read as zero.



Figure 4: Processor status register

| Bit#  | Name   | R/W | Function                                 |
|-------|--------|-----|------------------------------------------|
| 0     | C_FLAG | R/W | Carry flag                               |
| 1     | Z_FLAG | R/W | Zero flag                                |
| 2     | N_FLAG | R/W | Negative flag                            |
| 3     | I_FLAG | R/W | Global interrupt enable                  |
| 4     | S_FLAG | R/W | Sleep mode (CPU off)                     |
| 5..7  | -      | R/- | Reserved, read as 0                      |
| 8     | V_FLAG | R/W | Overflow flag                            |
| 9..13 | -      | R/- | Reserved, read as 0                      |
| 14    | Q_FLAG | -/W | Clears pending interrupt buffer when set |
| 15    | R_FLAG | R/W | Allow write access to IMEM "ROM"         |

Table 7: Bits of the status register
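The bit positions from Table 7 can be written down as C masks. The following is a minimal sketch; the SR\_ macro names and the sr\_readback helper are illustrative and are not taken from the NEO430 header files. The helper only models the read-back rule stated above (reserved bits and the Q flag read as zero); it does not model side effects such as entering sleep mode.

```c
#include <stdint.h>

/* Bit masks following Table 7; the macro names are illustrative. */
#define SR_C_FLAG (1u << 0)    /* carry */
#define SR_Z_FLAG (1u << 1)    /* zero */
#define SR_N_FLAG (1u << 2)    /* negative */
#define SR_I_FLAG (1u << 3)    /* global interrupt enable */
#define SR_S_FLAG (1u << 4)    /* sleep mode */
#define SR_V_FLAG (1u << 8)    /* overflow */
#define SR_Q_FLAG (1u << 14)   /* clear pending interrupts (write-only) */
#define SR_R_FLAG (1u << 15)   /* allow write access to IMEM */

/* Bits that can be read back as 1: everything except reserved bits and Q. */
#define SR_READ_MASK (SR_C_FLAG | SR_Z_FLAG | SR_N_FLAG | SR_I_FLAG | \
                      SR_S_FLAG | SR_V_FLAG | SR_R_FLAG)

/* Model of the value a read returns after 'value' was written to the SR. */
static uint16_t sr_readback(uint16_t value)
{
    return (uint16_t)(value & SR_READ_MASK);
}
```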

## 2.1.2. Interrupts

The NEO430 features 4 independent interrupts via 4 interrupt request signals. When triggered, each of these requests starts a unique interrupt handler. The interrupt sources are the TIMER interrupt (threshold match), the USART interrupt (SPI transmission done and/or UART RX/TX complete), the GPIO pin-change interrupt and the external interrupt request signal. The base addresses of these handlers have to be stored in advance in the interrupt vector configuration table, which is located at the beginning of the DMEM. The application linker script ensures that these locations are not used by the actual program.

## **Operation**

All interrupts can be globally disabled by clearing the I flag in the processor's status register. An interrupt can only trigger when the I flag is set and the corresponding enable flag of the interrupt source is activated inside the according source's control register.

If an interrupt is triggered, the according handler is executed and the interrupt request is deleted from the queue as soon as the handler starts executing. If the same interrupt request triggers again during the execution of the handler, the request is stored and is executed after the handler has finished. Any other pending interrupt requests with lower priority remain queued. Whenever an interrupt is triggered and the corresponding handler is entered, the I flag of the status register is cleared to avoid an interruption of the executed handler. If more than one interrupt channel is triggered at the same time, the one with the highest priority is executed while the other requests are queued. When the handler of the interrupt with the highest priority exits, the handler of the interrupt with the next lower priority is started afterwards. Of course, you can reactivate the global interrupt enable flag inside an interrupt handler to implement a nested interrupt behavior. When setting the Q flag, all pending interrupt requests in the buffer are deleted.

When an interrupt handler finishes execution, at least <u>one instruction</u> from the interrupted program is executed before another interrupt handler can start execution.

| Address | IRQ Vector Name | Priority    | Source                                                                    |
|---------|-----------------|-------------|---------------------------------------------------------------------------|
| 0x8000  | IRQVEC_TIMER    | 1 (highest) | The timer generates a threshold match.                                    |
| 0x8002  | IRQVEC_USART    | 2           | UART RX available **OR** UART TX done **OR** SPI transmission done.      |
| 0x8004  | IRQVEC_GPIO     | 3           | GPIO input pin change.                                                    |
| 0x8006  | IRQVEC_EXT      | 4 (lowest)  | External interrupt request via the irq_i top entity signal.               |

Table 8: Interrupt sources, priorities and handler base addresses with configuration register names
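The priority scheme of Table 8 can be sketched as a simple selection function. This is a behavioral illustration only: the assignment of pending-request bits to channels (bit 0 = highest priority) and the names IRQ\_TIMER, IRQ\_USART, IRQ\_GPIO, IRQ\_EXT and irq\_select\_vector are assumptions of this sketch, while the vector addresses are the ones listed in the table.

```c
#include <stdint.h>

/* Pending-request bits, ordered by priority as in Table 8.
   The bit assignment is an assumption of this sketch. */
#define IRQ_TIMER (1u << 0)   /* priority 1 (highest), vector at 0x8000 */
#define IRQ_USART (1u << 1)   /* priority 2,           vector at 0x8002 */
#define IRQ_GPIO  (1u << 2)   /* priority 3,           vector at 0x8004 */
#define IRQ_EXT   (1u << 3)   /* priority 4 (lowest),  vector at 0x8006 */

/* Return the vector table address of the highest-priority pending request,
   or 0 if no request is pending. */
static uint16_t irq_select_vector(uint8_t pending)
{
    for (unsigned ch = 0; ch < 4; ch++) {
        if (pending & (1u << ch)) {
            return (uint16_t)(0x8000u + 2u * ch);
        }
    }
    return 0;
}
```

If, for instance, the GPIO and the external request are pending at the same time, the GPIO vector at 0x8004 is selected first and the external request stays queued.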

## **Interrupt Vector Configuration**

The interrupt vectors can be initialized from a program by using the provided register aliases:

```
// interrupt vector table setup
IRQVEC_TIMER = (uint16_t)(&timer_irq_handler); // Timer handler address
IRQVEC_USART = (uint16_t)(&usart_irq_handler); // USART handler address
IRQVEC_GPIO = (uint16_t)(&dummy_handler); // not used, use dummy handler
IRQVEC_EXT = (uint16_t)(&dummy_handler); // not used, use dummy handler
```

#### **CPU Behavior**

When a valid interrupt is received by the CPU, the hardware first stores the return address and afterwards the current state of the status register (including the set interrupt enable flag) to the stack. After that, the global interrupt enable flag and the sleep flag (if set) are cleared in the status register. Thus, the status register keeps the value that was set in the interrupted program, except for the cleared sleep and interrupt enable flags. Finally, the according interrupt handler address is moved to the program counter.

When the handler finishes via the RETI instruction, the CPU reloads the old state of the status register (with the set interrupt enable flag) and the return address from the stack to continue normal program execution.
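The entry and exit sequence described above can be modelled in a few lines of C. This is a simplified behavioral sketch, not the hardware implementation: it assumes the usual MSP430 stack convention (the stack grows downwards, the stack pointer is decremented by 2 before each push), and the cpu\_t type and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SR_I_FLAG (1u << 3)   /* global interrupt enable (Table 7) */
#define SR_S_FLAG (1u << 4)   /* sleep mode (Table 7) */

/* Hypothetical CPU model with a word-organized 64kB address space. */
typedef struct {
    uint16_t pc, sp, sr;
    uint16_t mem[32768];
} cpu_t;

static void push(cpu_t *c, uint16_t v)
{
    c->sp = (uint16_t)(c->sp - 2u);   /* assumption: pre-decrement stack */
    c->mem[c->sp >> 1] = v;
}

static uint16_t pop(cpu_t *c)
{
    uint16_t v = c->mem[c->sp >> 1];
    c->sp = (uint16_t)(c->sp + 2u);
    return v;
}

/* Interrupt entry: save PC, then SR, clear I and S, load the handler address. */
static void irq_enter(cpu_t *c, uint16_t vector_addr)
{
    assert((vector_addr & 1u) == 0);  /* vectors are word-aligned */
    push(c, c->pc);
    push(c, c->sr);
    c->sr &= (uint16_t)~(SR_I_FLAG | SR_S_FLAG);
    c->pc = c->mem[vector_addr >> 1];
}

/* RETI: restore SR (including the saved I flag), then the return address. */
static void reti(cpu_t *c)
{
    c->sr = pop(c);
    c->pc = pop(c);
}
```

Because the saved status register still contains the set interrupt enable flag, interrupts are enabled again automatically once the handler returns.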

## **External Interrupt Request Signal**

The NEO430 processor features a single non-maskable external interrupt request signal (irq\_i). This signal can be used to attach an external (e.g., Wishbone-coupled) interrupt controller to add additional interrupt sources to the processor.

The external interrupt request signal of the NEO430 is level sensitive and synchronous to the processor's clock. Thus, a high level triggers the external interrupt. The attached interrupt source (e.g., the interrupt controller) has to make sure that the IRQ signal to the processor is only active for one clock cycle. Otherwise, the interrupt might be triggered several times. As soon as the CPU has received the interrupt request and the according interrupt handler is started, an acknowledge is issued via the external interrupt acknowledge signal (irq\_ack\_o). This acknowledge is also active for only one clock cycle. After that acknowledge, the external interrupt request can be triggered again.

## **Maximum Interrupt Latency**

The maximum interrupt latency – starting from a valid interrupt request to the CPU until the actual interrupt service routine is started – is defined by the latency of the interrupt management system of the CPU, the latency of the currently executed instruction and the latency required for fetching and starting the according interrupt vector.

The interrupt management system requires 2 cycles to process a valid interrupt request. The worst-case latency for an instruction (see next sub chapter) is presented by a certain CALL instruction, which requires 11 cycles for completion. If an interrupt is triggered, 8 cycles are required to fetch the corresponding interrupt vector and to perform a jump to that address. Thus, the **maximum interrupt latency is 21 cycles**. In contrast, the **minimum interrupt latency is 10 cycles**.
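The two latency figures follow directly from adding the three contributions named above. As a small worked example (the constant and function names are illustrative):

```c
/* Cycle counts quoted in the text above; names are illustrative. */
enum {
    IRQ_MGMT_CYCLES    = 2,   /* interrupt management system */
    IRQ_VECTOR_CYCLES  = 8,   /* fetch interrupt vector and jump to it */
    WORST_INSTR_CYCLES = 11   /* slowest instruction (a CALL, see Table 10) */
};

/* Latency in cycles, given how many cycles the currently executed
   instruction still needs when the request arrives. */
static unsigned irq_latency(unsigned remaining_instr_cycles)
{
    return IRQ_MGMT_CYCLES + remaining_instr_cycles + IRQ_VECTOR_CYCLES;
}
```

With the worst-case instruction this gives 2 + 11 + 8 = 21 cycles. With no instruction left to complete it gives 2 + 0 + 8 = 10 cycles.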

#### 2.1.3. Instruction Set

The instruction set of the NEO430 CPU is fully compatible with the original TI MSP430 instruction set architecture. The full data sheet and the according user guide, which also covers the pseudo-instructions, can be found at <a href="https://www.ti.com/lit/ug/slau049f/slau049f.pdf">www.ti.com/lit/ug/slau049f/slau049f.pdf</a>.

## 2.1.4. Instruction Timing

A fully registered data path, which is subdivided into several micro operation cycles, is implemented by the NEO430 processor. This allows the system to operate at very high clock rates, but of course this also requires a splitting of the instruction execution into several sub cycles. The tables below show the required execution cycles for the different operand classes and addressing modes.

| DST ↓ / SRC →     | Register direct R | Indexed [R+n] | Indirect [R] | Indirect auto inc [R++] |
|-------------------|-------------------|---------------|--------------|-------------------------|
| Register direct R | 6                 | 9             | 7            | 7                       |
| Indexed [R+n]     | 9                 | 10            | 10           | 10                      |

Table 9: Double-operand (format I) instruction execution cycles

| Operation (SRC = DST) | Register direct R | Indexed [R+n] | Indirect [R] | Indirect auto inc [R++] |
|-----------------------|-------------------|---------------|--------------|-------------------------|
| CALL                  | 8                 | 11            | 9            | 9                       |
| PUSH                  | 7                 | 10            | 8            | 8                       |
| Others                | 6                 | 9             | 7            | 7                       |

Table 10: Single-operand (format II) instruction execution cycles (except RETI)

| Instruction / Operation | Cycles |
|-------------------------|--------|
| Branches                | 3      |
| RETI                    | 8      |
| Interrupt               | 6      |

Table 11: Special instructions / operations execution cycles

#### **Average Instruction Execution Time**

If all instruction types and formats are executed in an equally distributed manner, the average CPI (cycles per instruction) evaluates to **8.14 cycles per instruction**.



The double-operand "decimal addition" instruction (DADD) requires an additional execution cycle (for all addressing modes) to complete. Also, the implementation of this instruction requires a lot of hardware! If you do not use explicit elementary decimal additions in your application, you can remove this instruction from implementation by assigning "false" to the DADD\_USE generic of the processor's top entity.
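For quick cycle estimates, Tables 9 and 10 and the DADD rule can be put into a small lookup helper. This is a sketch with hypothetical type and function names; the numbers are copied from the tables above, and the extra DADD cycle is applied as described in the preceding paragraph.

```c
#include <assert.h>

typedef enum { AM_REG, AM_INDEXED, AM_INDIRECT, AM_INDIRECT_INC, AM_COUNT } addr_mode_t;
typedef enum { OP_CALL, OP_PUSH, OP_OTHER, OP_COUNT } format2_op_t;

/* Table 9: rows = destination mode (register direct, indexed),
   columns = source mode. */
static const unsigned char format1_cycles[2][AM_COUNT] = {
    { 6,  9,  7,  7 },
    { 9, 10, 10, 10 },
};

/* Table 10: rows = CALL, PUSH, others; columns = operand addressing mode. */
static const unsigned char format2_cycles[OP_COUNT][AM_COUNT] = {
    { 8, 11, 9, 9 },
    { 7, 10, 8, 8 },
    { 6,  9, 7, 7 },
};

/* Cycles of a double-operand instruction; DADD needs one extra cycle. */
static unsigned format1_exec_cycles(addr_mode_t src, addr_mode_t dst, int is_dadd)
{
    assert(src < AM_COUNT);
    assert(dst == AM_REG || dst == AM_INDEXED);  /* only valid destination modes */
    return format1_cycles[dst == AM_INDEXED][src] + (is_dadd ? 1u : 0u);
}

/* Cycles of a single-operand instruction (except RETI). */
static unsigned format2_exec_cycles(format2_op_t op, addr_mode_t mode)
{
    assert(op < OP_COUNT && mode < AM_COUNT);
    return format2_cycles[op][mode];
}
```

For example, a register-to-register double-operand instruction is looked up as 6 cycles, and the same instruction class with DADD as 7 cycles.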

## 2.1.5. System Bus

All components of the NEO430 processor are connected to the CPU via the main system bus. Since the connected devices are accessed using a memory-mapped scheme, simple load and store operations are used to transfer data to or from the devices.

| Name | Width | Dir | Function                                           |  |
|------|-------|-----|----------------------------------------------------|--|
| WREN | 2     | out | Write enable for each of the two transferred bytes |  |
| RDEN | 1     | out | Read enable (always full-word)                     |  |
| ADDR | 16    | out | Address signal                                     |  |
| DO   | 16    | out | Write data                                         |  |
| DI   | 16    | in  | Read data (one cycle latency)                      |  |

Table 12: System bus signals (direction seen from CPU)

In the figure below you can see the signal timings when performing a write or read transaction. When conducting a write operation to a specific module, the actual 16-bit address and the data to be written are applied together with the write enable signals. For single byte transmission, only the corresponding bit of the WREN signal is set. A complete write transaction only requires a single cycle to complete. Read operations require two clock cycles to complete. Here, the read enable signal is applied together with the source address. In the next cycle, the accessed data word is read. Even when performing an explicit read operation of a single byte, the full 16-bit word is transferred.

The data output signals of all devices are OR-ed together before the resulting signal is fed to the CPU. Hence, only the actually accessed device must generate an output different from 0x0000. Therefore, read transactions are subdivided into two consecutive cycles: In the first cycle, the address and the read enable signal are applied. Now, each device can check whether it is accessed or not. If there is an address match, the according device fetches data from the accessed location and applies it to its data output port in the *next cycle*. In any other situation, the data output of that module must be set to 0x0000.



Figure 5: Write and read bus cycles (full word transfers); Write:  $M \rightarrow [A]$ ; Read: [A] = N
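The OR-ed read data bus and the byte-wise write enables can be illustrated with a small host-side model. This is a sketch under assumptions: every device is reduced to a single 16-bit register, WREN bit 0 is assumed to select the low byte and bit 1 the high byte, and all names (bus\_dev\_t and the bus\_ functions) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device model: one 16-bit register at a fixed address. */
typedef struct {
    uint16_t addr;      /* address this device responds to */
    uint16_t reg;       /* stored data word */
    uint16_t data_out;  /* registered read data, 0x0000 unless accessed */
} bus_dev_t;

/* Write cycle: ADDR, DO and WREN are applied in the same cycle.
   Assumption: WREN bit 0 = low byte, WREN bit 1 = high byte. */
static void bus_write(bus_dev_t *devs, unsigned n, uint16_t addr,
                      uint16_t data, unsigned wren)
{
    assert(wren <= 3u);
    uint16_t mask = (uint16_t)(((wren & 1u) ? 0x00FFu : 0u) |
                               ((wren & 2u) ? 0xFF00u : 0u));
    for (unsigned i = 0; i < n; i++) {
        if (devs[i].addr == addr) {
            devs[i].reg = (uint16_t)((devs[i].reg & ~mask) | (data & mask));
        }
    }
}

/* First read cycle: every device checks ADDR/RDEN and registers its output. */
static void bus_read_request(bus_dev_t *devs, unsigned n, uint16_t addr)
{
    for (unsigned i = 0; i < n; i++) {
        devs[i].data_out = (devs[i].addr == addr) ? devs[i].reg : 0x0000u;
    }
}

/* Second read cycle: the CPU samples the OR of all device outputs. */
static uint16_t bus_read_data(const bus_dev_t *devs, unsigned n)
{
    uint16_t di = 0;
    for (unsigned i = 0; i < n; i++) {
        di |= devs[i].data_out;
    }
    return di;
}
```

The model shows why a device that is not addressed has to drive 0x0000: any other value would corrupt the data of the device that is actually accessed.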



You can add custom modules to the processor-internal bus, but that requires a very good understanding of the address space layout and the general NEO430 architecture. Instead, I encourage you to use the custom functions unit (CFU) or the Wishbone bus interface to implement or attach custom logic.

## 2.2. Internal Instruction Memory (IMEM)

The internal instruction memory (VHDL component neo430\_imem.vhd) stores the code of the currently executed program. It is located at base address 0x0000 of the address space. The actual IMEM size can be configured via the IMEM\_SIZE generic of the processor top entity (see cut-out below). Make sure the IMEM size does not exceed 32kB. During run time, the size can be obtained by a program by reading a specific CPUID register from the SYSCONFIG module.

```
IMEM_SIZE => 4*1024, -- internal IMEM size in bytes, max 32kB (default=4kB)
```

By default, the IMEM is implemented as RAM, so the content can be modified during run time. This is required when using a bootloader that can update the content of the IMEM at any time. With the default implementation as RAM, the **R** flag in the CPU's status register has to be set in order to allow write accesses to the instruction memory. If you do not need the bootloader, because your application development is done and you want the program to permanently reside in the IMEM, the IMEM can also be implemented as true read-only ROM. In this case, set the IMEM\_AS\_ROM generic of the processor's top entity to "true".

```
IMEM_AS_ROM : boolean := false -- implement IMEM as read-only memory? (default=false)
```

When the IMEM is implemented as ROM, it will be initialized during synthesis with the actual application program. The toolchain will generate a VHDL initialization file (neo430\_application\_image.vhd) from your application, which is automatically inserted into the IMEM. If the IMEM is implemented as RAM, the memory will not be initialized at all.

## 2.3. Internal Data Memory (DMEM)

The internal data memory (VHDL component neo430\_dmem.vhd) serves as general data memory / RAM for the currently executed program. It is located at base address 0x8000 of the address space. This address is fixed and must not be altered. The actual RAM size can be configured via the DMEM\_SIZE generic of the processor top entity (see cut-out below). Make sure the RAM size does not exceed 28kB. During run time, the size can be obtained by a program by reading a specific CPUID register from the SYSCONFIG module. The first 8 bytes of the DMEM are used by the hardware for storing the interrupt vectors.

```
DMEM_SIZE => 2*1024, -- internal DMEM size in bytes, max 28kB (default=2kB)
```

#### 2.4. Boot ROM

As the name already suggests, the boot ROM (VHDL component neo430\_boot\_rom.vhd) contains the read-only bootloader image, which is executed right after system reset. It is located at address 0xF000 of the address space. This address is fixed and must not be altered, since it represents the hardware-defined boot address. The ROM size can be configured in the neo430\_package.vhd file (see cut-out below) if you want to write your own custom bootloader, but the size must not exceed 2kB. During synthesis, the VHDL boot ROM is initialized using the neo430\_bootloader\_image.vhd file (generated by the image generator).

```
constant boot_size_c : natural := 2*1024; -- bytes, max 2kB
```

If you are using the IMEM as true ROM, initialized with your application code during synthesis, the bootloader is in most cases no longer necessary. In this case you can disable the implementation of the bootloader. Use the BOOTLD\_USE generic of the processor top entity (see cut-out below) to exclude it. If the bootloader implementation is deactivated, the CPU starts booting your application from address 0x0000 instead of from the base address of the boot ROM at 0xF000.

```
BOOTLD_USE => true, -- implement and use bootloader? (default=true)
```

#### **Boot Configuration**

The default configuration of the NEO430 processor includes all optional modules (except for the custom functions unit) and also provides a built-in serial bootloader. This bootloader is well suited for the evaluation process, since the application program can be re-uploaded at any time using a standard UART connection. Furthermore, the bootloader provides an automatic boot configuration, which boots from an external SPI EEPROM after a specific console timeout. This feature allows implementing a non-volatile program storage that can still be altered after implementation.

For a mature design, the bootloader feature might not be required anymore. In this scenario, the bootloader can be excluded from the design via the according generic configuration switch (BOOTLD\_USE). If the bootloader is disabled, your application code is executed directly after reset. In this case, the application program image resides permanently in the internal instruction memory (IMEM). Note that modifications of the IMEM are still possible as long as the IMEM\_AS\_ROM switch is not enabled.

## 2.5. Wishbone Bus Interface Adapter (WB32)

The default NEO430 processor setup includes a Wishbone bus interface adapter (VHDL component neo430\_wb\_interface.vhd). Several IP blocks (e.g. from opencores.org) provide a Wishbone interface. Hence, a custom system-on-chip can be built using this bus standard. The Wishbone adapter features 32-bit wide address and data buses. If required, only a subsection of the address and/or data buses can be connected to create a Wishbone bus with smaller data and/or address buses. The neo430\_wishbone.h library file in the sw/lib/neo430 folder already implements the most common Wishbone transfer operations. These "driver functions" also take care of setting the according byte enable signals and manipulating the final address when performing 16-bit or even byte-aligned accesses. Note that these functions are blocking, so a non-responding slave will permanently stall the system!

#### Wishbone Bus Protocol

A detailed description of the implemented Wishbone bus protocol and the according interface signals can be found in the opencores.org documentation data sheet "Wishbone B4 – WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores". A copy of this document can be found in the doc folder of this project.

#### **Implementation Control**

Use the WB32\_USE generic of the processor top entity (see cut-out below) to control implementation. When disabled, the dedicated Wishbone interface signals (see table) are not functional. In this case, set all input signals to low level and leave all output signals unconnected.

```
WB32_USE => true, -- implement WB32 unit? (default=true)
```

#### **Wishbone Transactions**

To perform a Wishbone transaction, several tiny steps are required. At first, the according byte enable signals (WB32\_CT\_WBSELx) and the transfer cycle type (see later) must be configured in the control register. The byte enable signals directly drive the wb\_sel\_o bus. Setting any of these four bits will also activate the Wishbone adapter. In contrast, the adapter can be deactivated (for example if an addressed slave is not responding) by clearing the control register.

In case of a write transfer, the data to be written must be loaded into the WB32\_LD (low 16-bit part) and WB32\_HD (high 16-bit part) registers. Together, these registers directly drive the wb\_dat\_o data output bus. To start the transfer, the address is written to the WB32\_LWA (low word) and WB32\_HWA (high word) registers. The store access to the high address word register initiates the transfer. For a read transfer, the address is written to the WB32\_LRA (low word) and WB32\_HRA (high word) registers. Just like before, the write access to the high word register triggers the read transaction.

As soon as the transaction is started, the WB32\_CT\_PENDING bit in the unit's control register is set to indicate a pending transfer. The transfer is successfully completed when this bit returns to zero. A transfer is completed when the accessed slave acknowledges the cycle by setting the ACK signal. If you wish to abort a pending transfer, you must disable the device by clearing the WB32\_CT control register (set all byte enable signals to zero).
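The register sequence described above can be sketched as follows. This is not the API of neo430\_wishbone.h: the struct layout simply follows the address map in Table 14 (base address 0xFF90), all names are illustrative, and the configuration of the transfer cycle type is omitted. In contrast to a blocking driver, this sketch polls the pending flag only a bounded number of times and aborts the transfer by clearing the control register, as described above. Passing the register block as a pointer allows running the code against a plain struct on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* Register block following Table 14; names are illustrative. */
typedef struct {
    volatile uint16_t ct;        /* 0xFF90: byte enables, pending flag */
    volatile uint16_t lra;       /* 0xFF92: low read address word */
    volatile uint16_t hra;       /* 0xFF94: high read address word (+trigger) */
    volatile uint16_t lwa;       /* 0xFF96: low write address word */
    volatile uint16_t hwa;       /* 0xFF98: high write address word (+trigger) */
    volatile uint16_t ld;        /* 0xFF9A: low data word */
    volatile uint16_t hd;        /* 0xFF9C: high data word */
    volatile uint16_t reserved;  /* 0xFF9E */
} wb32_regs_t;

#define WB32_CT_WBSEL_ALL 0x000Fu      /* all four byte enables set */
#define WB32_CT_PENDING   (1u << 15)   /* pending transfer flag */

/* Poll the pending flag; on timeout abort by clearing CT. 0 = ok, -1 = timeout. */
static int wb32_wait(wb32_regs_t *wb, unsigned max_polls)
{
    while (wb->ct & WB32_CT_PENDING) {
        if (max_polls-- == 0) {
            wb->ct = 0;          /* disable adapter, aborts the transfer */
            return -1;
        }
    }
    return 0;
}

/* 32-bit write: byte enables, data, low address, high address (trigger). */
static int wb32_write32(wb32_regs_t *wb, uint32_t addr, uint32_t data, unsigned max_polls)
{
    assert(wb != 0);
    wb->ct  = WB32_CT_WBSEL_ALL;
    wb->ld  = (uint16_t)(data & 0xFFFFu);
    wb->hd  = (uint16_t)(data >> 16);
    wb->lwa = (uint16_t)(addr & 0xFFFFu);
    wb->hwa = (uint16_t)(addr >> 16);
    return wb32_wait(wb, max_polls);
}

/* 32-bit read: byte enables, low address, high address (trigger), fetch data. */
static int wb32_read32(wb32_regs_t *wb, uint32_t addr, uint32_t *data, unsigned max_polls)
{
    assert(wb != 0 && data != 0);
    wb->ct  = WB32_CT_WBSEL_ALL;
    wb->lra = (uint16_t)(addr & 0xFFFFu);
    wb->hra = (uint16_t)(addr >> 16);
    if (wb32_wait(wb, max_polls) != 0) {
        return -1;
    }
    *data = ((uint32_t)wb->hd << 16) | wb->ld;
    return 0;
}
```

On the target, the register block would be placed at 0xFF90, for example via a pointer cast of that address. Whether this matches the definitions in the project's header files has to be checked against the sources.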



The Wishbone bus interface of the NEO430 processor **cannot** be used for instruction fetch or direct data access via the standard memory-access instructions. Instead, the bus interface is a <u>communication device</u>, that can be used by a program, which is executed from the internal memory, to access processor-external peripheral devices like mass-storage memories, hardware accelerators or additional communication interfaces.

The Wishbone interface adapter supports the standard or "classic" Wishbone transfer mode when connecting slaves, which apply their acknowledge signal in an asynchronous way. When the accessed slave cannot apply the acknowledge immediately – for example by registering the output signals in order to shorten the critical path – the implemented Wishbone protocol differs from the standard. The NEO430 Wishbone adapter **applies its STB signal only for one single cycle** of an active transfer. The cycle is terminated when the selected slave sets the acknowledge signal. This allows to implement slaves with registered outputs (higher frequency!) while also allowing correct accesses to IP cores, which trigger their operation on the STB signal (for instance a FIFO).

Long story short: When the slave applies its acknowledge signal asynchronously in the **same cycle**, the Wishbone transfer fulfills the **classic/standard mode protocol specification**. When the slave applies its acknowledge signal one **or more cycles later**, the Wishbone transfer fulfills the specification of the **pipelined mode** protocol.



#### Wishbone Bus Interface Signals

| Signal name | Width (#bits) | Direction | Function              |
|-------------|---------------|-----------|-----------------------|
| wb_adr_o    | 32            | Output    | Access address        |
| wb_dat_i    | 32            | Input     | Read data input       |
| wb_dat_o    | 32            | Output    | Write data output     |
| wb_we_o     | 1             | Output    | Read/write access     |
| wb_sel_o    | 4             | Output    | Byte enable           |
| wb_stb_o    | 1             | Output    | Strobe signal         |
| wb_cyc_o    | 1             | Output    | Valid cycle indicator |
| wb_ack_i    | 1             | Input     | Cycle acknowledge     |

Table 13: Wishbone bus interface adapter signals


The Wishbone bus uses the processor's main clock for data bus operations.

#### Register Map

| Address | Name     | Bit(s) | Name            | R/W | Function                                                   |
|---------|----------|--------|-----------------|-----|------------------------------------------------------------|
| 0xFF90  | WB32_CT  | 0      | WB32_CT_WBSEL0  | -/W | Byte 0 transfer enable → wb_sel_o(0)                       |
|         |          | 1      | WB32_CT_WBSEL1  | -/W | Byte 1 transfer enable → wb_sel_o(1)                       |
|         |          | 2      | WB32_CT_WBSEL2  | -/W | Byte 2 transfer enable → wb_sel_o(2)                       |
|         |          | 3      | WB32_CT_WBSEL3  | -/W | Byte 3 transfer enable → wb_sel_o(3)                       |
|         |          | 4..14  | -               | -/- | Reserved                                                   |
|         |          | 15     | WB32_CT_PENDING | R/- | Pending Wishbone transfer flag                             |
| 0xFF92  | WB32_LRA | 0..15  |                 | -/W | Low address word for read transfer                         |
| 0xFF94  | WB32_HRA | 0..15  |                 | -/W | High address word for read transfer (+trigger)             |
| 0xFF96  | WB32_LWA | 0..15  |                 | -/W | Low address word for write transfer                        |
| 0xFF98  | WB32_HWA | 0..15  |                 | -/W | High address word for write transfer (+trigger)            |
| 0xFF9A  | WB32_LD  | 0..15  |                 | R/W | Low word of read/write data, can be accessed in byte mode  |
| 0xFF9C  | WB32_HD  | 0..15  |                 | R/W | High word of read/write data, can be accessed in byte mode |
| 0xFF9E  | -        | 0..15  | -               | -/- | Reserved                                                   |

Table 14: Wishbone32 interface adapter address map



This unit needs to be reset by software before you can use it. The reset is performed by writing zero to the unit's control register.



Only the data and address registers can be accessed in byte mode. The control register can only be accessed in word mode.



Do not read from registers that do not provide read access (e.g. when R/W is -/W), since such accesses return undefined data!
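
The register map suggests the following access sequence for full 32-bit transfers. This is only a minimal sketch and not the project's driver code: the type `wb32_regs_t` and the functions `wb32_read32` and `wb32_write32` are hypothetical names, and polling assumes that bit 15 of WB32\_CT reads back as a valid status flag. On the target, the struct would be mapped to address 0xFF90; passing it as a pointer also allows the sequence to be tried against a plain variable on a host machine.

```c
#include <stdint.h>

/* Register layout following the Wishbone32 address map (base 0xFF90 on the target).
 * Type and function names are illustrative, not part of the NEO430 library. */
typedef struct {
    volatile uint16_t ct;   /* WB32_CT : byte enables (bits 0..3), pending flag (bit 15) */
    volatile uint16_t lra;  /* WB32_LRA: low address word for read transfer */
    volatile uint16_t hra;  /* WB32_HRA: high address word, writing it triggers the read */
    volatile uint16_t lwa;  /* WB32_LWA: low address word for write transfer */
    volatile uint16_t hwa;  /* WB32_HWA: high address word, writing it triggers the write */
    volatile uint16_t ld;   /* WB32_LD : low data word */
    volatile uint16_t hd;   /* WB32_HD : high data word */
} wb32_regs_t;

#define WB32_CT_SEL_ALL 0x000Fu      /* WB32_CT_WBSEL0..3: enable all four bytes */
#define WB32_CT_PENDING (1u << 15)   /* WB32_CT_PENDING */

/* Busy-wait until the adapter reports no pending transfer. */
static void wb32_wait(wb32_regs_t *wb) {
    while (wb->ct & WB32_CT_PENDING) {
        /* transfer still in progress */
    }
}

/* 32-bit read: set byte enables, write the address (high word last), wait, fetch data. */
uint32_t wb32_read32(wb32_regs_t *wb, uint32_t addr) {
    wb->ct  = WB32_CT_SEL_ALL;
    wb->lra = (uint16_t)(addr & 0xFFFFu);
    wb->hra = (uint16_t)(addr >> 16);    /* this write starts the read transfer */
    wb32_wait(wb);
    uint16_t lo = wb->ld;
    uint16_t hi = wb->hd;
    return ((uint32_t)hi << 16) | lo;
}

/* 32-bit write: data must be in place before the high address word triggers the transfer. */
void wb32_write32(wb32_regs_t *wb, uint32_t addr, uint32_t data) {
    wb->ct  = WB32_CT_SEL_ALL;
    wb->ld  = (uint16_t)(data & 0xFFFFu);
    wb->hd  = (uint16_t)(data >> 16);
    wb->lwa = (uint16_t)(addr & 0xFFFFu);
    wb->hwa = (uint16_t)(addr >> 16);    /* this write starts the write transfer */
    wb32_wait(wb);
}
```

Before the first transfer, the unit still has to be reset once by writing zero to the control register, as noted above.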

#### **Regarding Wishbone Bus Sizes**

Although the Wishbone adapter implements 32-bit wide data and address buses, you do not need to actually use these sizes. If you have only a few accessible addresses in your Wishbone network, a 16-bit address width might be sufficient. Also, if you only use slaves with e.g. 16-bit data width, you do not need the full 32 bits provided by the adapter. By constraining the bus sizes to your actual needs, the synthesis tool can simplify the hardware and save resources. Furthermore, if you use the according Wishbone access functions for smaller data widths (e.g. 16-bit transfers only), the Wishbone access can be conducted in fewer CPU cycles, increasing the performance.

## 2.6. General Purpose IO (GPIO)

The general purpose parallel IO controller (VHDL component neo430\_gpio.vhd) provides a simple 16-bit parallel input port and a 16-bit parallel output port. These ports can be used chip-externally (e.g. to drive LEDs, connect buttons, etc.) or system-internally to provide control signals for other IP modules.

#### **Implementation Control**

I know, hardware resources are precious. The GPIO module does not require a lot of logic and you won't have this fancy blinking bootloader status signal, when you disable the module. However, if you do not need the included GPIO ports, you can exclude the module from synthesis. Use the GPIO\_USE generic of the processor top entity (see cut-out below) to control implementation. If the unit is excluded from synthesis, all parallel output signals are set to low level and the pin change interrupt is permanently disabled.

```
GPIO_USE => true, -- implement GPIO unit? (default=true)
```

#### **Pin-Change Interrupt**

The parallel input port features a single pin-change interrupt. The GPIO\_IRQMASK register selects which input pins can cause an interrupt. The pin-change interrupt is deactivated if all bits of the mask register are cleared. When enabled, the interrupt triggers on any transition (rising edge or falling edge) on one of the masked input pins. Therefore, it is not possible to directly determine which input pin caused the interrupt. This must be done by reading the input data and examining the new state. Use the neo430\_gpio.h library file in the sw/lib/neo430 folder to get access to some of the most common IO operations like bit set, clear or toggle.

| Address | IRQ Vector Name | Priority | Source                 |
|---------|-----------------|----------|------------------------|
| 0x8004  | IRQVEC_GPIO     | 3        | GPIO input pin change. |

Table 15: GPIO pin-change interrupt vector
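
Since the hardware does not report which pin changed, an interrupt handler has to compare the new input state with a copy it saved earlier. The helper below is a minimal sketch of that comparison; the function name `gpio_changed_pins` is hypothetical, and keeping the previous GPIO\_IN value in software is an assumption about how the handler is organized.

```c
#include <stdint.h>

/* Returns a bit mask of all interrupt-enabled input pins whose level differs
 * between the previously stored sample and the current GPIO_IN sample. */
uint16_t gpio_changed_pins(uint16_t previous, uint16_t current, uint16_t irq_mask) {
    return (uint16_t)((previous ^ current) & irq_mask);
}
```

A handler would call this with the value last read from GPIO\_IN, the value read now, and the mask written to GPIO\_IRQMASK, then store the current value for the next interrupt.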

#### **Register Map**

| Address | Name         | Bit(s) (Name) | R/W | Function                                           |
|---------|--------------|---------------|-----|----------------------------------------------------|
| 0xFFA8  | -            | 0..15         | -/- | Reserved                                           |
| 0xFFAA  | GPIO_IRQMASK | 0..15         | -/W | Enable according input pin(s) as interrupt trigger |
| 0xFFAC  | GPIO_IN      | 0..15         | R/- | Parallel input port                                |
| 0xFFAE  | GPIO_OUT     | 0..15         | R/W | Parallel output port                               |

Table 16: GPIO module address map



Do not read from registers that do not provide read access (e.g. when R/W = -/W), since such reads return undefined data!

## 2.7. USART / USI

The universal synchronous/asynchronous receiver and transmitter (VHDL component neo430\_usart.vhd) features standard serial interfaces for chip-external communications. Two independent sub modules are implemented: a UART sub module and an SPI sub module. Both modules can operate in parallel. In this documentation, the USART is also referred to as USI (universal serial interface).

#### Register Map

| Address | Name        | Bit(s) | Name                    | R/W | Function                           |
|---------|-------------|--------|-------------------------|-----|------------------------------------|
| 0xFFA0  | USI_CT      | 0      | USI_CT_EN               | R/W | USI enable (SPI & UART)            |
|         |             | 1      | USI_CT_UARTRXIRQ        | R/W | UART Rx available IRQ enable       |
|         |             | 2      | USI_CT_UARTTXIRQ        | R/W | UART Tx done IRQ enable            |
|         |             | 3      | USI_CT_UARTTXBSY        | R/- | UART Tx busy flag                  |
|         |             | 4      | USI_CT_SPICPHA          | R/W | SPI clock phase                    |
|         |             | 5      | USI_CT_SPIIRQ           | R/W | SPI transmission done IRQ enable   |
|         |             | 6      | USI_CT_SPIBSY           | R/- | SPI transceiver busy flag          |
|         |             | 7      | USI_CT_SPIPRSC0         | R/W | SPI clock select (prescaler value) |
|         |             | 8      | USI_CT_SPIPRSC1         | R/W |                                    |
|         |             | 9      | USI_CT_SPIPRSC2         | R/W |                                    |
|         |             | 10     | USI_CT_SPICS0           | R/W | Dedicated SPI CS0 (high-active)    |
|         |             | 11     | USI_CT_SPICS1           | R/W | Dedicated SPI CS1 (high-active)    |
|         |             | 12     | USI_CT_SPICS2           | R/W | Dedicated SPI CS2 (high-active)    |
|         |             | 13     | USI_CT_SPICS3           | R/W | Dedicated SPI CS3 (high-active)    |
|         |             | 14     | USI_CT_SPICS4           | R/W | Dedicated SPI CS4 (high-active)    |
|         |             | 15     | USI_CT_SPICS5           | R/W | Dedicated SPI CS5 (high-active)    |
| 0xFFA2  | USI_SPIRTX  | 0..7   |                         | R/W | SPI Rx/Tx data                     |
|         |             | 8..15  |                         | R/- | Reserved, read as 0                |
| 0xFFA4  | USI_UARTRTX | 0..7   |                         | R/W | UART Rx/Tx data                    |
|         |             | 8..14  |                         | R/- | Reserved, read as 0                |
|         |             | 15     | USI_UARTRTX_UARTRXAVAIL | R/- | Set if UART Rx data available      |
| 0xFFA6  | USI_BAUD    | 0..7   |                         | R/W | UART baud value                    |
|         |             | 8..10  |                         | R/W | UART baud prescaler select         |
|         |             | 11..15 |                         | R/- | Reserved                           |

Table 17: USART address map



This unit needs to be reset by software before you can use it. The reset is performed by writing zero to the unit's control register.

#### **Implementation Control**

By default, the USART is always synthesized. You can use the USART\_USE generic of the processor top entity (see cut-out below) to control implementation.

```
USART_USE => true, -- implement USART? (default=true)
```

#### **USART/USI Interrupt**

The USART features a single interrupt output, which can be used to indicate a *UART RX data available* status and/or *UART TX done* and/or an *SPI transfer completed* status. When enabling all sources at the same time, you have to check the USART control register to determine the actual causing event. If the USART module is excluded from synthesis (disabled), its interrupt request signals are always 0.

| Address | IRQ Vector Name | Priority | Source                                                                    |
|---------|-----------------|----------|---------------------------------------------------------------------------|
| 0x8002  | IRQVEC_USART    | 2        | UART Rx available <b>OR</b> UART Tx done <b>OR</b> SPI transmission done. |

Table 18: USART interrupt vector

## 2.7.1 Universal Asynchronous Receiver/Transmitter (UART)

In most cases, the UART interface is used to establish a communication channel between the computer/user and an application. Of course, you can also use the UART for interfacing chip-external peripheral devices. A standard configuration is used for the UART protocol layout: 8 data bits, 1 stop bit and no parity bit. These values are fixed and cannot be altered. The actual Baud rate is configurable by software. This configuration is explained later.

After configuring the Baud rate, the USART must be activated by setting the global USI\_CT\_EN bit in the USART control register USI\_CT. Now you can transmit a character by writing it to the USI\_UARTRTX register. The transfer is in progress as long as the USI\_CT\_UARTTXBSY bit in the control register is set. A received character is available when bit #15 of the USI\_UARTRTX register is set. When reading this register, the available flag is automatically cleared and you have your received character, all done using a single access! That's cool, huh? A "char available" or "transmission completed" interrupt can be activated by setting the according bits in the control register. Note that both interrupt sources trigger the same interrupt handler (IRQVEC\_USART)! To make the usage of the UART a little bit easier, the neo430\_usart.h library in the sw/lib/neo430 folder features elementary functions for sending and receiving data. The only thing you have to do by hand is to enable the UART and call the Baud rate configuration. Well, actually you don't have to do this, since it is already done by the bootloader. The bootloader computes the according Baud value based on the clock speed from the *infomem* for a final Baud rate of 19200.
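
Because reading USI\_UARTRTX clears the available flag, the register should be read exactly once and the flag and data extracted from that single value. The following sketch shows the decoding step only; the function name `uart_decode_rx` is hypothetical and it operates on a value that the caller has already read from the register.

```c
#include <stdbool.h>
#include <stdint.h>

#define USI_UARTRTX_UARTRXAVAIL (1u << 15)  /* bit 15: Rx data available */

/* Decodes one word read from USI_UARTRTX.
 * Returns true and stores the received byte if the available flag was set,
 * returns false (and leaves *c untouched) otherwise. */
bool uart_decode_rx(uint16_t rtx_word, uint8_t *c) {
    if ((rtx_word & USI_UARTRTX_UARTRXAVAIL) == 0) {
        return false;
    }
    *c = (uint8_t)(rtx_word & 0x00FFu);  /* bits 0..7 hold the Rx data */
    return true;
}
```

Reading the register twice (once to test the flag, once to fetch the data) would not work reliably, since the first read already clears the flag.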

#### **UART Baudrate**

The actual transfer speed – the Baud rate – can be arbitrarily configured via the USI\_BAUD register. This register defines a prescaler value (**PRSC**, bits 10:8) and also a direct Baud rate divisor factor (**BAUD**, bits 7:0). According to the selected prescaler, the actual Baud rate of the UART interface is computed using the following formula:

$$Baudrate = \frac{f_{main}[Hz]}{PRSC * BAUD}$$

For a given clock frequency and desired Baud rate, the BAUD parameter is obtained by selecting the smallest prescaler for which the resulting value still fits into 8 bits; this yields the largest valid BAUD value. The following table shows the **BAUD** values for 8 common Baud rates and each of the 8 prescaler configurations. This setup assumes a clock frequency of **50MHz**. Values above 255 are invalid, since they cannot fit into the 8-bit wide BAUD register. The remaining values are valid for the according prescaler selection, but you should always use the highest possible BAUD value to minimize the Baud rate error.

#### **Baudrate / Prescaler / BAUD Value Look-up-Table**

| Prescaler bits configuration:     | 000   | 001   | 010  | 011 | 100 | 101  | 110  | 111  |
|-----------------------------------|-------|-------|------|-----|-----|------|------|------|
| Resulting prescaler <i>PRSC</i>:  | 2     | 4     | 8    | 64  | 128 | 1024 | 2048 | 4096 |
| Baudrate = 1200                   | 20833 | 10417 | 5208 | 651 | 326 | 41   | 20   | 10   |
| Baudrate = 2400                   | 10417 | 5208  | 2604 | 326 | 163 | 20   | 10   | 5    |
| Baudrate = 4800                   | 5208  | 2604  | 1302 | 163 | 81  | 10   | 5    | 3    |
| Baudrate = 9600                   | 2604  | 1302  | 651  | 81  | 41  | 5    | 3    | 1    |
| Baudrate = <b>19200</b>           | 1302  | 651   | 326  | 41  | 20  | 3    | 1    | 1    |
| Baudrate = <b>28800</b>           | 868   | 434   | 217  | 27  | 14  | 2    | 1    | 0    |
| Baudrate = <b>57600</b>           | 434   | 217   | 109  | 14  | 7   | 1    | 0    | 0    |
| Baudrate = 115200                 | 217   | 109   | 54   | 7   | 3   | 0    | 0    | 0    |

#### **Example Baudrate Computation**

- Clock frequency: 50MHz
- Desired Baudrate: 19200
- Prescaler: $64 \rightarrow 0b011$
- BAUD value: 41

Thus, the USI\_BAUD register would look like this: **0b0000.0011.0010.1001**
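
The table look-up can also be expressed as a small search over the eight prescalers. The sketch below is an illustration under stated assumptions, not the library routine: the function name `usi_baud_value` is hypothetical, the BAUD value is rounded to the nearest integer (which matches the table entries), and the value 0xFFFF is an arbitrary marker for "no valid configuration".

```c
#include <stdint.h>

/* Prescaler values for the prescaler selection 0b000..0b111 */
static const uint16_t USI_BAUD_PRSC[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Returns the USI_BAUD register value (prescaler select in bits 10:8,
 * BAUD in bits 7:0) using the smallest prescaler whose BAUD value fits
 * into 8 bits. Returns 0xFFFF if no prescaler yields a value in 1..255.
 * Intended for typical Baud rates up to about 1 MBaud (no overflow check). */
uint16_t usi_baud_value(uint32_t f_main_hz, uint32_t baudrate) {
    for (uint16_t sel = 0; sel < 8; sel++) {
        uint32_t div  = (uint32_t)USI_BAUD_PRSC[sel] * baudrate;
        uint32_t baud = (f_main_hz + div / 2u) / div;  /* rounded division */
        if (baud >= 1u && baud <= 255u) {
            return (uint16_t)(((uint32_t)sel << 8) | baud);
        }
    }
    return 0xFFFFu;
}
```

For the example above (50MHz, 19200 Baud), the search skips the prescalers 2, 4 and 8 because their BAUD values exceed 255, and stops at prescaler 64 with BAUD = 41, i.e. the register value 0x0329.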



Actually, you do not have to worry about the configuration of the UART Baud rate at all. A function from the neo430\_usart.h library does all the work for you – just call it once at program start.

## 2.7.2 Serial Peripheral Interface (SPI)

Just like the UART, the SPI is a standard interface for accessing a wide variety of external devices. The data transfer quantity is fixed to 8 bits for a single transfer. However, larger data 'packets' can be implemented, since the actual data size corresponds to the bytes being sent during an active chip select (CS) of the connected device. A transmission is started when writing a data byte to the USI\_SPIRTX register. The USI\_CT\_SPIBSY bit of the control register indicates a transfer being in progress. The received data can be obtained by reading the USI\_SPIRTX register as soon as the transmission is done and the busy flag is cleared. A "transmission done" interrupt can be activated by setting the USI\_CT\_SPIIRQ bit. Note that this interrupt triggers the same interrupt handler as the interrupt sources from the UART module. The SPI module already implements six dedicated chip select lines, which are directly accessible via the unit's control register, but additional CS lines can be created using the GPIO controller or a specific Wishbone device.

#### **Transmission Configuration**

The SPI transmission provides several configuration options. The actual clock phase is selected via the USI\_CT\_SPICPHA bit. The clock polarity is fixed to a low idle level. If you wish to use a high idle level, invert the SPI clock signal in your top design. For every transmission, the MSB is sent first. Mirror your data byte if you wish to send the LSB first. The actual transmission speed is set via the three prescaler selection bits USI\_CT\_SPIPRSCx of the control register. The resulting prescaler value *PRSC* is shown in the table below:

| Prescaler bits configuration:     | 000 | 001 | 010 | 011 | 100 | 101  | 110  | 111  |
|-----------------------------------|-----|-----|-----|-----|-----|------|------|------|
| Resulting prescaler <i>PRSC</i> : | 2   | 4   | 8   | 64  | 128 | 1024 | 2048 | 4096 |

Based on the PRSC prescaler value, the actual SPI clock frequency is determined by:

$$f_{SPI} = \frac{f_{main}[Hz]}{2 * PRSC}$$

In the neo430\_usart.h library file you can find elementary functions for performing an SPI transmission and for controlling the dedicated chip-select signals of the USART.SPI module.
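
As a worked illustration of the prescaler table and the clock formula, the sketch below computes the SPI clock for a given selection and assembles the corresponding control register bits. The function names `spi_clock_hz` and `usi_ct_spi_config` are hypothetical; the bit positions are taken from the USART register map.

```c
#include <stdint.h>

/* Prescaler values for the USI_CT_SPIPRSCx selection 0b000..0b111 */
static const uint16_t SPI_PRSC[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* f_SPI = f_main / (2 * PRSC) */
uint32_t spi_clock_hz(uint32_t f_main_hz, uint8_t prsc_sel) {
    return f_main_hz / (2u * (uint32_t)SPI_PRSC[prsc_sel & 7u]);
}

/* Builds a USI_CT value with the USI enabled, the requested clock phase
 * and the requested SPI prescaler selection; all other bits stay cleared. */
uint16_t usi_ct_spi_config(uint8_t prsc_sel, int cpha) {
    uint16_t ct = (uint16_t)(1u << 0);              /* USI_CT_EN */
    if (cpha) {
        ct |= (uint16_t)(1u << 4);                  /* USI_CT_SPICPHA */
    }
    ct |= (uint16_t)((prsc_sel & 7u) << 7);         /* USI_CT_SPIPRSC0..2 (bits 7..9) */
    return ct;
}
```

With a 50MHz main clock and the selection 0b011 (PRSC = 64), the SPI clock is 50MHz / 128 = 390625Hz.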

#### **Dedicated SPI Chip Select (CS) Lines**

The SPI controller features six dedicated chip select lines (signal spi\_cs\_o from the processor's top entity) so you can directly connect up to six SPI slaves to the controller without using e.g., GPIO pins as chip select signals. The six lines are accessible via the USI\_CT\_SPICSx bits in the USART control register. Note that these bits need to be set ('1') in order to activate (set low) the corresponding chip select pin. Two functions (enable CS and disable CS) from the USART's driver functions library help to ease the access to those control register bits.

## 2.8. High-Precision Timer (TIMER)

A high-precision timer (VHDL component neo430\_timer.vhd) is required by many real-time applications. The included device features a simple but powerful module to generate an interrupt in specific time intervals. Besides selecting the main clock-based prescaler, a timer threshold can be configured to accomplish highly-accurate timing.

#### **Implementation Control**

By default, the timer is always synthesized. You can use the TIMER\_USE generic of the processor top entity (see cut-out below) to control implementation.

```
TIMER_USE => true, -- implement timer? (default=true)
```

#### Register Map

| Address | Name      | Bit(s) | Name         | R/W | Function                                     |
|---------|-----------|--------|--------------|-----|----------------------------------------------|
| 0xFFB0  | TMR_CT    | 0      | TMR_CT_EN    | R/W | Timer enable                                 |
|         |           | 1      | TMR_CT_ARST  | R/W | Automatic reset on timer match               |
|         |           | 2      | TMR_CT_IRQ   | R/W | IRQ enable                                   |
|         |           | 3      | TMR_CT_PRSC0 | R/W | Timer counter increment clock prescaler PRSC |
|         |           | 4      | TMR_CT_PRSC1 | R/W |                                              |
|         |           | 5      | TMR_CT_PRSC2 | R/W |                                              |
|         |           | 7..15  | -            | R/- | Reserved, read as 0                          |
| 0xFFB2  | TMR_CNT   | 0..15  |              | R/W | Counter register                             |
| 0xFFB4  | TMR_THRES | 0..15  |              | R/W | Threshold register, IRQ on match             |
| 0xFFB6  | -         | 0..15  | -            | R/- | Reserved                                     |

Table 19: High-precision timer address map



This unit needs to be reset by software before you can use it. The reset is performed by writing zero to the unit's control register.

#### **Timer Operation**

An exact timer period is configured using the clock select prescaler bits TMR\_CT\_PRSCx in the control register TMR\_CT and setting a timer threshold TMR\_THRES. Corresponding to the three prescaler selection bits, one of 8 different prescaler values can be selected:

| Prescaler bits configuration:     | 000 | 001 | 010 | 011 | 100 | 101  | 110  | 111  |
|-----------------------------------|-----|-----|-----|-----|-----|------|------|------|
| Resulting prescaler <i>PRSC</i> : | 2   | 4   | 8   | 64  | 128 | 1024 | 2048 | 4096 |

Based on the **PRSC** prescaler value and the **THRES** threshold value, the resulting interrupt "tick frequency" (reciprocal time between two interrupts) is given by:

$$f_{tick} = \frac{f_{main}}{PRSC \cdot (THRES + 1)}$$

**Example:** The desired tick frequency may be 2Hz at a main clock of 100MHz. Using the max. prescaler value (0b111  $\rightarrow$  PRSC = 4096), the threshold value is computed by:

$$THRES = \frac{f_{main}}{f_{tick} \cdot PRSC} - 1 = \frac{100000000 \, Hz}{2 \, Hz \cdot 4096} - 1 \approx 12206$$

After the timer is enabled via the TMR\_CT\_EN bit, the timer increments the counter register with the specified ( $\rightarrow$  prescaler bits) frequency. Whenever it reaches the value stored in the threshold register, an interrupt request is asserted (when enabled via the TMR\_CT\_IRQ bit). If the auto-reset bit TMR\_CT\_ARST is set, the counter register is cleared when the threshold value is reached and counting starts again. If not, the user has to clear the counter register manually within the interrupt service routine.
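
The threshold formula and the control register layout can be combined into two small helpers. This is a sketch under stated assumptions: the names `timer_thres` and `timer_ct_value` are hypothetical, the division is truncated to an integer, and the return value -1 is an arbitrary marker for a threshold that does not fit into the 16-bit register.

```c
#include <stdint.h>

/* Prescaler values for the TMR_CT_PRSCx selection 0b000..0b111 */
static const uint16_t TMR_PRSC[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* THRES = f_main / (f_tick * PRSC) - 1, using integer division.
 * Returns -1 if the tick frequency is zero or the result is out of range. */
int32_t timer_thres(uint32_t f_main_hz, uint32_t f_tick_hz, uint8_t prsc_sel) {
    if (f_tick_hz == 0u) {
        return -1;
    }
    uint32_t counts = f_main_hz / (f_tick_hz * (uint32_t)TMR_PRSC[prsc_sel & 7u]);
    if (counts == 0u || counts > 65536u) {
        return -1;  /* THRES would be negative or exceed 16 bits */
    }
    return (int32_t)(counts - 1u);
}

/* Builds a TMR_CT value: timer enabled, optional auto-reset and IRQ,
 * prescaler selection placed in bits 3..5. */
uint16_t timer_ct_value(uint8_t prsc_sel, int auto_reset, int irq_enable) {
    uint16_t ct = (uint16_t)(1u << 0);              /* TMR_CT_EN */
    if (auto_reset) {
        ct |= (uint16_t)(1u << 1);                  /* TMR_CT_ARST */
    }
    if (irq_enable) {
        ct |= (uint16_t)(1u << 2);                  /* TMR_CT_IRQ */
    }
    ct |= (uint16_t)((prsc_sel & 7u) << 3);         /* TMR_CT_PRSC0..2 */
    return ct;
}
```

For a 100MHz main clock, a 2Hz tick and PRSC = 4096, the division 100000000 / 8192 gives 12207 (truncated from 12207.03), so the helper returns a threshold of 12206.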

#### **Timer Interrupt**

When the TMR\_CT\_IRQ bit in the timer's control register is set, an interrupt request is generated whenever a counter match occurs (TMR\_THRES == TMR\_CNT). If the timer unit has been excluded from synthesis (disabled), the timer's interrupt request signal is always connected to 0.

| Address | IRQ Vector Name | Priority    | Source                                 |
|---------|-----------------|-------------|----------------------------------------|
| 0x8000  | IRQVEC_TIMER    | 1 (highest) | The timer generates a threshold match. |

Table 20: Timer interrupt vector

## 2.9. Watchdog Timer (WDT)

The WDT (VHDL component neo430\_wdt.vhd) implements a watchdog timer. When enabled, an internal 16-bit counter is started. A program can reset this counter at any time. If the counter is not reset, a system-wide hardware reset is executed when the timer overflows. The watchdog is enabled by setting the WDT\_ENABLE bit. Each write access to the watchdog must contain the watchdog access password (=0x47) in the upper 8 bits of the written data word. If the password is wrong and the watchdog is disabled, the access is simply ignored. If the password is wrong and the watchdog is enabled, a hardware reset is generated. A user can determine the cause of the last processor reset by reading the WDT\_RCAUSE bit. If the bit is set, the last reset was generated by a watchdog timeout. If the bit is cleared, the reset was generated via the external reset pin. You can find elementary functions for performing watchdog operations in the neo430\_wdt.h library.

#### **Timeout Configuration**

To control the timeout period, one can select 1 of 8 different timeout periods via the WDT\_CLKSELx bits:

| CLKSELx bits configuration:     | 000     | 001     | 010     | 011       | 100       | 101        | 110         | 111         |
|---------------------------------|---------|---------|---------|-----------|-----------|------------|-------------|-------------|
| CLK prescaler:                  | 2       | 4       | 8       | 64        | 128       | 1024       | 2048        | 4096        |
| Timeout period in clock cycles: | 131 072 | 262 144 | 524 288 | 4 194 304 | 8 388 608 | 67 108 864 | 134 217 728 | 268 435 456 |
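
The timeout values in the table equal the 16-bit counter range (65536) multiplied by the clock prescaler. The sketch below reproduces that relation and assembles a write word that carries the access password in the upper 8 bits. The function names `wdt_timeout_cycles` and `wdt_ct_word` are hypothetical, and the relation "65536 times prescaler" is inferred from the table rather than stated explicitly in the text.

```c
#include <stdint.h>

#define WDT_PASSWORD 0x47u        /* must be placed in the upper 8 bits of every write */
#define WDT_ENABLE   (1u << 3)    /* watchdog enable bit */

/* Clock prescaler for the WDT_CLKSELx selection 0b000..0b111 */
static const uint16_t WDT_PRSC[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Timeout period in main clock cycles: 16-bit counter range times prescaler. */
uint32_t wdt_timeout_cycles(uint8_t clksel) {
    return UINT32_C(65536) * (uint32_t)WDT_PRSC[clksel & 7u];
}

/* Builds the data word for a write access to WDT_CT. */
uint16_t wdt_ct_word(uint8_t clksel, int enable) {
    uint16_t word = (uint16_t)(WDT_PASSWORD << 8);  /* password in bits 8..15 */
    word |= (uint16_t)(clksel & 7u);                /* WDT_CLKSELx in bits 0..2 */
    if (enable) {
        word |= (uint16_t)WDT_ENABLE;
    }
    return word;
}
```

Dividing the cycle count by the main clock frequency gives the timeout in seconds; at 100MHz, the selection 0b111 corresponds to roughly 2.7 seconds.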

#### **Implementation Control**

Use the WDT\_USE generic of the processor top entity (see cut-out below) to control implementation. When disabled, none of the provided functions are available and the system can only be reset using the dedicated external reset signal.

```
WDT_USE => true, -- implement WDT? (default=true)
```

#### Register Map

| Address | Name   | Bit(s) | Name        | R/W | Function                                                             |
|---------|--------|--------|-------------|-----|----------------------------------------------------------------------|
| 0xFFD0  | WDT_CT | 0..2   | WDT_CLKSELx | R/W | Timeout interval selection                                           |
|         |        | 3      | WDT_ENABLE  | R/W | Watchdog enable bit                                                  |
|         |        | 4      | WDT_RCAUSE  | R/- | Cause of last processor reset (0=external reset, 1=watchdog timeout) |
|         |        | 5..15  | -           | R/- | Reserved, read as 0                                                  |

Table 21: Watchdog timer address map

## 2.10. System Configuration Module (SYSCONFIG)

The system information module (VHDL component neo430\_sysconfig.vhd) gives access to various system information, which is mainly defined by the generics of the processor's top entity.

The module, which is implemented as a simple ROM with eight 16-bit locations, provides information regarding the processor hardware configuration. By accessing this component, a program can determine the available RAM space, check if specific instructions or hardware modules are implemented and compute timings (like the UART Baud rate) based on the actual clock speed during run time. Also, a custom user code can be read. All these parameters are set using the configuration generics of the NEO430 top entity during instantiation / synthesis.

| Address | Name   | Bit(s) | Name          | R/W | Function                                               |
|---------|--------|--------|---------------|-----|--------------------------------------------------------|
| 0xFFF0  | CPUID0 | 0..15  | HW_VERSION    | R/- | Hardware version                                       |
| 0xFFF2  | CPUID1 | 0      | SYS_MULDIV_EN | R/- | Set if MULDIV is implemented                           |
|         |        | 1      | SYS_WB32_EN   | R/- | Set if WB32 is implemented                             |
|         |        | 2      | SYS_WDT_EN    | R/- | Set if <b>WDT</b> is implemented                       |
|         |        | 3      | SYS_GPIO_EN   | R/- | Set if <b>GPIO</b> is implemented                      |
|         |        | 4      | SYS_TIMER_EN  | R/- | Set if <b>TIMER</b> is implemented                     |
|         |        | 5      | SYS_USART_EN  | R/- | Set if USART is implemented                            |
|         |        | 6      | SYS_DADD_EN   | R/- | Set if <b>DADD instruction</b> (in CPU) is implemented |
|         |        | 7      | SYS_BTLD_EN   | R/- | Set if <b>bootloader</b> is implemented and used       |
|         |        | 8      | SYS_IROM_EN   | R/- | Set if IMEM is implemented as true ROM                 |
|         |        | 9      | SYS_CRC_EN    | R/- | Set if CRC unit is implemented                         |
|         |        | 10     | SYS_CFU_EN    | R/- | Set if CFU is implemented                              |
|         |        | 11     | SYS_PWM_EN    | R/- | Set if <b>PWM controller</b> is implemented            |
|         |        | 12     | SYS_TRNG_EN   | R/- | Set if <b>TRNG</b> is implemented                      |
|         |        | 13..15 | reserved      | R/- | Reserved, read as 0                                    |
| 0xFFF4  | CPUID2 | 0..15  | USER_CODE     | R/- | Custom user code, defined via top's generic            |
| 0xFFF6  | CPUID3 | 0..15  | IMEM_SIZE     | R/- | Size of IMEM in bytes                                  |
| 0xFFF8  | CPUID4 | 0..15  | DMEM_BASE     | R/- | Base address of DMEM                                   |
| 0xFFFA  | CPUID5 | 0..15  | DMEM_SIZE     | R/- | Size of DMEM in bytes                                  |
| 0xFFFC  | CPUID6 | 0..15  | CLOCKSPEED_LO | R/- | Low word of clock speed (in Hz)                        |
| 0xFFFE  | CPUID7 | 0..15  | CLOCKSPEED_HI | R/- | High word of clock speed (in Hz)                       |

Table 22: System information memory address map

The information provided by the system information ROM is used by the bootloader to perform a system initialization (configure Baud rate, setup the timer interval, check connectivity, ...). Furthermore, the application start-up code (crt0.asm) uses the system information ROM for the minimal-required hardware setup.
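
A program can evaluate the system information words with a few bit operations. The sketch below works on values that have already been read from CPUID1, CPUID6 and CPUID7; the function names `sys_has_feature` and `sys_clock_hz` and the enum are hypothetical, while the bit positions follow the address map above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions within CPUID1, taken from the system information address map */
enum sys_feature_bit {
    SYS_MULDIV_EN = 0, SYS_WB32_EN  = 1,  SYS_WDT_EN  = 2,  SYS_GPIO_EN = 3,
    SYS_TIMER_EN  = 4, SYS_USART_EN = 5,  SYS_DADD_EN = 6,  SYS_BTLD_EN = 7,
    SYS_IROM_EN   = 8, SYS_CRC_EN   = 9,  SYS_CFU_EN  = 10, SYS_PWM_EN  = 11,
    SYS_TRNG_EN   = 12
};

/* Returns true if the given feature bit is set in the CPUID1 word. */
bool sys_has_feature(uint16_t cpuid1, enum sys_feature_bit bit) {
    return ((cpuid1 >> bit) & 1u) != 0u;
}

/* Combines CLOCKSPEED_LO (CPUID6) and CLOCKSPEED_HI (CPUID7) into the clock speed in Hz. */
uint32_t sys_clock_hz(uint16_t clockspeed_lo, uint16_t clockspeed_hi) {
    return ((uint32_t)clockspeed_hi << 16) | clockspeed_lo;
}
```

For a 50MHz system, CPUID7 holds 0x02FA and CPUID6 holds 0xF080, since 50000000 equals 0x02FAF080.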

## 2.11. Multiplier and Divider Unit (MULDIV)

By default, the NEO430 processor includes a serial unsigned multiplier and divider unit (VHDL component neo430\_muldiv.vhd). This unit computes 16-bit divisions (with remainder) and 16x16-bit = 32-bit multiplications much faster than a software-only implementation. This accelerator is NOT compatible with the MSP430-GCC compiler, hence it cannot be used by simply writing general multiplications and divisions in software. Instead, the accelerator has to be accessed via the specific functions provided by the neo430\_muldiv.h library. The multiplier/divider unit only performs unsigned operations. If signed operations are required, the provided library functions take care of the necessary conversions.

#### **Implementation Control**

In case you do not need the multiply and divide unit, you can use the MULDIV\_USE generic of the processor top entity (see cut-out below) to control implementation.

```
MULDIV_USE => true, -- implement multiplier/divider unit? (default=true)
```

#### **Operation**

The first operand (first factor for multiplication or the dividend for division) is always written to the MULDIV\_OPA register. When writing the second operand (here, the divisor) to the MULDIV\_OPB\_DIV register, a division is triggered. When writing the second operand (here, the second factor) to the MULDIV\_OPB\_MUL register, a multiplication is triggered. The according result (32-bit product or the remainder and the quotient) can be read from the MULDIV\_RESX and MULDIV\_RESY registers. The hardware operation itself needs 18 CPU cycles to generate the requested results.
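
To make the mapping of results to the two result registers explicit, the following host-side reference model computes what MULDIV\_RESX and MULDIV\_RESY are expected to hold after each operation. It is a sketch for checking values, not a hardware driver: the type `muldiv_result_t` and both function names are hypothetical, and the result for a zero divisor is a placeholder because the hardware behavior for that case is not described here.

```c
#include <stdint.h>

/* Expected contents of the two result registers after an operation */
typedef struct {
    uint16_t resx;  /* MULDIV_RESX: low product word or quotient */
    uint16_t resy;  /* MULDIV_RESY: high product word or remainder */
} muldiv_result_t;

/* Unsigned 16x16-bit multiplication with a 32-bit product. */
muldiv_result_t muldiv_model_mul(uint16_t factor_a, uint16_t factor_b) {
    uint32_t product = (uint32_t)factor_a * (uint32_t)factor_b;
    muldiv_result_t r;
    r.resx = (uint16_t)(product & 0xFFFFu);
    r.resy = (uint16_t)(product >> 16);
    return r;
}

/* Unsigned 16-bit division with remainder. */
muldiv_result_t muldiv_model_div(uint16_t dividend, uint16_t divisor) {
    muldiv_result_t r = {0u, 0u};
    if (divisor == 0u) {
        return r;  /* placeholder: hardware result for division by zero is not specified here */
    }
    r.resx = (uint16_t)(dividend / divisor);
    r.resy = (uint16_t)(dividend % divisor);
    return r;
}
```

On the target, the same results would be obtained by writing the operands to MULDIV\_OPA and MULDIV\_OPB\_MUL or MULDIV\_OPB\_DIV and reading both result registers once the operation has finished.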

#### **Register Map**

| Address | Name           | Bit(s) | R/W | Function                                                                |
|---------|----------------|--------|-----|-------------------------------------------------------------------------|
| 0xFF80  | MULDIV_OPA     | 0..15  | -/W | First operand (dividend for divisions, first factor for multiplication) |
| 0xFF82  | MULDIV_OPB_DIV | 0..15  | -/W | Divisor, write access triggers the actual division                      |
| 0xFF84  | MULDIV_OPB_MUL | 0..15  | -/W | Second factor, write access triggers the actual multiplication          |
| 0xFF8C  | MULDIV_RESX    | 0..15  | R/- | Low word of the multiplication product or the division's quotient       |
| 0xFF8E  | MULDIV_RESY    | 0..15  | R/- | High word of the multiplication product or the division's remainder     |

Table 23: Multiplier/divider unit address map

## 2.12. Cyclic Redundancy Checksum Computation Unit (CRC)

Ever been in need to verify a data stream? Then the CRC unit (VHDL component neo430\_crc.vhd) is just right for you! This unit implements a 32-bit shift register, a 32-bit XOR mask and an 8-bit data input shift register, which together allow computing any CRC16 or CRC32 checksum. The unit operates on chunks of 8-bit input data and can compute the programmed checksum very quickly. Furthermore, the start value of the internal computation shift register can be set in order to specify custom init values.

#### **Implementation Control**

In case you do not need the checksum computation unit, you can use the CRC\_USE generic of the processor top entity (see cut-out below) to control implementation.

```
CRC_USE => true, -- implement CRC unit? (default=true)
```

#### **Operation**

At first, the actual polynomial must be written as the corresponding XOR mask to the CRC\_POLY\_LO and CRC\_POLY\_HI registers. If you are using a CRC16 checksum, you only need to configure the lower polynomial register. After that, you can specify an initial seed for the internal 32-bit CRC shift register (CRC\_RESX and CRC\_RESY). In most cases the start value is set to zero. After the initial configuration, new input data in chunks of 8 bits (so only the lowest 8 bits are used) can be written to the CRC\_CRC16IN register for 16-bit CRC computations or to the CRC\_CRC32IN register for 32-bit CRC computations. The final result can be obtained from the CRC\_RESX register for 16-bit CRC computations and from the CRC\_RESX (low part) and CRC\_RESY (high part) registers for 32-bit CRC computations. The provided neo430\_crc.h library features some of the most commonly required CRC computation functions, which also allow an easy and hardware-abstract handling of the CRC unit.

#### **Register Map**

| Address | Name        | Bit(s) | R/W | Function                                                        |
|---------|-------------|--------|-----|-----------------------------------------------------------------|
| 0xFFC0  | CRC_POLY_LO | 0..15  | -/W | Low 16-bit of the polynomial XOR mask                           |
| 0xFFC2  | CRC_POLY_HI | 0..15  | -/W | High 16-bit of the polynomial XOR mask                          |
| 0xFFC4  | CRC_CRC16IN | 0..7   | -/W | 8-bit input data for 16-bit CRC computation + operation trigger |
| 0xFFC6  | CRC_CRC32IN | 0..7   | -/W | 8-bit input data for 32-bit CRC computation + operation trigger |
| 0xFFC8  | _           | 0..15  | -/- | reserved                                                        |
| 0xFFCA  | _           | 0..15  | -/- | reserved                                                        |
| 0xFFCC  | CRC_RESX    | 0..15  | R/W | CRC shift register (result / init value) low part               |
| 0xFFCE  | CRC_RESY    | 0..15  | R/W | CRC shift register (result / init value) high part              |

Table 24: CRC unit address map
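To cross-check results read from the CRC unit, a bitwise software reference can be useful. The C sketch below is such a reference under stated assumptions: it processes each input byte MSB-first (non-reflected) and applies no final XOR. Whether the hardware uses exactly this bit order is not stated in this section, so treat the function (whose name is hypothetical) as a model to compare against, not as a description of the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC reference model.
 * width: 16 or 32, poly: polynomial XOR mask, init: start value.
 * Assumption: MSB-first (non-reflected) processing, no final XOR. */
uint32_t crc_model(unsigned width, uint32_t poly, uint32_t init,
                   const uint8_t *data, size_t len)
{
    assert(width == 16 || width == 32);
    const uint32_t top  = (uint32_t)1 << (width - 1);
    const uint32_t mask = (width == 32) ? 0xFFFFFFFFu : 0xFFFFu;
    uint32_t crc = init & mask;

    for (size_t i = 0; i < len; i++) {
        /* feed one 8-bit chunk into the top of the shift register */
        crc ^= (uint32_t)data[i] << (width - 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & top)
                crc = ((crc << 1) ^ poly) & mask;
            else
                crc = (crc << 1) & mask;
        }
    }
    return crc;
}
```

With the polynomial mask 0x1021 and the start value 0xFFFF, the ASCII string "123456789" yields 0x29B1, which is the published check value of the CRC-16/CCITT-FALSE parameter set.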

## 2.13. Custom Functions Unit (CFU)

The custom functions unit (VHDL component neo430\_cfu.vhd) is dedicated for user-defined processor extensions. In contrast to specific hardware accelerators connected to the Wishbone bus interface, the CFU allows the implementation of low-latency and tightly-coupled hardware extensions.

#### **Implementation Control**

In case you want to use the custom functions unit to implement user-defined hardware extensions, use the CFU\_USE generic of the processor top entity (see cut-out below) to control implementation. By default, the CFU will **NOT** be synthesized since the provided CFU template does not implement any "useful" operations. It is up to you to implement them;)

```
CFU_USE => false, -- implement custom functions unit? (default=false)
```

#### **Operation**

From a software point of view, the CFU implements eight 16-bit wide registers that can be used for writing and reading data. These registers can only be accessed using full 16-bit-word read/write transfers. The default CFU from the project's rtl folder implements these 8 registers with no additional computation logic. Therefore, this CFU behaves like a simple register file.

If you want to implement custom functions, for instance some kind of crypto computations, the actual computations have to be made based on data that is available via one of the 8 CFU register addresses. Take a look at the other processor modules like the GPIO controller or the Timer to get an idea on how to use the register interface. Also make sure to get familiar with the CPU bus protocol, introduced at the beginning of this chapter.

#### **Register Map**

| Address | Name     | Bit(s) | R/W | Function            |
|---------|----------|--------|-----|---------------------|
| 0xFFD0  | CFU_REG0 | 0..15  | R/W | CFU user register 0 |
| 0xFFD2  | CFU_REG1 | 0..15  | R/W | CFU user register 1 |
| 0xFFD4  | CFU_REG2 | 0..15  | R/W | CFU user register 2 |
| 0xFFD6  | CFU_REG3 | 0..15  | R/W | CFU user register 3 |
| 0xFFD8  | CFU_REG4 | 0..15  | R/W | CFU user register 4 |
| 0xFFDA  | CFU_REG5 | 0..15  | R/W | CFU user register 5 |
| 0xFFDC  | CFU_REG6 | 0..15  | R/W | CFU user register 6 |
| 0xFFDE  | CFU_REG7 | 0..15  | R/W | CFU user register 7 |

Table 25: Custom functions unit address map

## 2.14. Pulse Width Modulation Controller (PWM)

The PWM controller (VHDL component neo430\_pwm.vhd) implements a very simple 3-channel pulse width modulation controller with 8-bit resolution per channel. It is based on an 8-bit counter with three programmable threshold comparators. Thus, the duty cycle of the channels is modified in order to generate an analog signal. The output channels are available in the processor's top entity via the pwm\_o port. The neo430\_pwm.h library provides basic functions for using the PWM controller. The controller can be used to drive a fancy RGB-LED, providing 24-bit true color, to dim LCD backlight or even for motor control. An external integrator (e.g., an RC low-pass filter) should be used to smooth the generated "analog" signals.

#### **Implementation Control**

If you do not need the processor-internal PWM controller (maybe you have attached a far more complex one to the Wishbone bus), you can exclude it from synthesis using the PWM\_USE generic of the processor top entity (see cut-out below).

#### **Operation**

The PWM controller is activated by setting the PWM\_CT\_ENABLE bit in the module's control register PWM\_CT. When this flag is cleared, the internal counter is reset and all output channels are set to zero.

The 8-bit duty cycle for each channel, which represents the channel's "intensity", can be specified via the according  $PWM\_CHx$  register. Based on the duty cycles, the according analog output voltage (relative to the IO supply voltage) of each channel can be computed by the formula below. Here, R denotes the internal PWM resolution (default R=8).

$$\text{Intensity}_x = \frac{\text{PWM\_CHx}}{2^R} \cdot 100\,\% \qquad \text{for channel } x \in \{0, 1, 2\}$$

The frequency of the generated PWM signals is defined by the resolution R of the controller and the PWM operating frequency. The controller can operate either in fast or in slow PWM mode. In the fast PWM mode, the internal counter is updated using half of the processor's main clock ( $f_{main}$ ). When in slow PWM mode, the controller updates its internal counter using the processor main clock divided by 2048. The actual mode is configured via the control register's PWM\_CT\_FMODE flag. The two equations below show the according PWM frequencies.

$$f_{fastPWM} = \frac{f_{main}}{2^R * 2} \qquad f_{slowPWM} = \frac{f_{main}}{2^R * 2048}$$

The fast PWM mode allows very fast response to changes in the duty cycles but also introduces high-frequency harmonics on the output channels and requires a lot of dynamic power. In contrast, the slow PWM mode increases the response time but also reduces high-frequency harmonics and dynamic power consumption.
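The intensity and frequency formulas above are easy to evaluate for a given clock. The following C sketch does this on a host machine; the function names are hypothetical and the code is only a calculator for the equations in this section, not part of the neo430\_pwm.h library.

```c
#include <assert.h>
#include <stdint.h>

/* Duty cycle ("intensity") in percent: PWM_CHx / 2^R * 100 */
double pwm_intensity_percent(uint16_t ch_value, unsigned r_bits)
{
    assert(r_bits >= 1 && r_bits <= 16);
    return 100.0 * (double)ch_value / (double)(1ul << r_bits);
}

/* Fast PWM mode: f_main / (2^R * 2) */
double pwm_fast_hz(double f_main_hz, unsigned r_bits)
{
    assert(r_bits >= 1 && r_bits <= 16);
    return f_main_hz / ((double)(1ul << r_bits) * 2.0);
}

/* Slow PWM mode: f_main / (2^R * 2048) */
double pwm_slow_hz(double f_main_hz, unsigned r_bits)
{
    assert(r_bits >= 1 && r_bits <= 16);
    return f_main_hz / ((double)(1ul << r_bits) * 2048.0);
}
```

For a 100 MHz main clock and the default resolution R=8, this gives 195312.5 Hz in fast mode and roughly 190.7 Hz in slow mode. A channel value of 128 corresponds to an intensity of 50 %.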

#### **Register Map**

Note that all configuration registers of the PWM controller are write-only.

| Address | Name    | Bit(s) | Bit name      | R/W | Function                                 |
|---------|---------|--------|---------------|-----|------------------------------------------|
| 0xFFE0  | PWM_CT  | 0      | PWM_CT_ENABLE | -/W | Enable (activate) PWM controller         |
|         |         | 1      | PWM_CT_MODE   | -/W | Fast PWM mode ('1'), slow PWM mode ('0') |
|         |         | 2..15  | reserved      | R/- | Reserved, read as 0                      |
| 0xFFE2  | PWM_CH0 | 0..7   | PWM_CH0       | -/W | Duty cycle for channel 0                 |
| 0xFFE4  | PWM_CH1 | 0..7   | PWM_CH1       | -/W | Duty cycle for channel 1                 |
| 0xFFE6  | PWM_CH2 | 0..7   | PWM_CH2       | -/W | Duty cycle for channel 2                 |

Table 26: PWM controller address map

### **Changing PWM Resolution**

If you need more/less resolution and/or if you want to decrease/increase the PWM output frequency, the counter width of the internal PWM generating counters can be modified. Just edit the pwm\_resolution\_c constant in the neo430\_pwm.vhd file according to your needs.

```
-- ADVANCED user configuration ------
constant pwm_resolution_c : natural := 8; -- pwm resolution in bits (max 16, default=8)
```

#### Just as a summary:

- Higher resolution → lower PWM frequency, higher dynamic power consumption
- Lower resolution → higher PWM frequency, less dynamic power consumption

## 2.15. True Random Number Generator (TRNG)

The true random number generator (VHDL component neo430\_trng.vhd) is a very cool piece of logic as it generates *real* random numbers. In contrast to other RNGs, like LFSRs (linear feedback shift registers), a TRNG does not generate pseudo random numbers based on some (very long) sequence. In a TRNG, the fundamental source of randomness is a truly physical one, making it suitable even for some cryptography applications.

The neo430\_trng.h library provides basic functions for using the TRNG. Take a look at the example folder to see how you can use the TRNG in software.

#### **Implementation Control**

By default, the TRNG will not be synthesized, since there is a chance – a very little chance – that some synthesis tools might have issues dealing with an asynchronous element or a combinatorial loop, respectively. However, if you want to include the TRNG into your design, you can enable synthesis by using the TRNG\_USE generic of the processor top entity (see cut-out below).

```
TRNG_USE => true, -- implement true random number generator? (default=false)
```

#### Warnings during Synthesis

When the TRNG is enabled for synthesis, most EDA tools will at least issue some kind of warning regarding latches and/or combinatorial loops in the TRNG unit. This is **absolutely OK** <u>here</u>, since this is what we really want;)

#### **Operation**

The TRNG is built from several oscillators. The outputs of all of them are registered twice (to eliminate metastability in the following logic) and XOR-ed to compute a single bit of randomness. This single bit is sampled in 8 consecutive clock cycles to produce one final random data byte. Each oscillator consists of a latch and a single inverter. No area constraints or VHDL attributes are required for implementation, which makes the design FPGA platform independent. Basically, the inverter's output is fed back to its input via the latch. The latch features a preload function to initialize the oscillator and a "clock signal" (well, actually an "enable" signal) to activate or deactivate the oscillator. Thus, you can still do a simulation when the TRNG is deactivated in software, because the combinatorial loop is interrupted. Furthermore, this still allows static timing analysis and also eases the synthesis process.

The actual frequencies of the oscillators are defined by many different factors: Position of the oscillators on the FPGA, routing of the feedback paths, core voltage, chip temperature, mechanical stress, semiconductor manufacturing fluctuations, ... Hence, the actual frequencies cannot be determined, which makes them a nice source of entropy.

The TRNG is activated by setting the TRNG\_EN bit in the module's control register TRNG\_CT. When this flag is cleared, all oscillators are shut down and the output random data stops updating. The generated random data (bytes) can be read from the TRNG\_DATA register.
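The way one random byte is assembled (XOR of all oscillator outputs, sampled over 8 consecutive clock cycles) can be sketched as a small host-side model. In the following C code, everything is hypothetical: the callback type stands in for the registered oscillator outputs, the demo source replays fixed bit patterns instead of real entropy, and the MSB-first shift order is an assumption, since the bit order is not stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: returns the registered output bit (0 or 1)
 * of oscillator 'idx' in clock cycle 'cycle'. */
typedef unsigned (*osc_sample_fn)(unsigned idx, unsigned cycle, void *ctx);

/* Assemble one data byte from 8 consecutive single-bit samples. */
uint8_t trng_model_byte(osc_sample_fn sample, void *ctx, unsigned num_osc)
{
    assert(sample != 0 && num_osc > 0);
    uint8_t byte = 0;
    for (unsigned cycle = 0; cycle < 8; cycle++) {
        unsigned bit = 0;
        for (unsigned i = 0; i < num_osc; i++)
            bit ^= sample(i, cycle, ctx) & 1u;   /* XOR all oscillators */
        byte = (uint8_t)((byte << 1) | bit);     /* assumption: MSB first */
    }
    return byte;
}

/* Deterministic stand-in for the oscillators (NOT random): oscillator
 * 'idx' replays the bits of patterns[idx], MSB first. */
unsigned demo_pattern_source(unsigned idx, unsigned cycle, void *ctx)
{
    const uint8_t *patterns = (const uint8_t *)ctx;
    return (patterns[idx] >> (7u - cycle)) & 1u;
}
```

With the demo source, the resulting byte is simply the XOR of all replayed patterns, which makes the model easy to check by hand.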

#### **Register Map**

Note that the same address is used for the control register (when doing a write access) and the random data register (when doing a read access).

| Address | Name      | Bit(s) | Bit name       | R/W | Function               |
|---------|-----------|--------|----------------|-----|------------------------|
| 0xFFE8  | TRNG_CT   | 0      | TRNG_CT_ENABLE | -/W | Enable (activate) TRNG |
|         |           | 1..15  | reserved       | -/- | Reserved               |
| 0xFFE8  | TRNG_DATA | 0..7   | TRNG_DATA      | R/- | Random data            |
|         |           | 8..15  | reserved       | R/- | Reserved, read as 0    |

Table 27: TRNG address map

#### Changing the "Quality" of Randomness

If you are not satisfied with the quality of the random numbers (e.g., they are not equally distributed), you can try to experiment with the number of used oscillators. By using more oscillators, more sources of entropy can be implemented.

```
-- ADVANCED user configuration ------
constant num_trngs_c : natural := 4; -- number of random generators (default = 4)
```

#### **Some Considerations**

Be aware that some "issues" might arise when using the TRNG:

- The oscillators require "a lot" of power, since they oscillate at very high frequencies
- The additional power might heat up some very small portion of the FPGA
- Due to the high frequencies, additional noise (EM radiation) might occur
- Hence, the TRNG should be deactivated when not being used

## 3. Software Architecture

Software development for the NEO430 is based on the freely-available **TI msp430-gcc compiler toolchain**, which can be downloaded from (use the "compiler only" package):

<a href="http://software-dl.ti.com/msp430/msp430_public_sw/mcu/msp430/MSPGCC/latest/index_FDS.html">http://software-dl.ti.com/msp430/msp430_public_sw/mcu/msp430/MSPGCC/latest/index_FDS.html</a>

With the compiler tool chain, you can turn your C/C++ programs into a NEO430 executable. A batch file or a Linux/Cygwin makefile (compile.bat or makefile) in the sw/common folder helps to do this job. Generating an executable is done in several consecutive steps (all done by the compilation scripts):

- 1. The application start-up code (crt0.asm) is assembled into an object file. This start-up code performs the minimal required hardware initialization.
- 2. The actual application program is compiled together with all included files and libraries. The code is optimized for size (**-Os**) by default.
- 3. In the next step, all generated object files are linked together using the special NEO430 linker script (neo430\_linker\_script.x). This specific linker script generates a final object file that already represents the actual memory layout of the NEO430. Also, an ASM listing file is generated (main.s) for debugging.
- 4. The actual program image is generated.
- 5. In the last step, the program image is converted into a NEO430 executable binary (main.bin). This file can be uploaded and executed by the NEO430 bootloader. Additionally, an executable VHDL memory initialization image for the IMEM is generated and directly installed into the neo430\_application\_image.vhd file, so no manual copy is required.

The last step is done by a small C program, which is located in the sw/tools/image\_gen folder. A precompiled EXE file is available (it was built for a 64-bit Windows machine). If you are using Linux or Cygwin as build-environment, the make process will automatically recompile the image generator.



The size of the final executable, which is printed in your console by the make script, only represents the size of the executable image. Additional RAM is required for allocating dynamic memory for the stack and the heap (actual size depends on the application program).

#### 3.1. Executable Program Image

As the last step of the program compilation flow, the NEO430 executables are generated. The binary version can be uploaded to the processor to be directly executed and/or programmed into an attached SPI flash or EEPROM. The executable VHDL IMEM memory initialization data is directly inserted into the processor's IMEM image VHDL file. The compilation script uses a specific linker script to generate the final image:



Figure 7: Construction of the final program image

#### 3.1.1. Image Sections

The final executable image consists of the following three sections:

- .text: Executable instructions, including start-up, application and termination code
- .rodata: Read-only data (constants like strings)
- .data: Pre-initialized variables (will be copied into RAM during start-up)

#### 3.1.2. Dynamic Memory

The remaining memory – the memory after the .data section until the end of the RAM – is used for the dynamic data during run time. This data includes the stack and the heap. The stack grows from the end of the memory down to the end of the .data section. The heap grows from the end of the .data section up to the end of the memory. Make sure there is no collision between the heap and the stack when using dynamic memory allocation!

#### 3.1.3. Application Start-Up Code

During the linking process, the application start-up code crt0.asm is placed right before the actual application. The resulting code represents the application's .text segment and thus the final executable. The start-up code implements a basic system setup:

- Set up the stack pointer according to the memory size/layout configurations from the CPUID registers
- Set all IO device registers (including the interrupt vectors) to 0x0000
- Clear complete DMEM, including .bss segment, copy the .data section from IMEM to DMEM
- Initialize all CPU data registers to zero
- <u>Call</u> the application's main function
- If the main function returns, the watchdog timer is deactivated, interrupts are disabled and the CPU is set to eternal sleep mode

#### 3.1.4. Executable Image Formats

The specific image generator program (sw\tools\image\_gen) is used to either create an executable binary or an executable VHDL memory initialization image. The actual conversion target is given by the first argument when calling the image generator. Valid target options are listed below. The second argument determines the input file and the third argument specifies the output file.

| Option   | Description                                                                                                                                        |
|----------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| -app_bin | Generates an executable binary "main.bin" (for UART uploading via the bootloader) in the project's folder (including a file header!!!)             |
| -app_img | Generates an executable VHDL memory initialization image for the IMEM. This function is meant to generate the "neo430_application_image.vhd" file. |
| -bld_img | Generates an executable VHDL memory initialization image for the DMEM. This function is meant to generate the "neo430_bootloader_image.vhd" file.  |

There is a special thing about the binary executable format: This executable version has a very small header consisting of three 16-bit words located right at the beginning of the file. The first word is the signature word and is always <code>0xCAFE</code>. Based on this word, the bootloader can identify a valid image file. The next word represents the size in bytes of the program image (so this value is always 6 bytes less than the actual file size). A simple XOR checksum of the program data is given by the third word. This checksum is computed by XOR-ing all program data words (no header data!) of the program image. Below you can see an exemplary binary executable. Its header bytes are CA FE (signature), 01 7A (image size, 378 bytes) and 19 59 (checksum).

```
CA FE 01 7A 19 59 43 03 42 18 FF E8 42 19 FF EA 58 09 43 02 49 01 83 21 43 82
...
```

Hex-view of an executable binary image; the first six bytes form the header
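The header layout described above (signature, size in bytes, XOR checksum) can be reproduced with a few lines of C. The sketch below is not the project's image generator; the function and macro names are hypothetical, and the big-endian byte order of the header words is an assumption read off the hex view, where the signature appears as the bytes CA FE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_SIGNATURE 0xCAFEu /* signature word, as stated in the text */

/* XOR checksum over all program data words (header excluded). */
uint16_t image_xor_checksum(const uint16_t *words, size_t num_words)
{
    uint16_t chk = 0;
    for (size_t i = 0; i < num_words; i++)
        chk ^= words[i];
    return chk;
}

/* Fill the 6-byte header for a program of 'num_words' 16-bit words.
 * Assumption: each header word is stored high byte first. */
void image_write_header(uint8_t hdr[6], const uint16_t *words, size_t num_words)
{
    assert(num_words <= 0xFFFFu / 2u); /* size in bytes must fit in 16 bits */
    uint16_t size_bytes = (uint16_t)(num_words * 2u);
    uint16_t chk = image_xor_checksum(words, num_words);

    hdr[0] = (uint8_t)(IMAGE_SIGNATURE >> 8);
    hdr[1] = (uint8_t)(IMAGE_SIGNATURE & 0xFFu);
    hdr[2] = (uint8_t)(size_bytes >> 8);
    hdr[3] = (uint8_t)(size_bytes & 0xFFu);
    hdr[4] = (uint8_t)(chk >> 8);
    hdr[5] = (uint8_t)(chk & 0xFFu);
}
```

For a program consisting of the three words 0x4303, 0x4218 and 0xFFE8, this produces the header bytes CA FE 00 06 FE F3.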

#### 3.2. Internal Bootloader



The bootloader requires at least the TIMER and the USART units to be included into the design! The GPIO unit is optional, since it is used just for status indication.

The included bootloader of the NEO430 processor allows you to upload new program images at any time. If you have an external SPI EEPROM connected to the processor, you can store this image to this device and the system can directly boot it after reset without any user interaction. But we will talk about that later...

To interact with the bootloader, attach the UART signals of the processor via a COM port (or adapter) to a computer, configure your terminal program using the following settings and perform a reset of the processor.

Terminal console settings (19200-8-N-1):

- 19200 Baud
- 8 data bits
- No parity bit
- 1 stop bit
- Newline on "\r\n" (carriage return, newline)
- No transfer protocol for sending data, just the raw byte stuff;)

The bootloader uses bit #0 of the GPIO output port as high-active status LED (all other outputs are set to low level by the bootloader). After reset, this LED will start blinking at ~2Hz and the following intro screen should show up in your terminal:

```
NEO430 Bootloader V20180424 by Stephan Nolting
HWV: 0x0180
CLK: 0x05F5E100
ROM: 0x1000
RAM: 0x0800
SYS: 0x1AFF
Autoboot in 8s. Press key to abort.
```

NEO430 bootloader start-up screen

This start-up screen gives some brief information about the bootloader version and several system parameters (all in hexadecimal representation):

- **HWV**: Hardware version
- **CLK**: Clock speed in Hz
- **ROM**: Size of internal IMEM in bytes
- **RAM**: Size of internal DMEM in bytes
- **SYS**: System features (synthesized modules)
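Since all parameters are printed in hexadecimal, a small helper can be handy when logging the start-up screen on the host side. The following C sketch (the function name is hypothetical) parses one `KEY: 0x...` line using the standard `strtoul` function and returns the value as a number.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse a line of the form "KEY: 0x1234".
 * Returns 1 and stores the number in *value on success, 0 otherwise. */
int bootinfo_parse(const char *line, const char *key, unsigned long *value)
{
    assert(line != NULL && key != NULL && value != NULL);
    size_t klen = strlen(key);

    /* the line must start with the key followed by a colon */
    if (strncmp(line, key, klen) != 0 || line[klen] != ':')
        return 0;

    const char *start = line + klen + 1;
    char *end;
    /* base 16: leading blanks and an optional "0x" prefix are accepted */
    unsigned long v = strtoul(start, &end, 16);
    if (end == start)
        return 0; /* no hex digits found */

    *value = v;
    return 1;
}
```

Applied to the screen above, the CLK line decodes to 100000000 Hz (100 MHz), the ROM line to 4096 bytes and the RAM line to 2048 bytes.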

Now you have 8 seconds to press any key. Otherwise, the bootloader starts the auto boot sequence (see next chapter).

When you press any key within the 8 seconds, the actual bootloader user console starts:

```
NEO430 Bootloader V20180424 by Stephan Nolting
HWV: 0x0180
CLK: 0x05F5E100
ROM: 0x1000
RAM: 0x0800
SYS: 0x1AFF
Autoboot in 8s. Press key to abort.
Aborted.
Commands:
 d: Dump MEM
 e: Load EEPROM
 h: Help
 p: Store EEPROM
 r: Restart
 s: Start app
 u: Upload
CMD:>
```

NEO430 bootloader console after pressing a key

The auto-boot countdown is stopped and now you can enter a command from the list to perform the corresponding operation:

- d: Core dump of full address space (can be aborted at any time by pressing any key)
- e: Load application image from SPI EEPROM (at SPI.CS0) into IMEM
- h: Show the help text (again)
- p: Store the complete IMEM content as boot image to the SPI EEPROM (at SPI.CS0)
- r: Restart the bootloader
- s: Start the application, which is currently in IMEM
- u: Upload new program executable image (raw \*.bin file) via UART into the IMEM

A new program is uploaded to the NEO430 by using the upload function. The compile scripts of this project generate a compatible binary executable (\*.bin format), which must be transmitted by your terminal program without using any kind of protocol – just raw data.

When the image is completely uploaded, it resides in the IMEM and you can start executing it using the "Start app" option. If you want to take a look at the whole address space, perform a "Core dump". This is very useful for simple debugging or if you just want to see what's going on.

The complete content of the IMEM can be stored in an external SPI EEPROM (program it via "Store EEPROM"). The bootloader can copy the image from the EEPROM at start up and automatically launch it. Of course, you can also load it manually using the "Load EEPROM" option.



When the bootloader is implemented (enabled via the BOOTLD\_USE generic) the IMEM is not initialized by the bitstream at all. This allows a mapping of the IMEM to primitives, that cannot be initialized during bitstream upload.


#### 3.2.1. Auto Boot Sequence

When you reset the NEO430 processor, the bootloader waits 8 seconds for a console user input before it starts the automatic boot sequence. This sequence tries to fetch a valid boot image from the external SPI EEPROM, connected to SPI chip select bit #0. If a valid boot image is found and can be successfully transferred (at 1/1024 of the processor clock) into the internal IMEM, it is automatically started. If no SPI EEPROM was detected or if there was no valid boot image found, the bootloader stalls and the status LED is permanently activated.

#### 3.2.2. Error Codes

If something goes wrong during the bootloader operation, an error code is shown. In this case, the processor stalls, a bell command is sent to the terminal, the status LED is permanently activated and the system must be manually reset to proceed.

- **ERR\_00**: This error occurs if the attached EEPROM cannot be accessed during write transfers. Make sure you have the right type of EEPROM and that it is connected properly to the NEO430's SPI port at chip select #0 (CS0).
- **ERR\_01**: If you have implemented the IMEM as true ROM (so it <u>cannot</u> be written), this error pops up when trying to install a new application image (e.g. via the UART). Set the IMEM\_AS\_ROM configuration generic of the processor top entity to 'false' to implement the IMEM as writable RAM.
- **ERR\_02**: If you try to transfer an invalid executable (via UART or from EEPROM), this error message shows up. Also, if no EEPROM was found during a boot attempt, this message will be displayed.
- **ERR\_04**: Your program is way too big for the internal IMEM. Increase the IMEM size of your NEO430 project or optimize your code to save memory.
- **ERR\_08**: This indicates a checksum error. Something went wrong during the transfer of the program image (upload via UART or loading it from EEPROM). If the error was caused by a UART upload, just try it again. When the error was generated during an EEPROM access, the stored image might be corrupted or was built for a very old version of the bootloader.
- **Image still valid? Boot anyway (y/n)?**: This message is shown when you try to execute the content of the IMEM using the bootloader's "s" command while there is no valid image in the IMEM yet. You need to upload a valid image via the serial console or by loading a boot image from an external EEPROM. However, you are prompted if you want to boot from the IMEM anyway. In case you have reset the processor, the image still resides in the IMEM but this message is shown anyway.
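The image-related error codes above can be illustrated with a host-side check of a \*.bin file before uploading it. The following C sketch is not the bootloader's code: the names are hypothetical, the order of the checks is an assumption, and reporting a truncated or odd-sized file as ERR\_02 is a choice of this sketch. Header words are read high byte first, as in the hex view of the binary image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    IMG_OK     = 0x00,
    IMG_ERR_02 = 0x02, /* invalid executable       */
    IMG_ERR_04 = 0x04, /* program too big for IMEM */
    IMG_ERR_08 = 0x08  /* checksum error           */
} img_status_t;

/* Check a complete *.bin file (6-byte header + program data). */
img_status_t image_check(const uint8_t *file, size_t file_len, size_t imem_size)
{
    assert(file != NULL);
    if (file_len < 6)
        return IMG_ERR_02;

    uint16_t signature = (uint16_t)((file[0] << 8) | file[1]);
    size_t   size      = (size_t)((file[2] << 8) | file[3]);
    uint16_t expected  = (uint16_t)((file[4] << 8) | file[5]);

    if (signature != 0xCAFEu)
        return IMG_ERR_02;
    if (size > imem_size)
        return IMG_ERR_04;
    if (size > file_len - 6 || (size & 1u) != 0)
        return IMG_ERR_02; /* assumption of this sketch: treat as invalid */

    /* XOR all program data words, header excluded */
    uint16_t chk = 0;
    for (size_t i = 0; i < size; i += 2)
        chk ^= (uint16_t)((file[6 + i] << 8) | file[6 + i + 1]);

    return (chk == expected) ? IMG_OK : IMG_ERR_08;
}
```

A file that passes this check has a valid signature, fits into the given IMEM size and carries a matching checksum.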

## 4. Let's Get It Started!

To make your NEO430 project run, follow the guides from the upcoming sections. There are several guides for the application compilation and all details of the project. The tutorials are partly written for Windows or Cygwin/Linux users, so make sure to select the right one.

#### 4.1. General Hardware Setup

Follow these steps to build the FPGA hardware of your NEO430 project. In this tutorial, we will use a test implementation of the processor, only containing the core itself and just propagating the minimal signals to the outer world. Hence, this guide is intended as evaluation project to check out the NEO430. A little note: The order of the following steps might be a little different for your specific EDA tool.

- 1. Create a new project with your FPGA EDA tool of choice (Xilinx Vivado, Intel Quartus, ...).
- 2. Add all VHDL files from the project's **rtl/core** folder to your project. Make sure to *reference* the files only; do not copy them.
- 3. Only for some tools (like Xilinx Vivado): Make sure to set the library to "neo430" when adding all processor rtl files to the new project!
- 4. The neo430\_top.vhd file is the top entity of the NEO430 processor. If you already have a design, instantiate this unit into your design and proceed. If you do not have a design yet and just want to check out the NEO430, no problem! Use the neo430\_test.vhd file from the rtl/top\_templates folder as top entity. Of course, you also need to add this file to your project. This tutorial assumes that this test entity is used as top entity, but the basic steps are the same when using the core as part of your project.
- 5. The configuration of the NEO430 processor is done using the generics of the instantiated processor top entity (done in the neo430\_test.vhd file). Let's keep things simple at first and use the default configuration (see below). But there is one generic that has to be set according to your FPGA / board: The clock frequency of the top's clock input signal (clk\_i). Use the CLOCK\_SPEED generic to specify your clock source's frequency in Hertz (Hz). The default value that you need to adapt is the CLOCK\_SPEED entry in the listing below:

```
neo430_top_test_inst: neo430_top
generic map (
  -- general configuration --
  CLOCK_SPEED => 100000000, -- main clock in Hz
  IMEM_SIZE   => 4*1024,    -- internal IMEM size in bytes, max 32kB (default=4kB)
  DMEM_SIZE   => 2*1024,    -- internal DMEM size in bytes, max 28kB (default=2kB)
  -- additional configuration --
  USER_CODE   => x"4788",   -- custom user code
  -- module configuration --
  DADD_USE    => true,      -- implement DADD instruction? (default=true)
  MULDIV_USE  => true,      -- implement multiplier/divider unit? (default=true)
  WB32_USE    => true,      -- implement WB32 unit? (default=true)
  WDT_USE     => true,      -- implement WDT? (default=true)
  GPIO_USE    => true,      -- implement GPIO unit? (default=true)
  TIMER_USE   => true,      -- implement timer? (default=true)
  USART_USE   => true,      -- implement USART? (default=true)
  CRC_USE     => true,      -- implement CRC unit? (default=true)
  CFU_USE     => false,     -- implement custom functions unit? (default=false)
  PWM_USE     => true,      -- implement PWM controller? (default=true)
  TRNG_USE    => false,     -- implement true random number generator? (default=false)
  -- boot configuration --
  BOOTLD_USE  => true,      -- implement and use bootloader? (default=true)
  IMEM_AS_ROM => false      -- implement IMEM as read-only memory? (default=false)
```

- 6. If you feel like it or if your FPGA does not provide enough resources you can modify the memory sizes (IMEM and DMEM) or exclude certain modules from implementation. But as mentioned above, let's keep things simple and use the standard configuration for now. We will come back to the customization of all those configuration generics in later chapters.
- 7. Depending on your FPGA tool of choice, it is time now (or later?) to assign the signals of the test setup top entity to the according pins of your FPGA board. All the signals can be found in the entity:

```
entity neo430_test is
  port (
    -- global control --
    clk_i : in std_ulogic; -- global clock, rising edge
    rst_i : in std_ulogic; -- global reset, async, LOW-active
    -- gpio --
    gpio_o : out std_ulogic_vector(07 downto 0); -- parallel output
    -- serial com --
    uart_txd_o : out std_ulogic; -- UART send data
    uart_rxd_i : in std_ulogic -- UART receive data
    );
end neo430_test;
```

8. Attach the clock input to your clock source and connect the reset line to a button of your FPGA board. Check whether it is low-active or high-active: the reset signal of the processor must be **low-active**, so maybe you need to invert the input signal. If possible, connect at least bit #0 of the GPIO output port to a high-active LED (invert the signal when your LEDs are low-active). Finally, connect the UART signals to your serial host interface (dedicated pins, USB-to-serial converter, etc.). The final test setup is illustrated in the figure below.


Figure 8: External hardware configuration of the NEO430 test implementation (neo430\_test.vhd)

- 9. Perform the project HDL compilation (synthesis, mapping, ..., bitstream generation).
- 10. Download the generated bitstream into your FPGA ("program" it) and press the reset button (just to make sure everything is in sync).
- 11. Done! If you have assigned the bootloader status LED (bit #0 of the GPIO output port), it should be flashing now and you should receive the bootloader start prompt via the UART.

## 4.2. General Software Setup

So, the hardware thing is done. Now it is time to prepare the general part of the software flow. This must be done regardless of whether you are using Windows or Linux/Cygwin for the actual application compilation.

- 1. At first, download the latest version of the TI msp430-gcc compiler tool chain. You can download it without registration (select the "compiler only" package) from TI's software download server: http://software-dl.ti.com/msp430/…
- 2. Extract/install all files into a folder somewhere in your file system. Remember where you have installed the compiler, since this will be important for the setup of compilation scripts in the next chapter(s).
- 3. You need to tell the linker the size of the internal RAM (the data memory, "DMEM", DMEM\_SIZE generic) and the internal ROM (instruction memory, "IMEM", IMEM\_SIZE generic) of the NEO430 (you defined that during the previous tutorial). Open the neo430\_linker\_script.x in the sw/common folder with a text editor and set the parameter LENGTH of the ROM memory section according to the previously configured IMEM\_SIZE generic and the LENGTH of the RAM memory section according to the previously configured DMEM\_SIZE generic (hexadecimal representation!). The cut-out below shows the default configuration. If you have not changed the memory sizes before, you can keep everything in its current state and proceed.

```
MEMORY
{
  rom (rx) : ORIGIN = 0x0000, LENGTH = 0x1000
  ram (rwx) : ORIGIN = 0x8008, LENGTH = 0x0800 - 8
}
```

(Only edit the two LENGTH values, 0x1000 and 0x0800 in the default configuration!)



Make sure you **do not** delete the "-8" right after the length of the RAM! This subtraction is required due to the interrupt vectors, which are located at the beginning of the DMEM. Additionally, the origin of the DMEM is set to 0x8008 for the compiler so it does not use the first 8 bytes at all. Of course, the "real" base address of the data memory module is still 0x8000 (for the HW).

4. Well, this is all you need to do for the general software setup. In the next chapter(s), we will take a closer look at the application compile script(s).
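The relation between the memory generics and the linker script values from step 3 can be sketched in a few lines of C that you can run on your host PC. This is only a minimal sketch: the type and function names (neo430\_mem\_layout, neo430\_layout, neo430\_print\_memory\_block) are made up for this illustration and are not part of the NEO430 software framework. The base address 0x8000, the 8 reserved bytes and the maximum sizes are taken from the text and the generic comments above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DMEM_BASE     0x8000u /* hardware base address of the data memory */
#define IRQ_VEC_BYTES 8u      /* first 8 bytes of the DMEM hold the interrupt vectors */

/* Hypothetical helper type: the values that end up in the MEMORY block. */
typedef struct {
    uint32_t rom_length; /* LENGTH of the rom section */
    uint32_t ram_origin; /* ORIGIN of the ram section as seen by the compiler */
    uint32_t ram_length; /* usable LENGTH of the ram section */
} neo430_mem_layout;

/* Derive the linker script values from the IMEM_SIZE / DMEM_SIZE generics (in bytes). */
static neo430_mem_layout neo430_layout(uint32_t imem_size, uint32_t dmem_size)
{
    assert(imem_size > 0u && imem_size <= 32u * 1024u);            /* max 32kB IMEM */
    assert(dmem_size > IRQ_VEC_BYTES && dmem_size <= 28u * 1024u); /* max 28kB DMEM */

    neo430_mem_layout m;
    m.rom_length = imem_size;
    m.ram_origin = DMEM_BASE + IRQ_VEC_BYTES;
    m.ram_length = dmem_size - IRQ_VEC_BYTES;
    return m;
}

/* Print a MEMORY block in the same notation as the linker script cut-out. */
static void neo430_print_memory_block(uint32_t imem_size, uint32_t dmem_size)
{
    neo430_mem_layout m = neo430_layout(imem_size, dmem_size);

    printf("MEMORY\n{\n");
    printf("  rom (rx) : ORIGIN = 0x0000, LENGTH = 0x%04X\n", (unsigned)m.rom_length);
    printf("  ram (rwx) : ORIGIN = 0x%04X, LENGTH = 0x%04X - %u\n",
           (unsigned)m.ram_origin, (unsigned)dmem_size, (unsigned)IRQ_VEC_BYTES);
    printf("}\n");
}
```

Calling neo430\_print\_memory\_block(4\*1024, 2\*1024) from a small main function reproduces the default values shown above (LENGTH = 0x1000 for the ROM and 0x0800 - 8 for the RAM). For other memory sizes, copy the printed LENGTH values into neo430\_linker\_script.x.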

## 4.3. Application Program Compilation using Windows CMD Batch File

Use this guide if you want to compile programs using **Windows**. The compilation script file, which must be executed, is "make.bat", which is available in each example project folder.

- 1. At first, open the common compile.bat compilation script in the sw/common folder with a text editor and look for the "USER CONFIGURATION" section.
- 2. Assign the absolute system path of the compiler's binary folder to the **MSP430GCC\_BIN\_PATH** variable (remember where you have installed it?). In the example below, the compiler binaries are located in the folder C:\msp430-gcc-6.4.0.32\_win32\bin.

```
@REM Path of compiler binaries:
@if "%MSP430GCC_BIN_PATH%" == "" set MSP430GCC_BIN_PATH=C:\msp430-gcc-6.4.0.32_win32\bin
```

- 3. *Alternatively*, you can also assign an environment variable MSP430GCC\_BIN\_PATH to configure the path to the MSP430-GCC binaries.
- 4. That's all for now! Now you can start compiling programs. At first, we will begin with a simple example program. Open a console (cmd) and navigate to the blink\_led folder in the project's software examples folder: sw\example\blink\_led
- 5. Execute the actual compilation make batch file make.bat in the current example folder. By this, the "main.c" file is automatically used as main source file.

```
...\sw\example\blink_led>make
```

6. During the compilation process, several messages are generated:

```
...\sw\example\blink_led>make

Memory utilization:

text data bss dec hex filename

726 0 0 726 2d6 main.elf

Installing application image to rtl\core\neo430_application_image.vhd

Final executable size (bytes):

732
```

7. At first, the memory utilization/distribution is shown (in bytes). After that, a status message confirms the "installation" of the generated program image into the instruction memory VHDL component using the neo430\_application\_image.vhd file in the rtl\core folder. If you re-synthesize your design, the image is ready to boot from the IMEM. Finally, the actual size of the created executable is shown.

8. If you want to use a specific file as main source file, you can pass that file as first argument:

```
...\sw\example\blink_led>make custom_main_file.c
```

9. Congratulations, you have just compiled your first application!

## 4.4. Application Program Compilation using Cygwin/Linux Makefile

Use this guide if you want to compile programs using **Cygwin** (on Windows) or directly **Linux**. The compilation script file, which must be executed, is "Makefile" (available in each example project folder).

- 1. At first, open the Makefile main compilation script in the sw/common folder with a text editor and look for the "USER CONFIGURATION" section.
- 2. Assign the absolute system path of the compiler's binary folder to the MSP430GCC\_BIN\_PATH variable (remember where you have installed it?). In the example below, the compiler binaries are located in the folder /mnt/c/msp430-gcc-6.4.0.32\_linux64/bin.

```
# Path of compiler binaries:
MSP430GCC_BIN_PATH ?= /mnt/c/msp430-gcc-6.4.0.32_linux64/bin
```

- 3. Alternatively, you can also assign an environment variable MSP430GCC\_BIN\_PATH to configure the path to the MSP430-GCC binaries.
- 4. That's all for now! Now you can start compiling programs. At first, we will begin with a simple example program. Open a terminal and navigate to the blink\_led folder in the project's software examples folder: sw/example/blink\_led
- 5. Simply execute the actual compilation **Makefile** in the current example folder:

```
.../sw/example/blink_led$ make
```

6. During the compilation process, several messages are generated:

```
.../sw/example/blink_led$ make

Memory utilization:

text data bss dec hex filename

732 0 0 732 2dc main.elf

Installing application image to rtl/core/neo430_application_image.vhd

Final executable size (bytes):

738
```

- 7. At first, the memory utilization/distribution is shown (in bytes). After that, a status message confirms the "installation" of the generated program image into the instruction memory VHDL component using the neo430\_application\_image.vhd file in the rtl/core folder. If you re-synthesize your design, the image is ready to boot from the IMEM. Finally, the actual size of the created executable is shown.
- 8. If you want to use a specific file as main source file, you can pass that file via the MAIN variable:

```
.../sw/example/blink_led$ make MAIN=custom_main_file.c
```

9. Congratulations, you have just compiled your first application! If you want to clean up your project's workspace again, you can execute a make clean.
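If you want to double-check the "Memory utilization" report against your memory configuration, the following host-side C sketch might help. It relies on the usual reading of the columns printed by the GNU size tool, which is an assumption here and not something specific to the NEO430: text and the initial values of data occupy ROM (IMEM), while data and bss occupy RAM (DMEM). The names size\_report, rom\_bytes, ram\_bytes and fits\_memory are hypothetical. The check does not account for stack usage or for the few additional bytes of the final executable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one line of the "Memory utilization" report. */
typedef struct {
    uint32_t text; /* program code and constants */
    uint32_t data; /* initialized variables */
    uint32_t bss;  /* zero-initialized variables */
} size_report;

/* Bytes needed in the instruction memory (assumption: .data initial values live in ROM). */
static uint32_t rom_bytes(size_report s)
{
    return s.text + s.data;
}

/* Bytes needed in the data memory (stack not included). */
static uint32_t ram_bytes(size_report s)
{
    return s.data + s.bss;
}

/* Returns 1 if the report fits into the configured IMEM_SIZE / DMEM_SIZE (in bytes).
 * The first 8 bytes of the DMEM are reserved for the interrupt vectors. */
static int fits_memory(size_report s, uint32_t imem_size, uint32_t dmem_size)
{
    assert(dmem_size > 8u);
    return (rom_bytes(s) <= imem_size) && (ram_bytes(s) <= dmem_size - 8u);
}
```

For the blink\_led report above (732 bytes of text, no data, no bss) and the default configuration (4kB IMEM, 2kB DMEM), fits\_memory returns 1, so there is plenty of room left.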

## 4.5. Uploading and Starting of a Binary Executable Image via UART

When compiling an application, two final files are generated in the project folder:

- main.bin The binary executable used for uploading via the bootloader.
- main.s The ASM listing file of the compiled application (for debugging).

The generated binary executable must be uploaded to the NEO430 to be executed. This tutorial uses **TeraTerm** as an exemplary serial terminal program for **Windows**, but the general procedure is the same for other terminal programs and/or build environments / operating systems.

- 1. Connect the UART interface of your FPGA (board) to a COM port of your computer or use a USB-to-serial adapter.
- 2. Start a terminal program. In this tutorial, I am using TeraTerm for Windows. You can download it from: <a href="https://ttssh2.osdn.jp/index.html.en">https://ttssh2.osdn.jp/index.html.en</a>
- 3. Open a connection to the corresponding COM port. Configure the terminal according to the following parameters:
  - 19200 Baud
  - 8 data bits
  - 1 stop bit
  - No parity bits
  - No transmission/flow control protocol (just raw byte mode)
  - Newline on "\r\n" = carriage return & newline (if configurable at all)



Figure 9: Serial configuration of TeraTerm

4. Also make sure that single chars are transmitted without any consecutive "new line" or "carriage return" commands (this is highly dependent on your terminal application of choice; TeraTerm only sends the raw chars by default).

5. Press the NEO430's reset button to restart the bootloader. The status LED starts blinking and the bootloader intro screen appears in your console. Hurry up and press any key (hit space!) to abort the automatic boot sequence and to start the actual bootloader user interface console.

```
NEO430 Bootloader V2018042 by Stephan Nolting
HWV: 0x0180
CLK: 0x05F5E100
ROM: 0x1000
RAM: 0x0800
SYS: 0x1AFF
Autoboot in 8s. Press key to abort.
Aborted.
Commands:
d: Dump MEM
e: Load EEPROM
h: Help
p: Store EEPROM
r: Restart
s: Start app
u: Upload
CMD:>
```

6. Execute the "Upload" command by typing u. Now, the bootloader is waiting for a binary executable to be sent.

```
CMD:> u
Awaiting BINEXE...
```

7. Use the "send file" option of your terminal program to transmit the previously generated binary executable (main.bin) from the sw\example\blink\_led folder to the NEO430.



Figure 10: Sending a file using Tera Term

8. Make sure to transmit the executable in **raw binary mode** (no transfer protocol, no additional header stuff). When using TeraTerm, select the "binary" option in the send file dialog:



Figure 11: Transfer executable in binary mode (German version of TeraTerm)

9. If everything went fine, **OK** will appear in your terminal:

```
CMD:> u
Awaiting BINEXE... OK
```

10. The program image now resides in the internal IMEM of your NEO430. To execute the program right now, start the application by pressing s. The blink\_led program starts, prints "Blinking LED demo program" and will begin displaying an incrementing counter on the 8 LEDs connected to the GPIO output port.

```
CMD:> s
Booting...
Blinking LED demo program
```

11. Congratulations! Now you are prepared to start your own project! ;)
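To get a feeling for how long an upload takes with the serial settings from step 3, here is a small host-side C sketch. With 8 data bits, no parity and 1 stop bit, every byte occupies 10 bit times on the line (1 start bit + 8 data bits + 1 stop bit). The function name upload\_time\_ms is made up for this example, and the result is only a lower bound, since delays of the terminal program or the USB-to-serial adapter are not included.

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire per transmitted byte for an 8N1 frame: start + 8 data + stop. */
#define BITS_PER_BYTE_8N1 10u

/* Minimum transfer time in milliseconds (rounded up) for num_bytes at the given Baud rate. */
static uint32_t upload_time_ms(uint32_t num_bytes, uint32_t baud)
{
    assert(baud > 0u);

    /* 64-bit intermediate so that large images cannot overflow the multiplication */
    uint64_t bit_ms = (uint64_t)num_bytes * BITS_PER_BYTE_8N1 * 1000u;
    return (uint32_t)((bit_ms + baud - 1u) / baud);
}
```

For the 738-byte blink\_led executable from the previous tutorial, upload\_time\_ms(738, 19200) yields 385 ms. An image that fills the complete default 4kB IMEM needs a bit more than 2 seconds.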

## 4.6. Programming an External SPI Boot EEPROM

If you want the NEO430 bootloader to automatically fetch and execute an application image at start-up (→ auto boot configuration), you can store it to an external SPI EEPROM. The advantage of the external EEPROM is to have a non-volatile program storage, which can be reprogrammed at any time just by executing some bootloader commands. Thus, no FPGA bitstream recompilation is required at all.

You need an EEPROM that is compatible with a Microchip® SPI EEPROM like the **25LC512**, with 16-bit addresses and a 32-bit wide SPI transfer frame. The EEPROM must be at least as big as the internal IMEM.

This tutorial explains how to program the external SPI EEPROM assuming it is already connected properly to the NEO430 core top entity SPI port. Make sure to use the SPI chip select #0 signal (spi\_cs\_o(0)) as the chip select for the EEPROM.

- 1. At first, reset the NEO430 processor and wait until the bootloader start screen appears in your terminal program.
- 2. Abort the auto boot sequence and start the user console by pressing any key.
- 3. Press **u** to upload the program image that you want to store to the external EEPROM. Send the binary in raw binary mode via your terminal program.

```
CMD:> u
Awaiting image...
```

4. When the upload is completed and **OK** appears, press **p** to begin programming of the EEPROM. You need to do this **now**: do not execute your program first, to prevent changes in the image!

```
CMD:> u
Awaiting image... OK
CMD:> p
Proceed (y/n)?
```

5. Now you have to confirm the writing sequence, because all previous data in the EEPROM will be lost. Press **y** if you want to proceed or **n** if you wish to abort the process. If you affirm, the actual writing process starts. This might take some time...

```
CMD:> u
Awaiting image... OK
CMD:> p
Proceed (y/n)?
Writing... OK
```

6. If **OK** appears in the terminal line, the writing process was successful. Now you can use the auto boot sequence to automatically boot your application from the EEPROM at system start-up without any user interaction.
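The EEPROM requirements from above (16-bit addresses, 32-bit wide transfer frame, at least as big as the IMEM) can be illustrated with a short C sketch. Please treat it as an assumption-based illustration: the instruction codes are the READ and WRITE opcodes of the Microchip 25LC512, and the frame layout (8-bit instruction, 16-bit address, 8-bit data) is simply how a single-byte access of such a device adds up to 32 bits. How the bootloader actually arranges its SPI transfers is not described in this chapter. The names eeprom\_frame and eeprom\_large\_enough are hypothetical.

```c
#include <stdint.h>

/* Instruction codes of the 25LC512 (taken from the device's instruction set). */
#define EEPROM_CMD_WRITE 0x02u /* write data to memory */
#define EEPROM_CMD_READ  0x03u /* read data from memory */

/* Assemble one 32-bit frame: [31:24] instruction, [23:8] 16-bit address, [7:0] data byte.
 * For a read access the data byte is a dummy; the EEPROM drives its output in that slot. */
static uint32_t eeprom_frame(uint8_t opcode, uint16_t addr, uint8_t data)
{
    return ((uint32_t)opcode << 24) | ((uint32_t)addr << 8) | (uint32_t)data;
}

/* The EEPROM must be at least as big as the internal IMEM (both sizes in bytes). */
static int eeprom_large_enough(uint32_t eeprom_bytes, uint32_t imem_size)
{
    return eeprom_bytes >= imem_size;
}
```

As an example, writing the byte 0xAB to address 0x1234 gives the frame 0x021234AB. A 25LC512 provides 512 Kbit (65536 bytes), so eeprom\_large\_enough(65536, 4\*1024) holds for the default IMEM size and also for the maximum IMEM size of 32kB.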

## 4.7. Setup of a New Application Program Project

Done with all the introduction tutorials and those example programs? Then it is time to start your own application project!

- 1. The easiest way of creating a new project is to copy an existing one (like the blink\_led project) and use that copy as starting point. Make sure to copy the folder inside the sw/example folder (e.g. into the sw/example/my\_project folder) to keep all the file dependencies intact.
- 2. Now you can start modifying the main.c file according to your new project.
- 3. When your new project folder is located somewhere else (than in the **sw/example** folder), you need to adapt the compilation scripts in your project folder:
  - → If you are using the Windows-based batch file (make.bat), open it with a text editor and change the COMMON\_PATH variable. The path given by this variable defines the relative path from the current project folder to the NEO430 sw/common folder.
  - → In case you are using Cygwin/Linux for compilation, the modifications have to be done in the makefile. Change the COMMON\_PATH variable so it defines the relative path from the new project's folder to the sw/common folder.
  - → Finally, you also need to adapt the path of the included neo430.h library file in your program code.
- 4. If your project contains additional \*.c files beside the main.c file, you have to include them into your main.c file using the C pre-processor:

```
#include "../../lib/neo430/neo430.h"

#include "some_file.c" // one of your project source files
#include "another_file.c" // another one of your project source files
```

5. That is all you need to do. Now you can compile your new project for the NEO430. If the compiler issues errors regarding the source file dependencies, you should adapt the order of the included C files. Also make sure to use include guards in all your source files.
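Since the additional source files are pulled into main.c by the pre-processor, a file that gets included twice (for example directly and via another file) would define its functions twice. An include guard prevents that. The following sketch shows what such a guarded source file might look like. The file name some\_file.c matches the example above, while the macro name SOME\_FILE\_C and the function next\_led\_pattern are just placeholders chosen for this illustration.

```c
/* some_file.c - hypothetical project source file that is #include'd by main.c */
#ifndef SOME_FILE_C
#define SOME_FILE_C

#include <stdint.h>

/* Return the next value of an 8-bit counter pattern (wraps from 0xFF back to 0x00),
 * e.g. for an incrementing counter on the 8 LEDs of the GPIO output port. */
static uint8_t next_led_pattern(uint8_t current)
{
    return (uint8_t)(current + 1u);
}

#endif /* SOME_FILE_C */
```

When the file is included a second time, SOME\_FILE\_C is already defined and the pre-processor skips everything up to the #endif, so the function is only defined once.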

## 4.8. Simulating the Processor

Before you do an actual FPGA implementation or if you want to see what's going on, you can do a simulation of the processor core. For this purpose, a simple testbench was implemented (neo430\_tb.vhd, located in the project's sim folder). This testbench instantiates the top entity of the processor system (neo430\_top.vhd) and also includes a serial UART receiver unit, which outputs the transmitted UART data to the simulator console. Additionally, the output is printed to a text file (uart\_rx\_dump.txt), which is generated in the simulator project home folder.

By default, the testbench does not simulate the system setup using the bootloader. Instead, your actual application code (in IMEM) will be simulated:

```
...
BOOTLD_USE => false, -- implement and use bootloader? (default=true)
...
```

#### Xilinx ISIM

In case you are using the Xilinx ISIM simulator (or the Vivado simulator), a pre-defined waveform configuration including all relevant processor signals can be found in the sim/ISIM folder (neo430\_tb.wcfg). Note that you first have to create a new project that includes all required rtl VHDL files. The generated uart\_rx\_dump.txt file (processor's UART output log file) is a little bit hard to find, but should be located in: <Xilinx\_project\_home\_folder>\<project\_name>.sim\sim\_1\behav.

#### **ModelSim**

When you are using ModelSim, you can start a new simulation project by executing a script from the sim/modelsim folder. Navigate to the folder using the ModelSim simulator console and execute the following command:

```
do simulate.do
```

This will also open a pre-configured waveform to analyze the most important signals of the processor. The UART's output log file (uart\_rx\_dump.txt) will also be generated in the sim/modelsim folder.

## 4.9. Changing the Compiler's Optimization Goal

When compiling an application, the code is optimized using a given effort. By default, the optimization goal is to reduce code size (-Os). If you want to get more performance (at the cost of an increased code size), you can change the compilation effort / optimization goal (e.g. to -O3).

1. If you are using **Windows** as build environment, open the main Windows compilation batch file (sw\common\make.bat) and configure the **EFFORT** variable (in the "USER CONFIGURATION" section) for the required optimization goal:

```
@REM Compiler effort (-Os = optimize for size)
@set EFFORT=-Os
```

2. If you are using **Linux** or **Cygwin** as build environment, open the main Linux makefile (sw/common/makefile) and configure the **EFFORT** variable (in the "USER CONFIGURATION" section) for the required optimization goal:

```
# Compiler effort (-Os = optimize for size)

EFFORT = -Os
```

3. Perform a new compilation process of the software project to apply your changes to the generated executable.

## 4.10. Re-Building the Internal Bootloader



Rebuilding the bootloader is not necessary, since it is designed to work independently of the actual hardware configuration and system setup.

If you want to modify or customize the internal bootloader, you need to re-build it. Follow the upcoming steps to re-compile and re-install the modified bootloader to the boot ROM:

- After you have modified the bootloader's main source file according to your wishes, open a console and navigate to the bootloader source folder: sw\bootloader
- Windows: Open the make.bat file and edit the path to your compiler binaries folder:

```
@REM Path of compiler binaries:
@if "%MSP430GCC_BIN_PATH%" == "" set MSP430GCC_BIN_PATH=C:\msp430-gcc-6.4.0.32_win32\bin
```

- Windows: Now execute a "make". This will compile the bootloader's sources and will also generate the bootloader VHDL memory initialization file neo430\_bootloader\_image.vhd in the project's rtl\core folder.

```
...\sw\bootloader\default>make
```

- Linux/Cygwin: Open the makefile and edit the path to your compiler binaries folder:

```
# Path of compiler binaries:
MSP430GCC_BIN_PATH ?= /mnt/c/msp430-gcc-6.4.0.32_linux64/bin
```

- Linux/Cygwin: Now execute a "make". This will compile the bootloader's sources and will also generate the bootloader VHDL memory initialization file neo430\_bootloader\_image.vhd in the project's rtl/core folder.

```
.../sw/bootloader/default$ make
```

- Now perform a new synthesis / HDL compilation to update the bitstream with your new bootloader. Done! :)

## 4.11. Building a Non-Volatile Application (Program Fixed in IMEM)

The purpose of the bootloader is to let you re-upload your application code at any time via UART. Additionally, you can use an external SPI EEPROM as non-volatile program storage that can still be updated at any time via the bootloader console. This provides a lot of flexibility, especially during development. But when you have completed your software development and your application code is *fixed*, the bootloader might not be necessary any longer. Thus, you can disable it to save hardware resources and to directly boot your application at start-up from the internal IMEM.

- 1. At first, compile your application code by running the **make** command. This will automatically install the according memory initialization image into the IMEM.
- 2. Now it is time to exclude the bootloader ROM from synthesis. Set the **BOOTLD\_USE** generic in the instantiation of the processor's top entity (**neo430\_top**) to 'false':

```
BOOTLD_USE => false, -- implement and use bootloader? (default=true)
```

- 3. This will exclude the boot ROM from synthesis and also change the CPU boot address from the beginning of the boot ROM to the beginning of the IMEM. Thus, the CPU directly executes your application code after reset.
- 4. The IMEM could still be modified by setting the **R** flag in the CPU's status register, which allows write accesses. Hence, the IMEM is implemented as RAM. To prevent this and to implement the IMEM as true ROM (and eventually save some more hardware), deactivate this feature by setting the **IMEM\_AS\_ROM** generic in the instantiation of the processor's top entity to 'true':

```
IMEM_AS_ROM => true -- implement IMEM as read-only memory? (default=false)
```

5. Perform a synthesis and upload your new bitstream. Your application code resides now unchangeable in the processor's IMEM and is directly executed after reset.

## 4.12. Alternative Top Entities / Avalon Bus / AXI4 Lite Connectivity

The NEO430 processor features a Wishbone-compatible 32-bit bus adapter to attach custom IP blocks. Wishbone is only one protocol for on-chip bus systems. Besides Wishbone, Avalon is a quite popular interface standard, especially in terms of Intel/Altera FPGA systems.

If you want to connect the NEO430 to IP cores using a different bus protocol, you can either use a custom interface bridge or one of the alternative processor top entities from the rtl\top\_templates folder. These alternative top entities are a replacement for the default neo430\_top.vhd, as they provide the same interface ports as the default top entity. The only exception is the actual on-chip bus protocol. Internally, each alternative top entity implements bridging logic to convert the processor's native Wishbone interface into the respective bus protocol (for example, an Avalon Master interface).

Additionally, an alternative version of the default neo430\_top.vhd top entity is provided, which only uses resolved interface types (std\_logic and std\_logic\_vector).

| Alternative top entity   | Description                                                           |
|--------------------------|-----------------------------------------------------------------------|
| neo430_test.vhd          | Simple test setup for fast implementation / evaluation of the NEO430. |
| neo430_top_avm.vhd       | Top entity with Avalon Master connectivity.                           |
| neo430_top_axi4lite.vhd  | Top entity with AXI4-Lite Master connectivity.                        |
| neo430_top_std_logic.vhd | Top entity using only std_logic / std_logic_vector as port types.     |
|                          | More to come ;)                                                       |

## 4.13. Troubleshooting

- ✓ Have you added all HDL files from the rtl/core folder to your project? Make sure to add all VHDL files to a new library called "neo430".
- ✓ Have you selected the correct top entity (e.g. neo430\_test.vhd)?
- ✓ Have you assigned at least the signals for the clock and reset, the status LED and the UART communication lines? Have you terminated all unused input signals (logical low)?
- ✓ Does the reset button have the correct polarity (active low)?
- ✓ Is your main clock source running at all?
- ✓ Have you made a correct configuration of all the configuration generics of the processor top entity? Especially the clock speed configuration is crucial for the test setup.
- ✓ Are the configured memory sizes in the linker script neo430\_linker\_script.x the same as in the VHDL top entity generic configuration?
- ✓ Do you want to directly execute your application from the IMEM or do you want to use the bootloader?
- ✓ Have you installed your compiler correctly and have you configured the path to its binaries in the compilation script(s)? Did you install the correct compiler (TI's msp430-gcc)?
- ✓ If you are communicating with the bootloader via UART, have you configured your terminal with the right settings (e.g., correct Baud rate)?
- ✓ Are you uploading the binary executable in raw-byte mode?
- ✓ Was the application compilation process successful?

# 5. Change Log

| Date       | HW<br>version | Modifications                                                                                                                                                                                                                                                                                     |
|------------|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 22.02.2017 | 0x0108        | Moved configuration from package file to top entity generics; heavily reworked documentary; reworked C library files (names of hardware registers and bits); removed GP registers from the sysconfig module; renamed PIO unit to GPIO unit; changed C library constants (removed the '_C' suffix) |
| 23.02.2017 | 0x0108        | Updated Xilinx Virtex-6 implementation results                                                                                                                                                                                                                                                    |
| 20.04.2017 | 0x0108        | Corrected typos                                                                                                                                                                                                                                                                                   |
| 22.04.2017 | 0x0109        | Added password error reset to WDT; added IRQ acknowledge signals (only relevant for ext. IRQ); reworked interrupt chapter; added info regarding dynamic memory                                                                                                                                    |
| 07.06.2017 | 0x0111        | Re-added GPIO input IRQ mask register; added information regarding write-only configuration registers                                                                                                                                                                                             |
| 20.06.2017 | 0x0113        | Updated control state machine – in general one cycle less required for instruction execution. Updated implementation results (utilization by entity only).                                                                                                                                        |
| 19.07.2017 | 0x0120        | Updated Wishbone controller (no more timeout counter) and drivers (to support correct byte addressing – thanks to Edward!); added custom functions unit and example application; updated documentary; modified bootloader; minor code fixes                                                       |
| 19.07.2017 | 0x0121        | Debugged and optimized Wishbone adapter; added CFU templates                                                                                                                                                                                                                                      |
| 20.07.2017 | 0x0122        | Changed SPI chip select bits in the USART control register to be high-active (set bit to one to set corresponding CS line to low)                                                                                                                                                                 |
| 15.08.2017 | 0x0123        | Further interrupt requests are still queued while the global interrupt enable flag is deactivated; added Avalon master and resolved port type top entities |
| 18.08.2017 | 0x0124        | Added Q flag to the CPU status register to manually clear the IRQ queue                                                                                                                                                                                                                           |
| 05.10.2017 | 0x0125        | Fixed error in SPI module for CPHA=1 mode                                                                                                                                                                                                                                                         |
| 06.10.2017 | 0x0125        | Updated application compilation scripts to newest compiler version                                                                                                                                                                                                                                |
| 08.11.2017 | 0x0125        | Added theoretical average CPI (cycles per instruction); added info regarding environment variables for configuring the path to the MSP430-GCC binaries; added info about min/max interrupt latency |
| 22.11.2017 | 0x0126        | Wishbone adapter provides classic mode transfer for asynchronous ACK, and pipelined mode transfers for registered slaves                                                                                                                                                                          |
| 02.12.2017 | 0x0140        | Replaced CFU with multiplier/divider unit; updated documentation according to new processor design |
| 08.12.2017 | 0x0141        | 25% faster execution of branches; added Xilinx implementation results |
| 23.12.2017 | 0x0142        | Relocated IO devices' control registers; all VHDL source files are part of library "neo430" now; removed USART.UART TX done interrupt (it's pointless); testbench will now execute application code by default |
| 27.12.2017 | 0x0142        | Fixed glitch-issue in UART transmitter; renamed watchdog and GPIO control registers (now they also have the "_CT" suffix)                                                                                                                                                                         |
| 06.01.2018 | 0x0150        | Changed IO address space layout; added CRC module with according software library; added CFU; updated implementation results; added minimal configuration example                                                                                                                                 |
| 10.01.2018 | 0x0154        | Moved IRQ vectors to beginning of DMEM |
| 13.01.2018 | 0x0155        | Changed addresses of GPIO module; simplified pin-change interrupt; added processor top entity with experimental AXI-lite interface |
| 14.01.2018 | 0x0160        | Interrupt execution is now 100% MSP430 compatible (status register state on entry)                                                 |
| 16.01.2018 | 0x0161        | Re-added the UART's "TX done" interrupt by request ;)                                                                              |
| 26.01.2018 | 0x0170        | Added PWM controller; modified internal clock generator; fixed errors in address space declarations; clean-up of change log        |
| 28.02.2018 | 0x0172        | Bug-fix in overflow flag; updated implementation results                                                                           |
| 24.04.2018 | 0x0180        | Added true random number generator (TRNG); r-flag is now read-only (and always zero) when implementing IMEM as true ROM            |
| 30.05.2018 | 0x0182        | Added experimental low power mode; optimized IRQ controller logic; optimized SREG logic                                            |
| 01.06.2018 | 0x0183        | Fixed bugs in Wishbone module and driver library; reduced ALU size                                                                 |
| 22.06.2018 | 0x0184        | Fixed SEVERE bug in overflow flag computation! Thx Edward for that ;)                                                              |