

# NMSIS Release 1.0.2-RC1

**Nuclei** 

# **CONTENTS:**

- 1 Nuclei MCU Software Interface Standard(NMSIS)
  - 1.1 About NMSIS
  - 1.2 NMSIS Components
  - 1.3 NMSIS Design
  - 1.4 How to Access
  - 1.5 Coding Rules
  - 1.6 Validation
  - 1.7 License
- 2 NMSIS Core
  - 2.1 Overview
    - 2.1.1 Introduction
    - 2.1.2 Processor Support
    - 2.1.3 Toolchain Support
  - 2.2 Using NMSIS in Embedded Applications
    - 2.2.1 Introduction
    - 2.2.2 Basic NMSIS Example
    - 2.2.3 Using Interrupt and Exception/NMI
    - 2.2.4 Using NMSIS with generic Nuclei Processors
    - 2.2.5 Create generic Libraries with NMSIS
  - 2.3 NMSIS-Core Device Templates
    - 2.3.1 Introduction
    - 2.3.2 NMSIS-Core Processor Files
    - 2.3.3 Device Examples
    - 2.3.4 Template Files
    - 2.3.5 Adapt Template Files to a Device
    - 2.3.6 Device Templates Explanation
  - 2.4 Register Mapping
  - 2.5 NMSIS Core API
    - 2.5.1 Version Control
    - 2.5.2 Compiler Control
    - 2.5.3 Core CSR Register Access
    - 2.5.4 Core CSR Encoding
    - 2.5.5 Register Define and Type Definitions
    - 2.5.6 CPU Intrinsic Functions
    - 2.5.7 Intrinsic Functions for SIMD Instructions
    - 2.5.8 Peripheral Access
    - 2.5.9 Systick Timer(SysTimer)
    - 2.5.10 Interrupts and Exceptions
    - 2.5.11 FPU Functions
    - 2.5.12 PMP Functions
    - 2.5.13 Cache Functions
    - 2.5.14 System Device Configuration
    - 2.5.15 ARM Compatible Functions
- 3 NMSIS DSP
  - 3.1 Overview
    - 3.1.1 Introduction
    - 3.1.2 Using the Library
    - 3.1.3 Examples
    - 3.1.4 Toolchain Support
    - 3.1.5 Building the Library
    - 3.1.6 Preprocessor Macros
  - 3.2 Using NMSIS-DSP
    - 3.2.1 Preparation
    - 3.2.2 Tool Setup
    - 3.2.3 Build NMSIS DSP Library
    - 3.2.4 How to run
  - 3.3 NMSIS DSP API
    - 3.3.1 Basic Math Functions
    - 3.3.2 Fast Math Functions
    - 3.3.3 Complex Math Functions
    - 3.3.4 Filtering Functions
    - 3.3.5 Matrix Functions
    - 3.3.6
    - 3.3.7
    - 3.3.8
    - 3.3.9 Support Functions
    - 3.3.10 Interpolation Functions
    - 3.3.11 Examples
  - 3.4 Changelog
    - 3.4.1 V1.0.2
    - 3.4.2 V1.0.1
    - 3.4.3 V1.0.0
- 4 NMSIS NN
  - 4.1 Overview
    - 4.1.1 Introduction
    - 4.1.2 Block Diagram
    - 4.1.3 Examples
    - 4.1.4 Pre-processor Macros
  - 4.2
    - 4.2.1 Preparation
    - 4.2.2 Tool Setup
    - 4.2.3 Build NMSIS NN Library
    - 4.2.4
  - 4.3
    - 4.3.1
    - 4.3.2
    - 4.3.3
    - 4.3.4
    - 4.3.5
  - 4.4 Changelog
    - 4.4.1 V1.0.2
    - 4.4.2 V1.0.1
    - 4.4.3 V1.0.0
- 5 Changelog
  - 5.1 V1.0.2-RC1
  - 5.2 V1.0.1
  - 5.3 V1.0.1-RC1
  - 5.4 V1.0.0-beta1
  - 5.5 V1.0.0-beta
  - 5.6 V1.0.0-alpha.1
  - 5.7 V1.0.0-alpha
- 6 Glossary
- 7 Appendix
- 8 Indices and tables
- Index

**CHAPTER** 

ONE

# **NUCLEI MCU SOFTWARE INTERFACE STANDARD(NMSIS)**

# 1.1 About NMSIS

The **NMSIS** is a vendor-independent hardware abstraction layer for micro-controllers that are based on Nuclei Processors<sup>1</sup>.

The **NMSIS** defines generic tool interfaces and enables consistent device support. It provides simple software interfaces to the processor and the peripherals, simplifying software re-use, reducing the learning curve for micro-controller developers, and reducing the time to market for new devices.

# 1.2 NMSIS Components

| Component | Target Processors | Description |
|-----------|-------------------|-------------|
| **NMSIS CORE** | All Nuclei N/NX Class Processors | Standardized API for the Nuclei processor core and peripherals. |
| **NMSIS DSP** | All Nuclei N/NX Class Processors | DSP library collection with many functions for various data types: fixed-point (fractional q7, q15, q31) and single-precision floating-point (32-bit). Implementations are optimized for Nuclei processors that have the RISC-V SIMD instruction set. |
| **NMSIS NN** | All Nuclei N/NX Class Processors | Collection of efficient neural network kernels developed to maximize performance and minimize memory footprint on Nuclei processor cores. |

# 1.3 NMSIS Design

**NMSIS** is designed to help standardize software for Nuclei N/NX Class Processors. It enables consistent software layers and device support across a wide range of development tools and micro-controllers.

**NMSIS** is a lightweight software interface layer that standardizes the common parts of Nuclei processor-based SoCs; it does not define any standard peripherals. The silicon industry can therefore support the wide variety of Nuclei processor-based devices with this common standard.

NMSIS provides the following benefits:

- NMSIS reduces the learning curve, development costs, and time-to-market. Developers can write software quicker through a variety of easy-to-use, standardized software interfaces.
- Consistent software interfaces improve software portability and re-usability. Generic software libraries and interfaces provide a consistent software framework.
- It provides interfaces for debug connectivity, debug peripheral views, software delivery, and device support to reduce time-to-market for new micro-controller deployment.

<sup>1</sup> https://doc.nucleisys.com/nuclei\_spec



Fig. 1: NMSIS Design Diagram

- Being a compiler-independent layer, it lets you use the compiler of your choice, and it is therefore supported by mainstream compilers.
- It enhances program debugging with peripheral information for debuggers.

## 1.4 How to Access

To access the **NMSIS** source code, visit the open-source NMSIS GitHub repository<sup>2</sup>.

# 1.5 Coding Rules

The NMSIS uses the following essential coding rules and conventions:

- Compliant with ANSI C (C99) and C++ (C++03).
- Uses ANSI C standard data types defined in **stdint.h**.
- Variables and parameters have a complete data type.
- Expressions for #define constants are enclosed in parentheses.

In addition, the **NMSIS** recommends the following conventions for identifiers:

- CAPITAL names to identify Core Registers, Peripheral Registers, and CPU Instructions.
- CamelCase names to identify function names and interrupt functions.
- Namespace\_ prefixes avoid clashes with user identifiers and provide functional groups (e.g. for peripherals, RTOS, or DSP Library).

The **NMSIS** is documented within the source files with:

<sup>2</sup> https://github.com/Nuclei-Software/NMSIS

- Comments that use the C or C++ style.
- Doxygen compliant comments, which provide:
  - brief function, variable, macro overview.
  - detailed description of the function, variable, macro.
  - detailed parameter explanation.
  - detailed information about return values.
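
As a hedged illustration of the rules and conventions above, the fragment below shows how a small piece of code might follow them. The `DEMO_` namespace prefix, the constants, and the function `DEMO_GetPrescaledClock` are hypothetical names chosen for this sketch; they are not part of NMSIS.

```c
#include <assert.h>
#include <stdint.h>   /* rule: use the standard data types from stdint.h */

/* rule: expressions for #define constants are enclosed in parentheses */
#define DEMO_CLOCK_HZ        (8000000UL)
#define DEMO_PRESCALER_MAX   (DEMO_CLOCK_HZ / 1000UL)

/**
 * \brief   Compute a prescaled clock frequency.
 * \details Divides the hypothetical demo clock by the given prescaler.
 * \param [in] prescaler  Divider value in the range 1 .. DEMO_PRESCALER_MAX.
 * \return  Resulting frequency in Hz.
 */
uint32_t DEMO_GetPrescaledClock(uint32_t prescaler)
{
    /* Guard against division by zero and out-of-range dividers. */
    assert(prescaler >= 1UL && prescaler <= DEMO_PRESCALER_MAX);
    return (uint32_t)(DEMO_CLOCK_HZ / prescaler);
}
```

The function name combines a namespace prefix with a CamelCase name, the constants use CAPITAL names, and the Doxygen comment carries the brief, detailed, parameter, and return-value parts listed above.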

# 1.6 Validation

Nuclei uses the RISC-V GCC compiler in the various tests of **NMSIS**. Additional compilers can be supported easily by extending the **NMSIS** compiler-independent layer. For each component, the section **Validation** describes the scope of the various verifications.

NMSIS components are compatible with a range of C and C++ language standards.

As **NMSIS** defines API interfaces and functions that scale to a wide range of processors and devices, the scope of the run-time test coverage is limited. However, several components are validated using dedicated test suites.

# 1.7 License

NMSIS is derived from the open-source CMSIS project and modified to match Nuclei requirements.

This **NMSIS** is provided free of charge by Nuclei under the Apache 2.0 License<sup>3</sup>.


<sup>3</sup> http://www.apache.org/licenses/LICENSE-2.0

**CHAPTER** 

**TWO** 

# **NMSIS CORE**

# 2.1 Overview

## 2.1.1 Introduction

**NMSIS-Core** implements the basic run-time system for a Nuclei N/NX Class Processors based device and gives the user access to the processor core and the device peripherals. In detail it defines:

- Hardware Abstraction Layer (HAL) for Nuclei processor registers with standardized definitions for the CSR Registers, TIMER, ECLIC, PMP Registers, DSP Registers, FPU registers, and Core Access Functions.
- Standard core exception/interrupt names to interface to system exceptions or interrupts without having compatibility issues.
- **Methods to organize header files** that make it easy to learn new Nuclei micro-controller products and improve software portability. This includes naming conventions for device-specific interrupts.
- **Methods for system initialization** to be used by each Device vendor. For example, the standardized SystemInit() (page 343) function is essential for configuring the clock system of the device.
- Intrinsic functions used to generate CPU instructions that are not supported by standard C functions.
- A variable SystemCoreClock (page 343) to determine the system clock frequency, which simplifies the setup of the timer.

The following sections provide details about the **NMSIS-Core**:

- Using NMSIS in Embedded Applications (page 6) describes the project setup and shows a simple program example.
- NMSIS-Core Device Templates (page 12) describes the files of the NMSIS Core (page 5) in detail and explains how to adapt template files provided by Nuclei to silicon vendor devices.
- NMSIS Core API (page 60) describes the features and functions of the Device Header File <device.h> (page 48) in detail.
- Register Define and Type Definitions (page 80) describes the data structures of the Device Header File <device.h> (page 48) in detail.

# 2.1.2 Processor Support

NMSIS supports all Nuclei N/NX Class Processors.

#### **Nuclei ISA Spec:**

- Nuclei Processor Core Instruction Set Architecture Spec<sup>4</sup>

#### **Nuclei N Class Processor Reference Manuals:**

- N200 series<sup>5</sup>
- N300 series<sup>6</sup>
- N600 series<sup>7</sup>

#### **Nuclei NX Class Processor Reference Manuals:**

- NX600 series<sup>8</sup>

# 2.1.3 Toolchain Support

The NMSIS-Core Device Templates (page 12) provided by Nuclei have been tested and verified using these toolchains:

- GNU Toolchain for RISC-V modified by Nuclei

# 2.2 Using NMSIS in Embedded Applications

## 2.2.1 Introduction

To use the **NMSIS-Core**, the following files are added to the embedded application:

- Startup File startup\_<device>.S (page 14), which provides the assembly startup code and the vector table.
- Interrupt and Exception Handling File: intexc\_<device>.S (page 21), which provides general exception handling code for non-vector interrupts and exceptions.
- Device Linker Script: gcc\_<device>.ld (page 32), which provides the linker script for the device.
- System Configuration Files system\_<device>.c and system\_<device>.h (page 37), which provide general device configuration (e.g. for clock and BUS setup).
- Device Header File <device.h> (page 48), which gives access to the processor core and all peripherals.

**Note:** The files Startup File startup\_<device>.S (page 14), Interrupt and Exception Handling File: intexc\_<device>.S (page 21), Device Linker Script: gcc\_<device>.ld (page 32) and System Configuration Files system\_<device>.c and system\_<device>.h (page 37) may require application-specific adaptations and should therefore be copied into the application project folder prior to configuration.

The *Device Header File <device.h>* (page 48) is included in all source files that need device access and can be stored in a central include folder that is generic for all projects.

<sup>4</sup> https://doc.nucleisys.com/nuclei\_spec

<sup>5</sup> https://www.nucleisys.com/product.php?site=n200

<sup>6</sup> https://www.nucleisys.com/product.php?site=n300

<sup>7</sup> https://www.nucleisys.com/product.php?site=n600

<sup>8</sup> https://www.nucleisys.com/product.php?site=nx600

The Startup File startup\_<device>.S (page 14) is executed right after device reset. It initializes the stack pointer and configures the exception and interrupt entries, then calls SystemInit() (page 343). After system initialization, control returns to the assembly startup code, which performs the C/C++ runtime initialization (data and bss section initialization, then C++ runtime initialization) and finally calls the main() function in the application code.
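
The startup order described above can be sketched in plain C so that it runs on a host. This is only an illustration under stated assumptions: the real sequence is written in assembly, and every `demo_` name below is hypothetical. Small arrays stand in for the `.data` and `.bss` sections, and C++ runtime initialization is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for linker-provided sections (assumption for this sketch;
 * real startup code uses symbols from the linker script instead). */
static const uint32_t demo_data_load[2] = { 0x11111111UL, 0x22222222UL }; /* initial values */
static uint32_t demo_data[2];                                             /* .data in RAM   */
static uint32_t demo_bss[2] = { 0xDEADBEEFUL, 0xDEADBEEFUL };             /* .bss, undefined after reset */

static char demo_trace[8];        /* records the order of the steps */
static unsigned demo_trace_len;

static void demo_mark(char step)
{
    assert(demo_trace_len < sizeof(demo_trace) - 1U);
    demo_trace[demo_trace_len++] = step;
    demo_trace[demo_trace_len] = '\0';
}

static void demo_system_init(void) { demo_mark('S'); }            /* stands in for SystemInit() */
static int  demo_main(void)        { demo_mark('M'); return 0; }  /* stands in for main()       */

/* Mirrors the described sequence. Stack and trap-entry setup are only
 * marked, since they need assembly on real hardware. */
int demo_reset_handler(void)
{
    demo_trace_len = 0U;
    demo_mark('I');                                        /* stack pointer, exception/interrupt entries */
    demo_system_init();                                    /* system initialization */
    memcpy(demo_data, demo_data_load, sizeof(demo_data));  /* initialize .data */
    memset(demo_bss, 0, sizeof(demo_bss));                 /* clear .bss */
    demo_mark('D');
    return demo_main();                                    /* enter the application */
}
```

Calling `demo_reset_handler()` leaves the trace string `ISDM`, which reflects that system initialization runs before the data and bss sections are set up in this flow.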

The *Interrupt and Exception Handling File: intexc\_<device>.S* (page 21) contains all exception and interrupt vectors and implements a default function for every interrupt. It may also contain stack and heap configurations for the user application.

The System Configuration Files system\_<device>.c and system\_<device>.h (page 37) perform the setup for the processor clock. The variable SystemCoreClock (page 343) indicates the CPU clock speed. Systick Timer(SysTimer) (page 295) describes the minimum feature set. In addition, the files may contain functions for the memory BUS setup and clock re-configuration.

The *Device Header File <device.h>* (page 48) is the central include file that the application programmer uses in the C source code. It provides the following features:

- *Peripheral Access* (page 294) provides a standardized register layout for all peripherals. Optionally functions for device-specific peripherals may be available.
- *Interrupts and Exceptions* (page 301) can be accessed with standardized symbols, and functions for the **ECLIC** are provided.
- CPU Intrinsic Functions (page 90) provide access to special instructions, for example for activating sleep mode or the NOP instruction.
- Intrinsic Functions for SIMD Instructions (page 96) provide access to the DSP-oriented instructions.
- Systick Timer(SysTimer) (page 295) functions configure and start a periodic timer interrupt.
- Core CSR Register Access (page 63) functions access the core CSR registers.
- Cache Functions (page 325) access the I-CACHE and D-CACHE units.
- FPU Functions (page 319) access the floating-point unit.
- PMP Functions (page 323) access the Physical Memory Protection unit.
- Version Control (page 60) defines NMSIS release-specific macros.
- Compiler Control (page 61) provides compiler-agnostic #define symbols for generic C/C++ source code.
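
To make the idea of a standardized register layout more concrete, here is a minimal sketch of how a device header might describe one peripheral as a C structure. The UART, its registers, and all `DEMO_` names are hypothetical and do not come from any Nuclei device header; a RAM object stands in for the memory-mapped address so that the sketch can run on a host.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IO volatile          /* read/write register qualifier (hypothetical name) */

/* Register layout of a hypothetical UART; offsets follow member order. */
typedef struct {
    DEMO_IO uint32_t DATA;        /* offset 0x00: data register    */
    DEMO_IO uint32_t STATUS;      /* offset 0x04: status register  */
    DEMO_IO uint32_t CTRL;        /* offset 0x08: control register */
} DEMO_UART_TypeDef;

#define DEMO_UART_CTRL_EN_Pos   (0U)
#define DEMO_UART_CTRL_EN_Msk   (1UL << DEMO_UART_CTRL_EN_Pos)

/* On a device this would be a pointer to a fixed memory-mapped address;
 * here a RAM object stands in for the peripheral. */
static DEMO_UART_TypeDef demo_uart0_storage;
#define DEMO_UART0   (&demo_uart0_storage)

/* Optional access function: set the enable bit in the control register. */
void DEMO_UART_Enable(DEMO_UART_TypeDef *uart)
{
    assert(uart != 0);
    uart->CTRL |= DEMO_UART_CTRL_EN_Msk;
}

/* Optional access function: write one byte to the data register. */
void DEMO_UART_Write(DEMO_UART_TypeDef *uart, uint8_t byte)
{
    assert(uart != 0);
    uart->DATA = byte;
}
```

Application code would then write `DEMO_UART_Enable(DEMO_UART0);` or access `DEMO_UART0->STATUS` directly, without knowing the numeric register addresses.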

The NMSIS-Core system files are device specific.

In addition, the *Startup File startup\_<device>.S* (page 14) is also compiler vendor specific; currently only a GCC version is provided. The versions provided by NMSIS are only generic templates. The adapted versions for a concrete device are typically provided by the device vendor through the corresponding device family pack.

For example, the following files are provided by the GD32VF103 device family pack:



Fig. 1: NMSIS-Core User Files

Table 1: Files provided by GD32VF103 device family pack

| File                                    | Description                                                                                  |
|-----------------------------------------|----------------------------------------------------------------------------------------------|
| ./Device/Source/GCC/startup_gd32vf103.S | Startup File startup\_<device>.S for the GD32VF103 device variants.                          |
| ./Device/Source/GCC/intexc_gd32vf103.S  | Exception and Interrupt Handling File intexc\_<device>.S for the GD32VF103 device variants.  |
| ./Device/Source/GCC/gcc_gd32vf103.ld    | Linker script File gcc\_<device>.ld for the GD32VF103 device variants.                       |
| ./Device/Source/system_gd32vf103.c      | System Configuration File system\_<device>.c for the GD32VF103 device families.              |
| ./Device/Include/system_gd32vf103.h     | System Configuration File system\_<device>.h for the GD32VF103 device families.              |
| ./Device/Include/gd32vf103.h            | Device Header File <device.h> for the GD32VF103 device families.                             |

**Note:** The silicon vendors create these device-specific NMSIS-Core files based on *NMSIS-Core Device Templates* (page 12) provided by Nuclei.

Thereafter, the functions described under NMSIS Core API (page 60) can be used in the application.

# 2.2.2 Basic NMSIS Example

A typical example for using the NMSIS layer is provided below. The example is based on a GD32VF103 Device.

Listing 1: gd32vf103\_example.c

```
#include <gd32vf103.h>                            // Device header (file name depends on the device used)

// Assumption: the opening lines of this listing are missing, so the tick
// setup below (handler alias, CONFIG_TICKS, reload call) is a device-specific guess.
#define SysTick_Handler     eclic_mtip_handler
#define CONFIG_TICKS        (SOC_TIMER_FREQ / 1000)

volatile uint32_t msTicks;                        // Counter for millisecond interval

// Application functions provided by the user
void Get_InputValues (void);
void Calculation_Response (void);
void Output_Response (void);

void SysTick_Handler (void) {                     // SysTick Interrupt Handler
  SysTick_Reload (CONFIG_TICKS);                  // Re-arm the timer for the next tick
  msTicks++;                                      // Increment Counter
}

void WaitForTick (void) {
  uint32_t curTicks;

  curTicks = msTicks;                             // Save Current SysTick Value
  while (msTicks == curTicks) {                   // Wait for next SysTick Interrupt
    __WFI ();                                     // Power-Down until next Event/Interrupt
  }
}

void TIMER0_UP_IRQHandler (void) {                // Timer Interrupt Handler
  ;                                               // Add user code here
}

void timer0_init (void) {                         // Set up Timer (device specific)
  ECLIC_EnableIRQ (TIMER0_UP_IRQn);               // Enable Timer Interrupt
}

void Device_Initialization (void) {               // Configure & Initialize MCU
  if (SysTick_Config (CONFIG_TICKS)) {
    ;                                             // Handle Error
  }
  timer0_init ();                                 // Setup device-specific timer
}

// The processor clock is initialized by NMSIS startup + system file
int main (void) {                                 // User application starts here
  Device_Initialization ();                       // Configure & Initialize MCU
  while (1) {                                     // Endless Loop (the Super-Loop)
    __disable_irq ();                             // Disable all interrupts
    Get_InputValues ();                           // Read Values
    __enable_irq ();                              // Enable all interrupts
    Calculation_Response ();                      // Calculate Results
    Output_Response ();                           // Output Results
    WaitForTick ();                               // Synchronize to SysTick Timer
  }
}
```

# 2.2.3 Using Interrupt and Exception/NMI

Nuclei processors provide NMI(Non-Maskable Interrupt), Exception, Vector Interrupt and Non-Vector Interrupt features.

# 2.2.4 Using NMSIS with generic Nuclei Processors

Nuclei provides NMSIS-Core files for the supported Nuclei Processors and for various compiler vendors. These files can be used when standard Nuclei processors should be used in a project. The table below lists the folder and device names of the Nuclei processors.

| Folder                    | Processor        | RISC-V | Description                                                                                                                                                                                   |
|---------------------------|------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ./Device/Nuclei/NUCLEI_N  | N200, N300, N600 | RV32   | Contains Include and Source template files configured for the Nuclei N200/N300/N600 processors. The device name is NUCLEI_N and the name of the Device Header File <device.h> is <nuclei_n.h>. |
| ./Device/Nuclei/NUCLEI_NX | NX600            | RV64   | Contains Include and Source template files configured for the Nuclei NX600 processor. The device name is NUCLEI_NX and the name of the Device Header File <device.h> is <nuclei_nx.h>.         |

Table 2: Folder and device names of the Nuclei processors

# 2.2.5 Create generic Libraries with NMSIS

The NMSIS Processor and Core Peripheral files also make it possible to create generic libraries. The NMSIS-DSP Libraries are an example of such a generic library.

To build a generic library set the define \_\_NMSIS\_GENERIC and include the *nmsis\_core.h* NMSIS CPU & Core Access header file for the processor.

The define \_\_NMSIS\_GENERIC disables device-dependent features such as the SysTick timer and the Interrupt System.

#### **Example**

The following code section shows the usage of the *nmsis\_core.h* header files to build a generic library for N200, N300, N600, NX600.

One of these defines needs to be provided on the compiler command line.

By using this header file, the source code can access the functions for Core CSR Register Access (page 63), CPU Intrinsic Functions (page 90) and Intrinsic Functions for SIMD Instructions (page 96).

Listing 2: core\_generic.h

```
#define __NMSIS_GENERIC // Disable Eclic and Systick functions
#include <nmsis_core.h>
```
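
The effect of such a define can be illustrated with a small host-side sketch. The header section below is a hypothetical stand-in and is not the content of *nmsis\_core.h*; the `DEMO_` names are invented for this example. It only shows the general pattern: core-level helpers stay available, while device-dependent declarations are compiled out when `__NMSIS_GENERIC` is set.

```c
#include <assert.h>
#include <stdint.h>

#define __NMSIS_GENERIC            /* as in Listing 2: request the generic subset */

/* --- hypothetical stand-in for a core header ------------------------- */
/* Always available: a core-level helper with no device dependency. */
static inline uint32_t DEMO_ReverseBytes(uint32_t value)
{
    return ((value & 0x000000FFUL) << 24) | ((value & 0x0000FF00UL) << 8) |
           ((value & 0x00FF0000UL) >> 8)  | ((value & 0xFF000000UL) >> 24);
}

#ifndef __NMSIS_GENERIC
/* Only declared for a concrete device: needs the device timer and interrupts. */
uint32_t DEMO_TimerConfig(uint32_t ticks);
#define DEMO_HAS_DEVICE_FEATURES (1)
#else
#define DEMO_HAS_DEVICE_FEATURES (0)
#endif
/* --------------------------------------------------------------------- */

/* Generic library code relies only on the core-level helpers. */
uint32_t DEMO_LibToBigEndian(uint32_t little_endian_word)
{
    assert(DEMO_HAS_DEVICE_FEATURES == 0);
    return DEMO_ReverseBytes(little_endian_word);
}
```

A library built this way does not reference timer or interrupt functions, so it can be linked into projects for different devices of the same processor class.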

# 2.3 NMSIS-Core Device Templates

## 2.3.1 Introduction

Nuclei supplies NMSIS-Core device template files for all supported Nuclei N/NX Class Processors and various compiler vendors. Refer to the list of *supported toolchain* (page 6) for compliance.

These NMSIS-Core device template files include the following:

- Register names of the Core Peripherals and names of the Core Exception/Interrupt Vectors.
- Functions to access core peripherals, special CPU instructions and SIMD instructions
- Generic startup code and system configuration code.

The detailed file structure of the NMSIS-Core device templates is shown in the following picture.



Fig. 2: NMSIS-Core Device Templates

#### 2.3.2 NMSIS-Core Processor Files

The NMSIS-Core processor files provided by Nuclei are in the directory NMSIS/Core/Include.

These header files define all processor-specific attributes and do not need any modifications.

The *nmsis\_core.h* defines the core peripherals and provides helper functions that access the core registers.

# 2.3.3 Device Examples

The NMSIS Software Pack defines several devices that are based on the Nuclei N/NX processors.

The device-related NMSIS-Core files are in the directory *Device/Nuclei* and include the NMSIS-Core processor files explained before.

The following sample devices are defined:

Table 3: Device Examples of Nuclei Processor

| Family    | Device          | Description                  |
|-----------|-----------------|------------------------------|
| Nuclei N  | NUCLEI N Class  | Nuclei N Class based device  |
| Nuclei NX | NUCLEI NX Class | Nuclei NX Class based device |

# 2.3.4 Template Files

To simplify the creation of NMSIS-Core device files, the following template files are provided that should be extended by the silicon vendor to reflect the actual device and device peripherals.

Silicon vendors add to these template files the following information:

- Device Peripheral Access Layer that provides definitions for device-specific peripherals.
- Access Functions for Peripherals (optional) that provides additional helper functions to access device-specific peripherals.
- Interrupt vectors in the startup file that are device specific.

Table 4: NMSIS-Core Device Template Files

| Template File (under ./Device/_Template_Vendor/Vendor/) | Description                                                                                                                                                                  |
|---------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Device/Source/GCC/startup_Device.                       | Startup file template for GNU GCC RISC-V Embedded Compiler.                                                                                                                  |
| Device/Source/GCC/gcc_Device.ld                         | Linker script file template for GNU GCC RISC-V Embedded Compiler.                                                                                                            |
| Device/Source/GCC/intexc_Device.S                       | Exception and interrupt handling file template for GNU GCC RISC-V Embedded Compiler.                                                                                         |
| Device/Source/system_Device.c                           | Generic system_Device.c file for system configuration (e.g. processor clock and memory bus system).                                                                          |
| Device/Include/Device.h                                 | Generic device header file. Needs to be extended with the device-specific peripheral registers. Optionally, functions that access the peripherals can be part of that file.  |
| Device/Include/system_Device.h                          | Generic system device configuration include file.                                                                                                                            |

**Note:** The template files for silicon vendors are placed under ./Device/\_Template\_Vendor/Vendor/. Go to that folder to find the files listed in the table above.

# 2.3.5 Adapt Template Files to a Device

The following steps describe how to adapt the template files to a specific device or device family.

Copy all files in the template directory and replace:

- directory name Vendor with the abbreviation for the device vendor e.g.: **GD**.
- directory name Device with the specific device name e.g.: GD32VF103.
- in the file names Device with the specific device name e.g.: GD32VF103.

Each template file contains comments that start with **TODO:** and describe a required modification.

The template files contain placeholders:

Table 5: Placeholders of Template files

| Placeholder                               | Replaced with                                                   |
|-------------------------------------------|-----------------------------------------------------------------|
| `<Device>`                                | the specific device name or device family name; e.g. GD32VF103. |
| `<DeviceInterrupt>`                       | a specific interrupt name of the device; e.g. TIM1 for Timer 1. |
| `<DeviceAbbreviation>`                    | short name or abbreviation of the device family; e.g. GD32VF.   |
| Nuclei-N#                                 | the specific Nuclei Class name; e.g. Nuclei N or Nuclei NX.     |

# 2.3.6 Device Templates Explanation

The device configuration of the template files is described in detail on the following pages:

## Startup File startup\_<device>.S

#### The Startup File startup\_<device>.S contains:

- The reset handler, which is executed after CPU reset and typically calls the `SystemInit()` (page 343) function.
- The setup values for the stack pointer SP.
- Exception vectors of the Nuclei Processor with weak functions that implement default routines.
- Interrupt vectors that are device specific with weak functions that implement default routines.
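The weak-symbol mechanism mentioned in the list above can be illustrated in C. This is only a sketch, assuming a GCC-compatible compiler that supports `__attribute__((weak))`; the counter `default_hits` and the helper `call_mtip_handler()` are hypothetical names introduced for illustration and are not part of NMSIS.

```c
#include <stdint.h>

/* Hypothetical counter, used only to make the default routine observable. */
static volatile uint32_t default_hits = 0;

/*
 * Weak default routine. The linker uses this definition only when no
 * other object file provides a non-weak function with the same name.
 */
__attribute__((weak)) void eclic_mtip_handler(void)
{
    default_hits++;
}

/* Hypothetical helper: invoke the handler and return the hit count. */
uint32_t call_mtip_handler(void)
{
    eclic_mtip_handler();
    return default_hits;
}
```

An application that defines its own non-weak `eclic_mtip_handler()` in another source file replaces this default at link time without any change to the startup file.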

The processor-level startup flow is implemented in *startup\_<device>.S*. The stages are described in detail below:

#### Stage1: Interrupt and Exception initialization

- Disable Interrupt
- Initialize GP, stack
- Initialize NMI entry and set default NMI handler
- Initialize Exception entry and set default exception handler
- Initialize vector table entry and set default interrupt handler
- Set the interrupt mode to ECLIC mode (ECLIC mode is recommended; the default mode is CLINT mode).
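The mode selection in the last step can be modelled on a plain integer. The sketch below mirrors the `csrc CSR_MTVEC, 0x3f` / `csrs CSR_MTVEC, 0x3` pair used in the startup template later in this section; the function name `mtvec_select_eclic()` and the macro names are hypothetical, and the code operates on a value rather than on the real CSR.

```c
#include <stdint.h>

/* Bits cleared by the template before the mode is written (0x3f). */
#define MTVEC_MODE_MASK   0x3fu
/* Value the template sets to select ECLIC mode (0x3). */
#define MTVEC_MODE_ECLIC  0x03u

/* Return the mtvec value after switching the interrupt mode to ECLIC. */
uint32_t mtvec_select_eclic(uint32_t mtvec)
{
    mtvec &= ~MTVEC_MODE_MASK;   /* corresponds to: csrc CSR_MTVEC, t0 (t0 = 0x3f) */
    mtvec |= MTVEC_MODE_ECLIC;   /* corresponds to: csrs CSR_MTVEC, 0x3 */
    return mtvec;
}
```

The upper bits, which hold the exception entry address, are left unchanged by both operations.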

#### Stage2: Hardware initialization

- Enable FPU if necessary
- Call user defined SystemInit() (page 343) for system clock initialization.

#### **Stage3: Section initialization**

- Copy sections (e.g. the data section and the text section) if necessary.
- Clear the Block Started by Symbol (BSS) section.
- Register \_\_libc\_fini\_array with atexit and call \_\_libc\_init\_array to do C library initialization.
- Call the \_premain\_init function to do initialization steps before the main function.
- Jump to main.
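The copy and clear loops of Stage 3 can be sketched in C as follows. This is an illustrative model of the word-wise loops in the assembly template, not the shipped implementation; the function names `copy_section()` and `clear_section()` are hypothetical.

```c
#include <stdint.h>

/*
 * Copy 32-bit words from the load address (LMA) to the run-time range
 * [dst, dst_end). The assembly template additionally skips the ILM copy
 * when load and run-time addresses are identical.
 */
void copy_section(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Fill the range [start, end) with zero words, as done for the BSS section. */
void clear_section(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}
```

In the template the range limits come from linker script symbols such as `_data_lma`, `_data`, `_edata`, `__bss_start` and `_end`.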

The file exists for each supported toolchain and is the only toolchain-specific NMSIS file.

To adapt the file to a new device, only the interrupt vector table needs to be extended with the device-specific interrupt handlers.

The naming convention for the interrupt handler names is eclic\_<interrupt\_name>\_handler.

This table needs to be consistent with IRQn\_Type (page 306) that defines all the IRQ numbers for each interrupt.

The following example shows the extension of the interrupt vector table for the GD32VF103 device family.

```
    .section .vtable

    .weak eclic_msip_handler
    .weak eclic_mtip_handler
    .weak eclic_pmaf_handler
    /* Adjusted for GD32VF103 interrupt handlers */
    .weak eclic_wwdgt_handler
    .weak eclic_lvd_handler
    .weak eclic_tamper_handler
        :
        :
    .weak eclic_can1_ewmc_handler
    .weak eclic_usbfs_handler

    .globl vector_base
    .type vector_base, @object
vector_base:
    /* Run in FlashXIP download mode */
    j _start                                        /* 0: Reserved, Jump to _start when reset for vector table not remapped cases.*/
    .align LOG_REGBYTES                             /*    Need to align 4 byte for RV32, 8 Byte for RV64 */
    DECLARE_INT_HANDLER default_intexc_handler      /* 1: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 2: Reserved */
    DECLARE_INT_HANDLER eclic_msip_handler          /* 3: Machine software interrupt */
        :
        :
    /* Adjusted for Vendor Defined External Interrupts */
    DECLARE_INT_HANDLER eclic_wwdgt_handler         /* 19: Window watchDog timer interrupt */
    DECLARE_INT_HANDLER eclic_lvd_handler           /* 20: LVD through EXTI line detect interrupt */
    DECLARE_INT_HANDLER eclic_tamper_handler        /* 21: tamper through EXTI line detect */
        :
        :
    DECLARE_INT_HANDLER eclic_can1_ewmc_handler     /* 85: CAN1 EWMC interrupt */
    DECLARE_INT_HANDLER eclic_usbfs_handler         /* 86: USBFS global interrupt */
```
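To illustrate the consistency requirement between the vector table and IRQn\_Type, the following sketch shows what a matching enumeration excerpt might look like. The numeric values are taken from the comments in the vector table example; the enumerator names are assumptions chosen for illustration and may differ from the names in the real GD32VF103 device header.

```c
/*
 * Hypothetical excerpt of an IRQn_Type definition. Each value must equal
 * the index of the corresponding entry in the interrupt vector table.
 */
typedef enum IRQn {
    SysTimerSW_IRQn = 3,   /* eclic_msip_handler: machine software interrupt */
    SysTimer_IRQn   = 7,   /* eclic_mtip_handler: machine timer interrupt */
    WWDGT_IRQn      = 19,  /* eclic_wwdgt_handler */
    LVD_IRQn        = 20,  /* eclic_lvd_handler */
    TAMPER_IRQn     = 21,  /* eclic_tamper_handler */
    /* entries 22 to 84 omitted in this sketch */
    CAN1_EWMC_IRQn  = 85,  /* eclic_can1_ewmc_handler */
    USBFS_IRQn      = 86,  /* eclic_usbfs_handler */
    SOC_INT_MAX            /* one past the last interrupt number */
} IRQn_Type;
```

If an interrupt is added or removed in the vector table, the enumeration has to be updated in the same step so that both stay aligned.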

#### startup\_Device.S Template File

A startup assembly code template file for riscv-gcc is provided below. The files for other compilers can differ slightly from this version.

```
/*
 * Copyright (c) 2019 Nuclei Limited. All rights reserved.
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the License); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/******************************************************************************
 * \file     startup_<Device>.S
 * \brief    NMSIS Nuclei N/NX Class Core based Core Device Startup File for
 *           Device <Device>
 * \version  V1.10
 * \date     30. July 2021
 *
 ******************************************************************************/

#include "riscv_encoding.h"

.macro DECLARE_INT_HANDLER  INT_HDL_NAME
#if defined(__riscv_xlen) && (__riscv_xlen == 32)
    .word \INT_HDL_NAME
#else
    .dword \INT_HDL_NAME
#endif
.endm

    /*
     * Put the interrupt vectors in this section according to vector remapped or not:
     * .vtable: vector table's LMA and VMA are the same, it is not remapped
     * .vtable_ilm: vector table's LMA and VMA are different, it is remapped, and
     *              VECTOR_TABLE_REMAPPED need to be defined
     */
#if defined(VECTOR_TABLE_REMAPPED)
    .section .vtable_ilm
#else
```

```
    .section .vtable
#endif

    .weak eclic_msip_handler
    .weak eclic_mtip_handler
    /* TODO: Adjust vendor interrupt handlers */
    .weak eclic_irq19_handler
    .weak eclic_irq20_handler
    .weak eclic_irq21_handler
    .weak eclic_irq22_handler
    .weak eclic_irq23_handler
    .weak eclic_irq24_handler
    .weak eclic_irq25_handler
    .weak eclic_irq26_handler
    .weak eclic_irq27_handler
    .weak eclic_irq28_handler
    .weak eclic_irq29_handler
    .weak eclic_irq30_handler
    .weak eclic_irq31_handler
    .weak eclic_irq32_handler
    .weak eclic_irq33_handler
    .weak eclic_irq34_handler
    .weak eclic_irq35_handler
    .weak eclic_irq36_handler
    .weak eclic_irq37_handler
    .weak eclic_irq38_handler
    .weak eclic_irq39_handler
    .weak eclic_irq40_handler
    .weak eclic_irq41_handler
    .weak eclic_irq42_handler
    .weak eclic_irq43_handler
    .weak eclic_irq44_handler
    .weak eclic_irq45_handler
    .weak eclic_irq46_handler
    .weak eclic_irq47_handler
    .weak eclic_irq48_handler
    .weak eclic_irq49_handler
    .weak eclic_irq50_handler

    .globl vector_base
    .type vector_base, @object
vector_base:
    j _start                                        /* 0: Reserved, Jump to _start when reset for vector table not remapped cases.*/
    .align LOG_REGBYTES                             /*    Need to align 4 byte for RV32, 8 Byte for RV64 */
    DECLARE_INT_HANDLER default_intexc_handler      /* 1: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 2: Reserved */
    DECLARE_INT_HANDLER eclic_msip_handler          /* 3: Machine software interrupt */
    DECLARE_INT_HANDLER default_intexc_handler      /* 4: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 5: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 6: Reserved */
    DECLARE_INT_HANDLER eclic_mtip_handler          /* 7: Machine timer interrupt */
```

```
    DECLARE_INT_HANDLER default_intexc_handler      /* 8: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 9: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 10: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 11: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 12: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 13: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 14: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 15: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 16: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 17: Reserved */
    DECLARE_INT_HANDLER default_intexc_handler      /* 18: Reserved */
    /* TODO: Adjust Vendor Defined External Interrupts */
    DECLARE_INT_HANDLER eclic_irq19_handler         /* 19: Interrupt 19 */
    DECLARE_INT_HANDLER eclic_irq20_handler         /* 20: Interrupt 20 */
    DECLARE_INT_HANDLER eclic_irq21_handler         /* 21: Interrupt 21 */
    DECLARE_INT_HANDLER eclic_irq22_handler         /* 22: Interrupt 22 */
    DECLARE_INT_HANDLER eclic_irq23_handler         /* 23: Interrupt 23 */
    DECLARE_INT_HANDLER eclic_irq24_handler         /* 24: Interrupt 24 */
    DECLARE_INT_HANDLER eclic_irq25_handler         /* 25: Interrupt 25 */
    DECLARE_INT_HANDLER eclic_irq26_handler         /* 26: Interrupt 26 */
    DECLARE_INT_HANDLER eclic_irq27_handler         /* 27: Interrupt 27 */
    DECLARE_INT_HANDLER eclic_irq28_handler         /* 28: Interrupt 28 */
    DECLARE_INT_HANDLER eclic_irq29_handler         /* 29: Interrupt 29 */
    DECLARE_INT_HANDLER eclic_irq30_handler         /* 30: Interrupt 30 */
    DECLARE_INT_HANDLER eclic_irq31_handler         /* 31: Interrupt 31 */
    DECLARE_INT_HANDLER eclic_irq32_handler         /* 32: Interrupt 32 */
    DECLARE_INT_HANDLER eclic_irq33_handler         /* 33: Interrupt 33 */
    DECLARE_INT_HANDLER eclic_irq34_handler         /* 34: Interrupt 34 */
    DECLARE_INT_HANDLER eclic_irq35_handler         /* 35: Interrupt 35 */
    DECLARE_INT_HANDLER eclic_irq36_handler         /* 36: Interrupt 36 */
    DECLARE_INT_HANDLER eclic_irq37_handler         /* 37: Interrupt 37 */
    DECLARE_INT_HANDLER eclic_irq38_handler         /* 38: Interrupt 38 */
    DECLARE_INT_HANDLER eclic_irq39_handler         /* 39: Interrupt 39 */
    DECLARE_INT_HANDLER eclic_irq40_handler         /* 40: Interrupt 40 */
    DECLARE_INT_HANDLER eclic_irq41_handler         /* 41: Interrupt 41 */
    DECLARE_INT_HANDLER eclic_irq42_handler         /* 42: Interrupt 42 */
    DECLARE_INT_HANDLER eclic_irq43_handler         /* 43: Interrupt 43 */
    DECLARE_INT_HANDLER eclic_irq44_handler         /* 44: Interrupt 44 */
    DECLARE_INT_HANDLER eclic_irq45_handler         /* 45: Interrupt 45 */
    DECLARE_INT_HANDLER eclic_irq46_handler         /* 46: Interrupt 46 */
    DECLARE_INT_HANDLER eclic_irq47_handler         /* 47: Interrupt 47 */
    DECLARE_INT_HANDLER eclic_irq48_handler         /* 48: Interrupt 48 */
    DECLARE_INT_HANDLER eclic_irq49_handler         /* 49: Interrupt 49 */
    DECLARE_INT_HANDLER eclic_irq50_handler         /* 50: Interrupt 50 */
    /* Please adjust the above part of interrupt definition code
     * according to your device interrupt number and its configuration */
```

```

/*** Startup Code Section ***/
    .section .init

    .globl _start
    .type _start,@function

/**
 * Reset Handler called on controller reset
 */
_start:
    /* ===== Startup Stage 1 ===== */
    /* Disable Global Interrupt */
    csrc CSR_MSTATUS, MSTATUS_MIE

    /* Initialize GP and Stack Pointer SP */
    .option push
    .option norelax
    la gp, __global_pointer$

    .option pop
    la sp, _sp

    /*
     * Set the NMI base mnvec to share
     * with mtvec by setting CSR_MMISC_CTL
     * bit 9 NMI_CAUSE_FFF to 1
     */
    li t0, MMISC_CTL_NMI_CAUSE_FFF
    csrs CSR_MMISC_CTL, t0

    /*
     * Initialize ECLIC vector interrupt
     * base address mtvt to vector_base
     */
    la t0, vector_base
    csrw CSR_MTVT, t0

    /*
     * Set ECLIC non-vector entry to be controlled
     * by mtvt2 CSR register.
     * Initialize ECLIC non-vector interrupt
     * base address mtvt2 to irq_entry.
     */
    la t0, irq_entry
    csrw CSR_MTVT2, t0
    csrs CSR_MTVT2, 0x1

    /*
     * Set Exception Entry MTVEC to exc_entry
     * Due to settings above, Exception and NMI
     * will share common entry.
     */
    la t0, exc_entry
    csrw CSR_MTVEC, t0

```

```
    /* Set the interrupt processing mode to ECLIC mode */
    li t0, 0x3f
    csrc CSR_MTVEC, t0
    csrs CSR_MTVEC, 0x3

    /* ===== Startup Stage 2 ===== */

#if defined(__riscv_flen) && __riscv_flen > 0
    /* Enable FPU */
    li t0, MSTATUS_FS
    csrs mstatus, t0
    csrw fcsr, x0
#endif

    /* Enable mcycle and minstret counter */
    csrci CSR_MCOUNTINHIBIT, 0x5

    /*
     * Call vendor defined SystemInit to
     * initialize the micro-controller system.
     * TODO: You need to comment this code when run in Flash download mode.
     * then you need to put this line of code
     * after data/bss section initialization and before main
     */
    call SystemInit

    /* ===== Startup Stage 3 ===== */
    /*
     * Load code section from FLASH to ILM
     * when code LMA is different with VMA
     */
    la a0, _ilm_lma
    la a1, _ilm
    /* If the ILM phy-address same as the logic-address, then quit */
    beq a0, a1, 2f
    la a2, _eilm
    bgeu a1, a2, 2f

1:
    /* Load code section if necessary */
    lw t0, (a0)
    sw t0, (a1)
    addi a0, a0, 4
    addi a1, a1, 4
    bltu a1, a2, 1b
2:
    /* Load data section */
    la a0, _data_lma
    la a1, _data
    la a2, _edata
    bgeu a1, a2, 2f
1:
    lw t0, (a0)
    sw t0, (a1)
    addi a0, a0, 4
    addi a1, a1, 4
    bltu a1, a2, 1b
```

```
2:
    /* Clear bss section */
    la a0, __bss_start
    la a1, _end
    bgeu a0, a1, 2f
1:
    sw zero, (a0)
    addi a0, a0, 4
    bltu a0, a1, 1b
2:
    /* TODO: Uncomment this code, if you run in Flash download mode */
    // call SystemInit

    /* Call global constructors */
    la a0, __libc_fini_array
    call atexit
    /* Call C/C++ constructor start up code */
    call __libc_init_array

    /* do pre-init steps before main */
    call _premain_init
    /* ===== Call Main Function ===== */
    /* argc = argv = 0 */
    li a0, 0
    li a1, 0

    call main
    /* do post-main steps after main */
    call _postmain_fini

1:
    j 1b
```

## Interrupt and Exception Handling File: intexc\_<device>.S

#### The intexc File intexc\_<device>.S contains:

- Macro to save caller-saved registers.
- Macro to restore caller-saved registers.
- Default Exception/NMI routine implementation.
- Default Non-Vector Interrupt routine implementation.

Nuclei processors provide NMI (Non-Maskable Interrupt), Exception, Vector Interrupt and Non-Vector Interrupt features.

#### NMI(Non-Maskable Interrupt)

Click NMI<sup>9</sup> to learn about Nuclei Processor Core NMI in Nuclei ISA Spec.

NMI is used for urgent external hardware errors. It cannot be masked or disabled.

When an NMI occurs, bit 9 of CSR MMISC\_CTL is checked. If this bit is 1, the NMI entry address is the same as the exception entry (CSR\_MTVEC) and the exception code for NMI is 0xFFF; otherwise the NMI entry is the same as the reset vector.

In NMSIS-Core, bit 9 of CSR MMISC\_CTL is set to 1 during core startup, so an NMI is treated and handled as an exception.
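The entry selection described above can be modelled as a small pure function. This is a sketch under stated assumptions: the bit position follows the text (bit 9), while the function name `nmi_entry()` and its parameters are hypothetical and operate on plain values, not on the real CSRs.

```c
#include <stdint.h>

/* Bit 9 of MMISC_CTL, as described in the text. */
#define MMISC_CTL_NMI_CAUSE_FFF  (1u << 9)
/* Exception code reported for an NMI when bit 9 is set. */
#define NMI_EXCCODE_FFF          0xFFFu

/*
 * Return the address at which an NMI starts executing.
 * exc_entry is the exception entry held in MTVEC, reset_vector is the
 * address used after reset.
 */
uint32_t nmi_entry(uint32_t mmisc_ctl, uint32_t exc_entry, uint32_t reset_vector)
{
    if (mmisc_ctl & MMISC_CTL_NMI_CAUSE_FFF) {
        return exc_entry;      /* NMI shares the exception entry */
    }
    return reset_vector;       /* NMI enters at the reset vector */
}
```

With the NMSIS-Core startup code, the first branch applies, so the common exception entry can identify an NMI by the exception code `NMI_EXCCODE_FFF`.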

## **Exception**

Click Exception<sup>10</sup> to learn about Nuclei Processor Core Exception in Nuclei ISA Spec.

For CPU exceptions, the entry is exc\_entry. This entry code calls the default exception handler core\_exception\_handler() (page 345).

The common exception routine (exc\_entry) can retrieve more information, such as the exception code. The exception handling flow is shown in Fig. 3:



Fig. 3: Exception Handling Flow

NMI and exceptions support nesting. Two levels of NMI/Exception state save stacks are supported.

<sup>9</sup> https://doc.nucleisys.com/nuclei\_spec/isa/nmi.html

<sup>10</sup> https://doc.nucleisys.com/nuclei\_spec/isa/exception.html

Three nesting modes are supported:

- NMI nesting exception
- Exception nesting exception
- Exception nesting NMI

For software, a common entry for NMI and exceptions is provided. Silicon vendors only need to adapt the interface defined in *Interrupt Exception NMI Handling* (page 344).

Context save and restore are handled by the exc\_entry interface.

When an exception handler returns, the processor executes the instruction that triggered the exception again, which causes a software dead loop. Therefore, in the exception handler for each exception code, we propose to set CSR MEPC to MEPC+4, so that execution resumes at the instruction following MEPC.
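The MEPC adjustment can be sketched as a pure function on values. The text proposes MEPC+4, which matches a 32-bit instruction. The sketch below additionally distinguishes 16-bit compressed instructions by their two lowest bits, which follows the standard RISC-V encoding but is an extension beyond what this section states; encodings longer than 32 bits are ignored. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Length in bytes of the instruction whose lowest 16 bits are low_half.
 * Standard RISC-V encoding: low bits 0b11 mark a 32-bit instruction,
 * any other value marks a 16-bit compressed instruction.
 */
static uint32_t insn_length(uint16_t low_half)
{
    return ((low_half & 0x3u) == 0x3u) ? 4u : 2u;
}

/* Return the MEPC value that resumes after the faulting instruction. */
uint32_t mepc_after_fault(uint32_t mepc, uint16_t low_half)
{
    return mepc + insn_length(low_half);
}
```

In a real handler the new value would be written back to MEPC before returning. Whether skipping the instruction is appropriate depends on the exception cause and remains a decision for the vendor's handler.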

## Interrupt

Click Interrupt<sup>11</sup> to learn about Nuclei Processor Core Interrupt in Nuclei Spec.

Interrupts can be configured in **CLINT** mode or **ECLIC** mode.

In NMSIS-Core, interrupts are configured in **ECLIC** mode during startup in *startup\_<Device>.S*, which is also the recommended setting for Nuclei Processors.

ECLIC-managed interrupts can be configured in vector or non-vector mode.

The detailed interrupt handling process is shown in Fig. 4.

To find the highest-priority interrupt, the interrupt levels are compared first. If the levels are equal, the priorities are compared. A higher-level interrupt can preempt a lower-level ISR and trigger interrupt nesting. If pending interrupts have the same level but different priorities, the one with the higher priority is served first.

Interrupts can be configured in vector mode or non-vector mode by the vendor. In non-vector mode, the interrupt handler entry is taken from MTVT2, and the exception/NMI handler entry is taken from MTVEC. If the vendor needs the non-vector mode interrupt handler entry to be taken from MTVEC, MTVT2.BIT0 must be set to 0.
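The level-then-priority comparison can be sketched in C as follows. This is a software model for illustration only; in the processor the arbitration is done by the ECLIC hardware. The type `irq_state_t` and the function names are hypothetical, larger numbers are assumed to mean higher level or priority, and the tie-break for equal level and priority (the earlier array entry wins) is an arbitrary choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software view of one interrupt source. */
typedef struct {
    uint16_t id;        /* interrupt number */
    uint8_t  level;     /* compared first */
    uint8_t  priority;  /* compared only when levels are equal */
    uint8_t  pending;   /* nonzero when the interrupt is pending */
} irq_state_t;

/* Nonzero when a must be served before b. */
static int irq_outranks(const irq_state_t *a, const irq_state_t *b)
{
    if (a->level != b->level) {
        return a->level > b->level;
    }
    return a->priority > b->priority;
}

/* Return the pending interrupt to serve next, or NULL if none is pending. */
const irq_state_t *select_irq(const irq_state_t *irqs, size_t count)
{
    const irq_state_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (!irqs[i].pending) {
            continue;
        }
        if (best == NULL || irq_outranks(&irqs[i], best)) {
            best = &irqs[i];
        }
    }
    return best;
}

/* Nesting: only a strictly higher level preempts the running ISR. */
int preempts(const irq_state_t *incoming, uint8_t active_level)
{
    return incoming->level > active_level;
}
```

Note that priority only orders pending interrupts of the same level; it never causes preemption of a running ISR in this model.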

## Non-Vector Interrupt SW handler

For a **non-vector** mode interrupt, the software handler saves and restores the necessary CSR registers and the context. The non-vector mode software handling flow is shown in Fig. 5.

#### Detailed description of the non-vector mode interrupt handler:

- 1. Get the non-vector mode handler entry from MTVT2 if MTVT2.BIT0 is 1 (recommended configuration).
- 2. Save the context of the CPU registers to the stack.
- 3. Save CSR registers MEPC/MCAUSE/MSUBM to the stack.
- 4. Run the instruction csrrw ra, CSR\_JALMNXTI, ra. It enables interrupts and checks for pending interrupts. If an interrupt is pending, it gets the highest-priority interrupt and jumps to the interrupt handler entry in the vector table; otherwise it goes to step 6.
- 5. Execute the interrupt handler routine. On return from the ISR, jump to step 4.
- 6. Disable global interrupts.
- 7. Restore CSR registers MEPC/MCAUSE/MSUBM.
- 8. Restore the context of the CPU registers from the stack.
- 9. Execute mret to return from the handler.

<sup>11</sup> https://doc.nucleisys.com/nuclei\_spec/isa/interrupt.html

Fig. 4: Interrupt Handling Flow

Fig. 5: Non-vector mode interrupt software handle flow

A **non-vector** mode interrupt supports **interrupt nesting**.

The **interrupt nesting** handling flow is shown in Fig. 6:



Fig. 6: Nesting interrupt handling flow

## **Vector Interrupt SW handler**

If a vector interrupt handler needs to support nesting or make function calls, the vector mode software handling flow is as shown in Fig. 7:



Fig. 7: Vector mode nesting interrupt handling flow

Detailed description of the nested vector mode interrupt handler:

- 1. Get the vector mode handler from the vector table entry at address MTVT plus offset.
- 2. Save the context of the CPU registers to the stack, done in each vector interrupt handler via \_\_INTERRUPT (page 62).
- 3. Save CSR registers MEPC/MCAUSE/MSUBM to the stack, done in each vector interrupt handler by reading these CSRs and saving them into variables.
- 4. Execute the interrupt handling.
- 5. Restore CSR registers MEPC/MCAUSE/MSUBM from the stack.
- 6. Restore the CSR registers from the variables saved in step 3.
- 7. Execute mret to return from the handler.

Here is sample code for the nested vector interrupt handling process above:

Here is sample code for above nested vector interrupt handling process:

```
// Vector interrupt handler for on-board button
__INTERRUPT void SOC_BUTTON_1_HANDLER(void)
{
    // save mepc, mcause, msubm and enable interrupts
    SAVE_IRQ_CSR_CONTEXT();

    printf("%s", "----Begin button1 handler----Vector mode\r\n");

    // Green LED toggle
    gpio_toggle(GPIO, SOC_LED_GREEN_GPIO_MASK);

    // Clear the GPIO Pending interrupt by writing 1.
    gpio_clear_interrupt(GPIO, SOC_BUTTON_1_GPIO_OFS, GPIO_INT_RISE);

    wait_seconds(1); // Wait for a while

    printf("%s", "----End button1 handler\r\n");

    // disable interrupts, restore mepc, mcause, msubm
    RESTORE_IRQ_CSR_CONTEXT();
}
```

#### Detailed description of the non-nested vector mode interrupt handler

To improve the software response latency in vector mode, the vendor can remove the context save/restore and the MEPC/MCAUSE/MSUBM save/restore.

In that case, the vector mode interrupt does not support nesting, and the interrupt handler can only be a leaf function that does not make any function calls.

#### The vector mode interrupt software flow is then as follows:

- 1. Get the vector mode handler from the vector table entry at address MTVT plus offset.
- 2. Execute the interrupt handler (leaf function).
- 3. Execute mret to return from the handler.

Here is sample code for the above non-nested vector interrupt handler, which is a leaf function:

```
static uint32_t btn_pressed = 0;

// Vector interrupt handler for on-board button
// This function is a leaf function, no function call is allowed
__INTERRUPT void SOC_BUTTON_1_HANDLER(void)
{
    btn_pressed ++;
}
```

#### intexc\_Device.S Template File

The file exists for each supported toolchain and is toolchain-specific.

Normally this file does not need to be adapted for different devices. If the CPU CSR registers have changed, some adaptation may be needed.

The intexc\_Device.S template file is provided below:

```
/*
 * Copyright (c) 2019 Nuclei Limited. All rights reserved.
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the License); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/******************************************************************************
 * \file     intexc_<Device>.S
 * \brief    NMSIS Interrupt and Exception Handling Template File
 *           for Nuclei N/NX Class Device
 * \version  V1.10
 * \date     30. July 2021
 *
 ******************************************************************************/

#include "riscv_encoding.h"

/**
 * \brief  Global interrupt disabled
 * \details
 *  This function disable global interrupt.
 *  - All the interrupt requests will be ignored by CPU.
 */
.macro DISABLE_MIE
    csrc CSR_MSTATUS, MSTATUS_MIE
.endm

/**
 * \brief  Macro for context save
 * \details
 * This macro save ABI defined caller saved registers in the stack.
```

```
* \remarks
44
     * - This Macro could use to save context when you enter to interrupt
45
    * or exception
46
47
   /* Save caller registers */
48
   .macro SAVE_CONTEXT
49
       /* Allocate stack space for context saving */
50
   #ifndef __riscv_32e
51
       addi sp, sp, -20*REGBYTES
52
   #else
53
       addi sp, sp, -14*REGBYTES
54
   #endif /* __riscv_32e */
57
       STORE x1, 0 * REGBYTES (sp)
       STORE x4, 1*REGBYTES(sp)
58
       STORE x5, 2*REGBYTES(sp)
59
       STORE x6, 3*REGBYTES(sp)
60
       STORE x7, 4 * REGBYTES(sp)
61
       STORE x10, 5 * REGBYTES (sp)
62
       STORE x11, 6*REGBYTES(sp)
63
       STORE x12, 7 * REGBYTES (sp)
64
       STORE x13, 8 * REGBYTES (sp)
65
       STORE x14, 9*REGBYTES(sp)
66
       STORE x15, 10 * REGBYTES (sp)
67
   #ifndef __riscv_32e
       STORE x16, 14 * REGBYTES (sp)
       STORE x17, 15 * REGBYTES (sp)
70
       STORE x28, 16*REGBYTES(sp)
71
       STORE x29, 17 * REGBYTES (sp)
72
       STORE x30, 18 * REGBYTES (sp)
73
       STORE x31, 19*REGBYTES(sp)
74
   #endif /* __riscv_32e */
75
   .endm

/**
 * \brief  Macro for restore caller registers
 * \details
 * This macro restore ABI defined caller saved registers from stack.
 * \remarks
 * - You could use this macro to restore context before you want return
 * from interrupt or exception
 */
/* Restore caller registers */
.macro RESTORE_CONTEXT
    LOAD x1, 0*REGBYTES(sp)
    LOAD x4, 1*REGBYTES(sp)
    LOAD x5, 2*REGBYTES(sp)
    LOAD x6, 3*REGBYTES(sp)
    LOAD x7, 4*REGBYTES(sp)
    LOAD x10, 5*REGBYTES(sp)
    LOAD x11, 6*REGBYTES(sp)
    LOAD x12, 7*REGBYTES(sp)
    LOAD x13, 8*REGBYTES(sp)
    LOAD x14, 9*REGBYTES(sp)
    LOAD x15, 10*REGBYTES(sp)
#ifndef __riscv_32e
    LOAD x16, 14*REGBYTES(sp)
    LOAD x17, 15*REGBYTES(sp)
    LOAD x28, 16*REGBYTES(sp)
    LOAD x29, 17*REGBYTES(sp)
    LOAD x30, 18*REGBYTES(sp)
    LOAD x31, 19*REGBYTES(sp)

    /* De-allocate the stack space */
    addi sp, sp, 20*REGBYTES
#else
    /* De-allocate the stack space */
    addi sp, sp, 14*REGBYTES
#endif /* __riscv_32e */

.endm

/**
 * \brief  Macro for save necessary CSRs to stack
 * \details
 * This macro store MCAUSE, MEPC, MSUBM to stack.
 */
.macro SAVE_CSR_CONTEXT
    /* Store CSR mcause to stack using pushmcause */
    csrrwi x0, CSR_PUSHMCAUSE, 11
    /* Store CSR mepc to stack using pushmepc */
    csrrwi x0, CSR_PUSHMEPC, 12
    /* Store CSR msub to stack using pushmsub */
    csrrwi x0, CSR_PUSHMSUBM, 13
.endm

/**
 * \brief  Macro for restore necessary CSRs from stack
 * \details
 * This macro restore MSUBM, MEPC, MCAUSE from stack.
 */
.macro RESTORE_CSR_CONTEXT
    LOAD x5, 13*REGBYTES(sp)
    csrw CSR_MSUBM, x5
    LOAD x5, 12*REGBYTES(sp)
    csrw CSR_MEPC, x5
    LOAD x5, 11*REGBYTES(sp)
    csrw CSR_MCAUSE, x5
.endm

/**
 * \brief  Exception/NMI Entry
 * \details
 * This function provide common entry functions for exception/nmi.
 * \remarks
 * This function provide a default exception/nmi entry.
 * ABI defined caller save register and some CSR registers need
 * to be saved before enter interrupt handler and be restored before return.
 */
.section .text.trap
/* In CLIC mode, the exception entry must be 64 bytes aligned */
.align 6
.global exc_entry
.weak exc_entry
exc_entry:
    /* Save the caller saving registers (context) */
    SAVE_CONTEXT
    /* Save the necessary CSR registers */
    SAVE_CSR_CONTEXT

    /*
     * Set the exception handler function arguments
     * argument 1: mcause value
     * argument 2: current stack point (SP) value
     */
    csrr a0, mcause
    mv a1, sp
    /*
     * TODO: Call the exception handler function
     * By default, the function template is provided in
     * system_Device.c, you can adjust it as you want
     */
    call core_exception_handler

    /* Restore the necessary CSR registers */
    RESTORE_CSR_CONTEXT
    /* Restore the caller saving registers (context) */
    RESTORE_CONTEXT

    /* Return to regular code */
    mret

/**
 * \brief  Non-Vector Interrupt Entry
 * \details
 * This function provide common entry functions for handling
 * non-vector interrupts
 * \remarks
 * This function provide a default non-vector interrupt entry.
 * ABI defined caller save register and some CSR registers need
 * to be saved before enter interrupt handler and be restored before return.
 */
.section .text.irq
/* In CLIC mode, the interrupt entry must be 4 bytes aligned */
.align 2
.global irq_entry
.weak irq_entry
/* This label will be set to MTVT2 register */
irq_entry:
    /* Save the caller saving registers (context) */
    SAVE_CONTEXT
    /* Save the necessary CSR registers */
    SAVE_CSR_CONTEXT

    /* This special CSR read/write operation, which is actually
     * claim the CLIC to find its pending highest ID, if the ID
     * is not 0, then automatically enable the mstatus.MIE, and
     * jump to its vector-entry-label, and update the link register
     */
    csrrw ra, CSR_JALMNXTI, ra

    /* Critical section with interrupts disabled */
    DISABLE_MIE

    /* Restore the necessary CSR registers */
    RESTORE_CSR_CONTEXT
    /* Restore the caller saving registers (context) */
    RESTORE_CONTEXT

    /* Return to regular code */
    mret

/* Default Handler for Exceptions / Interrupts */
.global default_intexc_handler
.weak default_intexc_handler
Undef_Handler:
default_intexc_handler:
1:
    j 1b
```
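The store offsets used by `SAVE_CONTEXT` and `SAVE_CSR_CONTEXT` above imply a fixed stack frame of 20 register-wide slots in the non-RV32E case, with slots 11 to 13 left free for `mcause`, `mepc` and `msubm`. The following C sketch mirrors that layout so the slot numbers can be checked at compile time. The names `demo_exc_frame_t`, `DEMO_SLOT` and `demo_frame_mepc` are hypothetical and not part of NMSIS, and the sketch assumes that `REGBYTES` equals `sizeof(unsigned long)`.

```c
#include <stddef.h>

/* Hypothetical mirror of the frame built by SAVE_CONTEXT and
 * SAVE_CSR_CONTEXT (non-RV32E case). One field per stack slot,
 * in the order given by the store offsets in the listing. */
typedef struct {
    unsigned long ra;      /* x1,  slot 0  */
    unsigned long tp;      /* x4,  slot 1  */
    unsigned long t0;      /* x5,  slot 2  */
    unsigned long t1;      /* x6,  slot 3  */
    unsigned long t2;      /* x7,  slot 4  */
    unsigned long a0;      /* x10, slot 5  */
    unsigned long a1;      /* x11, slot 6  */
    unsigned long a2;      /* x12, slot 7  */
    unsigned long a3;      /* x13, slot 8  */
    unsigned long a4;      /* x14, slot 9  */
    unsigned long a5;      /* x15, slot 10 */
    unsigned long mcause;  /* slot 11, pushed by SAVE_CSR_CONTEXT */
    unsigned long mepc;    /* slot 12, pushed by SAVE_CSR_CONTEXT */
    unsigned long msubm;   /* slot 13, pushed by SAVE_CSR_CONTEXT */
    unsigned long a6;      /* x16, slot 14 */
    unsigned long a7;      /* x17, slot 15 */
    unsigned long t3;      /* x28, slot 16 */
    unsigned long t4;      /* x29, slot 17 */
    unsigned long t5;      /* x30, slot 18 */
    unsigned long t6;      /* x31, slot 19 */
} demo_exc_frame_t;

/* Slot index of a field, counted in register-sized units. */
#define DEMO_SLOT(field) \
    (offsetof(demo_exc_frame_t, field) / sizeof(unsigned long))

/* Compile-time checks against the offsets used in the assembly. */
_Static_assert(DEMO_SLOT(a5) == 10, "a5 must sit in slot 10");
_Static_assert(DEMO_SLOT(mcause) == 11, "mcause must sit in slot 11");
_Static_assert(DEMO_SLOT(a6) == 14, "a6 must sit in slot 14");
_Static_assert(sizeof(demo_exc_frame_t) == 20 * sizeof(unsigned long),
               "frame must span 20 slots");

/* Read the saved return address of the interrupted code from a frame. */
static unsigned long demo_frame_mepc(const demo_exc_frame_t *frame)
{
    return frame->mepc;
}
```

A handler that receives the saved stack pointer as its second argument could cast it to such a structure to inspect the interrupted context.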

# Device Linker Script: gcc\_<device>.ld

The Linker Script File gcc\_<device>.ld contains:

- Memory base address and size.
- Location of the code and data sections, the vector table, and so on.
- Stack and heap location and size.

A linker script exists for each supported toolchain, and it is the only toolchain-specific NMSIS file.

Adapt the file to a new device only when you need to change the memory base address and size, or the location of data and code.
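When adapting base addresses and sizes, it helps to check the arithmetic before editing the script. The following C sketch computes region ends, tests two regions for overlap, and derives the lowest stack address for a stack placed at the top of RAM, which is where the template puts it. All names (`demo_region_t`, `demo_region_end`, `demo_regions_overlap`, `demo_stack_limit`) are hypothetical helpers for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one memory region of the device. */
typedef struct {
    uint32_t base;  /* first address of the region */
    uint32_t size;  /* length in bytes */
} demo_region_t;

/* First address past the region; 64-bit so a region that ends at
 * 0xFFFFFFFF does not wrap around. */
static uint64_t demo_region_end(demo_region_t r)
{
    return (uint64_t)r.base + r.size;
}

/* True if the two regions share at least one address. */
static bool demo_regions_overlap(demo_region_t a, demo_region_t b)
{
    return a.base < demo_region_end(b) && b.base < demo_region_end(a);
}

/* Lowest address of a stack of stack_size bytes placed at the top of ram. */
static uint64_t demo_stack_limit(demo_region_t ram, uint32_t stack_size)
{
    return demo_region_end(ram) - stack_size;
}
```

For example, a RAM region at 0x90000000 with a size of 0x00010000 and a stack size of 0x800 gives a stack limit of 0x9000F800.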

## gcc\_Device.ld Template File

The gcc\_Device.ld template file is provided below:

```
/*
 * Copyright (c) 2019 Nuclei Limited. All rights reserved.
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the License); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/******************************************************************************
 * @file     gcc_<Device>.ld
 * @brief    GNU Linker Script for Nuclei N/NX based device
 * @version  V1.1.0
 * @date     30. July 2021
 ******************************************************************************/

/*********** Use Configuration Wizard in Context Menu *************************/

OUTPUT_ARCH( "riscv" )
/*--------------------- Flash Configuration ----------------------------------
 * <h> Flash Configuration
 * <o0> Flash Base Address <0x0-0xFFFFFFFF:8>
 * <o1> Flash Size (in Bytes) <0x0-0xFFFFFFFF:8>
 * </h>
 */
__ROM_BASE = 0x20000000;
__ROM_SIZE = 0x00400000;

/*--------------------- ILM RAM Configuration --------------------------------
 * <h> ILM RAM Configuration
 * <o0> ILM RAM Base Address    <0x0-0xFFFFFFFF:8>
 * <o1> ILM RAM Size (in Bytes) <0x0-0xFFFFFFFF:8>
 * </h>
 */
__ILM_RAM_BASE = 0x80000000;
__ILM_RAM_SIZE = 0x00010000;

/*--------------------- Embedded RAM Configuration ---------------------------
 * <h> RAM Configuration
 * <o0> RAM Base Address    <0x0-0xFFFFFFFF:8>
 * <o1> RAM Size (in Bytes) <0x0-0xFFFFFFFF:8>
 * </h>
 */
__RAM_BASE = 0x90000000;
__RAM_SIZE = 0x00010000;

/********************* Stack / Heap Configuration ****************************
 * <h> Stack / Heap Configuration
 * <o0> Stack Size (in Bytes) <0x0-0xFFFFFFFF:8>
 * <o1> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
 * </h>
 */
__STACK_SIZE = 0x00000800;
__HEAP_SIZE = 0x00000800;

/* Define base address and length of flash and ram */
MEMORY
{
  flash (rxai!w) : ORIGIN = __ROM_BASE, LENGTH = __ROM_SIZE
  ram (wxa!ri) : ORIGIN = __RAM_BASE, LENGTH = __RAM_SIZE
}
/* Linker script to place sections and symbol values. Should be used together
 * with other linker script that defines memory regions FLASH, ILM and RAM.
 * It references following symbols, which must be defined in code:
 *   _start : Entry of reset handler
 *
 * It defines following symbols, which code can use without definition:
 *   _ilm_lma
 *   _ilm
 *   __etext
 *   _etext
 *   etext
 *   _eilm
 *   __preinit_array_start
 *   __preinit_array_end
 *   __init_array_start
 *   __init_array_end
 *   __fini_array_start
 *   __fini_array_end
 *   _data_lma
 *   _edata
 *   edata
 *   __data_end__
 *   __bss_start
 *   __fbss
 *   _end
 *   end
 *   __heap_end
 *   __StackLimit
 *   __StackTop
 *   __STACK_SIZE
 */
/* Define entry label of program */
ENTRY(_start)
SECTIONS
{
  __STACK_SIZE = DEFINED(__STACK_SIZE) ? __STACK_SIZE : 2K;

  .init :
  {
    /* vector table locate at flash */
    *(.vtable)
    KEEP (*(SORT_NONE(.init)))
  } >flash AT>flash

  .ilalign :
  {
    . = ALIGN(4);
    /* Create a section label as _ilm_lma which located at flash */
    PROVIDE( _ilm_lma = . );
  } >flash AT>flash

  .ialign :
  {
    /* Create a section label as _ilm which located at flash */
    PROVIDE( _ilm = . );
  } >flash AT>flash

  /* Code section located at flash */
  .text :
  {
    *(.text.unlikely .text.unlikely.*)
    *(.text.startup .text.startup.*)
    *(.text .text.*)
    *(.gnu.linkonce.t.*)
  } >flash AT>flash

  .rodata : ALIGN(4)
  {
    . = ALIGN(4);
    *(.rdata)
    *(.rodata .rodata.*)
    *(.gnu.linkonce.r.*)
    . = ALIGN(8);
    *(.srodata.cst16)
    *(.srodata.cst8)
    *(.srodata.cst4)
    *(.srodata.cst2)
    *(.srodata .srodata.*)
  } >flash AT>flash

  .fini :
  {
    KEEP (*(SORT_NONE(.fini)))
  } >flash AT>flash

  . = ALIGN(4);

  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);
  PROVIDE( _eilm = . );

  .preinit_array :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >flash AT>flash

  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >flash AT>flash

  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >flash AT>flash

  .ctors :
  {
    /* gcc uses crtbegin.o to find the start of
     * the constructors, so we make sure it is
     * first. Because this is a wildcard, it
     * doesn't matter if the user does not
     * actually link against crtbegin.o; the
     * linker won't look for a file to match a
     * wildcard. The wildcard also means that it
     * doesn't matter which directory crtbegin.o
     * is in.
     */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
     * the crtend.o file until after the sorted ctors.
     * The .ctor section from the crtend file contains the
     * end of ctors marker and it must be last
     */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  } >flash AT>flash

  .dtors :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  } >flash AT>flash

  .lalign :
  {
    . = ALIGN(4);
    PROVIDE( _data_lma = . );
  } >flash AT>flash

  .dalign :
  {
    . = ALIGN(4);
    PROVIDE( _data = . );
  } >ram AT>flash

  /* Define data section virtual address is ram and physical address is flash */
  .data :
  {
    *(.data .data.*)
    *(.gnu.linkonce.d.*)
    . = ALIGN(8);
    PROVIDE( __global_pointer$ = . + 0x800 );
    *(.sdata .sdata.* .sdata*)
    *(.gnu.linkonce.s.*)
  } >ram AT>flash

  . = ALIGN(4);
  PROVIDE( _edata = . );
  PROVIDE( edata = . );

  PROVIDE( _fbss = . );
  PROVIDE( __bss_start = . );
  .bss :
  {
    *(.sbss*)
    *(.gnu.linkonce.sb.*)
    *(.bss .bss.*)
    *(.gnu.linkonce.b.*)
    *(COMMON)
    . = ALIGN(4);
  } >ram AT>ram

  . = ALIGN(8);
  PROVIDE( _end = . );
  PROVIDE( end = . );
  /* Define stack and heap location at ram */
  .stack ORIGIN(ram) + LENGTH(ram) - __STACK_SIZE :
  {
    PROVIDE( _heap_end = . );
    __StackLimit = .;
    . = __STACK_SIZE;
    __StackTop = .;
    PROVIDE( _sp = . );
  } >ram AT>ram
}
```
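The script places `.data` at a run address in RAM but loads it from flash (`>ram AT>flash`), and it marks the `.bss` range with `__bss_start` and `_end`. Startup code therefore has to copy the initialized data and clear `.bss` before `main` runs. The following C sketch shows the idea on plain pointers. The function names `demo_copy_data` and `demo_zero_bss` are hypothetical, and the sketch does not claim to reproduce the startup file shipped with NMSIS.

```c
#include <stdint.h>

/* Copy initialized data word by word from its load address (flash)
 * to its run address (RAM). [start, end) is the run-address range. */
static void demo_copy_data(const uint32_t *lma, uint32_t *start,
                           const uint32_t *end)
{
    while (start < end) {
        *start++ = *lma++;
    }
}

/* Clear the zero-initialized range [start, end) word by word. */
static void demo_zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}
```

On a target, the arguments would be the addresses of the linker-provided symbols, for example `_data_lma`, `_data` and `_edata` for the copy, and `__bss_start` and `_end` for the clear. Both helpers assume 4-byte aligned ranges, which matches the `ALIGN(4)` statements around those symbols.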

# System Configuration Files system\_<device>.c and system\_<device>.h

The **System Configuration Files system\_<device>.c** and **system\_<device>.h** provide, as a minimum, the functions described under *System Device Configuration* (page 342).

These functions are device specific and need to be adapted. In addition, the file may contain configuration settings for the device, such as the XTAL frequency or PLL prescaler settings, the necessary system initialization, and vendor-customized interrupt, exception and NMI handling code. Refer to *System Device Configuration* (page 342) for more details.
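As an illustration of the clock-related settings, the following C sketch derives a core clock from a crystal frequency and a PLL multiplier and divider. The PLL model and the names `demo_core_clock_hz`, `pll_mul` and `pll_div` are assumptions made for this example. A real device computes the value from its own clock registers.

```c
#include <stdint.h>

/* Hypothetical clock model: core clock = xtal * pll_mul / pll_div.
 * A divider of 0 is treated as "PLL bypassed" and returns the crystal
 * frequency unchanged. The multiplication is done in 64 bits so the
 * intermediate product cannot overflow. */
static uint32_t demo_core_clock_hz(uint32_t xtal_hz, uint32_t pll_mul,
                                   uint32_t pll_div)
{
    if (pll_div == 0u) {
        return xtal_hz;
    }
    return (uint32_t)(((uint64_t)xtal_hz * pll_mul) / pll_div);
}
```

With a 12 MHz crystal, a multiplier of 5 and a divider of 1, the function returns 60000000, which corresponds to a `5 * XTAL` system clock.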

For devices with an external memory bus, system\_<device>.c also configures the bus system.

The silicon vendor might expose other functions (e.g. for power configuration) in the system\_<device>.c file. For such additional features, the function prototypes need to be added to the system\_<device>.h header file.
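The exception and NMI handling code carried by system\_<device>.c keeps one handler slot per exception code, plus one extra slot for the NMI. The following C sketch isolates that index mapping so it can be tested on its own. The names `demo_handler_slot`, `DEMO_MAX_EXC_NUM` and `DEMO_NMI_CODE` are hypothetical; the values 12 and 0xFFF follow the template's handler table description.

```c
#include <stdint.h>

#define DEMO_MAX_EXC_NUM  12u     /* exception codes 0..11 */
#define DEMO_NMI_CODE     0xFFFu  /* code under which the NMI is routed */

/* Map an mcause value to a handler table index.
 * Returns 0..11 for regular exception codes, 12 for the NMI,
 * and -1 for any code that has no dedicated slot. */
static int demo_handler_slot(unsigned long mcause)
{
    uint32_t code = (uint32_t)(mcause & 0xFFFu);

    if (code < DEMO_MAX_EXC_NUM) {
        return (int)code;
    }
    if (code == DEMO_NMI_CODE) {
        return (int)DEMO_MAX_EXC_NUM;
    }
    return -1;
}
```

A table sized `DEMO_MAX_EXC_NUM + 1` then holds every handler, and a result of -1 can be routed to a default handler.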

## system\_Device.c Template File

The system\_Device.c template file is provided below:

```
/*
 * Licensed under the Apache License, Version 2.0 (the License); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/******************************************************************************
 * @file     system_<Device>.c
 * @brief    NMSIS Nuclei N/NX Device Peripheral Access Layer Source File for
 *           Device <Device>
 * @version  V1.10
 * @date     30. July 2021
 ******************************************************************************/

#include <stdint.h>
#include "<Device>.h"

/*----------------------------------------------------------------------------
  Define clocks
 *----------------------------------------------------------------------------*/
/* TODO: add here your necessary defines for device initialization
         following is an example for different system frequencies */
#define XTAL            (12000000U)       /* Oscillator frequency */

#define SYSTEM_CLOCK    (5 * XTAL)

/**
 * \defgroup  NMSIS_Core_SystemConfig       System Device Configuration
 * \brief Functions for system init, clock setup and interrupt/exception/nmi functions available in system_<device>.c.
 * \details
 * Nuclei provides a template file **system_Device.c** that must be adapted by
 * the silicon vendor to match their actual device. As a <b>minimum requirement</b>,
 * this file must provide:
 * - A device-specific system configuration function, \ref SystemInit.
 * - A global variable that contains the system frequency, \ref SystemCoreClock.
 * - A global eclic configuration initialization, \ref ECLIC_Init.
 * - Global c library \ref _init and \ref _fini functions called right before calling main function.
 * - Vendor customized interrupt, exception and nmi handling code, see \ref NMSIS_Core_IntExcNMI_Handling
 *
 * The file configures the device and, typically, initializes the oscillator (PLL) that is part
 * of the microcontroller device. This file might export other functions or variables that provide
 * a more flexible configuration of the microcontroller system.
 *
 * And this file also provided common interrupt, exception and NMI exception handling framework template,
 * Silicon vendor can customize these template code as they want.
 *
 * \note Please pay special attention to the static variable \c SystemCoreClock. This variable might be
 * used throughout the whole system initialization and runtime to calculate frequency/time related values.
 * Thus one must assure that the variable always reflects the actual system clock speed.
 *
 * \attention
 * Be aware that a value stored to \c SystemCoreClock during low level initialization (i.e. \c SystemInit()) might get
 * overwritten by C library startup code and/or .bss section initialization.
 * Thus it is highly recommended to call \ref SystemCoreClockUpdate at the beginning of the user \c main() routine.
 *
 * @{
 */

/*----------------------------------------------------------------------------
  System Core Clock Variable
 *----------------------------------------------------------------------------*/
/* TODO: initialize SystemCoreClock with the system core clock frequency value
         achieved after system initialization.
         This means system core clock frequency after call to SystemInit() */
/**
 * \brief      Variable to hold the system core clock value
 * \details
 * Holds the system core clock, which is the system clock frequency supplied to the SysTick
 * timer and the processor core clock. This variable can be used by debuggers to query the
 * frequency of the debug timer or to configure the trace clock speed.
 *
 * \attention
 * Compilers must be configured to avoid removing this variable in case the application
 * program is not using it. Debugging systems require the variable to be physically
 * present in memory so that it can be examined to configure the debugger.
 */
uint32_t SystemCoreClock = SYSTEM_CLOCK; /* System Clock Frequency (Core Clock) */

/*----------------------------------------------------------------------------
  Clock functions
 *----------------------------------------------------------------------------*/

/**
 * \brief      Function to update the variable \ref SystemCoreClock
 * \details
 * Updates the variable \ref SystemCoreClock and must be called whenever the core clock is changed
 * during program execution. The function evaluates the clock register settings and calculates
 * the current core clock.
 */
void SystemCoreClockUpdate (void) /* Get Core Clock Frequency */
{
    /* TODO: add code to calculate the system frequency based upon the current
     *    register settings.
     * Note: This function can be used to retrieve the system core clock frequency
     *    after user changed register settings.
     */
    SystemCoreClock = SYSTEM_CLOCK;
}

/**
 * \brief      Function to Initialize the system.
 * \details
 * Initializes the microcontroller system. Typically, this function configures the
 * oscillator (PLL) that is part of the microcontroller device. For systems
 * with a variable clock speed, it updates the variable \ref SystemCoreClock.
 * SystemInit is called from the file <b>startup<i>_device</i></b>.
 */
void SystemInit (void)
{
    /* TODO: add code to initialize the system
     * Warn: do not use global variables because this function is called before
     * reaching pre-main. RW section maybe overwritten afterwards.
     */
    SystemCoreClock = SYSTEM_CLOCK;
}

/**
 * \defgroup  NMSIS_Core_IntExcNMI_Handling   Interrupt and Exception and NMI Handling
 * \brief Functions for interrupt, exception and nmi handle available in system_<device>.c.
 * \details
 * Nuclei provide a template for interrupt, exception and NMI handling. Silicon Vendor could adapt according
 * to their requirement. Silicon vendor could implement interface for different exception code and
 * replace current implementation.
 *
 * @{
 */
/** \brief Max exception handler number, don't include the NMI(0xFFF) one */
#define MAX_SYSTEM_EXCEPTION_NUM        12
/**
 * \brief      Store the exception handlers for each exception ID
 * \note
 * - This SystemExceptionHandlers are used to store all the handlers for all
 * the exception codes Nuclei N/NX core provided.
 * - Exception code 0 - 11, totally 12 exceptions are mapped to SystemExceptionHandlers[0:11]
 * - Exception for NMI is also re-routed to exception handling(exception code 0xFFF) in startup code configuration, the handler itself is mapped to SystemExceptionHandlers[MAX_SYSTEM_EXCEPTION_NUM]
 */
static unsigned long SystemExceptionHandlers[MAX_SYSTEM_EXCEPTION_NUM+1];

/**
 * \brief      Exception Handler Function Typedef
 * \note
 * This typedef is only used internal in this system_<Device>.c file.
 * It is used to do type conversion for registered exception handler before calling it.
 */
typedef void (*EXC_HANDLER) (unsigned long mcause, unsigned long sp);

/**
 * \brief      System Default Exception Handler
 * \details
 * This function provided a default exception and NMI handling code for all exception ids.
 * By default, It will just print some information for debug, Vendor can customize it according to its requirements.
 */
static void system_default_exception_handler(unsigned long mcause, unsigned long sp)
{
    /* TODO: Uncomment this if you have implement printf function.
     * Or you can implement your own version as you like */
    //printf("MCAUSE: 0x%lx\r\n", mcause);
    //printf("MEPC  : 0x%lx\r\n", __RV_CSR_READ(CSR_MEPC));
    //printf("MTVAL : 0x%lx\r\n", __RV_CSR_READ(CSR_MBADADDR));
    Exception_DumpFrame(sp);
    while(1);
}

/**
 * \brief      Initialize all the default core exception handlers
 * \details
 * The core exception handler for each exception id will be initialized to \ref system_default_exception_handler.
 * \note
 * Called in \ref _init function, used to initialize default exception handlers for all exception IDs
 */
static void Exception_Init(void)
{
    for (int i = 0; i < MAX_SYSTEM_EXCEPTION_NUM+1; i++) {
        SystemExceptionHandlers[i] = (unsigned long) system_default_exception_handler;
    }
}

/**
 * \brief      Dump Exception Frame
 * \details
 * This function provided feature to dump exception frame stored in stack.
 */
void Exception_DumpFrame(unsigned long sp)
{
    EXC_Frame_Type *exc_frame = (EXC_Frame_Type *)sp;

#ifndef __riscv_32e
    printf("ra: 0x%x, tp: 0x%x, t0: 0x%x, t1: 0x%x, t2: 0x%x, t3: 0x%x, t4: 0x%x, t5: 0x%x, t6: 0x%x\n" \
           "a0: 0x%x, a1: 0x%x, a2: 0x%x, a3: 0x%x, a4: 0x%x, a5: 0x%x, a6: 0x%x, a7: 0x%x\n" \
           "mcause: 0x%x, mepc: 0x%x, msubm: 0x%x\n", exc_frame->ra, exc_frame->tp, exc_frame->t0, \
           exc_frame->t1, exc_frame->t2, exc_frame->t3, exc_frame->t4, exc_frame->t5, exc_frame->t6, \
           exc_frame->a0, exc_frame->a1, exc_frame->a2, exc_frame->a3, exc_frame->a4, exc_frame->a5, \
           exc_frame->a6, exc_frame->a7, exc_frame->mcause, exc_frame->mepc, exc_frame->msubm);
#else
    printf("ra: 0x%x, tp: 0x%x, t0: 0x%x, t1: 0x%x, t2: 0x%x\n" \
           "a0: 0x%x, a1: 0x%x, a2: 0x%x, a3: 0x%x, a4: 0x%x, a5: 0x%x\n" \
           "mcause: 0x%x, mepc: 0x%x, msubm: 0x%x\n", exc_frame->ra, exc_frame->tp, exc_frame->t0, \
           exc_frame->t1, exc_frame->t2, exc_frame->a0, exc_frame->a1, exc_frame->a2, exc_frame->a3, \
           exc_frame->a4, exc_frame->a5, exc_frame->mcause, exc_frame->mepc, exc_frame->msubm);
#endif
}

/**
 * \brief      Register an exception handler for exception code EXCn
 * \details
 * * For EXCn < \ref MAX_SYSTEM_EXCEPTION_NUM, it will be registered into SystemExceptionHandlers[EXCn].
 * * For EXCn == NMI_EXCn, it will be registered into SystemExceptionHandlers[MAX_SYSTEM_EXCEPTION_NUM].
 * \param EXCn         See \ref EXCn_Type
 * \param exc_handler  The exception handler for this exception code EXCn
 */
void Exception_Register_EXC(uint32_t EXCn, unsigned long exc_handler)
{
    if ((EXCn < MAX_SYSTEM_EXCEPTION_NUM) && (EXCn >= 0)) {
        SystemExceptionHandlers[EXCn] = exc_handler;
    } else if (EXCn == NMI_EXCn) {
        SystemExceptionHandlers[MAX_SYSTEM_EXCEPTION_NUM] = exc_handler;
    }
}

/**
 * \brief      Get current exception handler for exception code EXCn
 * \details
 * * For EXCn < \ref MAX_SYSTEM_EXCEPTION_NUM, it will return SystemExceptionHandlers[EXCn].
 * * For EXCn == NMI_EXCn, it will return SystemExceptionHandlers[MAX_SYSTEM_EXCEPTION_NUM].
 * \param EXCn    See \ref EXCn_Type
 * \return Current exception handler for exception code EXCn, if not found, return 0.
 */
unsigned long Exception_Get_EXC(uint32_t EXCn)
{
    if ((EXCn < MAX_SYSTEM_EXCEPTION_NUM) && (EXCn >= 0)) {
        return SystemExceptionHandlers[EXCn];
    } else if (EXCn == NMI_EXCn) {
        return SystemExceptionHandlers[MAX_SYSTEM_EXCEPTION_NUM];
    } else {
        return 0;
    }
}

/**
 * \brief      Common NMI and Exception handler entry
 * \details
 * This function provided a common entry for NMI and exception. Silicon Vendor could modify
 * this template implementation according to requirement.
 * \remarks
 * - RISCV provided common entry for all types of exception. This is proposed code template
 *   for exception entry function, Silicon Vendor could modify the implementation.
 * - For the core_exception_handler template, we provided exception register function \ref Exception_Register_EXC
 *   which can help developer to register your exception handler for specific exception number.
 */
uint32_t core_exception_handler(unsigned long mcause, unsigned long sp)
{
    uint32_t EXCn = (uint32_t)(mcause & 0X00000fff);
    EXC_HANDLER exc_handler;

    if ((EXCn < MAX_SYSTEM_EXCEPTION_NUM) && (EXCn >= 0)) {
        exc_handler = (EXC_HANDLER) SystemExceptionHandlers[EXCn];
    } else if (EXCn == NMI_EXCn) {
        exc_handler = (EXC_HANDLER) SystemExceptionHandlers[MAX_SYSTEM_EXCEPTION_NUM];
    } else {
        exc_handler = (EXC_HANDLER) system_default_exception_handler;
    }
    if (exc_handler != NULL) {
        exc_handler(mcause, sp);
    }
    return 0;
}

/**
 * \brief Initialize Global ECLIC Config
 * \details
 * ECLIC needs be initialized after boot up,
 * Vendor could also change the initialization
 * configuration.
 */
void ECLIC_Init(void)
{
    /* Global Configuration about MTH and NLBits.
     * TODO: Please adapt it according to your system requirement.
     * This function is called in _init function */
    /* Set CSR MTH to zero */
    ECLIC_SetMth(0);
    /* Set Nlbits to the CLICINTCTLBITS, all the bits are level bits */
    ECLIC_SetCfgNlbits(__ECLIC_INTCTLBITS);
}

/**
 * \brief  Initialize a specific IRQ and register the handler
 * \details
 * This function set vector mode, trigger mode and polarity, interrupt level and priority,
 * assign handler for specific IRQn.
 * \param [in]  IRQn        NMI interrupt handler address
 * \param [in]  shv         \ref ECLIC_NON_VECTOR_INTERRUPT means non-vector mode, and \ref ECLIC_VECTOR_INTERRUPT is vector mode
 * \param [in]  trig_mode   see \ref ECLIC_TRIGGER_Type
 * \param [in]  lvl         interrupt level
 * \param [in]  priority    interrupt priority
 * \param [in]  handler     interrupt handler, if NULL, handler will not be installed
 * \return       -1 means invalid input parameter. 0 means successful.
 * \remarks
 * - This function use to configure specific eclic interrupt and register its interrupt handler and enable its interrupt.
 * - If the vector table is placed in read-only section (FLASHXIP mode), handler could not be installed
 */
int32_t ECLIC_Register_IRQ(IRQn_Type IRQn, uint8_t shv, ECLIC_TRIGGER_Type trig_mode, uint8_t lvl, uint8_t priority, void *handler)
{
    if ((IRQn > SOC_INT_MAX) || (shv > ECLIC_VECTOR_INTERRUPT) \
        || (trig_mode > ECLIC_NEGTIVE_EDGE_TRIGGER)) {
        return -1;
    }

    /* set interrupt vector mode */
    ECLIC_SetShvIRQ(IRQn, shv);
    /* set interrupt trigger mode and polarity */
    ECLIC_SetTrigIRQ(IRQn, trig_mode);
    /* set interrupt level */
    ECLIC_SetLevelIRQ(IRQn, lvl);
    /* set interrupt priority */
    ECLIC_SetPriorityIRQ(IRQn, priority);
    if (handler != NULL) {
        /* set interrupt handler entry to vector table */
        ECLIC_SetVector(IRQn, (rv_csr_t) handler);
    }
    /* enable interrupt */
    ECLIC_EnableIRQ(IRQn);
    return 0;
}
/** @} */ /* End of Doxygen Group NMSIS_Core_ExceptionAndNMI */

/**
 * \brief early init function before main
 * \details
 * This function is executed right before main function.
 * For RISC-V gnu toolchain, _init function might not be called
 * by __libc_init_array function, so we defined a new function
 * to do initialization
 */
void _premain_init(void)
{
    /* TODO: Add your own initialization code here, called before main */
    /* __ICACHE_PRESENT and __DCACHE_PRESENT are defined in <Device>.h */
#if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT == 1
    EnableICache();
#endif
#if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT == 1
    EnableDCache();
#endif
    // TODO: Add code to set the system clock frequency value SystemCoreClock

    // TODO: Add code to initialize necessary gpio and basic uart for debug print

    /* Initialize exception default handlers */
    Exception_Init();
    /* ECLIC initialization, mainly MTH and NLBIT settings */
    ECLIC_Init();
}

/**
 * \brief finish function after main
 * \param [in]  status     status code return from main
 * \details
 * This function is executed right after main function.
 * For RISC-V gnu toolchain, _fini function might not be called
 * by __libc_fini_array function, so we defined a new function
 * to do finalization
 */
void _postmain_fini(int status)
{
    /* TODO: Add your own finishing code here, called after main */
}

/**
 * \brief _init function called in __libc_init_array()
 * \details
 * This `__libc_init_array()` function is called during startup code,
 * user need to implement this function, otherwise when link it will
 * error init.c:(.text.__libc_init_array+0x26): undefined reference to `_init'
 * \note
 * Please use \ref _premain_init function now
 */
void _init(void)
{
    /* Don't put any code here, please use _premain_init now */
}

/**
 * \brief _fini function called in __libc_fini_array()
 * \details
 * This `__libc_fini_array()` function is called when exit main.
 * user need to implement this function, otherwise when link it will
 * error fini.c:(.text.__libc_fini_array+0x28): undefined reference to `_fini'
 * \note
 * Please use \ref _postmain_fini function now
 */
void _fini(void)
{
    /* Don't put any code here, please use _postmain_fini now */
}

/** @} */ /* End of Doxygen Group NMSIS_Core_SystemAndClock */
```
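
As a rough illustration of how `ECLIC_Register_IRQ` might be called, the sketch below builds on a host machine. It keeps the template's signature and parameter check, but swaps the ECLIC register writes for a mock that only records the requested settings. The stand-in type definitions, `DemoTimer0_IRQn`, `mock_eclic`, `demo_timer0_handler` and `demo_register_irq` are hypothetical names introduced for this example only. On a real device the types come from `<Device>.h` and `nmsis_core.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for definitions that <Device>.h and nmsis_core.h
 * would normally provide; the numeric values are for illustration only. */
#define ECLIC_NON_VECTOR_INTERRUPT 0x0
#define ECLIC_VECTOR_INTERRUPT     0x1

typedef enum {
    ECLIC_LEVEL_TRIGGER        = 0x0,
    ECLIC_POSTIVE_EDGE_TRIGGER = 0x1,
    ECLIC_NEGTIVE_EDGE_TRIGGER = 0x3
} ECLIC_TRIGGER_Type;

typedef enum IRQn {
    SysTimer_IRQn   = 7,
    DemoTimer0_IRQn = 19,   /* hypothetical device-specific interrupt */
    SOC_INT_MAX
} IRQn_Type;

/* What the mock remembers for each interrupt line. */
typedef struct {
    uint8_t shv;
    uint8_t trig_mode;
    uint8_t lvl;
    uint8_t priority;
    uint8_t enabled;
    void   *handler;
} MockIrqConfig;

static MockIrqConfig mock_eclic[SOC_INT_MAX + 1];

/* Host-side mock with the template's signature and parameter check.
 * It records the requested settings instead of writing ECLIC registers. */
int32_t ECLIC_Register_IRQ(IRQn_Type IRQn, uint8_t shv, ECLIC_TRIGGER_Type trig_mode,
                           uint8_t lvl, uint8_t priority, void *handler)
{
    if ((IRQn > SOC_INT_MAX) || (shv > ECLIC_VECTOR_INTERRUPT)
        || (trig_mode > ECLIC_NEGTIVE_EDGE_TRIGGER)) {
        return -1;
    }
    MockIrqConfig *cfg = &mock_eclic[IRQn];
    cfg->shv = shv;
    cfg->trig_mode = (uint8_t)trig_mode;
    cfg->lvl = lvl;
    cfg->priority = priority;
    if (handler != NULL) {
        cfg->handler = handler;   /* the template writes the vector table here */
    }
    cfg->enabled = 1;
    return 0;
}

/* Hypothetical interrupt handler used only by this example. */
static void demo_timer0_handler(void)
{
}

void demo_register_irq(void)
{
    /* Non-vector mode, level triggered, level 1, priority 0. */
    int32_t rc = ECLIC_Register_IRQ(DemoTimer0_IRQn, ECLIC_NON_VECTOR_INTERRUPT,
                                    ECLIC_LEVEL_TRIGGER, 1, 0,
                                    (void *)demo_timer0_handler);
    assert(rc == 0);
    assert(mock_eclic[DemoTimer0_IRQn].enabled == 1);

    /* An shv value above ECLIC_VECTOR_INTERRUPT is rejected with -1. */
    rc = ECLIC_Register_IRQ(DemoTimer0_IRQn, 2, ECLIC_LEVEL_TRIGGER, 1, 0, NULL);
    assert(rc == -1);
}
```

Checking the return value matters because a rejected call returns before any setting is applied, so the interrupt stays unconfigured and disabled.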

#### system\_Device.h Template File

The system\_Device.h template file is shown below:

```
/*
 * Copyright (c) 2009-2018 Arm Limited. All rights reserved.
 * Copyright (c) 2019 Nuclei Limited. All rights reserved.
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the License); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/*******************************************************************************
 * @file     system_<Device>.h
 * @brief    NMSIS Nuclei N/NX Device Peripheral Access Layer Header File for
 *           Device <Device>
 * @version  V1.10
 * @date     30. July 2021
 ******************************************************************************/

#ifndef __SYSTEM_<Device>_H__   /* TODO: replace '<Device>' with your device name */
#define __SYSTEM_<Device>_H__

#ifdef __cplusplus
extern "C" {
#endif

#include <stdint.h>

extern uint32_t SystemCoreClock;     /*!< System Clock Frequency (Core Clock) */

/** \brief Exception frame structure store in stack */
typedef struct EXC_Frame {
    unsigned long ra;                /* ra: x1, return address for jump */
    unsigned long tp;                /* tp: x4, thread pointer */
    unsigned long t0;                /* t0: x5, temporary register 0 */
    unsigned long t1;                /* t1: x6, temporary register 1 */
    unsigned long t2;                /* t2: x7, temporary register 2 */
    unsigned long a0;                /* a0: x10, return value or function argument 0 */
    unsigned long a1;                /* a1: x11, return value or function argument 1 */
    unsigned long a2;                /* a2: x12, function argument 2 */
    unsigned long a3;                /* a3: x13, function argument 3 */
    unsigned long a4;                /* a4: x14, function argument 4 */
    unsigned long a5;                /* a5: x15, function argument 5 */
    unsigned long mcause;            /* mcause: machine cause csr register */
    unsigned long mepc;              /* mepc: machine exception program counter csr register */
    unsigned long msubm;             /* msubm: machine sub-mode csr register, nuclei customized */
#ifndef __riscv_32e
    unsigned long a6;                /* a6: x16, function argument 6 */
    unsigned long a7;                /* a7: x17, function argument 7 */
    unsigned long t3;                /* t3: x28, temporary register 3 */
    unsigned long t4;                /* t4: x29, temporary register 4 */
    unsigned long t5;                /* t5: x30, temporary register 5 */
    unsigned long t6;                /* t6: x31, temporary register 6 */
#endif
} EXC_Frame_Type;

/**
 * \brief Setup the microcontroller system.
 * \details
 * Initialize the System and update the SystemCoreClock variable.
 */
extern void SystemInit(void);

/**
 * \brief  Update SystemCoreClock variable.
 * \details
 * Updates the SystemCoreClock with current core Clock retrieved from cpu registers.
 */
extern void SystemCoreClockUpdate(void);

/**
 * \brief Dump Exception Frame
 */
void Exception_DumpFrame(unsigned long sp);

/**
 * \brief Register an exception handler for exception code EXCn
 */
extern void Exception_Register_EXC(uint32_t EXCn, unsigned long exc_handler);

/**
 * \brief Get current exception handler for exception code EXCn
 */
extern unsigned long Exception_Get_EXC(uint32_t EXCn);

/**
 * \brief Initialize eclic config
 */
extern void ECLIC_Init(void);

/**
 * \brief  Initialize a specific IRQ and register the handler
 * \details
 * This function sets vector mode, trigger mode and polarity, interrupt level and priority,
 * and assigns the handler for a specific IRQn.
 * \param [in]  IRQn        interrupt number to configure
 * \param [in]  shv         \ref ECLIC_NON_VECTOR_INTERRUPT means non-vector mode, and \ref ECLIC_VECTOR_INTERRUPT is vector mode
 * \param [in]  trig_mode   see \ref ECLIC_TRIGGER_Type
 * \param [in]  lvl         interrupt level
 * \param [in]  priority    interrupt priority
 * \param [in]  handler     interrupt handler
 * \return       -1 means invalid input parameter. 0 means successful.
 * \remarks
 * - This function is used to configure a specific eclic interrupt, register its interrupt handler and enable its interrupt.
 */
extern int32_t ECLIC_Register_IRQ(IRQn_Type IRQn, uint8_t shv, ECLIC_TRIGGER_Type trig_mode, uint8_t lvl, uint8_t priority, void *handler);

#ifdef __cplusplus
}
#endif

#endif /* __SYSTEM_<Device>_H__ */
```
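
The header only declares `Exception_Register_EXC` and `Exception_Get_EXC`. One plausible way to back this pair is a handler table indexed by exception code, with an extra slot for the NMI code `0xfff` from the `EXCn_Type` enumeration. The following is a minimal sketch under that assumption, not the template's actual implementation. `DEMO_MAX_EXC_NUM`, `DEMO_NMI_EXCn`, `demo_exc_handlers`, `demo_exception_registry` and the handler addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_EXC_NUM   12U      /* assumption: exception codes 0..11 are used */
#define DEMO_NMI_EXCn      0xfffU   /* NMI code, as in the EXCn enumeration */

/* One slot per exception code plus one extra slot for NMI. */
static unsigned long demo_exc_handlers[DEMO_MAX_EXC_NUM + 1];

/* Store a handler address for EXCn; unknown codes are silently ignored. */
void Exception_Register_EXC(uint32_t EXCn, unsigned long exc_handler)
{
    if (EXCn < DEMO_MAX_EXC_NUM) {
        demo_exc_handlers[EXCn] = exc_handler;
    } else if (EXCn == DEMO_NMI_EXCn) {
        demo_exc_handlers[DEMO_MAX_EXC_NUM] = exc_handler;
    }
}

/* Return the handler address for EXCn, or 0 for an unknown code. */
unsigned long Exception_Get_EXC(uint32_t EXCn)
{
    if (EXCn < DEMO_MAX_EXC_NUM) {
        return demo_exc_handlers[EXCn];
    }
    if (EXCn == DEMO_NMI_EXCn) {
        return demo_exc_handlers[DEMO_MAX_EXC_NUM];
    }
    return 0UL;
}

void demo_exception_registry(void)
{
    /* 2 is the illegal-instruction code; both addresses are made up. */
    Exception_Register_EXC(2U, 0x1000UL);
    Exception_Register_EXC(DEMO_NMI_EXCn, 0x2000UL);

    assert(Exception_Get_EXC(2U) == 0x1000UL);
    assert(Exception_Get_EXC(DEMO_NMI_EXCn) == 0x2000UL);
    assert(Exception_Get_EXC(100U) == 0UL);   /* out of range */
}
```

Mapping the sparse NMI code onto the last table slot keeps the table small while still letting NMI share the same registration interface.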

#### Device Header File <device.h>

The Device Header File <device.h> (page 48) contains the following sections that are device specific:

- Interrupt Number Definition (page 48) provides interrupt numbers (IRQn) for all exceptions and interrupts of the device.
- Configuration of the Processor and Core Peripherals (page 50) reflects the features of the device.
- Device Peripheral Access Layer (page 52) provides definitions for the Peripheral Access (page 294) to all device peripherals. It contains all data structures and the address mapping for device-specific peripherals.
- Access Functions for Peripherals (optional) provide additional helper functions for peripherals that are useful for programming of these peripherals. Access Functions may be provided as inline functions or can be extern references to a device-specific library provided by the silicon vendor.

NMSIS Core API (page 60) describes the standard features and functions of the Device Header File <device.h> (page 48) in detail.

# **Interrupt Number Definition**

Device Header File <device.h> (page 48) contains the enumeration IRQn\_Type (page 306) that defines all exceptions and interrupts of the device.

- Negative IRQn values represent processor core exceptions (internal interrupts).
- Positive IRQn values represent device-specific exceptions (external interrupts). The first device-specific interrupt has the IRQn value 0. The IRQn values need to be extended to reflect the device-specific interrupt vector table in the *Startup File startup\_<device>.S* (page 14).

The following example shows the extension of the interrupt vector table for the GD32VF103 device family.

```
typedef enum IRQn {
    Reserved0_IRQn               =   0,     /*!< Internal reserved */
    Reserved1_IRQn               =   1,     /*!< Internal reserved */
    Reserved2_IRQn               =   2,     /*!< Internal reserved */
    SysTimerSW_IRQn              =   3,     /*!< System Timer SW interrupt */
    Reserved3_IRQn               =   4,     /*!< Internal reserved */
    Reserved4_IRQn               =   5,     /*!< Internal reserved */
    Reserved5_IRQn               =   6,     /*!< Internal reserved */
    SysTimer_IRQn                =   7,     /*!< System Timer Interrupt */
    Reserved6_IRQn               =   8,     /*!< Internal reserved */
    Reserved7_IRQn               =   9,     /*!< Internal reserved */
    Reserved8_IRQn               =  10,     /*!< Internal reserved */
    Reserved9_IRQn               =  11,     /*!< Internal reserved */
    Reserved10_IRQn              =  12,     /*!< Internal reserved */
    Reserved11_IRQn              =  13,     /*!< Internal reserved */
    Reserved12_IRQn              =  14,     /*!< Internal reserved */
    Reserved13_IRQn              =  15,     /*!< Internal reserved */
    Reserved14_IRQn              =  16,     /*!< Internal reserved */
    HardFault_IRQn               =  17,     /*!< Hard Fault, storage access error */
    Reserved15_IRQn              =  18,     /*!< Internal reserved */

    /****** GD32VF103 Specific Interrupt Numbers ******/
    WWDGT_IRQn                   =  19,     /*!< window watchDog timer interrupt */
    LVD_IRQn                     =  20,     /*!< LVD through EXTI line detect interrupt */
    TAMPER_IRQn                  =  21,     /*!< tamper through EXTI line detect */
    /* ... */
    CAN1_EWMC_IRQn               =  85,     /*!< CAN1 EWMC interrupt */
    USBFS_IRQn                   =  86,     /*!< USBFS global interrupt */
    SOC_INT_MAX,                            /*!< Number of total Interrupts */
} IRQn_Type;
```

# **Configuration of the Processor and Core Peripherals**

The *Device Header File <device.h>* (page 48) configures the Nuclei N/NX Class Processors and the core peripherals with #defines that are set before including the file *nmsis\_core.h*.

The following table lists the #defines along with the possible values for N200, N300, N600, and NX600. If a #define is missing, its default value is used.

# nmsis\_core.h

Table 6: Macros used in nmsis\_core.h

| #define | Value Range | Default | Description |
|---------|-------------|---------|-------------|
| `__NUCLEI_N_REV` or<br>`__NUCLEI_NX_REV` | 0x0100<br>0x0104 | 0x0100 | For a Nuclei N class device, define `__NUCLEI_N_REV`; for an NX class device, define `__NUCLEI_NX_REV`.<br>Core revision number ([15:8] revision number, [7:0] patch number): 0x0100 -> 1.0, 0x0104 -> 1.4. |
| `__SYSTIMER_PRESENT` | 0 .. 1 | 1 | Define whether the Private System Timer is present or not. This SysTimer is a Memory Mapped Unit. |
| `__SYSTIMER_BASEADDR` | - | 0x02000000 | Base address of the System Timer Unit. |
| `__ECLIC_PRESENT` | 0 .. 1 | 1 | Define whether the Enhanced Core Local Interrupt Controller (ECLIC) Unit is present or not. |
| `__ECLIC_BASEADDR` | - | 0x0C000000 | Base address of the ECLIC unit. |
| `__ECLIC_INTCTLBITS` | 1 .. 8 | 1 | Define the number of hardware bits that are actually implemented in the clicintctl registers. |
| `__ECLIC_INTNUM` | 1 .. 1024 | 1 | Define the total interrupt number (including the internal core interrupts) of the ECLIC Unit. |
| `__PMP_PRESENT` | 0 .. 1 | 0 | Define whether the Physical Memory Protection (PMP) Unit is present or not. |
| `__PMP_ENTRY_NUM` | 8 or 16 | 8 | Define the number of PMP entries. |
| `__FPU_PRESENT` | 0 .. 2 | 0 | Define whether the Floating Point Unit (FPU) is present or not.<br>0: Not present<br>1: Single precision FPU present<br>2: Double precision FPU present |
| `__DSP_PRESENT` | 0 .. 1 | 0 | Define whether the Digital Signal Processing Unit (DSP) is present or not. |
| `__ICACHE_PRESENT` | 0 .. 1 | 0 | Define whether the I-Cache Unit is present or not. |
| `__DCACHE_PRESENT` | 0 .. 1 | 0 | Define whether the D-Cache Unit is present or not. |
| `__Vendor_SysTickConfig` | 0 .. 1 | 0 | If `__SYSTIMER_PRESENT` is 1, `__Vendor_SysTickConfig` can be set to 0; otherwise it can only be set to 1.<br>If this define is set to 1, the default SysTick_Config and SysTick_Reload functions are excluded. In this case, the file Device.h must contain a vendor-specific implementation of these functions. |
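
To make the default-value rule and the revision encoding concrete, here is a small host-side sketch. It uses the macro names with the leading double underscore that the Device.h template uses, and it assumes the fallback is done with plain `#ifndef` guards, which is a common pattern but is not confirmed by this section. The helper functions `demo_core_rev_major` and `demo_core_rev_patch` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Settings a hypothetical <Device>.h provides before including nmsis_core.h. */
#define __NUCLEI_N_REV    0x0104
#define __FPU_PRESENT     2

/* Assumed fallback pattern: anything the device header left undefined
 * receives the default listed in the table above. */
#ifndef __FPU_PRESENT
#define __FPU_PRESENT     0
#endif
#ifndef __PMP_PRESENT
#define __PMP_PRESENT     0
#endif
#ifndef __PMP_ENTRY_NUM
#define __PMP_ENTRY_NUM   8
#endif

/* Bits [15:8] of the revision macro hold the revision number. */
static inline unsigned demo_core_rev_major(uint32_t rev)
{
    return (unsigned)((rev >> 8) & 0xFFu);
}

/* Bits [7:0] of the revision macro hold the patch number. */
static inline unsigned demo_core_rev_patch(uint32_t rev)
{
    return (unsigned)(rev & 0xFFu);
}

void demo_config_defaults(void)
{
    /* 0x0104 decodes to revision 1, patch 4, i.e. version 1.4. */
    assert(demo_core_rev_major(__NUCLEI_N_REV) == 1);
    assert(demo_core_rev_patch(__NUCLEI_N_REV) == 4);

    /* The explicit setting is kept, the missing ones fall back to defaults. */
    assert(__FPU_PRESENT == 2);
    assert(__PMP_PRESENT == 0);
    assert(__PMP_ENTRY_NUM == 8);
}
```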

#### **NMSIS Version and Processor Information**

The following shows the defines in the *nmsis\_core.h* file that may be used in the *NMSIS-Core Device Templates* (page 12) to verify a minimum version or ensure that the right Nuclei N/NX class is used.

#### **Device Peripheral Access Layer**

The Device Header File <device.h> (page 48) contains for each peripheral:

- Register Layout Typedef
- Base Address
- Access Definitions

The section *Peripheral Access* (page 294) shows examples for peripheral definitions.
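
The three parts listed above could look roughly like the following for an imaginary UART. Everything here (`DEMO_UART_TypeDef`, its registers, `DEMO_UART0_BASE`, `DEMO_UART0`, `demo_uart_enable_tx`) is hypothetical and not taken from any real device header. So that the sketch runs on a host, the base address points at an ordinary variable; a real device header would use the peripheral's fixed memory-mapped address instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1. Register layout typedef: one member per 32-bit register. */
typedef struct {
    volatile uint32_t TXDATA;   /* offset 0x00: transmit data */
    volatile uint32_t RXDATA;   /* offset 0x04: receive data */
    volatile uint32_t CTRL;     /* offset 0x08: control, bit 0 = TX enable */
} DEMO_UART_TypeDef;

/* 2. Base address: backed by RAM here; on hardware this would be a
 *    constant such as a fixed peripheral address from the memory map. */
static DEMO_UART_TypeDef demo_uart0_storage;
#define DEMO_UART0_BASE   ((uintptr_t)&demo_uart0_storage)

/* 3. Access definition: typed pointer to the register block. */
#define DEMO_UART0        ((DEMO_UART_TypeDef *)DEMO_UART0_BASE)

/* Set the TX enable bit through the access definition. */
void demo_uart_enable_tx(void)
{
    DEMO_UART0->CTRL |= 0x1UL;
}

void demo_peripheral_access(void)
{
    /* The struct layout must match the documented register offsets. */
    assert(offsetof(DEMO_UART_TypeDef, RXDATA) == 0x04);
    assert(offsetof(DEMO_UART_TypeDef, CTRL) == 0x08);

    demo_uart_enable_tx();
    assert((DEMO_UART0->CTRL & 0x1UL) == 0x1UL);
}
```

Declaring the members `volatile` keeps the compiler from caching or reordering register accesses, which matters once the base address refers to real hardware.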

# **Device.h Template File**

The Device.h template file is shown below:

```
/*
 * Copyright (c) 2009-2019 Arm Limited. All rights reserved.
 * Copyright (c) 2019 Nuclei Limited. All rights reserved.
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the License); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/*******************************************************************************
 * @file     <Device>.h
 * @brief    NMSIS Nuclei N/NX Core Peripheral Access Layer Header File for
 *           Device <Device>
 * @version  V1.10
 * @date     30. July 2021
 ******************************************************************************/

#ifndef __<Device>_H__     /* TODO: replace '<Device>' with your device name */
#define __<Device>_H__

#ifdef __cplusplus
extern "C" {
#endif

/* TODO: replace '<Vendor>' with vendor name; add your doxygen comment */
/** @addtogroup <Vendor>
  * @{
  */

/* TODO: replace '<Device>' with device name; add your doxygen comment */
/** @addtogroup <Device>
  * @{
  */

/** @addtogroup Configuration_of_NMSIS
  * @{
  */

/** \brief SoC Download mode definition */
/* TODO: device vendor can extend more download modes */
typedef enum {
    DOWNLOAD_MODE_FLASHXIP = 0,         /*!< flashxip download mode */
    DOWNLOAD_MODE_FLASH = 1,            /*!< flash download mode */
    DOWNLOAD_MODE_ILM = 2,              /*!< ilm download mode */
    DOWNLOAD_MODE_DDR = 3,              /*!< ddr download mode */
    DOWNLOAD_MODE_MAX,
} DownloadMode_Type;

/* =========================================================================== */
/* ================        Interrupt Number Definition        ================ */
/* =========================================================================== */

typedef enum IRQn {
/* ===============  Nuclei N/NX Specific Interrupt Numbers  ================= */
/* TODO: use this N/NX interrupt numbers if your device is a Nuclei N/NX device */
    Reserved0_IRQn            =   0,              /*!< Internal reserved */
    Reserved1_IRQn            =   1,              /*!< Internal reserved */
    Reserved2_IRQn            =   2,              /*!< Internal reserved */
    SysTimerSW_IRQn           =   3,              /*!< System Timer SW interrupt */
    Reserved3_IRQn            =   4,              /*!< Internal reserved */
    Reserved4_IRQn            =   5,              /*!< Internal reserved */
    Reserved5_IRQn            =   6,              /*!< Internal reserved */
    SysTimer_IRQn             =   7,              /*!< System Timer Interrupt */
    Reserved6_IRQn            =   8,              /*!< Internal reserved */
    Reserved7_IRQn            =   9,              /*!< Internal reserved */
    Reserved8_IRQn            =  10,              /*!< Internal reserved */
    Reserved9_IRQn            =  11,              /*!< Internal reserved */
    Reserved10_IRQn           =  12,              /*!< Internal reserved */
    Reserved11_IRQn           =  13,              /*!< Internal reserved */
    Reserved12_IRQn           =  14,              /*!< Internal reserved */
    Reserved13_IRQn           =  15,              /*!< Internal reserved */
    Reserved14_IRQn           =  16,              /*!< Internal reserved */
    Reserved15_IRQn           =  17,              /*!< Internal reserved */
    Reserved16_IRQn           =  18,              /*!< Internal reserved */

/* TODO: add here your device specific external interrupt numbers. 19~1023 is reserved number for user. Maximum interrupt supported
         could get from clicinfo.NUM_INTERRUPT. According to the interrupt handlers defined in startup_Device.s
         eg.: Interrupt for Timer#1       eclic_tim0_handler   ->   TIM0_IRQn */
    <DeviceInterrupt>_IRQn    =  19,              /*!< Device Interrupt */

    SOC_INT_MAX,                                  /* Max SoC interrupt Number */
} IRQn_Type;

/* =========================================================================== */
/* ================         Exception Code Definition         ================ */
/* =========================================================================== */

typedef enum EXCn {
/* ================  Nuclei N/NX Specific Exception Code  =================== */
    InsUnalign_EXCn          =   0,              /*!< Instruction address misaligned */
    InsAccFault_EXCn         =   1,              /*!< Instruction access fault */
    IlleIns_EXCn             =   2,              /*!< Illegal instruction */
    Break_EXCn               =   3,              /*!< Breakpoint */
    LdAddrUnalign_EXCn       =   4,              /*!< Load address misaligned */
    LdFault_EXCn             =   5,              /*!< Load access fault */
    StAddrUnalign_EXCn       =   6,              /*!< Store or AMO address misaligned */
    StAccessFault_EXCn       =   7,              /*!< Store or AMO access fault */
    UmodeEcall_EXCn          =   8,              /*!< Environment call from User mode */
    MmodeEcall_EXCn          =  11,              /*!< Environment call from Machine mode */
    NMI_EXCn                 = 0xfff,            /*!< NMI interrupt */
} EXCn_Type;

/* =========================================================================== */
/* ================   Processor and Core Peripheral Section   ================ */
/* =========================================================================== */

/* ======  Configuration of the Nuclei N/NX Processor and Core Peripherals  ====== */
/* TODO: set the defines according your Device */
/* TODO: define the correct core revision
         __NUCLEI_N_REV if your device is a Nuclei-N Class device
         __NUCLEI_NX_REV if your device is a Nuclei-NX Class device
*/
#define __NUCLEI_N#_REV           0x0100                /*!< Core Revision rXpY, version X.Y, change N# to N for Nuclei N class cores, change N# to NX for Nuclei NX cores */
/* TODO: define the correct core features for the <Device> */
#define __ECLIC_PRESENT           1                     /*!< Set to 1 if ECLIC is present */
#define __ECLIC_BASEADDR          0x0C000000UL          /*!< Set to ECLIC baseaddr of your device */
#define __ECLIC_INTCTLBITS        8                     /*!< Set to 1 - 8, the number of hardware bits are actually implemented in the clicintctl registers. */
#define __ECLIC_INTNUM            51                    /*!< Set to 1 - 1024, total interrupt number of ECLIC Unit */
#define __SYSTIMER_PRESENT        1                     /*!< Set to 1 if System Timer is present */
#define __SYSTIMER_BASEADDR       0x02000000UL          /*!< Set to SysTimer baseaddr of your device */
#define __FPU_PRESENT             1                     /*!< Set to 0, 1, or 2, 0 not present, 1 single floating point unit present, 2 double floating point unit present */
#define __DSP_PRESENT             1                     /*!< Set to 1 if DSP is present */
#define __PMP_PRESENT             1                     /*!< Set to 1 if PMP is present */
#define __PMP_ENTRY_NUM           16                    /*!< Set to 8 or 16, the number of PMP entries */
#define __ICACHE_PRESENT          0                     /*!< Set to 1 if I-Cache is present */
#define __DCACHE_PRESENT          0                     /*!< Set to 1 if D-Cache is present */
#define __Vendor_SysTickConfig    0                     /*!< Set to 1 if different SysTick Config is used */

/** @} */ /* End of group Configuration_of_NMSIS */

#include <nmsis_core.h>
/* TODO: include your system_<Device>.h file
         replace '<Device>' with your device name */
#include "system_<Device>.h"                    /*!< <Device> System */

/* =============  Start of section using anonymous unions  ================== */
#if defined (__GNUC__)
/* anonymous unions are enabled by default */
#else
#warning Not supported compiler type
#endif

/* =========================================================================== */
/* ================    Device Specific Peripheral Section     ================ */
/* =========================================================================== */

/* Macros for memory access operations */
#define _REG8P(p, i)                        ((volatile uint8_t *) ((uintptr_t)((p) + (i))))
#define _REG16P(p, i)                       ((volatile uint16_t *) ((uintptr_t)((p) + (i))))
#define _REG32P(p, i)                       ((volatile uint32_t *) ((uintptr_t)((p) + (i))))
#define _REG64P(p, i)                       ((volatile uint64_t *) ((uintptr_t)((p) + (i))))
#define _REG8(p, i)                         (*(_REG8P(p, i)))
#define _REG16(p, i)                        (*(_REG16P(p, i)))
#define _REG32(p, i)                        (*(_REG32P(p, i)))
#define _REG64(p, i)                        (*(_REG64P(p, i)))
#define REG8(addr)                          _REG8((addr), 0)
#define REG16(addr)                         _REG16((addr), 0)
#define REG32(addr)                         _REG32((addr), 0)
#define REG64(addr)                         _REG64((addr), 0)

/* Macros for address type convert and access operations */
#define ADDR16(addr)                        ((uint16_t)(uintptr_t)(addr))
#define ADDR32(addr)                        ((uint32_t)(uintptr_t)(addr))
#define ADDR64(addr)                        ((uint64_t)(uintptr_t)(addr))
#define ADDR8P(addr)                        ((uint8_t *)(uintptr_t)(addr))
#define ADDR16P(addr)                       ((uint16_t *)(uintptr_t)(addr))
#define ADDR32P(addr)                       ((uint32_t *)(uintptr_t)(addr))
#define ADDR64P(addr)                       ((uint64_t *)(uintptr_t)(addr))

/* Macros for Bit Operations */
#if __riscv_xlen == 32
#define BITMASK_MAX                         0xFFFFFFFFUL
#define BITOFS_MAX                          31
#else
#define BITMASK_MAX                         0xFFFFFFFFFFFFFFFFULL
#define BITOFS_MAX                          63
#endif

// BIT/BITS only support bit mask for __riscv_xlen
// For RISC-V 32 bit, it support mask 32 bit wide
// For RISC-V 64 bit, it support mask 64 bit wide
#define BIT(ofs)                            (0x1UL << (ofs))
#define BITS(start, end)                    ((BITMASK_MAX) << (start) & (BITMASK_MAX) >> (BITOFS_MAX - (end)))
#define GET_BIT(regval, bitofs)             (((regval) >> (bitofs)) & 0x1)
#define SET_BIT(regval, bitofs)             ((regval) |= BIT(bitofs))
#define CLR_BIT(regval, bitofs)             ((regval) &= (~BIT(bitofs)))
#define FLIP_BIT(regval, bitofs)            ((regval) ^= BIT(bitofs))
#define WRITE_BIT(regval, bitofs, val)      CLR_BIT(regval, bitofs); ((regval) |= ((val) << bitofs) & BIT(bitofs))
#define CHECK_BIT(regval, bitofs)           (!!((regval) & (0x1UL << (bitofs))))
#define GET_BITS(regval, start, end)        (((regval) & BITS((start), (end))) >> (start))
#define SET_BITS(regval, start, end)        ((regval) |= BITS((start), (end)))
#define CLR_BITS(regval, start, end)        ((regval) &= (~BITS((start), (end))))
#define FLIP_BITS(regval, start, end)       ((regval) ^= BITS((start), (end)))
   #define WRITE_BITS(regval, start, end, val) CLR_BITS(regval, start, end); ((regval)...
209
    \Rightarrow |= ((val) << start) & BITS((start), (end)))
   #define CHECK_BITS_ALL(regval, start, end) (!((~(regval)) & BITS((start), (end))))
```

```
#define CHECK_BITS_ANY(regval, start, end)  ((regval) & BITS((start), (end)))

#define BITMASK_SET(regval, mask)           ((regval) |= (mask))
#define BITMASK_CLR(regval, mask)           ((regval) &= (~(mask)))
#define BITMASK_FLIP(regval, mask)          ((regval) ^= (mask))
#define BITMASK_CHECK_ALL(regval, mask)     (!((~(regval)) & (mask)))
#define BITMASK_CHECK_ANY(regval, mask)     ((regval) & (mask))

/** @addtogroup Device_Peripheral_peripherals
  * @{
  */

/* TODO: add here your device specific peripheral access structure typedefs
         following is an example for UART */

/* ========================================================================== */
/* ================                   UART                   ================ */
/* ========================================================================== */

/**
  * @brief UART (UART)
  */
typedef struct {                /*!< (@ 0x40000000) UART Structure              */
    __IOM uint32_t TXFIFO;      /*!< (@ 0x00000000) UART TX FIFO                */
    __IM  uint32_t RXFIFO;      /*!< (@ 0x00000004) UART RX FIFO                */
    __IOM uint32_t TXCTRL;      /*!< (@ 0x00000008) UART TX FIFO control        */
    __OM  uint32_t RXCTRL;      /*!< (@ 0x0000000C) UART RX FIFO control        */
    __IM  uint32_t IE;          /*!< (@ 0x00000010) UART Interrupt Enable flag  */
    __IM  uint32_t IP;          /*!< (@ 0x00000014) UART Interrupt Pending flag */
    __IM  uint32_t DIV;         /*!< (@ 0x00000018) UART Baudrate Divider       */
} <DeviceAbbreviation>_UART_TypeDef;

/*@}*/ /* end of group <Device>_Peripherals */

/* ================ End of section using anonymous unions ================ */
#if defined (__GNUC__)
  /* anonymous unions are enabled by default */
#else
  #warning Not supported compiler type
#endif
```

```
/* ========================================================================== */
/* ================  Device Specific Peripheral Address Map  ================ */
/* ========================================================================== */

/* TODO: add here your device peripherals base addresses
         following is an example for UART */
/** @addtogroup Device_Peripheral_peripheralAddr
  * @{
  */

/* Peripheral and SRAM base address */
#define <DeviceAbbreviation>_FLASH_BASE     (0x00000000UL)                              /*!< (FLASH     ) Base Address */
#define <DeviceAbbreviation>_SRAM_BASE      (0x20000000UL)                              /*!< (SRAM      ) Base Address */
#define <DeviceAbbreviation>_PERIPH_BASE    (0x40000000UL)                              /*!< (Peripheral) Base Address */

/* Peripheral memory map */
#define <DeviceAbbreviation>UART0_BASE      (<DeviceAbbreviation>_PERIPH_BASE)          /*!< (UART 0    ) Base Address */
#define <DeviceAbbreviation>I2C_BASE        (<DeviceAbbreviation>_PERIPH_BASE + 0x0800) /*!< (I2C       ) Base Address */
#define <DeviceAbbreviation>GPIO_BASE       (<DeviceAbbreviation>_PERIPH_BASE + 0x1000) /*!< (GPIO      ) Base Address */

/** @} */ /* End of group Device_Peripheral_peripheralAddr */

/* ========================================================================== */
/* ================          Peripheral declaration          ================ */
/* ========================================================================== */

/* TODO: add here your device peripherals pointer definitions
         following is an example for uart0 */
/** @addtogroup Device_Peripheral_declaration
  * @{
  */

#define <DeviceAbbreviation>_UART0          ((<DeviceAbbreviation>_UART_TypeDef *) <DeviceAbbreviation>UART0_BASE)

/** @} */ /* End of group <Device> */
```

```
/** @} */ /* End of group <Vendor> */

#ifdef __cplusplus
}
#endif

#endif
```
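As a rough illustration of how the bit-field macros in the template above behave, the following host-side sketch re-creates `BIT`, `BITS`, `GET_BITS` and the clear-then-set idea behind `WRITE_BITS` for the 32-bit case, and applies them to an ordinary variable instead of a hardware register. All `DEMO_`/`demo_` names are hypothetical and are not part of NMSIS; the fixed 32-bit width is an assumption made so the sketch does not depend on `__riscv_xlen`.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit flavour of the template's mask helpers, renamed with a DEMO_ prefix
 * (hypothetical names) so the sketch does not depend on any device header. */
#define DEMO_BITMASK_MAX  UINT32_C(0xFFFFFFFF)
#define DEMO_BITOFS_MAX   31
#define DEMO_BIT(ofs)     (UINT32_C(1) << (ofs))
#define DEMO_BITS(start, end) \
    ((DEMO_BITMASK_MAX << (start)) & (DEMO_BITMASK_MAX >> (DEMO_BITOFS_MAX - (end))))

/* Same idea as GET_BITS: isolate bits [end:start] and shift them down. */
uint32_t demo_get_bits(uint32_t regval, unsigned start, unsigned end)
{
    return (regval & DEMO_BITS(start, end)) >> start;
}

/* Same idea as WRITE_BITS, but written as a function so it stays a single
 * statement at the call site: clear the field, then OR in the new value. */
uint32_t demo_write_bits(uint32_t regval, unsigned start, unsigned end, uint32_t val)
{
    regval &= ~DEMO_BITS(start, end);
    regval |= (val << start) & DEMO_BITS(start, end);
    return regval;
}

/* Small self-check on an ordinary variable standing in for a register. */
void demo_bits_check(void)
{
    uint32_t reg = UINT32_C(0x12345678);

    assert(DEMO_BITS(4, 7) == UINT32_C(0xF0));
    assert(demo_get_bits(reg, 4, 7) == 0x7u);
    assert(demo_write_bits(reg, 4, 7, 0xAu) == UINT32_C(0x123456A8));
    assert((reg & DEMO_BIT(3)) != 0u);   /* 0x78 has bit 3 set */
}
```

Note that `WRITE_BIT` and `WRITE_BITS` in the template expand to two statements separated by a semicolon, so it is safer to wrap them in braces when they are used as the body of an `if` or a loop.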

# 2.4 Register Mapping

The table below maps common register names used in NMSIS to the corresponding register names used in the Nuclei ISA Spec<sup>12</sup>.

Table 7: NMSIS register names and the corresponding register names in the Nuclei ISA Spec

| NMSIS Register Name                              | N200, N300, N600, NX600       | Register Description                         |
|--------------------------------------------------|-------------------------------|----------------------------------------------|
| Enhanced Core Local Interrupt Controller (ECLIC) |                               |                                              |
| ECLIC->CFG                                       | cliccfg                       | ECLIC Global Configuration Register          |
| ECLIC->INFO                                      | clicinfo                      | ECLIC Global Information Register            |
| ECLIC->MTH                                       | mth                           | ECLIC Global Machine Mode Threshold Register |
| ECLIC->CTRL[i].INTIP                             | clicintip[i]                  | ECLIC Interrupt Pending Register             |
| ECLIC->CTRL[i].INTIE                             | clicintie[i]                  | ECLIC Interrupt Enable Register              |
| ECLIC->CTRL[i].INTATTR                           | clicintattr[i]                | ECLIC Interrupt Attribute Register           |
| ECLIC->CTRL[i].INTCTRL                           | clicintctl[i]                 | ECLIC Interrupt Input Control Register       |
| System Timer Unit (SysTimer)                     |                               |                                              |
| SysTimer->MTIMER                                 | mtime_hi<<32 + mtime_lo       | System Timer current value 64bits Register   |
| SysTimer->MTIMERCMP                              | mtimecmp_hi<<32 + mtimecmp_lo | System Timer compare value 64bits Register   |
| SysTimer->MSTOP                                  | mstop                         | System Timer Stop Register                   |
| SysTimer->MSIP                                   | msip                          | System Timer SW interrupt Register           |

<sup>12</sup> https://doc.nucleisys.com/nuclei\_spec/

# 2.5 NMSIS Core API

If you want to access doxygen generated NMSIS Core API, please click NMSIS Core Doxygen API Documentation.

# 2.5.1 Version Control

#### group NMSIS\_Core\_VersionControl

Version #define symbols for NMSIS release specific C/C++ source code.

NMSIS follows Semantic Versioning 2.0.0<sup>13</sup>. The version format is **MAJOR.MINOR.PATCH**; increment the:

1. MAJOR version when you make incompatible API changes,
2. MINOR version when you add functionality in a backwards compatible manner, and
3. PATCH version when you make backwards compatible bug fixes.

The header file nmsis\_version.h is included by each core header so that these definitions are available.

#### **Example Usage for NMSIS Version Check:**

```
#if defined(__NMSIS_VERSION) && (__NMSIS_VERSION >= 0x00010105)
    #warning "Yes, we have NMSIS 1.1.5 or later"
#else
    #error "We need NMSIS 1.1.5 or later!"
#endif
```

## **Unnamed Group**

```
__NUCLEI_N_REV (0x0104)
```

Nuclei N class core revision number.

Revision number format: [15:8] revision number, [7:0] patch number

**Attention** This define is exclusive with *\_\_NUCLEI\_NX\_REV* (page 60)

```
__NUCLEI_NX_REV (0x0100)
```

Nuclei NX class core revision number.

Revision number format: [15:8] revision number, [7:0] patch number

**Attention** This define is exclusive with *\_\_NUCLEI\_N\_REV* (page 60)

<sup>13</sup> https://semver.org/

# **Defines**

```
__NMSIS_VERSION_MAJOR (1U)
    Represent the NMSIS major version.
    The NMSIS major version can be used to differentiate between NMSIS major releases.
__NMSIS_VERSION_MINOR (0U)
    Represent the NMSIS minor version.
    The NMSIS minor version can be used to query a NMSIS release update including new features.
__NMSIS_VERSION_PATCH (1U)
    Represent the NMSIS patch version.
    The NMSIS patch version can be used to show bug fixes in this package.
__NMSIS_VERSION ((__NMSIS_VERSION_MAJOR << 16U) | (__NMSIS_VERSION_MINOR << 8) | __NMSIS_VERSION_PATCH)
```

NMSIS Version format: MAJOR.MINOR.PATCH

Represent the NMSIS Version.

- MAJOR: \_\_NMSIS\_VERSION\_MAJOR (page 61), stored in bits [31:16] of \_\_NMSIS\_VERSION (page 61)
- MINOR: \_\_NMSIS\_VERSION\_MINOR (page 61), stored in bits [15:8] of \_\_NMSIS\_VERSION (page 61)
- PATCH: \_\_NMSIS\_VERSION\_PATCH (page 61), stored in bits [7:0] of \_\_NMSIS\_VERSION (page 61)
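
The bit layout listed above can be sketched as a pair of pack/unpack helpers. This is only an illustration of the encoding; the `demo_` function names are hypothetical and are not provided by nmsis\_version.h.

```c
#include <assert.h>
#include <stdint.h>

/* Pack MAJOR.MINOR.PATCH using the layout described above:
 * MAJOR in bits [31:16], MINOR in bits [15:8], PATCH in bits [7:0]. */
uint32_t demo_pack_version(uint32_t major, uint32_t minor, uint32_t patch)
{
    return ((major & 0xFFFFu) << 16) | ((minor & 0xFFu) << 8) | (patch & 0xFFu);
}

/* Unpack the individual fields again. */
uint32_t demo_version_major(uint32_t version) { return (version >> 16) & 0xFFFFu; }
uint32_t demo_version_minor(uint32_t version) { return (version >> 8) & 0xFFu; }
uint32_t demo_version_patch(uint32_t version) { return version & 0xFFu; }

/* Self-check: 1.1.5 is the value used in the version check example. */
void demo_version_check(void)
{
    uint32_t v = demo_pack_version(1u, 1u, 5u);

    assert(v == UINT32_C(0x00010105));
    assert(demo_version_major(v) == 1u);
    assert(demo_version_minor(v) == 1u);
    assert(demo_version_patch(v) == 5u);
}
```

Because MAJOR occupies the most significant bits, a plain numeric comparison such as `__NMSIS_VERSION >= 0x00010105` orders versions correctly as long as MINOR and PATCH each stay below 256.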

# 2.5.2 Compiler Control

## group NMSIS\_Core\_CompilerControl

Compiler-agnostic #define symbols for generic C/C++ source code.

NMSIS-Core provides the header file **nmsis\_compiler.h** with consistent #define symbols for writing C or C++ source files that should be compiler agnostic. Each NMSIS compliant compiler should support the functionality described in this section.

The header file **nmsis\_compiler.h** is also included by each Device Header File <device.h> so that these definitions are available.

#### **Defines**

```
__has_builtin(x) (0)
__ASM __asm
    Pass information from the compiler to the assembler.
__INLINE inline
    Recommend that function should be inlined by the compiler.
__STATIC_INLINE static inline
    Define a static function that may be inlined by the compiler.
__STATIC_FORCEINLINE __attribute__((always_inline)) static inline
    Define a static function that should be always inlined by the compiler.
__NO_RETURN __attribute__((__noreturn__))
    Inform the compiler that a function does not return.
```


```
__USED __attribute__((used))
    Inform that a variable shall be retained in executable image.
__WEAK __attribute__((weak))
    Mark a symbol as weak so that a non-weak definition elsewhere can override it.
__VECTOR_SIZE(x) __attribute__((vector_size(x)))
    Specify the vector size of the variable, measured in bytes.
__PACKED __attribute__((packed, aligned(1)))
    Request smallest possible alignment.
__PACKED_STRUCT struct __attribute__((packed, aligned(1)))
    Request smallest possible alignment for a structure.
__PACKED_UNION union __attribute__((packed, aligned(1)))
    Request smallest possible alignment for a union.
__UNALIGNED_UINT16_WRITE(addr, val) (void)((((struct T_UINT16_WRITE *)(void *)(addr))->v) = (val))
    Pointer for unaligned write of a uint16_t variable.
__UNALIGNED_UINT16_READ(addr) (((const struct T_UINT16_READ *)(const void *)(addr))->v)
    Pointer for unaligned read of a uint16_t variable.
__UNALIGNED_UINT32_WRITE(addr, val) (void)((((struct T_UINT32_WRITE *)(void *)(addr))->v) = (val))
    Pointer for unaligned write of a uint32_t variable.
__UNALIGNED_UINT32_READ(addr) (((const struct T_UINT32_READ *)(const void *)(addr))->v)
    Pointer for unaligned read of a uint32_t variable.
__ALIGNED(x) __attribute__((aligned(x)))
    Minimum x bytes alignment for a variable.
__RESTRICT __restrict
    restrict pointer qualifier to enable additional optimizations.
__COMPILER_BARRIER() __ASM volatile("":::"memory")
    Barrier to prevent compiler from reordering instructions.
__USUALLY(exp) __builtin_expect((exp), 1)
    Provide the compiler with branch prediction information, the branch is usually true.
__RARELY(exp) __builtin_expect((exp), 0)
    Provide the compiler with branch prediction information, the branch is rarely true.
__INTERRUPT __attribute__((interrupt))
    Use this attribute to indicate that the specified function is an interrupt handler.
```

## **Variables**

```
__PACKED_STRUCT T_UINT16_WRITE
    Packed struct for unaligned uint16_t write access.

__PACKED_STRUCT T_UINT16_READ
    Packed struct for unaligned uint16_t read access.

__PACKED_STRUCT T_UINT32_WRITE
    Packed struct for unaligned uint32_t write access.

__PACKED_STRUCT T_UINT32_READ
    Packed struct for unaligned uint32_t read access.
```
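
To make the packed-struct technique behind the unaligned access macros more concrete, here is a minimal host-side sketch assuming a GCC or Clang compiler. The struct and function names (`demo_u32`, `demo_unaligned_write32`, `demo_unaligned_read32`) are hypothetical stand-ins, not the actual `T_UINT32_WRITE`/`T_UINT32_READ` definitions from nmsis\_compiler.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packed helper structs: a 32-bit member with
 * alignment 1, so the compiler must not assume a 4-byte aligned address. */
struct __attribute__((packed, aligned(1))) demo_u32 {
    uint32_t v;
};

/* Store a 32-bit value at an address that may be unaligned. */
void demo_unaligned_write32(void *addr, uint32_t val)
{
    ((struct demo_u32 *)addr)->v = val;
}

/* Load a 32-bit value from an address that may be unaligned. */
uint32_t demo_unaligned_read32(const void *addr)
{
    return ((const struct demo_u32 *)addr)->v;
}

/* Self-check: write to an odd offset inside a byte buffer and read it back. */
void demo_unaligned_check(void)
{
    uint8_t buf[8] = {0};
    uint32_t copy;

    demo_unaligned_write32(&buf[1], UINT32_C(0xDEADBEEF));
    assert(demo_unaligned_read32(&buf[1]) == UINT32_C(0xDEADBEEF));

    memcpy(&copy, &buf[1], sizeof(copy));   /* byte-wise view of the same data */
    assert(copy == UINT32_C(0xDEADBEEF));
    assert(buf[0] == 0 && buf[5] == 0);     /* neighbouring bytes untouched */
}
```

Going through the packed struct lets the compiler pick an access sequence that is legal for any address, instead of emitting a single word access that could trap on cores without hardware support for misaligned accesses.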

# 2.5.3 Core CSR Register Access

Click Nuclei Core CSR<sup>14</sup> to learn about Core CSR in Nuclei ISA Spec.

# group NMSIS\_Core\_CSR\_Register\_Access

Functions to access the Core CSR Registers.

The following functions or macros provide access to Core CSR registers.

- Core CSR Encodings (page 73)
- Core CSR Registers (page 66)

#### **Defines**

```
__RV_CSR_SWAP(csr, val)
```

CSR operation macro for the csrrw instruction.

Read the content of the csr register into \_\_v, then write val into the csr register, then return \_\_v.

#### **Parameters**

- csr – CSR macro definition defined in *Core CSR Registers* (page 66), e.g. *CSR\_MSTATUS* (page 67)
- val – value to store into the CSR register

**Returns** the CSR register value before the write

```
__RV_CSR_READ(csr)
```

CSR operation macro for the csrr instruction.

Read the content of the csr register into \_\_v and return it.

#### **Parameters**

- csr – CSR macro definition defined in *Core CSR Registers* (page 66), e.g. *CSR\_MSTATUS* (page 67)

**Returns** the CSR register value

```
__RV_CSR_WRITE(csr, val)
```

CSR operation macro for the csrw instruction.

Write the content of val to the csr register.

#### **Parameters**

- csr – CSR macro definition defined in *Core CSR Registers* (page 66), e.g. *CSR\_MSTATUS* (page 67)
- val – value to store into the CSR register

<sup>14</sup> https://doc.nucleisys.com/nuclei\_spec/isa/core\_csr.html

\_\_RV\_CSR\_READ\_SET (csr, val)

CSR operation macro for the csrrs instruction.

Read the content of the csr register into \_\_v, then set the csr register to \_\_v | val, then return \_\_v.

#### **Parameters**

- csr – CSR macro definition defined in *Core CSR Registers* (page 66), e.g. *CSR\_MSTATUS* (page 67)
- val – mask value to be used with the csrrs instruction

**Returns** the CSR register value before the write

```
__RV_CSR_SET(csr, val)
```

CSR operation macro for the csrs instruction.

Set the csr register to csr\_content | val.

#### **Parameters**

- csr – CSR macro definition defined in *Core CSR Registers* (page 66), e.g. *CSR\_MSTATUS* (page 67)
- val – mask value to be used with the csrs instruction

\_\_RV\_CSR\_READ\_CLEAR (csr, val)

CSR operation macro for the csrrc instruction.

Read the content of the csr register into \_\_v, then set the csr register to \_\_v & ~val, then return \_\_v.

#### **Parameters**

- csr – CSR macro definition defined in *Core CSR Registers* (page 66), e.g. *CSR\_MSTATUS* (page 67)
- val – mask value to be used with the csrrc instruction

**Returns** the CSR register value before the write

\_\_RV\_CSR\_CLEAR (csr, val)

CSR operation macro for the csrc instruction.

Set the csr register to csr\_content & ~val.

#### **Parameters**

- csr – CSR macro definition defined in *Core CSR Registers* (page 66), e.g. *CSR\_MSTATUS* (page 67)
- val – mask value to be used with the csrc instruction
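
The read-modify-write behaviour described for the swap, read-set and read-clear macros can be modelled on a host machine with an ordinary variable standing in for the CSR. This is only a sketch of the documented semantics, not the NMSIS implementation (which relies on RISC-V inline assembly); the `demo_` names and the 32-bit CSR width are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_csr_t;   /* assumed RV32 CSR width for this model */

/* Model of csrrw: return the old value, then store val. */
demo_csr_t demo_csr_swap(demo_csr_t *csr, demo_csr_t val)
{
    demo_csr_t old = *csr;
    *csr = val;
    return old;
}

/* Model of csrrs: return the old value, then set the bits given in mask. */
demo_csr_t demo_csr_read_set(demo_csr_t *csr, demo_csr_t mask)
{
    demo_csr_t old = *csr;
    *csr = old | mask;
    return old;
}

/* Model of csrrc: return the old value, then clear the bits given in mask. */
demo_csr_t demo_csr_read_clear(demo_csr_t *csr, demo_csr_t mask)
{
    demo_csr_t old = *csr;
    *csr = old & ~mask;
    return old;
}

/* Self-check using the MSTATUS MIE bit position (0x8) as the mask. */
void demo_csr_check(void)
{
    demo_csr_t mstatus = 0x1800u;

    assert(demo_csr_read_set(&mstatus, 0x8u) == 0x1800u);   /* old value */
    assert(mstatus == 0x1808u);                             /* MIE now set */
    assert(demo_csr_read_clear(&mstatus, 0x8u) == 0x1808u);
    assert(mstatus == 0x1800u);                             /* MIE cleared */
    assert(demo_csr_swap(&mstatus, 0u) == 0x1800u);
    assert(mstatus == 0u);
}
```

Returning the previous value is what makes the read-set and read-clear forms useful for save/restore patterns: a caller can clear an enable bit, keep the returned value, and later restore exactly the bits that were set before.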

## **Functions**

\_\_STATIC\_FORCEINLINE void \_\_enable\_irq (void)

Enable IRQ Interrupts.

Enables IRQ interrupts by setting the MIE-bit in the MSTATUS Register.

**Remark** Can only be executed in Privileged modes.

\_\_STATIC\_FORCEINLINE void \_\_disable\_irq (void)

Disable IRQ Interrupts.

Disables IRQ interrupts by clearing the MIE-bit in the MSTATUS Register.

**Remark** Can only be executed in Privileged modes.

\_\_STATIC\_FORCEINLINE uint64\_t \_\_get\_rv\_cycle (void)

Read whole 64 bits value of mcycle counter.

This function will read the whole 64 bits of MCYCLE register

**Remark** It will work for both RV32 and RV64 to get the full 64-bit value of MCYCLE

**Returns** The whole 64 bits value of MCYCLE

\_\_STATIC\_FORCEINLINE uint64\_t \_\_get\_rv\_instret (void)

Read whole 64 bits value of machine instruction-retired counter.

This function will read the whole 64 bits of MINSTRET register

**Remark** It will work for both RV32 and RV64 to get full 64bits value of MINSTRET

**Returns** The whole 64 bits value of MINSTRET

\_\_STATIC\_FORCEINLINE uint64\_t \_\_get\_rv\_time (void)

Read whole 64 bits value of real-time clock.

This function will read the whole 64 bits of TIME register

**Remark** It will work for both RV32 and RV64 to get the full 64-bit value of TIME

**Attention** Only available when user mode is available

**Returns** The whole 64 bits value of TIME CSR
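
On RV32 a 64-bit counter is exposed as two 32-bit halves, so a consistent 64-bit value has to be assembled from separate reads. A common way to do this is the high-low-high read loop sketched below; whether NMSIS uses exactly this sequence is not stated in this section, so treat it as an illustration of the general technique. The mock counter type and all `demo_` names are hypothetical and only exist so that the loop can run on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Mock 64-bit counter that advances by `step` on every 32-bit read,
 * imitating a hardware counter that keeps running between reads. */
typedef struct {
    uint64_t value;
    uint32_t step;
} demo_counter_t;

static uint32_t demo_read_lo(demo_counter_t *c)
{
    uint32_t lo = (uint32_t)c->value;
    c->value += c->step;
    return lo;
}

static uint32_t demo_read_hi(demo_counter_t *c)
{
    uint32_t hi = (uint32_t)(c->value >> 32);
    c->value += c->step;
    return hi;
}

/* Read high, low, then high again; retry if the high half changed, which
 * means the low half wrapped around between the two high reads. */
uint64_t demo_read_counter64(demo_counter_t *c)
{
    uint32_t hi, lo, hi_again;

    do {
        hi = demo_read_hi(c);
        lo = demo_read_lo(c);
        hi_again = demo_read_hi(c);
    } while (hi != hi_again);

    return ((uint64_t)hi << 32) | lo;
}

/* Self-check: start right before the low half wraps. */
void demo_counter_check(void)
{
    demo_counter_t c = { UINT64_C(0x00000000FFFFFFFF), 1u };

    /* First pass sees hi = 0 then hi = 1 and retries; second pass is stable. */
    assert(demo_read_counter64(&c) == UINT64_C(0x0000000100000003));
}
```

Without the second high read, the first pass in the self-check would have combined a stale high half (0) with an already wrapped low half (0) and reported a value far in the past.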

# 2.5.4 Core CSR Encoding

Click Nuclei Core CSR<sup>15</sup> to learn about Core CSR in Nuclei ISA Spec.


<sup>15</sup> https://doc.nucleisys.com/nuclei\_spec/isa/core\_csr.html

# **Core CSR Register Definitions**

# group NMSIS\_Core\_CSR\_Registers

NMSIS Core CSR Register Definitions.

The following macros are used for CSR Register Definitions.

# **Defines**

- CSR\_USTATUS 0x0
- CSR\_FFLAGS 0x1
- CSR\_FRM 0x2
- CSR\_FCSR 0x3
- CSR\_CYCLE 0xc00
- CSR\_TIME 0xc01
- CSR\_INSTRET 0xc02
- CSR\_HPMCOUNTER3 0xc03
- CSR\_HPMCOUNTER4 0xc04
- CSR\_HPMCOUNTER5 0xc05
- CSR\_HPMCOUNTER6 0xc06
- CSR\_HPMCOUNTER7 0xc07
- CSR\_HPMCOUNTER8 0xc08
- CSR\_HPMCOUNTER9 0xc09
- CSR\_HPMCOUNTER10 0xc0a
- CSR\_HPMCOUNTER11 0xc0b
- CSR\_HPMCOUNTER12 0xc0c
- CSR\_HPMCOUNTER13 0xc0d
- CSR\_HPMCOUNTER14 0xc0e
- CSR\_HPMCOUNTER15 0xc0f
- CSR\_HPMCOUNTER16 0xc10
- CSR\_HPMCOUNTER17 0xc11
- CSR\_HPMCOUNTER18 0xc12
- CSR\_HPMCOUNTER19 0xc13
- CSR\_HPMCOUNTER20 0xc14
- CSR\_HPMCOUNTER21 0xc15
- CSR\_HPMCOUNTER22 0xc16
- CSR\_HPMCOUNTER23 0xc17
- CSR\_HPMCOUNTER24 0xc18
- CSR\_HPMCOUNTER25 0xc19
- CSR\_HPMCOUNTER26 0xc1a
- CSR\_HPMCOUNTER27 0xc1b
- CSR\_HPMCOUNTER28 0xc1c
- CSR\_HPMCOUNTER29 0xc1d
- CSR\_HPMCOUNTER30 0xc1e
- CSR\_HPMCOUNTER31 0xc1f
- CSR\_SSTATUS 0x100
- CSR\_SIE 0x104
- CSR\_STVEC 0x105
- CSR\_SSCRATCH 0x140
- CSR\_SEPC 0x141
- CSR\_SCAUSE 0x142
- CSR\_SBADADDR 0x143
- CSR\_SIP 0x144
- CSR\_SPTBR 0x180
- CSR\_MSTATUS 0x300
- CSR\_MISA 0x301
- CSR\_MEDELEG 0x302
- CSR\_MIDELEG 0x303
- CSR\_MIE 0x304
- CSR\_MTVEC 0x305
- CSR\_MCOUNTEREN 0x306
- CSR\_MSCRATCH 0x340
- CSR\_MEPC 0x341
- CSR\_MCAUSE 0x342
- CSR\_MBADADDR 0x343
- CSR\_MTVAL 0x343
- CSR\_MIP 0x344
- CSR\_PMPCFG0 0x3a0
- CSR\_PMPCFG1 0x3a1
- CSR\_PMPCFG2 0x3a2
- CSR\_PMPCFG3 0x3a3
- CSR\_PMPADDR0 0x3b0
- CSR\_PMPADDR1 0x3b1
- CSR\_PMPADDR2 0x3b2
- CSR\_PMPADDR3 0x3b3

- CSR\_PMPADDR4 0x3b4
- CSR\_PMPADDR5 0x3b5
- CSR\_PMPADDR6 0x3b6
- CSR\_PMPADDR7 0x3b7
- CSR\_PMPADDR8 0x3b8
- CSR\_PMPADDR9 0x3b9
- CSR\_PMPADDR10 0x3ba
- CSR\_PMPADDR11 0x3bb
- CSR\_PMPADDR12 0x3bc
- CSR\_PMPADDR13 0x3bd
- CSR\_PMPADDR14 0x3be
- CSR\_PMPADDR15 0x3bf
- CSR\_TSELECT 0x7a0
- CSR\_TDATA1 0x7a1
- CSR\_TDATA2 0x7a2
- CSR\_TDATA3 0x7a3
- CSR\_DCSR 0x7b0
- CSR\_DPC 0x7b1
- CSR\_DSCRATCH 0x7b2
- CSR\_MCYCLE 0xb00
- CSR\_MINSTRET 0xb02
- CSR\_MHPMCOUNTER3 0xb03
- CSR\_MHPMCOUNTER4 0xb04
- CSR\_MHPMCOUNTER5 0xb05
- CSR\_MHPMCOUNTER6 0xb06
- CSR\_MHPMCOUNTER7 0xb07
- CSR\_MHPMCOUNTER8 0xb08
- CSR\_MHPMCOUNTER9 0xb09
- CSR\_MHPMCOUNTER10 0xb0a
- CSR\_MHPMCOUNTER11 0xb0b
- CSR\_MHPMCOUNTER12 0xb0c
- CSR\_MHPMCOUNTER13 0xb0d
- CSR\_MHPMCOUNTER14 0xb0e
- CSR\_MHPMCOUNTER15 0xb0f
- CSR\_MHPMCOUNTER16 0xb10
- CSR\_MHPMCOUNTER17 0xb11
- CSR\_MHPMCOUNTER18 0xb12
- CSR\_MHPMCOUNTER19 0xb13
- CSR\_MHPMCOUNTER20 0xb14
- CSR\_MHPMCOUNTER21 0xb15
- CSR\_MHPMCOUNTER22 0xb16
- CSR\_MHPMCOUNTER23 0xb17
- CSR\_MHPMCOUNTER24 0xb18
- CSR\_MHPMCOUNTER25 0xb19
- CSR\_MHPMCOUNTER26 0xb1a
- CSR\_MHPMCOUNTER27 0xb1b
- CSR\_MHPMCOUNTER28 0xb1c
- CSR\_MHPMCOUNTER29 0xb1d
- CSR\_MHPMCOUNTER30 0xb1e
- CSR\_MHPMCOUNTER31 0xb1f
- CSR\_MUCOUNTEREN 0x320
- CSR\_MSCOUNTEREN 0x321
- CSR\_MHPMEVENT3 0x323
- CSR\_MHPMEVENT4 0x324
- CSR\_MHPMEVENT5 0x325
- CSR\_MHPMEVENT6 0x326
- CSR\_MHPMEVENT7 0x327
- CSR\_MHPMEVENT8 0x328
- CSR\_MHPMEVENT9 0x329
- CSR\_MHPMEVENT10 0x32a
- CSR\_MHPMEVENT11 0x32b
- CSR\_MHPMEVENT12 0x32c
- CSR\_MHPMEVENT13 0x32d
- CSR\_MHPMEVENT14 0x32e
- CSR\_MHPMEVENT15 0x32f
- CSR\_MHPMEVENT16 0x330
- CSR\_MHPMEVENT17 0x331
- CSR\_MHPMEVENT18 0x332
- CSR\_MHPMEVENT19 0x333
- CSR\_MHPMEVENT20 0x334
- CSR\_MHPMEVENT21 0x335
- CSR\_MHPMEVENT22 0x336
- CSR\_MHPMEVENT23 0x337
- CSR\_MHPMEVENT24 0x338
- CSR\_MHPMEVENT25 0x339
- CSR\_MHPMEVENT26 0x33a
- CSR\_MHPMEVENT27 0x33b
- CSR\_MHPMEVENT28 0x33c
- CSR\_MHPMEVENT29 0x33d
- CSR\_MHPMEVENT30 0x33e
- CSR\_MHPMEVENT31 0x33f
- CSR\_MVENDORID 0xf11
- CSR\_MARCHID 0xf12
- CSR\_MIMPID 0xf13
- CSR\_MHARTID 0xf14
- CSR\_CYCLEH 0xc80
- CSR\_TIMEH 0xc81
- CSR\_INSTRETH 0xc82
- CSR\_HPMCOUNTER3H 0xc83
- CSR\_HPMCOUNTER4H 0xc84
- CSR\_HPMCOUNTER5H 0xc85
- CSR\_HPMCOUNTER6H 0xc86
- CSR\_HPMCOUNTER7H 0xc87
- CSR\_HPMCOUNTER8H 0xc88
- CSR\_HPMCOUNTER9H 0xc89
- CSR\_HPMCOUNTER10H 0xc8a
- CSR\_HPMCOUNTER11H 0xc8b
- CSR\_HPMCOUNTER12H 0xc8c
- CSR\_HPMCOUNTER13H 0xc8d
- CSR\_HPMCOUNTER14H 0xc8e
- CSR\_HPMCOUNTER15H 0xc8f
- CSR\_HPMCOUNTER16H 0xc90
- CSR\_HPMCOUNTER17H 0xc91
- CSR\_HPMCOUNTER18H 0xc92
- CSR\_HPMCOUNTER19H 0xc93
- CSR\_HPMCOUNTER20H 0xc94
- CSR\_HPMCOUNTER21H 0xc95
- CSR\_HPMCOUNTER22H 0xc96
- CSR\_HPMCOUNTER23H 0xc97
- CSR\_HPMCOUNTER24H 0xc98
- CSR\_HPMCOUNTER25H 0xc99
- CSR\_HPMCOUNTER26H 0xc9a
- CSR\_HPMCOUNTER27H 0xc9b
- CSR\_HPMCOUNTER28H 0xc9c
- CSR\_HPMCOUNTER29H 0xc9d
- CSR\_HPMCOUNTER30H 0xc9e
- CSR\_HPMCOUNTER31H 0xc9f
- CSR\_MCYCLEH 0xb80
- CSR\_MINSTRETH 0xb82
- CSR\_MHPMCOUNTER3H 0xb83
- CSR\_MHPMCOUNTER4H 0xb84
- CSR\_MHPMCOUNTER5H 0xb85
- CSR\_MHPMCOUNTER6H 0xb86
- CSR\_MHPMCOUNTER7H 0xb87
- CSR\_MHPMCOUNTER8H 0xb88
- CSR\_MHPMCOUNTER9H 0xb89
- CSR\_MHPMCOUNTER10H 0xb8a
- CSR\_MHPMCOUNTER11H 0xb8b
- CSR\_MHPMCOUNTER12H 0xb8c
- CSR\_MHPMCOUNTER13H 0xb8d
- CSR\_MHPMCOUNTER14H 0xb8e
- CSR\_MHPMCOUNTER15H 0xb8f
- CSR\_MHPMCOUNTER16H 0xb90
- CSR\_MHPMCOUNTER17H 0xb91
- CSR\_MHPMCOUNTER18H 0xb92
- CSR\_MHPMCOUNTER19H 0xb93
- CSR\_MHPMCOUNTER20H 0xb94
- CSR\_MHPMCOUNTER21H 0xb95
- CSR\_MHPMCOUNTER22H 0xb96
- CSR\_MHPMCOUNTER23H 0xb97
- CSR\_MHPMCOUNTER24H 0xb98
- CSR\_MHPMCOUNTER25H 0xb99
- CSR\_MHPMCOUNTER26H 0xb9a
- CSR\_MHPMCOUNTER27H 0xb9b

- CSR\_MHPMCOUNTER28H 0xb9c
- CSR\_MHPMCOUNTER29H 0xb9d
- CSR\_MHPMCOUNTER30H 0xb9e
- CSR\_MHPMCOUNTER31H 0xb9f
- CSR\_MTVT 0x307
- CSR\_MNXTI 0x345
- CSR\_MINTSTATUS 0x346
- CSR\_MSCRATCHCSW 0x348
- CSR\_MSCRATCHCSWL 0x349
- CSR\_MCLICBASE 0x350
- CSR\_UCODE 0x801
- CSR\_MCOUNTINHIBIT 0x320
- CSR\_MILM\_CTL 0x7C0
- CSR\_MDLM\_CTL 0x7C1
- CSR\_MECC\_CODE 0x7C2
- CSR\_MNVEC 0x7C3
- CSR\_MSUBM 0x7C4
- CSR\_MDCAUSE 0x7C9
- CSR\_MCACHE\_CTL 0x7CA
- CSR\_MMISC\_CTL 0x7D0
- CSR\_MSAVESTATUS 0x7D6
- CSR\_MSAVEEPC1 0x7D7
- CSR\_MSAVECAUSE1 0x7D8
- CSR\_MSAVEEPC2 0x7D9
- CSR\_MSAVECAUSE2 0x7DA
- CSR\_MSAVEDCAUSE1 0x7DB
- CSR\_MSAVEDCAUSE2 0x7DC
- CSR\_MTLB\_CTL 0x7DD
- CSR\_MECC\_LOCK 0x7DE
- CSR\_PUSHMSUBM 0x7EB
- CSR\_MTVT2 0x7EC
- CSR\_JALMNXTI 0x7ED
- CSR\_PUSHMCAUSE 0x7EE
- CSR\_PUSHMEPC 0x7EF
- CSR\_MPPICFG\_INFO 0x7F0
- CSR\_MFIOCFG\_INFO 0x7F1
- CSR\_SLEEPVALUE 0x811
- CSR\_TXEVT 0x812
- CSR\_WFE 0x810
- CSR\_MICFG\_INFO 0xFC0
- CSR\_MDCFG\_INFO 0xFC1
- CSR\_MCFG\_INFO 0xFC2
- CSR\_MTLBCFG\_INFO 0xFC3
- CSR\_CCM\_MBEGINADDR 0x7CB
- CSR\_CCM\_MCOMMAND 0x7CC
- CSR\_CCM\_MDATA 0x7CD
- CSR\_CCM\_SUEN 0x7CE
- CSR\_CCM\_SBEGINADDR 0x5CB
- CSR\_CCM\_SCOMMAND 0x5CC
- CSR\_CCM\_SDATA 0x5CD
- CSR\_CCM\_UBEGINADDR 0x4CB
- CSR\_CCM\_UCOMMAND 0x4CC
- CSR\_CCM\_UDATA 0x4CD
- CSR\_CCM\_FPIPE 0x4CF

## **Other Core Related Macros**

## group NMSIS\_Core\_CSR\_Encoding

NMSIS Core CSR Encodings.

The following macros are used for CSR encodings

## **Defines**

- MSTATUS\_UIE 0x00000001
- MSTATUS\_SIE 0x00000002
- MSTATUS\_HIE 0x00000004
- MSTATUS\_MIE 0x00000008
- MSTATUS\_UPIE 0x00000010
- MSTATUS\_SPIE 0x00000020
- MSTATUS\_HPIE 0x00000040
- MSTATUS\_MPIE 0x00000080
- MSTATUS\_SPP 0x00000100
- MSTATUS\_MPP 0x00001800
- MSTATUS\_FS 0x00006000
- MSTATUS\_XS 0x00018000
- MSTATUS\_MPRV 0x00020000
- MSTATUS\_PUM 0x00040000
- MSTATUS\_MXR 0x00080000
- MSTATUS\_VM 0x1F000000
- MSTATUS32\_SD 0x80000000
- MSTATUS\_FS\_INITIAL 0x00002000
- MSTATUS\_FS\_CLEAN 0x00004000
- MSTATUS\_FS\_DIRTY 0x00006000
- SSTATUS\_UIE 0x00000001
- SSTATUS\_SIE 0x00000002
- SSTATUS\_UPIE 0x00000010
- SSTATUS\_SPIE 0x00000020
- SSTATUS\_SPP 0x00000100
- SSTATUS\_FS 0x00006000
- SSTATUS\_XS 0x00018000
- SSTATUS\_PUM 0x00040000
- SSTATUS32\_SD 0x80000000
- SSTATUS64\_SD 0x8000000000000000
- CSR\_MCACHE\_CTL\_IE 0x00000001
- CSR\_MCACHE\_CTL\_DE 0x00010000
- DCSR\_XDEBUGVER (3U<<30)
- DCSR\_NDRESET (1<<29)
- DCSR\_FULLRESET (1<<28)
- DCSR\_EBREAKM (1<<15)
- DCSR\_EBREAKH (1<<14)
- DCSR\_EBREAKS (1<<13)
- DCSR\_EBREAKU (1<<12)
- DCSR\_STOPCYCLE (1<<10)
- DCSR\_STOPTIME (1<<9)
- DCSR\_CAUSE (7<<6)
- DCSR\_DEBUGINT (1<<5)
- DCSR\_HALT (1<<3)
- DCSR\_STEP (1<<2)
- DCSR\_PRV (3<<0)
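
The encodings above are masks, so a multi-bit field such as MPP or FS has to be shifted by the position of the mask's lowest set bit before its value can be read or written. The sketch below shows one way to do that generically on a plain integer; the `DEMO_`/`demo_` names are hypothetical, and the three mask values are copied from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values copied from the encodings listed above. */
#define DEMO_MSTATUS_MPP        UINT32_C(0x00001800)
#define DEMO_MSTATUS_FS         UINT32_C(0x00006000)
#define DEMO_MSTATUS_FS_INITIAL UINT32_C(0x00002000)

/* Position of the lowest set bit of a non-zero mask. */
static unsigned demo_mask_shift(uint32_t mask)
{
    unsigned shift = 0;

    while (((mask >> shift) & 1u) == 0u) {
        shift++;
    }
    return shift;
}

/* Extract the field selected by mask, shifted down to bit 0. */
uint32_t demo_field_get(uint32_t reg, uint32_t mask)
{
    return (reg & mask) >> demo_mask_shift(mask);
}

/* Return reg with the field selected by mask replaced by val. */
uint32_t demo_field_set(uint32_t reg, uint32_t mask, uint32_t val)
{
    return (reg & ~mask) | ((val << demo_mask_shift(mask)) & mask);
}

/* Self-check on a made-up mstatus value with MPP = 3, MPIE and MIE set. */
void demo_field_check(void)
{
    uint32_t mstatus = UINT32_C(0x00001888);

    assert(demo_field_get(mstatus, DEMO_MSTATUS_MPP) == 3u);
    assert(demo_field_set(mstatus, DEMO_MSTATUS_MPP, 0u) == UINT32_C(0x00000088));
    assert(demo_field_set(0u, DEMO_MSTATUS_FS, 1u) == DEMO_MSTATUS_FS_INITIAL);
}
```

The last assertion also shows how the `MSTATUS_FS_*` constants relate to the `MSTATUS_FS` mask: they are the field values 1, 2 and 3 already shifted into position.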

```
DCSR_CAUSE_NONE 0
DCSR_CAUSE_SWBP 1
DCSR_CAUSE_HWBP 2
DCSR_CAUSE_DEBUGINT 3
DCSR_CAUSE_STEP 4
DCSR_CAUSE_HALT 5
MCONTROL_TYPE(xlen) (0xfULL<<((xlen)-4))
MCONTROL_DMODE(xlen) (1ULL<<((xlen)-5))
MCONTROL_MASKMAX(xlen) (0x3fULL<<((xlen)-11))
MCONTROL_SELECT (1<<19)
MCONTROL_TIMING (1<<18)
MCONTROL_ACTION (0x3f<<12)
MCONTROL_CHAIN (1<<11)
MCONTROL_MATCH (0xf<<7)
MCONTROL_M (1<<6)
MCONTROL_H (1<<5)
MCONTROL_S (1<<4)
MCONTROL_U (1<<3)
MCONTROL_EXECUTE (1<<2)
MCONTROL_STORE (1<<1)
MCONTROL_LOAD (1<<0)
MCONTROL_TYPE_NONE 0
MCONTROL_TYPE_MATCH 2
MCONTROL_ACTION_DEBUG_MODE 1
MCONTROL_ACTION_TRACE_START 2
MCONTROL_ACTION_TRACE_STOP 3
MCONTROL_ACTION_TRACE_EMIT 4
MCONTROL_MATCH_EQUAL 0
MCONTROL_MATCH_NAPOT 1
MCONTROL_MATCH_GE 2
MCONTROL_MATCH_LT 3
MCONTROL_MATCH_MASK_LOW 4
MCONTROL_MATCH_MASK_HIGH 5
MIP_SSIP (1 << IRQ_S_SOFT)
MIP_HSIP (1 << IRQ_H_SOFT)
```

```
MIP_MSIP (1 \ll IRQ_M\_SOFT (page 78))
MIP\_STIP (1 \ll IRQ\_S\_TIMER (page 78))
\label{eq:mip_htip} \texttt{MIP\_HTIP} \; (1 << IRQ\_H\_TIMER \; (page \; 78))
MIP\_MTIP (1 << IRQ\_M\_TIMER (page 78))
MIP\_SEIP (1 << IRQ\_S\_EXT (page 78))
MIP HEIP (1 \ll IRQ H EXT \text{ (page 78)})
MIP\_MEIP (1 << IRQ\_M\_EXT (page 78))
MIE_SSIE MIP_SSIP (page 75)
MIE_HSIE MIP_HSIP (page 75)
MIE_MSIE MIP_MSIP (page 76)
MIE_STIE MIP_STIP (page 76)
MIE_HTIE MIP_HTIP (page 76)
MIE_MTIE MIP_MTIP (page 76)
MIE_SEIE MIP_SEIP (page 76)
MIE_HEIE MIP_HEIP (page 76)
MIE_MEIE MIP_MEIP (page 76)
UCODE_OV (0x1)
WFE_WFE (0x1)
TXEVT_TXEVT (0x1)
SLEEPVALUE_SLEEPVALUE (0x1)
MCOUNTINHIBIT_IR (1<<2)
MCOUNTINHIBIT_CY (1<<0)
MILM_CTL_ILM_BPA (((1ULL<<((__riscv_xlen)-10))-1)<<10)
MILM_CTL_ILM_RWECC (1<<3)
MILM_CTL_ILM_ECC_EXCP_EN (1<<2)
MILM_CTL_ILM_ECC_EN (1<<1)
MILM_CTL_ILM_EN (1<<0)
MDLM_CTL_DLM_BPA (((1ULL<<((__riscv_xlen)-10))-1)<<10)
MDLM_CTL_DLM_RWECC (1<<3)
MDLM_CTL_DLM_ECC_EXCP_EN (1<<2)
MDLM_CTL_DLM_ECC_EN (1<<1)
MDLM_CTL_DLM_EN (1<<0)
MSUBM_PTYP (0x3 << 8)
MSUBM_TYP (0x3 << 6)
MDCAUSE_MDCAUSE (0x3)
MMISC_CTL_NMI_CAUSE_FFF (1<<9)
MMISC_CTL_MISALIGN (1<<6)
MMISC_CTL_BPU (1<<3)
MCACHE_CTL_IC_EN (1<<0)
MCACHE_CTL_IC_SCPD_MOD (1<<1)
MCACHE_CTL_IC_ECC_EN (1<<2)
MCACHE_CTL_IC_ECC_EXCP_EN (1<<3)
MCACHE_CTL_IC_RWTECC (1<<4)
MCACHE_CTL_IC_RWDECC (1<<5)
MCACHE_CTL_DC_EN (1<<16)
MCACHE_CTL_DC_ECC_EN (1<<17)
MCACHE_CTL_DC_ECC_EXCP_EN (1<<18)
MCACHE_CTL_DC_RWTECC (1<<19)
MCACHE_CTL_DC_RWDECC (1<<20)
MTVT2_MTVT2EN (1<<0)
MTVT2_COMMON_CODE_ENTRY (((1ULL<<((__riscv_xlen)-2))-1)<<2)
MCFG_INFO_TEE (1<<0)
MCFG_INFO_ECC (1<<1)
MCFG_INFO_CLIC (1<<2)
MCFG_INFO_PLIC (1<<3)
MCFG_INFO_FIO (1<<4)
MCFG_INFO_PPI (1<<5)
MCFG_INFO_NICE (1<<6)
MCFG_INFO_ILM (1<<7)
MCFG_INFO_DLM (1<<8)
MCFG_INFO_ICACHE (1<<9)
MCFG_INFO_DCACHE (1<<10)
MICFG_IC_SET (0xF << 0)
MICFG_IC_WAY (0x7 << 4)
MICFG_IC_LSIZE (0x7 << 7)
MICFG_IC_ECC (0x1 << 10)
MICFG_ILM_SIZE (0x1F << 16)
MICFG_ILM_XONLY (0x1<<21)
MICFG_ILM_ECC (0x1<<22)
MDCFG_DC_SET (0xF << 0)
MDCFG_DC_WAY (0x7 << 4)
MDCFG_DC_LSIZE (0x7 << 7)
MDCFG_DC_ECC (0x1 << 10)
MDCFG_DLM_SIZE (0x1F << 16)
MDCFG_DLM_ECC (0x1 << 21)
MPPICFG_INFO_PPI_SIZE (0x1F<<1)
MPPICFG_INFO_PPI_BPA (((1ULL<<((__riscv_xlen)-10))-1)<<10)
MFIOCFG_INFO_FIO_SIZE (0x1F<<1)
MFIOCFG_INFO_FIO_BPA (((1ULL<<((__riscv_xlen)-10))-1)<<10)
MECC_LOCK_ECC_LOCK (0x1)
MECC_CODE_CODE (0x1FF)
MECC_CODE_RAMID (0x1F << 16)
MECC_CODE_SRAMID (0x1F << 24)
CCM_SUEN_SUEN (0x1 << 0)
CCM_DATA_DATA (0x7 << 0)
CCM_COMMAND_COMMAND (0x1F << 0)
SIP_SSIP MIP_SSIP (page 75)
SIP_STIP MIP_STIP (page 76)
PRV_U 0
PRV_S 1
PRV_H 2
PRV_M 3
VM_MBARE 0
VM_MBB 1
VM_MBBID 2
VM_SV32 8
VM_SV39 9
VM_SV48 10
IRQ_S_SOFT 1
IRQ_H_SOFT 2
IRQ_M_SOFT 3
IRQ_S_TIMER 5
IRQ_H_TIMER 6
IRQ_M_TIMER 7
IRQ_S_EXT 9
IRQ_H_EXT 10
IRQ_M_EXT 11
IRQ_COP 12
IRQ_HOST 13
```

## FRM\_RNDMODE\_RNE 0x0

FPU Round to Nearest, ties to Even.

## FRM\_RNDMODE\_RTZ 0x1

FPU Round Towards Zero.

## FRM\_RNDMODE\_RDN 0x2

FPU Round Down (towards -inf).

## FRM\_RNDMODE\_RUP 0x3

FPU Round Up (towards +inf).

## FRM\_RNDMODE\_RMM 0x4

FPU Round to Nearest, ties to Max Magnitude.

## FRM\_RNDMODE\_DYN 0x7

In an instruction's rm field, selects the dynamic rounding mode. In the Rounding Mode register, this value is invalid.

## FFLAGS\_AE\_NX (1<<0)

FPU Inexact.

## FFLAGS\_AE\_UF (1<<1)

FPU Underflow.

## FFLAGS\_AE\_OF (1<<2)

FPU Overflow.

## FFLAGS\_AE\_DZ (1<<3)

FPU Divide by Zero.

## FFLAGS\_AE\_NV (1<<4)

FPU Invalid Operation.
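The accrued-exception flags listed above are single-bit masks, so a value read from the fflags CSR can be tested flag by flag. The following is a minimal sketch, assuming the CSR value has already been read into a variable; the `DEMO_` macros and the helper `demo_fflags_has_error` are hypothetical names that only mirror the values listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag masks listed above (DEMO_ prefix is hypothetical). */
#define DEMO_FFLAGS_AE_NX (1u << 0) /* inexact */
#define DEMO_FFLAGS_AE_UF (1u << 1) /* underflow */
#define DEMO_FFLAGS_AE_OF (1u << 2) /* overflow */
#define DEMO_FFLAGS_AE_DZ (1u << 3) /* divide by zero */
#define DEMO_FFLAGS_AE_NV (1u << 4) /* invalid operation */

/* Return 1 if any flag other than "inexact" is set in an fflags value. */
static inline int demo_fflags_has_error(uint32_t fflags)
{
    const uint32_t serious = DEMO_FFLAGS_AE_UF | DEMO_FFLAGS_AE_OF |
                             DEMO_FFLAGS_AE_DZ | DEMO_FFLAGS_AE_NV;

    assert((fflags & ~0x1Fu) == 0u); /* only the five flag bits are expected */
    return (fflags & serious) != 0u;
}
```

Treating "inexact" as harmless is a design choice of this sketch, since most floating-point operations raise it routinely.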

## FREG(idx) f##idx

Floating-point register f0-f31, e.g. f0 -> FREG(0) (page 79).

**PMP\_R** 0x01

**PMP\_W** 0x02

**PMP\_X** 0x04

**PMP\_A** 0x18

**PMP\_A\_TOR** 0x08

**PMP\_A\_NA4** 0x10

**PMP\_A\_NAPOT** 0x18

**PMP\_L** 0x80

**PMP\_SHIFT** 2

**PMP\_COUNT** 16

**PTE\_V** 0x001

**PTE\_R** 0x002

**PTE\_W** 0x004

```
PTE_X 0x008
PTE_U 0x010
PTE_G 0x020
PTE_A 0x040
PTE_D 0x080
PTE_SOFT 0x300
PTE_PPN_SHIFT 10
PTE_TABLE (PTE) (((PTE) & (PTE_V (page 79) | PTE_R (page 79) | PTE_W (page 79) | PTE_X (page 80))) == PTE_V (page 79))
CAUSE_MISALIGNED_FETCH 0x0
    End of Doxygen Group NMSIS_Core_CSR_Registers.
CAUSE_FAULT_FETCH 0x1
CAUSE_ILLEGAL_INSTRUCTION 0x2
CAUSE_BREAKPOINT 0x3
CAUSE_MISALIGNED_LOAD 0x4
CAUSE_FAULT_LOAD 0x5
CAUSE_MISALIGNED_STORE 0x6
CAUSE_FAULT_STORE 0x7
CAUSE_USER_ECALL 0x8
CAUSE_SUPERVISOR_ECALL 0x9
CAUSE_HYPERVISOR_ECALL 0xa
CAUSE_MACHINE_ECALL 0xb
DCAUSE_FAULT_FETCH_PMP 0x1
DCAUSE_FAULT_FETCH_INST 0x2
DCAUSE_FAULT_LOAD_PMP 0x1
DCAUSE_FAULT_LOAD_INST 0x2
DCAUSE_FAULT_LOAD_NICE 0x3
DCAUSE_FAULT_STORE_PMP 0x1
DCAUSE_FAULT_STORE_INST 0x2
```
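The PMP and PTE defines in this group are bit masks meant to be OR-ed together or tested against a register value. The sketch below shows one way to compose a PMP configuration byte and to apply the PTE\_TABLE test; the `DEMO_` names and the helper function are hypothetical and copy only the values listed in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the masks listed above (DEMO_ prefix is hypothetical). */
#define DEMO_PMP_R       0x01u
#define DEMO_PMP_W       0x02u
#define DEMO_PMP_A       0x18u /* address-matching mode field */
#define DEMO_PMP_A_NAPOT 0x18u
#define DEMO_PMP_L       0x80u

#define DEMO_PTE_V 0x001u
#define DEMO_PTE_R 0x002u
#define DEMO_PTE_W 0x004u
#define DEMO_PTE_X 0x008u

/* Same shape as PTE_TABLE: the entry is valid and none of R/W/X is set. */
#define DEMO_PTE_TABLE(pte) \
    (((pte) & (DEMO_PTE_V | DEMO_PTE_R | DEMO_PTE_W | DEMO_PTE_X)) == DEMO_PTE_V)

/* The NAPOT mode value must fit inside the A field. */
static_assert((DEMO_PMP_A_NAPOT & ~DEMO_PMP_A) == 0u, "mode must fit in PMP_A");

/* Build one 8-bit PMP configuration field: locked, NAPOT, read/write. */
static inline uint8_t demo_pmp_cfg_rw_napot_locked(void)
{
    return (uint8_t)(DEMO_PMP_L | DEMO_PMP_A_NAPOT | DEMO_PMP_R | DEMO_PMP_W);
}
```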

# 2.5.5 Register Define and Type Definitions

#### group NMSIS\_Core\_Registers

Type definitions and defines for core registers.

## **Defines**

```
__RISCV_XLEN 32
```

Refers to the width of an integer register in bits (either 32 or 64).

## **Typedefs**

```
typedef uint32_t rv_csr_t
```

Type of Control and Status Register(CSR), depends on the XLEN defined in RISC-V.
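Since the CSR type follows XLEN, one way such a typedef could be selected at compile time is sketched below. This is an illustration only, not the header's actual code: `DEMO_XLEN` and `demo_csr_t` are hypothetical stand-ins for the register-width define and `rv_csr_t`.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEMO_XLEN
#define DEMO_XLEN 32 /* hypothetical stand-in for the XLEN define */
#endif

#if DEMO_XLEN == 32
typedef uint32_t demo_csr_t;
#elif DEMO_XLEN == 64
typedef uint64_t demo_csr_t;
#else
#error "DEMO_XLEN must be 32 or 64"
#endif

/* The CSR type is exactly as wide as an integer register. */
static_assert(sizeof(demo_csr_t) * 8 == DEMO_XLEN, "CSR width must match XLEN");
```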

#### Core

```
group NMSIS_Core_Base_Registers
```

Type definitions and defines for base core registers.

```
union CSR_MISA_Type
```

#include <core\_feature\_base.h> Union type to access MISA register.

## **Public Members**

```
rv_csr_t (page 81) a
    bit: 0 Atomic extension
rv_csr_t (page 81) b
    bit: 1 Tentatively reserved for Bit-Manipulation extension
rv_csr_t (page 81) c
    bit: 2 Compressed extension
rv_csr_t (page 81) d
    bit: 3 Double-precision floating-point extension
    Type used for csr data access.
rv_csr_t (page 81) e
    bit: 4 RV32E base ISA
rv_csr_t (page 81) f
    bit: 5 Single-precision floating-point extension
rv_csr_t (page 81) g
    bit: 6 Additional standard extensions present
rv_csr_t (page 81) h
    bit: 7 Hypervisor extension
rv_csr_t (page 81) i
    bit: 8 RV32I/64I/128I base ISA
rv_csr_t (page 81) j
    bit: 9 Tentatively reserved for Dynamically Translated Languages extension
rv_csr_t (page 81) _reserved1
    bit: 10 Reserved
rv_csr_t (page 81) l
    bit: 11 Tentatively reserved for Decimal Floating-Point extension
```

```
rv_csr_t (page 81) m
         bit: 12 Integer Multiply/Divide extension
     rv_csr_t (page 81) n
         bit: 13 User-level interrupts supported
     rv_csr_t (page 81) _reserved2
         bit: 14 Reserved
     rv_csr_t (page 81) p
         bit: 15 Tentatively reserved for Packed-SIMD extension
     rv_csr_t (page 81) q
         bit: 16 Quad-precision floating-point extension
     rv_csr_t (page 81) _resreved3
         bit: 17 Reserved
     rv_csr_t (page 81) s
         bit: 18 Supervisor mode implemented
     rv_csr_t (page 81) t
         bit: 19 Tentatively reserved for Transactional Memory extension
     rv_csr_t (page 81) u
         bit: 20 User mode implemented
     rv_csr_t (page 81) v
         bit: 21 Tentatively reserved for Vector extension
     rv_csr_t (page 81) _reserved4
         bit: 22 Reserved
     rv_csr_t (page 81) x
         bit: 23 Non-standard extensions present
     rv_csr_t (page 81) _reserved5
         bit: 24..29 Reserved
     rv_csr_t (page 81) mxl
         bit: 30..31 Machine XLEN
     struct CSR_MISA_Type (page 81)::[anonymous] b
         Structure used for bit access.
union CSR_MSTATUS_Type
     #include <core_feature_base.h> Union type to access MSTATUS configure register.
     Public Members
     rv_csr_t (page 81) _reserved0
         bit: 0 Reserved
     rv_csr_t (page 81) sie
         bit: 1 supervisor interrupt enable flag
     rv_csr_t (page 81) _reserved1
         bit: 2 Reserved
     rv_csr_t (page 81) mie
         bit: 3 Machine mode interrupt enable flag
```

```
rv_csr_t (page 81) _reserved2
         bit: 4 Reserved
     rv_csr_t (page 81) spie
         bit: 5 Supervisor Privilege mode interrupt enable flag
     rv_csr_t (page 81) _reserved3
         bit: 6 Reserved
     rv_csr_t (page 81) mpie
         bit: 7 mirror of MIE flag
     rv_csr_t (page 81) _reserved4
         bit: 8..10 Reserved
     rv_csr_t (page 81) mpp
         bit: 11..12 mirror of Privilege Mode
     rv_csr_t (page 81) fs
         bit: 13..14 FS status flag
     rv_csr_t (page 81) xs
         bit: 15..16 XS status flag
     rv_csr_t (page 81) mprv
         bit: 17 Machine mode PMP
     rv_csr_t (page 81) sum
         bit: 18 Supervisor Mode load and store protection
     rv_csr_t (page 81) _reserved6
         bit: 19..30 Reserved
     rv_csr_t (page 81) sd
         bit: 31 Dirty status for XS or FS
     struct CSR_MSTATUS_Type (page 82)::[anonymous] b
         Structure used for bit access.
     rv_csr_t (page 81) d
         Type used for csr data access.
union CSR_MTVEC_Type
     #include <core_feature_base.h> Union type to access MTVEC configure register.
     Public Members
     rv_csr_t (page 81) mode
         bit: 0..5 interrupt mode control
     rv_csr_t (page 81) addr
         bit: 6..31 mtvec address
     struct CSR_MTVEC_Type (page 83)::[anonymous] b
         Structure used for bit access.
     rv_csr_t (page 81) d
         Type used for csr data access.
union CSR_MCAUSE_Type
     #include <core_feature_base.h> Union type to access MCAUSE configure register.
```

#### **Public Members**

```
rv_csr_t (page 81) exccode
    bit: 11..0 exception or interrupt code
rv_csr_t (page 81) _reserved0
    bit: 15..12 Reserved
rv_csr_t (page 81) mpil
    bit: 23..16 Previous interrupt level
rv_csr_t (page 81) _reserved1
    bit: 26..24 Reserved
rv_csr_t (page 81) mpie
    bit: 27 Interrupt enable flag before entering interrupt
rv_csr_t (page 81) mpp
    bit: 29..28 Privilege mode flag before entering interrupt
rv_csr_t (page 81) minhv
    bit: 30 Machine interrupt vector table
rv_csr_t (page 81) interrupt
    bit: 31 trap type.
    0 means exception and 1 means interrupt
struct CSR_MCAUSE_Type (page 83)::[anonymous] b
    Structure used for bit access.
rv_csr_t (page 81) d
    Type used for csr data access.
```
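The MCAUSE bit layout above can also be decoded with plain masks and shifts, which avoids relying on compiler-specific bit-field ordering. The sketch below assumes the 32-bit layout listed above (exccode in bits 11..0, mpil in bits 23..16, interrupt in bit 31); all `DEMO_`/`demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MCAUSE_EXCCODE_Msk   0x00000FFFu
#define DEMO_MCAUSE_MPIL_Pos      16u
#define DEMO_MCAUSE_MPIL_Msk      (0xFFu << DEMO_MCAUSE_MPIL_Pos)
#define DEMO_MCAUSE_INTERRUPT_Pos 31u
#define DEMO_MCAUSE_INTERRUPT_Msk (0x1u << DEMO_MCAUSE_INTERRUPT_Pos)

static_assert((DEMO_MCAUSE_EXCCODE_Msk & DEMO_MCAUSE_MPIL_Msk) == 0u,
              "fields must not overlap");

typedef struct {
    uint32_t exccode;   /* exception or interrupt code */
    uint32_t mpil;      /* previous interrupt level */
    uint32_t interrupt; /* 1 = interrupt, 0 = exception */
} demo_mcause_fields_t;

/* Split a raw mcause value into the fields described above. */
static inline demo_mcause_fields_t demo_decode_mcause(uint32_t mcause)
{
    demo_mcause_fields_t f;

    f.exccode   = mcause & DEMO_MCAUSE_EXCCODE_Msk;
    f.mpil      = (mcause & DEMO_MCAUSE_MPIL_Msk) >> DEMO_MCAUSE_MPIL_Pos;
    f.interrupt = (mcause & DEMO_MCAUSE_INTERRUPT_Msk) >> DEMO_MCAUSE_INTERRUPT_Pos;
    return f;
}
```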

## union CSR\_MCOUNTINHIBIT\_Type

#include <core\_feature\_base.h> Union type to access MCOUNTINHIBIT configure register.

## **Public Members**

```
rv_csr_t (page 81) cy
         bit: 0 1 means disable mcycle counter
     rv_csr_t (page 81) _reserved0
         bit: 1 Reserved
     rv_csr_t (page 81) ir
         bit: 2 1 means disable minstret counter
     rv_csr_t (page 81) _reserved1
         bit: 3..31 Reserved
     struct CSR_MCOUNTINHIBIT_Type (page 84)::[anonymous] b
         Structure used for bit access.
     rv_csr_t (page 81) d
         Type used for csr data access.
```

## union CSR\_MSUBM\_Type

#include <core\_feature\_base.h> Union type to access msubm configure register.

## **Public Members**

```
rv_csr_t (page 81) _reserved0
    bit: 0..5 Reserved
rv_csr_t (page 81) typ
    bit: 6..7 current trap type
rv_csr_t (page 81) ptyp
    bit: 8..9 previous trap type
rv_csr_t (page 81) _reserved1
    bit: 10..31 Reserved
struct CSR_MSUBM_Type (page 84)::[anonymous] b
    Structure used for bit access.
rv_csr_t (page 81) d
    Type used for csr data access.
```

## union CSR\_MMISCCTRL\_Type

#include <core\_feature\_base.h> Union type to access MMISC\_CTRL configure register.

## **Public Members**

```
rv_csr_t (page 81) _reserved0
    bit: 0..2 Reserved
rv_csr_t (page 81) bpu
    bit: 3 dynamic prediction enable flag
rv_csr_t (page 81) _reserved1
    bit: 4..5 Reserved
rv_csr_t (page 81) misalign
    bit: 6 misaligned access support flag
rv_csr_t (page 81) _reserved2
    bit: 7..8 Reserved
rv_csr_t (page 81) nmi_cause
    bit: 9 mnvec control and nmi mcause exccode
rv_csr_t (page 81) _reserved3
    bit: 10..31 Reserved
struct CSR_MMISCCTRL_Type (page 85)::[anonymous] b
    Structure used for bit access.
rv_csr_t (page 81) d
    Type used for csr data access.
```

### union CSR\_MSAVESTATUS\_Type

#include <core\_feature\_base.h> Union type to access MSAVESTATUS configure register.

#### **Public Members**

```
rv_csr_t (page 81) mpie1
    bit: 0 interrupt enable flag of first level NMI/exception nesting
rv_csr_t (page 81) mpp1
    bit: 1..2 privilege mode of first level NMI/exception nesting
rv_csr_t (page 81) _reserved0
    bit: 3..5 Reserved
rv_csr_t (page 81) ptyp1
    bit: 6..7 NMI/exception type before first nesting
rv_csr_t (page 81) mpie2
    bit: 8 interrupt enable flag of second level NMI/exception nesting
rv_csr_t (page 81) mpp2
    bit: 9..10 privilege mode of second level NMI/exception nesting
rv_csr_t (page 81) _reserved1
    bit: 11..13 Reserved
rv_csr_t (page 81) ptyp2
    bit: 14..15 NMI/exception type before second nesting
rv_csr_t (page 81) _reserved2
    bit: 16..31 Reserved
struct CSR_MSAVESTATUS_Type (page 85)::[anonymous] b
    Structure used for bit access.
rv_csr_t (page 81) w
    Type used for csr data access.
```

# **ECLIC**

## group NMSIS\_Core\_ECLIC\_Registers

Type definitions and defines for eclic registers.

## **Defines**

```
CLIC_CLICCFG_NLBIT_Pos 1U
CLIC_CLICCFG: NLBIT Position.

CLIC_CLICCFG_NLBIT_Msk (0xFUL << CLIC_CLICCFG_NLBIT_Pos)
CLIC_CLICCFG: NLBIT Mask.

CLIC_CLICINFO_CTLBIT_Pos 21U
CLIC INTINFO: __ECLIC_GetInfoCtlbits() Position.

CLIC_CLICINFO_CTLBIT_Msk (0xFUL << CLIC_CLICINFO_CTLBIT_Pos)
CLIC INTINFO: __ECLIC_GetInfoCtlbits() Mask.

CLIC_CLICINFO_VER_Pos 13U
CLIC_CLICINFO: VERSION Position.

CLIC_CLICINFO_VER_Msk (0xFFUL << CLIC_CLICCFG_NLBIT_Pos)
CLIC_CLICINFO: VERSION Mask.
```

#### CLIC\_CLICINFO\_NUM\_Pos 0U

CLIC CLICINFO: NUM Position.

## CLIC\_CLICINFO\_NUM\_Msk (0xFFFUL << CLIC\_CLICINFO\_NUM\_Pos)

CLIC CLICINFO: NUM Mask.

## CLIC\_INTIP\_IP\_Pos 0U

CLIC INTIP: IP Position.

## CLIC\_INTIP\_IP\_Msk (0x1UL << CLIC\_INTIP\_IP\_Pos)

CLIC INTIP: IP Mask.

#### CLIC\_INTIE\_IE\_Pos 0U

CLIC INTIE: IE Position.

## CLIC\_INTIE\_IE\_Msk (0x1UL << CLIC\_INTIE\_IE\_Pos)

CLIC INTIE: IE Mask.

## CLIC\_INTATTR\_TRIG\_Pos 1U

CLIC INTATTR: TRIG Position.

### CLIC\_INTATTR\_TRIG\_Msk (0x3UL << CLIC\_INTATTR\_TRIG\_Pos)

CLIC INTATTR: TRIG Mask.

## CLIC\_INTATTR\_SHV\_Pos 0U

CLIC INTATTR: SHV Position.

### CLIC\_INTATTR\_SHV\_Msk (0x1UL << CLIC\_INTATTR\_SHV\_Pos)

CLIC INTATTR: SHV Mask.

#### ECLIC\_MAX\_NLBITS 8U

Max nlbit of the CLICINTCTLBITS.

## ECLIC\_MODE\_MTVEC\_Msk 3U

ECLIC Mode mask for MTVEC CSR Register.

## ECLIC\_NON\_VECTOR\_INTERRUPT 0x0

Non-Vector Interrupt Mode of ECLIC.

## ECLIC\_VECTOR\_INTERRUPT 0x1

Vector Interrupt Mode of ECLIC.

## ECLIC\_BASE \_\_ECLIC\_BASEADDR

ECLIC Base Address.

## **ECLIC** ((CLIC\_Type (page 88) \*) ECLIC\_BASE (page 87))

CLIC configuration struct.
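The `_Pos`/`_Msk` pairs in this group follow the usual "mask, then shift" convention for reading a field out of a register value. A minimal sketch of that pattern is shown below, using local copies of two of the defines above; the `DEMO_` prefix, the `DEMO_FIELD_GET` macro and the helper functions are hypothetical and are not part of the header.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of two Pos/Msk pairs listed above (DEMO_ prefix is hypothetical). */
#define DEMO_CLICINFO_CTLBIT_Pos 21U
#define DEMO_CLICINFO_CTLBIT_Msk (0xFUL << DEMO_CLICINFO_CTLBIT_Pos)
#define DEMO_CLICINFO_NUM_Pos    0U
#define DEMO_CLICINFO_NUM_Msk    (0xFFFUL << DEMO_CLICINFO_NUM_Pos)

static_assert((DEMO_CLICINFO_CTLBIT_Msk & DEMO_CLICINFO_NUM_Msk) == 0UL,
              "fields must not overlap");

/* Generic field read: mask first, then shift the field down to bit 0. */
#define DEMO_FIELD_GET(reg, name) \
    (((reg) & DEMO_##name##_Msk) >> DEMO_##name##_Pos)

/* Number of implemented control bits, taken from a raw CLICINFO value. */
static inline uint32_t demo_clicinfo_ctlbits(uint32_t clicinfo)
{
    return (uint32_t)DEMO_FIELD_GET(clicinfo, CLICINFO_CTLBIT);
}

/* Interrupt count field, taken from a raw CLICINFO value. */
static inline uint32_t demo_clicinfo_num(uint32_t clicinfo)
{
    return (uint32_t)DEMO_FIELD_GET(clicinfo, CLICINFO_NUM);
}
```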

#### **Enums**

# enum ECLIC\_TRIGGER\_Type

ECLIC Trigger Enum for different Trigger Type.

Values.

## enumerator ECLIC\_LEVEL\_TRIGGER

Level Triggered, trig[0] = 0.

## enumerator ECLIC\_POSTIVE\_EDGE\_TRIGGER

Positive/Rising Edge Triggered, trig[1] = 1, trig[0] = 0.

# enumerator ECLIC\_NEGTIVE\_EDGE\_TRIGGER

Negative/Falling Edge Triggered, trig[1] = 1, trig[0] = 1.

#### enumerator ECLIC\_MAX\_TRIGGER

MAX Supported Trigger Mode.

## union CLICCFG\_Type

#include <core\_feature\_eclic.h> Union type to access CLICCFG configure register.

## **Public Members**

## uint8\_t \_reserved0

bit: 0 Reserved

### uint8\_t nlbits

bit: 1..4 nlbits field (see CLIC\_CLICCFG\_NLBIT\_Pos and CLIC\_CLICCFG\_NLBIT\_Msk)

#### uint8\_t \_reserved1

Reserved (within bits 5..7)

#### uint8\_t \_reserved2

Reserved (within bits 5..7)

## struct CLICCFG\_Type (page 88)::[anonymous] b

Structure used for bit access.

### uint8\_t w

Type used for byte access.

### union CLICINFO\_Type

#include <core\_feature\_eclic.h> Union type to access CLICINFO information register.

#### **Public Members**

## uint32\_t numint

bit: 0..12 number of maximum interrupt inputs supported

#### uint32\_t version

bit: 13..20 20:17 for architecture version,16:13 for implementation version

## uint32\_t intctlbits

bit: 21..24 specifies how many hardware bits are actually implemented in the clicintctl registers

#### uint32\_t \_reserved0

bit: 25..31 Reserved

## struct CLICINFO\_Type (page 88)::[anonymous] b

Structure used for bit access.

### uint32\_t w

Type used for word access.

# struct CLIC\_CTRL\_Type

#include <core\_feature\_eclic.h> Access to the structure of a vector interrupt controller.

## struct CLIC\_Type

#include <core\_feature\_eclic.h>

## **SysTimer**

## group NMSIS\_Core\_SysTimer\_Registers

Type definitions and defines for system timer registers.

#### **Defines**

#### SysTimer\_MTIMECTL\_TIMESTOP\_Pos 0U

SysTick Timer MTIMECTL: TIMESTOP bit Position.

# SysTimer\_MTIMECTL\_TIMESTOP\_Msk (1UL << SysTimer\_MTIMECTL\_TIMESTOP\_Pos)

SysTick Timer MTIMECTL: TIMESTOP Mask.

## SysTimer\_MTIMECTL\_CMPCLREN\_Pos 1U

SysTick Timer MTIMECTL: CMPCLREN bit Position.

## SysTimer\_MTIMECTL\_CMPCLREN\_Msk (1UL << SysTimer\_MTIMECTL\_CMPCLREN\_Pos)

SysTick Timer MTIMECTL: CMPCLREN Mask.

## SysTimer\_MTIMECTL\_CLKSRC\_Pos 2U

SysTick Timer MTIMECTL: CLKSRC bit Position.

## SysTimer\_MTIMECTL\_CLKSRC\_Msk (1UL << SysTimer\_MTIMECTL\_CLKSRC\_Pos)

SysTick Timer MTIMECTL: CLKSRC Mask.

## SysTimer\_MSIP\_MSIP\_Pos 0U

SysTick Timer MSIP: MSIP bit Position.

### SysTimer\_MSIP\_MSIP\_Msk (1UL << SysTimer\_MSIP\_MSIP\_Pos)

SysTick Timer MSIP: MSIP Mask.

## SysTimer\_MTIMER\_Msk (0xFFFFFFFFFFFFFFFFULL)

SysTick Timer MTIMER value Mask.

## SysTimer\_MTIMERCMP\_Msk (0xFFFFFFFFFFFFFFFFULL)

SysTick Timer MTIMERCMP value Mask.

#### SysTimer\_MTIMECTL\_Msk (0xFFFFFFFFUL)

SysTick Timer MTIMECTL/MSTOP value Mask.

## SysTimer\_MSIP\_Msk (0xFFFFFFFFUL)

SysTick Timer MSIP value Mask.

### SysTimer\_MSFTRST\_Msk (0xFFFFFFFFUL)

SysTick Timer MSFTRST value Mask.

### SysTimer\_MSFRST\_KEY (0x80000A5FUL)

SysTick Timer Software Reset Request Key.

## SysTimer\_BASE \_\_SYSTIMER\_BASEADDR

SysTick Base Address.

## SysTimer ((SysTimer\_Type (page 89) \*) SysTimer\_BASE (page 89))

SysTick configuration struct.
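The MTIMECTL bit masks above are typically combined with a read-modify-write of the register value. The sketch below works on a plain variable rather than the memory-mapped register, and it assumes that setting a bit enables the behaviour its name suggests; the `DEMO_`/`demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the MTIMECTL bit defines listed above. */
#define DEMO_MTIMECTL_TIMESTOP_Pos 0U
#define DEMO_MTIMECTL_TIMESTOP_Msk (1UL << DEMO_MTIMECTL_TIMESTOP_Pos)
#define DEMO_MTIMECTL_CMPCLREN_Pos 1U
#define DEMO_MTIMECTL_CMPCLREN_Msk (1UL << DEMO_MTIMECTL_CMPCLREN_Pos)
#define DEMO_MTIMECTL_CLKSRC_Pos   2U
#define DEMO_MTIMECTL_CLKSRC_Msk   (1UL << DEMO_MTIMECTL_CLKSRC_Pos)

static_assert((DEMO_MTIMECTL_TIMESTOP_Msk & DEMO_MTIMECTL_CMPCLREN_Msk) == 0UL,
              "control bits must be distinct");

/* Return ctl with the bits in mask set; all other bits are preserved. */
static inline uint32_t demo_mtimectl_set(uint32_t ctl, uint32_t mask)
{
    return ctl | mask;
}

/* Return ctl with the bits in mask cleared; all other bits are preserved. */
static inline uint32_t demo_mtimectl_clear(uint32_t ctl, uint32_t mask)
{
    return ctl & ~mask;
}
```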

#### struct SysTimer\_Type

#include <core\_feature\_timer.h> Structure type to access the System Timer (SysTimer).

Structure definition to access the system timer(SysTimer).

#### Remark

- The MSFTRST register is introduced in Nuclei N Core version 1.3 (\_\_NUCLEI\_N\_REV (page 60) >= 0x0103).
- The MSTOP register is renamed to MTIMECTL in Nuclei N Core version 1.4 (\_\_NUCLEI\_N\_REV (page 60) >= 0x0104).
- The CMPCLREN and CLKSRC bits in the MTIMECTL register are introduced in Nuclei N Core version 1.4 (\_\_NUCLEI\_N\_REV (page 60) >= 0x0104).

## 2.5.6 CPU Intrinsic Functions

```
enum WFI_SleepMode_Type
    Values:
    enumerator WFI_SHALLOW_SLEEP
    enumerator WFI_DEEP_SLEEP
__STATIC_FORCEINLINE void __NOP (void)
__STATIC_FORCEINLINE void __WFI (void)
__STATIC_FORCEINLINE void __WFE (void)
__STATIC_FORCEINLINE void __EBREAK (void)
__STATIC_FORCEINLINE void __ECALL (void)
__STATIC_FORCEINLINE void __set_wfi_sleepmode (WFI_SleepMode_Type mode)
__STATIC_FORCEINLINE void __TXEVT (void)
__STATIC_FORCEINLINE void __enable_mcycle_counter (void)
__STATIC_FORCEINLINE void __disable_mcycle_counter (void)
__STATIC_FORCEINLINE void __enable_minstret_counter (void)
__STATIC_FORCEINLINE void __disable_minstret_counter (void)
__STATIC_FORCEINLINE void __enable_all_counter (void)
__STATIC_FORCEINLINE void __disable_all_counter (void)
__STATIC_FORCEINLINE void __FENCE_I (void)
__STATIC_FORCEINLINE uint8_t __LB (volatile void *addr)
__STATIC_FORCEINLINE uint16_t __LH (volatile void *addr)
__STATIC_FORCEINLINE uint32_t __LW (volatile void *addr)
__STATIC_FORCEINLINE void __SB (volatile void *addr, uint8_t val)
__STATIC_FORCEINLINE void __SH (volatile void *addr, uint16_t val)
__STATIC_FORCEINLINE void __SW (volatile void *addr, uint32_t val)
__STATIC_FORCEINLINE uint32_t __CAS_W (volatile uint32_t *addr, uint32_t oldval, uint32_t :
__STATIC_FORCEINLINE uint32_t __AMOSWAP_W (volatile uint32_t *addr, uint32_t newval)
__STATIC_FORCEINLINE int32_t __AMOADD_W (volatile int32_t *addr, int32_t value)
__STATIC_FORCEINLINE int32_t __AMOAND_W (volatile int32_t *addr, int32_t value)
__STATIC_FORCEINLINE int32_t __AMOOR_W (volatile int32_t *addr, int32_t value)
__STATIC_FORCEINLINE int32_t __AMOXOR_W (volatile int32_t *addr, int32_t value)
__STATIC_FORCEINLINE uint32_t __AMOMAXU_W (volatile uint32_t *addr, uint32_t value)
__STATIC_FORCEINLINE int32_t __AMOMAX_W (volatile int32_t *addr, int32_t value)
__STATIC_FORCEINLINE uint32_t __AMOMINU_W (volatile uint32_t *addr, uint32_t value)
__STATIC_FORCEINLINE int32_t __AMOMIN_W (volatile int32_t *addr, int32_t value)
__FENCE (p, s) __ASM (page 61) volatile ("fence " #p "," #s : : : "memory")
__RWMB() __FENCE(iorw,iorw)
__RMB() __FENCE(ir,ir)
__WMB() __FENCE(ow,ow)
__SMP_RWMB() __FENCE(rw,rw)
__SMP_RMB() __FENCE(r,r)
__SMP_WMB() __FENCE(w,w)
__CPU_RELAX() __ASM (page 61) volatile ("" : : : "memory")
group NMSIS_Core_CPU_Intrinsic
     Functions that generate RISC-V CPU instructions.
     The following functions generate specified RISC-V instructions that cannot be directly accessed by compiler.
     Defines
     __FENCE (p, s) __ASM (page 61) volatile ("fence " #p "," #s : : : "memory")
         Execute fence instruction, p -> pred, s -> succ.
         The FENCE instruction ensures that all memory accesses from instructions preceding the fence in program
         order (the predecessor set) appear earlier in the global memory order than memory accesses from
         instructions appearing after the fence in program order (the successor set). For details, please refer
         to The RISC-V Instruction Set Manual.
             Parameters
```

- p predecessor set, such as iorw, rw, r, w
- s successor set, such as iorw, rw, r, w

```
__RWMB() __FENCE(iorw,iorw)
    Read & Write Memory barrier.
__RMB() __FENCE(ir,ir)
    Read Memory barrier.
__WMB() __FENCE(ow,ow)
    Write Memory barrier.
__SMP_RWMB() __FENCE(rw,rw)
    SMP Read & Write Memory barrier.
__SMP_RMB() __FENCE(r,r)
    SMP Read Memory barrier.
__SMP_WMB() __FENCE(w,w)
    SMP Write Memory barrier.
__CPU_RELAX() __ASM (page 61) volatile ("" : : : "memory")
    CPU relax for busy loop.
```
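A common use of the SMP write and read barriers is a flag-guarded handoff: the producer orders its data store before the flag store, and the consumer orders its flag load before the data load. The sketch below illustrates that pairing. `DEMO_SMP_WMB` and `DEMO_SMP_RMB` are hypothetical stand-ins: on RISC-V they emit the `fence w,w` and `fence r,r` instructions named above, and on other hosts they fall back to a compiler-provided full fence (an assumption made only so the sketch builds anywhere).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SMP barriers described above. */
#if defined(__riscv)
#define DEMO_SMP_WMB() __asm__ volatile ("fence w,w" ::: "memory")
#define DEMO_SMP_RMB() __asm__ volatile ("fence r,r" ::: "memory")
#else
#define DEMO_SMP_WMB() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define DEMO_SMP_RMB() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

static volatile uint32_t demo_payload; /* data being handed over */
static volatile uint32_t demo_ready;   /* 1 once demo_payload is valid */

/* Producer: write the data first, then publish the flag. */
static inline void demo_publish(uint32_t value)
{
    demo_payload = value;
    DEMO_SMP_WMB(); /* payload store is ordered before the flag store */
    demo_ready = 1u;
}

/* Consumer: returns 1 and fills *out once the flag has been observed. */
static inline int demo_try_consume(uint32_t *out)
{
    assert(out != 0);
    if (demo_ready == 0u) {
        return 0;
    }
    DEMO_SMP_RMB(); /* flag load is ordered before the payload load */
    *out = demo_payload;
    return 1;
}
```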

#### **Enums**

## enum WFI\_SleepMode\_Type

WFI Sleep Mode enumeration.

Values:

#### enumerator WFI\_SHALLOW\_SLEEP

Shallow sleep mode, the core\_clk will be powered off.

## enumerator WFI\_DEEP\_SLEEP

Deep sleep mode, the core\_clk and core\_ano\_clk will be powered off.

#### **Functions**

```
__STATIC_FORCEINLINE void __NOP (void)
```

NOP Instruction.

No Operation does nothing. This instruction can be used for code alignment purposes.

```
__STATIC_FORCEINLINE void __WFI (void)
```

Wait For Interrupt.

Wait For Interrupt is executed using CSR\_WFE.WFE=0 and the WFI instruction. It suspends execution until an interrupt, NMI or Debug event happens. When the Core is woken up by an interrupt:

- a. if mstatus.MIE == 1 (interrupt enabled), the Core will enter the ISR code
- b. if mstatus.MIE == 0 (interrupt disabled), the Core will resume previous execution

```
__STATIC_FORCEINLINE void __WFE (void)
```

Wait For Event.

Wait For Event is executed using CSR\_WFE.WFE=1 and the WFI instruction. It suspends execution until an event, NMI or Debug event happens. When the Core is woken up, it resumes previous execution.

```
__STATIC_FORCEINLINE void __EBREAK (void)
```

Breakpoint Instruction.

Causes the processor to enter Debug state. Debug tools can use this to investigate system state when the instruction at a particular address is reached.

```
__STATIC_FORCEINLINE void __ECALL (void)
```

Environment Call Instruction.

The ECALL instruction is used to make a service request to the execution environment.

\_\_STATIC\_FORCEINLINE void \_\_set\_wfi\_sleepmode (WFI\_SleepMode\_Type mode)

Set Sleep mode of WFI.

Set the SLEEPVALUE CSR register to control the WFI Sleep mode.

**Parameters**

- mode [in] The sleep mode to be set

```
__STATIC_FORCEINLINE void __TXEVT (void)
```

Send TX Event.

Write the TXEVT CSR to send a TX Event. The Core will output the tx\_evt signal as the output event signal.

```
__STATIC_FORCEINLINE void __enable_mcycle_counter (void)
    Enable MCYCLE counter.
    Clear the CY bit of MCOUNTINHIBIT to 0 to enable MCYCLE Counter
__STATIC_FORCEINLINE void __disable_mcycle_counter (void)
    Disable MCYCLE counter.
    Set the CY bit of MCOUNTINHIBIT to 1 to disable MCYCLE Counter.
__STATIC_FORCEINLINE void __enable_minstret_counter (void)
    Enable MINSTRET counter.
    Clear the IR bit of MCOUNTINHIBIT to 0 to enable MINSTRET Counter
__STATIC_FORCEINLINE void __disable_minstret_counter (void)
    Disable MINSTRET counter.
    Set the IR bit of MCOUNTINHIBIT to 1 to disable MINSTRET Counter
__STATIC_FORCEINLINE void __enable_all_counter (void)
    Enable MCYCLE & MINSTRET counter.
    Clear the IR and CY bit of MCOUNTINHIBIT to 0 to enable MINSTRET & MCYCLE Counter
__STATIC_FORCEINLINE void __disable_all_counter (void)
    Disable MCYCLE & MINSTRET counter.
    Set the IR and CY bit of MCOUNTINHIBIT to 1 to disable MINSTRET & MCYCLE Counter
__STATIC_FORCEINLINE void __FENCE_I (void)
    Fence.i Instruction.
    The FENCE.I instruction is used to synchronize the instruction and data streams.
__STATIC_FORCEINLINE uint8_t __LB (volatile void *addr)
    Load 8bit value from address (8 bit)
    Load 8 bit value.
        Parameters addr - [in] Address pointer to data
        Returns value of type uint8_t at (*addr)
__STATIC_FORCEINLINE uint16_t __LH (volatile void *addr)
    Load 16bit value from address (16 bit)
    Load 16 bit value.
        Parameters addr - [in] Address pointer to data
        Returns value of type uint16_t at (*addr)
__STATIC_FORCEINLINE uint32_t __LW (volatile void *addr)
    Load 32bit value from address (32 bit)
    Load 32 bit value.
        Parameters addr - [in] Address pointer to data
        Returns value of type uint32_t at (*addr)
__STATIC_FORCEINLINE void __SB (volatile void *addr, uint8_t val)
    Write 8bit value to address (8 bit)
    Write 8 bit value.
```

**Parameters** 

- addr [in] Address pointer to data
- val [in] Value to set

## \_\_STATIC\_FORCEINLINE void \_\_SH (volatile void \*addr, uint16\_t val)

Write 16bit value to address (16 bit)

Write 16 bit value.

#### **Parameters**

- addr [in] Address pointer to data
- val [in] Value to set

### \_\_STATIC\_FORCEINLINE void \_\_SW (volatile void \*addr, uint32\_t val)

Write 32bit value to address (32 bit)

Write 32 bit value.

#### **Parameters**

- addr [in] Address pointer to data
- val [in] Value to set

\_\_STATIC\_FORCEINLINE uint32\_t \_\_CAS\_W (volatile uint32\_t \*addr, uint32\_t oldval, uint32\_t newval)

Compare and Swap 32bit value using LR and SC.

Compare old value with memory; if identical, store new value in memory. Returns the initial value in memory. Success is indicated by the return value being equal to oldval.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- oldval [in] Old value of the data in address
- newval [in] New value to be stored into the address

**Returns** return the initial value in memory
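Because the function returns the initial value in memory, a caller detects success by comparing the return value with `oldval`. The sketch below models that contract on a host machine and builds a try-lock on top of it. It is not the NMSIS implementation: `demo_cas_w` and `demo_try_lock` are hypothetical names, and the GCC/Clang `__atomic_compare_exchange_n` builtin stands in for the LR/SC sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the documented contract: returns the value found in
 * memory; the store of newval happened if and only if that value equals oldval. */
static inline uint32_t demo_cas_w(volatile uint32_t *addr, uint32_t oldval, uint32_t newval)
{
    uint32_t expected = oldval;

    /* On failure the builtin writes the current memory value into expected. */
    (void)__atomic_compare_exchange_n(addr, &expected, newval, 0,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Try to take a lock word (0 = free, 1 = held). Returns 1 on success. */
static inline int demo_try_lock(volatile uint32_t *lock)
{
    assert(((uintptr_t)lock & 0x3u) == 0u); /* address must be 4-byte aligned */
    return demo_cas_w(lock, 0u, 1u) == 0u;
}
```

A caller that needs a blocking lock would retry `demo_try_lock` in a loop; the retry policy is left out of this sketch.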

\_\_STATIC\_FORCEINLINE uint32\_t \_\_AMOSWAP\_W (volatile uint32\_t \*addr, uint32\_t newval)

Atomic Swap 32bit value into memory.

Atomically swap new 32bit value into memory using amoswap.w.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- **newval** [in] New value to be stored into the address

**Returns** return the original value in memory

\_\_STATIC\_FORCEINLINE int32\_t \_\_AMOADD\_W (volatile int32\_t \*addr, int32\_t value)

Atomic Add with 32bit value.

Atomically ADD 32bit value with value in memory using amoadd.w.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be ADDed

**Returns** memory value + add value

\_\_STATIC\_FORCEINLINE int32\_t \_\_AMOAND\_W (volatile int32\_t \*addr, int32\_t value)

Atomic And with 32bit value.

Atomically AND 32bit value with value in memory using amoand.w.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be ANDed

**Returns** memory value & value

\_\_STATIC\_FORCEINLINE int32\_t \_\_AMOOR\_W (volatile int32\_t \*addr, int32\_t value)

Atomic OR with 32bit value.

Atomically OR 32bit value with value in memory using amoor.w.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be ORed

**Returns** memory value | value

\_\_STATIC\_FORCEINLINE int32\_t \_\_AMOXOR\_W (volatile int32\_t \*addr, int32\_t value)

Atomic XOR with 32bit value.

Atomically XOR 32bit value with value in memory using amoxor.w.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be XORed

**Returns** memory value ^ value

\_\_STATIC\_FORCEINLINE uint32\_t \_\_AMOMAXU\_W (volatile uint32\_t \*addr, uint32\_t value)

Atomic unsigned MAX with 32bit value.

Atomically unsigned max compare 32bit value with value in memory using amomaxu.w.

### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be compared

**Returns** return the bigger value

\_\_STATIC\_FORCEINLINE int32\_t \_\_AMOMAX\_W (volatile int32\_t \*addr, int32\_t value)

Atomic signed MAX with 32bit value.

Atomically signed max compare 32bit value with value in memory using amomax.w.

### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be compared

**Returns** the bigger value

\_\_STATIC\_FORCEINLINE uint32\_t \_\_AMOMINU\_W (volatile uint32\_t \*addr, uint32\_t value)

Atomic unsigned MIN with 32bit value.

Atomically unsigned min compare 32bit value with value in memory using amominu.w.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be compared

**Returns** the smaller value

\_\_STATIC\_FORCEINLINE int32\_t \_\_AMOMIN\_W (volatile int32\_t \*addr, int32\_t value)

Atomic signed MIN with 32bit value.

Atomically signed min compare 32bit value with value in memory using amomin.w.

#### **Parameters**

- addr [in] Address pointer to data, address need to be 4byte aligned
- value [in] value to be compared

**Returns** the smaller value

## 2.5.7 Intrinsic Functions for SIMD Instructions

See Nuclei DSP Feature<sup>16</sup> to learn about Core DSP in the Nuclei ISA Spec.

## SIMD Data Processing Instructions

## SIMD 16-bit Add/Subtract Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_ADD16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_CRAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_CRSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_KADD16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_KCRAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_KCRSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_KSTAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_KSTSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_KSUB16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_RADD16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_RCRAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_RCRSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_RSTAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_RSTSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_RSUB16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_STAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_STSA16 (unsigned long a, unsigned long b)
```

<sup>16</sup> https://doc.nucleisys.com/nuclei\_spec/isa/dsp.html

```
__STATIC_FORCEINLINE unsigned long __RV_SUB16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UKADD16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UKCRAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UKCRSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UKSTAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UKSTSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UKSUB16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_URADD16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_URCRAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_URCRSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_URSTAS16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_URSTSA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_URSUB16 (unsigned long a, unsigned long b)
```

SIMD 16-bit Add/Subtract Instructions.

Based on the combination of the two 16-bit arithmetic operations, the SIMD 16-bit add/subtract instructions fall into 6 main categories: Addition (two 16-bit additions), Subtraction (two 16-bit subtractions), Crossed Add & Sub (one addition and one subtraction), Crossed Sub & Add (one subtraction and one addition), Straight Add & Sub (one addition and one subtraction), and Straight Sub & Add (one subtraction and one addition). Based on how an overflow condition is handled, they fall into 5 groups: Wrap-around (dropping overflow), Signed Halving (keeping overflow by dropping 1 LSB), Unsigned Halving, Signed Saturation (clipping overflow), and Unsigned Saturation. Together, there are 30 SIMD 16-bit add/subtract instructions.
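The five overflow-handling groups can be compared on a single 16-bit lane. The following is a minimal host-side sketch in portable C, not NMSIS code: the `lane_add_*` helpers and `lane_policy_demo` are hypothetical names introduced here, and the OV flag is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-lane helpers (not NMSIS APIs); the OV flag is not modelled. */

/* Wrap-around: keep only the low 16 bits of the sum. */
uint16_t lane_add_wrap(uint16_t a, uint16_t b) {
    return (uint16_t)((uint32_t)a + b);
}

/* Signed halving: form the exact sum, then drop 1 LSB (floor division by 2). */
int16_t lane_add_shalve(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + b;
    return (int16_t)((sum - (sum & 1)) / 2);
}

/* Unsigned halving: the same idea on unsigned operands. */
uint16_t lane_add_uhalve(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a + b) >> 1);
}

/* Signed saturation: clip to [-32768, 32767]. */
int16_t lane_add_ssat(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Unsigned saturation: clip to [0, 65535]. */
uint16_t lane_add_usat(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + b;
    return (sum > UINT16_MAX) ? (uint16_t)UINT16_MAX : (uint16_t)sum;
}

/* One overflowing operand pair per policy. */
void lane_policy_demo(void) {
    assert(lane_add_wrap(0x7FFF, 0x0001) == 0x8000);
    assert(lane_add_shalve(0x7FFF, 0x0001) == 0x4000);
    assert(lane_add_uhalve(0x7FFF, 0x0001) == 0x4000);
    assert(lane_add_ssat(0x7FFF, 0x0001) == INT16_MAX);
    assert(lane_add_usat(0xFFFF, 0x0001) == UINT16_MAX);
}
```

Each of the 6 operation categories combines with each of these 5 policies, which gives the 30 instructions counted above.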

## **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_ADD16 (unsigned long a, unsigned long b)
ADD16 (SIMD 16-bit Addition)
```

Type: SIMD

### Syntax:

```
ADD16 Rd, Rs1, Rs2
```

## **Purpose**:

Do 16-bit integer element additions simultaneously.

## **Description**:

This instruction adds the 16-bit integer elements in Rs1 with the 16-bit integer elements in Rs2, and then writes the 16-bit element results to Rd.

#### Note:

This instruction can be used for either signed or unsigned addition.

## **Operations:**

```
Rd.H[x] = Rs1.H[x] + Rs2.H[x];

for RV32: x=1...0,

for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type
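The Operations pseudocode above can be modelled on a host in portable C. This is a sketch for one RV32 register (two lanes), not the NMSIS implementation; `ref_add16` and `ref_add16_demo` are hypothetical names. It shows that a carry out of the lower halfword does not reach the upper halfword, unlike a plain 32-bit addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of ADD16 for one RV32 register (not an NMSIS API). */
uint32_t ref_add16(uint32_t a, uint32_t b) {
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {                 /* RV32: lanes H[1] and H[0] */
        uint32_t ah = (a >> (16 * x)) & 0xFFFFu;
        uint32_t bh = (b >> (16 * x)) & 0xFFFFu;
        uint32_t sum = (ah + bh) & 0xFFFFu;       /* wrap-around: carry is dropped */
        rd |= sum << (16 * x);
    }
    return rd;
}

/* The low lane overflows (0xFFFF + 1) without disturbing the high lane. */
void ref_add16_demo(void) {
    assert(ref_add16(0x0001FFFFu, 0x00010001u) == 0x00020000u);
    assert(0x0001FFFFu + 0x00010001u == 0x00030000u);  /* plain 32-bit add differs */
}
```

On a Nuclei core with the DSP extension, `__RV_ADD16(a, b)` would be expected to return the same bit pattern for these RV32 inputs, so a model like this can serve as a host-side test oracle.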

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_CRAS16 (unsigned long a, unsigned long b)

CRAS16 (SIMD 16-bit Cross Addition & Subtraction)

Type: SIMD

## Syntax:

```
CRAS16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit integer element addition and 16-bit integer element subtraction in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks.

### **Description**:

This instruction adds the 16-bit integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit integer element in [15:0] of 32-bit chunks in Rs2, and writes the result to [31:16] of 32-bit chunks in Rd; at the same time, it subtracts the 16-bit integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit integer element in [15:0] of 32-bit chunks in Rs1, and writes the result to [15:0] of 32-bit chunks in Rd.

### Note:

This instruction can be used for either signed or unsigned operations.

### **Operations:**

```
Rd.W[x][31:16] = Rs1.W[x][31:16] + Rs2.W[x][15:0];
Rd.W[x][15:0] = Rs1.W[x][15:0] - Rs2.W[x][31:16];
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
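The crossed operand selection is easy to misread, so here is a small host-side sketch of one 32-bit chunk in portable C. It is an illustration only; `ref_cras16` and `ref_cras16_demo` are hypothetical names and not part of NMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of CRAS16 for one 32-bit chunk (not an NMSIS API). */
uint32_t ref_cras16(uint32_t a, uint32_t b) {
    uint32_t a_hi = a >> 16, a_lo = a & 0xFFFFu;
    uint32_t b_hi = b >> 16, b_lo = b & 0xFFFFu;
    uint32_t hi = (a_hi + b_lo) & 0xFFFFu;   /* Rd[31:16] = Rs1[31:16] + Rs2[15:0]  */
    uint32_t lo = (a_lo - b_hi) & 0xFFFFu;   /* Rd[15:0]  = Rs1[15:0]  - Rs2[31:16] */
    return (hi << 16) | lo;
}

/* a = {hi 3, lo 5}, b = {hi 2, lo 1}: hi = 3 + 1, lo = 5 - 2. */
void ref_cras16_demo(void) {
    assert(ref_cras16(0x00030005u, 0x00020001u) == 0x00040003u);
}
```

Both halves wrap around independently, which is why the same model serves signed and unsigned data.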

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_CRSA16 (unsigned long a, unsigned long b)

CRSA16 (SIMD 16-bit Cross Subtraction & Addition)

Type: SIMD

## Syntax:

```
CRSA16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit integer element subtraction and 16-bit integer element addition in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks.

### **Description**:

This instruction subtracts the 16-bit integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit integer element in [31:16] of 32-bit chunks in Rs1, and writes the result to [31:16] of 32-bit chunks in Rd; at the same time, it adds the 16-bit integer element in [31:16] of 32-bit chunks in Rs2 with the 16-bit integer element in [15:0] of 32-bit chunks in Rs1, and writes the result to [15:0] of 32-bit chunks in Rd.

#### Note:

This instruction can be used for either signed or unsigned operations.

### **Operations:**

```
Rd.W[x][31:16] = Rs1.W[x][31:16] - Rs2.W[x][15:0];
Rd.W[x][15:0] = Rs1.W[x][15:0] + Rs2.W[x][31:16];
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KADD16 (unsigned long a, unsigned long b)

KADD16 (SIMD 16-bit Signed Saturating Addition)

Type: SIMD

#### Syntax:

```
KADD16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit signed integer element saturating additions simultaneously.

## **Description**:

This instruction adds the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2. If any of the results are beyond the Q15 number range ($-2^{15} \le \text{Q15} \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

## **Operations:**

```
res[x] = Rs1.H[x] + Rs2.H[x];
if (res[x] > 32767) {
   res[x] = 32767;
   OV = 1;
} else if (res[x] < -32768) {
   res[x] = -32768;
   OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
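A host-side sketch of the saturating behaviour for one RV32 register follows. It is an illustration under stated assumptions, not the NMSIS implementation: `ref_kadd16`, `sext16` and `ref_kadd16_demo` are hypothetical names, and the `ov` pointer merely stands in for the OV bit (the model sets it and never clears it).

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Hypothetical reference model of KADD16; *ov stands in for the OV bit. */
uint32_t ref_kadd16(uint32_t a, uint32_t b, int *ov) {
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {                 /* RV32: lanes H[1] and H[0] */
        int32_t res = sext16(a >> (16 * x)) + sext16(b >> (16 * x));
        if (res > 32767)  { res = 32767;  *ov = 1; }
        if (res < -32768) { res = -32768; *ov = 1; }
        rd |= ((uint32_t)res & 0xFFFFu) << (16 * x);
    }
    return rd;
}

/* High lane clips at +32767, low lane clips at -32768. */
void ref_kadd16_demo(void) {
    int ov = 0;
    assert(ref_kadd16(0x7FFF8000u, 0x0001FFFFu, &ov) == 0x7FFF8000u);
    assert(ov == 1);
}
```

Compared with wrap-around addition, the clipped lanes keep their sign, which is usually what fixed-point (Q15) signal code wants.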

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KCRAS16 (unsigned long a, unsigned long b)

KCRAS16 (SIMD 16-bit Signed Saturating Cross Addition & Subtraction)

Type: SIMD

#### Syntax:

```
KCRAS16 Rd, Rs1, Rs2
```

### Purpose:

Do 16-bit signed integer element saturating addition and 16-bit signed integer element saturating subtraction in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks.

## **Description:**

This instruction adds the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2; at the same time, it subtracts the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the Q15 number range ($-2^{15} \le \text{Q15} \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for addition and [15:0] of 32-bit chunks in Rd for subtraction.

### **Operations:**

```
res1 = Rs1.W[x][31:16] + Rs2.W[x][15:0];
res2 = Rs1.W[x][15:0] - Rs2.W[x][31:16];
for (res in [res1, res2]) {
   if (res > (2^15)-1) {
      res = (2^15)-1;
      OV = 1;
   } else if (res < -2^15) {
      res = -2^15;
      OV = 1;
   }
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KCRSA16 (unsigned long a, unsigned long b)

KCRSA16 (SIMD 16-bit Signed Saturating Cross Subtraction & Addition)

Type: SIMD

#### Syntax:

```
KCRSA16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit signed integer element saturating subtraction and 16-bit signed integer element saturating addition in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks.

## **Description**:

This instruction subtracts the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1; at the same time, it adds the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2 with the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the Q15 number range ($-2^{15} \le \text{Q15} \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for subtraction and [15:0] of 32-bit chunks in Rd for addition.

## **Operations:**

```
res1 = Rs1.W[x][31:16] - Rs2.W[x][15:0];
res2 = Rs1.W[x][15:0] + Rs2.W[x][31:16];
for (res in [res1, res2]) {
   if (res > (2^15)-1) {
      res = (2^15)-1;
      OV = 1;
   } else if (res < -2^15) {
      res = -2^15;
      OV = 1;
   }
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSTAS16 (unsigned long a, unsigned long b)

KSTAS16 (SIMD 16-bit Signed Saturating Straight Addition & Subtraction)

Type: SIMD

#### Syntax:

```
KSTAS16 Rd, Rs1, Rs2
```

### Purpose:

Do 16-bit signed integer element saturating addition and 16-bit signed integer element saturating subtraction in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks.

#### **Description**:

This instruction adds the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2; at the same time, it subtracts the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the Q15 number range ($-2^{15} \le \text{Q15} \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for addition and [15:0] of 32-bit chunks in Rd for subtraction.

### **Operations:**

```
res1 = Rs1.W[x][31:16] + Rs2.W[x][31:16];
res2 = Rs1.W[x][15:0] - Rs2.W[x][15:0];
for (res in [res1, res2]) {
   if (res > (2^15)-1) {
      res = (2^15)-1;
      OV = 1;
   } else if (res < -2^15) {
      res = -2^15;
      OV = 1;
   }
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
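The straight (non-crossed) saturating form can be sketched the same way for one 32-bit chunk. This is a host-side illustration, not NMSIS code: `ref_kstas16`, `sext16`, `sat_q15` and `ref_kstas16_demo` are hypothetical names, and `ov` stands in for the OV bit.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Clip res to the Q15 range and return it as a 16-bit pattern. */
static uint32_t sat_q15(int32_t res, int *ov) {
    if (res > 32767)  { res = 32767;  *ov = 1; }
    if (res < -32768) { res = -32768; *ov = 1; }
    return (uint32_t)res & 0xFFFFu;
}

/* Hypothetical reference model of KSTAS16 for one 32-bit chunk. */
uint32_t ref_kstas16(uint32_t a, uint32_t b, int *ov) {
    int32_t res1 = sext16(a >> 16) + sext16(b >> 16);  /* upper halves: add      */
    int32_t res2 = sext16(a) - sext16(b);              /* lower halves: subtract */
    uint32_t hi = sat_q15(res1, ov);
    uint32_t lo = sat_q15(res2, ov);
    return (hi << 16) | lo;
}

/* 32767 + 1 clips to 0x7FFF; -32768 - 1 clips to 0x8000. */
void ref_kstas16_demo(void) {
    int ov = 0;
    assert(ref_kstas16(0x7FFF8000u, 0x00010001u, &ov) == 0x7FFF8000u);
    assert(ov == 1);
}
```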

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSTSA16 (unsigned long a, unsigned long b)

KSTSA16 (SIMD 16-bit Signed Saturating Straight Subtraction & Addition)

Type: SIMD

## Syntax:

```
KSTSA16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit signed integer element saturating subtraction and 16-bit signed integer element saturating addition in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks.

## **Description**:

This instruction subtracts the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1; at the same time, it adds the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2 with the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the Q15 number range ($-2^{15} \le \text{Q15} \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for subtraction and [15:0] of 32-bit chunks in Rd for addition.

## **Operations**:

```
res1 = Rs1.W[x][31:16] - Rs2.W[x][31:16];
res2 = Rs1.W[x][15:0] + Rs2.W[x][15:0];
for (res in [res1, res2]) {
   if (res > (2^15)-1) {
      res = (2^15)-1;
      OV = 1;
   } else if (res < -2^15) {
      res = -2^15;
      OV = 1;
   }
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSUB16 (unsigned long a, unsigned long b)

KSUB16 (SIMD 16-bit Signed Saturating Subtraction)

Type: SIMD

## Syntax:

```
KSUB16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit signed integer element saturating subtractions simultaneously.

### **Description:**

This instruction subtracts the 16-bit signed integer elements in Rs2 from the 16-bit signed integer elements in Rs1. If any of the results are beyond the Q15 number range ($-2^{15} \le \text{Q15} \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

## **Operations:**

```
res[x] = Rs1.H[x] - Rs2.H[x];
if (res[x] > (2^15)-1) {
   res[x] = (2^15)-1;
   OV = 1;
} else if (res[x] < -2^15) {
   res[x] = -2^15;
   OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_RADD16 (unsigned long a, unsigned long b)
RADD16 (SIMD 16-bit Signed Halving Addition)

Type: SIMD

## Syntax:

```
RADD16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit signed integer element additions simultaneously. The results are halved to avoid overflow or saturation.

#### **Description**:

This instruction adds the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2. The results are first arithmetically right-shifted by 1 bit and then written to Rd.

#### **Examples:**

```
* Rs1 = 0x7FFF, Rs2 = 0x7FFF, Rd = 0x7FFF

* Rs1 = 0x8000, Rs2 = 0x8000, Rd = 0x8000

* Rs1 = 0x4000, Rs2 = 0x8000, Rd = 0xE000
```

### **Operations:**

```
Rd.H[x] = (Rs1.H[x] + Rs2.H[x]) s>> 1;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
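The three rows under Examples can be reproduced with a small host-side model. This is a sketch for one RV32 register in portable C, not the NMSIS implementation; `ref_radd16`, `sext16`, `halve` and `ref_radd16_demo` are hypothetical names. The halving is written as a floor division so that it matches an arithmetic right shift without relying on how a given compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* floor(v / 2): matches an arithmetic right shift by 1 bit. */
static int32_t halve(int32_t v) {
    return (v - (v & 1)) / 2;
}

/* Hypothetical reference model of RADD16 for one RV32 register (two lanes). */
uint32_t ref_radd16(uint32_t a, uint32_t b) {
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        int32_t sum = sext16(a >> (16 * x)) + sext16(b >> (16 * x));
        rd |= ((uint32_t)halve(sum) & 0xFFFFu) << (16 * x);
    }
    return rd;
}

/* The three example rows, packed two lanes at a time. */
void ref_radd16_demo(void) {
    assert(ref_radd16(0x7FFF8000u, 0x7FFF8000u) == 0x7FFF8000u);
    assert(ref_radd16(0x00004000u, 0x00008000u) == 0x0000E000u);
}
```

Because the 17-bit intermediate sum is halved before it is written back, no lane can overflow and no OV flag is involved.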

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_RCRAS16 (unsigned long a, unsigned long b)

RCRAS16 (SIMD 16-bit Signed Halving Cross Addition & Subtraction)

Type: SIMD

## Syntax:

```
RCRAS16 Rd, Rs1, Rs2
```

### Purpose:

Do 16-bit signed integer element addition and 16-bit signed integer element subtraction in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

## **Description**:

This instruction adds the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2, and subtracts the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1. The element results are first arithmetically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

## **Examples:**

```
Please see `RADD16` and `RSUB16` instructions.
```

## **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] + Rs2.W[x][15:0]) s>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] - Rs2.W[x][31:16]) s>> 1;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
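Since the Examples entry only points at `RADD16` and `RSUB16`, a worked value for the crossed form may help. The sketch below is a host-side model of one 32-bit chunk in portable C, not NMSIS code; `ref_rcras16`, `sext16`, `halve` and `ref_rcras16_demo` are hypothetical names. The operands are chosen so that the upper half reuses a `RADD16` example row and the lower half reuses a `RSUB16` example row.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* floor(v / 2): matches an arithmetic right shift by 1 bit. */
static int32_t halve(int32_t v) {
    return (v - (v & 1)) / 2;
}

/* Hypothetical reference model of RCRAS16 for one 32-bit chunk. */
uint32_t ref_rcras16(uint32_t a, uint32_t b) {
    int32_t hi = halve(sext16(a >> 16) + sext16(b));   /* Rs1[31:16] + Rs2[15:0]  */
    int32_t lo = halve(sext16(a) - sext16(b >> 16));   /* Rs1[15:0]  - Rs2[31:16] */
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}

/* Upper: (0x7FFF + 0x7FFF) s>> 1 = 0x7FFF; lower: (0x8000 - 0x4000) s>> 1 = 0xA000. */
void ref_rcras16_demo(void) {
    assert(ref_rcras16(0x7FFF8000u, 0x40007FFFu) == 0x7FFFA000u);
}
```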

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_RCRSA16 (unsigned long a, unsigned long b)

RCRSA16 (SIMD 16-bit Signed Halving Cross Subtraction & Addition)

Type: SIMD

## Syntax:

```
RCRSA16 Rd, Rs1, Rs2
```

### Purpose:

Do 16-bit signed integer element subtraction and 16-bit signed integer element addition in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

### **Description**:

This instruction subtracts the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1, and adds the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1 with the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2. The two results are first arithmetically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

## **Examples:**

```
Please see `RADD16` and `RSUB16` instructions.
```

## **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] - Rs2.W[x][15:0]) s>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] + Rs2.W[x][31:16]) s>> 1;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_RSTAS16 (unsigned long a, unsigned long b)

RSTAS16 (SIMD 16-bit Signed Halving Straight Addition & Subtraction)

Type: SIMD

#### Syntax:

```
RSTAS16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit signed integer element addition and 16-bit signed integer element subtraction in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

## **Description**:

This instruction adds the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2, and subtracts the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1. The element results are first arithmetically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

### **Examples:**

```
Please see `RADD16` and `RSUB16` instructions.
```

#### **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] + Rs2.W[x][31:16]) s>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] - Rs2.W[x][15:0]) s>> 1;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_RSTSA16 (unsigned long a, unsigned long b)
RSTSA16 (SIMD 16-bit Signed Halving Straight Subtraction & Addition)

Type: SIMD

## Syntax:

```
RSTSA16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit signed integer element subtraction and 16-bit signed integer element addition in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

## **Description**:

This instruction subtracts the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit signed integer element in [31:16] of 32-bit chunks in Rs1, and adds the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs1 with the 16-bit signed integer element in [15:0] of 32-bit chunks in Rs2. The two results are first arithmetically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

## **Examples:**

```
Please see `RADD16` and `RSUB16` instructions.
```

#### **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] - Rs2.W[x][31:16]) s>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] + Rs2.W[x][15:0]) s>> 1;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_RSUB16 (unsigned long a, unsigned long b)
RSUB16 (SIMD 16-bit Signed Halving Subtraction)

Type: SIMD

#### Syntax:

```
RSUB16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit signed integer element subtractions simultaneously. The results are halved to avoid overflow or saturation.

### **Description**:

This instruction subtracts the 16-bit signed integer elements in Rs2 from the 16-bit signed integer elements in Rs1. The results are first arithmetically right-shifted by 1 bit and then written to Rd.

### **Examples:**

```
* Rs1 = 0x7FFF, Rs2 = 0x8000, Rd = 0x7FFF

* Rs1 = 0x8000, Rs2 = 0x7FFF, Rd = 0x8000

* Rs1 = 0x8000, Rs2 = 0x4000, Rd = 0xA000
```

## **Operations:**

```
Rd.H[x] = (Rs1.H[x] - Rs2.H[x]) s>> 1;

for RV32: x=1...0,

for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_STAS16 (unsigned long a, unsigned long b)

STAS16 (SIMD 16-bit Straight Addition & Subtraction)

Type: SIMD

#### Syntax:

```
STAS16 Rd, Rs1, Rs2
```

#### **Purpose**:

Do 16-bit integer element addition and 16-bit integer element subtraction in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks.

#### **Description:**

This instruction adds the 16-bit integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit integer element in [31:16] of 32-bit chunks in Rs2, and writes the result to [31:16] of 32-bit chunks in Rd; at the same time, it subtracts the 16-bit integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit integer element in [15:0] of 32-bit chunks in Rs1, and writes the result to [15:0] of 32-bit chunks in Rd.

#### Note:

This instruction can be used for either signed or unsigned operations.

## **Operations:**

```
Rd.W[x][31:16] = Rs1.W[x][31:16] + Rs2.W[x][31:16];
Rd.W[x][15:0] = Rs1.W[x][15:0] - Rs2.W[x][15:0];
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_STSA16 (unsigned long a, unsigned long b)

STSA16 (SIMD 16-bit Straight Subtraction & Addition)

Type: SIMD

#### Syntax:

```
STSA16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit integer element subtraction and 16-bit integer element addition in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks.

## **Description:**

This instruction subtracts the 16-bit integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit integer element in [31:16] of 32-bit chunks in Rs1, and writes the result to [31:16] of 32-bit chunks in Rd; at the same time, it adds the 16-bit integer element in [15:0] of 32-bit chunks in Rs2 with the 16-bit integer element in [15:0] of 32-bit chunks in Rs1, and writes the result to [15:0] of 32-bit chunks in Rd.

#### Note:

This instruction can be used for either signed or unsigned operations.

## **Operations:**

```
Rd.W[x][31:16] = Rs1.W[x][31:16] - Rs2.W[x][31:16];
Rd.W[x][15:0] = Rs1.W[x][15:0] + Rs2.W[x][15:0];
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
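`STAS16` and `STSA16` are mirror images of each other, which a side-by-side host model makes visible. The following is a sketch of one 32-bit chunk in portable C, not the NMSIS implementation; `ref_stas16`, `ref_stsa16` and `ref_straight_demo` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of STAS16: upper halves add, lower halves subtract. */
uint32_t ref_stas16(uint32_t a, uint32_t b) {
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) - (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Hypothetical model of STSA16: upper halves subtract, lower halves add. */
uint32_t ref_stsa16(uint32_t a, uint32_t b) {
    uint32_t hi = ((a >> 16) - (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* a = {hi 5, lo 3}, b = {hi 2, lo 1}. */
void ref_straight_demo(void) {
    assert(ref_stas16(0x00050003u, 0x00020001u) == 0x00070002u);  /* 5+2, 3-1 */
    assert(ref_stsa16(0x00050003u, 0x00020001u) == 0x00030004u);  /* 5-2, 3+1 */
}
```

Masking each half to 16 bits is what implements the wrap-around behaviour; nothing carries or borrows between the two halves.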

```
__STATIC_FORCEINLINE unsigned long __RV_SUB16 (unsigned long a, unsigned long b)
SUB16 (SIMD 16-bit Subtraction)
```

Type: SIMD

#### Syntax:

```
SUB16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit integer element subtractions simultaneously.

## **Description**:

This instruction subtracts the 16-bit integer elements in Rs2 from the 16-bit integer elements in Rs1, and then writes the result to Rd.

#### Note:

This instruction can be used for either signed or unsigned subtraction.

## **Operations:**

```
Rd.H[x] = Rs1.H[x] - Rs2.H[x];

for RV32: x=1...0,

for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
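The note that one instruction serves both signed and unsigned subtraction follows from wrap-around arithmetic: the resulting bit pattern is the same, only its interpretation differs. A one-lane host-side sketch in portable C illustrates this; `lane_sub16`, `as_signed16` and `lane_sub16_demo` are hypothetical names, not NMSIS APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-lane model of SUB16: wrap-around subtraction. */
uint16_t lane_sub16(uint16_t rs1, uint16_t rs2) {
    return (uint16_t)((uint32_t)rs1 - rs2);
}

/* View a 16-bit pattern as a signed value. */
int32_t as_signed16(uint16_t v) {
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* One result pattern, two readings. */
void lane_sub16_demo(void) {
    uint16_t r = lane_sub16(0x0001, 0x0002);
    assert(r == 0xFFFF);            /* unsigned view: (1 - 2) mod 65536 = 65535 */
    assert(as_signed16(r) == -1);   /* signed view:   1 - 2 = -1                */
}
```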

```
__STATIC_FORCEINLINE unsigned long __RV_UKADD16 (unsigned long a, unsigned long b)
UKADD16 (SIMD 16-bit Unsigned Saturating Addition)
```

Type: SIMD

### Syntax:

```
UKADD16 Rd, Rs1, Rs2
```

### Purpose:

Do 16-bit unsigned integer element saturating additions simultaneously.

## **Description**:

This instruction adds the 16-bit unsigned integer elements in Rs1 with the 16-bit unsigned integer elements in Rs2. If any of the results are beyond the 16-bit unsigned number range ($0 \le \text{RES} \le 2^{16}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

#### **Operations:**

```
res[x] = Rs1.H[x] + Rs2.H[x];
if (res[x] > (2^16)-1) {
  res[x] = (2^16)-1;
  OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=1...0,
for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
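Unsigned saturation has only an upper bound to check for addition. A host-side sketch for one RV32 register follows; it is an illustration, not the NMSIS implementation. `ref_ukadd16` and `ref_ukadd16_demo` are hypothetical names, and `ov` stands in for the OV bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of UKADD16; *ov stands in for the OV bit. */
uint32_t ref_ukadd16(uint32_t a, uint32_t b, int *ov) {
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {                 /* RV32: lanes H[1] and H[0] */
        uint32_t res = ((a >> (16 * x)) & 0xFFFFu) + ((b >> (16 * x)) & 0xFFFFu);
        if (res > 0xFFFFu) { res = 0xFFFFu; *ov = 1; }   /* clip to 2^16 - 1 */
        rd |= res << (16 * x);
    }
    return rd;
}

/* High lane: 0xFFFF + 1 clips to 0xFFFF; low lane: 1 + 2 = 3. */
void ref_ukadd16_demo(void) {
    int ov = 0;
    assert(ref_ukadd16(0xFFFF0001u, 0x00010002u, &ov) == 0xFFFF0003u);
    assert(ov == 1);
}
```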

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKCRAS16 (unsigned long a, unsigned long b)
UKCRAS16 (SIMD 16-bit Unsigned Saturating Cross Addition & Subtraction)

Type: SIMD

## Syntax:

```
UKCRAS16 Rd, Rs1, Rs2
```

#### Purpose:

Do one 16-bit unsigned integer element saturating addition and one 16-bit unsigned integer element saturating subtraction in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks.

## **Description**:

This instruction adds the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs2; at the same time, it subtracts the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the 16-bit unsigned number range ($0 \le \text{RES} \le 2^{16}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for addition and [15:0] of 32-bit chunks in Rd for subtraction.

## **Operations:**

```
res1 = Rs1.W[x][31:16] + Rs2.W[x][15:0];
res2 = Rs1.W[x][15:0] - Rs2.W[x][31:16];
if (res1 > (2^16)-1) {
   res1 = (2^16)-1;
   OV = 1;
}
if (res2 < 0) {
   res2 = 0;
   OV = 1;
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKCRSA16 (unsigned long a, unsigned long b)
UKCRSA16 (SIMD 16-bit Unsigned Saturating Cross Subtraction & Addition)

Type: SIMD

## Syntax:

```
UKCRSA16 Rd, Rs1, Rs2
```

### Purpose:

Do one 16-bit unsigned integer element saturating subtraction and one 16-bit unsigned integer element saturating addition in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks.

#### **Description**:

This instruction subtracts the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs1; at the same time, it adds the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs2 with the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the 16-bit unsigned number range ($0 \le \text{RES} \le 2^{16}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for subtraction and [15:0] of 32-bit chunks in Rd for addition.

## **Operations:**

```
res1 = Rs1.W[x][31:16] - Rs2.W[x][15:0];
res2 = Rs1.W[x][15:0] + Rs2.W[x][31:16];
if (res1 < 0) {
    res1 = 0;
    OV = 1;
}
if (res2 > (2^16)-1) {
    res2 = (2^16)-1;
    OV = 1;
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
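The Description says that any out-of-range result is saturated, so the two halves have to be checked independently. The sketch below models one 32-bit chunk on a host in portable C under that reading; it is not the NMSIS implementation. `ref_ukcrsa16` and `ref_ukcrsa16_demo` are hypothetical names, and `ov` stands in for the OV bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of UKCRSA16 for one 32-bit chunk. */
uint32_t ref_ukcrsa16(uint32_t a, uint32_t b, int *ov) {
    int32_t res1 = (int32_t)(a >> 16) - (int32_t)(b & 0xFFFFu);   /* upper: subtract */
    int32_t res2 = (int32_t)(a & 0xFFFFu) + (int32_t)(b >> 16);   /* lower: add      */
    if (res1 < 0)      { res1 = 0;      *ov = 1; }   /* clip at the lower bound */
    if (res2 > 0xFFFF) { res2 = 0xFFFF; *ov = 1; }   /* clip at the upper bound */
    return ((uint32_t)res1 << 16) | (uint32_t)res2;
}

/* Both halves overflow at once: 1 - 2 clips to 0, 0xFFFF + 2 clips to 0xFFFF. */
void ref_ukcrsa16_demo(void) {
    int ov = 0;
    assert(ref_ukcrsa16(0x0001FFFFu, 0x00020002u, &ov) == 0x0000FFFFu);
    assert(ov == 1);
}
```

The demo input is the case that distinguishes two independent checks from a chained `if`/`else if`: with a chained test the lower half would be left unsaturated.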

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKSTAS16 (unsigned long a, unsigned long b)
UKSTAS16 (SIMD 16-bit Unsigned Saturating Straight Addition & Subtraction)

Type: SIMD

#### Syntax:

```
UKSTAS16 Rd, Rs1, Rs2
```

#### Purpose:

Do one 16-bit unsigned integer element saturating addition and one 16-bit unsigned integer element saturating subtraction in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks.

## **Description**:

This instruction adds the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs1 with the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs2; at the same time, it subtracts the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs2 from the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the 16-bit unsigned number range ($0 \le \text{RES} \le 2^{16}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for addition and [15:0] of 32-bit chunks in Rd for subtraction.

#### **Operations:**

```
res1 = Rs1.W[x][31:16] + Rs2.W[x][31:16];
res2 = Rs1.W[x][15:0] - Rs2.W[x][15:0];
if (res1 > (2^16)-1) {
    res1 = (2^16)-1;
    OV = 1;
}
if (res2 < 0) {
    res2 = 0;
    OV = 1;
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKSTSA16 (unsigned long a, unsigned long b)

UKSTSA16 (SIMD 16-bit Unsigned Saturating Straight Subtraction & Addition)

Type: SIMD

#### Syntax:

```
UKSTSA16 Rd, Rs1, Rs2
```

## Purpose:

Do one 16-bit unsigned integer element saturating subtraction and one 16-bit unsigned integer element saturating addition in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks.

## **Description:**

This instruction subtracts the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs2 from the 16-bit unsigned integer element in [31:16] of 32-bit chunks in Rs1; at the same time, it adds the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs2 with the 16-bit unsigned integer element in [15:0] of 32-bit chunks in Rs1. If any of the results are beyond the 16-bit unsigned number range ($0 \le RES \le 2^{16}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to [31:16] of 32-bit chunks in Rd for subtraction and [15:0] of 32-bit chunks in Rd for addition.

#### **Operations:**

```
res1 = Rs1.W[x][31:16] - Rs2.W[x][31:16];
res2 = Rs1.W[x][15:0] + Rs2.W[x][15:0];
if (res1 < 0) {
    res1 = 0;
    OV = 1;
}
if (res2 > (2^16)-1) {
    res2 = (2^16)-1;
    OV = 1;
}
Rd.W[x][31:16] = res1;
Rd.W[x][15:0] = res2;
for RV32, x=0
for RV64, x=1...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKSUB16 (unsigned long a, unsigned long b)
UKSUB16 (SIMD 16-bit Unsigned Saturating Subtraction)

Type: SIMD

#### Syntax:

```
UKSUB16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit unsigned integer elements saturating subtractions simultaneously.

#### **Description**:

This instruction subtracts the 16-bit unsigned integer elements in Rs2 from the 16-bit unsigned integer elements in Rs1. If any of the results are beyond the 16-bit unsigned number range ($0 \le RES \le 2^{16}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

## **Operations:**

```
res[x] = Rs1.H[x] - Rs2.H[x];
if (res[x] < 0) {
  res[x] = 0;
  OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
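A minimal host-side sketch of the per-element loop, assuming RV32 (two 16-bit elements). The helper name `ref_uksub16` and the `ov` out-parameter are hypothetical and only serve this illustration.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): UKSUB16 on RV32. */
uint32_t ref_uksub16(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        uint32_t ha = (a >> (16 * x)) & 0xFFFFu;   /* Rs1.H[x] */
        uint32_t hb = (b >> (16 * x)) & 0xFFFFu;   /* Rs2.H[x] */
        uint32_t res;
        if (ha < hb) {          /* difference would be negative */
            res = 0;
            *ov = 1;
        } else {
            res = ha - hb;
        }
        rd |= res << (16 * x);
    }
    return rd;
}
```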

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URADD16 (unsigned long a, unsigned long b)
URADD16 (SIMD 16-bit Unsigned Halving Addition)

Type: SIMD

## Syntax:

```
URADD16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit unsigned integer element additions simultaneously. The results are halved to avoid overflow or saturation.

## **Description**:

This instruction adds the 16-bit unsigned integer elements in Rs1 with the 16-bit unsigned integer elements in Rs2. The results are first logically right-shifted by 1 bit and then written to Rd.

### **Examples:**

```
* Ra = 0x7FFF, Rb = 0x7FFF Rt = 0x7FFF

* Ra = 0x8000, Rb = 0x8000 Rt = 0x8000

* Ra = 0x4000, Rb = 0x8000 Rt = 0x6000
```

#### **Operations:**

```
Rd.H[x] = (Rs1.H[x] + Rs2.H[x]) u>> 1;

for RV32: x=1...0,

for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
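The halving addition can be modelled in portable C as below. This is a sketch for RV32 only, and `ref_uradd16` is a hypothetical name, not an NMSIS function. The sum is kept in a wider type so the carry out of bit 15 survives until the shift.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): URADD16 on RV32. */
uint32_t ref_uradd16(uint32_t a, uint32_t b)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        uint32_t ha = (a >> (16 * x)) & 0xFFFFu;   /* Rs1.H[x] */
        uint32_t hb = (b >> (16 * x)) & 0xFFFFu;   /* Rs2.H[x] */
        rd |= ((ha + hb) >> 1) << (16 * x);        /* 17-bit sum, halved */
    }
    return rd;
}
```

Feeding the values from the examples above into the model reproduces them: `0x7FFF + 0x7FFF` gives `0x7FFF`, `0x8000 + 0x8000` gives `0x8000`, and `0x4000 + 0x8000` gives `0x6000`.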

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URCRAS16 (unsigned long a, unsigned long b)
URCRAS16 (SIMD 16-bit Unsigned Halving Cross Addition & Subtraction)

Type: SIMD

#### Syntax:

```
URCRAS16 Rd, Rs1, Rs2
```

### Purpose:

Do 16-bit unsigned integer element addition and 16-bit unsigned integer element subtraction in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

## **Description:**

This instruction adds the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs1 with the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs2, and subtracts the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs2 from the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs1. The element results are first logically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

#### **Examples:**

```
Please see `URADD16` and `URSUB16` instructions.
```

### **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] + Rs2.W[x][15:0]) u>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] - Rs2.W[x][31:16]) u>> 1;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
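To make the crossed operand selection concrete, here is a small host-side sketch for one 32-bit chunk (RV32). The name `ref_urcras16` is an assumption for this example. The subtraction relies on unsigned wrap-around, which preserves bits [16:1] of the 17-bit difference.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): URCRAS16 on one chunk. */
uint32_t ref_urcras16(uint32_t a, uint32_t b)
{
    uint32_t a_hi = a >> 16, a_lo = a & 0xFFFFu;
    uint32_t b_hi = b >> 16, b_lo = b & 0xFFFFu;

    uint32_t top = ((a_hi + b_lo) >> 1) & 0xFFFFu;  /* Rs1[31:16] + Rs2[15:0]  */
    uint32_t bot = ((a_lo - b_hi) >> 1) & 0xFFFFu;  /* Rs1[15:0]  - Rs2[31:16] */

    return (top << 16) | bot;
}
```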

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URCRSA16 (unsigned long a, unsigned long b)
URCRSA16 (SIMD 16-bit Unsigned Halving Cross Subtraction & Addition)

Type: SIMD

## Syntax:

```
URCRSA16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit unsigned integer element subtraction and 16-bit unsigned integer element addition in a 32-bit chunk simultaneously. Operands are from crossed positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

## **Description**:

This instruction subtracts the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs2 from the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs1, and adds the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs1 with the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs2. The two results are first logically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

## Examples:

```
Please see `URADD16` and `URSUB16` instructions.
```

#### **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] - Rs2.W[x][15:0]) u>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] + Rs2.W[x][31:16]) u>> 1;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URSTAS16 (unsigned long a, unsigned long b)
URSTAS16 (SIMD 16-bit Unsigned Halving Straight Addition & Subtraction)

Type: SIMD

#### Syntax:

```
URSTAS16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit unsigned integer element addition and 16-bit unsigned integer element subtraction in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

### **Description:**

This instruction adds the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs1 with the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs2, and subtracts the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs2 from the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs1. The element results are first logically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

## **Examples:**

```
Please see `URADD16` and `URSUB16` instructions.
```

## **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] + Rs2.W[x][31:16]) u>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] - Rs2.W[x][15:0]) u>> 1;
for RV32, x=0
for RV64, x=1...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URSTSA16 (unsigned long a, unsigned long b)
URSTSA16 (SIMD 16-bit Unsigned Halving Straight Subtraction & Addition)

Type: SIMD

## Syntax:

```
URSTSA16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit unsigned integer element subtraction and 16-bit unsigned integer element addition in a 32-bit chunk simultaneously. Operands are from corresponding positions in 32-bit chunks. The results are halved to avoid overflow or saturation.

## **Description**:

This instruction subtracts the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs2 from the 16-bit unsigned integer in [31:16] of 32-bit chunks in Rs1, and adds the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs1 with the 16-bit unsigned integer in [15:0] of 32-bit chunks in Rs2. The two results are first logically right-shifted by 1 bit and then written to [31:16] of 32-bit chunks in Rd and [15:0] of 32-bit chunks in Rd.

## **Examples:**

```
Please see `URADD16` and `URSUB16` instructions.
```

### **Operations:**

```
Rd.W[x][31:16] = (Rs1.W[x][31:16] - Rs2.W[x][31:16]) u>> 1;
Rd.W[x][15:0] = (Rs1.W[x][15:0] + Rs2.W[x][15:0]) u>> 1;
for RV32, x=0
for RV64, x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URSUB16 (unsigned long a, unsigned long b)
URSUB16 (SIMD 16-bit Unsigned Halving Subtraction)

Type: SIMD

## Syntax:

```
URSUB16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit unsigned integer element subtractions simultaneously. The results are halved to avoid overflow or saturation.

## **Description**:

This instruction subtracts the 16-bit unsigned integer elements in Rs2 from the 16-bit unsigned integer elements in Rs1. The results are first logically right-shifted by 1 bit and then written to Rd.

#### **Examples**:

```
* Ra = 0x7FFF, Rb = 0x8000 Rt = 0xFFFF

* Ra = 0x8000, Rb = 0x7FFF Rt = 0x0000

* Ra = 0x8000, Rb = 0x4000 Rt = 0x2000
```

## **Operations**:

```
Rd.H[x] = (Rs1.H[x] - Rs2.H[x]) u>> 1;

for RV32: x=1...0,

for RV64: x=3...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
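A host-side sketch of the halving subtraction for RV32 follows. `ref_ursub16` is a hypothetical helper, not an NMSIS function. Unsigned wrap-around is used on purpose: shifting the wrapped difference right by one and masking to 16 bits yields bits [16:1] of the 17-bit result.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): URSUB16 on RV32. */
uint32_t ref_ursub16(uint32_t a, uint32_t b)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        uint32_t ha = (a >> (16 * x)) & 0xFFFFu;       /* Rs1.H[x] */
        uint32_t hb = (b >> (16 * x)) & 0xFFFFu;       /* Rs2.H[x] */
        uint32_t res = ((ha - hb) >> 1) & 0xFFFFu;     /* halve, keep 16 bits */
        rd |= res << (16 * x);
    }
    return rd;
}
```

The model agrees with the examples above: `0x7FFF - 0x8000` gives `0xFFFF`, `0x8000 - 0x7FFF` gives `0x0000`, and `0x8000 - 0x4000` gives `0x2000`.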

## **SIMD 16-bit Shift Instructions**

```
__STATIC_FORCEINLINE unsigned long __RV_KSLL16 (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_KSLRA16 (unsigned long a, int b)
__STATIC_FORCEINLINE unsigned long __RV_KSLRA16_U (unsigned long a, int b)
__STATIC_FORCEINLINE unsigned long __RV_SLL16 (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_SRA16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_SRA16_U (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_SRL16 (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_SRL16_U (unsigned long a, unsigned int b)
__RV_KSLLI16 (a, b)
__RV_SLLI16 (a, b)
__RV_SRAI16 (a, b)
__RV_SRAI16_U (a, b)
__RV_SRLI16 (a, b)
__RV_SRLI16_U (a, b)
```

group NMSIS_Core_DSP_Intrinsic_SIMD_16B_SHIFT

SIMD 16-bit Shift Instructions.

There are 14 SIMD 16-bit shift instructions.

#### **Defines**

```
__RV_KSLLI16 (a, b)
```

KSLLI16 (SIMD 16-bit Saturating Shift Left Logical Immediate)

Type: SIMD

#### Syntax:

```
KSLLI16 Rd, Rs1, imm4u
```

#### Purpose:

Do 16-bit elements logical left shift operations with saturation simultaneously. The shift amount is an immediate value.

## **Description**:

The 16-bit data elements in Rs1 are left-shifted logically. The shifted out bits are filled with zero and the shift amount is specified by the imm4u constant. Any shifted value greater than 2^15-1 is saturated to 2^15-1. Any shifted value smaller than -2^15 is saturated to -2^15. And the saturated results are written to Rd. If any saturation is performed, set OV bit to 1.

#### **Operations:**

```
sa = imm4u[3:0];
if (sa != 0) {
    res[(15+sa):0] = Rs1.H[x] << sa;
    if (res > (2^15)-1) {
        res = 0x7fff; OV = 1;
    } else if (res < -2^15) {
        res = 0x8000; OV = 1;
    }
    Rd.H[x] = res[15:0];
} else {
    Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type
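The saturating left shift can be sketched in portable C as follows, assuming RV32. `ref_kslli16` and the `ov` out-parameter are hypothetical names for this illustration. The shift is written as a multiplication because left-shifting a negative value is undefined behavior in C.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): KSLLI16 on RV32. */
uint32_t ref_kslli16(uint32_t a, unsigned sa, int *ov)
{
    uint32_t rd = 0;
    sa &= 0xFu;                                        /* imm4u[3:0] */
    for (int x = 0; x < 2; x++) {
        int32_t h = (int32_t)((a >> (16 * x)) & 0xFFFFu);
        if (h >= 0x8000) h -= 0x10000;                 /* sign-extend Rs1.H[x] */
        int32_t res = h * ((int32_t)1 << sa);          /* h << sa, without UB  */
        if (res > INT16_MAX)      { res = INT16_MAX; *ov = 1; }
        else if (res < INT16_MIN) { res = INT16_MIN; *ov = 1; }
        rd |= ((uint32_t)res & 0xFFFFu) << (16 * x);
    }
    return rd;
}
```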

```
__RV_SLLI16 (a, b)
```

SLLI16 (SIMD 16-bit Shift Left Logical Immediate)

Type: SIMD

#### Syntax:

```
SLLI16 Rd, Rs1, imm4[3:0]
```

#### **Purpose:**

Do 16-bit element logical left shift operations simultaneously. The shift amount is an immediate value.

## **Description**:

The 16-bit elements in Rs1 are left-shifted logically. The shifted out bits are filled with zero and the shift amount is specified by the imm4[3:0] constant. And the results are written to Rd.

## **Operations:**

```
sa = imm4[3:0];
Rd.H[x] = Rs1.H[x] << sa;
for RV32: x=1...0,
for RV64: x=3...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in unsigned long type

```
__RV_SRAI16 (a, b)
```

SRAI16 (SIMD 16-bit Shift Right Arithmetic Immediate)

Type: SIMD

#### Syntax:

```
SRAI16 Rd, Rs1, imm4u
SRAI16.u Rd, Rs1, imm4u
```

#### **Purpose:**

Do 16-bit elements arithmetic right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

#### **Description**:

The 16-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the 16-bit data elements. The shift amount is specified by the imm4u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data to calculate the final results. And the results are written to Rd.

#### **Operations:**

```
sa = imm4u[3:0];
if (sa > 0) {
  if (`.u` form) { // SRAI16.u
    res[15:-1] = SE17(Rs1.H[x][15:sa-1]) + 1;
    Rd.H[x] = res[15:0];
  } else { // SRAI16
    Rd.H[x] = SE16(Rs1.H[x][15:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type

```
__RV_SRAI16_U (a, b)
```

SRAI16.u (SIMD 16-bit Rounding Shift Right Arithmetic Immediate)

Type: SIMD

## Syntax:

```
SRAI16 Rd, Rs1, imm4u
SRAI16.u Rd, Rs1, imm4u
```

### Purpose:

Do 16-bit elements arithmetic right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

### **Description:**

The 16-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the 16-bit data elements. The shift amount is specified by the imm4u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data to calculate the final results. And the results are written to Rd.

### **Operations:**

```
sa = imm4u[3:0];
if (sa > 0) {
  if (`.u` form) { // SRAI16.u
    res[15:-1] = SE17(Rs1.H[x][15:sa-1]) + 1;
    Rd.H[x] = res[15:0];
  } else { // SRAI16
    Rd.H[x] = SE16(Rs1.H[x][15:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
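The rounding step is easier to see in plain C. The sketch below models the `.u` form on RV32. `ref_srai16_u` is a hypothetical name, and the code uses floor division instead of `>>` because right-shifting a negative value is implementation-defined in C. Adding 1 at the most significant discarded bit and then dropping `sa` bits is the same as computing floor((h + 2^(sa-1)) / 2^sa).

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): SRAI16.u on RV32. */
uint32_t ref_srai16_u(uint32_t a, unsigned sa)
{
    sa &= 0xFu;                       /* imm4u[3:0] */
    if (sa == 0) {
        return a;                     /* Rd = Rs1 */
    }
    uint32_t rd = 0;
    int32_t d = (int32_t)1 << sa;     /* 2^sa */
    for (int x = 0; x < 2; x++) {
        int32_t h = (int32_t)((a >> (16 * x)) & 0xFFFFu);
        if (h >= 0x8000) h -= 0x10000;            /* sign-extend Rs1.H[x] */
        int32_t t = h + (d >> 1);                 /* add 1 at the top discarded bit */
        /* floor(t / d), written without shifting a negative value */
        int32_t res = (t >= 0) ? t / d : -((-t + d - 1) / d);
        rd |= ((uint32_t)res & 0xFFFFu) << (16 * x);
    }
    return rd;
}
```

For example, shifting the elements 3 and -3 right by one gives 2 and -1, so `ref_srai16_u(0x0003FFFD, 1)` returns `0x0002FFFF`.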

```
__RV_SRLI16 (a, b)
```

SRLI16 (SIMD 16-bit Shift Right Logical Immediate)

Type: SIMD

## Syntax:

```
SRLI16 Rd, Rs1, imm4u
SRLI16.u Rd, Rs1, imm4u
```

## Purpose:

Do 16-bit elements logical right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

### **Description**:

The 16-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the imm4u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data element to calculate the final results. And the results are written to Rd.

## **Operations:**

```
sa = imm4u;
if (sa > 0) {
  if (`.u` form) { // SRLI16.u
    res[16:0] = ZE17(Rs1.H[x][15:sa-1]) + 1;
    Rd.H[x] = res[16:1];
  } else { // SRLI16
    Rd.H[x] = ZE16(Rs1.H[x][15:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

Returns value stored in unsigned long type

```
__RV_SRLI16_U (a, b)
```

SRLI16.u (SIMD 16-bit Rounding Shift Right Logical Immediate)

Type: SIMD

## Syntax:

```
SRLI16 Rd, Rs1, imm4u
SRLI16.u Rd, Rs1, imm4u
```

### Purpose:

Do 16-bit elements logical right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

## **Description:**

The 16-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the imm4u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data element to calculate the final results. And the results are written to Rd.

## **Operations:**

```
sa = imm4u;
if (sa > 0) {
  if (`.u` form) { // SRLI16.u
    res[16:0] = ZE17(Rs1.H[x][15:sa-1]) + 1;
    Rd.H[x] = res[16:1];
  } else { // SRLI16
    Rd.H[x] = ZE16(Rs1.H[x][15:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

Returns value stored in unsigned long type

#### **Functions**

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSLL16 (unsigned long a, unsigned int b)
KSLL16 (SIMD 16-bit Saturating Shift Left Logical)

Type: SIMD

#### Syntax:

```
KSLL16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit elements logical left shift operations with saturation simultaneously. The shift amount is a variable from a GPR.

## **Description**:

The 16-bit data elements in Rs1 are left-shifted logically. The shifted out bits are filled with zero and the shift amount is specified by the low-order 4-bits of the value in the Rs2 register. Any shifted value greater than 2^15-1 is saturated to 2^15-1. Any shifted value smaller than -2^15 is saturated to -2^15. And the saturated results are written to Rd. If any saturation is performed, set OV bit to 1.

### **Operations:**

```
sa = Rs2[3:0];
if (sa != 0) {
    res[(15+sa):0] = Rs1.H[x] << sa;
    if (res > (2^15)-1) {
        res = 0x7fff; OV = 1;
    } else if (res < -2^15) {
        res = 0x8000; OV = 1;
    }
    Rd.H[x] = res[15:0];
} else {
    Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSLRA16 (unsigned long a, int b)

KSLRA16 (SIMD 16-bit Shift Left Logical with Saturation or Shift Right Arithmetic)

Type: SIMD

#### Syntax:

```
KSLRA16 Rd, Rs1, Rs2
KSLRA16.u Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q15 saturation for the left shift. The .u form performs additional rounding up operations for the right shift.

#### **Description**:

The 16-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[4:0]. Rs2[4:0] is in the signed range of [-2^4, 2^4-1]. A positive Rs2[4:0] means logical left shift and a negative Rs2[4:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[4:0]. However, the behavior of Rs2[4:0]==-2^4 (0x10) is defined to be equivalent to the behavior of Rs2[4:0]==-(2^4-1) (0x11). The left-shifted results are saturated to the 16-bit signed integer range of [-2^15, 2^15-1]. For the .u form of the instruction, a 1 is added to the most significant discarded bit position of the right-shifted results for rounding. After the shift, saturation, or rounding, the final results are written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:5] does not affect this instruction.

### **Operations:**

```
if (Rs2[4:0] < 0) {
  sa = -Rs2[4:0];
  sa = (sa == 16)? 15 : sa;
  if (`.u` form) {
    res[15:-1] = SE17(Rs1.H[x][15:sa-1]) + 1;
    Rd.H[x] = res[15:0];
  } else {
    Rd.H[x] = SE16(Rs1.H[x][15:sa]);
  }
} else {
  sa = Rs2[3:0];
  res[(15+sa):0] = Rs1.H[x] << (logic) sa;
  if (res > (2^15)-1) {
    res[15:0] = 0x7fff; OV = 1;
  } else if (res < -2^15) {
    res[15:0] = 0x8000; OV = 1;
  }
  Rd.H[x] = res[15:0];
}
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** int type of value stored in b

**Returns** value stored in unsigned long type
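The sign-dependent behavior can be sketched in host-side C as below. This models the non-rounding form on RV32 only. `ref_kslra16` and the `ov` out-parameter are assumptions for this example, and division and multiplication replace the shifts so the code avoids undefined or implementation-defined shifts of negative values.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): KSLRA16 on RV32. */
uint32_t ref_kslra16(uint32_t a, int b, int *ov)
{
    int sh = (int)((unsigned)b & 0x1Fu);   /* Rs2[4:0] */
    if (sh >= 16) sh -= 32;                /* read it as a signed 5-bit value */
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        int32_t h = (int32_t)((a >> (16 * x)) & 0xFFFFu);
        if (h >= 0x8000) h -= 0x10000;     /* sign-extend Rs1.H[x] */
        int32_t res;
        if (sh < 0) {
            int sa = (sh == -16) ? 15 : -sh;     /* -16 behaves like -15 */
            int32_t d = (int32_t)1 << sa;
            /* floor(h / 2^sa) equals an arithmetic right shift by sa */
            res = (h >= 0) ? h / d : -((-h + d - 1) / d);
        } else {
            res = h * ((int32_t)1 << sh);        /* left shift as a multiply */
            if (res > INT16_MAX)      { res = INT16_MAX; *ov = 1; }
            else if (res < INT16_MIN) { res = INT16_MIN; *ov = 1; }
        }
        rd |= ((uint32_t)res & 0xFFFFu) << (16 * x);
    }
    return rd;
}
```

A shift amount of -2 divides each element by four (rounding toward minus infinity), while +1 doubles each element and saturates `0x4000` to `0x7FFF`.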

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSLRA16\_U (unsigned long a, int b)
KSLRA16.u (SIMD 16-bit Shift Left Logical with Saturation or Rounding Shift Right Arithmetic)

Type: SIMD

## Syntax:

```
KSLRA16 Rd, Rs1, Rs2
KSLRA16.u Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q15 saturation for the left shift. The .u form performs additional rounding up operations for the right shift.

## **Description**:

The 16-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[4:0]. Rs2[4:0] is in the signed range of [-2^4, 2^4-1]. A positive Rs2[4:0] means logical left shift and a negative Rs2[4:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[4:0]. However, the behavior of Rs2[4:0]==-2^4 (0x10) is defined to be equivalent to the behavior of Rs2[4:0]==-(2^4-1) (0x11). The left-shifted results are saturated to the 16-bit signed integer range of [-2^15, 2^15-1]. For the .u form of the instruction, a 1 is added to the most significant discarded bit position of the right-shifted results for rounding. After the shift, saturation, or rounding, the final results are written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:5] does not affect this instruction.

#### **Operations:**

```
if (Rs2[4:0] < 0) {
  sa = -Rs2[4:0];
  sa = (sa == 16)? 15 : sa;
  if (`.u` form) {
    res[15:-1] = SE17(Rs1.H[x][15:sa-1]) + 1;
    Rd.H[x] = res[15:0];
  } else {
    Rd.H[x] = SE16(Rs1.H[x][15:sa]);
  }
} else {
  sa = Rs2[3:0];
  res[(15+sa):0] = Rs1.H[x] << (logic) sa;
  if (res > (2^15)-1) {
    res[15:0] = 0x7fff; OV = 1;
  } else if (res < -2^15) {
    res[15:0] = 0x8000; OV = 1;
  }
  Rd.H[x] = res[15:0];
}
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SLL16 (unsigned long a, unsigned int b)
SLL16 (SIMD 16-bit Shift Left Logical)

Type: SIMD

#### Syntax:

```
SLL16 Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit elements logical left shift operations simultaneously. The shift amount is a variable from a GPR.

#### **Description**:

The 16-bit elements in Rs1 are left-shifted logically. And the results are written to Rd. The shifted out bits are filled with zero and the shift amount is specified by the low-order 4-bits of the value in the Rs2 register.

## **Operations:**

```
sa = Rs2[3:0];
Rd.H[x] = Rs1.H[x] << sa;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SRA16 (unsigned long a, unsigned long b)
SRA16 (SIMD 16-bit Shift Right Arithmetic)

Type: SIMD

#### Syntax:

```
SRA16 Rd, Rs1, Rs2
SRA16.u Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit element arithmetic right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

#### **Description**:

The 16-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the data elements. The shift amount is specified by the low-order 4-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data element to calculate the final results. And the results are written to Rd.

## **Operations:**

```
sa = Rs2[3:0];
if (sa != 0) {
   if (`.u` form) { // SRA16.u
      res[15:-1] = SE17(Rs1.H[x][15:sa-1]) + 1;
      Rd.H[x] = res[15:0];
   } else { // SRA16
      Rd.H[x] = SE16(Rs1.H[x][15:sa]);
   }
} else {
   Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SRA16\_U (unsigned long a, unsigned long b)
SRA16.u (SIMD 16-bit Rounding Shift Right Arithmetic)

Type: SIMD

#### Syntax:

```
SRA16 Rd, Rs1, Rs2
SRA16.u Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit element arithmetic right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

#### **Description**:

The 16-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the data elements. The shift amount is specified by the low-order 4-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data element to calculate the final results. And the results are written to Rd.

## **Operations**:

```
sa = Rs2[3:0];
if (sa != 0) {
   if (`.u` form) { // SRA16.u
      res[15:-1] = SE17(Rs1.H[x][15:sa-1]) + 1;
      Rd.H[x] = res[15:0];
   } else { // SRA16
      Rd.H[x] = SE16(Rs1.H[x][15:sa]);
   }
} else {
   Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SRL16 (unsigned long a, unsigned int b)
SRL16 (SIMD 16-bit Shift Right Logical)

Type: SIMD

### Syntax:

```
SRL16 Rd, Rs1, Rs2
SRL16.u Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit elements logical right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

## **Description:**

The 16-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the low-order 4-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data element to calculate the final results. And the results are written to Rd.

### **Operations:**

```
sa = Rs2[3:0];
if (sa > 0) {
   if (`.u` form) { // SRL16.u
      res[16:0] = ZE17(Rs1.H[x][15:sa-1]) + 1;
      Rd.H[x] = res[16:1];
   } else { // SRL16
      Rd.H[x] = ZE16(Rs1.H[x][15:sa]);
   }
} else {
   Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SRL16\_U (unsigned long a, unsigned int b)
SRL16.u (SIMD 16-bit Rounding Shift Right Logical)

Type: SIMD

## Syntax:

```
SRL16 Rd, Rs1, Rs2
SRL16.u Rd, Rs1, Rs2
```

## Purpose:

Do 16-bit elements logical right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

## **Description**:

The 16-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the low-order 4-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 16-bit data element to calculate the final results. And the results are written to Rd.

#### **Operations:**

```
sa = Rs2[3:0];
if (sa > 0) {
  if (`.u` form) { // SRL16.u
    res[16:0] = ZE17(Rs1.H[x][15:sa-1]) + 1;
    Rd.H[x] = res[16:1];
  } else { // SRL16
    Rd.H[x] = ZE16(Rs1.H[x][15:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type
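As a rough host-side illustration of the `.u` rounding for logical shifts, the sketch below follows the pseudocode for RV32: keep one extra bit below the result, add 1 to it, then drop it. The helper name `ref_srl16_u` is hypothetical and not part of NMSIS.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): SRL16.u on RV32. */
uint32_t ref_srl16_u(uint32_t a, unsigned b)
{
    unsigned sa = b & 0xFu;            /* Rs2[3:0] */
    if (sa == 0) {
        return a;                      /* Rd = Rs1 */
    }
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        uint32_t h = (a >> (16 * x)) & 0xFFFFu;        /* Rs1.H[x] */
        uint32_t res = ((h >> (sa - 1)) + 1u) >> 1;    /* res[16:1] of the 17-bit sum */
        rd |= res << (16 * x);
    }
    return rd;
}
```

Only the low four bits of `b` are used, so a shift amount of 16 behaves like 0 and returns the input unchanged.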

## **SIMD 8-bit Shift Instructions**

```
__STATIC_FORCEINLINE unsigned long __RV_KSLL8 (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_KSLRA8 (unsigned long a, int b)
__STATIC_FORCEINLINE unsigned long __RV_KSLRA8_U (unsigned long a, int b)
__STATIC_FORCEINLINE unsigned long __RV_SLL8 (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_SRA8 (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_SRA8_U (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_SRL8 (unsigned long a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_SRL8_U (unsigned long a, unsigned int b)
__RV_KSLLI8 (a, b)
__RV_SLLI8 (a, b)
__RV_SRAI8 (a, b)
__RV_SRAI8_U (a, b)
__RV_SRLI8 (a, b)
__RV_SRLI8_U (a, b)
```

group NMSIS_Core_DSP_Intrinsic_SIMD_8B_SHIFT

SIMD 8-bit Shift Instructions.

There are 14 SIMD 8-bit shift instructions.

#### **Defines**

```
__RV_KSLLI8 (a, b)
```

KSLLI8 (SIMD 8-bit Saturating Shift Left Logical Immediate)

Type: SIMD

#### Syntax:

```
KSLLI8 Rd, Rs1, imm3u
```

## Purpose:

Do 8-bit elements logical left shift operations with saturation simultaneously. The shift amount is an immediate value.

## **Description**:

The 8-bit data elements in Rs1 are left-shifted logically. The shifted out bits are filled with zero and the shift amount is specified by the imm3u constant. Any shifted value greater than 2^7-1 is saturated to 2^7-1. Any shifted value smaller than -2^7 is saturated to -2^7. And the saturated results are written to Rd. If any saturation is performed, set OV bit to 1.

## **Operations:**

```
sa = imm3u[2:0];
if (sa != 0) {
    res[(7+sa):0] = Rs1.B[x] << sa;
    if (res > (2^7)-1) {
        res = 0x7f; OV = 1;
    } else if (res < -2^7) {
        res = 0x80; OV = 1;
    }
    Rd.B[x] = res[7:0];
} else {
    Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type
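The 8-bit variant follows the same pattern over four byte elements. The sketch below is a host-side model for RV32. `ref_kslli8` and the `ov` out-parameter are hypothetical, and the shift is expressed as a multiplication to avoid left-shifting negative values in C.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of NMSIS): KSLLI8 on RV32. */
uint32_t ref_kslli8(uint32_t a, unsigned sa, int *ov)
{
    uint32_t rd = 0;
    sa &= 0x7u;                                        /* imm3u[2:0] */
    for (int x = 0; x < 4; x++) {
        int32_t e = (int32_t)((a >> (8 * x)) & 0xFFu);
        if (e >= 0x80) e -= 0x100;                     /* sign-extend Rs1.B[x] */
        int32_t res = e * ((int32_t)1 << sa);          /* e << sa, without UB  */
        if (res > INT8_MAX)      { res = INT8_MAX; *ov = 1; }
        else if (res < INT8_MIN) { res = INT8_MIN; *ov = 1; }
        rd |= ((uint32_t)res & 0xFFu) << (8 * x);
    }
    return rd;
}
```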

```
__RV_SLLI8 (a, b)
```

SLLI8 (SIMD 8-bit Shift Left Logical Immediate)

Type: SIMD

## Syntax:

```
SLLI8 Rd, Rs1, imm3u
```

## Purpose:

Do 8-bit elements logical left shift operations simultaneously. The shift amount is an immediate value.

#### **Description**:

The 8-bit elements in Rs1 are left-shifted logically. And the results are written to Rd. The shifted out bits are filled with zero and the shift amount is specified by the imm3u constant.

## **Operations:**

```
sa = imm3u[2:0];
Rd.B[x] = Rs1.B[x] << sa;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type

```
__RV_SRAI8 (a, b)
```

SRAI8 (SIMD 8-bit Shift Right Arithmetic Immediate)

Type: SIMD

## Syntax:

```
SRAI8 Rd, Rs1, imm3u
SRAI8.u Rd, Rs1, imm3u
```

### Purpose:

Do 8-bit element arithmetic right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

## **Description**:

The 8-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the data elements. The shift amount is specified by the imm3u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

## **Operations:**

```
sa = imm3u[2:0];
if (sa > 0) {
   if (`.u` form) { // SRAI8.u
      res[7:-1] = SE9(Rs1.B[x][7:sa-1]) + 1;
      Rd.B[x] = res[7:0];
   } else { // SRAI8
      Rd.B[x] = SE8(Rs1.B[x][7:sa]);
   }
} else {
   Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

Returns value stored in unsigned long type

```
__RV_SRAI8_U(a, b)
```

SRAI8.u (SIMD 8-bit Rounding Shift Right Arithmetic Immediate)

Type: SIMD

Syntax:

```
SRAI8 Rd, Rs1, imm3u
SRAI8.u Rd, Rs1, imm3u
```

#### Purpose:

Do 8-bit element arithmetic right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

### **Description:**

The 8-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the data elements. The shift amount is specified by the imm3u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

### **Operations:**

```
sa = imm3u[2:0];
if (sa > 0) {
  if (`.u` form) { // SRAI8.u
    res[7:-1] = SE9(Rs1.B[x][7:sa-1]) + 1;
    Rd.B[x] = res[7:0];
  } else { // SRAI8
    Rd.B[x] = SE8(Rs1.B[x][7:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

**Returns** value stored in unsigned long type
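The rounding step of the `.u` form (keep one extra low bit, add 1 to it, then drop it) may be easier to follow in C. The sketch below models SRAI8 and SRAI8.u on one 32-bit register (RV32 view). The names `ref_srai8`, `asr32` and `sext8` and the `round` flag are hypothetical helpers for this example, not NMSIS APIs.

```c
#include <stdint.h>

/* Arithmetic shift right that does not depend on the
 * implementation-defined result of shifting a negative value. */
static int32_t asr32(int32_t v, unsigned n)
{
    return (v >= 0) ? (v >> n) : ~((~v) >> n);
}

/* Sign-extend the low 8 bits of v. */
static int32_t sext8(uint32_t v)
{
    int32_t e = (int32_t)(v & 0xFFu);
    return (e >= 128) ? e - 256 : e;
}

/* Hypothetical model: round = 0 mimics SRAI8, round = 1 mimics SRAI8.u. */
static uint32_t ref_srai8(uint32_t a, unsigned imm3u, int round)
{
    unsigned sa = imm3u & 0x7u;                  /* sa = imm3u[2:0] */
    uint32_t rd = 0;

    if (sa == 0) {
        return a;                                /* Rd = Rs1 */
    }
    for (unsigned x = 0; x < 4; x++) {
        int32_t elem = sext8(a >> (8 * x));
        int32_t res;
        if (round) {
            /* keep the most significant discarded bit, add 1, drop it */
            res = asr32(asr32(elem, sa - 1) + 1, 1);
        } else {
            res = asr32(elem, sa);
        }
        rd |= ((uint32_t)res & 0xFFu) << (8 * x);
    }
    return rd;
}
```

With a shift amount of 1, the element 5 becomes 2 without rounding and 3 with rounding, while -5 becomes -3 without rounding and -2 with rounding.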

\_\_RV\_SRLI8 (a, b)

SRLI8 (SIMD 8-bit Shift Right Logical Immediate)

Type: SIMD

## Syntax:

```
SRLI8 Rd, Rs1, imm3u
SRLI8.u Rd, Rs1, imm3u
```

## Purpose:

Do 8-bit elements logical right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

#### **Description**:

The 8-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the imm3u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

## **Operations:**

```
sa = imm3u[2:0];
if (sa > 0) {
  if (`.u` form) { // SRLI8.u
    res[8:0] = ZE9(Rs1.B[x][7:sa-1]) + 1;
    Rd.B[x] = res[8:1];
  } else { // SRLI8
    Rd.B[x] = ZE8(Rs1.B[x][7:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in unsigned long type

```
__RV_SRLI8_U(a, b)
```

SRLI8.u (SIMD 8-bit Rounding Shift Right Logical Immediate)

Type: SIMD

## Syntax:

```
SRLI8 Rd, Rs1, imm3u
SRLI8.u Rd, Rs1, imm3u
```

### Purpose:

Do 8-bit elements logical right shift operations simultaneously. The shift amount is an immediate value. The .u form performs additional rounding up operations on the shifted results.

## **Description**:

The 8-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the imm3u constant. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

## **Operations:**

```
sa = imm3u[2:0];
if (sa > 0) {
  if (`.u` form) { // SRLI8.u
    res[8:0] = ZE9(Rs1.B[x][7:sa-1]) + 1;
    Rd.B[x] = res[8:1];
  } else { // SRLI8
    Rd.B[x] = ZE8(Rs1.B[x][7:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type
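For the logical variant the same rounding idea applies to unsigned elements. The following C sketch models SRLI8 and SRLI8.u on one 32-bit register (RV32 view); `ref_srli8` and its `round` flag are hypothetical names chosen for this example only.

```c
#include <stdint.h>

/* Hypothetical model: round = 0 mimics SRLI8, round = 1 mimics SRLI8.u. */
static uint32_t ref_srli8(uint32_t a, unsigned imm3u, int round)
{
    unsigned sa = imm3u & 0x7u;                   /* sa = imm3u[2:0] */
    uint32_t rd = 0;

    if (sa == 0) {
        return a;                                 /* Rd = Rs1 */
    }
    for (unsigned x = 0; x < 4; x++) {
        uint32_t elem = (a >> (8 * x)) & 0xFFu;   /* zero-extended Rs1.B[x] */
        uint32_t res;
        if (round) {
            /* res[8:0] = ZE9(elem[7:sa-1]) + 1; result is res[8:1] */
            res = ((elem >> (sa - 1)) + 1u) >> 1;
        } else {
            res = elem >> sa;
        }
        rd |= (res & 0xFFu) << (8 * x);
    }
    return rd;
}
```

Because the intermediate value is 9 bits wide, rounding 0xFF right by one position yields 0x80 rather than wrapping to zero.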

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_KSLL8 (unsigned long a, unsigned int b)
```

KSLL8 (SIMD 8-bit Saturating Shift Left Logical)

Type: SIMD

## Syntax:

```
KSLL8 Rd, Rs1, Rs2
```

## Purpose:

Do 8-bit elements logical left shift operations with saturation simultaneously. The shift amount is a variable from a GPR.

## **Description:**

The 8-bit data elements in Rs1 are left-shifted logically. The shifted out bits are filled with zero and the shift amount is specified by the low-order 3-bits of the value in the Rs2 register. Any shifted value greater than 2^7-1 is saturated to 2^7-1. Any shifted value smaller than -2^7 is saturated to -2^7. And the saturated results are written to Rd. If any saturation is performed, set OV bit to 1.

## **Operations:**

```
sa = Rs2[2:0];
if (sa != 0) {
    res[(7+sa):0] = Rs1.B[x] << sa;
    if (res > (2^7)-1) {
        res = 0x7f; OV = 1;
    } else if (res < -2^7) {
        res = 0x80; OV = 1;
    }
    Rd.B[x] = res[7:0];
} else {
    Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type
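To make the saturation rule concrete, here is a minimal C sketch of KSLL8 for one 32-bit register (RV32 view). It is a model under stated assumptions, not the NMSIS implementation: `ref_ksll8` and `sext8` are hypothetical helpers, and the `ov_flag` pointer stands in for the OV status bit.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of v. */
static int32_t sext8(uint32_t v)
{
    int32_t e = (int32_t)(v & 0xFFu);
    return (e >= 128) ? e - 256 : e;
}

/* Hypothetical reference model of KSLL8 on a 32-bit register (RV32 view).
 * *ov_flag is only ever set here, never cleared. */
static uint32_t ref_ksll8(uint32_t a, unsigned int b, int *ov_flag)
{
    unsigned sa = b & 0x7u;                      /* sa = Rs2[2:0] */
    uint32_t rd = 0;

    for (unsigned x = 0; x < 4; x++) {
        /* Multiply instead of shifting so negative elements stay well defined. */
        int32_t res = sext8(a >> (8 * x)) * (int32_t)(1u << sa);
        if (res > 127) {
            res = 127;                           /* saturate to 2^7-1 */
            *ov_flag = 1;
        } else if (res < -128) {
            res = -128;                          /* saturate to -2^7 */
            *ov_flag = 1;
        }
        rd |= ((uint32_t)res & 0xFFu) << (8 * x);
    }
    return rd;
}
```

Only the low three bits of `b` are used, so in this model a shift amount of 9 behaves like a shift amount of 1.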

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSLRA8 (unsigned long a, int b)
KSLRA8 (SIMD 8-bit Shift Left Logical with Saturation or Shift Right Arithmetic)

Type: SIMD

## Syntax:

```
KSLRA8 Rd, Rs1, Rs2
KSLRA8.u Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q7 saturation for the left shift. The .u form performs additional rounding up operations for the right shift.

## **Description**:

The 8-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[3:0]. Rs2[3:0] is in the signed range of [-2^3, 2^3-1]. A positive Rs2[3:0] means logical left shift and a negative Rs2[3:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[3:0]. However, the behavior of Rs2[3:0]==-2^3 (0x8) is defined to be equivalent to the behavior of Rs2[3:0]==-(2^3-1) (0x9). The left-shifted results are saturated to the 8-bit signed integer range of [-2^7, 2^7-1]. For the .u form of the instruction, a 1 is added to the most significant discarded bit position of the right-shifted results for rounding. After the shift, saturation, or rounding, the final results are written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:4] does not affect this instruction.

## **Operations:**

```
if (Rs2[3:0] < 0) {
  sa = -Rs2[3:0];
  sa = (sa == 8)? 7 : sa;
  if (`.u` form) {
    res[7:-1] = SE9(Rs1.B[x][7:sa-1]) + 1;
    Rd.B[x] = res[7:0];
  } else {
    Rd.B[x] = SE8(Rs1.B[x][7:sa]);
  }
} else {
  sa = Rs2[2:0];
  res[(7+sa):0] = Rs1.B[x] <<(logic) sa;
  if (res > (2^7)-1) {
    res[7:0] = 0x7f; OV = 1;
  } else if (res < -2^7) {
    res[7:0] = 0x80; OV = 1;
  }
  Rd.B[x] = res[7:0];
}
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSLRA8\_U (unsigned long a, int b)

KSLRA8.u (SIMD 8-bit Shift Left Logical with Saturation or Rounding Shift Right Arithmetic)

Type: SIMD

#### Syntax:

```
KSLRA8 Rd, Rs1, Rs2
KSLRA8.u Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q7 saturation for the left shift. The .u form performs additional rounding up operations for the right shift.

#### **Description**:

The 8-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[3:0]. Rs2[3:0] is in the signed range of [-2^3, 2^3-1]. A positive Rs2[3:0] means logical left shift and a negative Rs2[3:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[3:0]. However, the behavior of Rs2[3:0]==-2^3 (0x8) is defined to be equivalent to the behavior of Rs2[3:0]==-(2^3-1) (0x9). The left-shifted results are saturated to the 8-bit signed integer range of [-2^7, 2^7-1]. For the .u form of the instruction, a 1 is added to the most significant discarded bit position of the right-shifted results for rounding. After the shift, saturation, or rounding, the final results are written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:4] does not affect this instruction.

## **Operations:**

```
if (Rs2[3:0] < 0) {
  sa = -Rs2[3:0];
  sa = (sa == 8)? 7 : sa;
  if (`.u` form) {
    res[7:-1] = SE9(Rs1.B[x][7:sa-1]) + 1;
    Rd.B[x] = res[7:0];
  } else {
    Rd.B[x] = SE8(Rs1.B[x][7:sa]);
  }
} else {
  sa = Rs2[2:0];
  res[(7+sa):0] = Rs1.B[x] <<(logic) sa;
  if (res > (2^7)-1) {
    res[7:0] = 0x7f; OV = 1;
  } else if (res < -2^7) {
    res[7:0] = 0x80; OV = 1;
  }
  Rd.B[x] = res[7:0];
}
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** int type of value stored in b

**Returns** value stored in unsigned long type
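The signed shift amount of KSLRA8 and KSLRA8.u, including the special case where -8 behaves like -7, can be modelled as follows. This is a hedged sketch for one 32-bit register (RV32 view); `ref_kslra8`, `asr32`, `sext8`, the `round` flag and the `ov_flag` pointer (standing in for the OV bit) are all assumptions introduced for the example.

```c
#include <stdint.h>

/* Arithmetic shift right without implementation-defined behaviour. */
static int32_t asr32(int32_t v, unsigned n)
{
    return (v >= 0) ? (v >> n) : ~((~v) >> n);
}

/* Sign-extend the low 8 bits of v. */
static int32_t sext8(uint32_t v)
{
    int32_t e = (int32_t)(v & 0xFFu);
    return (e >= 128) ? e - 256 : e;
}

/* Hypothetical model: round = 0 mimics KSLRA8, round = 1 mimics KSLRA8.u. */
static uint32_t ref_kslra8(uint32_t a, int b, int round, int *ov_flag)
{
    int32_t amt = (int32_t)((unsigned)b & 0xFu);  /* Rs2[3:0] */
    uint32_t rd = 0;

    if (amt >= 8) {
        amt -= 16;                                /* sign-extend the 4-bit field */
    }
    for (unsigned x = 0; x < 4; x++) {
        int32_t elem = sext8(a >> (8 * x));
        int32_t res;
        if (amt < 0) {                            /* arithmetic right shift */
            unsigned sa = (unsigned)(-amt);
            if (sa == 8) {
                sa = 7;                           /* -8 is treated like -7 */
            }
            res = round ? asr32(asr32(elem, sa - 1) + 1, 1)
                        : asr32(elem, sa);
        } else {                                  /* saturating left shift */
            res = elem * (int32_t)(1u << (unsigned)amt);
            if (res > 127) {
                res = 127;
                *ov_flag = 1;
            } else if (res < -128) {
                res = -128;
                *ov_flag = 1;
            }
        }
        rd |= ((uint32_t)res & 0xFFu) << (8 * x);
    }
    return rd;
}
```

Only the left-shift path can set the overflow indicator; the right-shift path never saturates.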

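\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SLL8 (unsigned long a, unsigned int b)

SLL8 (SIMD 8-bit Shift Left Logical)

Type: SIMD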

### Syntax:

```
SLL8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit elements logical left shift operations simultaneously. The shift amount is a variable from a GPR.

## **Description**:

The 8-bit elements in Rs1 are left-shifted logically. And the results are written to Rd. The shifted out bits are filled with zero and the shift amount is specified by the low-order 3-bits of the value in the Rs2 register.

## **Operations**:

```
sa = Rs2[2:0];
Rd.B[x] = Rs1.B[x] << sa;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_SRA8 (unsigned long a, unsigned int b)
```

SRA8 (SIMD 8-bit Shift Right Arithmetic)

Type: SIMD

## Syntax:

```
SRA8 Rd, Rs1, Rs2
SRA8.u Rd, Rs1, Rs2
```

## Purpose:

Do 8-bit element arithmetic right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

## **Description:**

The 8-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the data elements. The shift amount is specified by the low-order 3-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

#### **Operations:**

```
sa = Rs2[2:0];
if (sa > 0) {
  if (`.u` form) { // SRA8.u
    res[7:-1] = SE9(Rs1.B[x][7:sa-1]) + 1;
    Rd.B[x] = res[7:0];
  } else { // SRA8
    Rd.B[x] = SE8(Rs1.B[x][7:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SRA8\_U (unsigned long a, unsigned int b)
SRA8.u (SIMD 8-bit Rounding Shift Right Arithmetic)

Type: SIMD

## Syntax:

```
SRA8 Rd, Rs1, Rs2
SRA8.u Rd, Rs1, Rs2
```

### Purpose:

Do 8-bit element arithmetic right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

## **Description**:

The 8-bit data elements in Rs1 are right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit of the data elements. The shift amount is specified by the low-order 3-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

## **Operations:**

```
sa = Rs2[2:0];
if (sa > 0) {
   if (`.u` form) { // SRA8.u
      res[7:-1] = SE9(Rs1.B[x][7:sa-1]) + 1;
      Rd.B[x] = res[7:0];
   } else { // SRA8
      Rd.B[x] = SE8(Rs1.B[x][7:sa]);
   }
} else {
   Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- b [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SRL8 (unsigned long a, unsigned int b)

SRL8 (SIMD 8-bit Shift Right Logical)

Type: SIMD

#### Syntax:

```
SRL8 Rd, Rs1, Rs2
SRL8.u Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit elements logical right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

#### **Description**:

The 8-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the low-order 3-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

### **Operations:**

```
sa = Rs2[2:0];
if (sa > 0) {
   if (`.u` form) { // SRL8.u
      res[8:0] = ZE9(Rs1.B[x][7:sa-1]) + 1;
      Rd.B[x] = res[8:1];
   } else { // SRL8
      Rd.B[x] = ZE8(Rs1.B[x][7:sa]);
   }
} else {
   Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SRL8\_U (unsigned long a, unsigned int b)

SRL8.u (SIMD 8-bit Rounding Shift Right Logical)

Type: SIMD

#### Syntax:

```
SRL8 Rd, Rs1, Rs2
SRL8.u Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit elements logical right shift operations simultaneously. The shift amount is a variable from a GPR. The .u form performs additional rounding up operations on the shifted results.

### **Description:**

The 8-bit data elements in Rs1 are right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the low-order 3-bits of the value in the Rs2 register. For the rounding operation of the .u form, a value of 1 is added to the most significant discarded bit of each 8-bit data element to calculate the final results. And the results are written to Rd.

#### **Operations:**

```
sa = Rs2[2:0];
if (sa > 0) {
  if (`.u` form) { // SRL8.u
    res[8:0] = ZE9(Rs1.B[x][7:sa-1]) + 1;
    Rd.B[x] = res[8:1];
  } else { // SRL8
    Rd.B[x] = ZE8(Rs1.B[x][7:sa]);
  }
} else {
  Rd = Rs1;
}
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in unsigned long type

# SIMD 16-bit Compare Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_CMPEQ16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_SCMPLE16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_SCMPLT16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UCMPLE16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UCMPLT16 (unsigned long a, unsigned long b)
```

group NMSIS_Core_DSP_Intrinsic_SIMD_16B_CMP

SIMD 16-bit Compare Instructions.

There are 5 SIMD 16-bit Compare instructions.

### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_CMPEQ16 (unsigned long a, unsigned long b)
```

CMPEQ16 (SIMD 16-bit Integer Compare Equal)

Type: SIMD

#### **Syntax:**

```
CMPEQ16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit integer elements equal comparisons simultaneously.

#### **Description**:

This instruction compares the 16-bit integer elements in Rs1 with the 16-bit integer elements in Rs2 to see if they are equal. If they are equal, the result is 0xFFFF; otherwise, the result is 0x0. The 16-bit element comparison results are written to Rd.

#### Note:

This instruction can be used for either signed or unsigned numbers.

### **Operations:**

```
Rd.H[x] = (Rs1.H[x] == Rs2.H[x])? 0xffff : 0x0;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type
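A small C model may help to show the mask layout that CMPEQ16 produces. The sketch covers one 32-bit register (RV32 view, x = 1...0); `ref_cmpeq16` is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical reference model of CMPEQ16 on a 32-bit register (RV32 view). */
static uint32_t ref_cmpeq16(uint32_t a, uint32_t b)
{
    uint32_t rd = 0;

    for (unsigned x = 0; x < 2; x++) {
        uint32_t ea = (a >> (16 * x)) & 0xFFFFu;   /* Rs1.H[x] */
        uint32_t eb = (b >> (16 * x)) & 0xFFFFu;   /* Rs2.H[x] */
        if (ea == eb) {
            rd |= (uint32_t)0xFFFFu << (16 * x);   /* all ones marks a match */
        }
    }
    return rd;
}
```

Because a matching element yields all ones and a mismatch yields all zeros, the result can be used directly as a bit mask in later AND/OR operations.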

```
__STATIC_FORCEINLINE unsigned long __RV_SCMPLE16 (unsigned long a, unsigned long b)
```

SCMPLE16 (SIMD 16-bit Signed Compare Less Than & Equal)

Type: SIMD

#### Syntax:

```
SCMPLE16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit signed integer elements less than & equal comparisons simultaneously.

#### **Description**:

This instruction compares the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2 to see if the one in Rs1 is less than or equal to the one in Rs2. If it is true, the result is 0xFFFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

#### **Operations:**

```
Rd.H[x] = (Rs1.H[x] <= Rs2.H[x])? 0xffff : 0x0;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- **a** [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SCMPLT16 (unsigned long a, unsigned long b)
SCMPLT16 (SIMD 16-bit Signed Compare Less Than)

Type: SIMD

#### Syntax:

```
SCMPLT16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit signed integer elements less than comparisons simultaneously.

#### **Description**:

This instruction compares the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2 to see if the one in Rs1 is less than the one in Rs2. If it is true, the result is 0xFFFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

#### **Operations:**

```
Rd.H[x] = (Rs1.H[x] < Rs2.H[x])? 0xffff : 0x0;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UCMPLE16 (unsigned long a, unsigned long b)
UCMPLE16 (SIMD 16-bit Unsigned Compare Less Than & Equal)

Type: SIMD

### Syntax:

```
UCMPLE16 Rd, Rs1, Rs2
```

### Purpose:

Do 16-bit unsigned integer elements less than & equal comparisons simultaneously.

### **Description**:

This instruction compares the 16-bit unsigned integer elements in Rs1 with the 16-bit unsigned integer elements in Rs2 to see if the one in Rs1 is less than or equal to the one in Rs2. If it is true, the result is 0xFFFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

### **Operations:**

```
Rd.H[x] = (Rs1.H[x] <=u Rs2.H[x])? 0xffff : 0x0;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UCMPLT16 (unsigned long a, unsigned long b)
UCMPLT16 (SIMD 16-bit Unsigned Compare Less Than)

Type: SIMD

Syntax:

```
UCMPLT16 Rd, Rs1, Rs2
```

#### Purpose:

Do 16-bit unsigned integer elements less than comparisons simultaneously.

#### **Description**:

This instruction compares the 16-bit unsigned integer elements in Rs1 with the 16-bit unsigned integer elements in Rs2 to see if the one in Rs1 is less than the one in Rs2. If it is true, the result is 0xFFFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

#### **Operations:**

```
Rd.H[x] = (Rs1.H[x] <u Rs2.H[x])? 0xffff : 0x0;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type
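The signed and unsigned less-than compares differ only in how each 16-bit element is interpreted. The following C sketch contrasts SCMPLT16 and UCMPLT16 on one 32-bit register (RV32 view); `ref_cmplt16`, `sext16` and the `is_signed` flag are hypothetical names introduced for this example.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    int32_t e = (int32_t)(v & 0xFFFFu);
    return (e >= 32768) ? e - 65536 : e;
}

/* Hypothetical model: is_signed = 1 mimics SCMPLT16, 0 mimics UCMPLT16. */
static uint32_t ref_cmplt16(uint32_t a, uint32_t b, int is_signed)
{
    uint32_t rd = 0;

    for (unsigned x = 0; x < 2; x++) {
        uint32_t ua = (a >> (16 * x)) & 0xFFFFu;
        uint32_t ub = (b >> (16 * x)) & 0xFFFFu;
        int lt = is_signed ? (sext16(ua) < sext16(ub)) : (ua < ub);
        if (lt) {
            rd |= (uint32_t)0xFFFFu << (16 * x);
        }
    }
    return rd;
}
```

For example, the element 0x8000 is less than 0x0001 when read as signed (-32768), but greater when read as unsigned (32768).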

# SIMD 8-bit Compare Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_CMPEQ8 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_SCMPLE8 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_SCMPLT8 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UCMPLE8 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UCMPLT8 (unsigned long a, unsigned long b)
```

group NMSIS_Core_DSP_Intrinsic_SIMD_8B_CMP

SIMD 8-bit Compare Instructions.

There are 5 SIMD 8-bit Compare instructions.

### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_CMPEQ8 (unsigned long a, unsigned long b)
```

CMPEQ8 (SIMD 8-bit Integer Compare Equal)

Type: SIMD

### Syntax:

```
CMPEQ8 Rd, Rs1, Rs2
```

### Purpose:

Do 8-bit integer elements equal comparisons simultaneously.

### **Description**:

This instruction compares the 8-bit integer elements in Rs1 with the 8-bit integer elements in Rs2 to see if they are equal. If they are equal, the result is 0xFF; otherwise, the result is 0x0. The 8-bit element comparison results are written to Rd.

#### Note:

This instruction can be used for either signed or unsigned numbers.

### **Operations:**

```
Rd.B[x] = (Rs1.B[x] == Rs2.B[x])? 0xff : 0x0;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type
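As an illustration of how the byte mask from CMPEQ8 might be used, the C sketch below models the instruction on one 32-bit register (RV32 view) and builds a simple zero-byte test on top of it. Both `ref_cmpeq8` and `ref_has_zero_byte` are hypothetical helpers for this example, not NMSIS functions.

```c
#include <stdint.h>

/* Hypothetical reference model of CMPEQ8 on a 32-bit register (RV32 view). */
static uint32_t ref_cmpeq8(uint32_t a, uint32_t b)
{
    uint32_t rd = 0;

    for (unsigned x = 0; x < 4; x++) {
        uint32_t ea = (a >> (8 * x)) & 0xFFu;     /* Rs1.B[x] */
        uint32_t eb = (b >> (8 * x)) & 0xFFu;     /* Rs2.B[x] */
        if (ea == eb) {
            rd |= (uint32_t)0xFFu << (8 * x);
        }
    }
    return rd;
}

/* Returns 1 if any byte of word is zero, for example when scanning
 * a C string one word at a time. */
static int ref_has_zero_byte(uint32_t word)
{
    return ref_cmpeq8(word, 0u) != 0u;
}
```

Comparing against an all-zero operand turns the equality mask into a per-byte "is zero" indicator.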

```
__STATIC_FORCEINLINE unsigned long __RV_SCMPLE8 (unsigned long a, unsigned long b)
```

SCMPLE8 (SIMD 8-bit Signed Compare Less Than & Equal)

Type: SIMD

#### Syntax:

```
SCMPLE8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit signed integer elements less than & equal comparisons simultaneously.

#### **Description**:

This instruction compares the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2 to see if the one in Rs1 is less than or equal to the one in Rs2. If it is true, the result is 0xFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

## **Operations:**

```
Rd.B[x] = (Rs1.B[x] <= Rs2.B[x])? 0xff : 0x0;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- **a** [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_SCMPLT8 (unsigned long a, unsigned long b)
```

SCMPLT8 (SIMD 8-bit Signed Compare Less Than)

Type: SIMD

#### Syntax:

```
SCMPLT8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit signed integer elements less than comparisons simultaneously.

### **Description**:

This instruction compares the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2 to see if the one in Rs1 is less than the one in Rs2. If it is true, the result is 0xFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

#### **Operations:**

```
Rd.B[x] = (Rs1.B[x] < Rs2.B[x])? 0xff : 0x0;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UCMPLE8 (unsigned long a, unsigned long b)

UCMPLE8 (SIMD 8-bit Unsigned Compare Less Than & Equal)

Type: SIMD

#### Syntax:

```
UCMPLE8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit unsigned integer elements less than & equal comparisons simultaneously.

### **Description**:

This instruction compares the 8-bit unsigned integer elements in Rs1 with the 8-bit unsigned integer elements in Rs2 to see if the one in Rs1 is less than or equal to the one in Rs2. If it is true, the result is 0xFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

### **Operations:**

```
Rd.B[x] = (Rs1.B[x] <=u Rs2.B[x])? 0xff : 0x0;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UCMPLT8 (unsigned long a, unsigned long b)

UCMPLT8 (SIMD 8-bit Unsigned Compare Less Than)

Type: SIMD

#### Syntax:

```
UCMPLT8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit unsigned integer elements less than comparisons simultaneously.

### **Description**:

This instruction compares the 8-bit unsigned integer elements in Rs1 with the 8-bit unsigned integer elements in Rs2 to see if the one in Rs1 is less than the one in Rs2. If it is true, the result is 0xFF; otherwise, the result is 0x0. The element comparison results are written to Rd.

#### **Operations:**

```
Rd.B[x] = (Rs1.B[x] <u Rs2.B[x])? 0xff : 0x0;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type
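One common use of a per-byte compare mask is branch-free selection. The sketch below models UCMPLT8 on one 32-bit register (RV32 view) and uses the mask to pick the smaller unsigned byte from each lane. The names `ref_ucmplt8` and `ref_umin8_via_mask` are assumptions for this example; the text above does not describe such a helper.

```c
#include <stdint.h>

/* Hypothetical reference model of UCMPLT8 on a 32-bit register (RV32 view). */
static uint32_t ref_ucmplt8(uint32_t a, uint32_t b)
{
    uint32_t rd = 0;

    for (unsigned x = 0; x < 4; x++) {
        uint32_t ea = (a >> (8 * x)) & 0xFFu;     /* Rs1.B[x], unsigned */
        uint32_t eb = (b >> (8 * x)) & 0xFFu;     /* Rs2.B[x], unsigned */
        if (ea < eb) {
            rd |= (uint32_t)0xFFu << (8 * x);
        }
    }
    return rd;
}

/* Per-byte unsigned minimum built from the compare mask:
 * take a where a < b, otherwise take b. */
static uint32_t ref_umin8_via_mask(uint32_t a, uint32_t b)
{
    uint32_t m = ref_ucmplt8(a, b);
    return (a & m) | (b & ~m);
}
```

The 0xFF / 0x0 result encoding is what makes the `(a & m) | (b & ~m)` selection work without any per-lane branching.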

# SIMD 16-bit Multiply Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_KHM16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_KHMX16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long long __RV_SMUL16 (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE unsigned long long __RV_SMULX16 (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE unsigned long long __RV_UMUL16 (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE unsigned long long __RV_UMULX16 (unsigned int a, unsigned int b)
```

group NMSIS_Core_DSP_Intrinsic_SIMD_16B_MULTIPLY

SIMD 16-bit Multiply Instructions.

There are 6 SIMD 16-bit Multiply instructions.

### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_KHM16 (unsigned long a, unsigned long b)
```

KHM16 (SIMD Signed Saturating Q15 Multiply)

Type: SIMD

### Syntax:

```
KHM16 Rd, Rs1, Rs2
KHMX16 Rd, Rs1, Rs2
```

#### Purpose:

Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.

### **Description**:

For the KHM16 instruction, multiply the top 16-bit Q15 content of 32-bit chunks in Rs1 with the top 16-bit Q15 content of 32-bit chunks in Rs2. At the same time, multiply the bottom 16-bit Q15 content of 32-bit chunks in Rs1 with the bottom 16-bit Q15 content of 32-bit chunks in Rs2. For the KHMX16 instruction, multiply the top 16-bit Q15 content of 32-bit chunks in Rs1 with the bottom 16-bit Q15 content of 32-bit chunks in Rs2. At the same time, multiply the bottom 16-bit Q15 content of 32-bit chunks in Rs1 with the top 16-bit Q15 content of 32-bit chunks in Rs2. The Q30 results are then right-shifted by 15 bits and saturated into Q15 values. The Q15 results are then written into Rd. Saturation happens when both Q15 inputs of a multiplication are 0x8000; in that case the result is saturated to 0x7FFF and the overflow flag OV is set.

#### **Operations:**

```
if (is `KHM16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
  op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
  op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop | 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res= 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
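The Q15 multiply with its single saturating case can be sketched in C as follows. This models KHM16 and KHMX16 on one 32-bit chunk (the RV32 case, x = 0). The names `ref_khm16`, `q15_mul_sat`, `asr32` and `sext16`, the `crossed` flag and the `ov_flag` pointer (standing in for the OV bit) are assumptions made for this example.

```c
#include <stdint.h>

/* Arithmetic shift right without implementation-defined behaviour. */
static int32_t asr32(int32_t v, unsigned n)
{
    return (v >= 0) ? (v >> n) : ~((~v) >> n);
}

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    int32_t e = (int32_t)(v & 0xFFFFu);
    return (e >= 32768) ? e - 65536 : e;
}

/* One Q15 x Q15 product reduced back to Q15. */
static uint32_t q15_mul_sat(uint32_t aop, uint32_t bop, int *ov_flag)
{
    if (aop == 0x8000u && bop == 0x8000u) {
        *ov_flag = 1;                  /* (-1.0) * (-1.0) does not fit in Q15 */
        return 0x7FFFu;
    }
    return (uint32_t)asr32(sext16(aop) * sext16(bop), 15) & 0xFFFFu;
}

/* Hypothetical model: crossed = 0 mimics KHM16, crossed = 1 mimics KHMX16. */
static uint32_t ref_khm16(uint32_t a, uint32_t b, int crossed, int *ov_flag)
{
    uint32_t a_t = (a >> 16) & 0xFFFFu, a_b = a & 0xFFFFu;
    uint32_t b_t = (b >> 16) & 0xFFFFu, b_b = b & 0xFFFFu;
    uint32_t rest = q15_mul_sat(a_t, crossed ? b_b : b_t, ov_flag);
    uint32_t resb = q15_mul_sat(a_b, crossed ? b_t : b_b, ov_flag);

    return (rest << 16) | resb;        /* Rd.W[0] = concat(rest, resb) */
}
```

With 0x4000 representing 0.5 in Q15, multiplying 0.5 by 0.5 gives 0x2000 (0.25), while 0x8000 times 0x8000 saturates to 0x7FFF.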

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KHMX16 (unsigned long a, unsigned long b)
KHMX16 (SIMD Signed Saturating Crossed Q15 Multiply)

Type: SIMD

### Syntax:

```
KHM16 Rd, Rs1, Rs2
KHMX16 Rd, Rs1, Rs2
```

### Purpose:

Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.

#### **Description**:

For the KHM16 instruction, multiply the top 16-bit Q15 content of 32-bit chunks in Rs1 with the top 16-bit Q15 content of 32-bit chunks in Rs2. At the same time, multiply the bottom 16-bit Q15 content of 32-bit chunks in Rs1 with the bottom 16-bit Q15 content of 32-bit chunks in Rs2. For the KHMX16 instruction, multiply the top 16-bit Q15 content of 32-bit chunks in Rs1 with the bottom 16-bit Q15 content of 32-bit chunks in Rs2. At the same time, multiply the bottom 16-bit Q15 content of 32-bit chunks in Rs1 with the top 16-bit Q15 content of 32-bit chunks in Rs2. The Q30 results are then right-shifted by 15 bits and saturated into Q15 values. The Q15 results are then written into Rd. Saturation happens when both Q15 inputs of a multiplication are 0x8000; in that case the result is saturated to 0x7FFF and the overflow flag OV is set.

#### **Operations:**

```
if (is `KHM16`) {
    op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
    op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
    op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
    op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
    if (0x8000 != aop | 0x8000 != bop) {
        res = (aop s* bop) >> 15;
    } else {
        res= 0x7FFF;
        OV = 1;
    }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_SMUL16 (unsigned int a, unsigned int b)

SMUL16 (SIMD Signed 16-bit Multiply)

Type: SIMD

#### Syntax:

```
SMUL16 Rd, Rs1, Rs2
SMULX16 Rd, Rs1, Rs2
```

#### Purpose:

Do signed 16-bit multiplications and generate two 32-bit results simultaneously.

#### **RV32 Description**:

For the SMUL16 instruction, multiply the top 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

#### **RV64 Description**:

For the SMUL16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. The two 32-bit Q30 results are then written into Rd. The result calculated from the top 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[0].

# **Operations:**

```
* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
```

### **Parameters**

- a [in] unsigned int type of value stored in a
- **b [in]** unsigned int type of value stored in b

**Returns** value stored in unsigned long long type
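As a rough illustration of the RV32 behaviour described above, the following host-side sketch models `SMUL16` in portable C. The helper names `sext16` and `ref_smul16` are hypothetical and not part of NMSIS. The sketch also assumes that the `unsigned long long` return value carries the odd register (top result) in bits [63:32] and the even register (bottom result) in bits [31:0].

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x (hypothetical helper). */
static int32_t sext16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Hypothetical reference model of SMUL16 on RV32, not the NMSIS code.
 * Assumption: bits [63:32] = top(a) * top(b), bits [31:0] = bottom(a) * bottom(b). */
static uint64_t ref_smul16(uint32_t a, uint32_t b)
{
    int32_t res_t = sext16(a >> 16) * sext16(b >> 16); /* Q15 x Q15 -> Q30 */
    int32_t res_b = sext16(a) * sext16(b);

    return ((uint64_t)(uint32_t)res_t << 32) | (uint32_t)res_b;
}
```

Under these assumptions, `ref_smul16(0x00028000, 0x0003FFFF)` multiplies 2 by 3 in the top lanes and -32768 by -1 in the bottom lanes, giving `0x0000000600008000`.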

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_SMULX16 (unsigned int a, unsigned int b) SMULX16 (SIMD Signed Crossed 16-bit Multiply)

Type: SIMD

#### Syntax:

```
SMUL16 Rd, Rs1, Rs2
SMULX16 Rd, Rs1, Rs2
```

#### Purpose:

Do signed 16-bit multiplications and generate two 32-bit results simultaneously.

# **RV32 Description:**

For the SMUL16 instruction, multiply the top 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

### **RV64 Description:**

For the SMUL16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. The two 32-bit Q30 results are then written into Rd. The result calculated from the top 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[1]. And the result calculated from the bottom 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[0]

### **Operations:**

```
* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b [in]** unsigned int type of value stored in b

**Returns** value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_UMUL16 (unsigned int a, unsigned int b) UMUL16 (SIMD Unsigned 16-bit Multiply)

Type: SIMD

## Syntax:

```
UMUL16 Rd, Rs1, Rs2
UMULX16 Rd, Rs1, Rs2
```

### Purpose:

Do unsigned 16-bit multiplications and generate two 32-bit results simultaneously.

#### **RV32 Description:**

For the UMUL16 instruction, multiply the top 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

#### **RV64 Description:**

For the UMUL16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. The two 32-bit U32 results are then written into Rd. The result calculated from the top 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[1]. And the result calculated from the bottom 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[0].

#### **Operations:**

```
* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long long type

```
__STATIC_FORCEINLINE unsigned long long __RV_UMULX16 (unsigned int a, unsigned int b)
    UMULX16 (SIMD Unsigned Crossed 16-bit Multiply)
```

Type: SIMD

# Syntax:

```
UMUL16 Rd, Rs1, Rs2
UMULX16 Rd, Rs1, Rs2
```

### Purpose:

Do unsigned 16-bit multiplications and generate two 32-bit results simultaneously.

### **RV32 Description:**

For the UMUL16 instruction, multiply the top 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

### **RV64 Description:**

For the UMUL16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. The two 32-bit U32 results are then written into Rd. The result calculated from the top 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[1]. And the result calculated from the bottom 16-bit of the lower 32-bit word in Rs1 is written to Rd.W[0].

### **Operations:**

```
* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long long type
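The crossed pairing can be easy to misread, so here is a minimal host-side sketch of the RV32 `UMULX16` behaviour described above. The function name `ref_umulx16` is hypothetical, and the layout of the 64-bit return value (top result in bits [63:32], bottom result in bits [31:0]) is an assumption based on the even/odd register pair description.

```c
#include <stdint.h>

/* Hypothetical reference model of UMULX16 on RV32, not the NMSIS code.
 * The top half of a is paired with the bottom half of b, and vice versa. */
static uint64_t ref_umulx16(uint32_t a, uint32_t b)
{
    uint32_t a_top = a >> 16, a_bot = a & 0xFFFFu;
    uint32_t b_top = b >> 16, b_bot = b & 0xFFFFu;

    uint32_t res_t = a_top * b_bot; /* U16 x U16 always fits in 32 bits */
    uint32_t res_b = a_bot * b_top;

    return ((uint64_t)res_t << 32) | res_b;
}
```

With inputs `0x00010002` and `0x00030004`, the model computes 1 x 4 for the upper word and 2 x 3 for the lower word, i.e. `0x0000000400000006`.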

# **SIMD 8-bit Multiply Instructions**

```
__STATIC_FORCEINLINE unsigned long __RV_KHM8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_KHMX8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long long __RV_SMUL8 (unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_SMULX8 (unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_UMUL8 (unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_UMULX8 (unsigned int a, unsigned int b)
```

group NMSIS_Core_DSP_Intrinsic_SIMD_8B_MULTIPLY

SIMD 8-bit Multiply Instructions.

There are 6 SIMD 8-bit Multiply instructions.

#### **Functions**

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KHM8 (unsigned long a, unsigned long b)
KHM8 (SIMD Signed Saturating Q7 Multiply)

Type: SIMD

#### Syntax:

```
KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2
```

#### Purpose:

Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.

### **Description**:

For the KHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. For the KHMX8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. The Q14 results are then right-shifted by 7 bits and saturated into Q7 values. The Q7 results are then written into Rd. When both Q7 inputs of a multiplication are 0x80, saturation happens: the result is saturated to 0x7F and the overflow flag OV is set.

# **Operations:**

```
if (is `KHM8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
  op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x]; // Rs1 top
  op1b = Rs1.B[x]; op2b = Rs2.B[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x80 != aop | 0x80 != bop) {
    res = (aop s* bop) >> 7;
  } else {
    res= 0x7F;
    OV = 1;
  }
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
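The saturation rule above can be sketched on a host machine as follows. This is only an illustrative RV32 model: `sext8`, `floor_shift7`, and `ref_khm8` are hypothetical names, and the `ov` out-parameter stands in for the OV flag, which the real instruction keeps in processor state.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of x (hypothetical helper). */
static int32_t sext8(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFu);
    return (v & 0x80) ? v - 0x100 : v;
}

/* Arithmetic shift right by 7 (floor division by 128) without relying on
 * implementation-defined shifts of negative values. */
static int32_t floor_shift7(int32_t p)
{
    return (p >= 0) ? (p >> 7) : -((-p + 127) >> 7);
}

/* Hypothetical reference model of KHM8 on RV32 (four Q7 lanes).
 * *ov is set to 1 when any lane saturates and is left unchanged otherwise. */
static uint32_t ref_khm8(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int32_t x = sext8(a >> (8 * i));
        int32_t y = sext8(b >> (8 * i));
        int32_t r;
        if (x == -128 && y == -128) {
            r = 0x7F;          /* -1.0 * -1.0 is not representable in Q7 */
            *ov = 1;
        } else {
            r = floor_shift7(x * y); /* Q14 -> Q7 */
        }
        rd |= ((uint32_t)r & 0xFFu) << (8 * i);
    }
    return rd;
}
```

For instance, the lane `0x40 * 0x40` (0.5 x 0.5) yields `0x20` (0.25), while the lane `0x80 * 0x80` saturates to `0x7F` and raises the modelled flag.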

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KHMX8 (unsigned long a, unsigned long b)
KHMX8 (SIMD Signed Saturating Crossed Q7 Multiply)

Type: SIMD

### **Syntax:**

```
KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2
```

#### Purpose:

Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.

### **Description:**

For the KHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. For the KHMX8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. The Q14 results are then right-shifted by 7 bits and saturated into Q7 values. The Q7 results are then written into Rd. When both Q7 inputs of a multiplication are 0x80, saturation happens: the result is saturated to 0x7F and the overflow flag OV is set.

# **Operations:**

```
if (is `KHM8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
  op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x]; // Rs1 top
  op1b = Rs1.B[x]; op2b = Rs2.B[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x80 != aop | 0x80 != bop) {
    res = (aop s* bop) >> 7;
  } else {
    res= 0x7F;
    OV = 1;
  }
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
```

# **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_SMUL8 (unsigned int a, unsigned int b) SMUL8 (SIMD Signed 8-bit Multiply)

Type: SIMD

# Syntax:

```
SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2
```

#### Purpose:

Do signed 8-bit multiplications and generate four 16-bit results simultaneously.

# **RV32 Description:**

For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

### **RV64 Description:**

For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

### **Operations:**

```
* RV32:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long long type
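Reading the operations above, result halfword `i` of the 64-bit value is the product of byte `i` of the first operand and either byte `i` (`SMUL8`) or byte `i ^ 1` (`SMULX8`) of the second. The sketch below models that on a host. The names `sext8` and `ref_smul8` and the `crossed` parameter are hypothetical, and the 64-bit layout (odd register in the upper word) is an assumption.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of x (hypothetical helper). */
static int32_t sext8(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFu);
    return (v & 0x80) ? v - 0x100 : v;
}

/* Hypothetical reference model of SMUL8 (crossed == 0) and SMULX8
 * (crossed != 0) on RV32. Each 16-bit lane i of the result holds
 * byte i of a times byte i (or i ^ 1 when crossed) of b. */
static uint64_t ref_smul8(uint32_t a, uint32_t b, int crossed)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int j = crossed ? (i ^ 1) : i;
        int32_t p = sext8(a >> (8 * i)) * sext8(b >> (8 * j));
        rd |= (uint64_t)((uint32_t)p & 0xFFFFu) << (16 * i);
    }
    return rd;
}
```

For `a = 0x80FF0203` and `b = 0x80020304`, the straight variant produces `0x4000FFFE0006000C` and the crossed variant produces `0xFF00008000080009`.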

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_SMULX8 (unsigned int a, unsigned int b) SMULX8 (SIMD Signed Crossed 8-bit Multiply)

Type: SIMD

#### Syntax:

```
SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2
```

#### Purpose:

Do signed 8-bit multiplications and generate four 16-bit results simultaneously.

# **RV32 Description:**

For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

### **RV64 Description**:

For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

### **Operations:**

```
* RV32:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b [in]** unsigned int type of value stored in b

**Returns** value stored in unsigned long long type

```
__STATIC_FORCEINLINE unsigned long long __RV_UMUL8 (unsigned int a, unsigned int b) UMUL8 (SIMD Unsigned 8-bit Multiply)
```

Type: SIMD

# Syntax:

```
UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2
```

#### Purpose:

Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.

#### **RV32 Description**:

For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

#### **RV64 Description:**

For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

#### **Operations:**

```
* RV32:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_UMULX8 (unsigned int a, unsigned int b)
 UMULX8 (SIMD Unsigned Crossed 8-bit Multiply)

Type: SIMD

#### Syntax:

```
UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2
```

#### **Purpose:**

Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.

### **RV32 Description**:

For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

#### **RV64 Description:**

For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

#### **Operations:**

```
* RV32:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b [in]** unsigned int type of value stored in b

**Returns** value stored in unsigned long long type

### SIMD 16-bit Miscellaneous Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_CLRS16 (unsigned long a)

__STATIC_FORCEINLINE unsigned long __RV_CLO16 (unsigned long a)

__STATIC_FORCEINLINE unsigned long __RV_CLZ16 (unsigned long a)

__STATIC_FORCEINLINE unsigned long __RV_KABS16 (unsigned long a)

__STATIC_FORCEINLINE unsigned long __RV_SMAX16 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_SMIN16 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_UMAX16 (unsigned long a, unsigned long b)
```

```
__STATIC_FORCEINLINE unsigned long __RV_UMIN16 (unsigned long a, unsigned long b)

__RV_SCLIP16 (a, b)

__RV_UCLIP16 (a, b)
```

group NMSIS_Core_DSP_Intrinsic_SIMD_16B_MISC

SIMD 16-bit Miscellaneous Instructions.

There are 10 SIMD 16-bit Misc instructions.

#### **Defines**

\_\_RV\_SCLIP16 (a, b)
SCLIP16 (SIMD 16-bit Signed Clip Value)

Type: SIMD

#### Syntax:

```
SCLIP16 Rd, Rs1, imm4u[3:0]
```

#### Purpose:

Limit the 16-bit signed integer elements of a register into a signed range simultaneously.

# **Description**:

This instruction limits the 16-bit signed integer elements stored in Rs1 into a signed integer range between 2^imm4u-1 and -2^imm4u, and writes the limited results to Rd. For example, if imm4u is 3, the 16-bit input values should be saturated between 7 and -8. If saturation is performed, set OV bit to 1.

# **Operations**:

```
src = Rs1.H[x];
if (src > (2^imm4u)-1) {
    src = (2^imm4u)-1;
    OV = 1;
} else if (src < -2^imm4u) {
    src = -2^imm4u;
    OV = 1;
}
Rd.H[x] = src;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

Returns value stored in unsigned long type
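The clipping rule can be modelled on a host as in the following sketch for RV32 (two 16-bit lanes). The names `sext16` and `ref_sclip16` are hypothetical, and the `ov` out-parameter only stands in for the OV flag of the real instruction.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x (hypothetical helper). */
static int32_t sext16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Hypothetical reference model of SCLIP16 on RV32.
 * Each lane is limited to [-2^imm4u, 2^imm4u - 1]; *ov is set to 1
 * when any lane is clipped and is left unchanged otherwise. */
static uint32_t ref_sclip16(uint32_t a, unsigned imm4u, int *ov)
{
    int32_t hi = ((int32_t)1 << (imm4u & 0xFu)) - 1;
    int32_t lo = -((int32_t)1 << (imm4u & 0xFu));
    uint32_t rd = 0;

    for (int x = 0; x < 2; x++) {
        int32_t v = sext16(a >> (16 * x));
        if (v > hi) {
            v = hi;
            *ov = 1;
        } else if (v < lo) {
            v = lo;
            *ov = 1;
        }
        rd |= ((uint32_t)v & 0xFFFFu) << (16 * x);
    }
    return rd;
}
```

With `imm4u` = 3, the input `0x0064FF9C` (elements 100 and -100) clips to `0x0007FFF8` (7 and -8) and sets the modelled flag.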

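\_\_RV\_UCLIP16 (a, b)
UCLIP16 (SIMD 16-bit Unsigned Clip Value)

Type: SIMD

#### Syntax:

```
UCLIP16 Rd, Rs1, imm4u[3:0]
```

#### Purpose:

Limit the 16-bit signed elements of a register into an unsigned range simultaneously.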

#### **Description**:

This instruction limits the 16-bit signed elements stored in Rs1 into an unsigned integer range between 2^imm4u-1 and 0, and writes the limited results to Rd. For example, if imm4u is 3, the 16-bit input values should be saturated between 7 and 0. If saturation is performed, set OV bit to 1.

# **Operations:**

```
src = Rs1.H[x];
if (src > (2^imm4u)-1) {
    src = (2^imm4u)-1;
    OV = 1;
} else if (src < 0) {
    src = 0;
    OV = 1;
}
Rd.H[x] = src;
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in unsigned long type
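A point worth illustrating is that the inputs are still interpreted as signed values, so negative elements clamp to 0 rather than to the upper bound. The sketch below models this for RV32. `sext16` and `ref_uclip16` are hypothetical names, and `ov` is a stand-in for the OV flag.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x (hypothetical helper). */
static int32_t sext16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Hypothetical reference model of UCLIP16 on RV32.
 * Each signed lane is limited to [0, 2^imm4u - 1]; *ov is set to 1
 * when any lane is clipped and is left unchanged otherwise. */
static uint32_t ref_uclip16(uint32_t a, unsigned imm4u, int *ov)
{
    int32_t hi = ((int32_t)1 << (imm4u & 0xFu)) - 1;
    uint32_t rd = 0;

    for (int x = 0; x < 2; x++) {
        int32_t v = sext16(a >> (16 * x));
        if (v > hi) {
            v = hi;
            *ov = 1;
        } else if (v < 0) {
            v = 0;       /* negative inputs clamp to zero */
            *ov = 1;
        }
        rd |= (uint32_t)v << (16 * x);
    }
    return rd;
}
```

With `imm4u` = 3, the input `0x0064FF9C` (elements 100 and -100) becomes `0x00070000` (7 and 0).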

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_CLRS16 (unsigned long a) CLRS16 (SIMD 16-bit Count Leading Redundant Sign)
```

Type: SIMD

# Syntax:

```
CLRS16 Rd, Rs1
```

#### **Purpose:**

Count the number of redundant sign bits of the 16-bit elements of a general register.

# **Description**:

Starting from the bits next to the sign bits of the 16-bit elements of Rs1, this instruction counts the number of redundant sign bits and writes the result to the corresponding 16-bit elements of Rd.

### **Operations:**

```
snum[x] = Rs1.H[x];
cnt[x] = 0;
for (i = 14 to 0) {
  if (snum[x](i) == snum[x](15)) {
    cnt[x] = cnt[x] + 1;
  } else {
    break;
  }
}
Rd.H[x] = cnt[x];
for RV32: x=1...0
for RV64: x=3...0
```

Parameters a – [in] unsigned long type of value stored in a

Returns value stored in unsigned long type
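The counting loop above translates almost directly into C. The following host-side sketch models the RV32 case with two lanes; `ref_clrs16` is a hypothetical name, not an NMSIS function.

```c
#include <stdint.h>

/* Hypothetical reference model of CLRS16 on RV32.
 * For each 16-bit lane, count how many bits below bit 15 repeat the sign bit
 * before the first differing bit (result range 0..15). */
static uint32_t ref_clrs16(uint32_t a)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        uint32_t h = (a >> (16 * x)) & 0xFFFFu;
        uint32_t sign = (h >> 15) & 1u;
        uint32_t cnt = 0;
        for (int i = 14; i >= 0; i--) {
            if (((h >> i) & 1u) != sign) {
                break;
            }
            cnt++;
        }
        rd |= cnt << (16 * x);
    }
    return rd;
}
```

For example, the lane `0x0001` has 14 redundant sign bits and the lane `0xFFFF` has 15, so `ref_clrs16(0x0001FFFF)` returns `0x000E000F`.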

```
__STATIC_FORCEINLINE unsigned long __RV_CLO16 (unsigned long a) CLO16 (SIMD 16-bit Count Leading One)
```

Type: SIMD

# Syntax:

```
CLO16 Rd, Rs1
```

### Purpose:

Count the number of leading one bits of the 16-bit elements of a general register.

### **Description:**

Starting from the most significant bits of the 16-bit elements of Rs1, this instruction counts the number of leading one bits and writes the results to the corresponding 16-bit elements of Rd.

### **Operations:**

```
snum[x] = Rs1.H[x];
cnt[x] = 0;
for (i = 15 to 0) {
   if (snum[x](i) == 1) {
      cnt[x] = cnt[x] + 1;
   } else {
      break;
   }
}
Rd.H[x] = cnt[x];
for RV32: x=1...0
for RV64: x=3...0
```

Parameters a – [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_CLZ16 (unsigned long a) CLZ16 (SIMD 16-bit Count Leading Zero)
```

Type: SIMD

### Syntax:

```
CLZ16 Rd, Rs1
```

### Purpose:

Count the number of leading zero bits of the 16-bit elements of a general register.

### **Description**:

Starting from the most significant bits of the 16-bit elements of Rs1, this instruction counts the number of leading zero bits and writes the results to the corresponding 16-bit elements of Rd.

## **Operations:**

```
snum[x] = Rs1.H[x];
cnt[x] = 0;
for (i = 15 to 0) {
   if (snum[x](i) == 0) {
     cnt[x] = cnt[x] + 1;
   } else {
     break;
   }
}
Rd.H[x] = cnt[x];
for RV32: x=1...0
for RV64: x=3...0
```

Parameters a – [in] unsigned long type of value stored in a

Returns value stored in unsigned long type
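A minimal portable sketch of the CLZ16 operation, assuming a 32-bit register (RV32). The function name `clz16_ref` is hypothetical and is not part of NMSIS.

```c
#include <stdint.h>

/* Hypothetical reference model of CLZ16 for a 32-bit register (RV32). */
static uint32_t clz16_ref(uint32_t rs1)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {                  /* two 16-bit lanes */
        uint16_t snum = (uint16_t)(rs1 >> (16 * x));
        uint32_t cnt = 0;
        /* Count zero bits from bit 15 downwards until the first one bit. */
        for (int i = 15; i >= 0 && ((snum >> i) & 1u) == 0; i--) {
            cnt++;
        }
        rd |= cnt << (16 * x);
    }
    return rd;
}
```

Note that an all-zero lane yields 16, which still fits in the 16-bit result element.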

```
__STATIC_FORCEINLINE unsigned long __RV_KABS16 (unsigned long a)
KABS16 (SIMD 16-bit Saturating Absolute)
```

Type: SIMD

## Syntax:

```
KABS16 Rd, Rs1
```

### Purpose:

Get the absolute value of 16-bit signed integer elements simultaneously.

#### **Description**:

This instruction calculates the absolute value of 16-bit signed integer elements stored in Rs1 and writes the element results to Rd. If the input number is 0x8000, this instruction generates 0x7fff as the output and sets the OV bit to 1.

#### **Operations:**

```
src = Rs1.H[x];
if (src == 0x8000) {
    src = 0x7fff;
    OV = 1;
} else if (src[15] == 1) {
    src = -src;
}
Rd.H[x] = src;
for RV32: x=1...0,
for RV64: x=3...0
```

**Parameters**  $\mathbf{a} - [\mathbf{in}]$  unsigned long type of value stored in a

**Returns** value stored in unsigned long type
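The saturating behaviour described above can be modelled in portable C. In this sketch the name `kabs16_ref` and the `ov` out-parameter (a stand-in for the OV bit) are assumptions made for illustration, and a 32-bit register (RV32) is assumed.

```c
#include <stdint.h>

/* Hypothetical reference model of KABS16 for a 32-bit register (RV32).
 * *ov is set to 1 when a lane saturates; it is never cleared here. */
static uint32_t kabs16_ref(uint32_t rs1, int *ov)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        uint16_t src = (uint16_t)(rs1 >> (16 * x));
        if (src == 0x8000u) {                      /* -32768 has no positive twin */
            src = 0x7fffu;
            *ov = 1;
        } else if (src & 0x8000u) {                /* negative: negate modulo 2^16 */
            src = (uint16_t)(0u - src);
        }
        rd |= (uint32_t)src << (16 * x);
    }
    return rd;
}
```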

```
__STATIC_FORCEINLINE unsigned long __RV_SMAX16 (unsigned long a, unsigned long b) SMAX16 (SIMD 16-bit Signed Maximum)
```

Type: SIMD

### Syntax:

```
SMAX16 Rd, Rs1, Rs2
```

#### Purpose:

Find the maximum of each pair of 16-bit signed integer elements simultaneously.

# **Description**:

This instruction compares the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2 and selects the greater element of each pair. The selected results are written to Rd.

### **Operations:**

```
Rd.H[x] = (Rs1.H[x] > Rs2.H[x])? Rs1.H[x] : Rs2.H[x];
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
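As an illustration of the lane-wise signed comparison, here is a small portable model. The name `smax16_ref` is hypothetical, a 32-bit register (RV32) is assumed, and the casts rely on the usual two's-complement conversion to `int16_t`.

```c
#include <stdint.h>

/* Hypothetical reference model of SMAX16 for 32-bit registers (RV32). */
static uint32_t smax16_ref(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        /* Reinterpret each 16-bit lane as a signed value. */
        int16_t a = (int16_t)(uint16_t)(rs1 >> (16 * x));
        int16_t b = (int16_t)(uint16_t)(rs2 >> (16 * x));
        int16_t m = (a > b) ? a : b;
        rd |= (uint32_t)(uint16_t)m << (16 * x);
    }
    return rd;
}
```

With a signed comparison, a lane holding 0xFFFF is treated as -1 and therefore loses against a lane holding 0x0001.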

```
__STATIC_FORCEINLINE unsigned long __RV_SMIN16 (unsigned long a, unsigned long b) SMIN16 (SIMD 16-bit Signed Minimum)
```

Type: SIMD

#### Syntax:

```
SMIN16 Rd, Rs1, Rs2
```

### Purpose:

Find the minimum of each pair of 16-bit signed integer elements simultaneously.

### **Description**:

This instruction compares the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2 and selects the lesser element of each pair. The selected results are written to Rd.

## **Operations:**

```
Rd.H[x] = (Rs1.H[x] < Rs2.H[x])? Rs1.H[x] : Rs2.H[x];
for RV32: x=1...0,
for RV64: x=3...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_UMAX16 (unsigned long a, unsigned long b) UMAX16 (SIMD 16-bit Unsigned Maximum)
```

Type: SIMD

### Syntax:

```
UMAX16 Rd, Rs1, Rs2
```

#### Purpose:

Find the maximum of each pair of 16-bit unsigned integer elements simultaneously.

# **Description**:

This instruction compares the 16-bit unsigned integer elements in Rs1 with the 16-bit unsigned integer elements in Rs2 and selects the greater element of each pair. The selected results are written to Rd.

### **Operations:**

```
Rd.H[x] = (Rs1.H[x] >u Rs2.H[x])? Rs1.H[x] : Rs2.H[x];
for RV32: x=1...0,
for RV64: x=3...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- b [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_UMIN16 (unsigned long a, unsigned long b)
UMIN16 (SIMD 16-bit Unsigned Minimum)
```

Type: SIMD

#### Syntax:

```
UMIN16 Rd, Rs1, Rs2
```

# Purpose:

Find the minimum of each pair of 16-bit unsigned integer elements simultaneously.

#### **Description**:

This instruction compares the 16-bit unsigned integer elements in Rs1 with the 16-bit unsigned integer elements in Rs2 and selects the lesser element of each pair. The selected results are written to Rd.

### **Operations:**

```
Rd.H[x] = (Rs1.H[x] <u Rs2.H[x])? Rs1.H[x] : Rs2.H[x];
for RV32: x=1...0,
for RV64: x=3...0
```

### **Parameters**

• a – [in] unsigned long type of value stored in a

• **b** – [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
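The unsigned comparison can be sketched in portable C as below. The name `umin16_ref` is hypothetical and a 32-bit register (RV32) is assumed.

```c
#include <stdint.h>

/* Hypothetical reference model of UMIN16 for 32-bit registers (RV32). */
static uint32_t umin16_ref(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int x = 0; x < 2; x++) {
        /* Lanes are compared as unsigned 16-bit values. */
        uint16_t a = (uint16_t)(rs1 >> (16 * x));
        uint16_t b = (uint16_t)(rs2 >> (16 * x));
        uint16_t m = (a < b) ? a : b;
        rd |= (uint32_t)m << (16 * x);
    }
    return rd;
}
```

Here a lane holding 0xFFFF is the largest possible value, so the other operand is selected as the minimum.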

#### SIMD 8-bit Miscellaneous Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_CLRS8 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_CLO8 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_CLZ8 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_KABS8 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_SMAX8 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_SMIN8 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UMAX8 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_UMIN8 (unsigned long a, unsigned long b)
__RV_SCLIP8 (a, b)
__RV_UCLIP8 (a, b)
group NMSIS_Core_DSP_Intrinsic_SIMD_8B_MISC
SIMD 8-bit Miscellaneous Instructions.
```

There are 10 SIMD 8-bit Miscellaneous instructions.

### **Defines**

```
__RV_SCLIP8 (a, b)
```

SCLIP8 (SIMD 8-bit Signed Clip Value)

Type: SIMD

# Syntax:

```
SCLIP8 Rd, Rs1, imm3u[2:0]
```

# Purpose:

Limit the 8-bit signed integer elements of a register into a signed range simultaneously.

# **Description**:

This instruction limits the 8-bit signed integer elements stored in Rs1 into a signed integer range between 2^imm3u-1 and -2^imm3u, and writes the limited results to Rd. For example, if imm3u is 3, the 8-bit input values should be saturated between 7 and -8. If saturation is performed, set OV bit to 1.

# **Operations**:

```
src = Rs1.B[x];
if (src > (2^imm3u)-1) {
    src = (2^imm3u)-1;
    OV = 1;
} else if (src < -2^imm3u) {
    src = -2^imm3u;
    OV = 1;
}
Rd.B[x] = src;
for RV32: x=3...0,
for RV64: x=7...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

Returns value stored in unsigned long type
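A minimal sketch of the SCLIP8 clipping rule in portable C. The name `sclip8_ref` and the `ov` out-parameter (standing in for the OV bit) are hypothetical, and a 32-bit register (RV32, four 8-bit lanes) is assumed.

```c
#include <stdint.h>

/* Hypothetical reference model of SCLIP8 for a 32-bit register (RV32).
 * Each signed byte is clipped to [-2^imm3u, 2^imm3u - 1]. */
static uint32_t sclip8_ref(uint32_t rs1, unsigned imm3u, int *ov)
{
    int hi = (1 << (imm3u & 7u)) - 1;
    int lo = -(1 << (imm3u & 7u));
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        int src = (int8_t)(uint8_t)(rs1 >> (8 * x));
        if (src > hi) {
            src = hi;
            *ov = 1;
        } else if (src < lo) {
            src = lo;
            *ov = 1;
        }
        rd |= (uint32_t)(uint8_t)src << (8 * x);
    }
    return rd;
}
```

With imm3u = 3 the range is [-8, 7], so the bytes 0x7F and 0x80 become 0x07 and 0xF8 respectively.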

```
__RV_UCLIP8 (a, b)
```

UCLIP8 (SIMD 8-bit Unsigned Clip Value)

Type: SIMD

# Syntax:

```
UCLIP8 Rd, Rs1, imm3u
```

# Purpose:

Limit the 8-bit signed elements of a register into an unsigned range simultaneously.

### **Description**:

This instruction limits the 8-bit signed elements stored in Rs1 into an unsigned integer range between 2^imm3u-1 and 0, and writes the limited results to Rd. For example, if imm3u is 3, the 8-bit input values should be saturated between 7 and 0. If saturation is performed, set OV bit to 1.

# **Operations:**

```
src = Rs1.B[x];
if (src > (2^imm3u)-1) {
    src = (2^imm3u)-1;
    OV = 1;
} else if (src < 0) {
    src = 0;
    OV = 1;
}
Rd.B[x] = src;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in unsigned long type
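The unsigned clip can be modelled the same way. In this sketch the name `uclip8_ref` and the `ov` out-parameter are hypothetical, a 32-bit register (RV32) is assumed, and the lanes are treated as bytes, as the 8-bit description implies.

```c
#include <stdint.h>

/* Hypothetical reference model of UCLIP8 for a 32-bit register (RV32).
 * Each signed byte is clipped to [0, 2^imm3u - 1]. */
static uint32_t uclip8_ref(uint32_t rs1, unsigned imm3u, int *ov)
{
    int hi = (1 << (imm3u & 7u)) - 1;
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        int src = (int8_t)(uint8_t)(rs1 >> (8 * x));   /* input is signed */
        if (src > hi) {
            src = hi;
            *ov = 1;
        } else if (src < 0) {
            src = 0;
            *ov = 1;
        }
        rd |= (uint32_t)(uint8_t)src << (8 * x);
    }
    return rd;
}
```

With imm3u = 3, negative bytes such as 0xF9 and 0x80 are clipped to 0, and 0x7F is clipped to 7.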

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_CLRS8 (unsigned long a) CLRS8 (SIMD 8-bit Count Leading Redundant Sign)
```

Type: SIMD

### Syntax:

```
CLRS8 Rd, Rs1
```

#### Purpose:

Count the number of redundant sign bits of the 8-bit elements of a general register.

# **Description**:

Starting from the bits next to the sign bits of the 8-bit elements of Rs1, this instruction counts the number of redundant sign bits and writes the result to the corresponding 8-bit elements of Rd.

### **Operations:**

```
snum[x] = Rs1.B[x];
cnt[x] = 0;
for (i = 6 to 0) {
   if (snum[x](i) == snum[x](7)) {
      cnt[x] = cnt[x] + 1;
   } else {
      break;
   }
}
Rd.B[x] = cnt[x];
for RV32: x=3...0
for RV64: x=7...0
```

Parameters a – [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_CLO8 (unsigned long a)
```

CLO8 (SIMD 8-bit Count Leading One)

Type: SIMD

# Syntax:

```
CLO8 Rd, Rs1
```

### **Purpose:**

Count the number of leading one bits of the 8-bit elements of a general register.

# **Description**:

Starting from the most significant bits of the 8-bit elements of Rs1, this instruction counts the number of leading one bits and writes the results to the corresponding 8-bit elements of Rd.

## **Operations:**

```
snum[x] = Rs1.B[x];
cnt[x] = 0;
for (i = 7 to 0) {
   if (snum[x](i) == 1) {
      cnt[x] = cnt[x] + 1;
   } else {
      break;
   }
}
Rd.B[x] = cnt[x];
for RV32: x=3...0
for RV64: x=7...0
```

Parameters a – [in] unsigned long type of value stored in a

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_CLZ8 (unsigned long a) CLZ8 (SIMD 8-bit Count Leading Zero)
```

Type: SIMD

### Syntax:

```
CLZ8 Rd, Rs1
```

### Purpose:

Count the number of leading zero bits of the 8-bit elements of a general register.

### **Description**:

Starting from the most significant bits of the 8-bit elements of Rs1, this instruction counts the number of leading zero bits and writes the results to the corresponding 8-bit elements of Rd.

## **Operations:**

```
snum[x] = Rs1.B[x];
cnt[x] = 0;
for (i = 7 to 0) {
   if (snum[x](i) == 0) {
     cnt[x] = cnt[x] + 1;
   } else {
     break;
   }
}
Rd.B[x] = cnt[x];
for RV32: x=3...0
for RV64: x=7...0
```

Parameters a - [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_KABS8 (unsigned long a) KABS8 (SIMD 8-bit Saturating Absolute)
```

Type: SIMD

### Syntax:

```
KABS8 Rd, Rs1
```

### Purpose:

Get the absolute value of 8-bit signed integer elements simultaneously.

### **Description**:

This instruction calculates the absolute value of 8-bit signed integer elements stored in Rs1 and writes the element results to Rd. If the input number is 0x80, this instruction generates 0x7f as the output and sets the OV bit to 1.

### **Operations:**

```
src = Rs1.B[x];
if (src == 0x80) {
    src = 0x7f;
    OV = 1;
} else if (src[7] == 1) {
    src = -src;
}
Rd.B[x] = src;
for RV32: x=3...0,
for RV64: x=7...0
```

Parameters a - [in] unsigned long type of value stored in a

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_SMAX8 (unsigned long a, unsigned long b) SMAX8 (SIMD 8-bit Signed Maximum)
```

Type: SIMD

# Syntax:

```
SMAX8 Rd, Rs1, Rs2
```

#### Purpose:

Find the maximum of each pair of 8-bit signed integer elements simultaneously.

## **Description**:

This instruction compares the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2 and selects the greater element of each pair. The selected results are written to Rd.

#### **Operations:**

```
Rd.B[x] = (Rs1.B[x] > Rs2.B[x])? Rs1.B[x] : Rs2.B[x];
for RV32: x=3...0,
for RV64: x=7...0
```

# **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_SMIN8 (unsigned long a, unsigned long b) SMIN8 (SIMD 8-bit Signed Minimum)
```

Type: SIMD

#### Syntax:

```
SMIN8 Rd, Rs1, Rs2
```

### **Purpose**:

Find the minimum of each pair of 8-bit signed integer elements simultaneously.

### **Description**:

This instruction compares the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2 and selects the lesser element of each pair. The selected results are written to Rd.

### **Operations:**

```
Rd.B[x] = (Rs1.B[x] < Rs2.B[x])? Rs1.B[x] : Rs2.B[x];
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_UMAX8 (unsigned long a, unsigned long b)
UMAX8 (SIMD 8-bit Unsigned Maximum)
```

Type: SIMD

#### Syntax:

```
UMAX8 Rd, Rs1, Rs2
```

# Purpose:

Find the maximum of each pair of 8-bit unsigned integer elements simultaneously.

#### **Description**:

This instruction compares the 8-bit unsigned integer elements in Rs1 with the 8-bit unsigned integer elements in Rs2 and selects the greater element of each pair. The selected results are written to Rd.

### **Operations:**

```
Rd.B[x] = (Rs1.B[x] >u Rs2.B[x])? Rs1.B[x] : Rs2.B[x];
for RV32: x=3...0,
for RV64: x=7...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_UMIN8 (unsigned long a, unsigned long b)
UMIN8 (SIMD 8-bit Unsigned Minimum)
```

Type: SIMD

### Syntax:

```
UMIN8 Rd, Rs1, Rs2
```

### Purpose:

Find the minimum of each pair of 8-bit unsigned integer elements simultaneously.

### **Description**:

This instruction compares the 8-bit unsigned integer elements in Rs1 with the 8-bit unsigned integer elements in Rs2 and selects the lesser element of each pair. The selected results are written to Rd.

# **Operations:**

```
Rd.B[x] = (Rs1.B[x] <u Rs2.B[x])? Rs1.B[x] : Rs2.B[x];
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

Returns value stored in unsigned long type

## SIMD 8-bit Unpacking Instructions

There are 10 SIMD 8-bit Unpacking instructions.

```
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD810 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD820 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD830 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD831 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD832 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD810 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD820 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD830 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD831 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD832 (unsigned long a)
```

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD810 (unsigned long a) SUNPKD810 (Signed Unpacking Bytes 1 & 0)
```

Type: DSP

### Syntax:

```
SUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

#### Purpose:

Unpack byte *x* and byte *y* of 32-bit chunks in a register into two 16-bit signed halfwords of 32-bit chunks in a register.

### **Description**:

The SUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit signed halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

# **Operations:**

Parameters a - [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type
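The signed unpacking described above can be sketched in portable C. This is an illustrative model only: the name `sunpkd8xy_ref` and its `x`/`y` parameters are hypothetical, and a single 32-bit chunk (RV32) is assumed. SUNPKD810 corresponds to x = 1, y = 0.

```c
#include <stdint.h>

/* Hypothetical reference model of SUNPKD8xy for one 32-bit chunk.
 * Byte x goes to the top halfword, byte y to the bottom halfword,
 * each sign-extended from 8 to 16 bits. */
static uint32_t sunpkd8xy_ref(uint32_t rs1, unsigned x, unsigned y)
{
    int16_t top    = (int8_t)(uint8_t)(rs1 >> (8 * x));
    int16_t bottom = (int8_t)(uint8_t)(rs1 >> (8 * y));
    return ((uint32_t)(uint16_t)top << 16) | (uint16_t)bottom;
}
```

For the input 0x12348001 with x = 1 and y = 0, byte 1 (0x80) is sign-extended to 0xFF80 and byte 0 (0x01) to 0x0001.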

```
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD820 (unsigned long a) SUNPKD820 (Signed Unpacking Bytes 2 & 0)
```

Type: DSP

### Syntax:

```
SUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

# Purpose:

Unpack byte *x* and byte *y* of 32-bit chunks in a register into two 16-bit signed halfwords of 32-bit chunks in a register.

### **Description:**

The SUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit signed halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

#### **Operations:**

```
Rd.W[m].H[1] = SE16(Rs1.W[m].B[x])
Rd.W[m].H[0] = SE16(Rs1.W[m].B[y])
// SUNPKD810, x=1,y=0
// SUNPKD820, x=2,y=0
// SUNPKD830, x=3,y=0
// SUNPKD831, x=3,y=1
// SUNPKD832, x=3,y=2
for RV32: m=0,
for RV64: m=1...0
```

**Parameters a** – [in] unsigned long type of value stored in a

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD830 (unsigned long a) SUNPKD830 (Signed Unpacking Bytes 3 & 0)
```

Type: DSP

## Syntax:

```
SUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

## Purpose:

Unpack byte *x* and byte *y* of 32-bit chunks in a register into two 16-bit signed halfwords of 32-bit chunks in a register.

#### **Description**:

The SUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit signed halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

#### **Operations:**

```
Rd.W[m].H[1] = SE16(Rs1.W[m].B[x])
Rd.W[m].H[0] = SE16(Rs1.W[m].B[y])
// SUNPKD810, x=1,y=0
// SUNPKD820, x=2,y=0
// SUNPKD830, x=3,y=0
// SUNPKD831, x=3,y=1
// SUNPKD832, x=3,y=2
for RV32: m=0,
for RV64: m=1...0
```

**Parameters a** – [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD831 (unsigned long a) SUNPKD831 (Signed Unpacking Bytes 3 & 1)
```

Type: DSP

## Syntax:

```
SUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

#### Purpose:

Unpack byte *x* and byte *y* of 32-bit chunks in a register into two 16-bit signed halfwords of 32-bit chunks in a register.

## **Description:**

The SUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit signed halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

#### **Operations:**

```
Rd.W[m].H[1] = SE16(Rs1.W[m].B[x])
Rd.W[m].H[0] = SE16(Rs1.W[m].B[y])
// SUNPKD810, x=1,y=0
// SUNPKD820, x=2,y=0
// SUNPKD830, x=3,y=0
// SUNPKD831, x=3,y=1
// SUNPKD832, x=3,y=2
for RV32: m=0,
for RV64: m=1...0
```

Parameters a - [in] unsigned long type of value stored in a

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_SUNPKD832 (unsigned long a) SUNPKD832 (Signed Unpacking Bytes 3 & 2)
```

Type: DSP

### Syntax:

```
SUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

#### Purpose:

Unpack byte *x* and byte *y* of 32-bit chunks in a register into two 16-bit signed halfwords of 32-bit chunks in a register.

## **Description**:

The SUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit signed halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

#### **Operations:**

```
Rd.W[m].H[1] = SE16(Rs1.W[m].B[x])
Rd.W[m].H[0] = SE16(Rs1.W[m].B[y])
// SUNPKD810, x=1,y=0
// SUNPKD820, x=2,y=0
// SUNPKD830, x=3,y=0
// SUNPKD831, x=3,y=1
// SUNPKD832, x=3,y=2
for RV32: m=0,
for RV64: m=1...0
```

Parameters a – [in] unsigned long type of value stored in a

Returns value stored in unsigned long type

`__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD810 (unsigned long a)`

ZUNPKD810 (Unsigned Unpacking Bytes 1 & 0)

Type: DSP

### Syntax:

```
ZUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

## Purpose:

Unpack byte x and byte y of 32-bit chunks in a register into two 16-bit unsigned halfwords of 32-bit chunks in a register.

## **Description**:

The ZUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit unsigned halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

## **Operations:**

Parameters a - [in] unsigned long type of value stored in a

Returns value stored in unsigned long type
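The zero-extending variant can be sketched in the same way. The name `zunpkd8xy_ref` and its `x`/`y` parameters are hypothetical, and a single 32-bit chunk (RV32) is assumed. ZUNPKD810 corresponds to x = 1, y = 0.

```c
#include <stdint.h>

/* Hypothetical reference model of ZUNPKD8xy for one 32-bit chunk.
 * Byte x goes to the top halfword, byte y to the bottom halfword,
 * each zero-extended from 8 to 16 bits. */
static uint32_t zunpkd8xy_ref(uint32_t rs1, unsigned x, unsigned y)
{
    uint32_t top    = (uint8_t)(rs1 >> (8 * x));
    uint32_t bottom = (uint8_t)(rs1 >> (8 * y));
    return (top << 16) | bottom;
}
```

Unlike the signed form, byte 0x80 becomes 0x0080 rather than 0xFF80.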

```
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD820 (unsigned long a) ZUNPKD820 (Unsigned Unpacking Bytes 2 & 0)
```

Type: DSP

### Syntax:

```
ZUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

## Purpose:

Unpack byte x and byte y of 32-bit chunks in a register into two 16-bit unsigned halfwords of 32-bit chunks in a register.

## **Description**:

The ZUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit unsigned halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

#### **Operations:**

```
Rd.W[m].H[1] = ZE16(Rs1.W[m].B[x])
Rd.W[m].H[0] = ZE16(Rs1.W[m].B[y])
// ZUNPKD810, x=1,y=0
// ZUNPKD820, x=2,y=0
// ZUNPKD830, x=3,y=0
// ZUNPKD831, x=3,y=1
// ZUNPKD832, x=3,y=2
for RV32: m=0,
for RV64: m=1...0
```

Parameters a – [in] unsigned long type of value stored in a

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD830 (unsigned long a) ZUNPKD830 (Unsigned Unpacking Bytes 3 & 0)
```

Type: DSP

### Syntax:

```
ZUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

### **Purpose**:

Unpack byte x and byte y of 32-bit chunks in a register into two 16-bit unsigned halfwords of 32-bit chunks in a register.

#### **Description**:

The ZUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit unsigned halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

## **Operations:**

Parameters a - [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD831 (unsigned long a) ZUNPKD831 (Unsigned Unpacking Bytes 3 & 1)
```

Type: DSP

## Syntax:

```
ZUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

## Purpose:

Unpack byte x and byte y of 32-bit chunks in a register into two 16-bit unsigned halfwords of 32-bit chunks in a register.

## **Description**:

The ZUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit unsigned halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

## **Operations:**

```
Rd.W[m].H[1] = ZE16(Rs1.W[m].B[x])
Rd.W[m].H[0] = ZE16(Rs1.W[m].B[y])
// ZUNPKD810, x=1,y=0
// ZUNPKD820, x=2,y=0
// ZUNPKD830, x=3,y=0
// ZUNPKD831, x=3,y=1
// ZUNPKD832, x=3,y=2
for RV32: m=0,
for RV64: m=1...0
```

**Parameters**  $\mathbf{a} - [\mathbf{in}]$  unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_ZUNPKD832 (unsigned long a) ZUNPKD832 (Unsigned Unpacking Bytes 3 & 2)
```

Type: DSP

## Syntax:

```
ZUNPKD8xy Rd, Rs1
xy = {10, 20, 30, 31, 32}
```

## Purpose:

Unpack byte x and byte y of 32-bit chunks in a register into two 16-bit unsigned halfwords of 32-bit chunks in a register.

#### **Description**:

The ZUNPKD8xy instruction unpacks byte *x* and byte *y* of 32-bit chunks in Rs1 into two 16-bit unsigned halfwords and writes them to the top half and the bottom half, respectively, of the corresponding 32-bit chunks in Rd.

## **Operations:**

Parameters a – [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
group NMSIS_Core_DSP_Intrinsic_SIMD_DATA_PROCESS SIMD Data Processing Instructions.
```

#### **Non-SIMD Instructions**

#### Non-SIMD Q15 saturation ALU Instructions

```
__STATIC_FORCEINLINE long __RV_KADDH (int a, int b)
__STATIC_FORCEINLINE long __RV_KHMBB (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KHMBT (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KHMTT (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KSUBH (int a, int b)
__STATIC_FORCEINLINE unsigned long __RV_UKADDH (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_UKSUBH (unsigned int a, unsigned int b)
group NMSIS_Core_DSP_Intrinsic_NON_SIMD_Q15_SAT_ALU
Non-SIMD Q15 saturation ALU Instructions.
```

There are 7 Non-SIMD Q15 saturation ALU instructions.

#### **Functions**

```
__STATIC_FORCEINLINE long __RV_KADDH (int a, int b)

KADDH (Signed Addition with Q15 Saturation)
```

Type: DSP

#### Syntax:

```
KADDH Rd, Rs1, Rs2
```

## Purpose:

Add the signed lower 32-bit content of two registers with Q15 saturation.

### **Description**:

The signed lower 32-bit content of Rs1 is added with the signed lower 32-bit content of Rs2. The result is saturated to the 16-bit signed integer range of [-2^15, 2^15-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

## **Operations:**

```
tmp = Rs1.W[0] + Rs2.W[0];
if (tmp > 32767) {
  res = 32767;
  OV = 1;
} else if (tmp < -32768) {
  res = -32768;
  OV = 1;
} else {
  res = tmp;
}
Rd = SE(res[15:0]);
```

#### **Parameters**

• a – [in] int type of value stored in a

• **b** – [in] int type of value stored in b

**Returns** value stored in long type
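A minimal portable sketch of the KADDH saturation rule. The name `kaddh_ref` and the `ov` out-parameter (a stand-in for the OV flag) are hypothetical. The sum is formed in 64 bits so that the 32-bit addition itself cannot overflow.

```c
#include <stdint.h>

/* Hypothetical reference model of KADDH.
 * The 32-bit sum is saturated to the Q15 range [-32768, 32767]. */
static int32_t kaddh_ref(int32_t a, int32_t b, int *ov)
{
    int64_t tmp = (int64_t)a + (int64_t)b;
    if (tmp > 32767) {
        tmp = 32767;
        *ov = 1;
    } else if (tmp < -32768) {
        tmp = -32768;
        *ov = 1;
    }
    return (int32_t)tmp;   /* already within 16-bit range, so sign extension is implicit */
}
```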

```
__STATIC_FORCEINLINE long __RV_KHMBB (unsigned int a, unsigned int b) KHMBB (Signed Saturating Half Multiply B16 x B16)
```

Type: DSP

### Syntax:

```
KHMxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

### Purpose:

Multiply the signed Q15 number contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then right-shift 15 bits to turn the Q30 result into a Q15 number again and saturate the Q15 result into the destination register. If saturation happens, an overflow flag OV will be set.

#### **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then right-shifted 15 bits and saturated into a Q15 value. The Q15 value is then sign-extended and written into Rd. When both Q15 inputs are 0x8000, saturation will happen. The result will be saturated to 0x7FFF and the overflow flag OV will be set.

## **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KHMBB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KHMBT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KHMTT
if (0x8000 != aop || 0x8000 != bop) {
    Mresult[31:0] = aop * bop;
    res[15:0] = Mresult[30:15];
} else {
    res[15:0] = 0x7FFF;
    OV = 1;
}
Rd = SE32(res[15:0]); // RV32
Rd = SE64(res[15:0]); // RV64
```

## **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type
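The Q15 multiply can be sketched in portable C as follows. The name `khmbb_ref` and the `ov` out-parameter are hypothetical, and the casts assume the usual two's-complement conversion to `int16_t`. Bits 30:15 of the product are extracted through an unsigned shift to avoid relying on the behaviour of `>>` on negative values.

```c
#include <stdint.h>

/* Hypothetical reference model of KHMBB: bottom 16 bits of a times
 * bottom 16 bits of b, as Q15 values, with saturation. */
static int32_t khmbb_ref(uint32_t a, uint32_t b, int *ov)
{
    int16_t aop = (int16_t)(uint16_t)a;
    int16_t bop = (int16_t)(uint16_t)b;
    if (aop == INT16_MIN && bop == INT16_MIN) {
        *ov = 1;                    /* (-1.0) * (-1.0) is not representable in Q15 */
        return 0x7FFF;
    }
    int32_t mresult = (int32_t)aop * (int32_t)bop;          /* Q30 product */
    uint16_t res = (uint16_t)((uint32_t)mresult >> 15);     /* Mresult[30:15] */
    return (int32_t)(int16_t)res;                           /* sign-extend */
}
```

For example, 0x4000 (0.5 in Q15) times 0x4000 gives 0x2000 (0.25 in Q15).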

```
__STATIC_FORCEINLINE long __RV_KHMBT (unsigned int a, unsigned int b) KHMBT (Signed Saturating Half Multiply B16 x T16)
```

Type: DSP

#### Syntax:

```
KHMxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

## Purpose:

Multiply the signed Q15 number contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then right-shift 15 bits to turn the Q30 result into a Q15 number again and saturate the Q15 result into the destination register. If saturation happens, an overflow flag OV will be set.

#### **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then right-shifted 15 bits and saturated into a Q15 value. The Q15 value is then sign-extended and written into Rd. When both Q15 inputs are 0x8000, saturation will happen. The result will be saturated to 0x7FFF and the overflow flag OV will be set.

## **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KHMBB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KHMBT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KHMTT
if (0x8000 != aop || 0x8000 != bop) {
    Mresult[31:0] = aop * bop;
    res[15:0] = Mresult[30:15];
} else {
    res[15:0] = 0x7FFF;
    OV = 1;
}
Rd = SE32(res[15:0]); // RV32
Rd = SE64(res[15:0]); // RV64
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KHMTT (unsigned int a, unsigned int b)
KHMTT (Signed Saturating Half Multiply T16 x T16)
```

Type: DSP

#### Syntax:

```
KHMxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

## Purpose:

Multiply the signed Q15 number contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then right-shift 15 bits to turn the Q30 result into a Q15 number again and saturate the Q15 result into the destination register. If saturation happens, an overflow flag OV will be set.

## **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then right-shifted 15 bits and saturated into a Q15 value. The Q15 value is then sign-extended and written into Rd. When both Q15 inputs are 0x8000, saturation will happen. The result will be saturated to 0x7FFF and the overflow flag OV will be set.

## **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KHMBB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KHMBT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KHMTT
if (0x8000 != aop || 0x8000 != bop) {
   Mresult[31:0] = aop * bop;
   res[15:0] = Mresult[30:15];
} else {
   res[15:0] = 0x7FFF;
   OV = 1;
}
Rd = SE32(res[15:0]); // RV32
Rd = SE64(res[15:0]); // RV64
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KSUBH (int a, int b)
```

KSUBH (Signed Subtraction with Q15 Saturation)

Type: DSP

## Syntax:

```
KSUBH Rd, Rs1, Rs2
```

#### Purpose:

Subtract the signed lower 32-bit content of two registers with Q15 saturation.

## **Description**:

The signed lower 32-bit content of Rs2 is subtracted from the signed lower 32-bit content of Rs1. And the result is saturated to the 16-bit signed integer range of [-2^15, 2^15-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

## **Operations:**

```
tmp = Rs1.W[0] - Rs2.W[0];
if (tmp > (2^15)-1) {
   res = (2^15)-1;
   OV = 1;
} else if (tmp < -2^15) {
   res = -2^15;
   OV = 1;
} else {
   res = tmp;
}
Rd = SE(res[15:0]);
```

## **Parameters**

- a [in] int type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in long type

`__STATIC_FORCEINLINE unsigned long __RV_UKADDH (unsigned int a, unsigned int b)`

UKADDH (Unsigned Addition with U16 Saturation)

Type: DSP

### Syntax:

```
UKADDH Rd, Rs1, Rs2
```

#### Purpose:

Add the unsigned lower 32-bit content of two registers with U16 saturation.

#### **Description**:

The unsigned lower 32-bit content of Rs1 is added with the unsigned lower 32-bit content of Rs2. And the result is saturated to the 16-bit unsigned integer range of [0, 2^16-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

#### **Operations:**

```
tmp = Rs1.W[0] + Rs2.W[0];
if (tmp > (2^16)-1) {
  tmp = (2^16)-1;
  OV = 1;
}
Rd = SE(tmp[15:0]);
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type

`__STATIC_FORCEINLINE unsigned long __RV_UKSUBH (unsigned int a, unsigned int b)`

UKSUBH (Unsigned Subtraction with U16 Saturation)

Type: DSP

## Syntax:

```
UKSUBH Rd, Rs1, Rs2
```

## Purpose:

Subtract the unsigned lower 32-bit content of two registers with U16 saturation.

#### **Description**:

The unsigned lower 32-bit content of Rs2 is subtracted from the unsigned lower 32-bit content of Rs1. And the result is saturated to the 16-bit unsigned integer range of [0, 2^16-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

## **Operations:**

```
tmp = Rs1.W[0] - Rs2.W[0];
if (tmp > (2^16)-1) {
  tmp = (2^16)-1;
  OV = 1;
} else if (tmp < 0) {
  tmp = 0;
  OV = 1;
}
Rd = SE(tmp[15:0]);
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type
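The U16 saturation can be modelled in portable C as below. The name `uksubh_ref` and the `ov` out-parameter are hypothetical, and a 32-bit destination register (RV32) is assumed. The final step follows the Operations pseudocode literally, including the sign extension of bits 15:0.

```c
#include <stdint.h>

/* Hypothetical reference model of UKSUBH for a 32-bit destination (RV32).
 * The difference is saturated to [0, 65535], then bits 15:0 are
 * sign-extended, as written in the Operations section. */
static uint32_t uksubh_ref(uint32_t a, uint32_t b, int *ov)
{
    int64_t tmp = (int64_t)a - (int64_t)b;
    if (tmp > 65535) {
        tmp = 65535;
        *ov = 1;
    } else if (tmp < 0) {
        tmp = 0;
        *ov = 1;
    }
    return (uint32_t)(int32_t)(int16_t)(uint16_t)tmp;   /* SE(tmp[15:0]) */
}
```

Because of the sign extension, a saturated result of 0xFFFF appears as 0xFFFFFFFF in a 32-bit register under this model.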

#### Non-SIMD Q31 saturation ALU Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_KABSW (signed long a)
__STATIC_FORCEINLINE long __RV_KADDW (int a, int b)
__STATIC_FORCEINLINE long __RV_KDMBB (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KDMBT (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KDMTT (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KDMABB (long t, unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KDMABT (long t, unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KDMATT (long t, unsigned int a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KSLLW (long a, unsigned int b)
__STATIC_FORCEINLINE long __RV_KSLRAW (int a, int b)
__STATIC_FORCEINLINE long __RV_KSLRAW_U (int a, int b)
__STATIC_FORCEINLINE long __RV_KSUBW (int a, int b)
__STATIC_FORCEINLINE unsigned long __RV_UKADDW (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_UKSUBW (unsigned int a, unsigned int b)
__RV_KSLLIW (a, b)
group NMSIS_Core_DSP_Intrinsic_NON_SIMD_Q31_SAT_ALU
Non-SIMD Q31 saturation ALU Instructions.
```

These are the Non-SIMD Q31 saturation ALU instructions.

### **Defines**

```
__RV_KSLLIW (a, b)
```

KSLLIW (Saturating Shift Left Logical Immediate for Word)

Type: DSP

#### Syntax:

```
KSLLIW Rd, Rs1, imm5u
```

#### Purpose:

Do logical left shift operation with saturation on a 32-bit word. The shift amount is an immediate value.

## **Description**:

The first word data in Rs1 is left-shifted logically. The shifted out bits are filled with zero and the shift amount is specified by the imm5u constant. Any shifted value greater than 2^31-1 is saturated to 2^31-1. Any shifted value smaller than -2^31 is saturated to -2^31. And the saturated result is sign-extended and written to Rd. If any saturation is performed, set OV bit to 1.

## **Operations:**

```
sa = imm5u;
res[(31+sa):0] = Rs1.W[0] << sa;
if (res > (2^31)-1) {
   res = 0x7fffffff; OV = 1;
} else if (res < -2^31) {
   res = 0x80000000; OV = 1;
}
Rd[31:0] = res[31:0]; // RV32
Rd[63:0] = SE(res[31:0]); // RV64
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_KABSW (signed long a)
```

KABSW (Scalar 32-bit Absolute Value with Saturation)

Type: DSP

Syntax:

```
KABSW Rd, Rs1
```

## Purpose:

Get the absolute value of a signed 32-bit integer in a general register.

#### **Description**:

This instruction calculates the absolute value of a signed 32-bit integer stored in Rs1. The result is sign-extended (for RV64) and written to Rd. If the input is the minimum negative integer 0x80000000, the instruction produces the saturated maximum positive integer 0x7fffffff and sets the OV flag to 1.

## **Operations:**

```
if (Rs1.W[0] >= 0) {
   res = Rs1.W[0];
} else {
   if (Rs1.W[0] == 0x80000000) {
     res = 0x7fffffff;
     OV = 1;
   } else {
     res = -Rs1.W[0];
   }
}
Rd = SE32(res);
```

#### **Parameters**

- a [in] signed long type of value stored in a

**Returns** value stored in unsigned long type
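The saturation rule can be illustrated with a small portable C model. This is only a sketch, not the NMSIS implementation: `kabsw_model` and its `ov` out-parameter are hypothetical names standing in for the instruction and the OV flag, and the model only ever sets `*ov`, it never clears it.

```c
#include <stdint.h>

/* Hypothetical software model of KABSW (not the NMSIS intrinsic).
 * *ov is set to 1 when the result saturates. */
static int32_t kabsw_model(int32_t a, int *ov)
{
    if (a >= 0) {
        return a;
    }
    if (a == INT32_MIN) {       /* 0x80000000 has no positive counterpart */
        *ov = 1;
        return INT32_MAX;       /* saturate to 0x7fffffff */
    }
    return -a;
}
```

The explicit `INT32_MIN` check also avoids the undefined behavior of negating the most negative value in C.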

```
__STATIC_FORCEINLINE long __RV_KADDW (int a, int b)
```

KADDW (Signed Addition with Q31 Saturation)

Type: DSP

Syntax:

```
KADDW Rd, Rs1, Rs2
```

## Purpose:

Add the lower 32-bit signed content of two registers with Q31 saturation.

## **Description**:

The lower 32-bit signed content of Rs1 is added with the lower 32-bit signed content of Rs2. The result is saturated to the 32-bit signed integer range of [-2^31, 2^31-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

## **Operations:**

```
tmp = Rs1.W[0] + Rs2.W[0];
if (tmp > (2^31)-1) {
   res = (2^31)-1;
   OV = 1;
} else if (tmp < -2^31) {
   res = -2^31;
   OV = 1;
} else {
   res = tmp;
}
Rd = res[31:0]; // RV32
Rd = SE(res[31:0]); // RV64
```

### **Parameters**

- a [in] int type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in long type
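As an illustration of the Q31 saturation described above, the following sketch models the operation in portable C by computing the sum in 64 bits. The name `kaddw_model` and the `ov` out-parameter are assumptions made for this example; the real intrinsic reports saturation through the OV flag instead.

```c
#include <stdint.h>

/* Hypothetical software model of KADDW (not the NMSIS intrinsic).
 * *ov is set to 1 on saturation and left untouched otherwise. */
static int32_t kaddw_model(int32_t a, int32_t b, int *ov)
{
    int64_t tmp = (int64_t)a + (int64_t)b;   /* exact sum, cannot overflow */

    if (tmp > INT32_MAX) {
        *ov = 1;
        return INT32_MAX;                    /* clamp to 2^31-1 */
    }
    if (tmp < INT32_MIN) {
        *ov = 1;
        return INT32_MIN;                    /* clamp to -2^31 */
    }
    return (int32_t)tmp;
}
```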

```
__STATIC_FORCEINLINE long __RV_KDMBB (unsigned int a, unsigned int b) KDMBB (Signed Saturating Double Multiply B16 x B16)
```

Type: DSP

Syntax:

```
KDMxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

## Purpose:

Multiply the signed Q15 integer contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then double and saturate the Q31 result. The result is written into the destination register for RV32 or sign-extended to 64-bits and written into the destination register for RV64. If saturation happens, an overflow flag OV will be set.

#### **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then doubled and saturated into a Q31 value. The Q31 value is then written into Rd (sign-extended in RV64). When both the two Q15 inputs are 0x8000, saturation will happen. The result will be saturated to 0x7FFFFFFF and the overflow flag OV will be set.

### **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KDMBB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KDMBT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KDMTT

If (0x8000 != aop | 0x8000 != bop) {
    Mresult = aop * bop;
    resQ31 = Mresult << 1;
    Rd = resQ31; // RV32
    Rd = SE(resQ31); // RV64
} else {
    resQ31 = 0x7FFFFFFF;
    Rd = resQ31; // RV32
    Rd = SE(resQ31); // RV64
    OV = 1;
}
```

## **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type
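The halfword selection and the single saturating case can be sketched in portable C as follows. All function names here (`q15_of`, `kdm_core`, `kdmbb_model`, `kdmbt_model`, `kdmtt_model`) and the `ov` out-parameter are hypothetical and only model the behavior described above for one 32-bit word.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x (one Q15 halfword) to 32 bits. */
static int32_t q15_of(uint32_t x)
{
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Double the Q15 x Q15 product; only 0x8000 * 0x8000 can saturate. */
static int32_t kdm_core(int32_t aop, int32_t bop, int *ov)
{
    if (aop == -0x8000 && bop == -0x8000) {
        *ov = 1;
        return INT32_MAX;           /* 0x7FFFFFFF */
    }
    return aop * bop * 2;           /* fits in 32 bits for all other inputs */
}

/* Hypothetical models: B = bottom halfword H[0], T = top halfword H[1]. */
static int32_t kdmbb_model(uint32_t a, uint32_t b, int *ov)
{
    return kdm_core(q15_of(a), q15_of(b), ov);
}

static int32_t kdmbt_model(uint32_t a, uint32_t b, int *ov)
{
    return kdm_core(q15_of(a), q15_of(b >> 16), ov);
}

static int32_t kdmtt_model(uint32_t a, uint32_t b, int *ov)
{
    return kdm_core(q15_of(a >> 16), q15_of(b >> 16), ov);
}
```

For example, multiplying two Q15 values of 0.5 (0x4000) with `kdmbb_model` yields 0x20000000, which is 0.25 in Q31.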

```
__STATIC_FORCEINLINE long __RV_KDMBT (unsigned int a, unsigned int b)
KDMBT (Signed Saturating Double Multiply B16 x T16)
```

Type: DSP

## Syntax:

```
KDMxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

## Purpose:

Multiply the signed Q15 integer contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then double and saturate the Q31 result. The result is written into the destination register for RV32 or sign-extended to 64-bits and written into the destination register for RV64. If saturation happens, an overflow flag OV will be set.

## **Description:**

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then doubled and saturated into a Q31 value. The Q31 value is then written into Rd (sign-extended in RV64). When both the two Q15 inputs are 0x8000, saturation will happen. The result will be saturated to 0x7FFFFFFF and the overflow flag OV will be set.

#### **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KDMBB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KDMBT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KDMTT
If (0x8000 != aop | 0x8000 != bop) {
    Mresult = aop * bop;
    resQ31 = Mresult << 1;
    Rd = resQ31; // RV32
    Rd = SE(resQ31); // RV64
} else {
    resQ31 = 0x7FFFFFFF;
    Rd = resQ31; // RV32
    Rd = SE(resQ31); // RV64
    OV = 1;
}
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KDMTT (unsigned int a, unsigned int b)
KDMTT (Signed Saturating Double Multiply T16 x T16)
```

Type: DSP

Syntax:

```
KDMxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

#### Purpose:

Multiply the signed Q15 integer contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then double and saturate the Q31 result. The result is written into the destination register for RV32 or sign-extended to 64-bits and written into the destination register for RV64. If saturation happens, an overflow flag OV will be set.

## **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then doubled and saturated into a Q31 value. The Q31 value is then written into Rd (sign-extended in RV64). When both the two Q15 inputs are 0x8000, saturation will happen. The result will be saturated to 0x7FFFFFFF and the overflow flag OV will be set.

#### **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KDMBB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KDMBT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KDMTT
If (0x8000 != aop | 0x8000 != bop) {
   Mresult = aop * bop;
   resQ31 = Mresult << 1;
   Rd = resQ31; // RV32
   Rd = SE(resQ31); // RV64
} else {
   resQ31 = 0x7FFFFFFF;
   Rd = resQ31; // RV32
   Rd = SE(resQ31); // RV64
   OV = 1;
}
```

### **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KDMABB (long t, unsigned int a, unsigned int b) KDMABB (Signed Saturating Double Multiply Addition B16 x B16)

Type: DSP

#### Syntax:

```
KDMAxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

### Purpose:

Multiply the signed Q15 integer contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then double and saturate the Q31 result, add the result with the sign-extended lower 32-bit chunk destination register and write the saturated addition result into the destination register. If saturation happens, an overflow flag OV will be set.

## **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then doubled and saturated into a Q31 value. The Q31 value is then added with the content of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV flag is set to 1. The result after saturation is written to Rd. When both the two Q15 inputs are 0x8000, saturation will happen and the overflow flag OV will be set.

## **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KDMABB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KDMABT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KDMATT
If (0x8000 != aop | 0x8000 != bop) {
   Mresult = aop * bop;
   resQ31 = Mresult << 1;
} else {
   resQ31 = 0x7FFFFFFF;
   OV = 1;
}
resadd = Rd + resQ31; // RV32
resadd = Rd.W[0] + resQ31; // RV64
if (resadd > (2^31)-1) {
   resadd = (2^31)-1;
   OV = 1;
} else if (resadd < -2^31) {
   resadd = -2^31;
   OV = 1;
}
Rd = resadd; // RV32
Rd = SE(resadd); // RV64
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type
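The two saturation points (the doubled product and the accumulation) can be made explicit in a portable C sketch. The names `q15_of` and `kdmabb_model`, and the `ov` out-parameter that stands in for the OV flag, are hypothetical and not part of NMSIS.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x (one Q15 halfword) to 32 bits. */
static int32_t q15_of(uint32_t x)
{
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical software model of KDMABB: t + sat(2 * a.H[0] * b.H[0]). */
static int32_t kdmabb_model(int32_t t, uint32_t a, uint32_t b, int *ov)
{
    int32_t aop = q15_of(a);
    int32_t bop = q15_of(b);
    int64_t res_q31;
    int64_t resadd;

    if (aop == -0x8000 && bop == -0x8000) {
        res_q31 = INT32_MAX;                 /* first saturation point */
        *ov = 1;
    } else {
        res_q31 = (int64_t)aop * bop * 2;
    }

    resadd = (int64_t)t + res_q31;           /* exact in 64 bits */
    if (resadd > INT32_MAX) {                /* second saturation point */
        resadd = INT32_MAX;
        *ov = 1;
    } else if (resadd < INT32_MIN) {
        resadd = INT32_MIN;
        *ov = 1;
    }
    return (int32_t)resadd;
}
```

Under the same assumptions, the BT and TT variants would differ only in passing `b >> 16` (and `a >> 16`) to `q15_of`.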

\_\_STATIC\_FORCEINLINE long \_\_RV\_KDMABT (long t, unsigned int a, unsigned int b) KDMABT (Signed Saturating Double Multiply Addition B16 x T16)

Type: DSP

## Syntax:

```
KDMAxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

#### **Purpose**:

Multiply the signed Q15 integer contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then double and saturate the Q31 result, add the result with the sign-extended lower 32-bit chunk destination register and write the saturated addition result into the destination register. If saturation happens, an overflow flag OV will be set.

#### **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then doubled and saturated into a Q31 value. The Q31 value is then added with the content of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV flag is set to 1. The result after saturation is written to Rd. When both the two Q15 inputs are 0x8000, saturation will happen and the overflow flag OV will be set.

## **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KDMABB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KDMABT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KDMATT
If (0x8000 != aop | 0x8000 != bop) {
   Mresult = aop * bop;
   resQ31 = Mresult << 1;
} else {
   resQ31 = 0x7FFFFFFF;
   OV = 1;
}
resadd = Rd + resQ31; // RV32
resadd = Rd.W[0] + resQ31; // RV64
if (resadd > (2^31)-1) {
   resadd = (2^31)-1;
   OV = 1;
} else if (resadd < -2^31) {
   resadd = -2^31;
   OV = 1;
}
Rd = resadd; // RV32
Rd = SE(resadd); // RV64
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KDMATT (long t, unsigned int a, unsigned int b) KDMATT (Signed Saturating Double Multiply Addition T16 x T16)

Type: DSP

## Syntax:

```
KDMAxy Rd, Rs1, Rs2 (xy = BB, BT, TT)
```

#### **Purpose**:

Multiply the signed Q15 integer contents of two 16-bit data in the corresponding portion of the lower 32-bit chunk in registers and then double and saturate the Q31 result, add the result with the sign-extended lower 32-bit chunk destination register and write the saturated addition result into the destination register. If saturation happens, an overflow flag OV will be set.

#### **Description**:

Multiply the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs1 with the top or bottom 16-bit Q15 content of the lower 32-bit portion in Rs2. The Q30 result is then doubled and saturated into a Q31 value. The Q31 value is then added with the content of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV flag is set to 1. The result after saturation is written to Rd. When both the two Q15 inputs are 0x8000, saturation will happen and the overflow flag OV will be set.

## **Operations:**

```
aop = Rs1.H[0]; bop = Rs2.H[0]; // KDMABB
aop = Rs1.H[0]; bop = Rs2.H[1]; // KDMABT
aop = Rs1.H[1]; bop = Rs2.H[1]; // KDMATT
If (0x8000 != aop | 0x8000 != bop) {
   Mresult = aop * bop;
   resQ31 = Mresult << 1;
} else {
   resQ31 = 0x7FFFFFFF;
   OV = 1;
}
resadd = Rd + resQ31; // RV32
resadd = Rd.W[0] + resQ31; // RV64
if (resadd > (2^31)-1) {
   resadd = (2^31)-1;
   OV = 1;
} else if (resadd < -2^31) {
   resadd = -2^31;
   OV = 1;
}
Rd = resadd; // RV32
Rd = SE(resadd); // RV64
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KSLLW (long a, unsigned int b)

KSLLW (Saturating Shift Left Logical for Word)
```

Type: DSP

## Syntax:

```
KSLLW Rd, Rs1, Rs2
```

### Purpose:

Do logical left shift operation with saturation on a 32-bit word. The shift amount is a variable from a GPR.

## **Description**:

The first word data in Rs1 is left-shifted logically. The shifted out bits are filled with zero and the shift amount is specified by the low-order 5-bits of the value in the Rs2 register. Any shifted value greater than 2^31-1 is saturated to 2^31-1. Any shifted value smaller than -2^31 is saturated to -2^31. And the saturated result is sign-extended and written to Rd. If any saturation is performed, set OV bit to 1.

## **Operations:**

```
sa = Rs2[4:0];
res[(31+sa):0] = Rs1.W[0] << sa;
if (res > (2^31)-1) {
   res = 0x7fffffff; OV = 1;
} else if (res < -2^31) {
   res = 0x80000000; OV = 1;
}
Rd[31:0] = res[31:0]; // RV32
Rd[63:0] = SE(res[31:0]); // RV64
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in long type
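A possible way to reason about the saturating shift is to treat it as an exact multiplication by 2^sa followed by clamping. The sketch below does this in portable C; `ksllw_model` and the `ov` out-parameter are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical software model of KSLLW on one 32-bit word. */
static int32_t ksllw_model(int32_t a, uint32_t b, int *ov)
{
    unsigned sa = b & 0x1Fu;                          /* only Rs2[4:0] is used */
    /* Multiply instead of shifting so negative inputs stay well defined. */
    int64_t res = (int64_t)a * ((int64_t)1 << sa);

    if (res > INT32_MAX) {
        *ov = 1;
        return INT32_MAX;                             /* 0x7fffffff */
    }
    if (res < INT32_MIN) {
        *ov = 1;
        return INT32_MIN;                             /* 0x80000000 */
    }
    return (int32_t)res;
}
```

The same model would describe KSLLIW if `b` is taken to be the `imm5u` constant.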

\_\_STATIC\_FORCEINLINE long \_\_RV\_KSLRAW (int a, int b)

KSLRAW (Shift Left Logical with Q31 Saturation or Shift Right Arithmetic)

Type: DSP

Syntax:

```
KSLRAW Rd, Rs1, Rs2
```

#### Purpose:

Perform a logical left (positive) or arithmetic right (negative) shift operation with Q31 saturation for the left shift on a 32-bit data.

## **Description**:

The lower 32-bit content of Rs1 is left-shifted logically or right-shifted arithmetically based on the value of Rs2[5:0]. Rs2[5:0] is in the signed range of [-2^5, 2^5-1]. A positive Rs2[5:0] means logical left shift and a negative Rs2[5:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[5:0] clamped to the actual shift range of [0, 31]. The left-shifted result is saturated to the 32-bit signed integer range of [-2^31, 2^31-1]. After the shift operation, the final result is bit-31 sign-extended and written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:6] will not affect the operation of this instruction.

#### **Operations:**

```
if (Rs2[5:0] < 0) {
  sa = -Rs2[5:0];
  sa = (sa == 32)? 31 : sa;
  res[31:0] = Rs1.W[0] >>(arith) sa;
} else {
  sa = Rs2[5:0];
  tmp = Rs1.W[0] <<(logic) sa;
  if (tmp > (2^31)-1) {
    res[31:0] = (2^31)-1;
    OV = 1;
  } else if (tmp < -2^31) {
    res[31:0] = -2^31;
    OV = 1;
  } else {
    res[31:0] = tmp[31:0];
  }
}
Rd = res[31:0]; // RV32
Rd = SE64(res[31:0]); // RV64
```

## **Parameters**

- a [in] int type of value stored in a
- **b [in]** int type of value stored in b

**Returns** value stored in long type
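The signed interpretation of Rs2[5:0] and the clamp from 32 to 31 are easy to get wrong, so here is a minimal portable C sketch of the behavior described above. The name `kslraw_model` and the `ov` out-parameter are hypothetical; the model works on a single 32-bit word.

```c
#include <stdint.h>

/* Hypothetical software model of KSLRAW (not the NMSIS intrinsic). */
static int32_t kslraw_model(int32_t a, int32_t b, int *ov)
{
    /* Interpret Rs2[5:0] as a signed 6-bit value in [-32, 31]. */
    int32_t amt = (int32_t)(((uint32_t)b & 0x3Fu) ^ 0x20u) - 0x20;
    int64_t tmp;

    if (amt < 0) {
        int sa = (amt == -32) ? 31 : -amt;            /* clamp 32 to 31 */
        /* Arithmetic right shift without shifting a negative value. */
        return (a < 0) ? ~((~a) >> sa) : (a >> sa);
    }

    tmp = (int64_t)a * ((int64_t)1 << amt);           /* exact left shift */
    if (tmp > INT32_MAX) {
        *ov = 1;
        return INT32_MAX;
    }
    if (tmp < INT32_MIN) {
        *ov = 1;
        return INT32_MIN;
    }
    return (int32_t)tmp;
}
```

Only the left-shift path can set `*ov`; the right-shift path never saturates.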

```
__STATIC_FORCEINLINE long __RV_KSLRAW_U (int a, int b)
```

KSLRAW.u (Shift Left Logical with Q31 Saturation or Rounding Shift Right Arithmetic)

Type: DSP

Syntax:

```
KSLRAW.u Rd, Rs1, Rs2
```

## Purpose:

Perform a logical left (positive) or arithmetic right (negative) shift operation with Q31 saturation for the left shift and a rounding up operation for the right shift on a 32-bit data.

## **Description**:

The lower 32-bit content of Rs1 is left-shifted logically or right-shifted arithmetically based on the value of Rs2[5:0]. Rs2[5:0] is in the signed range of [-2^5, 2^5-1]. A positive Rs2[5:0] means logical left shift and a negative Rs2[5:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[5:0] clamped to the actual shift range of [0, 31]. The left-shifted result is saturated to the 32-bit signed integer range of [-2^31, 2^31-1]. For the right shift, a 1 is added at the most significant discarded bit position to round the result. After the shift, saturation, or rounding, the final result is bit-31 sign-extended and written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:6] will not affect the operation of this instruction.

## **Operations:**

```
if (Rs2[5:0] < 0) {
  sa = -Rs2[5:0];
  sa = (sa == 32)? 31 : sa;
  res[31:-1] = SE33(Rs1[31:(sa-1)]) + 1;
  rst[31:0] = res[31:0];
} else {
  sa = Rs2[5:0];
  tmp = Rs1.W[0] <<(logic) sa;
  if (tmp > (2^31)-1) {
    rst[31:0] = (2^31)-1;
    OV = 1;
  } else if (tmp < -2^31) {
    rst[31:0] = -2^31;
    OV = 1;
  } else {
    rst[31:0] = tmp[31:0];
  }
}
Rd = rst[31:0]; // RV32
Rd = SE64(rst[31:0]); // RV64
```

## **Parameters**

- a [in] int type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in long type
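The rounding step of the right-shift path can be sketched on its own in portable C. The helper name `rounding_asr_model` is hypothetical, and the sketch assumes a shift amount `sa` in [1, 31], that is, after the clamp described above has been applied.

```c
#include <stdint.h>

/* Hypothetical model of the KSLRAW.u rounding right shift, sa in [1, 31]. */
static int32_t rounding_asr_model(int32_t a, unsigned sa)
{
    int64_t v = a;
    /* Keep one extra fraction bit: floor(a / 2^(sa-1)). */
    int64_t kept = (v < 0) ? ~((~v) >> (sa - 1)) : (v >> (sa - 1));

    kept += 1;          /* add 1 at the most significant discarded bit */

    /* Drop the fraction bit with a final arithmetic shift by 1. */
    return (int32_t)((kept < 0) ? ~((~kept) >> 1) : (kept >> 1));
}
```

With this model, 5 shifted right by 1 rounds to 3 and -5 shifted right by 1 rounds to -2, so ties are rounded toward positive infinity.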

```
__STATIC_FORCEINLINE long __RV_KSUBW (int a, int b)
```

KSUBW (Signed Subtraction with Q31 Saturation)

Type: DSP

Syntax:

```
KSUBW Rd, Rs1, Rs2
```

#### Purpose:

Subtract the signed lower 32-bit content of two registers with Q31 saturation.

#### **Description**:

The signed lower 32-bit content of Rs2 is subtracted from the signed lower 32-bit content of Rs1. The result is saturated to the 32-bit signed integer range of [-2^31, 2^31-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

### **Operations:**

```
tmp = Rs1.W[0] - Rs2.W[0];
if (tmp > (2^31)-1) {
   res = (2^31)-1;
   OV = 1;
} else if (tmp < -2^31) {
   res = -2^31;
   OV = 1;
} else {
   res = tmp;
}
Rd = res[31:0]; // RV32
Rd = SE(res[31:0]); // RV64
```

#### **Parameters**

- a [in] int type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKADDW (unsigned int a, unsigned int b) UKADDW (Unsigned Addition with U32 Saturation)

Type: DSP

## Syntax:

```
UKADDW Rd, Rs1, Rs2
```

## Purpose:

Add the unsigned lower 32-bit content of two registers with U32 saturation.

## **Description:**

The unsigned lower 32-bit content of Rs1 is added with the unsigned lower 32-bit content of Rs2. And the result is saturated to the 32-bit unsigned integer range of [0, 2^32-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

#### **Operations:**

```
tmp = Rs1.W[0] + Rs2.W[0];
if (tmp > (2^32)-1) {
  tmp[31:0] = (2^32)-1;
  OV = 1;
}
Rd = tmp[31:0]; // RV32
Rd = SE(tmp[31:0]); // RV64
```

## **Parameters**

- a [in] unsigned int type of value stored in a
- **b** [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type
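For comparison with the signed variants, the unsigned saturation can be modeled in a few lines of portable C. This is a sketch under the assumption that only the 32-bit result is of interest; `ukaddw_model` and the `ov` out-parameter are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical software model of UKADDW on one 32-bit word. */
static uint32_t ukaddw_model(uint32_t a, uint32_t b, int *ov)
{
    uint64_t tmp = (uint64_t)a + (uint64_t)b;   /* exact 33-bit sum */

    if (tmp > UINT32_MAX) {
        *ov = 1;
        return UINT32_MAX;                      /* clamp to 2^32-1 */
    }
    return (uint32_t)tmp;
}
```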

```
__STATIC_FORCEINLINE unsigned long __RV_UKSUBW (unsigned int a, unsigned int b) UKSUBW (Unsigned Subtraction with U32 Saturation)
```

Type: DSP

### Syntax:

```
UKSUBW Rd, Rs1, Rs2
```

#### Purpose:

Subtract the unsigned lower 32-bit content of two registers with unsigned 32-bit saturation.

## **Description:**

The unsigned lower 32-bit content of Rs2 is subtracted from the unsigned lower 32-bit content of Rs1. And the result is saturated to the 32-bit unsigned integer range of [0, 2^32-1] and then sign-extended and written to Rd. If saturation happens, this instruction sets the OV flag.

## **Operations:**

```
tmp = Rs1.W[0] - Rs2.W[0];
if (tmp < 0) {
  tmp[31:0] = 0;
  OV = 1;
}
Rd = tmp[31:0]; // RV32
Rd = SE(tmp[31:0]); // RV64
```

### **Parameters**

- a [in] unsigned int type of value stored in a
- **b [in]** unsigned int type of value stored in b

**Returns** value stored in unsigned long type

## 32-bit Computation Instructions

```
__STATIC_FORCEINLINE long __RV_MAXW (int a, int b)
__STATIC_FORCEINLINE long __RV_MINW (int a, int b)
__STATIC_FORCEINLINE unsigned long long __RV_MULR64 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_MULSR64 (long a, long b)
__STATIC_FORCEINLINE long __RV_RADDW (int a, int b)
__STATIC_FORCEINLINE long __RV_RSUBW (int a, int b)
__STATIC_FORCEINLINE unsigned long __RV_URADDW (unsigned int a, unsigned int b)
__STATIC_FORCEINLINE unsigned long __RV_URSUBW (unsigned int a, unsigned int b)
```

#### group NMSIS\_Core\_DSP\_Intrinsic\_32B\_COMPUTATION

32-bit Computation Instructions.

There are 8 32-bit Computation Instructions.

#### **Functions**

```
__STATIC_FORCEINLINE long __RV_MAXW (int a, int b)
```

MAXW (32-bit Signed Word Maximum)

Type: DSP

## Syntax:

```
MAXW Rd, Rs1, Rs2
```

## Purpose:

Get the larger value from the 32-bit contents of two general registers.

## **Description:**

This instruction compares two signed 32-bit integers stored in Rs1 and Rs2, picks the larger value as the result, and writes the result to Rd.

## **Operations:**

```
if (Rs1.W[0] >= Rs2.W[0]) {
  Rd = SE(Rs1.W[0]);
} else {
  Rd = SE(Rs2.W[0]);
}
```

#### **Parameters**

- a [in] int type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_MINW (int a, int b)
```

MINW (32-bit Signed Word Minimum)

Type: DSP

## Syntax:

```
MINW Rd, Rs1, Rs2
```

## Purpose:

Get the smaller value from the 32-bit contents of two general registers.

## **Description**:

This instruction compares two signed 32-bit integers stored in Rs1 and Rs2, picks the smaller value as the result, and writes the result to Rd.

## **Operations:**

```
if (Rs1.W[0] >= Rs2.W[0]) {
  Rd = SE(Rs2.W[0]);
} else {
  Rd = SE(Rs1.W[0]);
}
```

## **Parameters**

- a [in] int type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE unsigned long long __RV_MULR64 (unsigned long a, unsigned long b)

MULR64 (Multiply Word Unsigned to 64-bit Data)
```

Type: DSP

Syntax:

```
MULR64 Rd, Rs1, Rs2
```

#### Purpose:

Multiply the 32-bit unsigned integer contents of two registers and write the 64-bit result.

## **RV32 Description:**

This instruction multiplies the 32-bit content of Rs1 with that of Rs2 and writes the 64-bit multiplication result to an even/odd pair of registers containing Rd. Rd(4,1) index d determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result. The lower 32-bit contents of Rs1 and Rs2 are treated as unsigned integers.

### **RV64 Description:**

This instruction multiplies the lower 32-bit content of Rs1 with that of Rs2 and writes the 64-bit multiplication result to Rd. The lower 32-bit contents of Rs1 and Rs2 are treated as unsigned integers.

## **Operations:**

```
RV32:
Mresult = CONCAT(1`b0,Rs1) u* CONCAT(1`b0,Rs2);
R[Rd(4,1).1(0)][31:0] = Mresult[63:32];
R[Rd(4,1).0(0)][31:0] = Mresult[31:0];
RV64:
Mresult = CONCAT(1`b0,Rs1.W[0]) u* CONCAT(1`b0,Rs2.W[0]);
Rd = Mresult[63:0];
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long long type
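The widening multiply and the RV32 register-pair layout can be sketched in portable C as follows. Both `mulr64_model` and `split_pair` are hypothetical helper names introduced for this example; the parameters are narrowed to `uint32_t` because only the lower 32 bits of each operand take part in the multiplication.

```c
#include <stdint.h>

/* Hypothetical software model of MULR64: 32 x 32 -> 64-bit unsigned. */
static uint64_t mulr64_model(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* On RV32 the result occupies a register pair: the even register (2d)
 * holds the low word and the odd register (2d+1) holds the high word. */
static void split_pair(uint64_t r, uint32_t *even_lo, uint32_t *odd_hi)
{
    *even_lo = (uint32_t)r;
    *odd_hi  = (uint32_t)(r >> 32);
}
```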

```
__STATIC_FORCEINLINE long long __RV_MULSR64 (long a, long b)
```

MULSR64 (Multiply Word Signed to 64-bit Data)

Type: DSP

Syntax:

```
MULSR64 Rd, Rs1, Rs2
```

## Purpose:

Multiply the 32-bit signed integer contents of two registers and write the 64-bit result.

## **RV32 Description**:

This instruction multiplies the lower 32-bit content of Rs1 with the lower 32-bit content of Rs2 and writes the 64-bit multiplication result to an even/odd pair of registers containing Rd. Rd(4,1) index d determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1.

The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result. The lower 32-bit contents of Rs1 and Rs2 are treated as signed integers.

## **RV64 Description:**

This instruction multiplies the lower 32-bit content of Rs1 with the lower 32-bit content of Rs2 and writes the 64-bit multiplication result to Rd. The lower 32-bit contents of Rs1 and Rs2 are treated as signed integers.

## **Operations:**

```
RV32:
Mresult = Rs1 s* Rs2;
R[Rd(4,1).1(0)][31:0] = Mresult[63:32];
R[Rd(4,1).0(0)][31:0] = Mresult[31:0];
RV64:
Mresult = Rs1.W[0] s* Rs2.W[0];
Rd = Mresult[63:0];
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long long type

```
__STATIC_FORCEINLINE long __RV_RADDW (int a, int b)
RADDW (32-bit Signed Halving Addition)
```

Type: DSP

## Syntax:

```
RADDW Rd, Rs1, Rs2
```

#### Purpose:

Add 32-bit signed integers and the results are halved to avoid overflow or saturation.

## **Description**:

This instruction adds the first 32-bit signed integer in Rs1 with the first 32-bit signed integer in Rs2. The result is first arithmetically right-shifted by 1 bit and then sign-extended and written to Rd.

#### **Examples:**

```
* Rs1 = 0x7FFFFFFF, Rs2 = 0x7FFFFFFF, Rd = 0x7FFFFFFF

* Rs1 = 0x80000000, Rs2 = 0x80000000, Rd = 0x80000000

* Rs1 = 0x40000000, Rs2 = 0x80000000, Rd = 0xE0000000
```

## **Operations:**

```
RV32:
Rd[31:0] = (Rs1[31:0] + Rs2[31:0]) s>> 1;
RV64:
resw[31:0] = (Rs1[31:0] + Rs2[31:0]) s>> 1;
Rd[63:0] = SE(resw[31:0]);
```

#### **Parameters**

- a [in] int type of value stored in a
- **b [in]** int type of value stored in b

**Returns** value stored in long type
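The example values listed for this instruction can be reproduced with a small portable C model. This is a sketch rather than the NMSIS implementation, and `raddw_model` is a hypothetical name. The sum is formed in 64 bits so that the extra carry bit is kept before halving.

```c
#include <stdint.h>

/* Hypothetical software model of RADDW: (a + b) arithmetically shifted by 1. */
static int32_t raddw_model(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;      /* exact 33-bit sum */

    /* Floor division by 2, written without shifting a negative value. */
    return (int32_t)((sum < 0) ? ~((~sum) >> 1) : (sum >> 1));
}
```

Because the halved sum always fits in 32 bits, no saturation and no OV update are involved.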

```
__STATIC_FORCEINLINE long __RV_RSUBW (int a, int b)
```

RSUBW (32-bit Signed Halving Subtraction)

Type: DSP

Syntax:

```
RSUBW Rd, Rs1, Rs2
```

## Purpose:

Subtract 32-bit signed integers and the result is halved to avoid overflow or saturation.

### **Description**:

This instruction subtracts the first 32-bit signed integer in Rs2 from the first 32-bit signed integer in Rs1. The result is first arithmetically right-shifted by 1 bit and then sign-extended and written to Rd.

### **Examples:**

```
* Rs1 = 0x7FFFFFFF, Rs2 = 0x80000000, Rd = 0x7FFFFFFF

* Rs1 = 0x80000000, Rs2 = 0x7FFFFFFF, Rd = 0x80000000

* Rs1 = 0x80000000, Rs2 = 0x40000000, Rd = 0xA0000000
```

## **Operations:**

```
RV32:

Rd[31:0] = (Rs1[31:0] - Rs2[31:0]) s>> 1;

RV64:

resw[31:0] = (Rs1[31:0] - Rs2[31:0]) s>> 1;

Rd[63:0] = SE(resw[31:0]);
```

### **Parameters**

- a [in] int type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE unsigned long __RV_URADDW (unsigned int a, unsigned int b)
URADDW (32-bit Unsigned Halving Addition)
```

Type: DSP

## Syntax:

```
URADDW Rd, Rs1, Rs2
```

## Purpose:

Add 32-bit unsigned integers and the results are halved to avoid overflow or saturation.

#### **Description**:

This instruction adds the first 32-bit unsigned integer in Rs1 with the first 32-bit unsigned integer in Rs2. The result is first logically right-shifted by 1 bit and then sign-extended and written to Rd.

### **Examples**:

```
* Rs1 = 0x7FFFFFFF, Rs2 = 0x7FFFFFFF, Rd = 0x7FFFFFFF
* Rs1 = 0x80000000, Rs2 = 0x80000000, Rd = 0x80000000
* Rs1 = 0x40000000, Rs2 = 0x80000000, Rd = 0x60000000
```

## **Operations:**

```
* RV32:

Rd[31:0] = (Rs1[31:0] + Rs2[31:0]) u>> 1;

* RV64:

resw[31:0] = (Rs1[31:0] + Rs2[31:0]) u>> 1;

Rd[63:0] = SE(resw[31:0]);
```

#### **Parameters**

- a [in] unsigned int type of value stored in a
- b [in] unsigned int type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URSUBW (unsigned int a, unsigned int b) URSUBW (32-bit Unsigned Halving Subtraction)

Type: DSP

## Syntax:

```
URSUBW Rd, Rs1, Rs2
```

## Purpose:

Subtract 32-bit unsigned integers and the result is halved to avoid overflow or saturation.

## **Description**:

This instruction subtracts the first 32-bit unsigned integer in Rs2 from the first 32-bit unsigned integer in Rs1. The result is first logically right-shifted by 1 bit and then sign-extended and written to Rd.

## **Examples:**

```
* Rs1 = 0x7FFFFFFF, Rs2 = 0x80000000, Rd = 0xFFFFFFFF
* Rs1 = 0x80000000, Rs2 = 0x7FFFFFFF, Rd = 0x00000000
* Rs1 = 0x80000000, Rs2 = 0x40000000, Rd = 0x20000000
```

## **Operations:**

```
* RV32:

Rd[31:0] = (Rs1[31:0] - Rs2[31:0]) u>> 1;

* RV64:

resw[31:0] = (Rs1[31:0] - Rs2[31:0]) u>> 1;

Rd[63:0] = SE(resw[31:0]);
```

## **Parameters**

- a [in] unsigned int type of value stored in a
- **b [in]** unsigned int type of value stored in b

**Returns** value stored in unsigned long type
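The halving subtraction keeps bit 32 of the 33-bit difference, which is why a negative difference produces a large unsigned result. A minimal C sketch, with the hypothetical name `ursubw_model`, that reproduces the example values listed for this instruction:

```c
#include <stdint.h>

/* Hypothetical software model of URSUBW: bits [32:1] of (a - b). */
static uint32_t ursubw_model(uint32_t a, uint32_t b)
{
    /* Unsigned 64-bit subtraction wraps modulo 2^64, so bits [32:1]
     * match those of the 33-bit two's complement difference. */
    uint64_t diff = (uint64_t)a - (uint64_t)b;

    return (uint32_t)(diff >> 1);               /* keeps bits [32:1] */
}
```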

## OV (Overflow) flag Set/Clear Instructions

```
__STATIC_FORCEINLINE void __RV_CLROV (void)
__STATIC_FORCEINLINE unsigned long __RV_RDOV (void)
```

#### group NMSIS\_Core\_DSP\_Intrinsic\_OV\_FLAG\_SC

OV (Overflow) flag Set/Clear Instructions.

The following table lists the user instructions related to Overflow (OV) flag manipulation. There are 2 OV (Overflow) flag Set/Clear Instructions.

#### **Functions**

```
__STATIC_FORCEINLINE void __RV_CLROV (void)
```

CLROV (Clear OV flag)

Type: DSP

Syntax:

```
CLROV # pseudo mnemonic
```

Purpose:

This pseudo instruction is an alias to CSRRCI x0, ucode, 1 instruction.

```
__STATIC_FORCEINLINE unsigned long __RV_RDOV (void)
```

RDOV (Read OV flag)

Type: DSP

Syntax:

```
RDOV Rd # pseudo mnemonic
```

Purpose:

This pseudo instruction is an alias to CSRR Rd, ucode instruction which maps to the real instruction of CSRRS Rd, ucode, x0.

**Returns** value stored in unsigned long type

#### group NMSIS\_Core\_DSP\_Intrinsic\_NON\_SIMD

Non-SIMD Instructions.

## **Partial-SIMD Data Processing Instructions**

## **SIMD 16-bit Packing Instructions**

```
__STATIC_FORCEINLINE unsigned long __RV_PKBB16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_PKBT16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_PKTT16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_PKTB16 (unsigned long a, unsigned long b)
```

#### group NMSIS\_Core\_DSP\_Intrinsic\_SIMD\_16B\_PACK

SIMD 16-bit Packing Instructions.

There are 4 SIMD 16-bit Packing Instructions.

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_PKBB16 (unsigned long a, unsigned long b)
PKBB16 (Pack Two 16-bit Data from Both Bottom Half)
```

Type: DSP

## Syntax:

```
PKBB16 Rd, Rs1, Rs2
PKBT16 Rd, Rs1, Rs2
PKTT16 Rd, Rs1, Rs2
PKTB16 Rd, Rs1, Rs2
```

## Purpose:

Pack 16-bit data from 32-bit chunks in two registers.

- PKBB16: bottom.bottom
- PKBT16: bottom.top
- PKTT16: top.top
- PKTB16: top.bottom

## **Description**:

- PKBB16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].
- PKBT16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTT16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTB16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].

### **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][15:0]); // PKBB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][31:16]); // PKBT16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][15:0]); // PKTB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][31:16]); // PKTT16
for RV32: x=0,
for RV64: x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type
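The four packing variants differ only in which 16-bit half of each source word is selected. As a rough illustration, the sketch below models a single 32-bit element (the RV32 case, x=0); on RV64 the same selection would apply to each 32-bit word. The `*_model` names are hypothetical host-side helpers, not NMSIS functions.

```c
#include <stdint.h>

/* Hypothetical per-word models of the PKxx16 family (illustration only). */

/* Bottom half of a goes to the top of the result, bottom half of b to the bottom. */
static uint32_t pkbb16_model(uint32_t a, uint32_t b)
{
    return (a << 16) | (b & 0xFFFFu);
}

/* Bottom half of a goes to the top of the result, top half of b to the bottom. */
static uint32_t pkbt16_model(uint32_t a, uint32_t b)
{
    return (a << 16) | (b >> 16);
}

/* Top half of a stays on top, top half of b goes to the bottom. */
static uint32_t pktt16_model(uint32_t a, uint32_t b)
{
    return (a & 0xFFFF0000u) | (b >> 16);
}

/* Top half of a stays on top, bottom half of b stays at the bottom. */
static uint32_t pktb16_model(uint32_t a, uint32_t b)
{
    return (a & 0xFFFF0000u) | (b & 0xFFFFu);
}
```

For `a = 0x11112222` and `b = 0x33334444`, the models give `0x22224444` (PKBB16), `0x22223333` (PKBT16), `0x11113333` (PKTT16) and `0x11114444` (PKTB16).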

```
__STATIC_FORCEINLINE unsigned long __RV_PKBT16 (unsigned long a, unsigned long b)
PKBT16 (Pack Two 16-bit Data from Bottom and Top Half)
```

Type: DSP

Syntax:

```
PKBB16 Rd, Rs1, Rs2
PKBT16 Rd, Rs1, Rs2
PKTT16 Rd, Rs1, Rs2
PKTB16 Rd, Rs1, Rs2
```

#### Purpose:

Pack 16-bit data from 32-bit chunks in two registers.

- PKBB16: bottom.bottom
- PKBT16: bottom.top
- PKTT16: top.top
- PKTB16: top.bottom

### **Description**:

- PKBB16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].
- PKBT16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTT16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTB16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].

## **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][15:0]); // PKBB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][31:16]); // PKBT16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][15:0]); // PKTB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][31:16]); // PKTT16
for RV32: x=0,
for RV64: x=1...0
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_PKTT16 (unsigned long a, unsigned long b)
PKTT16 (Pack Two 16-bit Data from Both Top Half)

Type: DSP

### Syntax:

```
PKBB16 Rd, Rs1, Rs2
PKBT16 Rd, Rs1, Rs2
PKTT16 Rd, Rs1, Rs2
PKTB16 Rd, Rs1, Rs2
```

## Purpose:

Pack 16-bit data from 32-bit chunks in two registers.

- PKBB16: bottom.bottom
- PKBT16: bottom.top
- PKTT16: top.top
- PKTB16: top.bottom

#### **Description**:

- PKBB16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].
- PKBT16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTT16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTB16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].

## **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][15:0]); // PKBB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][31:16]); // PKBT16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][15:0]); // PKTB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][31:16]); // PKTT16
for RV32: x=0,
for RV64: x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_PKTB16 (unsigned long a, unsigned long b)
PKTB16 (Pack Two 16-bit Data from Top and Bottom Half)

Type: DSP

## Syntax:

```
PKBB16 Rd, Rs1, Rs2
PKBT16 Rd, Rs1, Rs2
PKTT16 Rd, Rs1, Rs2
PKTB16 Rd, Rs1, Rs2
```

#### Purpose:

Pack 16-bit data from 32-bit chunks in two registers.

- PKBB16: bottom.bottom
- PKBT16: bottom.top
- PKTT16: top.top
- PKTB16: top.bottom

## **Description**:

- PKBB16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].
- PKBT16 moves Rs1.W[x][15:0] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTT16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][31:16] to Rd.W[x][15:0].
- PKTB16 moves Rs1.W[x][31:16] to Rd.W[x][31:16] and Rs2.W[x][15:0] to Rd.W[x][15:0].

## **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][15:0]); // PKBB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][15:0], Rs2.W[x][31:16]); // PKBT16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][15:0]); // PKTB16
Rd.W[x][31:0] = CONCAT(Rs1.W[x][31:16], Rs2.W[x][31:16]); // PKTT16
for RV32: x=0,
for RV64: x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type

## Signed MSW 32x32 Multiply and Add Instructions

```
__STATIC_FORCEINLINE long __RV_KMMAC (long t, long a, long b)
__STATIC_FORCEINLINE long __RV_KMMAC_U (long t, long a, long b)
__STATIC_FORCEINLINE long __RV_KMMSB (long t, long a, long b)
__STATIC_FORCEINLINE long __RV_KMMSB_U (long t, long a, long b)
__STATIC_FORCEINLINE long __RV_KWMMUL (long a, long b)
__STATIC_FORCEINLINE long __RV_KWMMUL_U (long a, long b)
__STATIC_FORCEINLINE long __RV_SMMUL (long a, long b)
__STATIC_FORCEINLINE long __RV_SMMUL_U (long a, long b)
```

#### group NMSIS\_Core\_DSP\_Intrinsic\_SIGNED\_MSW\_32X32\_MAC

Signed MSW 32x32 Multiply and Add Instructions.

There are 8 Signed MSW 32x32 Multiply and Add Instructions.

## **Functions**

```
__STATIC_FORCEINLINE long __RV_KMMAC (long t, long a, long b)

KMMAC (SIMD Saturating MSW Signed Multiply Word and Add)
```

Type: SIMD

## Syntax:

```
KMMAC Rd, Rs1, Rs2
KMMAC.u Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 32-bit integer elements of two registers and add the most significant 32-bit results with the signed 32-bit integer elements of a third register. The addition results are saturated first and then written back to the third register. The .u form performs an additional rounding up operation on the multiplication results before adding the most significant 32-bit part of the results.

### **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.

## **Operations:**

```
Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
if (`.u` form) {
   Round[x][32:0] = Mres[x][63:31] + 1;
   res[x] = Rd.W[x] + Round[x][32:1];
} else {
   res[x] = Rd.W[x] + Mres[x][63:32];
}
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

## **Parameters**

- t [in] long type of value stored in t
- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type
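The Operations pseudocode above can be mirrored on a host machine for a single 32-bit element. The following is a minimal sketch under stated assumptions: `kmmac_model`, `sat_q31`, `asr64` and the `ov_flag` variable are hypothetical names introduced here (the real OV bit lives in the `ucode` CSR), and the `round` argument selects between the plain and `.u` forms.

```c
#include <stdint.h>

/* Stand-in for the OV bit; an assumption of this sketch only. */
static int ov_flag;

/* Portable arithmetic (floor) right shift for signed 64-bit values. */
static int64_t asr64(int64_t v, unsigned n)
{
    return (v < 0) ? ~(~v >> n) : (v >> n);
}

/* Saturate to the Q31 range and record overflow in ov_flag. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) { ov_flag = 1; return INT32_MAX; }
    if (v < INT32_MIN) { ov_flag = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* Hypothetical model of KMMAC (round == 0) and KMMAC.u (round != 0). */
static int32_t kmmac_model(int32_t t, int32_t a, int32_t b, int round)
{
    int64_t mres = (int64_t)a * (int64_t)b;   /* Mres[63:0] */
    int64_t msw;
    if (round) {
        /* Round[32:0] = Mres[63:31] + 1, then take Round[32:1]. */
        msw = asr64(asr64(mres, 31) + 1, 1);
    } else {
        msw = asr64(mres, 32);                /* Mres[63:32] */
    }
    return sat_q31((int64_t)t + msw);
}
```

For example, with `a = 0x40000000` and `b = 2` the product is 2^31, so the plain form adds 0 to `t` while the `.u` form adds 1.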

```
__STATIC_FORCEINLINE long __RV_KMMAC_U (long t, long a, long b)

KMMAC.u (SIMD Saturating MSW Signed Multiply Word and Add with Rounding)
```

Type: SIMD

#### Syntax:

```
KMMAC Rd, Rs1, Rs2
KMMAC.u Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 32-bit integer elements of two registers and add the most significant 32-bit results with the signed 32-bit integer elements of a third register. The addition results are saturated first and then written back to the third register. The .u form performs an additional rounding up operation on the multiplication results before adding the most significant 32-bit part of the results.

## **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.

#### **Operations:**

```
Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
if (`.u` form) {
  Round[x][32:0] = Mres[x][63:31] + 1;
  res[x] = Rd.W[x] + Round[x][32:1];
} else {
  res[x] = Rd.W[x] + Mres[x][63:32];
}
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMMSB (long t, long a, long b)
KMMSB (SIMD Saturating MSW Signed Multiply Word and Subtract)
```

Type: SIMD

#### Syntax:

```
KMMSB Rd, Rs1, Rs2
KMMSB.u Rd, Rs1, Rs2
```

## **Purpose:**

Multiply the signed 32-bit integer elements of two registers and subtract the most significant 32-bit results from the signed 32-bit elements of a third register. The subtraction results are written to the third register. The .u form performs an additional rounding up operation on the multiplication results before subtracting the most significant 32-bit part of the results.

## **Description:**

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and subtracts the most significant 32-bit multiplication results from the signed 32-bit elements of Rd. If the subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.

## **Operations:**

```
Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
if (`.u` form) {
   Round[x][32:0] = Mres[x][63:31] + 1;
   res[x] = Rd.W[x] - Round[x][32:1];
} else {
   res[x] = Rd.W[x] - Mres[x][63:32];
}
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMMSB_U (long t, long a, long b)

KMMSB.u (SIMD Saturating MSW Signed Multiply Word and Subtraction with Rounding)
```

Type: SIMD

### Syntax:

```
KMMSB Rd, Rs1, Rs2
KMMSB.u Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 32-bit integer elements of two registers and subtract the most significant 32-bit results from the signed 32-bit elements of a third register. The subtraction results are written to the third register. The .u form performs an additional rounding up operation on the multiplication results before subtracting the most significant 32-bit part of the results.

## **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and subtracts the most significant 32-bit multiplication results from the signed 32-bit elements of Rd. If the subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.

### **Operations:**

```
Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
if (`.u` form) {
  Round[x][32:0] = Mres[x][63:31] + 1;
  res[x] = Rd.W[x] - Round[x][32:1];
} else {
  res[x] = Rd.W[x] - Mres[x][63:32];
}
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KWMMUL (long a, long b)

KWMMUL (SIMD Saturating MSW Signed Multiply Word & Double)
```

Type: SIMD

# Syntax:

```
KWMMUL Rd, Rs1, Rs2
KWMMUL.u Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 32-bit integer elements of two registers, shift the results left 1-bit, saturate, and write the most significant 32-bit results to a register. The .u form additionally rounds up the multiplication results from the most significant discarded bit.

# **Description**:

This instruction multiplies the 32-bit elements of Rs1 with the 32-bit elements of Rs2. It then shifts the multiplication results one bit to the left and takes the most significant 32-bit results. If the shifted result is greater than 2^31-1, it is saturated to 2^31-1 and the OV flag is set to 1. The final element result is written to Rd. The 32-bit elements of Rs1 and Rs2 are treated as signed integers. The .u form of the instruction additionally rounds up the 64-bit multiplication results by adding a 1 to bit 30 before the shift and saturation operations.

## **Operations:**

```
if ((0x80000000 != Rs1.W[x]) | (0x80000000 != Rs2.W[x])) {
   Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
   if (`.u` form) {
      Round[x][33:0] = Mres[x][63:30] + 1;
      Rd.W[x] = Round[x][32:1];
   } else {
      Rd.W[x] = Mres[x][62:31];
   }
} else {
   Rd.W[x] = 0x7fffffff;
   OV = 1;
}
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type
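In Q31 terms, KWMMUL behaves like a fractional multiply: the only input pair that cannot be represented after doubling is (-1.0) x (-1.0), which the pseudocode special-cases. The sketch below models one 32-bit element on a host; `kwmmul_model`, `asr64` and `ov_flag` are hypothetical names used for illustration, and `round` selects the `.u` form.

```c
#include <stdint.h>

/* Stand-in for the OV bit; an assumption of this sketch only. */
static int ov_flag;

/* Portable arithmetic (floor) right shift for signed 64-bit values. */
static int64_t asr64(int64_t v, unsigned n)
{
    return (v < 0) ? ~(~v >> n) : (v >> n);
}

/* Hypothetical model of KWMMUL (round == 0) and KWMMUL.u (round != 0). */
static int32_t kwmmul_model(int32_t a, int32_t b, int round)
{
    /* 0x80000000 * 0x80000000 doubled does not fit in Q31: saturate. */
    if (a == INT32_MIN && b == INT32_MIN) {
        ov_flag = 1;
        return INT32_MAX;
    }
    int64_t mres = (int64_t)a * (int64_t)b;   /* Mres[63:0] */
    if (round) {
        /* Round[33:0] = Mres[63:30] + 1, result is Round[32:1]. */
        return (int32_t)asr64(asr64(mres, 30) + 1, 1);
    }
    /* Mres[62:31]: the doubled product's most significant word. */
    return (int32_t)asr64(mres, 31);
}
```

As a quick check, `kwmmul_model(0x40000000, 0x40000000, 0)` (0.5 x 0.5 in Q31) returns `0x20000000` (0.25).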

```
__STATIC_FORCEINLINE long __RV_KWMMUL_U (long a, long b)

KWMMUL.u (SIMD Saturating MSW Signed Multiply Word & Double with Rounding)
```

Type: SIMD

## Syntax:

```
KWMMUL Rd, Rs1, Rs2
KWMMUL.u Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 32-bit integer elements of two registers, shift the results left 1-bit, saturate, and write the most significant 32-bit results to a register. The .u form additionally rounds up the multiplication results from the most significant discarded bit.

## **Description**:

This instruction multiplies the 32-bit elements of Rs1 with the 32-bit elements of Rs2. It then shifts the multiplication results one bit to the left and takes the most significant 32-bit results. If the shifted result is greater than 2^31-1, it is saturated to 2^31-1 and the OV flag is set to 1. The final element result is written to Rd. The 32-bit elements of Rs1 and Rs2 are treated as signed integers. The .u form of the instruction additionally rounds up the 64-bit multiplication results by adding a 1 to bit 30 before the shift and saturation operations.

#### **Operations:**

```
if ((0x80000000 != Rs1.W[x]) | (0x80000000 != Rs2.W[x])) {
   Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
   if (`.u` form) {
     Round[x][33:0] = Mres[x][63:30] + 1;
     Rd.W[x] = Round[x][32:1];
   } else {
     Rd.W[x] = Mres[x][62:31];
   }
} else {
   Rd.W[x] = 0x7fffffff;
   OV = 1;
}
for RV32: x=0
for RV64: x=1...0
```

# **Parameters**

- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMMUL (long a, long b)
SMMUL (SIMD MSW Signed Multiply Word)
```

Type: SIMD

## Syntax:

```
SMMUL Rd, Rs1, Rs2
SMMUL.u Rd, Rs1, Rs2
```

### **Purpose:**

Multiply the 32-bit signed integer elements of two registers and write the most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form performs an additional rounding up operation on the multiplication results before taking the most significant 32-bit part of the results.

## **Description**:

This instruction multiplies the 32-bit elements of Rs1 with the 32-bit elements of Rs2 and writes the most significant 32-bit multiplication results to the corresponding 32-bit elements of Rd. The 32-bit elements of Rs1 and Rs2 are treated as signed integers. The .u form of the instruction rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.

- On RV32, the smmul instruction is an alias of the mulh instruction.

### **Operations:**

```
Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
if (`.u` form) {
   Round[x][32:0] = Mres[x][63:31] + 1;
   Rd.W[x] = Round[x][32:1];
} else {
   Rd.W[x] = Mres[x][63:32];
}
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type
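Because SMMUL neither accumulates nor saturates, its per-element behaviour reduces to taking the upper word of a 64-bit product, optionally rounded. A minimal host-side sketch follows; `smmul_model` and `asr64` are hypothetical helper names, and `round` selects the `.u` form.

```c
#include <stdint.h>

/* Portable arithmetic (floor) right shift for signed 64-bit values. */
static int64_t asr64(int64_t v, unsigned n)
{
    return (v < 0) ? ~(~v >> n) : (v >> n);
}

/* Hypothetical model of SMMUL (round == 0) and SMMUL.u (round != 0). */
static int32_t smmul_model(int32_t a, int32_t b, int round)
{
    int64_t mres = (int64_t)a * (int64_t)b;   /* Mres[63:0] */
    if (round) {
        /* Round[32:0] = Mres[63:31] + 1, result is Round[32:1]. */
        return (int32_t)asr64(asr64(mres, 31) + 1, 1);
    }
    return (int32_t)asr64(mres, 32);          /* Mres[63:32] */
}
```

The difference between the two forms shows up when bit 31 of the product is set: for `a = 3` and `b = 0x40000000` the product is `0xC0000000`, so the plain form returns 0 and the rounding form returns 1.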

```
__STATIC_FORCEINLINE long __RV_SMMUL_U (long a, long b)
SMMUL.u (SIMD MSW Signed Multiply Word with Rounding)
```

Type: SIMD

#### Syntax:

```
SMMUL Rd, Rs1, Rs2
SMMUL.u Rd, Rs1, Rs2
```

#### **Purpose:**

Multiply the 32-bit signed integer elements of two registers and write the most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form performs an additional rounding up operation on the multiplication results before taking the most significant 32-bit part of the results.

## **Description**:

This instruction multiplies the 32-bit elements of Rs1 with the 32-bit elements of Rs2 and writes the most significant 32-bit multiplication results to the corresponding 32-bit elements of Rd. The 32-bit elements of Rs1 and Rs2 are treated as signed integers. The .u form of the instruction rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.

- On RV32, the smmul instruction is an alias of the mulh instruction.

#### **Operations:**

```
Mres[x][63:0] = Rs1.W[x] * Rs2.W[x];
if (`.u` form) {
   Round[x][32:0] = Mres[x][63:31] + 1;
   Rd.W[x] = Round[x][32:1];
} else {
   Rd.W[x] = Mres[x][63:32];
}
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] long type of value stored in b

**Returns** value stored in long type

## Signed MSW 32x16 Multiply and Add Instructions

```
__STATIC_FORCEINLINE long __RV_KMMAWB (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMAWB_U (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMAWB2 (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMAWB2_U (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMAWT (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMAWT_U (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMAWT2 (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMAWT2_U (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMWB2 (long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMWB2_U (long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMWT2 (long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMMWT2_U (long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMMWB (long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMMWB_U (long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMMWT (long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMMWT_U (long a, unsigned long b)
```

#### group NMSIS\_Core\_DSP\_Intrinsic\_SIGNED\_MSW\_32X16\_MAC

Signed MSW 32x16 Multiply and Add Instructions.

There are 16 Signed MSW 32x16 Multiply and Add Instructions.

#### **Functions**

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWB (long t, unsigned long a, unsigned long b)
KMMAWB (SIMD Saturating MSW Signed Multiply Word and Bottom Half and Add)

Type: SIMD

### Syntax:

```
KMMAWB Rd, Rs1, Rs2
KMMAWB.u Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 32-bit integer elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register and add the most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The addition result is written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

# **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed bottom 16-bit content of the corresponding 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the results.

# **Operations**:

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[0];
if (`.u` form) {
   Round[x][32:0] = Mres[x][47:15] + 1;
   res[x] = Rd.W[x] + Round[x][32:1];
} else {
   res[x] = Rd.W[x] + Mres[x][47:16];
}
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
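Compared with the 32x32 variants, the only structural change here is that the second operand is the sign-extended bottom halfword and the product is 48 bits wide. The following host-side sketch of one 32-bit element reflects that; `kmmawb_model`, `bottom_half`, `sat_q31`, `asr64` and `ov_flag` are hypothetical names introduced for illustration, and `round` selects the `.u` form.

```c
#include <stdint.h>

/* Stand-in for the OV bit; an assumption of this sketch only. */
static int ov_flag;

/* Portable arithmetic (floor) right shift for signed 64-bit values. */
static int64_t asr64(int64_t v, unsigned n)
{
    return (v < 0) ? ~(~v >> n) : (v >> n);
}

/* Saturate to the Q31 range and record overflow in ov_flag. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) { ov_flag = 1; return INT32_MAX; }
    if (v < INT32_MIN) { ov_flag = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* Sign-extend the bottom 16 bits of a word (Rs2.W[x].H[0]). */
static int32_t bottom_half(uint32_t w)
{
    int32_t h = (int32_t)(w & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Hypothetical model of KMMAWB (round == 0) and KMMAWB.u (round != 0). */
static int32_t kmmawb_model(int32_t t, int32_t a, uint32_t b, int round)
{
    int64_t mres = (int64_t)a * bottom_half(b);   /* Mres[47:0] */
    int64_t msw;
    if (round) {
        /* Round[32:0] = Mres[47:15] + 1, then take Round[32:1]. */
        msw = asr64(asr64(mres, 15) + 1, 1);
    } else {
        msw = asr64(mres, 16);                    /* Mres[47:16] */
    }
    return sat_q31((int64_t)t + msw);
}
```

For instance, `kmmawb_model(10, 0x00010000, 0x12340003, 0)` ignores the top half `0x1234`, multiplies 65536 by 3, and returns 13.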

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWB\_U (long t, unsigned long a, unsigned long b)
KMMAWB.u (SIMD Saturating MSW Signed Multiply Word and Bottom Half and Add with Rounding)

Type: SIMD

## Syntax:

```
KMMAWB Rd, Rs1, Rs2
KMMAWB.u Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 32-bit integer elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register and add the most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The addition result is written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

# **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed bottom 16-bit content of the corresponding 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the results.

## **Operations:**

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[0];
if (`.u` form) {
   Round[x][32:0] = Mres[x][47:15] + 1;
   res[x] = Rd.W[x] + Round[x][32:1];
} else {
   res[x] = Rd.W[x] + Mres[x][47:16];
}
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

## **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWB2 (long t, unsigned long a, unsigned long b)
KMMAWB2 (SIMD Saturating MSW Signed Multiply Word and Bottom Half & 2 and Add)

Type: SIMD

### Syntax:

```
KMMAWB2 Rd, Rs1, Rs2
KMMAWB2.u Rd, Rs1, Rs2
```

### Purpose:

Multiply the signed 32-bit elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and add the saturated most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The saturated addition result is written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

## **Description**:

This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed bottom 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and adds the saturated most significant 32-bit Q31 multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the result before the addition operations.

### **Operations:**

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[0] == 0x8000)) {
  addop.W[x] = 0x7fffffff;
  OV = 1;
} else {
  Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[0];
  if (`.u` form) {
    Mres[x][47:14] = Mres[x][47:14] + 1;
  }
  addop.W[x] = Mres[x][46:15]; // doubling
}
res[x] = Rd.W[x] + addop.W[x];
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
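The doubling step means the instruction can saturate twice: once when forming the addend (only for 0x80000000 x 0x8000) and once in the final addition. The sketch below models one 32-bit element on a host to make both points visible; `kmmawb2_model`, `bottom_half`, `sat_q31`, `asr64` and `ov_flag` are hypothetical names, and `round` selects the `.u` form.

```c
#include <stdint.h>

/* Stand-in for the OV bit; an assumption of this sketch only. */
static int ov_flag;

/* Portable arithmetic (floor) right shift for signed 64-bit values. */
static int64_t asr64(int64_t v, unsigned n)
{
    return (v < 0) ? ~(~v >> n) : (v >> n);
}

/* Saturate to the Q31 range and record overflow in ov_flag. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) { ov_flag = 1; return INT32_MAX; }
    if (v < INT32_MIN) { ov_flag = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* Sign-extend the bottom 16 bits of a word (Rs2.W[x].H[0]). */
static int32_t bottom_half(uint32_t w)
{
    int32_t h = (int32_t)(w & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Hypothetical model of KMMAWB2 (round == 0) and KMMAWB2.u (round != 0). */
static int32_t kmmawb2_model(int32_t t, int32_t a, uint32_t b, int round)
{
    int32_t h = bottom_half(b);
    int64_t addop;
    if (a == INT32_MIN && h == -32768) {
        /* Q31 -1.0 times Q15 -1.0 doubled does not fit: saturate the addend. */
        addop = INT32_MAX;
        ov_flag = 1;
    } else {
        int64_t mres = (int64_t)a * h;        /* Mres[47:0] */
        if (round) {
            mres += (int64_t)1 << 14;         /* Mres[47:14] + 1 */
        }
        addop = asr64(mres, 15);              /* Mres[46:15], i.e. doubled MSW */
    }
    return sat_q31((int64_t)t + addop);
}
```

As a check, Q31 0.5 (`0x40000000`) times Q15 0.5 (`0x4000`) with `t = 0` gives `0x20000000`, which is 0.25 in Q31.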

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWB2\_U (long t, unsigned long a, unsigned long b)
KMMAWB2.u (SIMD Saturating MSW Signed Multiply Word and Bottom Half & 2 and Add with Rounding)

Type: SIMD

### Syntax:

```
KMMAWB2 Rd, Rs1, Rs2
KMMAWB2.u Rd, Rs1, Rs2
```

### Purpose:

Multiply the signed 32-bit elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and add the saturated most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The saturated addition result is written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

### **Description:**

This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed bottom 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and adds the saturated most significant 32-bit Q31 multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the result before the addition operations.

### **Operations:**

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[0] == 0x8000)) {
  addop.W[x] = 0x7fffffff;
  OV = 1;
} else {
  Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[0];
  if (`.u` form) {
    Mres[x][47:14] = Mres[x][47:14] + 1;
  }
  addop.W[x] = Mres[x][46:15]; // doubling
}
res[x] = Rd.W[x] + addop.W[x];
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
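The only difference between the `.u` form and the plain form is the rounding of the addend. A minimal sketch of that step in C, assuming a hypothetical helper `kmmawb2_u_addop` that returns `addop.W[x]` from the pseudocode above for one lane (the saturating accumulation into Rd is unchanged and omitted here):

```c
#include <stdint.h>

/* Hypothetical helper: rounded addend of one KMMAWB2.u lane; not part of
 * NMSIS. *ov stands in for the OV flag. Assumes two's-complement narrowing
 * casts and arithmetic right shift of negative values. */
static int32_t kmmawb2_u_addop(int32_t rs1, uint32_t rs2, int *ov)
{
    int16_t bot = (int16_t)(rs2 & 0xFFFFu);   /* Rs2.W[x].H[0], Q15 */

    if (rs1 == INT32_MIN && bot == INT16_MIN) {
        *ov = 1;                              /* (-1.0) * (-1.0) saturates */
        return INT32_MAX;
    }
    int64_t mres = (int64_t)rs1 * bot;        /* Q46 product */
    /* Mres[47:14] + 1, then keep bits [46:15]: round half up. */
    return (int32_t)(((mres >> 14) + 1) >> 1);
}
```

For example, a product of exactly 2^14 (one half of the Q31 least significant bit after doubling) rounds up to 1, whereas the non-rounding form would truncate it to 0.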

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWT (long t, unsigned long a, unsigned long b)
KMMAWT (SIMD Saturating MSW Signed Multiply Word and Top Half and Add)

Type: SIMD

## Syntax:

```
KMMAWT Rd, Rs1, Rs2
KMMAWT.u Rd, Rs1, Rs2
```

### Purpose:

Multiply the signed 32-bit integer elements of one register and the signed top 16-bit of the corresponding 32-bit elements of another register and add the most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The addition results are written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

## **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed top 16-bit of the corresponding 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the result before the addition operations.

# **Operations:**

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[1];
if (`.u` form) {
   Round[x][32:0] = Mres[x][47:15] + 1;
   res[x] = Rd.W[x] + Round[x][32:1];
} else {
   res[x] = Rd.W[x] + Mres[x][47:16];
}
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in long type
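The non-rounding `KMMAWT` path can be sketched in C for a single 32-bit lane as follows. This is a host-side model under stated assumptions, not the NMSIS implementation: `kmmawt_lane` and the `ov` out-parameter (standing in for the OV flag) are names invented for the example.

```c
#include <stdint.h>

/* Hypothetical model of one KMMAWT lane (no rounding); not part of NMSIS.
 * Assumes two's-complement narrowing casts and arithmetic right shift of
 * negative values. */
static int32_t kmmawt_lane(int32_t rd, int32_t rs1, uint32_t rs2, int *ov)
{
    int16_t top = (int16_t)(rs2 >> 16);          /* Rs2.W[x].H[1] */
    int64_t mres = (int64_t)rs1 * top;           /* 48-bit product */
    int64_t res = (int64_t)rd + (mres >> 16);    /* add Mres[47:16] */

    if (res > INT32_MAX) {
        res = INT32_MAX;
        *ov = 1;
    } else if (res < INT32_MIN) {
        res = INT32_MIN;
        *ov = 1;
    }
    return (int32_t)res;
}
```

Unlike the `KMMAWB2`/`KMMAWT2` family, no doubling is applied, so there is no special case for the most negative operands.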

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWT\_U (long t, unsigned long a, unsigned long b)
KMMAWT.u (SIMD Saturating MSW Signed Multiply Word and Top Half and Add with Rounding)

Type: SIMD

# Syntax:

```
KMMAWT Rd, Rs1, Rs2
KMMAWT.u Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 32-bit integer elements of one register and the signed top 16-bit of the corresponding 32-bit elements of another register and add the most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The addition results are written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

# **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed top 16-bit of the corresponding 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the result before the addition operations.

# **Operations:**

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[1];
if (`.u` form) {
   Round[x][32:0] = Mres[x][47:15] + 1;
   res[x] = Rd.W[x] + Round[x][32:1];
} else {
   res[x] = Rd.W[x] + Mres[x][47:16];
}
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

# **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
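The `Round[x][32:0]` step of the `.u` form can be made concrete with a small C model of one lane. The function name `kmmawt_u_lane` and the `ov` out-parameter are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical model of one KMMAWT.u lane; not part of NMSIS.
 * Assumes two's-complement narrowing casts and arithmetic right shift of
 * negative values. */
static int32_t kmmawt_u_lane(int32_t rd, int32_t rs1, uint32_t rs2, int *ov)
{
    int16_t top = (int16_t)(rs2 >> 16);          /* Rs2.W[x].H[1] */
    int64_t mres = (int64_t)rs1 * top;           /* 48-bit product */
    int64_t round = (mres >> 15) + 1;            /* Round = Mres[47:15] + 1 */
    int64_t res = (int64_t)rd + (round >> 1);    /* add Round[32:1] */

    if (res > INT32_MAX) {
        res = INT32_MAX;
        *ov = 1;
    } else if (res < INT32_MIN) {
        res = INT32_MIN;
        *ov = 1;
    }
    return (int32_t)res;
}
```

Adding 1 at bit 15 before dropping bits [15:0] rounds the discarded fraction half up: a product of 32768 contributes 1, a product of 32767 contributes 0.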

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWT2 (long t, unsigned long a, unsigned long b)
KMMAWT2 (SIMD Saturating MSW Signed Multiply Word and Top Half & 2 and Add)

Type: SIMD

### Syntax:

```
KMMAWT2 Rd, Rs1, Rs2
KMMAWT2.u Rd, Rs1, Rs2
```

### **Purpose:**

Multiply the signed 32-bit elements of one register and the top 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and add the saturated most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The saturated addition result is written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

### **Description**:

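This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed top 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and adds the saturated most significant 32-bit Q31 multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the result before the addition operations.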

# **Operations:**

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[1] == 0x8000)) {
  addop.W[x] = 0x7fffffff;
  OV = 1;
} else {
  Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[1];
  if (`.u` form) {
    Mres[x][47:14] = Mres[x][47:14] + 1;
  }
  addop.W[x] = Mres[x][46:15]; // doubling
}
res[x] = Rd.W[x] + addop.W[x];
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

# **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
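To make the Q-format reading of `KMMAWT2` concrete, the sketch below models one lane in C and converts the Q31 result to a real number. It is an illustration under assumptions: `kmmawt2_lane`, `q31_to_double` and the `ov` out-parameter are hypothetical names, not NMSIS APIs.

```c
#include <stdint.h>

/* Hypothetical model of one KMMAWT2 lane; not part of NMSIS.
 * Assumes two's-complement narrowing casts and arithmetic right shift of
 * negative values. */
static int32_t kmmawt2_lane(int32_t rd, int32_t rs1, uint32_t rs2, int *ov)
{
    int16_t top = (int16_t)(rs2 >> 16);          /* Rs2.W[x].H[1], Q15 */
    int64_t addop;

    if (rs1 == INT32_MIN && top == INT16_MIN) {
        addop = INT32_MAX;                       /* (-1.0) * (-1.0) */
        *ov = 1;
    } else {
        addop = ((int64_t)rs1 * top) >> 15;      /* Mres[46:15] */
    }

    int64_t res = (int64_t)rd + addop;
    if (res > INT32_MAX) {
        res = INT32_MAX;
        *ov = 1;
    } else if (res < INT32_MIN) {
        res = INT32_MIN;
        *ov = 1;
    }
    return (int32_t)res;
}

/* Interpret a Q31 bit pattern as a real number in [-1, 1). */
static double q31_to_double(int32_t q)
{
    return (double)q / 2147483648.0;
}
```

With Rd = 0.125, Rs1 = 0.5 and a top half of 0.5, the model yields 0.125 + 0.5 * 0.5 = 0.375 in Q31.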

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMMAWT2\_U (long t, unsigned long a, unsigned long b)
KMMAWT2.u (SIMD Saturating MSW Signed Multiply Word and Top Half & 2 and Add with Rounding)

Type: SIMD

### Syntax:

```
KMMAWT2 Rd, Rs1, Rs2
KMMAWT2.u Rd, Rs1, Rs2
```

### **Purpose**:

Multiply the signed 32-bit elements of one register and the top 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and add the saturated most significant 32-bit results with the corresponding signed 32-bit elements of a third register. The saturated addition result is written to the corresponding 32-bit elements of the third register. The .u form rounds up the multiplication results from the most significant discarded bit before the addition operations.

### **Description**:

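This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed top 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and adds the saturated most significant 32-bit Q31 multiplication results with the corresponding signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the result before the addition operations.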

## **Operations:**

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[1] == 0x8000)) {
  addop.W[x] = 0x7fffffff;
  OV = 1;
} else {
  Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[1];
  if (`.u` form) {
    Mres[x][47:14] = Mres[x][47:14] + 1;
  }
  addop.W[x] = Mres[x][46:15]; // doubling
}
res[x] = Rd.W[x] + addop.W[x];
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMMWB2 (long a, unsigned long b)
KMMWB2 (SIMD Saturating MSW Signed Multiply Word and Bottom Half & 2)
```

Type: SIMD

### Syntax:

```
KMMWB2 Rd, Rs1, Rs2
KMMWB2.u Rd, Rs1, Rs2
```

### Purpose:

Multiply the signed 32-bit integer elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and write the saturated most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

### **Description:**

This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed bottom 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and writes the saturated most significant 32-bit Q31 multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the results.

# **Operations**:

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[0] == 0x8000)) {
   Rd.W[x] = 0x7fffffff;
   OV = 1;
} else {
   Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[0];
   if (`.u` form) {
      Round[x][32:0] = Mres[x][46:14] + 1;
      Rd.W[x] = Round[x][32:1];
   } else {
      Rd.W[x] = Mres[x][46:15];
   }
}
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
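A compact C sketch of one `KMMWB2` lane, covering both the plain and the `.u` form through a flag, is shown below. It is a reference model under assumptions rather than the NMSIS implementation: the name `kmmwb2_lane`, the `round` flag and the `ov` out-parameter are invented for the example.

```c
#include <stdint.h>

/* Hypothetical model of one KMMWB2 / KMMWB2.u lane; not part of NMSIS.
 * round != 0 selects the .u form; *ov stands in for the OV flag.
 * Assumes two's-complement narrowing casts and arithmetic right shift of
 * negative values. */
static int32_t kmmwb2_lane(int32_t rs1, uint32_t rs2, int round, int *ov)
{
    int16_t bot = (int16_t)(rs2 & 0xFFFFu);      /* Rs2.W[x].H[0], Q15 */

    if (rs1 == INT32_MIN && bot == INT16_MIN) {
        *ov = 1;                                 /* (-1.0) * (-1.0) */
        return INT32_MAX;
    }
    int64_t mres = (int64_t)rs1 * bot;           /* Q46 product */
    if (round) {
        /* Round[32:0] = Mres[46:14] + 1; result is Round[32:1]. */
        return (int32_t)(((mres >> 14) + 1) >> 1);
    }
    return (int32_t)(mres >> 15);                /* Mres[46:15] */
}
```

Only the bottom half of Rs2 takes part in the product; the top half is ignored.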

```
__STATIC_FORCEINLINE long __RV_KMMWB2_U (long a, unsigned long b)
KMMWB2.u (SIMD Saturating MSW Signed Multiply Word and Bottom Half & 2 with Rounding)
```

Type: SIMD

## Syntax:

```
KMMWB2 Rd, Rs1, Rs2
KMMWB2.u Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 32-bit integer elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and write the saturated most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

### **Description**:

This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed bottom 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and writes the saturated most significant 32-bit Q31 multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the results.

### **Operations:**

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[0] == 0x8000)) {
   Rd.W[x] = 0x7fffffff;
   OV = 1;
} else {
   Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[0];
   if (`.u` form) {
      Round[x][32:0] = Mres[x][46:14] + 1;
      Rd.W[x] = Round[x][32:1];
   } else {
      Rd.W[x] = Mres[x][46:15];
   }
}
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMMWT2 (long a, unsigned long b)
KMMWT2 (SIMD Saturating MSW Signed Multiply Word and Top Half & 2)
```

Type: SIMD

### Syntax:

```
KMMWT2 Rd, Rs1, Rs2
KMMWT2.u Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 32-bit integer elements of one register and the top 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and write the saturated most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

# **Description**:

This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed top 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and writes the saturated most significant 32-bit Q31 multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the results.

## **Operations:**

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[1] == 0x8000)) {
   Rd.W[x] = 0x7fffffff;
   OV = 1;
} else {
   Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[1];
   if (`.u` form) {
      Round[x][32:0] = Mres[x][46:14] + 1;
      Rd.W[x] = Round[x][32:1];
   } else {
      Rd.W[x] = Mres[x][46:15];
   }
}
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMMWT2_U (long a, unsigned long b)

KMMWT2.u (SIMD Saturating MSW Signed Multiply Word and Top Half & 2 with Rounding)
```

Type: SIMD

## Syntax:

```
KMMWT2 Rd, Rs1, Rs2
KMMWT2.u Rd, Rs1, Rs2
```

# **Purpose**:

Multiply the signed 32-bit integer elements of one register and the top 16-bit of the corresponding 32-bit elements of another register, double the multiplication results and write the saturated most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

# **Description**:

This instruction multiplies the signed 32-bit Q31 elements of Rs1 with the signed top 16-bit Q15 content of the corresponding 32-bit elements of Rs2, doubles the Q46 results to Q47 numbers and writes the saturated most significant 32-bit Q31 multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit Q47 multiplication results by adding a 1 to bit 15 (i.e., bit 14 before doubling) of the results.

## **Operations:**

```
if ((Rs1.W[x] == 0x80000000) & (Rs2.W[x].H[1] == 0x8000)) {
   Rd.W[x] = 0x7fffffff;
   OV = 1;
} else {
   Mres[x][47:0] = Rs1.W[x] s* Rs2.W[x].H[1];
   if (`.u` form) {
      Round[x][32:0] = Mres[x][46:14] + 1;
      Rd.W[x] = Round[x][32:1];
   } else {
      Rd.W[x] = Mres[x][46:15];
   }
}
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- a [in] long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMMWB (long a, unsigned long b)
SMMWB (SIMD MSW Signed Multiply Word and Bottom Half)
```

Type: SIMD

## Syntax:

```
SMMWB Rd, Rs1, Rs2
SMMWB.u Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 32-bit integer elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register, and write the most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

## **Description:**

This instruction multiplies the signed 32-bit elements of Rs1 with the signed bottom 16-bit content of the corresponding 32-bit elements of Rs2 and writes the most significant 32-bit multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the results.

# **Operations:**

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[0];
if (`.u` form) {
   Round[x][32:0] = Mres[x][47:15] + 1;
   Rd.W[x] = Round[x][32:1];
} else {
   Rd.W[x] = Mres[x][47:16];
}
for RV32: x=0
for RV64: x=1...0
```

## **Parameters**

- a [in] long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in long type
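Because `SMMWB` neither saturates nor touches OV, a single-lane model reduces to a 48-bit multiply followed by a shift. The following C sketch illustrates this; `smmwb_lane` and its `round` flag (selecting the `.u` form) are assumptions made for the example, not NMSIS names.

```c
#include <stdint.h>

/* Hypothetical model of one SMMWB / SMMWB.u lane; not part of NMSIS.
 * round != 0 selects the .u form. Assumes two's-complement narrowing
 * casts and arithmetic right shift of negative values. */
static int32_t smmwb_lane(int32_t rs1, uint32_t rs2, int round)
{
    int16_t bot = (int16_t)(rs2 & 0xFFFFu);      /* Rs2.W[x].H[0] */
    int64_t mres = (int64_t)rs1 * bot;           /* 48-bit product */

    if (round) {
        /* Round[32:0] = Mres[47:15] + 1; result is Round[32:1]. */
        return (int32_t)(((mres >> 15) + 1) >> 1);
    }
    return (int32_t)(mres >> 16);                /* Mres[47:16] */
}
```

The two forms differ visibly for small negative products: a product of -1 truncates to -1 in the plain form but rounds to 0 in the `.u` form.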

```
__STATIC_FORCEINLINE long __RV_SMMWB_U (long a, unsigned long b)
SMMWB.u (SIMD MSW Signed Multiply Word and Bottom Half with Rounding)
```

Type: SIMD

### Syntax:

```
SMMWB Rd, Rs1, Rs2
SMMWB.u Rd, Rs1, Rs2
```

### Purpose:

Multiply the signed 32-bit integer elements of one register and the bottom 16-bit of the corresponding 32-bit elements of another register, and write the most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

## **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed bottom 16-bit content of the corresponding 32-bit elements of Rs2 and writes the most significant 32-bit multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the results.

## **Operations:**

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[0];
if (`.u` form) {
   Round[x][32:0] = Mres[x][47:15] + 1;
   Rd.W[x] = Round[x][32:1];
} else {
   Rd.W[x] = Mres[x][47:16];
}
for RV32: x=0
for RV64: x=1...0
```

# **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMMWT (long a, unsigned long b)
SMMWT (SIMD MSW Signed Multiply Word and Top Half)
```

Type: SIMD

### Syntax:

```
SMMWT Rd, Rs1, Rs2
SMMWT.u Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 32-bit integer elements of one register and the top 16-bit of the corresponding 32-bit elements of another register, and write the most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

# **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed top 16-bit content of the corresponding 32-bit elements of Rs2 and writes the most significant 32-bit multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the results.

# **Operations:**

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[1];
if (`.u` form) {
  Round[x][32:0] = Mres[x][47:15] + 1;
  Rd.W[x] = Round[x][32:1];
} else {
  Rd.W[x] = Mres[x][47:16];
}
for RV32: x=0
for RV64: x=1...0
```

## **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
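On RV64 the operation is applied independently to both 32-bit lanes (x = 1...0). A minimal C sketch of that lane loop for the non-rounding `SMMWT`, assuming a hypothetical function `smmwt_rv64` that takes and returns raw 64-bit register images:

```c
#include <stdint.h>

/* Hypothetical model of SMMWT on RV64 register images; not part of NMSIS.
 * Assumes two's-complement narrowing casts and arithmetic right shift of
 * negative values. */
static uint64_t smmwt_rv64(uint64_t rs1, uint64_t rs2)
{
    uint64_t rd = 0;

    for (int x = 0; x < 2; x++) {
        int32_t w = (int32_t)(uint32_t)(rs1 >> (32 * x));         /* Rs1.W[x] */
        int16_t top = (int16_t)(uint16_t)(rs2 >> (32 * x + 16));  /* Rs2.W[x].H[1] */
        int64_t mres = (int64_t)w * top;                          /* 48-bit product */
        uint32_t lane = (uint32_t)(mres >> 16);                   /* Mres[47:16] */
        rd |= (uint64_t)lane << (32 * x);                         /* Rd.W[x] */
    }
    return rd;
}
```

Each lane only reads the top half of its own Rs2 word, so the two results never interact.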

```
__STATIC_FORCEINLINE long __RV_SMMWT_U (long a, unsigned long b)
SMMWT.u (SIMD MSW Signed Multiply Word and Top Half with Rounding)
```

Type: SIMD

## Syntax:

```
SMMWT Rd, Rs1, Rs2
SMMWT.u Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 32-bit integer elements of one register and the top 16-bit of the corresponding 32-bit elements of another register, and write the most significant 32-bit results to the corresponding 32-bit elements of a register. The .u form rounds up the results from the most significant discarded bit.

#### **Description**:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed top 16-bit content of the corresponding 32-bit elements of Rs2 and writes the most significant 32-bit multiplication results to the corresponding 32-bit elements of Rd. The .u form of the instruction rounds up the most significant 32-bit of the 48-bit multiplication results by adding a 1 to bit 15 of the results.

## **Operations:**

```
Mres[x][47:0] = Rs1.W[x] * Rs2.W[x].H[1];
if (`.u` form) {
   Round[x][32:0] = Mres[x][47:15] + 1;
   Rd.W[x] = Round[x][32:1];
} else {
   Rd.W[x] = Mres[x][47:16];
}
for RV32: x=0
for RV64: x=1...0
```

### **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

## Signed 16-bit Multiply 32-bit Add/Subtract Instructions

```
__STATIC_FORCEINLINE long __RV_KMABB (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMABT (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMATT (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMADA (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMAXDA (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMADS (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMADRS (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMAXDS (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMDA (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMXDA (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMSDA (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_KMSXDA (long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMBT16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMTT16 (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMDS (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMDRS (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long __RV_SMXDS (unsigned long a, unsigned long b)
group NMSIS_Core_DSP_Intrinsic_SIGNED_16B_MULT_32B_ADDSUB
    Signed 16-bit Multiply 32-bit Add/Subtract Instructions.
```

There are 18 Signed 16-bit Multiply 32-bit Add/Subtract Instructions.

## **Functions**

```
__STATIC_FORCEINLINE long __RV_KMABB (long t, unsigned long a, unsigned long b)
KMABB (SIMD Saturating Signed Multiply Bottom Halfs & Add)
```

Type: SIMD

#### Syntax:

```
KMABB Rd, Rs1, Rs2
KMABT Rd, Rs1, Rs2
KMATT Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 16-bit content of 32-bit elements in a register with the 16-bit content of 32-bit elements in another register and add the result to the content of 32-bit elements in the third register. The addition result may be saturated and is written to the third register.

- KMABB: rd.W[x] + bottom\*bottom (per 32-bit element)
- KMABT: rd.W[x] + bottom\*top (per 32-bit element)
- KMATT: rd.W[x] + top\*top (per 32-bit element)

## **Description**:

The KMABB instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. The KMABT instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The KMATT instruction multiplies the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The multiplication result is added to the content of 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

### **Operations:**

```
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[0]); // KMABB
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[1]); // KMABT
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[1]); // KMATT
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

## **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
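The three variants differ only in which halves are multiplied. The C sketch below models one 32-bit lane with the variant chosen by an enum. All names (`kma_op`, `kma_lane`, the `ov` out-parameter standing in for the OV flag) are hypothetical and serve only this illustration.

```c
#include <stdint.h>

enum kma_op { KMA_BB, KMA_BT, KMA_TT };   /* KMABB, KMABT, KMATT */

/* Hypothetical model of one KMABB/KMABT/KMATT lane; not part of NMSIS.
 * Assumes two's-complement narrowing casts. */
static int32_t kma_lane(enum kma_op op, int32_t rd,
                        uint32_t rs1, uint32_t rs2, int *ov)
{
    int16_t a_bot = (int16_t)(rs1 & 0xFFFFu);   /* Rs1.W[x].H[0] */
    int16_t a_top = (int16_t)(rs1 >> 16);       /* Rs1.W[x].H[1] */
    int16_t b_bot = (int16_t)(rs2 & 0xFFFFu);   /* Rs2.W[x].H[0] */
    int16_t b_top = (int16_t)(rs2 >> 16);       /* Rs2.W[x].H[1] */
    int64_t prod;

    switch (op) {
    case KMA_BB: prod = (int64_t)a_bot * b_bot; break;
    case KMA_BT: prod = (int64_t)a_bot * b_top; break;
    default:     prod = (int64_t)a_top * b_top; break;   /* KMA_TT */
    }

    int64_t res = (int64_t)rd + prod;            /* saturating accumulate */
    if (res > INT32_MAX) {
        res = INT32_MAX;
        *ov = 1;
    } else if (res < INT32_MIN) {
        res = INT32_MIN;
        *ov = 1;
    }
    return (int32_t)res;
}
```

For Rs1 = `0x00030002` and Rs2 = `0x00050004` with Rd = 100, the three variants give 100 + 2\*4, 100 + 2\*5 and 100 + 3\*5 respectively.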

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMABT (long t, unsigned long a, unsigned long b)
KMABT (SIMD Saturating Signed Multiply Bottom & Top Halfs & Add)

Type: SIMD

#### Syntax:

```
KMABB Rd, Rs1, Rs2
KMABT Rd, Rs1, Rs2
KMATT Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 16-bit content of 32-bit elements in a register with the 16-bit content of 32-bit elements in another register and add the result to the content of 32-bit elements in the third register. The addition result may be saturated and is written to the third register.

- KMABB: rd.W[x] + bottom\*bottom (per 32-bit element)
- KMABT: rd.W[x] + bottom\*top (per 32-bit element)
- KMATT: rd.W[x] + top\*top (per 32-bit element)

## **Description**:

The KMABB instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. The KMABT instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The KMATT instruction multiplies the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The multiplication result is added to the content of 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

## **Operations:**

```
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[0]); // KMABB
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[1]); // KMABT
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[1]); // KMATT
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- b [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMATT (long t, unsigned long a, unsigned long b)

KMATT (SIMD Saturating Signed Multiply Top Halfs & Add)

Type: SIMD

#### Syntax:

```
KMABB Rd, Rs1, Rs2
KMABT Rd, Rs1, Rs2
KMATT Rd, Rs1, Rs2
```

## Purpose:

Multiply the signed 16-bit content of 32-bit elements in a register with the 16-bit content of 32-bit elements in another register and add the result to the content of 32-bit elements in the third register. The addition result may be saturated and is written to the third register.

- KMABB: rd.W[x] + bottom\*bottom (per 32-bit element)
- KMABT: rd.W[x] + bottom\*top (per 32-bit element)
- KMATT: rd.W[x] + top\*top (per 32-bit element)

### **Description**:

The KMABB instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. The KMABT instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The KMATT instruction multiplies the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The multiplication result is added to the content of 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

## **Operations:**

```
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[0]); // KMABB
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[1]); // KMABT
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[1]); // KMATT
if (res[x] > (2^31)-1) {
   res[x] = (2^31)-1;
   OV = 1;
} else if (res[x] < -2^31) {
   res[x] = -2^31;
   OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMADA (long t, unsigned long a, unsigned long b)

KMADA (SIMD Saturating Signed Multiply Two Halfs and Two Adds)

Type: SIMD

#### Syntax:

```
KMADA Rd, Rs1, Rs2
KMAXDA Rd, Rs1, Rs2
```

#### Purpose:

Perform two signed 16-bit multiplications from 32-bit elements in two registers, then add the two 32-bit results and the 32-bit elements in a third register together. The addition result may be saturated.

- KMADA: rd.W[x] + top\*top + bottom\*bottom (per 32-bit element)
- KMAXDA: rd.W[x] + top\*bottom + bottom\*top (per 32-bit element)

## **Description:**

The KMADA instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then adds the result to the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The KMAXDA instruction multiplies the top 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The result is added to the content of 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The 32-bit results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.
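The behaviour described above can be sketched in C for one 32-bit lane. This is a model derived from the prose description, not a listing taken from the NMSIS sources. The function name `kmada_lane`, the `crossed` flag (selecting KMAXDA) and the `ov` out-parameter standing in for the OV flag are assumptions.

```c
#include <stdint.h>

/* Hypothetical model of one KMADA / KMAXDA lane, derived from the
 * description; not part of NMSIS. crossed != 0 selects KMAXDA.
 * Assumes two's-complement narrowing casts. */
static int32_t kmada_lane(int crossed, int32_t rd,
                          uint32_t rs1, uint32_t rs2, int *ov)
{
    int16_t a_bot = (int16_t)(rs1 & 0xFFFFu);   /* Rs1.W[x].H[0] */
    int16_t a_top = (int16_t)(rs1 >> 16);       /* Rs1.W[x].H[1] */
    int16_t b_bot = (int16_t)(rs2 & 0xFFFFu);   /* Rs2.W[x].H[0] */
    int16_t b_top = (int16_t)(rs2 >> 16);       /* Rs2.W[x].H[1] */

    int64_t sum = crossed
        ? (int64_t)a_top * b_bot + (int64_t)a_bot * b_top    /* KMAXDA */
        : (int64_t)a_top * b_top + (int64_t)a_bot * b_bot;   /* KMADA  */

    int64_t res = (int64_t)rd + sum;             /* saturating accumulate */
    if (res > INT32_MAX) {
        res = INT32_MAX;
        *ov = 1;
    } else if (res < INT32_MIN) {
        res = INT32_MIN;
        *ov = 1;
    }
    return (int32_t)res;
}
```

The sum is formed in 64 bits because two products of -32768 \* -32768 already add up to 2^31, which exceeds the Q31 range before Rd is even added.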

## **Operations:**

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMAXDA (long t, unsigned long a, unsigned long b)

KMAXDA (SIMD Saturating Signed Crossed Multiply Two Halfs and Two Adds)

Type: SIMD

## Syntax:

```
KMADA Rd, Rs1, Rs2
KMAXDA Rd, Rs1, Rs2
```

## Purpose:

Perform two signed 16-bit multiplications from 32-bit elements in two registers, then add the two 32-bit results and the 32-bit elements in a third register together. The addition result may be saturated.

- KMADA: rd.W[x] + top\*top + bottom\*bottom (per 32-bit element)
- KMAXDA: rd.W[x] + top\*bottom + bottom\*top (per 32-bit element)

## **Description**:

The KMADA instruction multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then adds the result to the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The KMAXDA instruction multiplies the top 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. The result is added to the content of 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The 32-bit results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

## **Operations:**

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMADS (long t, unsigned long a, unsigned long b)
KMADS (SIMD Saturating Signed Multiply Two Halfs & Subtract & Add)

Type: SIMD

### Syntax:

```
KMADS Rd, Rs1, Rs2
KMADRS Rd, Rs1, Rs2
KMAXDS Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from 32-bit elements in two registers; and then perform a subtraction operation between the two 32-bit results. Then add the subtraction result to the corresponding 32-bit elements in a third register. The addition result may be saturated.

- KMADS: rd.W[x] + (top\*top - bottom\*bottom) (per 32-bit element)
- KMADRS: rd.W[x] + (bottom\*bottom - top\*top) (per 32-bit element)
- KMAXDS: rd.W[x] + (top\*bottom - bottom\*top) (per 32-bit element)

### **Description:**

For the KMADS instruction, it multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. For the KMADRS instruction, it multiplies the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. For the KMAXDS instruction, it multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. The subtraction result is then added to the content of the corresponding 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The 32-bit results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

# **Operations:**

```
// KMADS
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[1]) - (Rs1.W[x].H[0] * Rs2.W[x].H[0]);
// KMADRS
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[0]) - (Rs1.W[x].H[1] * Rs2.W[x].H[1]);
// KMAXDS
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[0]) - (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```
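The three variants differ only in which half-word products are subtracted from each other before the accumulate. A host-side sketch for one 32-bit element (RV32) follows. The function names and the global `ov_flag`, which stands in for the OV bit, are illustrative assumptions and not NMSIS APIs.

```c
#include <stdint.h>

/* Illustrative stand-in for the OV bit (sticky in this sketch). */
static int ov_flag = 0;

/* Signed top half H[1] and bottom half H[0] of one 32-bit element. */
static int16_t top16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t bot16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Saturate a wide intermediate result to the Q31 range and flag overflow. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) { ov_flag = 1; return INT32_MAX; }
    if (v < INT32_MIN) { ov_flag = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* KMADS model: rd + (top*top - bottom*bottom). */
int32_t kmads_ref(int32_t rd, uint32_t rs1, uint32_t rs2)
{
    return sat_q31((int64_t)rd
                   + (int64_t)top16(rs1) * top16(rs2)
                   - (int64_t)bot16(rs1) * bot16(rs2));
}

/* KMADRS model: rd + (bottom*bottom - top*top). */
int32_t kmadrs_ref(int32_t rd, uint32_t rs1, uint32_t rs2)
{
    return sat_q31((int64_t)rd
                   + (int64_t)bot16(rs1) * bot16(rs2)
                   - (int64_t)top16(rs1) * top16(rs2));
}

/* KMAXDS model: rd + (top*bottom - bottom*top). */
int32_t kmaxds_ref(int32_t rd, uint32_t rs1, uint32_t rs2)
{
    return sat_q31((int64_t)rd
                   + (int64_t)top16(rs1) * bot16(rs2)
                   - (int64_t)bot16(rs1) * top16(rs2));
}
```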

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMADRS (long t, unsigned long a, unsigned long b)
KMADRS (SIMD Saturating Signed Multiply Two Halfs & Reverse Subtract & Add)

Type: SIMD

## Syntax:

```
KMADS Rd, Rs1, Rs2
KMADRS Rd, Rs1, Rs2
KMAXDS Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from 32-bit elements in two registers; and then perform a subtraction operation between the two 32-bit results. Then add the subtraction result to the corresponding 32-bit elements in a third register. The addition result may be saturated.

- KMADS: rd.W[x] + (top\*top - bottom\*bottom) (per 32-bit element)
- KMADRS: rd.W[x] + (bottom\*bottom - top\*top) (per 32-bit element)
- KMAXDS: rd.W[x] + (top\*bottom - bottom\*top) (per 32-bit element)

## **Description:**

For the KMADS instruction, it multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. For the KMADRS instruction, it multiplies the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. For the KMAXDS instruction, it multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. The subtraction result is then added to the content of the corresponding 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The 32-bit results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

## **Operations:**

```
// KMADS
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[1]) - (Rs1.W[x].H[0] * Rs2.W[x].H[0]);
// KMADRS
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[0]) - (Rs1.W[x].H[1] * Rs2.W[x].H[1]);
// KMAXDS
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[0]) - (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMAXDS (long t, unsigned long a, unsigned long b)
KMAXDS (SIMD Saturating Signed Crossed Multiply Two Halfs & Subtract & Add)
```

Type: SIMD

### Syntax:

```
KMADS Rd, Rs1, Rs2
KMADRS Rd, Rs1, Rs2
KMAXDS Rd, Rs1, Rs2
```

### Purpose:

Do two signed 16-bit multiplications from 32-bit elements in two registers; and then perform a subtraction operation between the two 32-bit results. Then add the subtraction result to the corresponding 32-bit elements in a third register. The addition result may be saturated.

- KMADS: rd.W[x] + (top\*top - bottom\*bottom) (per 32-bit element)
- KMADRS: rd.W[x] + (bottom\*bottom - top\*top) (per 32-bit element)
- KMAXDS: rd.W[x] + (top\*bottom - bottom\*top) (per 32-bit element)

### **Description**:

For the KMADS instruction, it multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2. For the KMADRS instruction, it multiplies the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. For the KMAXDS instruction, it multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2. The subtraction result is then added to the content of the corresponding 32-bit elements in Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The 32-bit results after saturation are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

# **Operations:**

```
// KMADS
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[1]) - (Rs1.W[x].H[0] * Rs2.W[x].H[0]);
// KMADRS
res[x] = Rd.W[x] + (Rs1.W[x].H[0] * Rs2.W[x].H[0]) - (Rs1.W[x].H[1] * Rs2.W[x].H[1]);
// KMAXDS
res[x] = Rd.W[x] + (Rs1.W[x].H[1] * Rs2.W[x].H[0]) - (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```

### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMDA (unsigned long a, unsigned long b)
KMDA (SIMD Signed Multiply Two Halfs and Add)
```

Type: SIMD

### Syntax:

```
KMDA Rd, Rs1, Rs2
KMXDA Rd, Rs1, Rs2
```

### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then add the two 32-bit results together. The addition result may be saturated.

- KMDA: top\*top + bottom\*bottom (per 32-bit element)
- KMXDA: top\*bottom + bottom\*top (per 32-bit element)

## **Description**:

For the KMDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the KMXDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The addition result is checked for saturation. If saturation happens, the result is saturated to 2^31-1. The final results are written to Rd. The 16-bit contents are treated as signed integers.

### **Operations:**

```
if ((Rs1.W[x] != 0x80008000) or (Rs2.W[x] != 0x80008000)) {
  // KMDA
  Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[1]) + (Rs1.W[x].H[0] * Rs2.W[x].H[0]);
  // KMXDA
  Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[0]) + (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
} else {
  Rd.W[x] = 0x7fffffff;
  OV = 1;
}
for RV32: x=0
for RV64: x=1...0
```
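The only operand pair that can overflow is the one where both 32-bit elements equal 0x80008000, because each product is then 2^30 and their sum is 2^31. The sketch below models one 32-bit element (RV32) on a host by computing the exact sum in 64 bits and saturating. The names `kmda_ref`, `kmxda_ref` and `ov_flag` (a stand-in for the OV bit) are illustrative assumptions, not NMSIS APIs.

```c
#include <stdint.h>

/* Illustrative stand-in for the OV bit (sticky in this sketch). */
static int ov_flag = 0;

/* Signed top half H[1] and bottom half H[0] of one 32-bit element. */
static int16_t top16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t bot16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Saturate the exact sum to 2^31-1; the sum cannot underflow. */
static int32_t sat_pos(int64_t v)
{
    if (v > INT32_MAX) { ov_flag = 1; return INT32_MAX; }
    return (int32_t)v;
}

/* KMDA model: top*top + bottom*bottom. */
int32_t kmda_ref(uint32_t rs1, uint32_t rs2)
{
    return sat_pos((int64_t)top16(rs1) * top16(rs2)
                 + (int64_t)bot16(rs1) * bot16(rs2));
}

/* KMXDA model: top*bottom + bottom*top. */
int32_t kmxda_ref(uint32_t rs1, uint32_t rs2)
{
    return sat_pos((int64_t)top16(rs1) * bot16(rs2)
                 + (int64_t)bot16(rs1) * top16(rs2));
}
```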

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_KMXDA (unsigned long a, unsigned long b)
KMXDA (SIMD Signed Crossed Multiply Two Halfs and Add)
```

Type: SIMD

#### Syntax:

```
KMDA Rd, Rs1, Rs2
KMXDA Rd, Rs1, Rs2
```

### **Purpose:**

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then add the two 32-bit results together. The addition result may be saturated.

- KMDA: top\*top + bottom\*bottom (per 32-bit element)
- KMXDA: top\*bottom + bottom\*top (per 32-bit element)

#### **Description**:

For the KMDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the KMXDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The addition result is checked for saturation. If saturation happens, the result is saturated to 2^31-1. The final results are written to Rd. The 16-bit contents are treated as signed integers.

### **Operations:**

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMSDA (long t, unsigned long a, unsigned long b)

KMSDA (SIMD Saturating Signed Multiply Two Halfs & Add & Subtract)

Type: SIMD

# Syntax:

```
KMSDA Rd, Rs1, Rs2
KMSXDA Rd, Rs1, Rs2
```

### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then subtract the two 32-bit results from the corresponding 32-bit elements of a third register. The subtraction result may be saturated.

- KMSDA: rd.W[x] - top\*top - bottom\*bottom (per 32-bit element)
- KMSXDA: rd.W[x] - top\*bottom - bottom\*top (per 32-bit element)

### **Description:**

For the KMSDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the KMSXDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The two 32-bit multiplication results are then subtracted from the content of the corresponding 32-bit elements of Rd. If the subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The 16-bit contents are treated as signed integers.

### **Operations:**

```
// KMSDA
res[x] = Rd.W[x] - (Rs1.W[x].H[1] * Rs2.W[x].H[1]) - (Rs1.W[x].H[0] * Rs2.W[x].H[0]);
// KMSXDA
res[x] = Rd.W[x] - (Rs1.W[x].H[1] * Rs2.W[x].H[0]) - (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
if (res[x] > (2^31)-1) {
  res[x] = (2^31)-1;
  OV = 1;
} else if (res[x] < -2^31) {
  res[x] = -2^31;
  OV = 1;
}
Rd.W[x] = res[x];
for RV32: x=0
for RV64: x=1...0
```
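As a cross-check of the pseudo-code, the subtract-from-accumulator form can be modelled on a host. This is a sketch for one 32-bit element (RV32). `kmsda_ref`, `kmsxda_ref` and `ov_flag` (standing in for the OV bit) are illustrative names and not part of NMSIS.

```c
#include <stdint.h>

/* Illustrative stand-in for the OV bit (sticky in this sketch). */
static int ov_flag = 0;

/* Signed top half H[1] and bottom half H[0] of one 32-bit element. */
static int16_t top16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t bot16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Saturate a wide intermediate result to the Q31 range and flag overflow. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) { ov_flag = 1; return INT32_MAX; }
    if (v < INT32_MIN) { ov_flag = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* KMSDA model: rd - top*top - bottom*bottom. */
int32_t kmsda_ref(int32_t rd, uint32_t rs1, uint32_t rs2)
{
    return sat_q31((int64_t)rd
                   - (int64_t)top16(rs1) * top16(rs2)
                   - (int64_t)bot16(rs1) * bot16(rs2));
}

/* KMSXDA model: rd - top*bottom - bottom*top. */
int32_t kmsxda_ref(int32_t rd, uint32_t rs1, uint32_t rs2)
{
    return sat_q31((int64_t)rd
                   - (int64_t)top16(rs1) * bot16(rs2)
                   - (int64_t)bot16(rs1) * top16(rs2));
}
```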

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_KMSXDA (long t, unsigned long a, unsigned long b)

KMSXDA (SIMD Saturating Signed Crossed Multiply Two Halfs & Add & Subtract)

Type: SIMD

## Syntax:

```
KMSDA Rd, Rs1, Rs2
KMSXDA Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then subtract the two 32-bit results from the corresponding 32-bit elements of a third register. The subtraction result may be saturated.

- KMSDA: rd.W[x] - top\*top - bottom\*bottom (per 32-bit element)
- KMSXDA: rd.W[x] - top\*bottom - bottom\*top (per 32-bit element)

## **Description**:

For the KMSDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the KMSXDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The two 32-bit multiplication results are then subtracted from the content of the corresponding 32-bit elements of Rd. If the subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The 16-bit contents are treated as signed integers.

## **Operations:**

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE long \_\_RV\_SMBB16 (unsigned long a, unsigned long b)

SMBB16 (SIMD Signed Multiply Bottom Half & Bottom Half)

Type: SIMD

# Syntax:

```
SMBB16 Rd, Rs1, Rs2
SMBT16 Rd, Rs1, Rs2
SMTT16 Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the signed 16-bit content of the 32-bit elements of another register and write the result to a third register.

- SMBB16: W[x].bottom\*W[x].bottom
- SMBT16: W[x].bottom \*W[x].top
- SMTT16: W[x].top \* W[x].top

## **Description:**

For the SMBB16 instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMBT16 instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2.

For the SMTT16 instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

## **Operations:**

```
Rd.W[x] = Rs1.W[x].H[0] * Rs2.W[x].H[0]; // SMBB16
Rd.W[x] = Rs1.W[x].H[0] * Rs2.W[x].H[1]; // SMBT16
Rd.W[x] = Rs1.W[x].H[1] * Rs2.W[x].H[1]; // SMTT16

for RV32: x=0,
for RV64: x=1...0
```
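Because the product of two signed 16-bit values always fits in 32 bits, these instructions need no saturation. A host-side sketch for one 32-bit element (RV32) is shown below. The `*_ref` function names are illustrative and not NMSIS APIs.

```c
#include <stdint.h>

/* Signed top half H[1] and bottom half H[0] of one 32-bit element. */
static int16_t top16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t bot16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* SMBB16 model: bottom half of rs1 times bottom half of rs2. */
int32_t smbb16_ref(uint32_t rs1, uint32_t rs2)
{
    return (int32_t)bot16(rs1) * bot16(rs2);
}

/* SMBT16 model: bottom half of rs1 times top half of rs2. */
int32_t smbt16_ref(uint32_t rs1, uint32_t rs2)
{
    return (int32_t)bot16(rs1) * top16(rs2);
}

/* SMTT16 model: top half of rs1 times top half of rs2. */
int32_t smtt16_ref(uint32_t rs1, uint32_t rs2)
{
    return (int32_t)top16(rs1) * top16(rs2);
}
```

With `rs1 = 0xFFFE0003` (top = -2, bottom = 3) and `rs2 = 0x00040005` (top = 4, bottom = 5), the three models return 15, 12 and -8 respectively.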

# **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMBT16 (unsigned long a, unsigned long b)
SMBT16 (SIMD Signed Multiply Bottom Half & Top Half)
```

Type: SIMD

## Syntax:

```
SMBB16 Rd, Rs1, Rs2
SMBT16 Rd, Rs1, Rs2
SMTT16 Rd, Rs1, Rs2
```

# Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the signed 16-bit content of the 32-bit elements of another register and write the result to a third register.

- SMBB16: W[x].bottom\*W[x].bottom
- SMBT16: W[x].bottom \*W[x].top
- SMTT16: W[x].top \* W[x].top

## **Description**:

For the SMBB16 instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMBT16 instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMTT16 instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

### **Operations:**

```
Rd.W[x] = Rs1.W[x].H[0] * Rs2.W[x].H[0]; // SMBB16
Rd.W[x] = Rs1.W[x].H[0] * Rs2.W[x].H[1]; // SMBT16
Rd.W[x] = Rs1.W[x].H[1] * Rs2.W[x].H[1]; // SMTT16

for RV32: x=0,
for RV64: x=1...0
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMTT16 (unsigned long a, unsigned long b)
SMTT16 (SIMD Signed Multiply Top Half & Top Half)
```

Type: SIMD

## **Syntax:**

```
SMBB16 Rd, Rs1, Rs2
SMBT16 Rd, Rs1, Rs2
SMTT16 Rd, Rs1, Rs2
```

### **Purpose:**

Multiply the signed 16-bit content of the 32-bit elements of a register with the signed 16-bit content of the 32-bit elements of another register and write the result to a third register.

- SMBB16: W[x].bottom\*W[x].bottom
- SMBT16: W[x].bottom \*W[x].top
- SMTT16: W[x].top \* W[x].top

## **Description**:

For the SMBB16 instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMBT16 instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMTT16 instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are written to Rd. The 16-bit contents of Rs1 and Rs2 are treated as signed integers.

## **Operations:**

```
Rd.W[x] = Rs1.W[x].H[0] * Rs2.W[x].H[0]; // SMBB16
Rd.W[x] = Rs1.W[x].H[0] * Rs2.W[x].H[1]; // SMBT16
Rd.W[x] = Rs1.W[x].H[1] * Rs2.W[x].H[1]; // SMTT16

for RV32: x=0,
for RV64: x=1...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMDS (unsigned long a, unsigned long b)
SMDS (SIMD Signed Multiply Two Halfs and Subtract)
```

Type: SIMD

#### Syntax:

```
SMDS Rd, Rs1, Rs2
SMDRS Rd, Rs1, Rs2
SMXDS Rd, Rs1, Rs2
```

### **Purpose:**

Do two signed 16-bit multiplications from the 32-bit elements of two registers; and then perform a subtraction operation between the two 32-bit results.

- SMDS: top\*top - bottom\*bottom (per 32-bit element)
- SMDRS: bottom\*bottom - top\*top (per 32-bit element)
- SMXDS: top\*bottom - bottom\*top (per 32-bit element)

# **Description**:

For the SMDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMDRS instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMXDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The subtraction result is written to the corresponding 32-bit element of Rd. The 16-bit contents of multiplication are treated as signed integers.

## **Operations:**

```
* SMDS:
Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[1]) - (Rs1.W[x].H[0] * Rs2.W[x].H[0]);

* SMDRS:
Rd.W[x] = (Rs1.W[x].H[0] * Rs2.W[x].H[0]) - (Rs1.W[x].H[1] * Rs2.W[x].H[1]);

* SMXDS:
Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[0]) - (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
```
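The difference of two signed 16x16 products always lies within the 32-bit signed range (its magnitude is at most 2^31 - 2^15), which is why no saturation step appears in the operations. A host-side sketch for one 32-bit element (RV32) follows. The `*_ref` names are illustrative and not NMSIS APIs.

```c
#include <stdint.h>

/* Signed top half H[1] and bottom half H[0] of one 32-bit element. */
static int16_t top16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t bot16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* SMDS model: top*top - bottom*bottom. */
int32_t smds_ref(uint32_t rs1, uint32_t rs2)
{
    int64_t d = (int64_t)top16(rs1) * top16(rs2)
              - (int64_t)bot16(rs1) * bot16(rs2);
    return (int32_t)d; /* always in range, see the note above */
}

/* SMDRS model: bottom*bottom - top*top. */
int32_t smdrs_ref(uint32_t rs1, uint32_t rs2)
{
    int64_t d = (int64_t)bot16(rs1) * bot16(rs2)
              - (int64_t)top16(rs1) * top16(rs2);
    return (int32_t)d;
}

/* SMXDS model: top*bottom - bottom*top. */
int32_t smxds_ref(uint32_t rs1, uint32_t rs2)
{
    int64_t d = (int64_t)top16(rs1) * bot16(rs2)
              - (int64_t)bot16(rs1) * top16(rs2);
    return (int32_t)d;
}
```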

## **Parameters**

- a [in] unsigned long type of value stored in a
- b [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMDRS (unsigned long a, unsigned long b)
SMDRS (SIMD Signed Multiply Two Halfs and Reverse Subtract)
```

Type: SIMD

## Syntax:

```
SMDS Rd, Rs1, Rs2
SMDRS Rd, Rs1, Rs2
SMXDS Rd, Rs1, Rs2
```

## Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers; and then perform a subtraction operation between the two 32-bit results.

- SMDS: top\*top - bottom\*bottom (per 32-bit element)
- SMDRS: bottom\*bottom - top\*top (per 32-bit element)
- SMXDS: top\*bottom - bottom\*top (per 32-bit element)

## **Description**:

For the SMDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMDRS instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMXDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The subtraction result is written to the corresponding 32-bit element of Rd. The 16-bit contents of multiplication are treated as signed integers.

### **Operations:**

```
* SMDS:
Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[1]) - (Rs1.W[x].H[0] * Rs2.W[x].H[0]);

* SMDRS:
Rd.W[x] = (Rs1.W[x].H[0] * Rs2.W[x].H[0]) - (Rs1.W[x].H[1] * Rs2.W[x].H[1]);

* SMXDS:
Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[0]) - (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
```

## **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

```
__STATIC_FORCEINLINE long __RV_SMXDS (unsigned long a, unsigned long b)
SMXDS (SIMD Signed Crossed Multiply Two Halfs and Subtract)
```

Type: SIMD

Syntax:

```
SMDS Rd, Rs1, Rs2
SMDRS Rd, Rs1, Rs2
SMXDS Rd, Rs1, Rs2
```

#### **Purpose**:

Do two signed 16-bit multiplications from the 32-bit elements of two registers; and then perform a subtraction operation between the two 32-bit results.

- SMDS: top\*top - bottom\*bottom (per 32-bit element)
- SMDRS: bottom\*bottom - top\*top (per 32-bit element)
- SMXDS: top\*bottom - bottom\*top (per 32-bit element)

## **Description**:

For the SMDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMDRS instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMXDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The subtraction result is written to the corresponding 32-bit element of Rd. The 16-bit contents of multiplication are treated as signed integers.

## **Operations:**

```
* SMDS:
Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[1]) - (Rs1.W[x].H[0] * Rs2.W[x].H[0]);

* SMDRS:
Rd.W[x] = (Rs1.W[x].H[0] * Rs2.W[x].H[0]) - (Rs1.W[x].H[1] * Rs2.W[x].H[1]);

* SMXDS:
Rd.W[x] = (Rs1.W[x].H[1] * Rs2.W[x].H[0]) - (Rs1.W[x].H[0] * Rs2.W[x].H[1]);
```

### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

### **Partial-SIMD Miscellaneous Instructions**

```
__STATIC_FORCEINLINE unsigned long __RV_CLRS32 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_CLO32 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_CLZ32 (unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_PBSAD (unsigned long a, unsigned long b)
__STATIC_FORCEINLINE unsigned long __RV_PBSADA (unsigned long t, unsigned long a, unsigned long b)
__RV_SCLIP32 (a, b)
__RV_UCLIP32 (a, b)
```

group NMSIS_Core_DSP_Intrinsic_PART_SIMD_MISC

Partial-SIMD Miscellaneous Instructions.

There are 7 Partial-SIMD Miscellaneous Instructions.

```
__RV_SCLIP32 (a, b)
```

SCLIP32 (SIMD 32-bit Signed Clip Value)

Type: DSP

Syntax:

```
SCLIP32 Rd, Rs1, imm5u[4:0]
```

#### Purpose:

Limit the 32-bit signed integer elements of a register into a signed range simultaneously.

# **Description**:

This instruction limits the 32-bit signed integer elements stored in Rs1 into a signed integer range between 2^imm5u-1 and -2^imm5u, and writes the limited results to Rd. For example, if imm5u is 3, the 32-bit input values should be saturated between 7 and -8. If saturation is performed, set OV bit to 1.

# **Operations:**

```
src = Rs1.W[x];
if (src > (2^imm5u)-1) {
    src = (2^imm5u)-1;
    OV = 1;
} else if (src < -2^imm5u) {
    src = -2^imm5u;
    OV = 1;
}
Rd.W[x] = src
for RV32: x=0,
for RV64: x=1...0
```
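The signed clip can be modelled on a host as follows. This is a sketch for one 32-bit element (RV32), assuming `imm5u` is in the range 0..31. `sclip32_ref` and `ov_flag` (standing in for the OV bit) are illustrative names and not NMSIS APIs.

```c
#include <stdint.h>

/* Illustrative stand-in for the OV bit (sticky in this sketch). */
static int ov_flag = 0;

/* SCLIP32 model: clamp src to [-2^imm5u, 2^imm5u - 1], imm5u in 0..31. */
int32_t sclip32_ref(int32_t src, unsigned imm5u)
{
    int32_t max = (int32_t)((1u << imm5u) - 1u); /*  2^imm5u - 1 */
    int32_t min = -max - 1;                      /* -2^imm5u     */

    if (src > max) { ov_flag = 1; return max; }
    if (src < min) { ov_flag = 1; return min; }
    return src;
}
```

With `imm5u = 3` the model clamps to the range -8..7, matching the example in the description.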

### **Parameters**

- a [in] long type of value stored in a
- **b** [in] unsigned int type of value stored in b

Returns value stored in long type

```
__RV_UCLIP32 (a, b)
```

UCLIP32 (SIMD 32-bit Unsigned Clip Value)

Type: SIMD

# Syntax:

```
UCLIP32 Rd, Rs1, imm5u[4:0]
```

# Purpose:

Limit the 32-bit signed integer elements of a register into an unsigned range simultaneously.

# **Description**:

This instruction limits the 32-bit signed integer elements stored in Rs1 into an unsigned integer range between 2^imm5u-1 and 0, and writes the limited results to Rd. For example, if imm5u is 3, the 32-bit input values should be saturated between 7 and 0. If saturation is performed, set OV bit to 1.

# **Operations:**

```
src = Rs1.W[x];
if (src > (2^imm5u)-1) {
    src = (2^imm5u)-1;
    OV = 1;
} else if (src < 0) {
    src = 0;
    OV = 1;
}
Rd.W[x] = src
for RV32: x=0,
for RV64: x=1...0
```
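A host-side sketch of the unsigned clip for one 32-bit element (RV32) is given below, assuming `imm5u` is in the range 0..31. `uclip32_ref` and `ov_flag` (standing in for the OV bit) are illustrative names and not NMSIS APIs.

```c
#include <stdint.h>

/* Illustrative stand-in for the OV bit (sticky in this sketch). */
static int ov_flag = 0;

/* UCLIP32 model: clamp signed src to [0, 2^imm5u - 1], imm5u in 0..31. */
uint32_t uclip32_ref(int32_t src, unsigned imm5u)
{
    int32_t max = (int32_t)((1u << imm5u) - 1u); /* 2^imm5u - 1 */

    if (src > max) { ov_flag = 1; return (uint32_t)max; }
    if (src < 0)   { ov_flag = 1; return 0u; }
    return (uint32_t)src;
}
```

Note that the input is interpreted as signed, so any negative element clips to 0 and sets the flag.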

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned int type of value stored in b

Returns value stored in unsigned long type

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_CLRS32 (unsigned long a)
CLRS32 (SIMD 32-bit Count Leading Redundant Sign)
```

Type: SIMD

#### Syntax:

```
CLRS32 Rd, Rs1
```

### Purpose:

Count the number of redundant sign bits of the 32-bit elements of a general register.

### **Description:**

Starting from the bits next to the sign bits of the 32-bit elements of Rs1, this instruction counts the number of redundant sign bits and writes the result to the corresponding 32-bit elements of Rd.

# **Operations:**

```
snum[x] = Rs1.W[x];
cnt[x] = 0;
for (i = 30 to 0) {
   if (snum[x](i) == snum[x](31)) {
      cnt[x] = cnt[x] + 1;
   } else {
      break;
   }
}
Rd.W[x] = cnt[x];
for RV32: x=0
for RV64: x=1...0
```
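The loop above translates directly into C. The following host-side sketch handles one 32-bit element (RV32). `clrs32_ref` is an illustrative name and not an NMSIS API.

```c
#include <stdint.h>

/* CLRS32 model: count bits below bit 31 that repeat the sign bit. */
unsigned clrs32_ref(uint32_t w)
{
    unsigned sign = (w >> 31) & 1u;
    unsigned cnt = 0;

    for (int i = 30; i >= 0; i--) {
        if (((w >> i) & 1u) == sign) {
            cnt++;
        } else {
            break;
        }
    }
    return cnt;
}
```

The result ranges from 0 to 31; both `0x00000000` and `0xFFFFFFFF` give 31 because every bit below the sign bit repeats it.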

Parameters a – [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_CLO32 (unsigned long a)
CLO32 (SIMD 32-bit Count Leading One)
```

Type: SIMD

#### Syntax:

```
CLO32 Rd, Rs1
```

#### Purpose:

Count the number of leading one bits of the 32-bit elements of a general register.

# **Description**:

Starting from the most significant bits of the 32-bit elements of Rs1, this instruction counts the number of leading one bits and writes the results to the corresponding 32-bit elements of Rd.

#### **Operations:**

```
snum[x] = Rs1.W[x];
cnt[x] = 0;
for (i = 31 to 0) {
   if (snum[x](i) == 1) {
      cnt[x] = cnt[x] + 1;
   } else {
      break;
   }
}
Rd.W[x] = cnt[x];
for RV32: x=0
for RV64: x=1...0
```
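A host-side sketch of the same loop for one 32-bit element (RV32) is shown below. `clo32_ref` is an illustrative name and not an NMSIS API.

```c
#include <stdint.h>

/* CLO32 model: count consecutive one bits starting at bit 31. */
unsigned clo32_ref(uint32_t w)
{
    unsigned cnt = 0;

    for (int i = 31; i >= 0; i--) {
        if ((w >> i) & 1u) {
            cnt++;
        } else {
            break;
        }
    }
    return cnt;
}
```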

Parameters a – [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_CLZ32 (unsigned long a) CLZ32 (SIMD 32-bit Count Leading Zero)
```

Type: SIMD

#### Syntax:

```
CLZ32 Rd, Rs1
```

#### Purpose:

Count the number of leading zero bits of the 32-bit elements of a general register.

#### **Description**:

Starting from the most significant bits of the 32-bit elements of Rs1, this instruction counts the number of leading zero bits and writes the results to the corresponding 32-bit elements of Rd.

#### **Operations:**

```
snum[x] = Rs1.W[x];
cnt[x] = 0;
for (i = 31 to 0) {
   if (snum[x](i) == 0) {
      cnt[x] = cnt[x] + 1;
   } else {
      break;
   }
}
Rd.W[x] = cnt[x];
for RV32: x=0
for RV64: x=1...0
```

Parameters a - [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type
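CLO32 and CLZ32 follow the same loop and differ only in the bit value they count. The sketch below models both for a single element and shows how the two element results are packed on RV64. All function names are hypothetical helpers for illustration, not NMSIS APIs.

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * leading zero count of one 32-bit element. */
static uint32_t ref_clz32_elem(uint32_t w)
{
    uint32_t cnt = 0;
    for (int i = 31; i >= 0; i--) {
        if ((w >> i) & 1u) {
            break;
        }
        cnt++;
    }
    return cnt;
}

/* Leading ones are the leading zeros of the bitwise complement. */
static uint32_t ref_clo32_elem(uint32_t w)
{
    return ref_clz32_elem(~w);
}

/* RV64 view: each 32-bit element (x = 1...0) is counted independently. */
static uint64_t ref_clz32_rv64(uint64_t a)
{
    uint64_t hi = ref_clz32_elem((uint32_t)(a >> 32));
    uint64_t lo = ref_clz32_elem((uint32_t)a);
    return (hi << 32) | lo;
}
```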

```
__STATIC_FORCEINLINE unsigned long __RV_PBSAD (unsigned long a, unsigned long b)
PBSAD (Parallel Byte Sum of Absolute Difference)
```

Type: DSP

#### Syntax:

```
PBSAD Rd, Rs1, Rs2
```

#### Purpose:

Calculate the sum of absolute difference of unsigned 8-bit data elements.

#### **Description**:

This instruction subtracts the unsigned 8-bit elements of Rs2 from those of Rs1. Then it adds the absolute value of each difference together and writes the result to Rd.

#### **Operations:**

```
absdiff[x] = ABS(Rs1.B[x] - Rs2.B[x]);
Rd = SUM(absdiff[x]);
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type
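As an illustration of the operation above, the following sketch computes the RV32 result (four byte lanes) in plain C. `ref_pbsad_rv32` is a hypothetical host-side model, not the NMSIS intrinsic.

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * sum of absolute differences over the four unsigned bytes of a and b. */
static uint32_t ref_pbsad_rv32(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int x = 0; x < 4; x++) {
        uint8_t ea = (uint8_t)(a >> (8 * x));   /* Rs1.B[x] */
        uint8_t eb = (uint8_t)(b >> (8 * x));   /* Rs2.B[x] */
        sum += (ea > eb) ? (uint32_t)(ea - eb) : (uint32_t)(eb - ea);
    }
    return sum;
}
```

For example, bytes `{1, 2, 3, 4}` against `{4, 3, 2, 1}` give 3 + 1 + 1 + 3 = 8.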

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_PBSADA (unsigned long t, unsigned long a, unsigned long b) PBSADA (Parallel Byte Sum of Absolute Difference Accum)

Type: DSP

#### Syntax:

```
PBSADA Rd, Rs1, Rs2
```

#### Purpose:

Calculate the sum of absolute difference of four unsigned 8-bit data elements and accumulate it into a register.

#### **Description**:

This instruction subtracts the unsigned 8-bit elements of Rs2 from those of Rs1. It then adds the absolute value of each difference together along with the content of Rd and writes the accumulated result back to Rd.

#### **Operations:**

```
absdiff[x] = ABS(Rs1.B[x] - Rs2.B[x]);
Rd = Rd + SUM(absdiff[x]);
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- t [in] unsigned long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
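The accumulating form is typically chained over several packed words. The sketch below models the RV32 behaviour and a simple accumulation loop; `ref_pbsada_rv32` and `ref_sad_block` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * t plus the sum of absolute byte differences of a and b (RV32 lanes). */
static uint32_t ref_pbsada_rv32(uint32_t t, uint32_t a, uint32_t b)
{
    for (int x = 0; x < 4; x++) {
        uint8_t ea = (uint8_t)(a >> (8 * x));
        uint8_t eb = (uint8_t)(b >> (8 * x));
        t += (ea > eb) ? (uint32_t)(ea - eb) : (uint32_t)(eb - ea);
    }
    return t;
}

/* Accumulate the SAD of two buffers of packed bytes, one word at a time. */
static uint32_t ref_sad_block(const uint32_t *p, const uint32_t *q, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = ref_pbsada_rv32(acc, p[i], q[i]);
    }
    return acc;
}
```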

#### 8-bit Multiply with 32-bit Add Instructions

```
__STATIC_FORCEINLINE long __RV_SMAQA (long t, unsigned long a, unsigned long b)

__STATIC_FORCEINLINE long __RV_SMAQA_SU (long t, unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_UMAQA (unsigned long t, unsigned long a, unsigned long b)

group NMSIS_Core_DSP_Intrinsic_8B_MULT_32B_ADD

8-bit Multiply with 32-bit Add Instructions
```

There are three 8-bit Multiply with 32-bit Add instructions.

#### **Functions**

```
__STATIC_FORCEINLINE long __RV_SMAQA (long t, unsigned long a, unsigned long b) SMAQA (Signed Multiply Four Bytes with 32-bit Adds)
```

**Type**: Partial-SIMD (Reduction)

#### Syntax:

```
SMAQA Rd, Rs1, Rs2
```

#### Purpose:

Do four signed 8-bit multiplications from 32-bit chunks of two registers, then add the four 16-bit results and the content of the corresponding 32-bit chunks of a third register together.

#### **Description**:

This instruction multiplies the four signed 8-bit elements of 32-bit chunks of Rs1 with the four signed 8-bit elements of 32-bit chunks of Rs2 and then adds the four results together with the signed content of the corresponding 32-bit chunks of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.

#### **Operations:**

```
res[x] = Rd.W[x] +
    (Rs1.W[x].B[3] s* Rs2.W[x].B[3]) + (Rs1.W[x].B[2] s* Rs2.W[x].B[2]) +
    (Rs1.W[x].B[1] s* Rs2.W[x].B[1]) + (Rs1.W[x].B[0] s* Rs2.W[x].B[0]);
Rd.W[x] = res[x];
for RV32: x=0,
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type
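The operation amounts to a four-element signed dot product added to an accumulator. A minimal host-side sketch for the RV32 case follows; `ref_sbyte` and `ref_smaqa_rv32` are hypothetical names, and the model assumes the accumulated sum stays within `int32_t` (signed overflow is undefined in C).

```c
#include <stdint.h>

/* Sign-extend byte x (0 = least significant) of a 32-bit word. */
static int32_t ref_sbyte(uint32_t w, int x)
{
    int32_t v = (int32_t)((w >> (8 * x)) & 0xFFu);
    return (v >= 128) ? v - 256 : v;
}

/* Hypothetical reference model, not part of NMSIS:
 * t + sum of the four signed byte products of a and b. */
static int32_t ref_smaqa_rv32(int32_t t, uint32_t a, uint32_t b)
{
    int32_t acc = t;                 /* Rd.W[0] */
    for (int x = 0; x < 4; x++) {
        acc += ref_sbyte(a, x) * ref_sbyte(b, x);
    }
    return acc;
}
```

With `a = 0x01FF02FE` (bytes 1, -1, 2, -2) and `b = 0x03040506` (bytes 3, 4, 5, 6), the products sum to 3 - 4 + 10 - 12 = -3.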

\_\_STATIC\_FORCEINLINE long \_\_RV\_SMAQA\_SU (long t, unsigned long a, unsigned long b) SMAQA.SU (Signed and Unsigned Multiply Four Bytes with 32-bit Adds)

**Type**: Partial-SIMD (Reduction)

#### Syntax:

```
SMAQA.SU Rd, Rs1, Rs2
```

#### Purpose:

Do four signed x unsigned 8-bit multiplications from 32-bit chunks of two registers, then add the four 16-bit results and the content of the corresponding 32-bit chunks of a third register together.

#### **Description**:

This instruction multiplies the four signed 8-bit elements of 32-bit chunks of Rs1 with the four unsigned 8-bit elements of 32-bit chunks of Rs2 and then adds the four results together with the signed content of the corresponding 32-bit chunks of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.

#### **Operations:**

```
res[x] = Rd.W[x] +
    (Rs1.W[x].B[3] su* Rs2.W[x].B[3]) + (Rs1.W[x].B[2] su* Rs2.W[x].B[2]) +
    (Rs1.W[x].B[1] su* Rs2.W[x].B[1]) + (Rs1.W[x].B[0] su* Rs2.W[x].B[0]);
Rd.W[x] = res[x];
for RV32: x=0,
for RV64: x=1...0
```

#### **Parameters**

- t [in] long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UMAQA (unsigned long t, unsigned long a, unsigned long b) UMAQA (Unsigned Multiply Four Bytes with 32-bit Adds)

Type: DSP

#### Syntax:

```
UMAQA Rd, Rs1, Rs2
```

#### Purpose:

Do four unsigned 8-bit multiplications from 32-bit chunks of two registers, then add the four 16-bit results and the content of the corresponding 32-bit chunks of a third register together.

#### **Description**:

This instruction multiplies the four unsigned 8-bit elements of 32-bit chunks of Rs1 with the four unsigned 8-bit elements of 32-bit chunks of Rs2 and then adds the four results together with the unsigned content of the corresponding 32-bit chunks of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.

#### **Operations:**

#### **Parameters**

- t [in] unsigned long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
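SMAQA.SU and UMAQA differ from SMAQA only in how the bytes of each operand are interpreted. The sketch below models both for RV32 so the difference is visible on identical inputs. All names are hypothetical helpers rather than NMSIS functions, and the signed variant assumes the sum stays within `int32_t`.

```c
#include <stdint.h>

/* Byte x of w, interpreted as unsigned (0..255). */
static uint32_t ref_ubyte(uint32_t w, int x)
{
    return (w >> (8 * x)) & 0xFFu;
}

/* Byte x of w, interpreted as signed (-128..127). */
static int32_t ref_sbyte(uint32_t w, int x)
{
    int32_t v = (int32_t)ref_ubyte(w, x);
    return (v >= 128) ? v - 256 : v;
}

/* Hypothetical model of SMAQA.SU: signed bytes of a times unsigned bytes of b. */
static int32_t ref_smaqa_su_rv32(int32_t t, uint32_t a, uint32_t b)
{
    int32_t acc = t;
    for (int x = 0; x < 4; x++) {
        acc += ref_sbyte(a, x) * (int32_t)ref_ubyte(b, x);
    }
    return acc;
}

/* Hypothetical model of UMAQA: unsigned bytes of a times unsigned bytes of b. */
static uint32_t ref_umaqa_rv32(uint32_t t, uint32_t a, uint32_t b)
{
    uint32_t acc = t;
    for (int x = 0; x < 4; x++) {
        acc += ref_ubyte(a, x) * ref_ubyte(b, x);
    }
    return acc;
}
```

With `0xFF` in the top byte of both operands, the signed-by-unsigned model yields -1 x 255 = -255, while the unsigned model yields 255 x 255 = 65025.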

group NMSIS\_Core\_DSP\_Intrinsic\_PART\_SIMD\_DATA\_PROCESS Partial-SIMD Data Processing Instructions.

#### 64-bit Profile Instructions

#### 64-bit Addition & Subtraction Instructions

```
__STATIC_FORCEINLINE unsigned long long __RV_ADD64 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_KADD64 (long long a, long long b)
__STATIC_FORCEINLINE long long __RV_KSUB64 (long long a, long long b)
__STATIC_FORCEINLINE long long __RV_RADD64 (long long a, long long b)
__STATIC_FORCEINLINE long long __RV_RSUB64 (long long a, long long b)
__STATIC_FORCEINLINE unsigned long long __RV_SUB64 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_UKADD64 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_UKSUB64 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_URADD64 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_URSUB64 (unsigned long long a, unsigned long long b)
group NMSIS_Core_DSP_Intrinsic_64B_ADDSUB
    64-bit Addition & Subtraction Instructions
```

There are 10 64-bit Addition & Subtraction instructions.

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long long __RV_ADD64 (unsigned long long a, unsigned long long b)
ADD64 (64-bit Addition)
```

Type: 64-bit Profile

#### Syntax:

```
ADD64 Rd, Rs1, Rs2
```

#### Purpose:

Add two 64-bit signed or unsigned integers.

#### **RV32 Description:**

This instruction adds the 64-bit integer of an even/odd pair of registers specified by Rs1(4,1) with the 64-bit integer of an even/odd pair of registers specified by Rs2(4,1), and then writes the 64-bit result to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., value d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

This instruction has the same behavior as the ADD instruction in RV64I.

#### Note:

This instruction can be used for either signed or unsigned addition.

#### **Operations:**

```
RV32:
    t_L = CONCAT(Rd(4,1),1'b0);    t_H = CONCAT(Rd(4,1),1'b1);
    a_L = CONCAT(Rs1(4,1),1'b0);    a_H = CONCAT(Rs1(4,1),1'b1);
    b_L = CONCAT(Rs2(4,1),1'b0);    b_H = CONCAT(Rs2(4,1),1'b1);
    R[t_H].R[t_L] = R[a_H].R[a_L] + R[b_H].R[b_L];
RV64:
    Rd = Rs1 + Rs2;
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type
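On RV32 the 64-bit operands live in even/odd register pairs, so the addition is effectively a low-word add followed by a high-word add with carry. The sketch below illustrates that view; `ref_pair_t` and `ref_add64_pair` are hypothetical names introduced here, not NMSIS types or functions.

```c
#include <stdint.h>

/* Hypothetical model of an even/odd register pair:
 * lo is register 2d, hi is register 2d+1. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} ref_pair_t;

/* 64-bit wrap-around addition expressed on two 32-bit halves. */
static ref_pair_t ref_add64_pair(ref_pair_t a, ref_pair_t b)
{
    ref_pair_t r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  /* carry out of the low word */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```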

```
__STATIC_FORCEINLINE long long __RV_KADD64 (long long a, long long b) KADD64 (64-bit Signed Saturating Addition)
```

**Type**: DSP (64-bit Profile)

#### Syntax:

```
KADD64 Rd, Rs1, Rs2
```

#### Purpose:

Add two 64-bit signed integers. The result is saturated to the Q63 range.

#### **RV32 Description:**

This instruction adds the 64-bit signed integer of an even/odd pair of registers specified by Rs1(4,1) with the 64-bit signed integer of an even/odd pair of registers specified by Rs2(4,1). If the 64-bit result is beyond the Q63 number range ($-2^{63} \le Q63 \le 2^{63}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., value d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

This instruction adds the 64-bit signed integer in Rs1 with the 64-bit signed integer in Rs2. If the result is beyond the Q63 number range ($-2^{63} \le Q63 \le 2^{63}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
result = R[a_H].R[a_L] + R[b_H].R[b_L];
if (result > (2^63)-1) {
    result = (2^63)-1; OV = 1;
} else if (result < -2^63) {
    result = -2^63; OV = 1;
}
R[t_H].R[t_L] = result;
RV64:
result = Rs1 + Rs2;
if (result > (2^63)-1) {
    result = (2^63)-1; OV = 1;
} else if (result < -2^63) {
    result = -2^63; OV = 1;
}
Rd = result;
```

#### **Parameters**

- a [in] long long type of value stored in a
- b [in] long long type of value stored in b

**Returns** value stored in long long type
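The pseudocode compares an unbounded result against the Q63 limits. In C the check has to be made before adding, because signed overflow is undefined. The following sketch shows one way to do that; `ref_kadd64` and its `ov` out-parameter are hypothetical stand-ins for the instruction and the OV flag.

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * signed saturating 64-bit addition. *ov is set to 1 on saturation
 * and is otherwise left unchanged, mirroring a sticky overflow flag. */
static int64_t ref_kadd64(int64_t a, int64_t b, int *ov)
{
    if (b > 0 && a > INT64_MAX - b) {   /* a + b would exceed 2^63 - 1 */
        *ov = 1;
        return INT64_MAX;
    }
    if (b < 0 && a < INT64_MIN - b) {   /* a + b would fall below -2^63 */
        *ov = 1;
        return INT64_MIN;
    }
    return a + b;
}
```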

```
__STATIC_FORCEINLINE long long __RV_KSUB64 (long long a, long long b) KSUB64 (64-bit Signed Saturating Subtraction)
```

Type: DSP (64-bit Profile)

#### Syntax:

```
KSUB64 Rd, Rs1, Rs2
```

#### Purpose:

Perform a 64-bit signed integer subtraction. The result is saturated to the Q63 range.

#### **RV32 Description:**

This instruction subtracts the 64-bit signed integer of an even/odd pair of registers specified by Rs2(4,1) from the 64-bit signed integer of an even/odd pair of registers specified by Rs1(4,1). If the 64-bit result is beyond the Q63 number range ($-2^{63} \le Q63 \le 2^{63}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is then written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description:**

This instruction subtracts the 64-bit signed integer of Rs2 from the 64-bit signed integer of Rs1. If the 64-bit result is beyond the Q63 number range ($-2^{63} \le Q63 \le 2^{63}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is then written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
result = R[a_H].R[a_L] - R[b_H].R[b_L];
if (result > (2^63)-1) {
    result = (2^63)-1; OV = 1;
} else if (result < -2^63) {
    result = -2^63; OV = 1;
}
R[t_H].R[t_L] = result;
RV64:
result = Rs1 - Rs2;
if (result > (2^63)-1) {
    result = (2^63)-1; OV = 1;
} else if (result < -2^63) {
    result = -2^63; OV = 1;
}
Rd = result;
```

#### **Parameters**

- a [in] long long type of value stored in a
- b [in] long long type of value stored in b

**Returns** value stored in long long type
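As with the saturating addition, a C model has to test for overflow before subtracting. This is a minimal sketch under that assumption; `ref_ksub64` and the `ov` out-parameter are hypothetical names standing in for the instruction and the OV flag.

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * signed saturating 64-bit subtraction a - b. */
static int64_t ref_ksub64(int64_t a, int64_t b, int *ov)
{
    if (b < 0 && a > INT64_MAX + b) {   /* a - b would exceed 2^63 - 1 */
        *ov = 1;
        return INT64_MAX;
    }
    if (b > 0 && a < INT64_MIN + b) {   /* a - b would fall below -2^63 */
        *ov = 1;
        return INT64_MIN;
    }
    return a - b;
}
```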

```
__STATIC_FORCEINLINE long long __RV_RADD64 (long long a, long long b)
RADD64 (64-bit Signed Halving Addition)
```

Type: DSP (64-bit Profile)

#### Syntax:

```
RADD64 Rd, Rs1, Rs2
```

#### Purpose:

Add two 64-bit signed integers. The result is halved to avoid overflow or saturation.

#### **RV32 Description:**

This instruction adds the 64-bit signed integer of an even/odd pair of registers specified by Rs1(4,1) with the 64-bit signed integer of an even/odd pair of registers specified by Rs2(4,1). The 64-bit addition result is first arithmetically right-shifted by 1 bit and then written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., value d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

This instruction adds the 64-bit signed integer in Rs1 with the 64-bit signed integer in Rs2. The 64-bit addition result is first arithmetically right-shifted by 1 bit and then written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
R[t_H].R[t_L] = (R[a_H].R[a_L] + R[b_H].R[b_L]) s>> 1;
RV64:
Rd = (Rs1 + Rs2) s>> 1;
```

#### **Parameters**

- a [in] long long type of value stored in a
- **b** [in] long long type of value stored in b

**Returns** value stored in long long type
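The halving addition keeps the carry that a plain 64-bit add would lose. A C model cannot form the 65-bit intermediate sum directly, so the sketch below halves each operand first and then adds back the bit lost when both operands are odd. `ref_radd64` is a hypothetical name, and the code assumes that `>>` on a negative `int64_t` is an arithmetic shift (true for GCC and Clang, implementation-defined in ISO C).

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * (a + b) s>> 1 without overflowing the intermediate sum. */
static int64_t ref_radd64(int64_t a, int64_t b)
{
    /* a = 2*(a >> 1) + (a & 1), and likewise for b, so
     * floor((a + b) / 2) = (a >> 1) + (b >> 1) + (a & b & 1). */
    return (a >> 1) + (b >> 1) + (a & b & 1);
}
```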

```
__STATIC_FORCEINLINE long long __RV_RSUB64 (long long a, long long b)
RSUB64 (64-bit Signed Halving Subtraction)
```

**Type**: DSP (64-bit Profile)

#### Syntax:

```
RSUB64 Rd, Rs1, Rs2
```

#### Purpose:

Perform a 64-bit signed integer subtraction. The result is halved to avoid overflow or saturation.

#### **RV32 Description**:

This instruction subtracts the 64-bit signed integer of an even/odd pair of registers specified by Rs2(4,1) from the 64-bit signed integer of an even/odd pair of registers specified by Rs1(4,1). The subtraction result is first arithmetically right-shifted by 1 bit and then written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., value d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

This instruction subtracts the 64-bit signed integer in Rs2 from the 64-bit signed integer in Rs1. The 64-bit subtraction result is first arithmetically right-shifted by 1 bit and then written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
R[t_H].R[t_L] = (R[a_H].R[a_L] - R[b_H].R[b_L]) s>> 1;
RV64:
Rd = (Rs1 - Rs2) s>> 1;
```

#### **Parameters**

- a [in] long long type of value stored in a
- **b** [in] long long type of value stored in b

**Returns** value stored in long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_SUB64 (unsigned long long a, unsigned long long b) SUB64 (64-bit Subtraction)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
SUB64 Rd, Rs1, Rs2
```

#### Purpose:

Perform a 64-bit signed or unsigned integer subtraction.

#### **RV32 Description**:

This instruction subtracts the 64-bit integer of an even/odd pair of registers specified by Rs2(4,1) from the 64-bit integer of an even/odd pair of registers specified by Rs1(4,1), and then writes the 64-bit result to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description**:

This instruction subtracts the 64-bit integer of Rs2 from the 64-bit integer of Rs1, and then writes the 64-bit result to Rd.

#### Note:

This instruction can be used for either signed or unsigned subtraction.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
R[t_H].R[t_L] = R[a_H].R[a_L] - R[b_H].R[b_L];
RV64:
Rd = Rs1 - Rs2;
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

Returns value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_UKADD64 (unsigned long long a, unsigned long long b) UKADD64 (64-bit Unsigned Saturating Addition)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
UKADD64 Rd, Rs1, Rs2
```

#### Purpose:

Add two 64-bit unsigned integers. The result is saturated to the U64 range.

#### **RV32 Description:**

This instruction adds the 64-bit unsigned integer of an even/odd pair of registers specified by Rs1(4,1) with the 64-bit unsigned integer of an even/odd pair of registers specified by Rs2(4,1). If the 64-bit result is beyond the U64 number range ($0 \le U64 \le 2^{64}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

This instruction adds the 64-bit unsigned integer in Rs1 with the 64-bit unsigned integer in Rs2. If the 64-bit result is beyond the U64 number range ($0 \le U64 \le 2^{64}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
result = R[a_H].R[a_L] + R[b_H].R[b_L];
if (result > (2^64)-1) {
    result = (2^64)-1; OV = 1;
}
R[t_H].R[t_L] = result;
RV64:
result = Rs1 + Rs2;
if (result > (2^64)-1) {
    result = (2^64)-1; OV = 1;
}
Rd = result;
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- b [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type
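In C, unsigned addition wraps, so the condition `result > (2^64)-1` from the pseudocode can be detected by checking whether the wrapped sum is smaller than an operand. The sketch below uses that test; `ref_ukadd64` and `ov` are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * unsigned saturating 64-bit addition. */
static uint64_t ref_ukadd64(uint64_t a, uint64_t b, int *ov)
{
    uint64_t sum = a + b;        /* wraps modulo 2^64 */
    if (sum < a) {               /* wrapped, so the true sum exceeds 2^64 - 1 */
        *ov = 1;
        return UINT64_MAX;
    }
    return sum;
}
```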

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_UKSUB64 (unsigned long long a, unsigned long long b) UKSUB64 (64-bit Unsigned Saturating Subtraction)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
UKSUB64 Rd, Rs1, Rs2
```

#### Purpose:

Perform a 64-bit unsigned integer subtraction. The result is saturated to the U64 range.

#### **RV32 Description**:

This instruction subtracts the 64-bit unsigned integer of an even/odd pair of registers specified by Rs2(4,1) from the 64-bit unsigned integer of an even/odd pair of registers specified by Rs1(4,1). If the 64-bit result is beyond the U64 number range ($0 \le U64 \le 2^{64}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is then written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description**:

This instruction subtracts the 64-bit unsigned integer of Rs2 from the 64-bit unsigned integer of Rs1. If the 64-bit result is beyond the U64 number range ($0 \le U64 \le 2^{64}-1$), it is saturated to the range and the OV bit is set to 1. The saturated result is then written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
result = R[a_H].R[a_L] - R[b_H].R[b_L];
if (result < 0) {
    result = 0; OV = 1;
}
R[t_H].R[t_L] = result;
RV64:
result = Rs1 - Rs2;
if (result < 0) {
    result = 0; OV = 1;
}
Rd = result;
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type
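Because an unsigned difference can never be negative in C, the `result < 0` test from the pseudocode is expressed as a comparison of the operands before subtracting. A minimal sketch, with `ref_uksub64` and `ov` as hypothetical names:

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * unsigned saturating 64-bit subtraction a - b, clamped at 0. */
static uint64_t ref_uksub64(uint64_t a, uint64_t b, int *ov)
{
    if (a < b) {                 /* true difference would be negative */
        *ov = 1;
        return 0;
    }
    return a - b;
}
```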

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_URADD64 (unsigned long long a, unsigned long long b) URADD64 (64-bit Unsigned Halving Addition)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
URADD64 Rd, Rs1, Rs2
```

#### Purpose:

Add two 64-bit unsigned integers. The result is halved to avoid overflow or saturation.

#### **RV32 Description**:

This instruction adds the 64-bit unsigned integer of an even/odd pair of registers specified by Rs1(4,1) with the 64-bit unsigned integer of an even/odd pair of registers specified by Rs2(4,1). The 64-bit addition result is first logically right-shifted by 1 bit and then written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

This instruction adds the 64-bit unsigned integer in Rs1 with the 64-bit unsigned integer in Rs2. The 64-bit addition result is first logically right-shifted by 1 bit and then written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
R[t_H].R[t_L] = (R[a_H].R[a_L] + R[b_H].R[b_L]) u>> 1;
RV64:
Rd = (Rs1 + Rs2) u>> 1;
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type
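The unsigned halving addition keeps the carry bit that a wrapping 64-bit add would discard. The sketch below reproduces that in C without a 65-bit intermediate; `ref_uradd64` is a hypothetical name, not the NMSIS intrinsic.

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * (a + b) u>> 1 computed without losing the carry out of bit 63. */
static uint64_t ref_uradd64(uint64_t a, uint64_t b)
{
    /* Halve each operand, then add back 1 when both low bits were set. */
    return (a >> 1) + (b >> 1) + (a & b & 1u);
}
```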

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_URSUB64 (unsigned long long a, unsigned long long b) URSUB64 (64-bit Unsigned Halving Subtraction)

Type: DSP (64-bit Profile)

#### Syntax:

```
URSUB64 Rd, Rs1, Rs2
```

#### Purpose:

Perform a 64-bit unsigned integer subtraction. The result is halved to avoid overflow or saturation.

#### **RV32 Description**:

This instruction subtracts the 64-bit unsigned integer of an even/odd pair of registers specified by Rs2(4,1) from the 64-bit unsigned integer of an even/odd pair of registers specified by Rs1(4,1). The subtraction result is first logically right-shifted by 1 bit and then written to an even/odd pair of registers specified by Rd(4,1). Rx(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

This instruction subtracts the 64-bit unsigned integer in Rs2 from the 64-bit unsigned integer in Rs1. The subtraction result is first logically right-shifted by 1 bit and then written to Rd.

#### **Operations:**

```
RV32:
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
a_L = CONCAT(Rs1(4,1),1'b0); a_H = CONCAT(Rs1(4,1),1'b1);
b_L = CONCAT(Rs2(4,1),1'b0); b_H = CONCAT(Rs2(4,1),1'b1);
R[t_H].R[t_L] = (R[a_H].R[a_L] - R[b_H].R[b_L]) u>> 1;
RV64:
Rd = (Rs1 - Rs2) u>> 1;
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- b [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type

#### SIMD 8-bit Addition & Subtraction Instructions

```
__STATIC_FORCEINLINE unsigned long __RV_ADD8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_KADD8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_KSUB8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_RADD8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_RSUB8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_SUB8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_UKADD8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_UKSUB8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_URADD8 (unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_URSUB8 (unsigned long a, unsigned long b)
```

SIMD 8-bit Addition & Subtraction Instructions.

Based on the type of the 8-bit arithmetic operation, the SIMD 8-bit add/subtract instructions fall into 2 main categories: Addition (four 8-bit additions) and Subtraction (four 8-bit subtractions). Based on how an overflow condition is handled for signed or unsigned operation, they fall into 5 groups: Wrap-around (dropping overflow), Signed Halving (keeping overflow by dropping 1 LSB bit), Unsigned Halving, Signed Saturation (clipping overflow), and Unsigned Saturation. Together, there are 10 SIMD 8-bit add/subtract instructions.

#### **Functions**

```
__STATIC_FORCEINLINE unsigned long __RV_ADD8 (unsigned long a, unsigned long b)
ADD8 (SIMD 8-bit Addition)
```

Type: SIMD

#### Syntax:

```
ADD8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit integer element additions simultaneously.

#### **Description:**

This instruction adds the 8-bit integer elements in Rs1 with the 8-bit integer elements in Rs2, and then writes the 8-bit element results to Rd.

#### Note:

This instruction can be used for either signed or unsigned addition.

#### **Operations:**

```
Rd.B[x] = Rs1.B[x] + Rs2.B[x];

for RV32: x=3...0,

for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type
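The key property of ADD8 is that carries do not propagate between byte lanes. The sketch below shows a well-known way to obtain the same result with ordinary 32-bit arithmetic, which can serve as a fallback or as a host-side check. `ref_add8_rv32` is a hypothetical name and models the RV32 case (four lanes).

```c
#include <stdint.h>

/* Hypothetical reference model, not part of NMSIS:
 * four independent 8-bit wrap-around additions in one 32-bit word. */
static uint32_t ref_add8_rv32(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every byte; the cleared top bits absorb carries. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Fold in the top bit of each byte with XOR so nothing crosses a lane. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```

For instance, `0xFF + 0x01` wraps to `0x00` inside its lane and leaves the neighbouring byte untouched.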

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KADD8 (unsigned long a, unsigned long b) KADD8 (SIMD 8-bit Signed Saturating Addition)

Type: SIMD

#### Syntax:

```
KADD8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit signed integer element saturating additions simultaneously.

#### **Description:**

This instruction adds the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2. If any of the results are beyond the Q7 number range ($-2^7 \le Q7 \le 2^7-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

#### **Operations:**

```
res[x] = Rs1.B[x] + Rs2.B[x];
if (res[x] > 127) {
    res[x] = 127;
    OV = 1;
} else if (res[x] < -128) {
    res[x] = -128;
    OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
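The per-lane saturation can be illustrated with a straightforward loop over the four bytes of an RV32 operand. In this sketch `ref_sext8`, `ref_kadd8_rv32` and the `ov` out-parameter are hypothetical names; `ov` stands in for the OV flag and is only ever set, never cleared.

```c
#include <stdint.h>

/* Sign-extend byte x (0 = least significant) of a 32-bit word. */
static int ref_sext8(uint32_t w, int x)
{
    int v = (int)((w >> (8 * x)) & 0xFFu);
    return (v >= 128) ? v - 256 : v;
}

/* Hypothetical reference model, not part of NMSIS:
 * four signed saturating 8-bit additions. */
static uint32_t ref_kadd8_rv32(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        int res = ref_sext8(a, x) + ref_sext8(b, x);
        if (res > 127) {
            res = 127;
            *ov = 1;
        } else if (res < -128) {
            res = -128;
            *ov = 1;
        }
        rd |= ((uint32_t)res & 0xFFu) << (8 * x);   /* Rd.B[x] = res */
    }
    return rd;
}
```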

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_KSUB8 (unsigned long a, unsigned long b) KSUB8 (SIMD 8-bit Signed Saturating Subtraction)

Type: SIMD

#### Syntax:

```
KSUB8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit signed elements saturating subtractions simultaneously.

#### **Description**:

This instruction subtracts the 8-bit signed integer elements in Rs2 from the 8-bit signed integer elements in Rs1. If any of the results are beyond the Q7 number range ($-2^7 \le Q7 \le 2^7-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

#### **Operations:**

```
res[x] = Rs1.B[x] - Rs2.B[x];
if (res[x] > (2^7)-1) {
   res[x] = (2^7)-1;
   OV = 1;
} else if (res[x] < -2^7) {
   res[x] = -2^7;
   OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_RADD8 (unsigned long a, unsigned long b)
RADD8 (SIMD 8-bit Signed Halving Addition)
```

Type: SIMD

#### Syntax:

```
RADD8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit signed integer element additions simultaneously. The element results are halved to avoid overflow or saturation.

#### **Description**:

This instruction adds the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2. The results are first arithmetically right-shifted by 1 bit and then written to Rd.

#### **Examples**:

```
* Rs1 = 0x7F, Rs2 = 0x7F, Rd = 0x7F

* Rs1 = 0x80, Rs2 = 0x80, Rd = 0x80

* Rs1 = 0x40, Rs2 = 0x80, Rd = 0xE0
```

#### **Operations**:

```
Rd.B[x] = (Rs1.B[x] + Rs2.B[x]) s>> 1;
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type
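The three examples listed above can be reproduced with a small host-side model. The sketch below handles the RV32 case and computes the arithmetic shift as a floor division so that it does not depend on how the compiler shifts negative values. `ref_sext8`, `ref_floor_half` and `ref_radd8_rv32` are hypothetical helper names.

```c
#include <stdint.h>

/* Sign-extend byte x (0 = least significant) of a 32-bit word. */
static int ref_sext8(uint32_t w, int x)
{
    int v = (int)((w >> (8 * x)) & 0xFFu);
    return (v >= 128) ? v - 256 : v;
}

/* floor(s / 2), matching an arithmetic right shift by 1. */
static int ref_floor_half(int s)
{
    return (s >= 0) ? s / 2 : -((1 - s) / 2);
}

/* Hypothetical reference model, not part of NMSIS:
 * four signed halving 8-bit additions. */
static uint32_t ref_radd8_rv32(uint32_t a, uint32_t b)
{
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        int half = ref_floor_half(ref_sext8(a, x) + ref_sext8(b, x));
        rd |= ((uint32_t)half & 0xFFu) << (8 * x);
    }
    return rd;
}
```

Packing the three documented cases into one word, `a = 0x007F8040` and `b = 0x007F8080` produce the lanes `0x7F`, `0x80` and `0xE0`.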

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_RSUB8 (unsigned long a, unsigned long b) RSUB8 (SIMD 8-bit Signed Halving Subtraction)

Type: SIMD

#### Syntax:

```
RSUB8 Rd, Rs1, Rs2
```

#### **Purpose:**

Do 8-bit signed integer element subtractions simultaneously. The results are halved to avoid overflow or saturation.

#### **Description**:

This instruction subtracts the 8-bit signed integer elements in Rs2 from the 8-bit signed integer elements in Rs1. The results are first arithmetically right-shifted by 1 bit and then written to Rd.

#### **Examples:**

```
* Rs1 = 0x7F, Rs2 = 0x80, Rd = 0x7F

* Rs1 = 0x80, Rs2 = 0x7F, Rd = 0x80

* Rs1= 0x80, Rs2 = 0x40, Rd = 0xA0
```

#### **Operations:**

```
Rd.B[x] = (Rs1.B[x] - Rs2.B[x]) s>> 1;

for RV32: x=3...0,

for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_SUB8 (unsigned long a, unsigned long b) SUB8 (SIMD 8-bit Subtraction)

Type: SIMD

#### Syntax:

```
SUB8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit integer element subtractions simultaneously.

#### **Description:**

This instruction subtracts the 8-bit integer elements in Rs2 from the 8-bit integer elements in Rs1, and then writes the result to Rd.

#### Note:

This instruction can be used for either signed or unsigned subtraction.

#### **Operations:**

```
Rd.B[x] = Rs1.B[x] - Rs2.B[x];

for RV32: x=3...0,

for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

Returns value stored in unsigned long type
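The note that SUB8 serves both signed and unsigned subtraction follows from modulo-256 arithmetic: the low 8 bits of the difference are the same under either interpretation. A minimal host-side sketch, assuming RV32 and using the hypothetical helper name `sub8_model` (not an NMSIS symbol):

```c
#include <stdint.h>

/* Hypothetical host-side model of SUB8 for RV32 (4 lanes of 8 bits). */
static uint32_t sub8_model(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        uint32_t a = (rs1 >> (8 * x)) & 0xFFu;
        uint32_t b = (rs2 >> (8 * x)) & 0xFFu;
        /* Keep only the low 8 bits; borrows never cross into the next lane. */
        rd |= ((a - b) & 0xFFu) << (8 * x);
    }
    return rd;
}
```

For example, the lane result `0x00 - 0x01 = 0xFF` reads as 255 for unsigned data and as -1 for signed data; the bit pattern is identical.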

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKADD8 (unsigned long a, unsigned long b)
UKADD8 (SIMD 8-bit Unsigned Saturating Addition)

Type: SIMD

#### Syntax:

```
UKADD8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit unsigned integer element saturating additions simultaneously.

#### **Description**:

This instruction adds the 8-bit unsigned integer elements in Rs1 with the 8-bit unsigned integer elements in Rs2. If any of the results are beyond the 8-bit unsigned number range ( $0 \le RES \le 2^8-1$ ), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

#### **Operations:**

```
res[x] = Rs1.B[x] + Rs2.B[x];
if (res[x] > (2^8)-1) {
  res[x] = (2^8)-1;
  OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
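The saturation rule in the Operations block can be sketched as a host-side reference model. This is an illustration under assumptions, not the NMSIS implementation: it assumes RV32 (four lanes), and both `ukadd8_model` and the `ukadd8_ov` variable standing in for the OV bit are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky OV flag; not an NMSIS symbol. */
static bool ukadd8_ov = false;

/* Hypothetical host-side model of UKADD8 for RV32 (4 lanes of 8 bits). */
static uint32_t ukadd8_model(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        uint32_t res = ((rs1 >> (8 * x)) & 0xFFu) + ((rs2 >> (8 * x)) & 0xFFu);
        if (res > 0xFFu) {   /* beyond 2^8 - 1: clamp and record overflow */
            res = 0xFFu;
            ukadd8_ov = true;
        }
        rd |= res << (8 * x);
    }
    return rd;
}
```

The flag is never cleared by the model, which mirrors the description above where the instruction only ever sets OV to 1.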

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_UKSUB8 (unsigned long a, unsigned long b)
UKSUB8 (SIMD 8-bit Unsigned Saturating Subtraction)

Type: SIMD

#### Syntax:

```
UKSUB8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit unsigned integer element saturating subtractions simultaneously.

#### **Description**:

This instruction subtracts the 8-bit unsigned integer elements in Rs2 from the 8-bit unsigned integer elements in Rs1. If any of the results are beyond the 8-bit unsigned number range ( $0 \le RES \le 2^8-1$ ), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

#### **Operations:**

```
res[x] = Rs1.B[x] - Rs2.B[x];
if (res[x] < 0) {
  res[x] = 0;
  OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=3...0,
for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in unsigned long type
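For unsigned subtraction the only way to leave the range is to go below zero, so the model reduces to a per-lane comparison. The sketch below assumes RV32; `uksub8_model` and `uksub8_ov` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky OV flag; not an NMSIS symbol. */
static bool uksub8_ov = false;

/* Hypothetical host-side model of UKSUB8 for RV32 (4 lanes of 8 bits). */
static uint32_t uksub8_model(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        uint32_t a = (rs1 >> (8 * x)) & 0xFFu;
        uint32_t b = (rs2 >> (8 * x)) & 0xFFu;
        uint32_t res;
        if (a < b) {         /* true difference would be negative */
            res = 0;
            uksub8_ov = true;
        } else {
            res = a - b;
        }
        rd |= res << (8 * x);
    }
    return rd;
}
```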

\_\_STATIC\_FORCEINLINE unsigned long \_\_RV\_URADD8 (unsigned long a, unsigned long b)
URADD8 (SIMD 8-bit Unsigned Halving Addition)

Type: SIMD

#### Syntax:

```
URADD8 Rd, Rs1, Rs2
```

#### **Purpose:**

Do 8-bit unsigned integer element additions simultaneously. The results are halved to avoid overflow or saturation.

#### **Description**:

This instruction adds the 8-bit unsigned integer elements in Rs1 with the 8-bit unsigned integer elements in Rs2. The results are first logically right-shifted by 1 bit and then written to Rd.

#### **Examples:**

```
* Rs1 = 0x7F, Rs2 = 0x7F, Rd = 0x7F

* Rs1 = 0x80, Rs2 = 0x80, Rd = 0x80

* Rs1 = 0x40, Rs2 = 0x80, Rd = 0x60
```

#### **Operations:**

```
Rd.B[x] = (Rs1.B[x] + Rs2.B[x]) u>> 1;

for RV32: x=3...0,

for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_URSUB8 (unsigned long a, unsigned long b)
URSUB8 (SIMD 8-bit Unsigned Halving Subtraction)
```

Type: SIMD

#### Syntax:

```
URSUB8 Rd, Rs1, Rs2
```

#### Purpose:

Do 8-bit unsigned integer element subtractions simultaneously. The results are halved to avoid overflow or saturation.

#### **Description:**

This instruction subtracts the 8-bit unsigned integer elements in Rs2 from the 8-bit unsigned integer elements in Rs1. The results are first logically right-shifted by 1 bit and then written to Rd.

#### **Examples:**

```
* Rs1 = 0x7F, Rs2 = 0x80, Rd = 0xFF

* Rs1 = 0x80, Rs2 = 0x7F, Rd = 0x00

* Rs1 = 0x80, Rs2 = 0x40, Rd = 0x20
```

#### **Operations:**

```
Rd.B[x] = (Rs1.B[x] - Rs2.B[x]) u>> 1;

for RV32: x=3...0,

for RV64: x=7...0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

Returns value stored in unsigned long type
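The first example (`0x7F - 0x80` giving `0xFF`) only works out if the difference keeps its borrow in a ninth bit before the logical shift. The sketch below models that reading, which is consistent with all three examples above; it assumes RV32, and `ursub8_model` is a hypothetical helper, not an NMSIS symbol.

```c
#include <stdint.h>

/* Hypothetical host-side model of URSUB8 for RV32 (4 lanes of 8 bits). */
static uint32_t ursub8_model(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int x = 0; x < 4; x++) {
        uint32_t a = (rs1 >> (8 * x)) & 0xFFu;
        uint32_t b = (rs2 >> (8 * x)) & 0xFFu;
        /* 9-bit difference: a negative result keeps bit 8 set. */
        uint32_t diff9 = (a - b) & 0x1FFu;
        /* Logical shift right by 1 moves bit 8 into bit 7 of the lane. */
        rd |= (diff9 >> 1) << (8 * x);
    }
    return rd;
}
```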

# Signed 16-bit Multiply 64-bit Add/Subtract Instructions

```
__STATIC_FORCEINLINE long long __RV_SMAL (long long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALBB (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALBT (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALTT (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALDA (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALXDA (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALDS (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALDRS (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMALXDS (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMSLDA (long long t, unsigned long a, unsigned long b)
__STATIC_FORCEINLINE long long __RV_SMSLXDA (long long t, unsigned long a, unsigned long b)
```

#### group NMSIS\_Core\_DSP\_Intrinsic\_SIGNED\_16B\_MULT\_64B\_ADDSUB

Signed 16-bit Multiply with 64-bit Add/Subtract Instructions.

There are 10 Signed 16-bit Multiply with 64-bit Add/Subtract instructions.

### **Functions**

```
__STATIC_FORCEINLINE long long __RV_SMAL (long long a, unsigned long b)
SMAL (Signed Multiply Halfs & Add 64-bit)
```

Type: Partial-SIMD

#### Syntax:

```
SMAL Rd, Rs1, Rs2
```

#### **Purpose**:

Multiply the signed bottom 16-bit content of the 32-bit elements of a register with the top 16-bit content of the same 32-bit elements of the same register, and add the results with a 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The addition result is written back to another even/odd pair of registers (RV32) or a register (RV64).

#### **RV32 Description:**

This instruction multiplies the bottom 16-bit content of the lower 32-bit of Rs2 with the top 16-bit content of the lower 32-bit of Rs2 and adds the result with the 64-bit value of an even/odd pair of registers specified by Rs1(4,1). The 64-bit addition result is written back to an even/odd pair of registers specified by Rd(4,1). The 16-bit values of Rs2, and the 64-bit value of the Rs1(4,1) register-pair are treated as signed integers. Rx(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description:**

This instruction multiplies the bottom 16-bit content of the 32-bit elements of Rs2 with the top 16-bit content of the same 32-bit elements of Rs2 and adds the results with the 64-bit value of Rs1. The 64-bit addition result is written back to Rd. The 16-bit values of Rs2, and the 64-bit value of Rs1 are treated as signed integers.

#### **Operations:**

```
RV32:
Mres[31:0] = Rs2.H[1] * Rs2.H[0];
Idx0 = CONCAT(Rs1(4,1),1'b0); Idx1 = CONCAT(Rs1(4,1),1'b1);
Idx2 = CONCAT(Rd(4,1),1'b0); Idx3 = CONCAT(Rd(4,1),1'b1);
R[Idx3].R[Idx2] = R[Idx1].R[Idx0] + SE64(Mres[31:0]);
RV64:
Mres[0][31:0] = Rs2.W[0].H[1] * Rs2.W[0].H[0];
Mres[1][31:0] = Rs2.W[1].H[1] * Rs2.W[1].H[0];
Rd = Rs1 + SE64(Mres[1][31:0]) + SE64(Mres[0][31:0]);
```

#### **Parameters**

- a [in] long long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type
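The RV32 operation can be sketched as a portable reference model: parameter `a` of `__RV_SMAL` plays the role of the 64-bit accumulator (the Rs1 pair) and `b` the role of Rs2. The helper name `smal_model` is hypothetical, and the sketch assumes the usual two's complement narrowing conversions.

```c
#include <stdint.h>

/* Hypothetical host-side model of SMAL on RV32: acc + Rs2.H[1] * Rs2.H[0]. */
static int64_t smal_model(int64_t acc, uint32_t rs2)
{
    /* Sign-extend each 16-bit half of the single 32-bit operand. */
    int32_t bottom = (int16_t)(rs2 & 0xFFFFu);
    int32_t top = (int16_t)(rs2 >> 16);
    int64_t mres = (int64_t)top * bottom; /* SE64(Mres[31:0]) */
    /* Add in unsigned arithmetic so the sum wraps modulo 2^64
     * instead of triggering signed-overflow undefined behaviour. */
    return (int64_t)((uint64_t)acc + (uint64_t)mres);
}
```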

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALBB (long long t, unsigned long a, unsigned long b)
SMALBB (Signed Multiply Bottom Halfs & Add 64-bit)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
SMALBB Rd, Rs1, Rs2
SMALBT Rd, Rs1, Rs2
SMALTT Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results with a 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The addition result is written back to the register-pair (RV32) or the register (RV64).

- SMALBB: rt pair + bottom\*bottom (all 32-bit elements)
- SMALBT: rt pair + bottom\*top (all 32-bit elements)
- SMALTT: rt pair + top\*top (all 32-bit elements)

#### **RV32 Description:**

For the SMALBB instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2. For the SMALBT instruction, it multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMALTT instruction, it multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2. The multiplication result is added with the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description:**

For the SMALBB instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMALBT instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMALTT instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are added with the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

#### **Operations:**

```
RV32:
Mres[31:0] = Rs1.H[0] * Rs2.H[0]; // SMALBB
Mres[31:0] = Rs1.H[0] * Rs2.H[1]; // SMALBT
Mres[31:0] = Rs1.H[1] * Rs2.H[1]; // SMALTT
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres[31:0]);
RV64:
// SMALBB
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[0];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[0];
// SMALBT
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[1];
// SMALTT
Mres[0][31:0] = Rs1.W[0].H[1] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[1] * Rs2.W[1].H[1];
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type
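The three variants differ only in which 16-bit half of each operand feeds the multiplier. A minimal host-side sketch of the RV32 behaviour follows; the `*_model` names are hypothetical, `t` is the 64-bit accumulator, and accumulator overflow is assumed not to occur in this illustration.

```c
#include <stdint.h>

/* Hypothetical host-side models of SMALBB/SMALBT/SMALTT on RV32.
 * Halves are sign-extended with the usual two's complement narrowing. */
static int64_t smalbb_model(int64_t t, uint32_t a, uint32_t b)
{
    /* bottom(a) * bottom(b) */
    return t + (int64_t)(int16_t)(a & 0xFFFFu) * (int16_t)(b & 0xFFFFu);
}

static int64_t smalbt_model(int64_t t, uint32_t a, uint32_t b)
{
    /* bottom(a) * top(b) */
    return t + (int64_t)(int16_t)(a & 0xFFFFu) * (int16_t)(b >> 16);
}

static int64_t smaltt_model(int64_t t, uint32_t a, uint32_t b)
{
    /* top(a) * top(b) */
    return t + (int64_t)(int16_t)(a >> 16) * (int16_t)(b >> 16);
}
```

With `a = 0x0003FFFE` (top 3, bottom -2), `b = 0x00050004` (top 5, bottom 4) and `t = 100`, the three models return 92, 90 and 115 respectively.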

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALBT (long long t, unsigned long a, unsigned long b)
SMALBT (Signed Multiply Bottom Half & Top Half & Add 64-bit)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
SMALBB Rd, Rs1, Rs2
SMALBT Rd, Rs1, Rs2
SMALTT Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results with a 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The addition result is written back to the register-pair (RV32) or the register (RV64).

- SMALBB: rt pair + bottom\*bottom (all 32-bit elements)
- SMALBT: rt pair + bottom\*top (all 32-bit elements)
- SMALTT: rt pair + top\*top (all 32-bit elements)

#### **RV32 Description:**

For the SMALBB instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2. For the SMALBT instruction, it multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMALTT instruction, it multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2. The multiplication result is added with the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description:**

For the SMALBB instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMALBT instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMALTT instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are added with the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

#### **Operations:**

```
RV32:
Mres[31:0] = Rs1.H[0] * Rs2.H[0]; // SMALBB
Mres[31:0] = Rs1.H[0] * Rs2.H[1]; // SMALBT
Mres[31:0] = Rs1.H[1] * Rs2.H[1]; // SMALTT
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres[31:0]);
RV64:
// SMALBB
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[0];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[0];
// SMALBT
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[1];
// SMALTT
Mres[0][31:0] = Rs1.W[0].H[1] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[1] * Rs2.W[1].H[1];
Rd = Rd + SE64 (Mres[0][31:0]) + SE64 (Mres[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALTT (long long t, unsigned long a, unsigned long b)
SMALTT (Signed Multiply Top Halfs & Add 64-bit)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
SMALBB Rd, Rs1, Rs2
SMALBT Rd, Rs1, Rs2
SMALTT Rd, Rs1, Rs2
```

#### Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results with a 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The addition result is written back to the register-pair (RV32) or the register (RV64).

- SMALBB: rt pair + bottom\*bottom (all 32-bit elements)
- SMALBT: rt pair + bottom\*top (all 32-bit elements)
- SMALTT: rt pair + top\*top (all 32-bit elements)

#### **RV32 Description:**

For the SMALBB instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2. For the SMALBT instruction, it multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMALTT instruction, it multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2. The multiplication result is added with the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description**:

For the SMALBB instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMALBT instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMALTT instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are added with the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

#### **Operations:**

```
RV32:
Mres[31:0] = Rs1.H[0] * Rs2.H[0]; // SMALBB
Mres[31:0] = Rs1.H[0] * Rs2.H[1]; // SMALBT
Mres[31:0] = Rs1.H[1] * Rs2.H[1]; // SMALTT
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres[31:0]);
RV64:
// SMALBB
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[0];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[0];
// SMALBT
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[1];
// SMALTT
Mres[0][31:0] = Rs1.W[0].H[1] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[1] * Rs2.W[1].H[1];
Rd = Rd + SE64 (Mres[0][31:0]) + SE64 (Mres[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b [in]** unsigned long type of value stored in b

**Returns** value stored in long long type

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALDA (long long t, unsigned long a, unsigned long b)
SMALDA (Signed Multiply Two Halfs and Two Adds 64-bit)

Type: DSP (64-bit Profile)

#### Syntax:

```
SMALDA Rd, Rs1, Rs2
SMALXDA Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then add the two 32-bit results and the 64-bit value of an even/odd pair of registers together.

- SMALDA: rt pair + top\*top + bottom\*bottom (all 32-bit elements)
- SMALXDA: rt pair + top\*bottom + bottom\*top (all 32-bit elements)

#### **RV32 Description**:

For the SMALDA instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then adds the result to the result of multiplying the top 16-bit content of Rs1 with the top 16-bit content of Rs2 with unlimited precision. For the SMALXDA instruction, it multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 with unlimited precision. The result is added to the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description:**

For the SMALDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 with unlimited precision. For the SMALXDA instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 with unlimited precision. The results are added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

#### **Operations:**

```
RV32:
// SMALDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[0]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[1]);
// SMALXDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[1]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[0]);
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres0[31:0]) + SE64(Mres1[31:0]);
RV64:
// SMALDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
// SMALXDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]);
Rd = Rd + SE64(Mres0[0][31:0]) + SE64(Mres1[0][31:0]) + SE64(Mres0[1][31:0]) +
SE64 (Mres1[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type
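Both variants accumulate the sum of two 16x16 products; SMALXDA simply crosses the halves. The following host-side sketch covers the RV32 case only. The `*_model` names are hypothetical, and accumulator overflow is assumed not to occur in this illustration.

```c
#include <stdint.h>

/* Hypothetical host-side models of SMALDA/SMALXDA on RV32.
 * Halves are sign-extended with the usual two's complement narrowing. */
static int64_t smalda_model(int64_t t, uint32_t a, uint32_t b)
{
    int64_t lo_a = (int16_t)(a & 0xFFFFu), hi_a = (int16_t)(a >> 16);
    int64_t lo_b = (int16_t)(b & 0xFFFFu), hi_b = (int16_t)(b >> 16);
    return t + hi_a * hi_b + lo_a * lo_b;   /* top*top + bottom*bottom */
}

static int64_t smalxda_model(int64_t t, uint32_t a, uint32_t b)
{
    int64_t lo_a = (int16_t)(a & 0xFFFFu), hi_a = (int16_t)(a >> 16);
    int64_t lo_b = (int16_t)(b & 0xFFFFu), hi_b = (int16_t)(b >> 16);
    return t + hi_a * lo_b + lo_a * hi_b;   /* top*bottom + bottom*top */
}
```

Doing the additions in 64-bit arithmetic matches the "unlimited precision" wording: the two 32-bit products are summed without an intermediate 32-bit wrap.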

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALXDA (long long t, unsigned long a, unsigned long b)
SMALXDA (Signed Crossed Multiply Two Halfs and Two Adds 64-bit)

Type: DSP (64-bit Profile)

#### Syntax:

```
SMALDA Rd, Rs1, Rs2
SMALXDA Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then add the two 32-bit results and the 64-bit value of an even/odd pair of registers together.

- SMALDA: rt pair + top\*top + bottom\*bottom (all 32-bit elements)
- SMALXDA: rt pair + top\*bottom + bottom\*top (all 32-bit elements)

#### **RV32 Description:**

For the SMALDA instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then adds the result to the result of multiplying the top 16-bit content of Rs1 with the top 16-bit content of Rs2 with unlimited precision. For the SMALXDA instruction, it multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 with unlimited precision. The result is added to the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description:**

For the SMALDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 with unlimited precision. For the SMALXDA instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 with unlimited precision. The results are added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

#### **Operations:**

```
RV32:
// SMALDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[0]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[1]);
// SMALXDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[1]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[0]);
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres0[31:0]) + SE64(Mres1[31:0]);
RV64:
// SMALDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
// SMALXDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]);
Rd = Rd + SE64(Mres0[0][31:0]) + SE64(Mres1[0][31:0]) + SE64(Mres0[1][31:0]) +
SE64 (Mres1[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALDS (long long t, unsigned long a, unsigned long b)
SMALDS (Signed Multiply Two Halfs & Subtract & Add 64-bit)

Type: DSP (64-bit Profile)

#### Syntax:

```
SMALDS Rd, Rs1, Rs2
SMALDRS Rd, Rs1, Rs2
SMALXDS Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers; and then perform a subtraction operation between the two 32-bit results. Then add the subtraction result to the 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The addition result is written back to the register-pair.

- SMALDS: rt pair + (top\*top - bottom\*bottom) (all 32-bit elements)
- SMALDRS: rt pair + (bottom\*bottom - top\*top) (all 32-bit elements)
- SMALXDS: rt pair + (top\*bottom - bottom\*top) (all 32-bit elements)

#### **RV32 Description**:

For the SMALDS instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMALDRS instruction, it multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2. For the SMALXDS instruction, it multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2. The subtraction result is then added to the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

#### **RV64 Description:**

For the SMALDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMALDRS instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMALXDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The subtraction results are then added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

#### **Operations:**

```
RV32:
Mres[31:0] = (Rs1.H[1] * Rs2.H[1]) - (Rs1.H[0] * Rs2.H[0]); // SMALDS
Mres[31:0] = (Rs1.H[0] * Rs2.H[0]) - (Rs1.H[1] * Rs2.H[1]); // SMALDRS
Mres[31:0] = (Rs1.H[1] * Rs2.H[0]) - (Rs1.H[0] * Rs2.H[1]); // SMALXDS
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres[31:0]);
RV64:
// SMALDS
Mres[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]) - (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]) - (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
// SMALDRS
Mres[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]) - (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]) - (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
// SMALXDS
Mres[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]) - (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]) - (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type
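The three subtract variants are easy to confuse, so a side-by-side reference model of the RV32 operations may help. This is a sketch under assumptions: the `*_model` names are hypothetical, only the single 32-bit element of RV32 is modelled, and accumulator overflow is assumed not to occur.

```c
#include <stdint.h>

/* Hypothetical host-side models of SMALDS/SMALDRS/SMALXDS on RV32.
 * Halves are sign-extended with the usual two's complement narrowing. */
static int64_t smalds_model(int64_t t, uint32_t a, uint32_t b)
{
    int64_t lo_a = (int16_t)(a & 0xFFFFu), hi_a = (int16_t)(a >> 16);
    int64_t lo_b = (int16_t)(b & 0xFFFFu), hi_b = (int16_t)(b >> 16);
    return t + (hi_a * hi_b - lo_a * lo_b);   /* top*top - bottom*bottom */
}

static int64_t smaldrs_model(int64_t t, uint32_t a, uint32_t b)
{
    int64_t lo_a = (int16_t)(a & 0xFFFFu), hi_a = (int16_t)(a >> 16);
    int64_t lo_b = (int16_t)(b & 0xFFFFu), hi_b = (int16_t)(b >> 16);
    return t + (lo_a * lo_b - hi_a * hi_b);   /* bottom*bottom - top*top */
}

static int64_t smalxds_model(int64_t t, uint32_t a, uint32_t b)
{
    int64_t lo_a = (int16_t)(a & 0xFFFFu), hi_a = (int16_t)(a >> 16);
    int64_t lo_b = (int16_t)(b & 0xFFFFu), hi_b = (int16_t)(b >> 16);
    return t + (hi_a * lo_b - lo_a * hi_b);   /* top*bottom - bottom*top */
}
```

With `a = 0x0003FFFE` (top 3, bottom -2), `b = 0x00050004` (top 5, bottom 4) and `t = 100`, the models return 123, 77 and 122 respectively.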

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALDRS (long long t, unsigned long a, unsigned long b)
SMALDRS (Signed Multiply Two Halfs & Reverse Subtract & Add 64-bit)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
SMALDS Rd, Rs1, Rs2
SMALDRS Rd, Rs1, Rs2
SMALXDS Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers; and then perform a subtraction operation between the two 32-bit results. Then add the subtraction result to the 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The addition result is written back to the register-pair.

- SMALDS: rt pair + (top\*top - bottom\*bottom) (all 32-bit elements)
- SMALDRS: rt pair + (bottom\*bottom - top\*top) (all 32-bit elements)
- SMALXDS: rt pair + (top\*bottom - bottom\*top) (all 32-bit elements)

#### **RV32 Description:**

For the SMALDS instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMALDRS instruction, it multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2. For the SMALXDS instruction, it multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2. The subtraction result is then added to the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

### **RV64 Description:**

For the SMALDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMALDRS instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMALXDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The subtraction results are then added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

# **Operations**:

```
* RV32:
Mres[31:0] = (Rs1.H[1] * Rs2.H[1]) - (Rs1.H[0] * Rs2.H[0]); // SMALDS
Mres[31:0] = (Rs1.H[0] * Rs2.H[0]) - (Rs1.H[1] * Rs2.H[1]); // SMALDRS
Mres[31:0] = (Rs1.H[1] * Rs2.H[0]) - (Rs1.H[0] * Rs2.H[1]); // SMALXDS
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres[31:0]);
* RV64:
// SMALDS
Mres[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]) - (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]) - (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
// SMALDRS
Mres[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]) - (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]) - (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
// SMALXDS
Mres[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]) - (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]) - (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type
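The RV32 operations above can be checked against a small host-side model. The following is a minimal sketch, not the NMSIS implementation: the names `half`, `acc64`, `model_smalds`, `model_smaldrs`, and `model_smalxds` are hypothetical, `uint32_t` stands in for a 32-bit `unsigned long`, and the code assumes two's complement conversions (true on common compilers).

```c
#include <stdint.h>

/* Signed 16-bit half of a 32-bit word: idx 0 = bottom, idx 1 = top.
 * The uint16_t -> int16_t conversion assumes two's complement wrap-around. */
static int32_t half(uint32_t w, unsigned idx)
{
    return (int16_t)(uint16_t)(w >> (16u * idx));
}

/* Wrapping 64-bit accumulate, mirroring the register-pair update. */
static int64_t acc64(int64_t t, int64_t mres)
{
    return (int64_t)((uint64_t)t + (uint64_t)mres);
}

/* SMALDS: t + (top*top - bottom*bottom) */
static int64_t model_smalds(int64_t t, uint32_t a, uint32_t b)
{
    return acc64(t, (int64_t)half(a, 1) * half(b, 1)
                  - (int64_t)half(a, 0) * half(b, 0));
}

/* SMALDRS: t + (bottom*bottom - top*top) */
static int64_t model_smaldrs(int64_t t, uint32_t a, uint32_t b)
{
    return acc64(t, (int64_t)half(a, 0) * half(b, 0)
                  - (int64_t)half(a, 1) * half(b, 1));
}

/* SMALXDS: t + (top(a)*bottom(b) - bottom(a)*top(b)) */
static int64_t model_smalxds(int64_t t, uint32_t a, uint32_t b)
{
    return acc64(t, (int64_t)half(a, 1) * half(b, 0)
                  - (int64_t)half(a, 0) * half(b, 1));
}
```

With `a = 0x00020003` and `b = 0x00040005`, the three models add `2*4 - 3*5`, `3*5 - 2*4`, and `2*5 - 3*4` to `t` respectively. Such a model may be useful for comparing intrinsic results on target against expected values computed on a host.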

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMALXDS (long long t, unsigned long a, unsigned long b)

SMALXDS (Signed Crossed Multiply Two Halfs & Subtract & Add 64-bit)

Type: DSP (64-bit Profile)

#### Syntax:

```
SMALDS Rd, Rs1, Rs2
SMALDRS Rd, Rs1, Rs2
SMALXDS Rd, Rs1, Rs2
```

# Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers; and then perform a subtraction operation between the two 32-bit results. Then add the subtraction result to the 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The addition result is written back to the register-pair.

- SMALDS: rt pair + (top\*top - bottom\*bottom) (all 32-bit elements)
- SMALDRS: rt pair + (bottom\*bottom - top\*top) (all 32-bit elements)
- SMALXDS: rt pair + (top\*bottom - bottom\*top) (all 32-bit elements)

# **RV32 Description**:

For the SMALDS instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMALDRS instruction, it multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2. For the SMALXDS instruction, it multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2. The subtraction result is then added to the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the operand and the even 2d register of the pair contains the low 32-bit of the operand.

### **RV64 Description:**

For the SMALDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMALDRS instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. For the SMALXDS instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The subtraction results are then added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

### **Operations:**

```
* RV32:
Mres[31:0] = (Rs1.H[1] * Rs2.H[1]) - (Rs1.H[0] * Rs2.H[0]); // SMALDS
Mres[31:0] = (Rs1.H[0] * Rs2.H[0]) - (Rs1.H[1] * Rs2.H[1]); // SMALDRS
Mres[31:0] = (Rs1.H[1] * Rs2.H[0]) - (Rs1.H[0] * Rs2.H[1]); // SMALXDS
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] + SE64(Mres[31:0]);
* RV64:
// SMALDS
Mres[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]) - (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]) - (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
// SMALDRS
Mres[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]) - (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]) - (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
// SMALXDS
Mres[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]) - (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]) - (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMSLDA (long long t, unsigned long a, unsigned long b)

SMSLDA (Signed Multiply Two Halfs & Add & Subtract 64-bit)

**Type**: DSP (64-bit Profile)

# Syntax:

```
SMSLDA Rd, Rs1, Rs2
SMSLXDA Rd, Rs1, Rs2
```

# Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then subtract the two 32-bit results from the 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The subtraction result is written back to the register-pair.

- SMSLDA: rd pair - top\*top - bottom\*bottom (all 32-bit elements)
- SMSLXDA: rd pair - top\*bottom - bottom\*top (all 32-bit elements)

### **RV32 Description:**

For the SMSLDA instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMSLXDA instruction, it multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2. The two multiplication results are subtracted from the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit subtraction result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

For the SMSLDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMSLXDA instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The four multiplication results are subtracted from the 64-bit value of Rd. The 64-bit subtraction result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

#### **Operations:**

```
* RV32:
// SMSLDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[0]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[1]);
// SMSLXDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[1]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[0]);
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] - SE64(Mres0[31:0]) - SE64(Mres1[31:0]);
* RV64:
// SMSLDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
// SMSLXDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]);
Rd = Rd - SE64(Mres0[0][31:0]) - SE64(Mres1[0][31:0]) - SE64(Mres0[1][31:0]) -
SE64(Mres1[1][31:0]);
```

#### **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type
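As a cross-check of the RV32 operations above, here is a minimal host-side sketch of SMSLDA and SMSLXDA. The helper and function names (`half16`, `model_smslda`, `model_smslxda`) are hypothetical and not part of NMSIS; `uint32_t` stands in for a 32-bit `unsigned long`, and two's complement conversions are assumed.

```c
#include <stdint.h>

/* Signed 16-bit half of a 32-bit word: idx 0 = bottom, idx 1 = top.
 * Assumes two's complement wrap-around on the narrowing conversion. */
static int32_t half16(uint32_t w, unsigned idx)
{
    return (int16_t)(uint16_t)(w >> (16u * idx));
}

/* Wrapping 64-bit "t - m0 - m1", mirroring the register-pair update. */
static int64_t sub2_64(int64_t t, int64_t m0, int64_t m1)
{
    return (int64_t)((uint64_t)t - (uint64_t)m0 - (uint64_t)m1);
}

/* SMSLDA: t - bottom*bottom - top*top */
static int64_t model_smslda(int64_t t, uint32_t a, uint32_t b)
{
    int64_t mres0 = (int64_t)half16(a, 0) * half16(b, 0);
    int64_t mres1 = (int64_t)half16(a, 1) * half16(b, 1);
    return sub2_64(t, mres0, mres1);
}

/* SMSLXDA: t - bottom(a)*top(b) - top(a)*bottom(b) */
static int64_t model_smslxda(int64_t t, uint32_t a, uint32_t b)
{
    int64_t mres0 = (int64_t)half16(a, 0) * half16(b, 1);
    int64_t mres1 = (int64_t)half16(a, 1) * half16(b, 0);
    return sub2_64(t, mres0, mres1);
}
```

For `t = 100`, `a = 0x00020003`, and `b = 0x00040005`, the SMSLDA model computes `100 - 3*5 - 2*4` and the SMSLXDA model computes `100 - 3*4 - 2*5`.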

\_\_STATIC\_FORCEINLINE long long \_\_RV\_SMSLXDA (long long t, unsigned long a, unsigned long b)

SMSLXDA (Signed Crossed Multiply Two Halfs & Add & Subtract 64-bit)

**Type**: DSP (64-bit Profile)

#### Syntax:

```
SMSLDA Rd, Rs1, Rs2
SMSLXDA Rd, Rs1, Rs2
```

#### Purpose:

Do two signed 16-bit multiplications from the 32-bit elements of two registers, and then subtract the two 32-bit results from the 64-bit value of an even/odd pair of registers (RV32) or a register (RV64). The subtraction result is written back to the register-pair.

- SMSLDA: rd pair - top\*top - bottom\*bottom (all 32-bit elements)
- SMSLXDA: rd pair - top\*bottom - bottom\*top (all 32-bit elements)

# **RV32 Description:**

For the SMSLDA instruction, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2. For the SMSLXDA instruction, it multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2. The two multiplication results are subtracted from the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit subtraction result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 64-bit value of the register-pair are treated as signed integers. Rd(4,1), i.e., d, determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the high 32-bit of the result and the even 2d register of the pair contains the low 32-bit of the result.

#### **RV64 Description:**

For the SMSLDA instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. For the SMSLXDA instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The four multiplication results are subtracted from the 64-bit value of Rd. The 64-bit subtraction result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

## **Operations:**

```
* RV32:
// SMSLDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[0]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[1]);
// SMSLXDA
Mres0[31:0] = (Rs1.H[0] * Rs2.H[1]);
Mres1[31:0] = (Rs1.H[1] * Rs2.H[0]);
Idx0 = CONCAT(Rd(4,1),1'b0); Idx1 = CONCAT(Rd(4,1),1'b1);
R[Idx1].R[Idx0] = R[Idx1].R[Idx0] - SE64(Mres0[31:0]) - SE64(Mres1[31:0]);
* RV64:
// SMSLDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
// SMSLXDA
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]);
Rd = Rd - SE64(Mres0[0][31:0]) - SE64(Mres1[0][31:0]) - SE64(Mres0[1][31:0]) -
SE64 (Mres1[1][31:0]);
```

## **Parameters**

- t [in] long long type of value stored in t
- a [in] unsigned long type of value stored in a
- **b** [in] unsigned long type of value stored in b

**Returns** value stored in long long type

## group NMSIS\_Core\_DSP\_Intrinsic\_64B\_PROFILE

64-bit Profile Instructions

## **Nuclei Customized DSP Instructions**

```
__STATIC_FORCEINLINE unsigned long long __RV_DKHM8 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKHM16 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKABS8 (unsigned long long a)
__STATIC_FORCEINLINE unsigned long long __RV_DKABS16 (unsigned long long a)
__STATIC_FORCEINLINE unsigned long long __RV_DKSLRA8 (unsigned long long a, int b)
__STATIC_FORCEINLINE unsigned long long __RV_DKSLRA16 (unsigned long long a, int b)
__STATIC_FORCEINLINE unsigned long long __RV_DKADD8 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKADD16 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKSUB8 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKSUB16 (unsigned long long a, unsigned long long b)
```

These are Nuclei customized DSP instructions, available only for RV32.

#### **Functions**

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKHM8 (unsigned long long a, unsigned long long b)

DKHM8 (64-bit SIMD Signed Saturating Q7 Multiply)

Type: SIMD

## Syntax:

```
DKHM8 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers
```

## Purpose:

Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.

#### **Description**:

For the DKHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2.

The Q14 results are then right-shifted 7-bits and saturated into Q7 values. The Q7 results are then written into Rd. When both the two Q7 inputs of a multiplication are 0x80, saturation will happen. The result will be saturated to 0x7F and the overflow flag OV will be set.

## **Operations:**

```
op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
   if (0x80 != aop | 0x80 != bop) {
     res = (aop s* bop) >> 7;
   } else {
     res = 0x7F;
     OV = 1;
   }
}
Rd.H[x/2] = concat(rest, resb);
for RV32: x=0,2,4,6
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type
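The per-byte Q7 multiply described above can be sketched as a host-side model. This is an illustrative sketch only: `model_dkhm8` and its `ov` out-parameter are hypothetical (the real intrinsic reports overflow through the OV flag, not a pointer), and the negative-product branch is written so that it rounds toward minus infinity without relying on implementation-defined signed right shifts.

```c
#include <stdint.h>

/* Hypothetical reference model of DKHM8 over all eight byte lanes.
 * *ov is set to 1 when any lane saturates (both inputs equal 0x80). */
static uint64_t model_dkhm8(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (unsigned x = 0; x < 8; x++) {
        /* Assumes two's complement wrap on the uint8_t -> int8_t conversion. */
        int32_t aop = (int8_t)(uint8_t)(a >> (8u * x));
        int32_t bop = (int8_t)(uint8_t)(b >> (8u * x));
        int32_t res;
        if (aop == -128 && bop == -128) {
            res = 0x7F;              /* -1.0 * -1.0 does not fit in Q7 */
            *ov = 1;
        } else {
            int32_t p = aop * bop;   /* Q14 product */
            /* Arithmetic shift right by 7, i.e. floor(p / 128). */
            res = (p >= 0) ? (p >> 7) : -((-p + 127) >> 7);
        }
        rd |= (uint64_t)(uint8_t)res << (8u * x);
    }
    return rd;
}
```

For example, a lane holding `0x40` (0.5 in Q7) multiplied by `0x40` yields `0x20` (0.25 in Q7), while a lane with `0x80` in both operands saturates to `0x7F`.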

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKHM16 (unsigned long long a, unsigned long long b)

DKHM16 (64-bit SIMD Signed Saturating Q15 Multiply)

Type: SIMD

## Syntax:

```
DKHM16 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers
```

## Purpose:

Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.

## **Description:**

For the DKHM16 instruction, multiply the top 16-bit Q15 content of 32-bit chunks in Rs1 with the top 16-bit Q15 content of 32-bit chunks in Rs2. At the same time, multiply the bottom 16-bit Q15 content of 32-bit chunks in Rs1 with the bottom 16-bit Q15 content of 32-bit chunks in Rs2.

The Q30 results are then right-shifted 15-bits and saturated into Q15 values. The Q15 results are then written into Rd. When both the two Q15 inputs of a multiplication are 0x8000, saturation will happen. The result will be saturated to 0x7FFF and the overflow flag OV will be set.

## **Operations:**

```
op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
   if (0x8000 != aop | 0x8000 != bop) {
     res = (aop s* bop) >> 15;
   } else {
     res= 0x7FFF;
     OV = 1;
   }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0, 2
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKABS8 (unsigned long long a)
DKABS8 (64-bit SIMD 8-bit Saturating Absolute)

Type: SIMD

## Syntax:

```
DKABS8 Rd, Rs1
# Rd, Rs1 are all even/odd pair of registers
```

#### Purpose:

Get the absolute value of 8-bit signed integer elements simultaneously.

## **Description**:

This instruction calculates the absolute value of 8-bit signed integer elements stored in Rs1 and writes the element results to Rd. If the input number is 0x80, this instruction generates 0x7f as the output and sets the OV bit to 1.

## **Operations:**

```
src = Rs1.B[x];
if (src == 0x80) {
    src = 0x7f;
    OV = 1;
} else if (src[7] == 1) {
    src = -src;
}
Rd.B[x] = src;
for RV32: x=7...0
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a

**Returns** value stored in unsigned long long type
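A minimal host-side sketch of the saturating absolute value described above, applied to each of the eight byte lanes. The name `model_dkabs8` and the `ov` out-parameter are hypothetical stand-ins for the instruction and its OV flag.

```c
#include <stdint.h>

/* Hypothetical reference model of DKABS8: per-byte saturating |x|. */
static uint64_t model_dkabs8(uint64_t a, int *ov)
{
    uint64_t rd = 0;
    for (unsigned x = 0; x < 8; x++) {
        uint8_t src = (uint8_t)(a >> (8u * x));
        if (src == 0x80) {
            src = 0x7F;                   /* |-128| does not fit: saturate */
            *ov = 1;
        } else if (src & 0x80) {
            src = (uint8_t)(0u - src);    /* two's complement negate */
        }
        rd |= (uint64_t)src << (8u * x);
    }
    return rd;
}
```

Under this model, the input bytes `0x01`, `0xFF`, `0x80`, and `0x7F` map to `0x01`, `0x01`, `0x7F` (with overflow reported), and `0x7F`.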

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKABS16 (unsigned long long a)
DKABS16 (64-bit SIMD 16-bit Saturating Absolute)

Type: SIMD

#### Syntax:

```
DKABS16 Rd, Rs1
# Rd, Rs1 are all even/odd pair of registers
```

#### Purpose:

Get the absolute value of 16-bit signed integer elements simultaneously.

#### **Description**:

This instruction calculates the absolute value of 16-bit signed integer elements stored in Rs1 and writes the element results to Rd. If the input number is 0x8000, this instruction generates 0x7fff as the output and sets the OV bit to 1.

## **Operations:**

```
src = Rs1.H[x];
if (src == 0x8000) {
    src = 0x7fff;
    OV = 1;
} else if (src[15] == 1) {
    src = -src;
}
Rd.H[x] = src;
for RV32: x=3...0
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a

**Returns** value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKSLRA8 (unsigned long long a, int b)

DKSLRA8 (64-bit SIMD 8-bit Shift Left Logical with Saturation or Shift Right Arithmetic)

Type: SIMD

#### Syntax:

```
DKSLRA8 Rd, Rs1, Rs2
# Rd, Rs1 are all even/odd pair of registers
```

#### Purpose:

Do 8-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q7 saturation for the left shift.

## **Description**:

The 8-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[3:0]. Rs2[3:0] is in the signed range of [-2^3, 2^3-1]. A positive Rs2[3:0] means logical left shift and a negative Rs2[3:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[3:0]. However, the behavior of Rs2[3:0]==-2^3 (0x8) is defined to be equivalent to the behavior of Rs2[3:0]==-(2^3-1) (0x9). The left-shifted results are saturated to the 8-bit signed integer range of [-2^7, 2^7-1]. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:4] will not affect this instruction.

## **Operations:**

```
if (Rs2[3:0] < 0) {
    sa = -Rs2[3:0];
    sa = (sa == 8)? 7 : sa;
    Rd.B[x] = SE8(Rs1.B[x][7:sa]);
} else {
    sa = Rs2[2:0];
    res[(7+sa):0] = Rs1.B[x] <<(logic) sa;
    if (res > (2^7)-1) {
        res[7:0] = 0x7f; OV = 1;
    } else if (res < -2^7) {
        res[7:0] = 0x80; OV = 1;
    }
    Rd.B[x] = res[7:0];
}
for RV32: x=7...0
```

## **Parameters**

- a [in] unsigned long long type of value stored in a
- **b [in]** int type of value stored in b

**Returns** value stored in unsigned long long type
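The shift-direction and saturation rules above can be illustrated with a small host-side model. This is a sketch under stated assumptions: `model_dkslra8` and its `ov` out-parameter are hypothetical, only the low four bits of `b` are used (as the description specifies), and right shifts of negative values are computed as a floor division so the result does not depend on implementation-defined behaviour.

```c
#include <stdint.h>

/* Hypothetical reference model of DKSLRA8 over eight byte lanes. */
static uint64_t model_dkslra8(uint64_t a, int b, int *ov)
{
    int sh = (int)((unsigned)b & 0xFu);   /* Rs2[3:0] */
    if (sh & 0x8) {
        sh -= 16;                         /* sign-extend the 4-bit field */
    }
    uint64_t rd = 0;
    for (unsigned x = 0; x < 8; x++) {
        int32_t v = (int8_t)(uint8_t)(a >> (8u * x));
        int32_t res;
        if (sh < 0) {
            int sa = -sh;
            if (sa == 8) {
                sa = 7;                   /* -8 behaves like -7 */
            }
            /* Arithmetic right shift: floor(v / 2^sa). */
            res = (v >= 0) ? (v >> sa) : -((-v + (1 << sa) - 1) >> sa);
        } else {
            res = v * (1 << sh);          /* left shift before saturation */
            if (res > 127) {
                res = 127;
                *ov = 1;
            } else if (res < -128) {
                res = -128;
                *ov = 1;
            }
        }
        rd |= (uint64_t)(uint8_t)res << (8u * x);
    }
    return rd;
}
```

With the lanes `0x40`, `0xC0`, and `0x01` in the low three bytes, a shift amount of 2 saturates the first two lanes to `0x7F` and `0x80`, whereas a shift amount of -1 halves each lane without touching the overflow indication.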

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKSLRA16 (unsigned long long a, int b)

DKSLRA16 (64-bit SIMD 16-bit Shift Left Logical with Saturation or Shift Right Arithmetic)

**Type**: SIMD

## Syntax:

```
DKSLRA16 Rd, Rs1, Rs2
# Rd, Rs1 are all even/odd pair of registers
```

## Purpose:

Do 16-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q15 saturation for the left shift.

#### **Description**:

The 16-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[4:0]. Rs2[4:0] is in the signed range of [-2^4, 2^4-1]. A positive Rs2[4:0] means logical left shift and a negative Rs2[4:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[4:0]. However, the behavior of Rs2[4:0]==-2^4 (0x10) is defined to be equivalent to the behavior of Rs2[4:0]==-(2^4-1) (0x11). The left-shifted results are saturated to the 16-bit signed integer range of [-2^15, 2^15-1]. After the shift, saturation, or rounding, the final results are written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:5] will not affect this instruction.

#### **Operations:**

```
if (Rs2[4:0] < 0) {
    sa = -Rs2[4:0];
    sa = (sa == 16)? 15 : sa;
    Rd.H[x] = SE16(Rs1.H[x][15:sa]);
} else {
    sa = Rs2[3:0];
    res[(15+sa):0] = Rs1.H[x] <<(logic) sa;
    if (res > (2^15)-1) {
        res[15:0] = 0x7fff; OV = 1;
    } else if (res < -2^15) {
        res[15:0] = 0x8000; OV = 1;
    }
    Rd.H[x] = res[15:0];
}
for RV32: x=3...0
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] int type of value stored in b

**Returns** value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKADD8 (unsigned long long a, unsigned long long b)

DKADD8 (64-bit SIMD 8-bit Signed Saturating Addition)

Type: SIMD

#### Syntax:

```
DKADD8 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers
```

#### Purpose:

Do 8-bit signed integer element saturating additions simultaneously.

## **Description**:

This instruction adds the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2. If any of the results are beyond the Q7 number range ( $-2^7 \le Q7 \le 2^7-1$ ), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

## **Operations:**

```
res[x] = Rs1.B[x] + Rs2.B[x];
if (res[x] > 127) {
   res[x] = 127;
   OV = 1;
} else if (res[x] < -128) {
   res[x] = -128;
   OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=7...0
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type
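The saturating addition above can be modelled on a host in a few lines. The sketch below is illustrative only; `model_dkadd8` and the `ov` out-parameter are hypothetical names standing in for the instruction and its OV flag.

```c
#include <stdint.h>

/* Hypothetical reference model of DKADD8: per-byte signed saturating add. */
static uint64_t model_dkadd8(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (unsigned x = 0; x < 8; x++) {
        /* Assumes two's complement wrap on the uint8_t -> int8_t conversion. */
        int32_t res = (int32_t)(int8_t)(uint8_t)(a >> (8u * x))
                    + (int32_t)(int8_t)(uint8_t)(b >> (8u * x));
        if (res > 127) {
            res = 127;
            *ov = 1;
        } else if (res < -128) {
            res = -128;
            *ov = 1;
        }
        rd |= (uint64_t)(uint8_t)res << (8u * x);
    }
    return rd;
}
```

In this model, `0x7F + 0x01` saturates to `0x7F` and `0x80 + 0xFF` saturates to `0x80`, while `0x10 + 0x20` gives `0x30` with no overflow.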

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKADD16 (unsigned long long a, unsigned long long b)

DKADD16 (64-bit SIMD 16-bit Signed Saturating Addition)

Type: SIMD

#### Syntax:

```
DKADD16 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers
```

#### **Purpose**:

Do 16-bit signed integer element saturating additions simultaneously.

#### **Description**:

This instruction adds the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2. If any of the results are beyond the Q15 number range ($-2^{15} \le Q15 \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

## **Operations:**

```
res[x] = Rs1.H[x] + Rs2.H[x];
if (res[x] > 32767) {
   res[x] = 32767;
   OV = 1;
} else if (res[x] < -32768) {
   res[x] = -32768;
   OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=3...0
```

## **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKSUB8 (unsigned long long a, unsigned long long b)

DKSUB8 (64-bit SIMD 8-bit Signed Saturating Subtraction)

Type: SIMD

#### Syntax:

```
DKSUB8 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers
```

## **Purpose**:

Do 8-bit signed elements saturating subtractions simultaneously.

#### **Description**:

This instruction subtracts the 8-bit signed integer elements in Rs2 from the 8-bit signed integer elements in Rs1. If any of the results are beyond the Q7 number range ( $-2^7 \le Q7 \le 2^7 - 1$ ), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

## **Operations:**

```
res[x] = Rs1.B[x] - Rs2.B[x];
if (res[x] > (2^7)-1) {
   res[x] = (2^7)-1;
   OV = 1;
} else if (res[x] < -2^7) {
   res[x] = -2^7;
   OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=7...0
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type

\_\_STATIC\_FORCEINLINE unsigned long long \_\_RV\_DKSUB16 (unsigned long long a, unsigned long long b)

DKSUB16 (64-bit SIMD 16-bit Signed Saturating Subtraction)

Type: SIMD

## Syntax:

```
DKSUB16 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers
```

## Purpose:

Do 16-bit signed integer elements saturating subtractions simultaneously.

#### **Description**:

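This instruction subtracts the 16-bit signed integer elements in Rs2 from the 16-bit signed integer elements in Rs1. If any of the results are beyond the Q15 number range ($-2^{15} \le Q15 \le 2^{15}-1$), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.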

## **Operations:**

```
res[x] = Rs1.H[x] - Rs2.H[x];
if (res[x] > (2^15)-1) {
  res[x] = (2^15)-1;
  OV = 1;
} else if (res[x] < -2^15) {
  res[x] = -2^15;
  OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=3...0
```

#### **Parameters**

- a [in] unsigned long long type of value stored in a
- **b** [in] unsigned long long type of value stored in b

**Returns** value stored in unsigned long long type
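A host-side sketch of the 16-bit saturating subtraction above, for comparison against results obtained on target. The name `model_dksub16` and the `ov` out-parameter are hypothetical; the model reports saturation through the pointer instead of the OV flag.

```c
#include <stdint.h>

/* Hypothetical reference model of DKSUB16: per-halfword saturating a - b. */
static uint64_t model_dksub16(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (unsigned x = 0; x < 4; x++) {
        /* Assumes two's complement wrap on the uint16_t -> int16_t conversion. */
        int32_t res = (int32_t)(int16_t)(uint16_t)(a >> (16u * x))
                    - (int32_t)(int16_t)(uint16_t)(b >> (16u * x));
        if (res > 32767) {
            res = 32767;
            *ov = 1;
        } else if (res < -32768) {
            res = -32768;
            *ov = 1;
        }
        rd |= (uint64_t)(uint16_t)res << (16u * x);
    }
    return rd;
}
```

Here `0x8000 - 0x0001` saturates to `0x8000` and `0x7FFF - 0xFFFF` saturates to `0x7FFF`, whereas `0x0005 - 0x0003` yields `0x0002`.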

```
__STATIC_FORCEINLINE unsigned long __RV_EXPD80 (unsigned long a)
```

EXPD80 (Expand and Copy Byte 0 to 32bit)

Type: DSP

## Syntax:

EXPD80 Rd, Rs1

# Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

## **Description**:

Copies Rs1.B[0][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

## **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.B[0][7:0], Rs1.B[0][7:0], Rs1.B[0][7:0], Rs1.B[0][7:0]);
for RV32: x=0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type
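The byte replication performed by the EXPD8x family can be sketched on a host as follows. The function `model_expd8` and its lane argument `n` are hypothetical; `n` selects which source byte is copied (0 for EXPD80, up to 3 for EXPD83), and `uint32_t` stands in for a 32-bit `unsigned long`.

```c
#include <stdint.h>

/* Hypothetical model of EXPD80..EXPD83: replicate byte n of a into all
 * four byte lanes of the 32-bit result. n must be in the range 0..3. */
static uint32_t model_expd8(uint32_t a, unsigned n)
{
    uint32_t byte = (a >> (8u * n)) & 0xFFu;
    return byte * 0x01010101u;   /* same byte in lanes 0, 1, 2 and 3 */
}
```

For the input `0x12345678`, lane 0 expands to `0x78787878` and lane 3 expands to `0x12121212`.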

```
__STATIC_FORCEINLINE unsigned long __RV_EXPD81 (unsigned long a)
```

EXPD81 (Expand and Copy Byte 1 to 32bit)

Type: DSP

## Syntax:

```
EXPD81 Rd, Rs1
```

#### Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

## **Description**:

Copies Rs1.B[1][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

## **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.B[1][7:0], Rs1.B[1][7:0], Rs1.B[1][7:0], Rs1.B[1][7:0]);
for RV32: x=0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_EXPD82 (unsigned long a)
```

EXPD82 (Expand and Copy Byte 2 to 32bit)

Type: DSP

## Syntax:

```
EXPD82 Rd, Rs1
```

## Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

#### **Description:**

Copies Rs1.B[2][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

## **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.B[2][7:0], Rs1.B[2][7:0], Rs1.B[2][7:0], Rs1.B[2][7:0]);
for RV32: x=0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

```
__STATIC_FORCEINLINE unsigned long __RV_EXPD83 (unsigned long a)
```

EXPD83 (Expand and Copy Byte 3 to 32bit)

Type: DSP

# Syntax:

```
EXPD83 Rd, Rs1
```

## Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

## **Description**:

Copies Rs1.B[3][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

## **Operations:**

```
Rd.W[x][31:0] = CONCAT(Rs1.B[3][7:0], Rs1.B[3][7:0], Rs1.B[3][7:0], Rs1.B[3][7:0]);
for RV32: x=0
```

#### **Parameters**

- a [in] unsigned long type of value stored in a

**Returns** value stored in unsigned long type

## group NMSIS\_Core\_DSP\_Intrinsic

Functions that generate RISC-V DSP SIMD instructions.

The following functions generate specific RISC-V SIMD instructions that cannot be directly accessed by the compiler.

#### DSP ISA Extension Instruction Summary

#### - Shorthand Definitions

- \* r.H == r.H1: r[31:16], r.L == r.H0: r[15:0]
- \* r.B3: r[31:24], r.B2: r[23:16], r.B1: r[15:8], r.B0: r[7:0]
- \* r.B[x]: r[(x\*8+7):(x\*8+0)]
- \* r.H[x]: r[(x\*16+15):(x\*16+0)]
- \* r.W[x]: r[(x\*32+31):(x\*32+0)]
- \* r[xU]: the upper 32-bit of a 64-bit number; xU represents the GPR number that contains this upper part 32-bit value.
- \* r[xL]: the lower 32-bit of a 64-bit number; xL represents the GPR number that contains this lower part 32-bit value.
- \* r[xU].r[xL]: a 64-bit number that is formed from a pair of GPRs.
- \* s>>: signed arithmetic right shift
- \* u>>: unsigned logical right shift
- \* SAT.Qn(): Saturate to the range of [-2^n, 2^n-1], if saturation happens, set PSW.OV.
- \* SAT.Um(): Saturate to the range of [0, 2^m-1], if saturation happens, set PSW.OV.
- \* RUND(): Indicate rounding, i.e., add 1 to the most significant discarded bit for right shift or MSW-type multiplication instructions.
- \* Sign or Zero Extending functions:
  - · SEm(data): Sign-Extend data to m-bit.
  - · ZEm(data): Zero-Extend data to m-bit.
- \* ABS(x): Calculate the absolute value of x.
- \* CONCAT(x,y): Concatenate x and y to form a value.
- \* u<: Unsigned less than comparison.
- \* u<=: Unsigned less than or equal comparison.
- \* u>: Unsigned greater than comparison.
- \* s\*: Signed multiplication.
- \* u\*: Unsigned multiplication.
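The two saturation shorthands can be illustrated with a small host-side sketch. The function names and the `demo_ov` flag are hypothetical stand-ins (the flag plays the role of PSW.OV); the code only models the ranges given in the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool demo_ov = false;    /* hypothetical stand-in for the PSW.OV flag */

/* SAT.Qn(): clamp v to [-2^n, 2^n - 1] and record saturation in demo_ov. */
static int64_t demo_sat_q(int64_t v, unsigned n)
{
    assert(n <= 62);                        /* keeps 1 << n inside int64_t */
    const int64_t max = ((int64_t)1 << n) - 1;
    const int64_t min = -max - 1;
    if (v > max) { demo_ov = true; return max; }
    if (v < min) { demo_ov = true; return min; }
    return v;
}

/* SAT.Um(): clamp v to [0, 2^m - 1] and record saturation in demo_ov. */
static int64_t demo_sat_u(int64_t v, unsigned m)
{
    assert(m <= 62);
    const int64_t max = ((int64_t)1 << m) - 1;
    if (v > max) { demo_ov = true; return max; }
    if (v < 0)   { demo_ov = true; return 0; }
    return v;
}
```

With n = 15 this gives the familiar Q15 range of -32768 to 32767, and with m = 8 the unsigned range of 0 to 255.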

## 2.5.8 Peripheral Access

| Define                     | Definition                                                 |
|----------------------------|------------------------------------------------------------|
| \_\_I                      | volatile const                                             |
| \_\_O                      | volatile                                                   |
| \_\_IO                     | volatile                                                   |
| \_\_IM                     | volatile const                                             |
| \_\_OM                     | volatile                                                   |
| \_\_IOM                    | volatile                                                   |
| \_VAL2FLD (field, value)   | (((uint32\_t)(value) << field ## \_Pos) & field ## \_Msk)  |

group NMSIS\_Core\_PeriphAccess

Naming conventions and optional features for accessing peripherals.

The section below describes the naming conventions, requirements, and optional features for accessing device specific peripherals. Most of the rules also apply to the core peripherals.

The **Device Header File <device.h>** typically contains these definitions and also includes the core-specific header files.

## **Defines**

The macro \_VAL2FLD uses the #define's \_Pos and \_Msk of the related bit field to shift bit-field values for assigning to a register.

## Example:

```
ECLIC->CFG = _VAL2FLD(CLIC_CLICCFG_NLBIT, 3);
```

## **Parameters**

- field [in] Name of the register bit field.
- value [in] Value of the bit field. This parameter is interpreted as a uint32\_t type.

**Returns** Masked and shifted value.

```
_FLD2VAL (field, value) (((uint32_t)(value) & field ## _Msk) >> field ## _Pos)
```

Mask and shift a register value to extract a bit field value.

The macro \_FLD2VAL uses the #define's \_Pos and \_Msk of the related bit field to extract the value of a bit field from a register.

## **Example:**

```
nlbits = _FLD2VAL(CLIC_CLICCFG_NLBIT, ECLIC->CFG);
```

#### **Parameters**

- field [in] Name of the register bit field.
- value [in] Value of the register. This parameter is interpreted as a uint32\_t type.

**Returns** Masked and shifted bit field value.
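The two macros are typically combined in a read-modify-write sequence. The sketch below repeats the macro definitions from this section so that it compiles on its own; the field `DEMO_CFG_NLBIT` (4 bits at position 1) and the helper names are hypothetical and not taken from a device header.

```c
#include <assert.h>
#include <stdint.h>

/* Macros as given in this section. */
#define _VAL2FLD(field, value) (((uint32_t)(value) << field ## _Pos) & field ## _Msk)
#define _FLD2VAL(field, value) (((uint32_t)(value) & field ## _Msk) >> field ## _Pos)

/* Hypothetical 4-bit field occupying bits [4:1] of a register. */
#define DEMO_CFG_NLBIT_Pos 1U
#define DEMO_CFG_NLBIT_Msk (0xFUL << DEMO_CFG_NLBIT_Pos)

/* Replace only the DEMO_CFG_NLBIT field of reg; other bits stay untouched. */
static uint32_t demo_set_nlbits(uint32_t reg, uint32_t nlbits)
{
    assert(nlbits <= 0xFu);                     /* value must fit the field */
    reg &= ~(uint32_t)DEMO_CFG_NLBIT_Msk;       /* clear the old field value */
    reg |= (uint32_t)_VAL2FLD(DEMO_CFG_NLBIT, nlbits);
    return reg;
}

/* Extract the DEMO_CFG_NLBIT field from reg. */
static uint32_t demo_get_nlbits(uint32_t reg)
{
    return (uint32_t)_FLD2VAL(DEMO_CFG_NLBIT, reg);
}
```

Clearing the field with the mask before OR-ing in the new value is what keeps neighbouring bits intact; writing `_VAL2FLD(...)` directly to a register, as in the first example above, overwrites the whole register.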

# 2.5.9 Systick Timer(SysTimer)

See Nuclei Timer Unit<sup>17</sup> to learn about the Core Timer Unit in the Nuclei ISA Spec.

## **SysTimer API**

```
__STATIC_FORCEINLINE void SysTimer_SetLoadValue (uint64_t value)
__STATIC_FORCEINLINE uint64_t SysTimer_GetLoadValue (void)
__STATIC_FORCEINLINE void SysTimer_SetCompareValue (uint64_t value)
__STATIC_FORCEINLINE uint64_t SysTimer_GetCompareValue (void)
__STATIC_FORCEINLINE void SysTimer_Start (void)
__STATIC_FORCEINLINE void SysTimer_Stop (void)
__STATIC_FORCEINLINE void SysTimer_SetControlValue (uint32_t mctl)
__STATIC_FORCEINLINE uint32_t SysTimer_GetControlValue (void)
__STATIC_FORCEINLINE void SysTimer_SetSWIRQ (void)
__STATIC_FORCEINLINE void SysTimer_ClearSWIRQ (void)
__STATIC_FORCEINLINE uint32_t SysTimer_GetMsipValue (void)
__STATIC_FORCEINLINE void SysTimer_SetMsipValue (uint32_t msip)
__STATIC_FORCEINLINE void SysTimer_SoftwareReset (void)
__STATIC_INLINE uint32_t SysTick_Config (uint64_t ticks)
__STATIC_FORCEINLINE uint32_t SysTick_Reload (uint64_t ticks)
group NMSIS_Core_SysTimer
    Functions that configure the Core System Timer.
```

<sup>17</sup> https://doc.nucleisys.com/nuclei\_spec/isa/timer.html

#### **Functions**

\_\_STATIC\_FORCEINLINE void SysTimer\_SetLoadValue (uint64\_t value)

Set system timer load value.

This function sets the system timer load value in the MTIMER register.

#### Remark

- The load value is 64 bits wide.
- SysTimer\_GetLoadValue

**Parameters** value – [in] value to write to the system timer MTIMER register.

\_\_STATIC\_FORCEINLINE uint64\_t SysTimer\_GetLoadValue (void)

Get system timer load value.

This function gets the current system timer value from the MTIMER register.

#### Remark

- The load value is 64 bits wide.
- SysTimer\_SetLoadValue

**Returns** current value (64-bit) of the system timer MTIMER register.

\_\_STATIC\_FORCEINLINE void SysTimer\_SetCompareValue (uint64\_t value)

Set system timer compare value.

This function sets the system timer compare value in the MTIMERCMP register.

## Remark

- The compare value is 64 bits wide.
- A timer interrupt is generated when the current timer value reaches or exceeds the compare value.
- To clear the interrupt, lower the load value or raise the compare value so that the load value is below the compare value again.
- SysTimer\_GetCompareValue

**Parameters** value – [in] compare value to write to the system timer MTIMERCMP register.

\_\_STATIC\_FORCEINLINE uint64\_t SysTimer\_GetCompareValue (void)

Get system timer compare value.

This function gets the system timer compare value from the MTIMERCMP register.

## Remark

- The compare value is 64 bits wide.
- SysTimer\_SetCompareValue

**Returns** compare value of the system timer MTIMERCMP register.

\_\_STATIC\_FORCEINLINE void SysTimer\_Start (void)

Enable system timer counter running.

Enable the system timer counter by clearing the TIMESTOP bit in the MTIMECTL register.

#### \_\_STATIC\_FORCEINLINE void SysTimer\_Stop (void)

Stop system timer counter running.

Stop the system timer counter by setting the TIMESTOP bit in the MTIMECTL register.

## \_\_STATIC\_FORCEINLINE void SysTimer\_SetControlValue (uint32\_t mctl)

Set system timer control value.

This function sets the system timer MTIMECTL register value.

#### Remark

- Bit TIMESTOP starts and stops the timer. Clear TIMESTOP to 0 to start the timer; set it to 1 to stop the timer.
- Bit CMPCLREN enables automatic clearing of MTIMER to zero when MTIMER >= MTIMERCMP. Clear CMPCLREN to 0 to disable the auto-clear feature; set it to 1 to enable it.
- Bit CLKSRC selects the timer clock source. Clear CLKSRC to 0 to use *mtime\_toggle\_a*; set it to 1 to use *core\_clk\_aon*.
- SysTimer\_GetControlValue

**Parameters** mctl – [in] value to write to the MTIMECTL register
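The effect of the TIMESTOP and CMPCLREN bits can be pictured with a simplified host-side model. Everything in this sketch is an assumption made for illustration: the `demo_` names, the struct, and in particular the bit positions, which should be taken from the device header rather than from here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions assumed for this sketch only; check the device header. */
#define DEMO_MTIMECTL_TIMESTOP (1u << 0)
#define DEMO_MTIMECTL_CMPCLREN (1u << 1)

typedef struct {
    uint64_t mtimer;        /* models MTIMER */
    uint64_t mtimercmp;     /* models MTIMERCMP */
    uint32_t mtimectl;      /* models MTIMECTL */
} demo_systimer_t;

/* Advance the model by one timer clock.
 * Returns true when this step produced a compare match. */
static bool demo_systimer_step(demo_systimer_t *t)
{
    assert(t != NULL);
    if (t->mtimectl & DEMO_MTIMECTL_TIMESTOP) {
        return false;                   /* counter is stopped */
    }
    t->mtimer++;
    if (t->mtimer < t->mtimercmp) {
        return false;                   /* compare value not reached yet */
    }
    if (t->mtimectl & DEMO_MTIMECTL_CMPCLREN) {
        t->mtimer = 0;                  /* auto clear on match */
    }
    return true;
}
```

With CMPCLREN cleared the counter keeps running past the compare value, which is why the default SysTick functions described later reload the timer in software.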

## \_\_STATIC\_FORCEINLINE uint32\_t SysTimer\_GetControlValue (void)

Get system timer control value.

This function gets the system timer MTIMECTL register value.

#### Remark

- SysTimer\_SetControlValue

**Returns** MTIMECTL register value

## \_\_STATIC\_FORCEINLINE void SysTimer\_SetSWIRQ (void)

Trigger or set software interrupt via system timer.

This function sets the MSIP bit in the system timer MSIP register.

#### Remark

- Setting the system timer MSIP bit generates a software interrupt.
- SysTimer\_ClearSWIRQ
- SysTimer\_GetMsipValue

#### \_\_STATIC\_FORCEINLINE void SysTimer\_ClearSWIRQ (void)

Clear system timer software interrupt pending request.

This function clears the MSIP bit in the system timer MSIP register.

## Remark

- Clearing the MSIP bit in the MSIP register clears the pending software interrupt.
- SysTimer\_SetSWIRQ
- SysTimer\_GetMsipValue

## \_\_STATIC\_FORCEINLINE uint32\_t SysTimer\_GetMsipValue (void)

Get system timer MSIP register value.

This function gets the system timer MSIP register value.

#### Remark

- Bit 0 is the software interrupt flag: 1 means the software interrupt is set, 0 means it is clear.
- SysTimer\_SetSWIRQ
- SysTimer\_ClearSWIRQ

**Returns** Value of Timer MSIP register.

## \_\_STATIC\_FORCEINLINE void SysTimer\_SetMsipValue (uint32\_t msip)

Set system timer MSIP register value.

This function sets the system timer MSIP register value.

**Parameters** msip – [in] value to write to the MSIP register

## \_\_STATIC\_FORCEINLINE void SysTimer\_SoftwareReset (void)

Do software reset request.

This function issues a software reset request through MTIMER:

- Software needs to write SysTimer\_MSFRST\_KEY (page 89) to generate the software reset request.
- The software reset request flag is cleared by the reset operation.

#### Remark

- The software reset request is sent to the SoC; the SoC needs to generate the reset signal and send it back to the core.
- This function does not return; it spins in while(1) waiting for the core reset to happen.

## \_\_STATIC\_INLINE uint32\_t SysTick\_Config (uint64\_t ticks)

System Tick Configuration.

Initializes the System Timer and its non-vector interrupt, and starts the System Tick Timer.

In the default implementation, the timer counter is set to zero and a non-vector timer compare interrupt is raised when the counter matches the number of ticks set by the user. In the timer interrupt handler, the user should reload the system tick with SysTick\_Reload or a similar user-written function so that the timer interrupt fires periodically.

#### Remark

- For \_\_NUCLEI\_N\_REV (page 60) >= 0x0104, the CMPCLREN bit in MTIMECTL is introduced, but we assume that the CMPCLREN bit is set to 0, so the MTIMER register is not automatically cleared to 0 when MTIMER >= MTIMERCMP.
- When the variable \_\_Vendor\_SysTickConfig is set to 1, the function SysTick\_Config is not included.
- In this case, the file **<Device>.h** must contain a vendor-specific implementation of this function.
- If this function is used to start a periodic timer interrupt, the timer interrupt handler should call SysTick\_Reload with ticks to reload the timer.
- This function is only available when \_\_SYSTIMER\_PRESENT == 1 and \_\_ECLIC\_PRESENT == 1 and \_\_Vendor\_SysTickConfig == 0.

## See

- SysTimer\_SetCompareValue
- SysTimer\_SetLoadValue

**Parameters** ticks – [in] Number of ticks between two interrupts.

**Returns** 0 Function succeeded.

**Returns** 1 Function failed.

```
__STATIC_FORCEINLINE uint32_t SysTick_Reload (uint64_t ticks)
```

System Tick Reload.

Reload the System Timer Tick when the MTIMER value reaches the MTIMERCMP value.

#### Remark

- For \_\_NUCLEI\_N\_REV (page 60) >= 0x0104, the CMPCLREN bit in MTIMECTL is introduced, but the SysTick\_Config function assumes that CMPCLREN is set to 0, so the interrupt handler still needs to set MTIMERCMP or MTIMER to reload the system tick. A vendor that wants to use the timer's auto-clear feature can define \_\_Vendor\_SysTickConfig to 1 and implement its own SysTick\_Config and SysTick\_Reload functions.
- When the variable \_\_Vendor\_SysTickConfig is set to 1, the function SysTick\_Reload is not included.
- In this case, the file **<Device>.h** must contain a vendor-specific implementation of this function.
- This function is only available when \_\_SYSTIMER\_PRESENT == 1 and \_\_ECLIC\_PRESENT == 1 and \_\_Vendor\_SysTickConfig == 0.
- The new MTIMERCMP value might overflow; if it does, MTIMER is set to 0 and MTIMERCMP is set to ticks.

#### See

- SysTimer\_SetCompareValue
- SysTimer\_SetLoadValue

**Parameters** ticks – [in] Number of ticks between two interrupts.

**Returns** 0 Function succeeded.

**Returns** 1 Function failed.
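The timer side of the configure and reload behaviour described above, including the overflow remark, can be sketched as a host-side model. The `demo_` names and the register struct are hypothetical; the sketch leaves out the ECLIC setup and is not the library implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t mtimer;        /* models MTIMER */
    uint64_t mtimercmp;     /* models MTIMERCMP */
} demo_timer_regs_t;

/* Timer part of the configuration: counter to zero, compare value to ticks. */
static uint32_t demo_systick_config(demo_timer_regs_t *t, uint64_t ticks)
{
    assert(t != NULL);
    t->mtimer = 0;
    t->mtimercmp = ticks;
    return 0;
}

/* Schedule the next compare match ticks after the current counter value. */
static uint32_t demo_systick_reload(demo_timer_regs_t *t, uint64_t ticks)
{
    assert(t != NULL);
    uint64_t now = t->mtimer;
    uint64_t next = now + ticks;    /* unsigned addition wraps on overflow */

    if (next > now) {
        t->mtimercmp = next;
    } else {
        /* The sum wrapped around: restart the counter from zero. */
        t->mtimer = 0;
        t->mtimercmp = ticks;
    }
    return 0;
}
```

Basing the next compare value on the current counter rather than on the previous compare value means interrupt latency accumulates as drift; that trade-off is a property of this sketch and may or may not match a given vendor implementation.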

## **SysTick Code Example**

The code below shows the usage of the functions SysTick\_Config() and SysTick\_Reload() with a GD32VF103 SoC.

Listing 3: gd32vf103\_systick\_example.c

```
#include "gd32vf103.h"

volatile uint32_t msTicks = 0;              /* Variable to store millisecond ticks */

#define CONFIG_TICKS        (SOC_TIMER_FREQ / 1000)
#define SysTick_Handler     eclic_mtip_handler

void SysTick_Handler(void) {                /* SysTick interrupt Handler. */
  SysTick_Reload(CONFIG_TICKS);             /* Call SysTick_Reload to reload timer. */
  msTicks++;                                /* See startup file startup_gd32vf103.S for SysTick vector */
}

int main (void) {
  uint32_t returnCode;

  returnCode = SysTick_Config(CONFIG_TICKS);  /* Configure SysTick to generate an interrupt every millisecond */

  if (returnCode != 0) {                      /* Check return code for errors */
    // Error Handling
  }

  while (1);
}
```

## **SysTimer Interrupt Code Example**

The code below shows the usage of various NMSIS Timer Interrupt functions with a GD32VF103 device.

Listing 4: gd32vf103\_timer\_example1.c

```
#include "gd32vf103.h"

void eclic_mtip_handler(void)
{
    uint64_t now = SysTimer_GetLoadValue();
    SysTimer_SetCompareValue(now + SOC_TIMER_FREQ/100);
}

static uint32_t int_cnt = 0;
void eclic_msip_handler(void)
{
    SysTimer_ClearSWIRQ();
    int_cnt++;
}

void eclic_global_initialize(void)
{
    ECLIC_SetMth(0);
    ECLIC_SetCfgNlbits(3);
}

int eclic_register_interrupt(IRQn_Type IRQn, uint8_t shv, uint32_t trig_mode, uint32_t lvl, uint32_t priority, void * handler)
{
    ECLIC_SetShvIRQ(IRQn, shv);
    ECLIC_SetTrigIRQ(IRQn, trig_mode);
    ECLIC_SetLevelIRQ(IRQn, lvl);
    ECLIC_SetPriorityIRQ(IRQn, priority);
    ECLIC_SetVector(IRQn, (rv_csr_t)(handler));
    ECLIC_EnableIRQ(IRQn);
    return 0;
}

void setup_timer(void)
{
    SysTimer_SetLoadValue(0);
    SysTimer_SetCompareValue(SOC_TIMER_FREQ/100);
}

int main (void)
{
    uint32_t returnCode;

    eclic_global_initialize();              /* initialize ECLIC */
    setup_timer();                          /* initialize timer */

    returnCode = eclic_register_interrupt(SysTimer_IRQn,1,2,8,0,eclic_mtip_handler);    /* register system timer interrupt */
    returnCode = eclic_register_interrupt(SysTimerSW_IRQn,1,2,8,0,eclic_msip_handler);  /* register system timer SW interrupt */

    __enable_irq();                         /* enable global interrupt */

    SysTimer_SetSWIRQ();                    /* trigger timer SW interrupt */

    if (returnCode != 0) {                  /* Check return code for errors */
      // Error Handling
    }

    while (1);
}
```

## 2.5.10 Interrupts and Exceptions

## **Description**

This section explains how to use interrupts and exceptions and access functions for the Enhanced Core Local Interrupt Controller(ECLIC)<sup>18</sup>.

Nuclei provides a template file startup\_device for each supported compiler. The file must be adapted by the silicon vendor to include interrupt vectors for all device-specific interrupt handlers. Each interrupt handler is defined as a weak function that points to a dummy handler. These interrupt handlers can be used directly in application software without being adapted by the programmer.

See Interrupt<sup>19</sup> to learn more about interrupt handling in the Nuclei processor core.

<sup>18</sup> https://doc.nucleisys.com/nuclei\_spec/isa/eclic.html

<sup>19</sup> https://doc.nucleisys.com/nuclei\_spec/isa/interrupt.html

## **NMI** Interrupt

The NMI<sup>20</sup> entry is stored in CSR\_MNVEC. If CSR\_MMISC\_CTL[9] is 1, the NMI entry is the same as the exception entry, which is taken from CSR\_MTVEC. If CSR\_MMISC\_CTL[9] is 0, the NMI entry is the reset vector.

## **Exception**

Exceptions<sup>21</sup> have only one entry address, which is stored in CSR\_MTVEC. All exceptions jump to the same entry exc\_entry defined in intexc\_<Device>.S.

The table below lists the core exception code of the Nuclei N/NX processors.

| Exception Code     | Value | Description                        |
|--------------------|-------|------------------------------------|
| InsUnalign_EXCn    | 0     | Instruction address misaligned     |
| InsAccFault_EXCn   | 1     | Instruction access fault           |
| IlleIns_EXCn       | 2     | Illegal instruction                |
| Break_EXCn         | 3     | Breakpoint                         |
| LdAddrUnalign_EXCn | 4     | Load address misaligned            |
| LdFault_EXCn       | 5     | Load access fault                  |
| StAddrUnalign_EXCn | 6     | Store or AMO address misaligned    |
| StAccessFault_EXCn | 7     | Store or AMO access fault          |
| UmodeEcall_EXCn    | 8     | Environment call from User mode    |
| MmodeEcall_EXCn    | 11    | Environment call from Machine mode |
| NMI_EXCn           | 0xfff | NMI interrupt                      |

Table 8: Core exception code of the Nuclei N/NX processors
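A common use of Table 8 is turning an exception code into a readable message inside a common exception handler. The lookup below is a host-side sketch; the type and function names are hypothetical, and how the code is obtained from the trap cause is outside its scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t code;          /* value column of Table 8 */
    const char *text;       /* description column of Table 8 */
} demo_exc_entry_t;

static const demo_exc_entry_t demo_exc_table[] = {
    { 0,     "Instruction address misaligned" },
    { 1,     "Instruction access fault" },
    { 2,     "Illegal instruction" },
    { 3,     "Breakpoint" },
    { 4,     "Load address misaligned" },
    { 5,     "Load access fault" },
    { 6,     "Store or AMO address misaligned" },
    { 7,     "Store or AMO access fault" },
    { 8,     "Environment call from User mode" },
    { 11,    "Environment call from Machine mode" },
    { 0xfff, "NMI interrupt" },
};

/* Return the Table 8 description for code, or a fallback for other values. */
static const char *demo_exc_describe(uint32_t code)
{
    assert(code <= 0xfffu);     /* sketch assumes a 12-bit exception code */
    for (size_t i = 0; i < sizeof demo_exc_table / sizeof demo_exc_table[0]; i++) {
        if (demo_exc_table[i].code == code) {
            return demo_exc_table[i].text;
        }
    }
    return "Unknown or reserved exception code";
}
```

A linear search is enough here because the table has only eleven entries and the lookup runs on a fault path, not a hot path.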

## **Vector Table**

The Vector Table defines the entry addresses of the ECLIC managed interrupts.

It is typically located at the beginning of the program memory. You can modify CSR\_MTVT to relocate the base address of the vector table, but the base address must be aligned according to the number of interrupts.

| Number of Interrupts | Alignment Requirement of CSR\_MTVT |
|----------------------|------------------------------------|
| 0 to 16              | 64-byte                            |
| 17 to 32             | 128-byte                           |
| 33 to 64             | 256-byte                           |
| 65 to 128            | 512-byte                           |
| 129 to 256           | 1KB                                |
| 257 to 512           | 2KB                                |
| 513 to 1024          | 4KB                                |

Table 9: Base address alignment according to the number of interrupts
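The alignment rule in Table 9 doubles with each doubling of the interrupt count, so it can be computed rather than looked up. The helpers below are a sketch with hypothetical names, useful for example in a build-time or start-up sanity check of a relocated table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Required vector table base alignment in bytes for num_irq interrupts,
 * following Table 9 (64 bytes for up to 16, doubling up to 4KB for 1024). */
static uint32_t demo_mtvt_alignment(uint32_t num_irq)
{
    assert(num_irq <= 1024);        /* Table 9 ends at 1024 interrupts */
    uint32_t limit = 16;
    uint32_t align = 64;
    while (num_irq > limit) {
        limit *= 2;
        align *= 2;
    }
    return align;
}

/* True when base satisfies the alignment required for num_irq interrupts. */
static bool demo_mtvt_base_ok(uintptr_t base, uint32_t num_irq)
{
    uint32_t align = demo_mtvt_alignment(num_irq);
    return (base & (uintptr_t)(align - 1u)) == 0;
}
```

For instance, a device with 87 interrupts falls in the 65 to 128 row and needs a 512-byte aligned table.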

Interrupt numbers 0 to 18 are reserved by the Nuclei core. Numbers 19 to 1023 can be used by the silicon vendor's device.

Below is an example interrupt allocation table:

<sup>20</sup> https://doc.nucleisys.com/nuclei\_spec/isa/nmi.html

<sup>21</sup> https://doc.nucleisys.com/nuclei\_spec/isa/exception.html

```
typedef enum IRQn {
    /******  Nuclei N/NX Processor Core Internal Interrupt Numbers  ******/
    Reserved0_IRQn      =   0,      /*!< Internal reserved */
    Reserved1_IRQn      =   1,      /*!< Internal reserved */
    Reserved2_IRQn      =   2,      /*!< Internal reserved */
    SysTimerSW_IRQn     =   3,      /*!< System Timer SW interrupt */
    Reserved3_IRQn      =   4,      /*!< Internal reserved */
    Reserved4_IRQn      =   5,      /*!< Internal reserved */
    Reserved5_IRQn      =   6,      /*!< Internal reserved */
    SysTimer_IRQn       =   7,      /*!< System Timer Interrupt */
    Reserved6_IRQn      =   8,      /*!< Internal reserved */
    Reserved7_IRQn      =   9,      /*!< Internal reserved */
    Reserved8_IRQn      =  10,      /*!< Internal reserved */
    Reserved9_IRQn      =  11,      /*!< Internal reserved */
    Reserved10_IRQn     =  12,      /*!< Internal reserved */
    Reserved11_IRQn     =  13,      /*!< Internal reserved */
    Reserved12_IRQn     =  14,      /*!< Internal reserved */
    Reserved13_IRQn     =  15,      /*!< Internal reserved */
    Reserved14_IRQn     =  16,      /*!< Internal reserved */
    HardFault_IRQn      =  17,      /*!< Hard Fault, storage access error */
    Reserved15_IRQn     =  18,      /*!< Internal reserved */

    /******  GD32VF103 Specific External Interrupt Numbers  ******/
    WWDGT_IRQn          =  19,      /*!< window watchDog timer interrupt */
    LVD_IRQn            =  20,      /*!< LVD through EXTI line detect interrupt */
    TAMPER_IRQn         =  21,      /*!< tamper through EXTI line detect */
    /* ... */
    CAN1_EWMC_IRQn      =  85,      /*!< CAN1 EWMC interrupt */
    USBFS_IRQn          =  86,      /*!< USBFS global interrupt */
    SOC_INT_MAX,                    /*!< Number of total Interrupts */
} IRQn_Type;
```

#### **ECLIC API Definitions**

When macro NMSIS\_ECLIC\_VIRTUAL is defined, the ECLIC access functions in the table below must be implemented for virtualizing ECLIC access.

These functions should be implemented in a separate source module. The original NMSIS-Core \_\_ECLIC\_xxx functions are always available independent of NMSIS\_ECLIC\_VIRTUAL macro.

| ECLIC ACCESS FUNCTIONS             | NMSIS-CORE FUNCTIONS FOR ECLIC |
|------------------------------------|--------------------------------|
| ECLIC\_SetCfgNlbits (page 307)     | \_\_ECLIC\_SetCfgNlbits()      |
| ECLIC\_GetCfgNlbits (page 307)     | \_\_ECLIC\_GetCfgNlbits()      |
| ECLIC\_GetInfoVer (page 307)       | \_\_ECLIC\_GetInfoVer()        |
| ECLIC\_GetInfoCtlbits (page 307)   | \_\_ECLIC\_GetInfoCtlbits()    |
| ECLIC\_GetInfoNum (page 307)       | \_\_ECLIC\_GetInfoNum()        |
| ECLIC\_SetMth (page 307)           | \_\_ECLIC\_SetMth()            |
| ECLIC\_GetMth (page 307)           | \_\_ECLIC\_GetMth()            |
| ECLIC\_EnableIRQ (page 307)        | \_\_ECLIC\_EnableIRQ()         |
| ECLIC\_GetEnableIRQ (page 308)     | \_\_ECLIC\_GetEnableIRQ()      |
| ECLIC\_DisableIRQ (page 308)       | \_\_ECLIC\_DisableIRQ()        |
| ECLIC\_SetPendingIRQ (page 308)    | \_\_ECLIC\_SetPendingIRQ()     |
| ECLIC\_GetPendingIRQ (page 308)    | \_\_ECLIC\_GetPendingIRQ()     |
| ECLIC\_ClearPendingIRQ (page 308)  | \_\_ECLIC\_ClearPendingIRQ()   |
| ECLIC\_SetTrigIRQ (page 308)       | \_\_ECLIC\_SetTrigIRQ()        |
| ECLIC\_GetTrigIRQ (page 308)       | \_\_ECLIC\_GetTrigIRQ()        |
| ECLIC\_SetShvIRQ (page 308)        | \_\_ECLIC\_SetShvIRQ()         |
| ECLIC\_GetShvIRQ (page 308)        | \_\_ECLIC\_GetShvIRQ()         |
| ECLIC\_SetCtrlIRQ (page 308)       | \_\_ECLIC\_SetCtrlIRQ()        |
| ECLIC\_GetCtrlIRQ (page 308)       | \_\_ECLIC\_GetCtrlIRQ()        |
| ECLIC\_SetLevelIRQ (page 308)      | \_\_ECLIC\_SetLevelIRQ()       |
| ECLIC\_GetLevelIRQ (page 308)      | \_\_ECLIC\_GetLevelIRQ()       |
| ECLIC\_SetPriorityIRQ (page 308)   | \_\_ECLIC\_SetPriorityIRQ()    |
| ECLIC\_GetPriorityIRQ (page 308)   | \_\_ECLIC\_GetPriorityIRQ()    |

Table 10: ECLIC Access Functions

When the NMSIS\_VECTAB\_VIRTUAL macro is defined, the functions in the table below must be replaced to virtualize the API access functions to the interrupt vector table.

The ECLIC vector table API should be implemented in a separate source module.

This allows, for example, alternate implementations to relocate the vector table from flash to RAM on the first vector table update.

The original NMSIS-Core functions are always available, but prefixed with \_\_ECLIC.

Table 11: ECLIC Vector Access Functions

| ECLIC Vector Table Access  | NMSIS-CORE FUNCTIONS |
|----------------------------|----------------------|
| ECLIC\_SetVector (page 308) | \_\_ECLIC\_SetVector() |
| ECLIC\_GetVector (page 308) | \_\_ECLIC\_GetVector() |
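The relocation idea mentioned above (copy the table from flash to RAM on the first update) can be sketched on a host as follows. All names are hypothetical, the arrays merely stand in for flash and RAM, and the pointer `demo_active_table` plays the role of the vector table base; a real implementation would also have to honour the alignment rules of Table 9 and update the base address CSR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_IRQ 32u

typedef uintptr_t demo_vector_t;

/* Stands in for the read-only vector table placed in flash. */
static const demo_vector_t demo_flash_vectors[DEMO_NUM_IRQ] = {
    [3] = 0x1000u,
    [7] = 0x2000u,
};

/* RAM copy used after the first update. */
static demo_vector_t demo_ram_vectors[DEMO_NUM_IRQ];

/* Models the vector table base address currently in use. */
static const demo_vector_t *demo_active_table = demo_flash_vectors;

/* Virtualized "set vector": relocate to RAM on first use, then patch one entry. */
static void demo_set_vector(uint32_t irqn, demo_vector_t vector)
{
    assert(irqn < DEMO_NUM_IRQ);
    if (demo_active_table != demo_ram_vectors) {
        memcpy(demo_ram_vectors, demo_flash_vectors, sizeof demo_ram_vectors);
        demo_active_table = demo_ram_vectors;
    }
    demo_ram_vectors[irqn] = vector;
}

/* Virtualized "get vector": read from whichever table is active. */
static demo_vector_t demo_get_vector(uint32_t irqn)
{
    assert(irqn < DEMO_NUM_IRQ);
    return demo_active_table[irqn];
}
```

Deferring the copy until the first write keeps the RAM cost at zero for applications that never change a vector at run time.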

## **ECLIC Function Usage**

The code below shows a typical NMSIS ECLIC usage flow with a GD32VF103 device.

Listing 5: gd32vf103\_interrupt\_example1.c

```
#include "gd32vf103.h"

// Vector interrupt which could be nested
__INTERRUPT void eclic_button1_handler(void)
{
    SAVE_IRQ_CSR_CONTEXT();                                     /* save mepc, mcause, msubm; enable interrupts */

    GPIO_REG(GPIO_OUTPUT_VAL) |= (1 << GREEN_LED_GPIO_OFFSET);  /* Green LED On */
    GPIO_REG(GPIO_RISE_IP) = (0x1 << BUTTON_1_GPIO_OFFSET);     /* Clear the GPIO Pending interrupt by writing 1. */

    RESTORE_IRQ_CSR_CONTEXT();                                  /* disable interrupts, restore mepc, mcause, msubm */
}

// Non-vector interrupt
void eclic_button2_handler(void)
{
    GPIO_REG(GPIO_OUTPUT_VAL) |= (1 << GREEN_LED_GPIO_OFFSET);  /* Green LED On */
    GPIO_REG(GPIO_RISE_IP) = (0x1 << BUTTON_2_GPIO_OFFSET);     /* Clear the GPIO Pending interrupt by writing 1. */
}

void eclic_global_initialize(void)
{
    ECLIC_SetMth(0);
    ECLIC_SetCfgNlbits(3);
}

int eclic_register_interrupt(IRQn_Type IRQn, uint8_t shv, uint32_t trig_mode, uint32_t lvl, uint32_t priority, void * handler)
{
    ECLIC_SetShvIRQ(IRQn, shv);
    ECLIC_SetTrigIRQ(IRQn, trig_mode);
    ECLIC_SetLevelIRQ(IRQn, lvl);
    ECLIC_SetPriorityIRQ(IRQn, priority);
    ECLIC_SetVector(IRQn, (rv_csr_t)(handler));
    ECLIC_EnableIRQ(IRQn);
    return 0;
}

int main (void)
{
    uint32_t returnCode;

    eclic_global_initialize();              /* initialize ECLIC */
    GPIO_init();                            /* initialize GPIO */

    returnCode = eclic_register_interrupt(BTN1_IRQn,1,2,1,0,eclic_button1_handler);  /* register system button1 interrupt */
    returnCode = eclic_register_interrupt(BTN2_IRQn,0,2,2,0,eclic_button2_handler);  /* register system button2 interrupt */

    __enable_irq();                         /* enable global interrupt */

    if (returnCode != 0) {                  /* Check return code for errors */
      // Error Handling
    }

    while (1);
}
```

## **Interrupt and Exception API**

```
enum IRQn_Type
    Values:
    enumerator Reserved0_IRQn
    enumerator Reserved1_IRQn
    enumerator Reserved2_IRQn
    enumerator SysTimerSW_IRQn
    enumerator Reserved3_IRQn
    enumerator Reserved4_IRQn
    enumerator Reserved5_IRQn
    enumerator SysTimer_IRQn
    enumerator Reserved6_IRQn
    enumerator Reserved7_IRQn
    enumerator Reserved8_IRQn
    enumerator Reserved9_IRQn
    enumerator Reserved10_IRQn
    enumerator Reserved11_IRQn
    enumerator Reserved12_IRQn
    enumerator Reserved13_IRQn
    enumerator Reserved14_IRQn
    enumerator Reserved15_IRQn
    enumerator Reserved16_IRQn
    enumerator FirstDeviceSpecificInterrupt_IRQn
    enumerator SOC_INT_MAX
__STATIC_FORCEINLINE void __ECLIC_SetCfgNlbits (uint32_t nlbits)
__STATIC_FORCEINLINE uint32_t __ECLIC_GetCfgNlbits (void)
__STATIC_FORCEINLINE uint32_t __ECLIC_GetInfoVer (void)
__STATIC_FORCEINLINE uint32_t __ECLIC_GetInfoCtlbits (void)
__STATIC_FORCEINLINE uint32_t __ECLIC_GetInfoNum (void)
__STATIC_FORCEINLINE void __ECLIC_SetMth (uint8_t mth)
__STATIC_FORCEINLINE uint8_t __ECLIC_GetMth (void)
__STATIC_FORCEINLINE void __ECLIC_EnableIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE uint32_t __ECLIC_GetEnableIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_DisableIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE int32_t __ECLIC_GetPendingIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_SetPendingIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_ClearPendingIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_SetTrigIRQ (IRQn_Type IRQn, uint32_t trig)
__STATIC_FORCEINLINE uint32_t __ECLIC_GetTrigIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_SetShvIRQ (IRQn_Type IRQn, uint32_t shv)
__STATIC_FORCEINLINE uint32_t __ECLIC_GetShvIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_SetCtrlIRQ (IRQn_Type IRQn, uint8_t intctrl)
__STATIC_FORCEINLINE uint8_t __ECLIC_GetCtrlIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_SetLevelIRQ (IRQn_Type IRQn, uint8_t lvl_abs)
__STATIC_FORCEINLINE uint8_t __ECLIC_GetLevelIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_SetPriorityIRQ (IRQn_Type IRQn, uint8_t pri)
__STATIC_FORCEINLINE uint8_t __ECLIC_GetPriorityIRQ (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __ECLIC_SetVector (IRQn_Type IRQn, rv_csr_t vector)
__STATIC_FORCEINLINE rv_csr_t __ECLIC_GetVector (IRQn_Type IRQn)
__STATIC_FORCEINLINE void __set_exc_entry (rv_csr_t addr)
__STATIC_FORCEINLINE rv_csr_t __get_exc_entry (void)
__STATIC_FORCEINLINE void __set_nonvec_entry (rv_csr_t addr)
__STATIC_FORCEINLINE rv_csr_t __get_nonvec_entry (void)
__STATIC_FORCEINLINE rv_csr_t __get_nmi_entry (void)
ECLIC_SetCfgNlbits __ECLIC_SetCfgNlbits
ECLIC_GetCfgNlbits __ECLIC_GetCfgNlbits
ECLIC_GetInfoVer __ECLIC_GetInfoVer
ECLIC_GetInfoCtlbits __ECLIC_GetInfoCtlbits
ECLIC_GetInfoNum __ECLIC_GetInfoNum
ECLIC_SetMth __ECLIC_SetMth
ECLIC_GetMth __ECLIC_GetMth
ECLIC_EnableIRQ __ECLIC_EnableIRQ
ECLIC_GetEnableIRQ __ECLIC_GetEnableIRQ
ECLIC_DisableIRQ __ECLIC_DisableIRQ
ECLIC_SetPendingIRQ __ECLIC_SetPendingIRQ
ECLIC_GetPendingIRQ __ECLIC_GetPendingIRQ
ECLIC_ClearPendingIRQ __ECLIC_ClearPendingIRQ
ECLIC_SetTrigIRQ __ECLIC_SetTrigIRQ
ECLIC_GetTrigIRQ __ECLIC_GetTrigIRQ
ECLIC_SetShvIRQ __ECLIC_SetShvIRQ
ECLIC_GetShvIRQ __ECLIC_GetShvIRQ
ECLIC_SetCtrlIRQ __ECLIC_SetCtrlIRQ
ECLIC_GetCtrlIRQ __ECLIC_GetCtrlIRQ
ECLIC_SetLevelIRQ __ECLIC_SetLevelIRQ
ECLIC_GetLevelIRQ __ECLIC_GetLevelIRQ
ECLIC_SetPriorityIRQ __ECLIC_SetPriorityIRQ
ECLIC_GetPriorityIRQ __ECLIC_GetPriorityIRQ
ECLIC_SetVector __ECLIC_SetVector
ECLIC_GetVector __ECLIC_GetVector
SAVE_IRQ_CSR_CONTEXT()
RESTORE_IRQ_CSR_CONTEXT()
group NMSIS_Core_IntExc
    Functions that manage interrupts and exceptions via the ECLIC.
```

#### **Defines**

```
ECLIC_SetCfgNlbits __ECLIC_SetCfgNlbits

ECLIC_GetCfgNlbits __ECLIC_GetCfgNlbits

ECLIC_GetInfoVer __ECLIC_GetInfoVer

ECLIC_GetInfoCtlbits __ECLIC_GetInfoCtlbits

ECLIC_GetInfoNum __ECLIC_GetInfoNum

ECLIC_SetMth __ECLIC_SetMth

ECLIC_GetMth __ECLIC_GetMth

ECLIC_EnableIRQ __ECLIC_EnableIRQ

ECLIC_GetEnableIRQ __ECLIC_GetEnableIRQ

ECLIC_DisableIRQ __ECLIC_DisableIRQ

ECLIC_SetPendingIRQ __ECLIC_SetPendingIRQ

ECLIC_GetPendingIRQ __ECLIC_GetPendingIRQ
```

```
ECLIC_ClearPendingIRQ __ECLIC_ClearPendingIRQ

ECLIC_SetTrigIRQ __ECLIC_SetTrigIRQ

ECLIC_GetTrigIRQ __ECLIC_GetTrigIRQ

ECLIC_SetShvIRQ __ECLIC_SetShvIRQ

ECLIC_GetShvIRQ __ECLIC_GetShvIRQ

ECLIC_SetCtrlIRQ __ECLIC_SetCtrlIRQ

ECLIC_GetCtrlIRQ __ECLIC_GetCtrlIRQ

ECLIC_SetLevelIRQ __ECLIC_SetLevelIRQ

ECLIC_GetLevelIRQ __ECLIC_GetLevelIRQ

ECLIC_SetPriorityIRQ __ECLIC_SetPriorityIRQ

ECLIC_GetPriorityIRQ __ECLIC_GetPriorityIRQ

ECLIC_SetVector __ECLIC_SetVector

ECLIC_GetVector __ECLIC_GetVector
```

## SAVE\_IRQ\_CSR\_CONTEXT()

Save necessary CSRs into variables for vector interrupt nesting.

This macro declares variables for saving the CSRs (MCAUSE, MEPC, MSUBM) and reads the current CSR contents into them. It needs to be used in a vector interrupt handler if nesting is required.

#### Remark

- Interrupts will be enabled after this macro is called.
- It needs to be used together with RESTORE\_IRQ\_CSR\_CONTEXT.
- Don't use the variable names \_\_mcause, \_\_mepc, \_\_msubm in your ISR code.
- To enable interrupt nesting for a vector interrupt, call this macro at the start of the handler and RESTORE\_IRQ\_CSR\_CONTEXT at its end.
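
The pairing described above can be modelled on a host machine. The following is only a sketch: the `sim_` and `SIM_` names, the simulated CSR variables and the nested-interrupt stub are hypothetical stand-ins, not NMSIS code. It illustrates why the three CSRs must be saved before interrupts are re-enabled.

```c
#include <assert.h>

typedef unsigned long sim_csr_t;   /* host stand-in for rv_csr_t */

/* Simulated machine state: the three CSRs plus the global interrupt enable. */
sim_csr_t sim_mcause, sim_mepc, sim_msubm;
int sim_irq_enabled;

/* Model of SAVE_IRQ_CSR_CONTEXT(): declare locals, copy the CSRs into them,
   then enable interrupts. */
#define SIM_SAVE_IRQ_CSR_CONTEXT()        \
    sim_csr_t saved_mcause = sim_mcause;  \
    sim_csr_t saved_mepc   = sim_mepc;    \
    sim_csr_t saved_msubm  = sim_msubm;   \
    sim_irq_enabled = 1

/* Model of RESTORE_IRQ_CSR_CONTEXT(): disable interrupts, then write the
   saved values back. */
#define SIM_RESTORE_IRQ_CSR_CONTEXT()     \
    do {                                  \
        sim_irq_enabled = 0;              \
        sim_msubm  = saved_msubm;         \
        sim_mepc   = saved_mepc;          \
        sim_mcause = saved_mcause;        \
    } while (0)

/* A nested interrupt overwrites the CSRs while the outer handler runs. */
static void sim_nested_interrupt(void)
{
    sim_mcause = 0xDEADul;
    sim_mepc   = 0xBEEFul;
    sim_msubm  = 0x1ul;
}

/* Handler body: the save/restore pair brackets the region that may be nested. */
void sim_vector_handler(void)
{
    SIM_SAVE_IRQ_CSR_CONTEXT();
    assert(sim_irq_enabled);        /* interrupts are enabled from here on */
    sim_nested_interrupt();         /* a higher-level interrupt preempts us */
    SIM_RESTORE_IRQ_CSR_CONTEXT();  /* interrupts are disabled again */
}
```

On a real target the handler body would sit between `SAVE_IRQ_CSR_CONTEXT();` and `RESTORE_IRQ_CSR_CONTEXT();` in the same way.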

## RESTORE\_IRQ\_CSR\_CONTEXT()

Restore necessary CSRs from variables for vector interrupt nesting.

This macro restores the CSRs (MCAUSE, MEPC, MSUBM) from the variables declared by the SAVE\_IRQ\_CSR\_CONTEXT macro.

#### Remark

- Interrupts will be disabled after this macro is called.
- It needs to be used together with SAVE\_IRQ\_CSR\_CONTEXT.

#### **Enums**

## enum IRQn\_Type

Definition of IRQn numbers.

The core interrupt enumeration names for IRQn values are defined in the file **<Device>.h**.

- Interrupt IDs (IRQn) 0 to 18 are reserved for core-internal interrupts.
- Interrupt IDs (IRQn) from 19 onwards represent device-specific external interrupts.
- The first device-specific interrupt has the IRQn value 19.

The table below describes the core interrupt names and their availability in various Nuclei Cores.

Values:

## enumerator Reserved0\_IRQn

Internal reserved.

## enumerator Reserved1\_IRQn

Internal reserved.

## enumerator Reserved2\_IRQn

Internal reserved.

## enumerator SysTimerSW\_IRQn

System Timer SW interrupt.

## enumerator Reserved3\_IRQn

Internal reserved.

## enumerator Reserved4\_IRQn

Internal reserved.

## enumerator Reserved5\_IRQn

Internal reserved.

## enumerator SysTimer\_IRQn

System Timer Interrupt.

## enumerator Reserved6\_IRQn

Internal reserved.

## enumerator Reserved7\_IRQn

Internal reserved.

## enumerator Reserved8\_IRQn

Internal reserved.

## enumerator Reserved9\_IRQn

Internal reserved.

## enumerator Reserved10\_IRQn

Internal reserved.

## enumerator Reserved11\_IRQn

Internal reserved.

## enumerator Reserved12\_IRQn

Internal reserved.

## enumerator Reserved13\_IRQn

Internal reserved.

## enumerator Reserved14\_IRQn

Internal reserved.

## enumerator Reserved15\_IRQn

Internal reserved.

## enumerator Reserved16\_IRQn

Internal reserved.

## enumerator FirstDeviceSpecificInterrupt\_IRQn

First Device Specific Interrupt.

## enumerator SOC\_INT\_MAX

Number of total interrupts.

#### **Functions**

## \_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetCfgNlbits (uint32\_t nlbits)

Set nlbits value.

This function sets the nlbits value of the CLICCFG register.

#### Remark

- nlbits is used to set the width of level in the CLICINTCTL[i].

#### See

- ECLIC\_GetCfgNlbits

**Parameters** nlbits - [in] nlbits value

## \_\_STATIC\_FORCEINLINE uint32\_t \_\_ECLIC\_GetCfgNlbits (void)

Get nlbits value.

This function gets the nlbits value of the CLICCFG register.

#### Remark

- nlbits is used to set the width of level in the CLICINTCTL[i].

#### See

- ECLIC\_SetCfgNlbits

**Returns** nlbits value of CLICCFG register

## \_\_STATIC\_FORCEINLINE uint32\_t \_\_ECLIC\_GetInfoVer (void)

Get the ECLIC version number.

This function gets the hardware version information from the CLICINFO register.

#### Remark

- Bits 20:17 hold the architecture version, bits 16:13 hold the implementation version.

#### See

- ECLIC\_GetInfoNum

**Returns** hardware version number in CLICINFO register.
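
As an illustration of the bit positions named in the remark, the two version fields can be extracted from a raw CLICINFO value as sketched below. The helper names are hypothetical, and the sketch assumes its argument is the full 32-bit register value rather than the return value of `__ECLIC_GetInfoVer`.

```c
#include <stdint.h>

/* Architecture version: CLICINFO bits 20:17. */
uint32_t sketch_clicinfo_arch_ver(uint32_t clicinfo)
{
    return (clicinfo >> 17) & 0xFu;
}

/* Implementation version: CLICINFO bits 16:13. */
uint32_t sketch_clicinfo_impl_ver(uint32_t clicinfo)
{
    return (clicinfo >> 13) & 0xFu;
}
```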

## \_\_STATIC\_FORCEINLINE uint32\_t \_\_ECLIC\_GetInfoCtlbits (void)

Get CLICINTCTLBITS.

This function gets CLICINTCTLBITS from CLICINFO register.

#### Remark

- CLICINTCTLBITS is the number of implemented bits in each CLICINTCTL[i] register, with 2 <= CLICINTCTLBITS <= 8.
- The implemented bits are kept left-justified in the most-significant bits of each 8-bit CLICINTCTL[i] register, with the lower unimplemented bits treated as hardwired to 1.

#### See

• ECLIC\_GetInfoNum

Returns CLICINTCTLBITS from CLICINFO register.
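
The left-justified layout in the remark can be illustrated with a small host-side model of what a CLICINTCTL[i] register reads back after a write. This is a sketch with a hypothetical helper name, not an NMSIS function.

```c
#include <assert.h>
#include <stdint.h>

/* Value read back from an 8-bit CLICINTCTL[i] after writing `written`:
   the top `ctlbits` bits are kept, the remaining low bits read as 1. */
uint8_t sketch_intctl_readback(uint8_t written, uint8_t ctlbits)
{
    assert(ctlbits >= 2u && ctlbits <= 8u);   /* 2 <= CLICINTCTLBITS <= 8 */
    uint8_t unimplemented = (uint8_t)(0xFFu >> ctlbits);
    return (uint8_t)(written | unimplemented);
}
```

For example, with CLICINTCTLBITS = 3 only the top three bits are writable, so a write of 0xA0 reads back as 0xBF.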

```
__STATIC_FORCEINLINE uint32_t __ECLIC_GetInfoNum (void)
```

Get number of maximum interrupt inputs supported.

This function gets number of maximum interrupt inputs supported from CLICINFO register.

#### Remark

- The num\_interrupt field specifies the actual number of maximum interrupt inputs supported in this implementation.

## See

• ECLIC\_GetInfoCtlbits

Returns number of maximum interrupt inputs supported from CLICINFO register.

```
__STATIC_FORCEINLINE void __ECLIC_SetMth (uint8_t mth)
```

Set Machine Mode Interrupt Level Threshold.

This function sets machine mode interrupt level threshold.

#### See

• ECLIC\_GetMth

**Parameters** mth – [in] Interrupt Level Threshold.

## \_\_STATIC\_FORCEINLINE uint8\_t \_\_ECLIC\_GetMth (void)

Get Machine Mode Interrupt Level Threshold.

This function gets machine mode interrupt level threshold.

#### See

ECLIC\_SetMth

Returns Interrupt Level Threshold.

```
__STATIC_FORCEINLINE void __ECLIC_EnableIRQ (IRQn_Type IRQn)
```

Enable a specific interrupt.

This function enables the specific interrupt IRQn.

#### Remark

• IRQn must not be negative.

See

• ECLIC\_DisableIRQ

Parameters IRQn - [in] Interrupt number

## \_\_STATIC\_FORCEINLINE uint32\_t \_\_ECLIC\_GetEnableIRQ (IRQn\_Type IRQn)

Get a specific interrupt enable status.

This function returns the interrupt enable status for the specific interrupt IRQn.

#### Remark

• IRQn must not be negative.

See

- ECLIC\_EnableIRQ
- ECLIC\_DisableIRQ

Parameters IRQn - [in] Interrupt number

#### **Returns**

- 0 Interrupt is not enabled
- 1 Interrupt is enabled

```
__STATIC_FORCEINLINE void __ECLIC_DisableIRQ (IRQn_Type IRQn)
```

Disable a specific interrupt.

This function disables the specific interrupt *IRQn*.

#### Remark

• IRQn must not be negative.

See

• ECLIC\_EnableIRQ

Parameters IRQn – [in] Number of the external interrupt to disable

```
__STATIC_FORCEINLINE int32_t __ECLIC_GetPendingIRQ (IRQn_Type IRQn)
```

Get the pending specific interrupt.

This function returns the pending status of the specific interrupt *IRQn*.

## Remark

• IRQn must not be negative.

See

- ECLIC\_SetPendingIRQ
- ECLIC\_ClearPendingIRQ

Parameters IRQn - [in] Interrupt number

## **Returns**

• 0 Interrupt is not pending

• 1 Interrupt is pending

## \_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetPendingIRQ (IRQn\_Type IRQn)

Set a specific interrupt to pending.

This function sets the pending bit for the specific interrupt *IRQn*.

#### Remark

• IRQn must not be negative.

#### See

- ECLIC\_GetPendingIRQ
- ECLIC\_ClearPendingIRQ

Parameters IRQn - [in] Interrupt number

## \_\_STATIC\_FORCEINLINE void \_\_ECLIC\_ClearPendingIRQ (IRQn\_Type IRQn)

Clear a specific interrupt from pending.

This function removes the pending state of the specific interrupt IRQn. IRQn cannot be a negative number.

## Remark

• IRQn must not be negative.

#### See

- ECLIC\_SetPendingIRQ
- ECLIC\_GetPendingIRQ

Parameters IRQn - [in] Interrupt number

## \_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetTrigIRQ (IRQn\_Type IRQn, uint32\_t trig)

Set trigger mode and polarity for a specific interrupt.

This function sets the trigger mode and polarity of the specific interrupt IRQn.

## Remark

• IRQn must not be negative.

## See

• ECLIC\_GetTrigIRQ

## **Parameters**

- IRQn [in] Interrupt number
- trig [in]
  - 00 level trigger, *ECLIC\_LEVEL\_TRIGGER* (page 87)
  - 01 positive edge trigger, *ECLIC\_POSTIVE\_EDGE\_TRIGGER* (page 87)
  - 02 level trigger, *ECLIC\_LEVEL\_TRIGGER* (page 87)
  - 03 negative edge trigger, *ECLIC\_NEGTIVE\_EDGE\_TRIGGER* (page 87)
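
The four trig encodings above reduce to three distinct modes, since 00 and 02 both select level triggering. A sketch of that decoding, using hypothetical `sketch_` names instead of the NMSIS macros so that it compiles on its own:

```c
#include <stdint.h>

/* Local mirror of the documented trig values (hypothetical names). */
enum sketch_trig {
    SKETCH_TRIG_LEVEL    = 0,   /* 00 and 02 both mean level trigger */
    SKETCH_TRIG_POS_EDGE = 1,   /* 01 positive edge trigger */
    SKETCH_TRIG_NEG_EDGE = 3    /* 03 negative edge trigger */
};

/* Map a raw 2-bit trig value to one of the three distinct modes. */
enum sketch_trig sketch_decode_trig(uint32_t trig)
{
    trig &= 0x3u;
    if ((trig & 0x1u) == 0u) {
        return SKETCH_TRIG_LEVEL;       /* bit 0 clear: level triggered */
    }
    return (trig & 0x2u) ? SKETCH_TRIG_NEG_EDGE : SKETCH_TRIG_POS_EDGE;
}
```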

# \_\_STATIC\_FORCEINLINE uint32\_t \_\_ECLIC\_GetTrigIRQ (IRQn\_Type IRQn)

Get trigger mode and polarity for a specific interrupt.

This function gets the trigger mode and polarity of the specific interrupt IRQn.

#### Remark

• IRQn must not be negative.

See

• ECLIC\_SetTrigIRQ

Parameters IRQn – [in] Interrupt number

#### Returns

- 00 level trigger, ECLIC\_LEVEL\_TRIGGER (page 87)
- 01 positive edge trigger, ECLIC\_POSTIVE\_EDGE\_TRIGGER (page 87)
- 02 level trigger, ECLIC\_LEVEL\_TRIGGER (page 87)
- 03 negative edge trigger, ECLIC\_NEGTIVE\_EDGE\_TRIGGER (page 87)

## \_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetShvIRQ (IRQn\_Type IRQn, uint32\_t shv)

Set interrupt working mode for a specific interrupt.

This function sets the selective hardware vector or non-vector working mode of the specific interrupt IRQn.

#### Remark

• IRQn must not be negative.

See

ECLIC\_GetShvIRQ

#### **Parameters**

- IRQn [in] Interrupt number
- shv [in]
  - 0 non-vector mode, ECLIC\_NON\_VECTOR\_INTERRUPT (page 87)
  - 1 vector mode, ECLIC\_VECTOR\_INTERRUPT (page 87)

```
__STATIC_FORCEINLINE uint32_t __ECLIC_GetShvIRQ (IRQn_Type IRQn)
```

Get interrupt working mode for a specific interrupt.

This function gets the selective hardware vector or non-vector working mode of the specific interrupt *IRQn*.

#### Remark

• IRQn must not be negative.

See

ECLIC\_SetShvIRQ

Parameters IRQn - [in] Interrupt number

## Returns shv

- 0 non-vector mode, ECLIC\_NON\_VECTOR\_INTERRUPT (page 87)
- 1 vector mode, *ECLIC\_VECTOR\_INTERRUPT* (page 87)

## \_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetCtrlIRQ (IRQn\_Type IRQn, uint8\_t intctrl)

Modify ECLIC Interrupt Input Control Register for a specific interrupt.

This function modifies the ECLIC Interrupt Input Control (CLICINTCTL[i]) register of the specific interrupt *IRQn*.

#### Remark

• IRQn must not be negative.

See

• ECLIC\_GetCtrlIRQ

#### **Parameters**

- IRQn [in] Interrupt number
- intctrl [in] Set value for CLICINTCTL[i] register

## \_\_STATIC\_FORCEINLINE uint8\_t \_\_ECLIC\_GetCtrlIRQ (IRQn\_Type IRQn)

Get ECLIC Interrupt Input Control Register value for a specific interrupt.

This function gets the value of the ECLIC Interrupt Input Control register of the specific interrupt IRQn.

#### Remark

• IRQn must not be negative.

See

• ECLIC\_SetCtrlIRQ

Parameters IRQn - [in] Interrupt number

Returns value of ECLIC Interrupt Input Control register

\_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetLevelIRQ (IRQn\_Type IRQn, uint8\_t lvl\_abs)

Set ECLIC Interrupt level of a specific interrupt.

This function sets the interrupt level of the specific interrupt IRQn.

#### Remark

- IRQn must not be negative.
- If lvl\_abs is larger than the maximum level allowed, it is forced to the maximum level.
- To set a level value, read cliccfg.nlbits to get the width of the level field, which determines the maximum level. CLICINTCTLBITS is the total number of bits present in the CLICINTCTL register.

See

• ECLIC\_GetLevelIRQ

## **Parameters**

- IRQn [in] Interrupt number
- lvl\_abs [in] Interrupt level
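
The saturation rule in the remark can be sketched as a pure function on the 8-bit control value. This is a host-side model under stated assumptions, not the NMSIS implementation: the helper names are hypothetical, `nlbits` and `ctlbits` are passed in rather than read from hardware, and the level field is assumed to occupy the most-significant bits of CLICINTCTL[i].

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CTL_WIDTH 8u   /* each CLICINTCTL[i] register is 8 bits wide */

/* Number of usable level bits: nlbits capped at CLICINTCTLBITS. */
static uint8_t sketch_level_width(uint8_t nlbits, uint8_t ctlbits)
{
    return (nlbits > ctlbits) ? ctlbits : nlbits;
}

/* Clamp lvl_abs to the maximum level and place it in the top bits of `ctl`,
   leaving the lower (priority and unimplemented) bits untouched. */
uint8_t sketch_apply_level(uint8_t ctl, uint8_t lvl_abs,
                           uint8_t nlbits, uint8_t ctlbits)
{
    assert(ctlbits <= SKETCH_CTL_WIDTH);
    uint8_t width = sketch_level_width(nlbits, ctlbits);
    if (width == 0u) {
        return ctl;                            /* no level bits configured */
    }
    uint8_t max_level = (uint8_t)((1u << width) - 1u);
    if (lvl_abs > max_level) {
        lvl_abs = max_level;                   /* saturate at the max level */
    }
    uint8_t low_mask = (uint8_t)(0xFFu >> width);
    return (uint8_t)((lvl_abs << (SKETCH_CTL_WIDTH - width)) | (ctl & low_mask));
}
```

With nlbits = 2, a requested level of 200 saturates to 3, which lands in the top two bits as 0xC0.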

## \_\_STATIC\_FORCEINLINE uint8\_t \_\_ECLIC\_GetLevelIRQ (IRQn\_Type IRQn)

Get ECLIC Interrupt level of a specific interrupt.

This function gets the interrupt level of the specific interrupt *IRQn*.

## Remark

• IRQn must not be negative.

#### See

ECLIC\_SetLevelIRQ

Parameters IRQn - [in] Interrupt number

**Returns** Interrupt level

\_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetPriorityIRQ (IRQn\_Type IRQn, uint8\_t pri)

Set ECLIC Interrupt priority of a specific interrupt.

This function sets the interrupt priority of the specific interrupt IRQn.

#### Remark

- IRQn must not be negative.
- If pri is larger than the maximum priority allowed, it is forced to the maximum priority.
- The priority width is CLICINTCTLBITS minus cliccfg.nlbits if cliccfg.nlbits is less than CLICINTCTLBITS. Otherwise the priority width is 0.

#### See

ECLIC\_GetPriorityIRQ

#### **Parameters**

- IRQn [in] Interrupt number
- pri [in] Interrupt priority
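
The priority-width rule from the remark, written out as a small sketch. The helper names are hypothetical and both widths are passed as parameters instead of being read from the ECLIC.

```c
#include <stdint.h>

/* Priority width: CLICINTCTLBITS - nlbits when nlbits < CLICINTCTLBITS, else 0. */
uint8_t sketch_priority_width(uint8_t nlbits, uint8_t ctlbits)
{
    if (nlbits < ctlbits) {
        return (uint8_t)(ctlbits - nlbits);
    }
    return 0u;
}

/* Largest priority that fits in that width; larger requests are forced to it. */
uint8_t sketch_clamp_priority(uint8_t pri, uint8_t nlbits, uint8_t ctlbits)
{
    uint8_t width = sketch_priority_width(nlbits, ctlbits);
    uint8_t max_pri = (uint8_t)((1u << width) - 1u);
    return (pri > max_pri) ? max_pri : pri;
}
```

With CLICINTCTLBITS = 4 and nlbits = 2 there are two priority bits, so any requested priority above 3 is clamped to 3.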

# \_\_STATIC\_FORCEINLINE uint8\_t \_\_ECLIC\_GetPriorityIRQ (IRQn\_Type IRQn)

Get ECLIC Interrupt priority of a specific interrupt.

This function gets the interrupt priority of the specific interrupt IRQn.

#### Remark

• IRQn must not be negative.

## See

• ECLIC\_SetPriorityIRQ

Parameters IRQn - [in] Interrupt number

**Returns** Interrupt priority

\_\_STATIC\_FORCEINLINE void \_\_ECLIC\_SetVector (IRQn\_Type IRQn, rv\_csr\_t vector)

Set Interrupt Vector of a specific interrupt.

This function sets the interrupt handler address of the specific interrupt IRQn.

## Remark

- IRQn must not be negative.
- You can set CSR\_MTVT to set the interrupt vector table entry address.
- If your vector table is placed in a read-only section, the vector for IRQn will not be modified. In that case, use the IRQ handler name defined in your vector table as the name of your IRQ handler function.
- This function only works correctly when the vector table is placed in a read-write section.

See

• ECLIC\_GetVector

#### **Parameters**

- IRQn [in] Interrupt number
- vector [in] Interrupt handler address

```
__STATIC_FORCEINLINE rv_csr_t __ECLIC_GetVector (IRQn_Type IRQn)
```

Get Interrupt Vector of a specific interrupt.

This function gets the interrupt handler address of the specific interrupt IRQn.

#### Remark

- IRQn must not be negative.
- You can read CSR\_MTVT to get the interrupt vector table entry address.

See

ECLIC\_SetVector

Parameters IRQn - [in] Interrupt number

**Returns** Interrupt handler address

```
__STATIC_FORCEINLINE void __set_exc_entry (rv_csr_t addr)
```

Set Exception entry address.

This function sets the exception handler address to 'CSR\_MTVEC'.

## Remark

• This function is used to set the exception handler address to 'CSR\_MTVEC'. The address is 4-byte aligned.

See

• \_\_get\_exc\_entry

**Parameters** addr – [in] Exception handler address

```
__STATIC_FORCEINLINE rv_csr_t __get_exc_entry (void)
```

Get Exception entry address.

This function gets the exception handler address from 'CSR\_MTVEC'.

#### Remark

• This function is used to get the exception handler address from 'CSR\_MTVEC'. The address is 4-byte aligned.

See

• \_\_set\_exc\_entry

Returns Exception handler address

```
__STATIC_FORCEINLINE void __set_nonvec_entry (rv_csr_t addr)
```

Set Non-vector interrupt entry address.

This function sets the non-vector interrupt entry address.

## Remark

• This function sets the non-vector interrupt entry address to 'CSR\_MTVT2' if 'CSR\_MTVT2' bit0 is 1. If 'CSR\_MTVT2' bit0 is 0, it sets the address to 'CSR\_MTVEC'.

### See

• \_\_get\_nonvec\_entry

**Parameters** addr – [in] Non-vector interrupt entry address

```
__STATIC_FORCEINLINE rv_csr_t __get_nonvec_entry (void)
```

Get Non-vector interrupt entry address.

This function gets the non-vector interrupt entry address.

### Remark

- This function gets the non-vector interrupt entry address from 'CSR\_MTVT2' if 'CSR\_MTVT2' bit0 is 1. If 'CSR\_MTVT2' bit0 is 0, it gets the address from 'CSR\_MTVEC'.

#### See

\_\_set\_nonvec\_entry

**Returns** Non-vector interrupt handler address

```
__STATIC_FORCEINLINE rv_csr_t __get_nmi_entry (void)
```

Get NMI interrupt entry from 'CSR\_MNVEC'.

This function gets the NMI interrupt address from 'CSR\_MNVEC'.

#### Remark

- This function is used to get the NMI interrupt handler address from 'CSR\_MNVEC'. If CSR\_MMISC\_CTL[9] = 1, 'CSR\_MNVEC' is the same as mtvec. If CSR\_MMISC\_CTL[9] = 0, 'CSR\_MNVEC' is the same as the reset vector.
- The NMI entry is defined via *CSR\_MMISC\_CTL* (page 72); writes to *CSR\_MNVEC* (page 72) are ignored.

**Returns** NMI interrupt handler address

## 2.5.11 FPU Functions

## group NMSIS\_Core\_FPU\_Functions

Functions related to the RISC-V FPU (F and D extension).

Nuclei provides a floating-point unit through the RISC-V F and D extensions.

- F extension adds single-precision floating-point computational instructions compliant with the IEEE 754-2008 arithmetic standard, \_\_RISCV\_FLEN = 32. The F extension adds 32 floating-point registers, f0-f31, each 32 bits wide, and a floating-point control and status register fcsr, which contains the operating mode and exception status of the floating-point unit.
- D extension adds double-precision floating-point computational instructions compliant with the IEEE 754-2008 arithmetic standard. The D extension widens the 32 floating-point registers, f0-f31, to 64 bits, \_\_RISCV\_FLEN = 64

### **Defines**

```
__RISCV_FLEN 64
__get_FCSR() __RV_CSR_READ (page 63)(CSR_FCSR (page 66))
    Get FCSR CSR Register.
__set_FCSR (val) __RV_CSR_WRITE (page 63)(CSR_FCSR (page 66), (val))
    Set FCSR CSR Register with val.
__get_FRM() __RV_CSR_READ (page 63)(CSR_FRM (page 66))
    Get FRM CSR Register.
__set_FRM (val) __RV_CSR_WRITE (page 63)(CSR_FRM (page 66), (val))
    Set FRM CSR Register with val.
__get_FFLAGS() __RV_CSR_READ (page 63)(CSR_FFLAGS (page 66))
    Get FFLAGS CSR Register.
__set_FFLAGS (val) __RV_CSR_WRITE (page 63)(CSR_FFLAGS (page 66), (val))
    Set FFLAGS CSR Register with val.
__enable_FPU() __RV_CSR_SET (page 64)(CSR_MSTATUS (page 67), MSTATUS_FS (page 73))
    Enable FPU Unit.
__disable_FPU() __RV_CSR_CLEAR (page 64)(CSR_MSTATUS (page 67), MSTATUS_FS (page 73))
    Disable FPU Unit.
```

- Disabling the FPU unit saves power.
- When the FPU unit is disabled, any access to FPU-related CSR registers or FPU instructions causes an illegal instruction exception.

```
__RV_FLW (freg, addr, ofs)
```

Load a single-precision value from memory into float point register freg using flw instruction.

The FLW instruction loads a single-precision floating point value from memory address (addr + ofs) into floating point register freg(f0-f31)

### Remark

- FLW and FSW operations need to make sure the address is 4 bytes aligned, otherwise it will cause exception code 4(Load address misaligned) or 6 (Store/AMO address misaligned)
- FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved

### **Parameters**

- **freg** [in] The floating point register, eg. *FREG*(0) (page 79), f0
- addr [in] The memory base address, must be 4-byte aligned
- ofs [in] a 12-bit signed immediate byte offset, should be a constant value
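
The operand constraints above can be checked before an access is issued. The following is a host-side sketch with a hypothetical helper name: it tests that the offset fits a 12-bit signed immediate and that the effective address (addr + ofs) meets the alignment of the access width, 4 bytes for FLW/FSW or 8 bytes for FLD/FSD.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when `ofs` fits a 12-bit signed immediate and the effective address
   (addr + ofs) is aligned to `width` bytes: 4 for FLW/FSW, 8 for FLD/FSD. */
bool sketch_fp_access_ok(uintptr_t addr, long ofs, unsigned width)
{
    if (ofs < -2048 || ofs > 2047) {
        return false;                  /* outside the 12-bit signed range */
    }
    uintptr_t effective = addr + (uintptr_t)ofs;
    return (effective % width) == 0u;
}
```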

```
__RV_FSW (freg, addr, ofs)
```

Store a single-precision value from float point freg into memory using fsw instruction.

The FSW instruction stores a single-precision value from floating point register to memory

## Remark

- FLW and FSW operations need to make sure the address is 4 bytes aligned, otherwise it will cause exception code 4(Load address misaligned) or 6 (Store/AMO address misaligned)
- FLW and FSW do not modify the bits being transferred; in particular, the payloads of noncanonical NaNs are preserved

### **Parameters**

- **freg [in]** The floating point register(f0-f31), eg. *FREG*(0) (page 79), f0
- addr [in] The memory base address, must be 4-byte aligned
- ofs [in] a 12-bit signed immediate byte offset, should be a constant value

### **\_\_RV\_FLD** (freg, addr, ofs)

Load a double-precision value from memory into float point register freg using fld instruction.

The FLD instruction loads a double-precision floating point value from memory address (addr + ofs) into floating point register freg(f0-f31)

#### Attention

• Function only available for double precision floating point unit, FLEN = 64

#### Remark

- FLD and FSD operations need to make sure the address is 8 bytes aligned, otherwise it will cause exception code 4(Load address misaligned) or 6 (Store/AMO address misaligned)
- FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

## **Parameters**

- **freg [in]** The floating point register, eg. *FREG*(0) (page 79), f0
- addr [in] The memory base address, must be 8-byte aligned
- ofs [in] a 12-bit signed immediate byte offset, should be a constant value

### **\_\_RV\_FSD** (freg, addr, ofs)

Store a double-precision value from float point freg into memory using fsd instruction.

The FSD instruction stores double-precision value from floating point register to memory

#### Attention

• Function only available for double precision floating point unit, FLEN = 64

### Remark

- FLD and FSD operations need to make sure the address is 8 bytes aligned, otherwise it will cause exception code 4(Load address misaligned) or 6 (Store/AMO address misaligned)
- FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

#### **Parameters**

- **freg [in]** The floating point register(f0-f31), eg. *FREG*(0) (page 79), f0
- addr [in] The memory base address, must be 8-byte aligned
- ofs [in] a 12-bit signed immediate byte offset, should be a constant value

```
__RV_FLOAD __RV_FLD (page 321)
```

Load a float point value from memory into float point register freg using flw/fld instruction.

- For Single-Precision Floating-Point Mode(\_\_FPU\_PRESENT == 1, \_\_RISCV\_FLEN == 32): It will call \_\_RV\_FLW (page 320) to load a single-precision floating point value from memory to floating point register
- For Double-Precision Floating-Point Mode(\_\_FPU\_PRESENT == 2, \_\_RISCV\_FLEN == 64): It will call \_\_RV\_FLD (page 321) to load a double-precision floating point value from memory to floating point register

**Attention** The behaviour differs for \_\_FPU\_PRESENT = 1 and 2; see the function that this macro actually represents.

```
__RV_FSTORE __RV_FSD (page 321)
```

Store a float value from float point freg into memory using fsw/fsd instruction.

- For Single-Precision Floating-Point Mode(\_\_FPU\_PRESENT == 1, \_\_RISCV\_FLEN == 32): It will call \_\_RV\_FSW (page 320) to store floating point register into memory
- For Double-Precision Floating-Point Mode(\_\_FPU\_PRESENT == 2, \_\_RISCV\_FLEN == 64): It will call \_\_RV\_FSD (page 321) to store floating point register into memory

**Attention** The behaviour differs for \_\_FPU\_PRESENT = 1 and 2; see the function that this macro actually represents.

### SAVE\_FPU\_CONTEXT()

Save FPU context into variables for interrupt nesting.

This macro declares variables for saving the FPU context and stores the necessary FPU registers into them. It needs to be used in an interrupt handler that uses FPU registers.

#### Remark

- It needs to be used together with RESTORE\_FPU\_CONTEXT (page 322).
- Don't use the variable name \_\_fpu\_context in your ISR code.
- If your ISR code uses FPU registers and this interrupt can be nested, you can do it like this:

```
void eclic_mtip_handler(void)
{
    // !!!Interrupt is enabled here!!!
    // !!!Higher priority interrupt could nest it!!!

    // Necessary only when you need to use fpu registers
    // in this isr handler functions
    SAVE_FPU_CONTEXT();

    // put your own interrupt handling code here

    // pair of SAVE_FPU_CONTEXT()
    RESTORE_FPU_CONTEXT();
}
```

### RESTORE\_FPU\_CONTEXT()

Restore necessary fpu registers from variables for interrupt nesting.

This macro restores the necessary FPU registers from the variables declared by the SAVE\_FPU\_CONTEXT (page 322) macro.

#### Remark

• It needs to be used together with SAVE\_FPU\_CONTEXT (page 322)

## **Typedefs**

```
typedef uint64_t rv_fpu_t
```

Type of FPU register, depends on the FLEN defined in RISC-V.

## 2.5.12 PMP Functions

Click Nuclei PMP Unit<sup>22</sup> to learn about Core PMP Unit in Nuclei ISA Spec.

```
__STATIC_INLINE uint8_t __get_PMPxCFG (uint32_t idx)

__STATIC_INLINE void __set_PMPxCFG (uint32_t idx, uint8_t pmpxcfg)

__STATIC_INLINE rv_csr_t __get_PMPCFGx (uint32_t idx)

__STATIC_INLINE void __set_PMPCFGx (uint32_t idx, rv_csr_t pmpcfg)

__STATIC_INLINE rv_csr_t __get_PMPADDRx (uint32_t idx)

__STATIC_INLINE void __set_PMPADDRx (uint32_t idx, rv_csr_t pmpaddr)

group NMSIS_Core_PMP_Functions
```

Functions related to RISC-V Physical Memory Protection.

The optional physical memory protection (PMP) unit provides per-hart machine-mode control registers that allow physical memory access privileges (read, write, execute) to be specified for each physical memory region.

The PMP can support region access control settings as small as four bytes.

## **Functions**

```
__STATIC_INLINE uint8_t __get_PMPxCFG (uint32_t idx)
```

Get 8bit PMPxCFG Register by PMP entry index.

Return the content of the PMPxCFG Register.

**Parameters** idx – [in] PMP region index(0-15)

**Returns** PMPxCFG Register value

```
__STATIC_INLINE void __set_PMPxCFG (uint32_t idx, uint8_t pmpxcfg)
```

Set 8bit PMPxCFG by PMP entry index.

Set the given pmpxcfg value to the PMPxCFG Register.

### **Parameters**

- idx [in] PMPx region index(0-15)
- pmpxcfg [in] PMPxCFG register value to set
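
A value for the pmpxcfg parameter can be assembled from its bit fields. The layout below (R, W, X in bits 0 to 2, address-matching mode A in bits 4:3, lock L in bit 7) comes from the RISC-V privileged specification rather than from this document, and the `SKETCH_` names are hypothetical.

```c
#include <stdint.h>

/* Bit layout of one 8-bit pmpXcfg field (RISC-V privileged specification). */
#define SKETCH_PMP_R        0x01u        /* read permission */
#define SKETCH_PMP_W        0x02u        /* write permission */
#define SKETCH_PMP_X        0x04u        /* execute permission */
#define SKETCH_PMP_A_OFF    (0u << 3)    /* region disabled */
#define SKETCH_PMP_A_TOR    (1u << 3)    /* top of range */
#define SKETCH_PMP_A_NA4    (2u << 3)    /* naturally aligned 4-byte region */
#define SKETCH_PMP_A_NAPOT  (3u << 3)    /* naturally aligned power of two */
#define SKETCH_PMP_L        0x80u        /* lock bit */

/* Combine permissions, address-matching mode and lock flag into one byte. */
uint8_t sketch_pmpxcfg(uint8_t perms, uint8_t addr_mode, int locked)
{
    uint8_t cfg = (uint8_t)((perms & 0x07u) | (addr_mode & 0x18u));
    if (locked) {
        cfg = (uint8_t)(cfg | SKETCH_PMP_L);
    }
    return cfg;
}
```

The result is the kind of value that would be passed as pmpxcfg to `__set_PMPxCFG`.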

<sup>22</sup> https://doc.nucleisys.com/nuclei\_spec/isa/pmp.html

## \_\_STATIC\_INLINE rv\_csr\_t \_\_get\_PMPCFGx (uint32\_t idx)

Get PMPCFGx Register by index.

Return the content of the PMPCFGx Register.

#### Remark

- For RV64, only idx = 0 and idx = 2 are allowed. pmpcfg0 and pmpcfg2 hold the configurations for the 16 PMP entries; pmpcfg1 and pmpcfg3 are illegal.
- For RV32, pmpcfg0-pmpcfg3 hold the configurations pmp0cfg-pmp15cfg for the 16 PMP entries.

**Parameters** idx – [in] PMPCFG CSR index(0-3)

**Returns** PMPCFGx Register value
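
The RV32/RV64 remark implies a mapping from a PMP entry index to the pmpcfg CSR that holds it. A sketch of that mapping follows, with hypothetical names; placing entry 0 in the least-significant byte follows the RISC-V privileged specification and is not stated in this document.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_pmp_slot {
    uint32_t csr_idx;   /* which pmpcfg CSR, i.e. the idx of __get_PMPCFGx */
    uint32_t shift;     /* bit offset of the 8-bit field inside that CSR */
};

/* Locate the pmpXcfg field for PMP entry `entry` (0-15); xlen is 32 or 64. */
struct sketch_pmp_slot sketch_pmp_locate(uint32_t entry, uint32_t xlen)
{
    assert(entry < 16u);
    assert(xlen == 32u || xlen == 64u);
    struct sketch_pmp_slot slot;
    if (xlen == 32u) {
        slot.csr_idx = entry / 4u;           /* pmpcfg0..pmpcfg3, 4 entries each */
        slot.shift   = (entry % 4u) * 8u;
    } else {
        slot.csr_idx = (entry / 8u) * 2u;    /* pmpcfg0 or pmpcfg2, 8 entries each */
        slot.shift   = (entry % 8u) * 8u;
    }
    return slot;
}
```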

```
__STATIC_INLINE void __set_PMPCFGx (uint32_t idx, rv_csr_t pmpcfg)
```

Set PMPCFGx by index.

Write the given value to the PMPCFGx Register.

#### Remark

- For RV64, only idx = 0 and idx = 2 are allowed. pmpcfg0 and pmpcfg2 hold the configurations for the 16 PMP entries; pmpcfg1 and pmpcfg3 are illegal.
- For RV32, pmpcfg0-pmpcfg3 hold the configurations pmp0cfg-pmp15cfg for the 16 PMP entries.

#### **Parameters**

- idx [in] PMPCFG CSR index(0-3)
- pmpcfg [in] PMPCFGx Register value to set

```
__STATIC_INLINE rv_csr_t __get_PMPADDRx (uint32_t idx)
```

Get PMPADDRx Register by index.

Return the content of the PMPADDRx Register.

**Parameters** idx – [in] PMP region index(0-15)

**Returns** PMPADDRx Register value

```
__STATIC_INLINE void __set_PMPADDRx (uint32_t idx, rv_csr_t pmpaddr)
```

Set PMPADDRx by index.

Write the given value to the PMPADDRx Register.

### **Parameters**

- **idx [in]** PMP region index(0-15)
- pmpaddr [in] PMPADDRx Register value to set
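
For a naturally aligned power-of-two region, the pmpaddr value can be derived from the region base and size. The encoding below is the NAPOT rule from the RISC-V privileged specification, not something this document defines, and the helper name is hypothetical. A 4-byte region uses NA4 mode instead, where pmpaddr is simply the base shifted right by two.

```c
#include <assert.h>
#include <stdint.h>

/* NAPOT encoding: pmpaddr = (base >> 2) | ((size >> 3) - 1),
   valid for a power-of-two size of at least 8 bytes and a base aligned to it. */
uint64_t sketch_pmp_napot_addr(uint64_t base, uint64_t size)
{
    assert(size >= 8u && (size & (size - 1u)) == 0u);   /* power of two, >= 8 */
    assert((base & (size - 1u)) == 0u);                 /* base aligned to size */
    return (base >> 2) | ((size >> 3) - 1u);
}
```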

## 2.5.13 Cache Functions

## General

```
enum CCM_OP_FINFO_Type
    Values:
    enumerator CCM_OP_SUCCESS
    enumerator CCM_OP_EXCEED_ERR
    enumerator CCM_OP_PERM_CHECK_ERR
    enumerator CCM_OP_REFILL_BUS_ERR
    enumerator CCM_OP_ECC_ERR
enum CCM_CMD_Type
    Values:
    enumerator CCM_DC_INVAL
    enumerator CCM_DC_WB
    enumerator CCM_DC_WBINVAL
    enumerator CCM_DC_LOCK
    enumerator CCM_DC_UNLOCK
    enumerator CCM_DC_WBINVAL_ALL
    enumerator CCM_DC_WB_ALL
    enumerator CCM_DC_INVAL_ALL
    enumerator CCM_IC_INVAL
    enumerator CCM_IC_LOCK
    enumerator CCM_IC_UNLOCK
    enumerator CCM_IC_INVAL_ALL
__STATIC_FORCEINLINE void EnableSUCCM (void)
__STATIC_FORCEINLINE void DisableSUCCM (void)
__STATIC_FORCEINLINE void FlushPipeCCM (void)
CCM_SUEN_SUEN_Pos 0U
CCM_SUEN_SUEN_Msk (1UL << CCM_SUEN_SUEN_Pos)
group NMSIS_Core_Cache
```

Functions that configure Instruction and Data Cache.

Nuclei provides Cache Control and Maintenance (CCM) for software to control and maintain the internal L1 I/D-Cache of the RISC-V core, so software can manage the cache flexibly to suit the actual application scenario.

There are three types of CCM operations: by single address, by all, and flush pipeline. CCM operations are done via CSR registers, and M/S/U mode each has its own CSR registers for them. By default, CCM operations are not allowed in S/U mode; execute EnableSUCCM in M-Mode to enable them.

- API names starting with M<operations>, such as MInvalICacheLine, must be called in M-Mode only.
- API names starting with S<operations>, such as SInvalICacheLine, should be called in S-Mode.
- API names starting with U<operations>, such as UInvalICacheLine, should be called in U-Mode.

## **Defines**

### CCM\_SUEN\_SUEN\_Pos 0U

CSR CCM\_SUEN: SUEN bit Position.

## CCM\_SUEN\_SUEN\_Msk (1UL << CCM\_SUEN\_SUEN\_Pos)

CSR CCM\_SUEN: SUEN Mask.

### **Enums**

## enum CCM\_OP\_FINFO\_Type

Cache CCM Operation Fail Info.

Values:

## enumerator CCM\_OP\_SUCCESS

Lock Succeed.

### enumerator CCM\_OP\_EXCEED\_ERR

Exceeds the number of lockable ways (for an N-way I/D-Cache, N-1 ways are lockable).

## enumerator CCM\_OP\_PERM\_CHECK\_ERR

PMP/sPMP/Page-Table X(I-Cache)/R(D-Cache) permission check failed, or belong to Device/Non-Cacheable address range.

## enumerator CCM\_OP\_REFILL\_BUS\_ERR

Refill has Bus Error.

## enumerator CCM\_OP\_ECC\_ERR

ECC Error.
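
When logging the outcome of a CCM operation, the fail-info values can be turned into readable text. The sketch below uses a local mirror of the enum with a hypothetical `SKETCH_` prefix so that it compiles on its own. Only the enumerator order is taken from the list above; the numeric values are whatever the compiler assigns here and are not claimed to match the NMSIS header.

```c
/* Local mirror of CCM_OP_FINFO_Type (hypothetical names, see note above). */
enum sketch_ccm_finfo {
    SKETCH_CCM_OP_SUCCESS,
    SKETCH_CCM_OP_EXCEED_ERR,
    SKETCH_CCM_OP_PERM_CHECK_ERR,
    SKETCH_CCM_OP_REFILL_BUS_ERR,
    SKETCH_CCM_OP_ECC_ERR
};

/* Short description of a CCM operation result, for logging. */
const char *sketch_ccm_finfo_str(enum sketch_ccm_finfo info)
{
    switch (info) {
    case SKETCH_CCM_OP_SUCCESS:        return "success";
    case SKETCH_CCM_OP_EXCEED_ERR:     return "exceeded lockable ways";
    case SKETCH_CCM_OP_PERM_CHECK_ERR: return "permission check failed";
    case SKETCH_CCM_OP_REFILL_BUS_ERR: return "refill bus error";
    case SKETCH_CCM_OP_ECC_ERR:        return "ECC error";
    }
    return "unknown";
}
```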

### enum CCM\_CMD\_Type

Cache CCM Command Types.

Values:

## enumerator CCM\_DC\_INVAL

Unlock and invalidate D-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_DC\_WB

Flush the specific D-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_DC\_WBINVAL

Unlock, flush and invalidate the specific D-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_DC\_LOCK

Lock the specific D-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_DC\_UNLOCK

Unlock the specific D-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_DC\_WBINVAL\_ALL

Unlock and flush and invalidate all the valid and dirty D-Cache lines.

## enumerator CCM\_DC\_WB\_ALL

Flush all the valid and dirty D-Cache lines.

### enumerator CCM\_DC\_INVAL\_ALL

Unlock and invalidate all the D-Cache lines.

## enumerator CCM\_IC\_INVAL

Unlock and invalidate I-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_IC\_LOCK

Lock the specific I-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_IC\_UNLOCK

Unlock the specific I-Cache line specified by CSR CCM\_XBEGINADDR.

## enumerator CCM\_IC\_INVAL\_ALL

Unlock and invalidate all the I-Cache lines.

### **Functions**

## \_\_STATIC\_FORCEINLINE void EnableSUCCM (void)

Enable CCM operation in Supervisor/User Mode.

This function enables CCM operations in Supervisor/User Mode. If enabled, CCM operations in supervisor/user mode are allowed.

### Remark

• This function can be called in M-Mode only.

#### See

DisableSUCCM

## \_\_STATIC\_FORCEINLINE void DisableSUCCM (void)

Disable CCM operation in Supervisor/User Mode.

This function disables CCM operations in Supervisor/User Mode. If not enabled, CCM operations in supervisor/user mode trigger an *illegal instruction* exception.

### Remark

• This function can be called in M-Mode only.

## See

• EnableSUCCM

## \_\_STATIC\_FORCEINLINE void FlushPipeCCM (void)

Flush pipeline after CCM operation.

This function is used to flush pipeline after CCM operations on Cache, it will ensure latest instructions or data can be seen by pipeline.

## Remark

• This function can be called in M/S/U-Mode only.

### **I-Cache Functions**

```
__STATIC_FORCEINLINE void EnableICache (void)
__STATIC_FORCEINLINE void DisableICache (void)
__STATIC_FORCEINLINE void MInvalICacheLine (unsigned long addr)
__STATIC_FORCEINLINE void MInvalICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void SInvalICacheLine (unsigned long addr)
__STATIC_FORCEINLINE void SInvalICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void UInvalICacheLine (unsigned long addr)
__STATIC_FORCEINLINE void UInvalICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE unsigned long MLockICacheLine (unsigned long addr)
__STATIC_FORCEINLINE unsigned long MLockICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE unsigned long SLockICacheLine (unsigned long addr)
__STATIC_FORCEINLINE unsigned long SLockICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE unsigned long ULockICacheLine (unsigned long addr)
__STATIC_FORCEINLINE unsigned long ULockICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void MUnlockICacheLine (unsigned long addr)
__STATIC_FORCEINLINE void MUnlockICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void SUnlockICacheLine (unsigned long addr)
__STATIC_FORCEINLINE void SUnlockICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void UUnlockICacheLine (unsigned long addr)
__STATIC_FORCEINLINE void UUnlockICacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void MInvalICache (void)
__STATIC_FORCEINLINE void SInvalICache (void)
__STATIC_FORCEINLINE void UInvalICache (void)
group NMSIS_Core_ICache
    Functions that configure Instruction Cache.
```

## **Functions**

## \_\_STATIC\_FORCEINLINE void EnableICache (void)

Enable ICache.

This function enables the I-Cache.

### Remark

- This function can be called in M-Mode only.
- The *CSR\_MCACHE\_CTL* (page 72) register controls I-Cache enable.

### See

- DisableICache

## \_\_STATIC\_FORCEINLINE void DisableICache (void)

Disable ICache.

This function disables the I-Cache.

### Remark

- This function can be called in M-Mode only.
- The *CSR\_MCACHE\_CTL* (page 72) register controls I-Cache enable.

### See

- EnableICache

## \_\_STATIC\_FORCEINLINE void MInvalICacheLine (unsigned long addr)

Invalidate one I-Cache line specified by address in M-Mode.

This function unlocks and invalidates one I-Cache line specified by the address. Command CCM\_IC\_INVAL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be invalidated

## \_\_STATIC\_FORCEINLINE void MInvalICacheLines (unsigned long addr, unsigned long cnt)

Invalidate several I-Cache lines specified by address in M-Mode.

This function unlocks and invalidates several I-Cache lines specified by the address and line count. Command CCM\_IC\_INVAL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be invalidated
- cnt – [in] count of cache lines to be invalidated
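The `cnt` argument of the `*Lines` functions counts cache lines, not bytes, so a caller that holds a byte range has to convert it first. The following is a minimal sketch of that conversion. `cache_line_count` and its `line_size` parameter are hypothetical and not part of NMSIS; the actual line size has to be taken from the device documentation.

```c
#include <assert.h>

/* Hypothetical helper (not an NMSIS API): number of cache lines touched by
 * the byte range [addr, addr + size). line_size must be a power of two. */
static unsigned long cache_line_count(unsigned long addr, unsigned long size,
                                      unsigned long line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (size == 0) {
        return 0;
    }
    /* Align the first byte down to its line start. */
    unsigned long first = addr & ~(line_size - 1);
    /* Line start of the last byte in the range. */
    unsigned long last = (addr + size - 1) & ~(line_size - 1);
    return (last - first) / line_size + 1;
}
```

A range that starts one byte past a line boundary and is one line long touches two lines, which is why the count cannot simply be `size / line_size`.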

## \_\_STATIC\_FORCEINLINE void SInvalICacheLine (unsigned long addr)

Invalidate one I-Cache line specified by address in S-Mode.

This function unlocks and invalidates one I-Cache line specified by the address. Command CCM\_IC\_INVAL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be invalidated

## \_\_STATIC\_FORCEINLINE void SInvalICacheLines (unsigned long addr, unsigned long cnt)

Invalidate several I-Cache lines specified by address in S-Mode.

This function unlocks and invalidates several I-Cache lines specified by the address and line count. Command CCM\_IC\_INVAL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be invalidated
- cnt – [in] count of cache lines to be invalidated

## \_\_STATIC\_FORCEINLINE void UInvalICacheLine (unsigned long addr)

Invalidate one I-Cache line specified by address in U-Mode.

This function unlocks and invalidates one I-Cache line specified by the address. Command CCM\_IC\_INVAL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be invalidated

## \_\_STATIC\_FORCEINLINE void UInvalICacheLines (unsigned long addr, unsigned long cnt)

Invalidate several I-Cache lines specified by address in U-Mode.

This function unlocks and invalidates several I-Cache lines specified by the address and line count. Command CCM\_IC\_INVAL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be invalidated
- cnt – [in] count of cache lines to be invalidated

## \_\_STATIC\_FORCEINLINE unsigned long MLockICacheLine (unsigned long addr)

Lock one I-Cache line specified by address in M-Mode.

This function locks one I-Cache line specified by the address. Command CCM\_IC\_LOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long MLockICacheLines (unsigned long addr, unsigned long cnt)

Lock several I-Cache lines specified by address in M-Mode.

This function locks several I-Cache lines specified by the address and line count. Command CCM\_IC\_LOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be locked
- cnt – [in] count of cache lines to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO
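Because the lock functions return a result code, callers would normally check it before relying on the line being locked. The sketch below illustrates such a check on a host machine. The `SKETCH_` enumerators are assumed mirrors of enum CCM\_OP\_FINFO (names and values are assumptions here), and `ccm_lock_ok` and `ccm_lock_result_str` are hypothetical helpers; real code should use the enum from the NMSIS headers.

```c
/* Assumed mirror of enum CCM_OP_FINFO, for host-side illustration only.
 * Names and values are assumptions; use the NMSIS definitions on target. */
enum sketch_ccm_op_finfo {
    SKETCH_CCM_OP_SUCCESS = 0x0,
    SKETCH_CCM_OP_EXCEED_ERR = 0x1,
    SKETCH_CCM_OP_PERM_CHECK_ERR = 0x2,
    SKETCH_CCM_OP_REFILL_BUS_ERR = 0x3
};

/* Hypothetical helper: 1 if the lock operation reported success, else 0. */
static int ccm_lock_ok(unsigned long result)
{
    return result == SKETCH_CCM_OP_SUCCESS;
}

/* Hypothetical helper: short diagnostic text for a lock result code. */
static const char *ccm_lock_result_str(unsigned long result)
{
    switch (result) {
    case SKETCH_CCM_OP_SUCCESS:        return "locked";
    case SKETCH_CCM_OP_EXCEED_ERR:     return "exceed error";
    case SKETCH_CCM_OP_PERM_CHECK_ERR: return "permission check error";
    case SKETCH_CCM_OP_REFILL_BUS_ERR: return "refill bus error";
    default:                           return "unknown result";
    }
}
```

On target, the value returned by a call such as `MLockICacheLines(addr, cnt)` would be passed to these helpers.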

## \_\_STATIC\_FORCEINLINE unsigned long SLockICacheLine (unsigned long addr)

Lock one I-Cache line specified by address in S-Mode.

This function locks one I-Cache line specified by the address. Command CCM\_IC\_LOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long SLockICacheLines (unsigned long addr, unsigned long cnt)

Lock several I-Cache lines specified by address in S-Mode.

This function locks several I-Cache lines specified by the address and line count. Command CCM\_IC\_LOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be locked
- cnt – [in] count of cache lines to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long ULockICacheLine (unsigned long addr)

Lock one I-Cache line specified by address in U-Mode.

This function locks one I-Cache line specified by the address. Command CCM\_IC\_LOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long ULockICacheLines (unsigned long addr, unsigned long cnt)

Lock several I-Cache lines specified by address in U-Mode.

This function locks several I-Cache lines specified by the address and line count. Command CCM\_IC\_LOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be locked
- cnt – [in] count of cache lines to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE void MUnlockICacheLine (unsigned long addr)

Unlock one I-Cache line specified by address in M-Mode.

This function unlocks one I-Cache line specified by the address. Command CCM\_IC\_UNLOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be unlocked

## \_\_STATIC\_FORCEINLINE void MUnlockICacheLines (unsigned long addr, unsigned long cnt)

Unlock several I-Cache lines specified by address in M-Mode.

This function unlocks several I-Cache lines specified by the address and line count. Command CCM\_IC\_UNLOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be unlocked
- cnt – [in] count of cache lines to be unlocked

## \_\_STATIC\_FORCEINLINE void SUnlockICacheLine (unsigned long addr)

Unlock one I-Cache line specified by address in S-Mode.

This function unlocks one I-Cache line specified by the address. Command CCM\_IC\_UNLOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be unlocked

## \_\_STATIC\_FORCEINLINE void SUnlockICacheLines (unsigned long addr, unsigned long cnt)

Unlock several I-Cache lines specified by address in S-Mode.

This function unlocks several I-Cache lines specified by the address and line count. Command CCM\_IC\_UNLOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be unlocked
- cnt – [in] count of cache lines to be unlocked

## \_\_STATIC\_FORCEINLINE void UUnlockICacheLine (unsigned long addr)

Unlock one I-Cache line specified by address in U-Mode.

This function unlocks one I-Cache line specified by the address. Command CCM\_IC\_UNLOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be unlocked

## \_\_STATIC\_FORCEINLINE void UUnlockICacheLines (unsigned long addr, unsigned long cnt)

Unlock several I-Cache lines specified by address in U-Mode.

This function unlocks several I-Cache lines specified by the address and line count. Command CCM\_IC\_UNLOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be unlocked
- cnt – [in] count of cache lines to be unlocked

## \_\_STATIC\_FORCEINLINE void MInvalICache (void)

Invalidate all I-Cache lines in M-Mode.

This function invalidates all I-Cache lines. Command CCM\_IC\_INVAL\_ALL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

## \_\_STATIC\_FORCEINLINE void SInvalICache (void)

Invalidate all I-Cache lines in S-Mode.

This function invalidates all I-Cache lines. Command CCM\_IC\_INVAL\_ALL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

## \_\_STATIC\_FORCEINLINE void UInvalICache (void)

Invalidate all I-Cache lines in U-Mode.

This function invalidates all I-Cache lines. Command CCM\_IC\_INVAL\_ALL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

### **D-Cache Functions**

```
__STATIC_FORCEINLINE void EnableDCache (void)
__STATIC_FORCEINLINE void DisableDCache (void)
__STATIC_FORCEINLINE void MInvalDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void MInvalDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void SInvalDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void SInvalDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void UInvalDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void UInvalDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void MFlushDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void MFlushDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void SFlushDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void SFlushDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void UFlushDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void UFlushDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void MFlushInvalDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void MFlushInvalDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void SFlushInvalDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void SFlushInvalDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void UFlushInvalDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void UFlushInvalDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE unsigned long MLockDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE unsigned long MLockDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE unsigned long SLockDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE unsigned long SLockDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE unsigned long ULockDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE unsigned long ULockDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void MUnlockDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void MUnlockDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void SUnlockDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void SUnlockDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void UUnlockDCacheLine (unsigned long addr)
__STATIC_FORCEINLINE void UUnlockDCacheLines (unsigned long addr, unsigned long cnt)
__STATIC_FORCEINLINE void MInvalDCache (void)
__STATIC_FORCEINLINE void SInvalDCache (void)
__STATIC_FORCEINLINE void UInvalDCache (void)
__STATIC_FORCEINLINE void MFlushDCache (void)
__STATIC_FORCEINLINE void SFlushDCache (void)
__STATIC_FORCEINLINE void UFlushDCache (void)
__STATIC_FORCEINLINE void MFlushInvalDCache (void)
__STATIC_FORCEINLINE void SFlushInvalDCache (void)
__STATIC_FORCEINLINE void UFlushInvalDCache (void)
```

## group NMSIS\_Core\_DCache

Functions that configure Data Cache.

## **Functions**

## \_\_STATIC\_FORCEINLINE void EnableDCache (void)

Enable DCache.

This function enables the D-Cache.

### Remark

- This function can be called in M-Mode only.
- The *CSR\_MCACHE\_CTL* (page 72) register controls D-Cache enable.

### See

- DisableDCache

## \_\_STATIC\_FORCEINLINE void DisableDCache (void)

Disable DCache.

This function disables the D-Cache.

### Remark

- This function can be called in M-Mode only.
- The *CSR\_MCACHE\_CTL* (page 72) register controls D-Cache enable.

### See

- EnableDCache

## \_\_STATIC\_FORCEINLINE void MInvalDCacheLine (unsigned long addr)

Invalidate one D-Cache line specified by address in M-Mode.

This function unlocks and invalidates one D-Cache line specified by the address. Command CCM\_DC\_INVAL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be invalidated

## \_\_STATIC\_FORCEINLINE void MInvalDCacheLines (unsigned long addr, unsigned long cnt)

Invalidate several D-Cache lines specified by address in M-Mode.

This function unlocks and invalidates several D-Cache lines specified by the address and line count. Command CCM\_DC\_INVAL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be invalidated
- cnt – [in] count of cache lines to be invalidated
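Invalidation drops cached data without writing it back, so a buffer that only partly covers a cache line can cause unrelated data sharing that line to be discarded as well. This is general cache behaviour rather than something stated in this reference. As a hedged illustration, the hypothetical helper below (not an NMSIS API) checks that a buffer starts and ends on line boundaries before it is invalidated; `line_size` is an assumed parameter that must match the device.

```c
#include <assert.h>

/* Hypothetical helper (not an NMSIS API): returns 1 when the byte range
 * [addr, addr + size) starts and ends on cache-line boundaries, else 0.
 * line_size must be a power of two. */
static int range_is_line_aligned(unsigned long addr, unsigned long size,
                                 unsigned long line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    unsigned long mask = line_size - 1;
    return ((addr & mask) == 0) && ((size & mask) == 0);
}
```

When the check fails, a flush-and-invalidate operation is usually the safer choice than a plain invalidate.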

## \_\_STATIC\_FORCEINLINE void SInvalDCacheLine (unsigned long addr)

Invalidate one D-Cache line specified by address in S-Mode.

This function unlocks and invalidates one D-Cache line specified by the address. Command CCM\_DC\_INVAL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be invalidated

## \_\_STATIC\_FORCEINLINE void SInvalDCacheLines (unsigned long addr, unsigned long cnt)

Invalidate several D-Cache lines specified by address in S-Mode.

This function unlocks and invalidates several D-Cache lines specified by the address and line count. Command CCM\_DC\_INVAL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be invalidated
- cnt – [in] count of cache lines to be invalidated

## \_\_STATIC\_FORCEINLINE void UInvalDCacheLine (unsigned long addr)

Invalidate one D-Cache line specified by address in U-Mode.

This function unlocks and invalidates one D-Cache line specified by the address. Command CCM\_DC\_INVAL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be invalidated

## \_\_STATIC\_FORCEINLINE void UInvalDCacheLines (unsigned long addr, unsigned long cnt)

Invalidate several D-Cache lines specified by address in U-Mode.

This function unlocks and invalidates several D-Cache lines specified by the address and line count. Command CCM\_DC\_INVAL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be invalidated
- cnt – [in] count of cache lines to be invalidated

## \_\_STATIC\_FORCEINLINE void MFlushDCacheLine (unsigned long addr)

Flush one D-Cache line specified by address in M-Mode.

This function flushes one D-Cache line specified by the address. Command CCM\_DC\_WB is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be flushed

## \_\_STATIC\_FORCEINLINE void MFlushDCacheLines (unsigned long addr, unsigned long cnt)

Flush several D-Cache lines specified by address in M-Mode.

This function flushes several D-Cache lines specified by the address and line count. Command CCM\_DC\_WB is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be flushed
- cnt – [in] count of cache lines to be flushed
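A common use of the multi-line functions is to write back a whole buffer, for example before a peripheral reads it from memory. The sketch below shows one possible wrapper that turns a byte range into the `(addr, cnt)` pair these functions expect. `cache_op_range`, `cache_lines_op_t`, `record_op` and the `line_size` parameter are hypothetical names introduced for illustration; the callback form is used only so the logic can be exercised on a host without the NMSIS headers.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the *Lines functions documented here, e.g. MFlushDCacheLines. */
typedef void (*cache_lines_op_t)(unsigned long addr, unsigned long cnt);

/* Hypothetical helper (not an NMSIS API): apply a per-line cache operation
 * to every line touched by [addr, addr + size). Returns the line count. */
static unsigned long cache_op_range(cache_lines_op_t op, unsigned long addr,
                                    unsigned long size, unsigned long line_size)
{
    assert(op != NULL);
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (size == 0) {
        return 0;
    }
    /* Round the start down and the end up to line boundaries. */
    unsigned long start = addr & ~(line_size - 1);
    unsigned long end = (addr + size + line_size - 1) & ~(line_size - 1);
    unsigned long cnt = (end - start) / line_size;
    op(start, cnt);
    return cnt;
}

/* Host-side stand-in that only records its arguments. On target, a small
 * wrapper around the NMSIS function would be passed instead. */
static unsigned long recorded_addr, recorded_cnt;
static void record_op(unsigned long addr, unsigned long cnt)
{
    recorded_addr = addr;
    recorded_cnt = cnt;
}
```

Whether the hardware requires an aligned start address is not stated here; rounding down is a conservative choice that covers the same lines either way.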

## \_\_STATIC\_FORCEINLINE void SFlushDCacheLine (unsigned long addr)

Flush one D-Cache line specified by address in S-Mode.

This function flushes one D-Cache line specified by the address. Command CCM\_DC\_WB is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be flushed

## \_\_STATIC\_FORCEINLINE void SFlushDCacheLines (unsigned long addr, unsigned long cnt)

Flush several D-Cache lines specified by address in S-Mode.

This function flushes several D-Cache lines specified by the address and line count. Command CCM\_DC\_WB is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be flushed
- cnt – [in] count of cache lines to be flushed

## \_\_STATIC\_FORCEINLINE void UFlushDCacheLine (unsigned long addr)

Flush one D-Cache line specified by address in U-Mode.

This function flushes one D-Cache line specified by the address. Command CCM\_DC\_WB is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be flushed

## \_\_STATIC\_FORCEINLINE void UFlushDCacheLines (unsigned long addr, unsigned long cnt)

Flush several D-Cache lines specified by address in U-Mode.

This function flushes several D-Cache lines specified by the address and line count. Command CCM\_DC\_WB is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be flushed
- cnt – [in] count of cache lines to be flushed

## \_\_STATIC\_FORCEINLINE void MFlushInvalDCacheLine (unsigned long addr)

Flush and invalidate one D-Cache line specified by address in M-Mode.

This function flushes and invalidates one D-Cache line specified by the address. Command CCM\_DC\_WBINVAL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be flushed and invalidated

## \_\_STATIC\_FORCEINLINE void MFlushInvalDCacheLines (unsigned long addr, unsigned long cnt)

Flush and invalidate several D-Cache lines specified by address in M-Mode.

This function flushes and invalidates several D-Cache lines specified by the address and line count. Command CCM\_DC\_WBINVAL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be flushed and invalidated
- cnt – [in] count of cache lines to be flushed and invalidated

## \_\_STATIC\_FORCEINLINE void SFlushInvalDCacheLine (unsigned long addr)

Flush and invalidate one D-Cache line specified by address in S-Mode.

This function flushes and invalidates one D-Cache line specified by the address. Command CCM\_DC\_WBINVAL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be flushed and invalidated

## \_\_STATIC\_FORCEINLINE void SFlushInvalDCacheLines (unsigned long addr, unsigned long cnt)

Flush and invalidate several D-Cache lines specified by address in S-Mode.

This function flushes and invalidates several D-Cache lines specified by the address and line count. Command CCM\_DC\_WBINVAL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be flushed and invalidated
- cnt – [in] count of cache lines to be flushed and invalidated

## \_\_STATIC\_FORCEINLINE void UFlushInvalDCacheLine (unsigned long addr)

Flush and invalidate one D-Cache line specified by address in U-Mode.

This function flushes and invalidates one D-Cache line specified by the address. Command CCM\_DC\_WBINVAL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be flushed and invalidated

## \_\_STATIC\_FORCEINLINE void UFlushInvalDCacheLines (unsigned long addr, unsigned long cnt)

Flush and invalidate several D-Cache lines specified by address in U-Mode.

This function flushes and invalidates several D-Cache lines specified by the address and line count. Command CCM\_DC\_WBINVAL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be flushed and invalidated
- cnt – [in] count of cache lines to be flushed and invalidated

## \_\_STATIC\_FORCEINLINE unsigned long MLockDCacheLine (unsigned long addr)

Lock one D-Cache line specified by address in M-Mode.

This function locks one D-Cache line specified by the address. Command CCM\_DC\_LOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long MLockDCacheLines (unsigned long addr, unsigned long cnt)

Lock several D-Cache lines specified by address in M-Mode.

This function locks several D-Cache lines specified by the address and line count. Command CCM\_DC\_LOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be locked
- cnt – [in] count of cache lines to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long SLockDCacheLine (unsigned long addr)

Lock one D-Cache line specified by address in S-Mode.

This function locks one D-Cache line specified by the address. Command CCM\_DC\_LOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long SLockDCacheLines (unsigned long addr, unsigned long cnt)

Lock several D-Cache lines specified by address in S-Mode.

This function locks several D-Cache lines specified by the address and line count. Command CCM\_DC\_LOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be locked
- cnt – [in] count of cache lines to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long ULockDCacheLine (unsigned long addr)

Lock one D-Cache line specified by address in U-Mode.

This function locks one D-Cache line specified by the address. Command CCM\_DC\_LOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE unsigned long ULockDCacheLines (unsigned long addr, unsigned long cnt)

Lock several D-Cache lines specified by address in U-Mode.

This function locks several D-Cache lines specified by the address and line count. Command CCM\_DC\_LOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be locked
- cnt – [in] count of cache lines to be locked

**Returns** result of CCM lock operation, see enum CCM\_OP\_FINFO

## \_\_STATIC\_FORCEINLINE void MUnlockDCacheLine (unsigned long addr)

Unlock one D-Cache line specified by address in M-Mode.

This function unlocks one D-Cache line specified by the address. Command CCM\_DC\_UNLOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters** addr – [in] start address to be unlocked

## \_\_STATIC\_FORCEINLINE void MUnlockDCacheLines (unsigned long addr, unsigned long cnt)

Unlock several D-Cache lines specified by address in M-Mode.

This function unlocks several D-Cache lines specified by the address and line count. Command CCM\_DC\_UNLOCK is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

**Parameters**

- addr – [in] start address to be unlocked
- cnt – [in] count of cache lines to be unlocked

## \_\_STATIC\_FORCEINLINE void SUnlockDCacheLine (unsigned long addr)

Unlock one D-Cache line specified by address in S-Mode.

This function unlocks one D-Cache line specified by the address. Command CCM\_DC\_UNLOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters** addr – [in] start address to be unlocked

## \_\_STATIC\_FORCEINLINE void SUnlockDCacheLines (unsigned long addr, unsigned long cnt)

Unlock several D-Cache lines specified by address in S-Mode.

This function unlocks several D-Cache lines specified by the address and line count. Command CCM\_DC\_UNLOCK is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

**Parameters**

- addr – [in] start address to be unlocked
- cnt – [in] count of cache lines to be unlocked

## \_\_STATIC\_FORCEINLINE void UUnlockDCacheLine (unsigned long addr)

Unlock one D-Cache line specified by address in U-Mode.

This function unlocks one D-Cache line specified by the address. Command CCM\_DC\_UNLOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters** addr – [in] start address to be unlocked

## \_\_STATIC\_FORCEINLINE void UUnlockDCacheLines (unsigned long addr, unsigned long cnt)

Unlock several D-Cache lines specified by address in U-Mode.

This function unlocks several D-Cache lines specified by the address and line count. Command CCM\_DC\_UNLOCK is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

**Parameters**

- addr – [in] start address to be unlocked
- cnt – [in] count of cache lines to be unlocked

## \_\_STATIC\_FORCEINLINE void MInvalDCache (void)

Invalidate all D-Cache lines in M-Mode.

This function invalidates all D-Cache lines. Command CCM\_DC\_INVAL\_ALL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

## \_\_STATIC\_FORCEINLINE void SInvalDCache (void)

Invalidate all D-Cache lines in S-Mode.

This function invalidates all D-Cache lines. Command CCM\_DC\_INVAL\_ALL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

## \_\_STATIC\_FORCEINLINE void UInvalDCache (void)

Invalidate all D-Cache lines in U-Mode.

This function invalidates all D-Cache lines. In U-Mode, the hardware automatically translates this operation into flush and invalidate operations. Command CCM\_DC\_INVAL\_ALL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

## \_\_STATIC\_FORCEINLINE void MFlushDCache (void)

Flush all D-Cache lines in M-Mode.

This function flushes all D-Cache lines. Command CCM\_DC\_WB\_ALL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

## \_\_STATIC\_FORCEINLINE void SFlushDCache (void)

Flush all D-Cache lines in S-Mode.

This function flushes all D-Cache lines. Command CCM\_DC\_WB\_ALL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

## \_\_STATIC\_FORCEINLINE void UFlushDCache (void)

Flush all D-Cache lines in U-Mode.

This function flushes all D-Cache lines. Command CCM\_DC\_WB\_ALL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

## \_\_STATIC\_FORCEINLINE void MFlushInvalDCache (void)

Flush and invalidate all D-Cache lines in M-Mode.

This function flushes and invalidates all D-Cache lines. Command CCM\_DC\_WBINVAL\_ALL is written to CSR *CSR\_CCM\_MCOMMAND* (page 73).

**Remark** This function must be executed in M-Mode only.

## \_\_STATIC\_FORCEINLINE void SFlushInvalDCache (void)

Flush and invalidate all D-Cache lines in S-Mode.

This function flushes and invalidates all D-Cache lines. Command CCM\_DC\_WBINVAL\_ALL is written to CSR *CSR\_CCM\_SCOMMAND* (page 73).

**Remark** This function must be executed in M/S-Mode only.

## \_\_STATIC\_FORCEINLINE void UFlushInvalDCache (void)

Flush and invalidate all D-Cache lines in U-Mode.

This function flushes and invalidates all D-Cache lines. Command CCM\_DC\_WBINVAL\_ALL is written to CSR *CSR\_CCM\_UCOMMAND* (page 73).

**Remark** This function must be executed in M/S/U-Mode only.

## 2.5.14 System Device Configuration

## group NMSIS\_Core\_SystemConfig

Functions for system init, clock setup and interrupt/exception/NMI handling, available in system\_<device>.c.

Nuclei provides a template file **system\_Device.c** that must be adapted by the silicon vendor to match their actual device. As a **minimum requirement**, this file must provide:

- A device-specific system configuration function, *SystemInit* (page 343).
- A global variable that contains the system frequency, SystemCoreClock (page 343).
- A global ECLIC configuration initialization, *ECLIC\_Init* (page 345).
- Global C library \_init (page 343) and \_fini (page 343) functions called right before calling the main function.
- Vendor-customized interrupt, exception and NMI handling code, see *Interrupt and Exception and NMI Handling* (page 344).

The file configures the device and, typically, initializes the oscillator (PLL) that is part of the microcontroller device. This file might export other functions or variables that provide a more flexible configuration of the microcontroller system.

This file also provides a common framework template for interrupt, exception and NMI handling. Silicon vendors can customize this template code as needed.

**Note:** Please pay special attention to the static variable SystemCoreClock. This variable might be used throughout system initialization and at runtime to calculate frequency- and time-related values, so it must always reflect the actual system clock speed.

**Attention** Be aware that a value stored to SystemCoreClock during low-level initialization (i.e. SystemInit() (page 343)) might get overwritten by C library startup code and/or .bss section initialization. Thus it is highly recommended to call SystemCoreClockUpdate (page 343) at the beginning of the user main() routine.
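The recommendation above can be illustrated with a minimal host-side sketch. Everything except the names `SystemCoreClock`, `SystemCoreClockUpdate` and `SYSTEM_CLOCK` is hypothetical: the clock values, `read_core_clock_from_registers`, `ticks_per_ms` and `app_startup` are stand-ins so the example compiles without device headers, and they are not part of NMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins so the sketch compiles without device headers.
 * On a real device these come from system_<device>.c. */
#define SYSTEM_CLOCK 16000000u              /* hypothetical reset frequency */
uint32_t SystemCoreClock = SYSTEM_CLOCK;

/* Hypothetical helper: pretend the clock registers report 108 MHz. */
static uint32_t read_core_clock_from_registers(void)
{
    return 108000000u;
}

void SystemCoreClockUpdate(void)
{
    SystemCoreClock = read_core_clock_from_registers();
}

/* Hypothetical helper: timer ticks per millisecond derived from the variable. */
uint32_t ticks_per_ms(void)
{
    assert(SystemCoreClock != 0u);
    return SystemCoreClock / 1000u;
}

/* What the first lines of a user main() might do. */
uint32_t app_startup(void)
{
    SystemCoreClockUpdate();   /* refresh before any frequency-dependent math */
    return ticks_per_ms();
}
```

Calling the update first means that any later frequency-dependent computation sees the real clock rather than a stale or re-initialized value.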

### **Functions**

### void SystemCoreClockUpdate (void)

Function to update the variable SystemCoreClock (page 343).

Updates the variable *SystemCoreClock* (page 343) and must be called whenever the core clock is changed during program execution. The function evaluates the clock register settings and calculates the current core clock.

## void SystemInit (void)

Function to Initialize the system.

Initializes the microcontroller system. Typically, this function configures the oscillator (PLL) that is part of the microcontroller device. For systems with a variable clock speed, it updates the variable *SystemCoreClock* (page 343). SystemInit is called from the file **startup**.

## void \_premain\_init (void)

Early init function before main.

This function is executed right before the main function. With the RISC-V GNU toolchain, the \_init function might not be called by the \_\_libc\_init\_array function, so a new function is defined to do the initialization.

## void \_postmain\_fini (int status)

Finish function after main.

This function is executed right after the main function. With the RISC-V GNU toolchain, the \_fini function might not be called by the \_\_libc\_fini\_array function, so a new function is defined to do the finalization.

**Parameters** status – [in] status code returned from main

## void \_init (void)

\_init function called in \_\_libc\_init\_array().

The \_\_libc\_init\_array() function is called by the startup code. The user needs to implement \_init; otherwise linking fails with the error init.c:(.text.\_\_libc\_init\_array+0x26): undefined reference to `\_init'

**Note:** Please use the *\_premain\_init* (page 343) function now.

## void \_fini (void)

\_fini function called in \_\_libc\_fini\_array().

The \_\_libc\_fini\_array() function is called when main exits. The user needs to implement \_fini; otherwise linking fails with the error fini.c:(.text.\_\_libc\_fini\_array+0x28): undefined reference to `\_fini'

**Note:** Please use the \_postmain\_fini (page 343) function now.

## **Variables**

## uint32\_t SystemCoreClock = SYSTEM\_CLOCK

Variable to hold the system core clock value.

Holds the system core clock, which is the system clock frequency supplied to the SysTick timer and the processor core clock. This variable can be used by debuggers to query the frequency of the debug timer or to configure the trace clock speed.

**Attention** Compilers must be configured to avoid removing this variable in case the application program is not using it. Debugging systems require the variable to be physically present in memory so that it can be examined to configure the debugger.

## Interrupt Exception NMI Handling

## group NMSIS\_Core\_IntExcNMI\_Handling

Functions for interrupt, exception and NMI handling, available in system\_<device>.c.

Nuclei provides a template for interrupt, exception and NMI handling. Silicon vendors can adapt it to their requirements, implementing handlers for different exception codes and replacing the current implementation.

### **Defines**

### MAX\_SYSTEM\_EXCEPTION\_NUM 12

Maximum number of exception handlers, not including the one for NMI (0xFFF).

## **Typedefs**

typedef void (\*EXC\_HANDLER) (unsigned long mcause, unsigned long sp)

Exception Handler Function Typedef.

**Note:** This typedef is only used internally in the system\_<Device>.c file. It is used to convert the type of a registered exception handler before calling it.

## **Functions**

**static** void **system\_default\_exception\_handler** (unsigned long *mcause*, unsigned long *sp*)

System Default Exception Handler.

This function provides default exception and NMI handling code for all exception IDs. By default, it just prints some information for debugging; vendors can customize it according to their requirements.

### static void Exception\_Init (void)

Initialize all the default core exception handlers.

The core exception handler for each exception id will be initialized to *system\_default\_exception\_handler* (page 344).

**Note:** Called in the *\_init* (page 343) function; used to initialize the default exception handlers for all exception IDs.

## void Exception\_DumpFrame (unsigned long sp)

Dump Exception Frame.

This function dumps the exception frame stored on the stack.

void **Exception\_Register\_EXC** (uint32\_t EXCn, unsigned long exc\_handler)

Register an exception handler for exception code EXCn.

- For EXCn < MAX\_SYSTEM\_EXCEPTION\_NUM (page 344), it will be registered into SystemExceptionHandlers[EXCn].
- For EXCn == NMI\_EXCn, it will be registered into SystemExceptionHandlers[MAX\_SYSTEM\_EXCEPTION\_NUM].

### **Parameters**

- EXCn – See EXCn\_Type
- exc\_handler – The exception handler for this exception code EXCn

## unsigned long Exception\_Get\_EXC (uint32\_t EXCn)

Get current exception handler for exception code EXCn.

- For EXCn < MAX\_SYSTEM\_EXCEPTION\_NUM (page 344), it will return SystemExceptionHandlers[EXCn].
- For EXCn == NMI\_EXCn, it will return SystemExceptionHandlers[MAX\_SYSTEM\_EXCEPTION\_NUM].

**Parameters** EXCn – See EXCn\_Type

**Returns** Current exception handler for exception code EXCn; returns 0 if not found.

## uint32\_t core\_exception\_handler (unsigned long mcause, unsigned long sp)

Common NMI and Exception handler entry.

This function provides a common entry for NMI and exceptions. Silicon vendors can modify this template implementation according to their requirements.

## Remark

- RISC-V provides a common entry for all types of exceptions. This is a proposed code template for the exception entry function; silicon vendors can modify the implementation.
- For the core\_exception\_handler template, the exception register function Exception\_Register\_EXC is provided, which lets developers register an exception handler for a specific exception number.

## void ECLIC\_Init (void)

Initialize Global ECLIC Config.

ECLIC needs to be initialized after boot up. Vendors can also change the initialization configuration.

## int32\_t ECLIC\_Register\_IRQ (IRQn\_Type (page 306) IRQn, uint8\_t shv, ECLIC\_TRIGGER\_Type (page 87) trig\_mode, uint8\_t lvl, uint8\_t priority, void \*handler)

Initialize a specific IRQ and register the handler.

This function sets the vector mode, trigger mode and polarity, interrupt level and priority, and assigns the handler for a specific IRQn.

## Remark

- This function is used to configure a specific ECLIC interrupt, register its interrupt handler and enable the interrupt.
- If the vector table is placed in a read-only section (FLASHXIP mode), the handler cannot be installed.

## **Parameters**

- IRQn [in] interrupt number, see IRQn\_Type (page 306)
- shv [in] *ECLIC\_NON\_VECTOR\_INTERRUPT* (page 87) means non-vector mode, and *ECLIC\_VECTOR\_INTERRUPT* (page 87) is vector mode
- trig\_mode [in] see ECLIC\_TRIGGER\_Type (page 87)
- lvl [in] interrupt level
- priority [in] interrupt priority
- handler [in] interrupt handler, if NULL, handler will not be installed

**Returns** -1 means invalid input parameter. 0 means successful.

### **Variables**

static unsigned long SystemExceptionHandlers[MAX\_SYSTEM\_EXCEPTION\_NUM + 1]

Store the exception handlers for each exception ID.

#### Note:

- SystemExceptionHandlers stores the handlers for all the exception codes that Nuclei N/NX cores provide.
- Exception codes 0 - 11, 12 exceptions in total, are mapped to SystemExceptionHandlers[0:11].
- The NMI is also re-routed to exception handling (exception code 0xFFF) by the startup code configuration; its handler is mapped to SystemExceptionHandlers[MAX\_SYSTEM\_EXCEPTION\_NUM].
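The handler table described in the note above can be modelled on a host as follows. This is only a sketch under stated assumptions, not the template code itself: the `model_*` names, `demo_handler` and `last_mcause` are hypothetical, the table uses `uintptr_t` instead of `unsigned long` so the function-pointer round trip also works on hosts where the two widths differ, and the extraction of the exception code from the low 12 bits of `mcause` is an assumption made for the illustration. The index mapping follows the note (exception code n in slot n, NMI in the last slot).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SYSTEM_EXCEPTION_NUM 12
#define NMI_EXCn 0xFFFu   /* NMI exception code as given in the text */

typedef void (*EXC_HANDLER)(unsigned long mcause, unsigned long sp);

/* Model of the table: slots 0..11 for exception codes, last slot for NMI. */
static uintptr_t handlers[MAX_SYSTEM_EXCEPTION_NUM + 1];

void model_register_exc(uint32_t excn, uintptr_t handler)
{
    assert(handler != 0u);   /* 0 is reserved to mean "no handler" */
    if (excn < MAX_SYSTEM_EXCEPTION_NUM) {
        handlers[excn] = handler;
    } else if (excn == NMI_EXCn) {
        handlers[MAX_SYSTEM_EXCEPTION_NUM] = handler;
    }
    /* any other code is silently ignored in this model */
}

uintptr_t model_get_exc(uint32_t excn)
{
    if (excn < MAX_SYSTEM_EXCEPTION_NUM) {
        return handlers[excn];
    }
    if (excn == NMI_EXCn) {
        return handlers[MAX_SYSTEM_EXCEPTION_NUM];
    }
    return 0u;               /* not found */
}

static unsigned long last_mcause;

/* Hypothetical handler that only records the cause it was called with. */
static void demo_handler(unsigned long mcause, unsigned long sp)
{
    (void)sp;
    last_mcause = mcause;
}

/* Common entry: look up the handler and convert it back before calling.
 * Returns 0 if a handler ran, 1 if none was registered. */
uint32_t model_dispatch(unsigned long mcause, unsigned long sp)
{
    uintptr_t h = model_get_exc((uint32_t)(mcause & 0xFFFu)); /* assumed code field */
    if (h == 0u) {
        return 1u;
    }
    ((EXC_HANDLER)h)(mcause, sp);
    return 0u;
}
```

Storing handlers as integers and casting through `EXC_HANDLER` at call time mirrors the role the typedef is described as having.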

## 2.5.15 ARM Compatible Functions

## group NMSIS\_Core\_ARMCompatiable\_Functions

A few functions that are compatible with ARM CMSIS-Core.

These functions are compatible with ARM CMSIS-Core and are mostly used in the DSP and NN libraries.

### **Defines**

### \_\_STRBT (val, ptr) \_\_SB((ptr), (val))

STRT Unprivileged (8 bit), ARM Compatible.

### \_\_STRHT (val, ptr) \_\_SH((ptr), (val))

STRT Unprivileged (16 bit), ARM Compatible.

### \_\_STRT (val, ptr) \_\_SW((ptr), (val))

STRT Unprivileged (32 bit), ARM Compatible.

### \_\_SSAT (val, sat) \_\_RV\_SCLIP32((val), (sat-1))

Signed Saturate.

Saturates a signed value.

**Parameters**

- value [in] Value to be saturated
- sat [in] Bit position to saturate to (1..32)

**Returns** Saturated value

### \_\_USAT (val, sat) \_\_RV\_UCLIP32((val), (sat))

Unsigned Saturate.

Saturates an unsigned value.

**Parameters**

- value [in] Value to be saturated
- sat [in] Bit position to saturate to (0..31)

**Returns** Saturated value

### \_\_RBIT (value) \_\_RV\_BITREVI((value), 31)

Reverse bit order of value.

Reverses the bit order of the given value.

**Parameters** value – [in] Value to reverse

**Returns** Reversed value

### \_\_CLZ (data) \_\_RV\_CLZ32(data)

Count leading zeros.

Counts the number of leading zeros of a data value.

**Parameters** data – [in] Value to count the leading zeros

**Returns** number of leading zeros in value
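To make the intended semantics of the saturate, bit-reverse and count-leading-zeros macros concrete, here is a portable reference sketch in plain C. The `ref_*` names are hypothetical and do not exist in NMSIS; the real macros map to RISC-V intrinsics. The bit ranges follow the parameter descriptions above, and the result of 32 for a zero input to the leading-zero count is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to a signed value of 'sat' bits, sat in 1..32. */
int32_t ref_ssat(int32_t val, uint32_t sat)
{
    assert(sat >= 1u && sat <= 32u);
    int64_t max = ((int64_t)1 << (sat - 1u)) - 1;
    int64_t min = -max - 1;
    if (val > max) return (int32_t)max;
    if (val < min) return (int32_t)min;
    return val;
}

/* Saturate to an unsigned value of 'sat' bits, sat in 0..31. */
uint32_t ref_usat(int32_t val, uint32_t sat)
{
    assert(sat <= 31u);
    int64_t max = ((int64_t)1 << sat) - 1;
    if (val < 0) return 0u;
    if (val > max) return (uint32_t)max;
    return (uint32_t)val;
}

/* Reverse the order of all 32 bits. */
uint32_t ref_rbit(uint32_t value)
{
    uint32_t result = 0u;
    for (int i = 0; i < 32; i++) {
        result = (result << 1) | (value & 1u);
        value >>= 1;
    }
    return result;
}

/* Count leading zero bits; 32 for an input of 0 (assumption). */
uint32_t ref_clz(uint32_t data)
{
    uint32_t count = 0u;
    if (data == 0u) return 32u;
    while ((data & 0x80000000u) == 0u) {
        data <<= 1;
        count++;
    }
    return count;
}
```

For example, saturating 300 to 8 signed bits gives 127, and to 8 unsigned bits gives 255.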

### **Functions**

\_\_STATIC\_FORCEINLINE uint32\_t \_\_REV (uint32\_t value)

Reverse byte order (32 bit)

Reverses the byte order in unsigned integer value. For example, 0x12345678 becomes 0x78563412.

Parameters value – [in] Value to reverse

**Returns** Reversed value

\_\_STATIC\_FORCEINLINE uint32\_t \_\_REV16 (uint32\_t value)

Reverse byte order (16 bit)

Reverses the byte order within each halfword of a word. For example, 0x12345678 becomes 0x34127856.

Parameters value – [in] Value to reverse

Returns Reversed value

\_\_STATIC\_FORCEINLINE int16\_t \_\_REVSH (int16\_t value)

Reverse byte order (16 bit)

Reverses the byte order in a 16-bit value and returns the signed 16-bit result. For example, 0x0080 becomes 0x8000.

Parameters value – [in] Value to reverse

Returns Reversed value

\_\_STATIC\_FORCEINLINE uint32\_t \_\_ROR (uint32\_t op1, uint32\_t op2)

Rotate Right in unsigned value (32 bit)

Rotate Right (immediate) provides the value of the contents of a register rotated by a variable number of bits.

### **Parameters**

- op1 [in] Value to rotate
- op2 [in] Number of bits to rotate (0-31)

Returns Rotated value
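The byte-reverse and rotate functions above can be checked against a portable reference sketch. The `ref_*` names are hypothetical and are not part of NMSIS; they only reproduce the behaviour described in the text, including the worked examples.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
uint32_t ref_rev(uint32_t value)
{
    return (value >> 24) | ((value >> 8) & 0x0000FF00u) |
           ((value << 8) & 0x00FF0000u) | (value << 24);
}

/* Reverse the two bytes inside each halfword of a word. */
uint32_t ref_rev16(uint32_t value)
{
    return ((value & 0x00FF00FFu) << 8) | ((value >> 8) & 0x00FF00FFu);
}

/* Reverse the bytes of a 16-bit value and return it as signed. */
int16_t ref_revsh(int16_t value)
{
    uint16_t u = (uint16_t)value;
    uint16_t swapped = (uint16_t)((u << 8) | (u >> 8));
    return (int16_t)swapped;   /* two's complement reinterpretation */
}

/* Rotate right by op2 bits, op2 in 0..31. */
uint32_t ref_ror(uint32_t op1, uint32_t op2)
{
    assert(op2 < 32u);
    if (op2 == 0u) {
        return op1;            /* avoid an undefined shift by 32 */
    }
    return (op1 >> op2) | (op1 << (32u - op2));
}
```

The special case for a zero rotation matters in C, because shifting a 32-bit value by 32 is undefined.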

**CHAPTER** 

**THREE** 

## **NMSIS DSP**

## 3.1 Overview

## 3.1.1 Introduction

This user manual describes the NMSIS DSP software library, a suite of common signal processing functions for use on Nuclei N/NX Class Processors based devices.

The library is divided into a number of functions each covering a specific category:

- Basic math functions
- Fast math functions
- Complex math functions
- Filters
- Matrix functions
- Transform functions
- Motor control functions
- Statistical functions
- Support functions
- Interpolation functions

The library has separate functions for operating on 8-bit integers, 16-bit integers, 32-bit integers and 32-bit floating-point values.

## 3.1.2 Using the Library

The library functions are declared in the public file riscv\_math.h which is placed in the NMSIS/DSP/Include folder.

Simply include this file, link the appropriate library in the application, and begin calling the library functions.

The library provides a single public header file, riscv\_math.h, for little-endian Nuclei N/NX Class Processors cores. The same header file is used for floating-point unit (FPU) variants.

## 3.1.3 Examples

The library ships with a number of examples (page 526) which demonstrate how to use the library functions.

## 3.1.4 Toolchain Support

The library has been developed and tested with the RISC-V GCC toolchain.

Testing with the GCC toolchain is ongoing, and updates on this activity will be made available shortly.

## 3.1.5 Building the Library

The library installer contains a Makefile in the NMSIS/ folder to rebuild the libraries with the Nuclei RISC-V GCC toolchain.

The libraries can be built by running make gen\_dsp\_lib, which builds the DSP library and installs it into the Library/DSP/GCC folder.

## 3.1.6 Preprocessor Macros

Each library project has different preprocessor macros.

**RISCV\_MATH\_MATRIX\_CHECK:** Define macro RISCV\_MATH\_MATRIX\_CHECK for checking on the input and output sizes of matrices

**RISCV\_MATH\_ROUNDING:** Define macro RISCV\_MATH\_ROUNDING for rounding on support functions

**RISCV\_MATH\_LOOPUNROLL:** Define macro RISCV\_MATH\_LOOPUNROLL to enable manual loop unrolling in DSP functions

## 3.2 Using NMSIS-DSP

This section describes how to run the NMSIS DSP examples in Nuclei Spike.

## 3.2.1 Preparation

- Nuclei Modified Spike xl\_spike
- Nuclei SDK modified for xl\_spike branch dev\_xlspike
- Nuclei RISCV GNU Toolchain
- CMake >= 3.5

## 3.2.2 Tool Setup

1. Export **PATH** correctly for xl\_spike and riscv-nuclei-elf-gcc

```
export PATH=/path/to/xl_spike/bin:/path/to/riscv-nuclei-elf-gcc/bin/:$PATH
```

## 3.2.3 Build NMSIS DSP Library

1. Download or clone the NMSIS source code into the NMSIS directory.
2. cd to the NMSIS/NMSIS/ directory.
3. Build the NMSIS DSP library using make gen\_dsp\_lib.
4. Strip debug information using make strip\_dsp\_lib to make the generated library smaller.
5. The DSP library will be generated into the ./Library/DSP/GCC folder.
6. The DSP libraries will look like this:

```
$ ll Library/DSP/GCC/
total 28604
-rw-r--r-- 1 hqfang nucleisys 1847080 Jul 14 14:51 libnmsis_dsp_rv32imac.a
-rw-r--r-- 1 hqfang nucleisys 2515912 Jul 14 14:51 libnmsis_dsp_rv32imacp.a
-rw-r--r-- 1 hqfang nucleisys 1786008 Jul 14 14:51 libnmsis_dsp_rv32imafc.a
-rw-r--r-- 1 hqfang nucleisys 2377420 Jul 14 14:51 libnmsis_dsp_rv32imafcp.a
-rw-r--r-- 1 hqfang nucleisys 1785500 Jul 14 14:51 libnmsis_dsp_rv32imafdc.a
-rw-r--r-- 1 hqfang nucleisys 2367840 Jul 14 14:51 libnmsis_dsp_rv32imafdcp.a
-rw-r--r-- 1 hqfang nucleisys 2374468 Jul 14 14:51 libnmsis_dsp_rv64imac.a
-rw-r--r-- 1 hqfang nucleisys 3369340 Jul 14 14:51 libnmsis_dsp_rv64imacp.a
-rw-r--r-- 1 hqfang nucleisys 2276836 Jul 14 14:51 libnmsis_dsp_rv64imafc.a
-rw-r--r-- 1 hqfang nucleisys 3151172 Jul 14 14:51 libnmsis_dsp_rv64imafcp.a
-rw-r--r-- 1 hqfang nucleisys 2275828 Jul 14 14:51 libnmsis_dsp_rv64imafdc.a
-rw-r--r-- 1 hqfang nucleisys 3140188 Jul 14 14:51 libnmsis_dsp_rv64imafdcp.a
```

7. A library name with an extra p is built with RISC-V DSP enabled.
   - libnmsis\_dsp\_rv32imac.a: Built for RISCV\_ARCH=rv32imac without DSP enabled.
   - libnmsis\_dsp\_rv32imacp.a: Built for RISCV\_ARCH=rv32imac with DSP enabled.

#### Note:

- You can also directly build both DSP and NN library using make gen
- You can strip the generated DSP and NN library using make strip

## 3.2.4 How to run

1. Set environment variables NUCLEI\_SDK\_ROOT and NUCLEI\_SDK\_NMSIS, and set the Nuclei SDK SoC to xlspike

```
export NUCLEI_SDK_ROOT=/path/to/nuclei_sdk
export NUCLEI_SDK_NMSIS=/path/to/NMSIS/NMSIS
export SOC=xlspike
```

2. Let us take ./riscv\_class\_marks\_example/ as an example.
3. cd ./riscv\_class\_marks\_example/
4. Run with the RISC-V DSP enabled NMSIS-DSP library for CORE n307

```
# Clean project
make DSP_ENABLE=ON CORE=n307 clean
# Build project
make DSP_ENABLE=ON CORE=n307 all
# Run application using xl_spike
make DSP_ENABLE=ON CORE=n307 run
```

5. Run with RISCV DSP disabled NMSIS-DSP library for CORE n307

```
make DSP_ENABLE=OFF CORE=n307 clean
make DSP_ENABLE=OFF CORE=n307 all
make DSP_ENABLE=OFF CORE=n307 run
```

### Note:

• You can easily run this example on your hardware if it has enough memory: just change the SOC in step 1 to the one you are using.

## 3.3 NMSIS DSP API

If you want to access doxygen generated NMSIS DSP API, please click NMSIS DSP API Doxygen Documentation.

## 3.3.1 Basic Math Functions

## **Vector Absolute Value**

```
void riscv_abs_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)

void riscv_abs_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)

void riscv_abs_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)

void riscv_abs_q7 (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)

group BasicAbs
```

Computes the absolute value of a vector on an element-by-element basis.

The functions support in-place computation allowing the source and destination pointers to reference the same memory buffer. There are separate functions for floating-point, Q7, Q15, and Q31 data types.

## **Functions**

```
void riscv_abs_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t blockSize) Floating-point vector absolute value.
```

#### **Parameters**

- pSrc [in] points to the input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_abs_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t blockSize) Q15 vector absolute value.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q15 value -1 (0x8000) will be saturated to the maximum allowable positive value 0x7FFF.

#### **Parameters**

- pSrc [in] points to the input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_abs_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t blockSize) Q31 vector absolute value.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q31 value -1 (0x80000000) will be saturated to the maximum allowable positive value 0x7FFFFFFF.

#### **Parameters**

- pSrc [in] points to the input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_abs_q7 (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize) Q7 vector absolute value.
```

**Conditions for optimum performance** Input and output buffers should be 32-bit aligned.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q7 value -1 (0x80) will be saturated to the maximum allowable positive value 0x7F.

#### **Parameters**

- pSrc [in] points to the input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none
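The saturation rule for the fixed-point absolute value can be sketched in portable C. This is a reference model for the Q15 case only, written from the description above; `ref_abs_q15` is a hypothetical name and the local `q15_t` typedef is assumed to match the library's 16-bit signed type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t q15_t;   /* assumed to match the library's Q15 type */

/* Element-wise absolute value with saturation; pSrc and pDst may alias. */
void ref_abs_q15(const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
{
    assert(pSrc != NULL && pDst != NULL);
    for (uint32_t i = 0u; i < blockSize; i++) {
        q15_t in = pSrc[i];
        if (in == INT16_MIN) {
            pDst[i] = INT16_MAX;      /* |-1.0| is not representable: 0x7FFF */
        } else if (in < 0) {
            pDst[i] = (q15_t)(-in);
        } else {
            pDst[i] = in;
        }
    }
}
```

Because each element is read before it is written, the same buffer can be passed as source and destination, matching the in-place property described for this group.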

## **Vector Addition**

```
void riscv_add_f32 (const float32_t *pSrcA, const float32_t *pSrcB, float32_t *pDst, uint32_t blockSize)

void riscv_add_q15 (const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t blockSize)

void riscv_add_q31 (const q31_t *pSrcA, const q31_t *pSrcB, q31_t *pDst, uint32_t blockSize)

void riscv_add_q7 (const q7_t *pSrcA, const q7_t *pSrcB, q7_t *pDst, uint32_t blockSize)

group BasicAdd
```

Element-by-element addition of two vectors.

There are separate functions for floating-point, Q7, Q15, and Q31 data types.

### **Functions**

```
void riscv_add_f32 (const float32_t *pSrcA, const float32_t *pSrcB, float32_t *pDst, uint32_t blockSize)
Floating-point vector addition.
```

### **Parameters**

- pSrcA [in] points to first input vector
- pSrcB [in] points to second input vector
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_add_q15 (const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t blockSize) Q15 vector addition.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

#### **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_add_q31 (const q31_t *pSrcA, const q31_t *pSrcB, q31_t *pDst, uint32_t blockSize) Q31 vector addition.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

## **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

```
void riscv_add_q7 (const q7_t *pSrcA, const q7_t *pSrcB, q7_t *pDst, uint32_t blockSize) Q7 vector addition.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q7 range [0x80 0x7F] are saturated.

## **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none
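Saturating addition, as described for the fixed-point variants, can be modelled with a wider intermediate type. The following is a sketch for Q7 only; `ref_add_q7` and `saturate_q7` are hypothetical names and the `q7_t` typedef is assumed to match the library's 8-bit signed type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t q7_t;   /* assumed to match the library's Q7 type */

/* Clamp a wider intermediate result to the Q7 range [0x80, 0x7F]. */
static q7_t saturate_q7(int32_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (q7_t)x;
}

/* Element-wise saturating addition of two Q7 vectors. */
void ref_add_q7(const q7_t *pSrcA, const q7_t *pSrcB, q7_t *pDst, uint32_t blockSize)
{
    assert(pSrcA != NULL && pSrcB != NULL && pDst != NULL);
    for (uint32_t i = 0u; i < blockSize; i++) {
        /* Add in 32 bits so the sum cannot wrap before clamping. */
        pDst[i] = saturate_q7((int32_t)pSrcA[i] + (int32_t)pSrcB[i]);
    }
}
```

The same pattern extends to Q15 and Q31 by widening to 32 and 64 bits respectively before clamping.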

## **Vector Dot Product**

Computes the dot product of two vectors. The vectors are multiplied element-by-element and then summed.

There are separate functions for floating-point, Q7, Q15, and Q31 data types.

### **Functions**

```
void riscv_dot_prod_f32 (const float32_t *pSrcA, const float32_t *pSrcB, uint32_t blockSize, float32_t *result)

Dot product of floating-point vectors.
```

#### **Parameters**

- pSrcA [in] points to the first input vector.
- pSrcB [in] points to the second input vector.
- blockSize [in] number of samples in each vector.
- result [out] output result returned here.

Returns none

```
void riscv_dot_prod_q15 (const q15_t *pSrcA, const q15_t *pSrcB, uint32_t blockSize, q63_t *result)

Dot product of Q15 vectors.
```

**Scaling and Overflow Behavior** The intermediate multiplications are in $1.15 \times 1.15 = 2.30$ format and these results are added to a 64-bit accumulator in 34.30 format. Nonsaturating additions are used and, given that there are 33 guard bits in the accumulator, there is no risk of overflow. The return result is in 34.30 format.

#### **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- blockSize [in] number of samples in each vector
- result [out] output result returned here

**Returns** none
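The Q15 accumulation format described above can be illustrated with a plain C model. This is a sketch, not the library implementation: `ref_dot_prod_q15` and `q34_30_to_double` are hypothetical names, and the typedefs are assumed to match the library's types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t q15_t;   /* assumed to match the library's Q15 type */
typedef int64_t q63_t;   /* assumed to match the library's 64-bit type */

/* Dot product of two Q15 vectors; the result is in 34.30 format. */
void ref_dot_prod_q15(const q15_t *pSrcA, const q15_t *pSrcB,
                      uint32_t blockSize, q63_t *result)
{
    assert(pSrcA != NULL && pSrcB != NULL && result != NULL);
    q63_t sum = 0;
    for (uint32_t i = 0u; i < blockSize; i++) {
        /* 1.15 x 1.15 = 2.30 product, accumulated without saturation. */
        sum += (q63_t)((int32_t)pSrcA[i] * (int32_t)pSrcB[i]);
    }
    *result = sum;
}

/* Interpret a 34.30 value as a real number (30 fractional bits). */
double q34_30_to_double(q63_t value)
{
    return (double)value / 1073741824.0;   /* 2^30 */
}
```

With inputs 0.5 and 0.5 in the first position and 0.5 and 0.25 in the second, the accumulator holds $2^{28} + 2^{27}$, which reads back as 0.375.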

```
void riscv_dot_prod_q31 (const q31_t *pSrcA, const q31_t *pSrcB, uint32_t blockSize, q63_t *result)

Dot product of Q31 vectors.
```

**Scaling and Overflow Behavior** The intermediate multiplications are in 1.31 x 1.31 = 2.62 format and these are truncated to 2.48 format by discarding the lower 14 bits. The 2.48 result is then added without saturation to a 64-bit accumulator in 16.48 format. There are 15 guard bits in the accumulator and there is no risk of overflow as long as the length of the vectors is less than 2^16 elements. The return result is in 16.48 format.

#### **Parameters**

- pSrcA [in] points to the first input vector.
- pSrcB [in] points to the second input vector.
- blockSize [in] number of samples in each vector.
- result [out] output result returned here.

**Returns** none
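The truncation step in the Q31 case (2.62 product reduced to 2.48 by discarding 14 bits) can be modelled as below. This is a sketch under assumptions: `ref_dot_prod_q31` is a hypothetical name, the typedefs are assumed to match the library's types, and the right shift of a negative value is assumed to be arithmetic, which holds for common compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t q31_t;   /* assumed to match the library's Q31 type */
typedef int64_t q63_t;   /* assumed to match the library's 64-bit type */

/* Dot product of two Q31 vectors; the result is in 16.48 format. */
void ref_dot_prod_q31(const q31_t *pSrcA, const q31_t *pSrcB,
                      uint32_t blockSize, q63_t *result)
{
    assert(pSrcA != NULL && pSrcB != NULL && result != NULL);
    q63_t sum = 0;
    for (uint32_t i = 0u; i < blockSize; i++) {
        q63_t product = (q63_t)pSrcA[i] * (q63_t)pSrcB[i];  /* 2.62 format */
        sum += product >> 14;   /* drop 14 low bits: 2.48, assumes arithmetic shift */
    }
    *result = sum;
}
```

For a single pair of 0.5 values the product is $2^{60}$, the truncated term is $2^{46}$, and in 16.48 format that is 0.25.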

```
void riscv_dot_prod_q7 (const q7_t *pSrcA, const q7_t *pSrcB, uint32_t blockSize, q31_t *result)

Dot product of Q7 vectors.
```

**Scaling and Overflow Behavior** The intermediate multiplications are in $1.7 \times 1.7 = 2.14$ format and these results are added to an accumulator in 18.14 format. Nonsaturating additions are used and there is no danger of wrap around as long as the vectors are less than $2^{18}$ elements long. The return result is in 18.14 format.

## **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- blockSize [in] number of samples in each vector
- result [out] output result returned here

## **Vector Multiplication**

```
void riscv_mult_f32 (const float32_t *pSrcA, const float32_t *pSrcB, float32_t *pDst, uint32_t blockSize)

void riscv_mult_q15 (const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t blockSize)

void riscv_mult_q31 (const q31_t *pSrcA, const q31_t *pSrcB, q31_t *pDst, uint32_t blockSize)

void riscv_mult_q7 (const q7_t *pSrcA, const q7_t *pSrcB, q7_t *pDst, uint32_t blockSize)

group BasicMult
```

Element-by-element multiplication of two vectors.

There are separate functions for floating-point, Q7, Q15, and Q31 data types.

### **Functions**

```
void riscv_mult_f32 (const float32_t *pSrcA, const float32_t *pSrcB, float32_t *pDst, uint32_t blockSize)
Floating-point vector multiplication.
```

### **Parameters**

- pSrcA [in] points to the first input vector.
- pSrcB [in] points to the second input vector.
- pDst [out] points to the output vector.
- blockSize [in] number of samples in each vector.

**Returns** none

```
void riscv_mult_q15 (const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t blockSize) Q15 vector multiplication.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

#### **Parameters**

- pSrcA [in] points to first input vector
- pSrcB [in] points to second input vector
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_mult_q31 (const q31_t *pSrcA, const q31_t *pSrcB, q31_t *pDst, uint32_t blockSize) Q31 vector multiplication.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

#### **Parameters**

- pSrcA [in] points to the first input vector.
- pSrcB [in] points to the second input vector.
- pDst [out] points to the output vector.
- blockSize [in] number of samples in each vector.

```
void riscv_mult_q7 (const q7_t *pSrcA, const q7_t *pSrcB, q7_t *pDst, uint32_t blockSize) Q7 vector multiplication.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q7 range [0x80 0x7F] are saturated.

## **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none
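Fixed-point multiplication needs a rescaling step before saturation, which the description above leaves implicit. The sketch below models the Q15 case: the 2.30 product is shifted right by 15 to return to 1.15 and then clamped. `ref_mult_q15` and `saturate_q15` are hypothetical names, the `q15_t` typedef is assumed to match the library's type, and the right shift of negative products is assumed to be arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t q15_t;   /* assumed to match the library's Q15 type */

/* Clamp a wider intermediate result to the Q15 range [0x8000, 0x7FFF]. */
static q15_t saturate_q15(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (q15_t)x;
}

/* Element-wise saturating multiplication of two Q15 vectors. */
void ref_mult_q15(const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t blockSize)
{
    assert(pSrcA != NULL && pSrcB != NULL && pDst != NULL);
    for (uint32_t i = 0u; i < blockSize; i++) {
        int32_t product = (int32_t)pSrcA[i] * (int32_t)pSrcB[i];  /* 2.30 format */
        pDst[i] = saturate_q15(product >> 15);   /* back to 1.15, assumes arithmetic shift */
    }
}
```

The only input pair that actually saturates is -1.0 times -1.0, whose exact result of +1.0 is not representable in Q15.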

## **Vector Negate**

```
void riscv_negate_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)

void riscv_negate_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)

void riscv_negate_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)

void riscv_negate_q7 (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)

group BasicNegate
```

Negates the elements of a vector.

The functions support in-place computation allowing the source and destination pointers to reference the same memory buffer. There are separate functions for floating-point, Q7, Q15, and Q31 data types.

### **Functions**

```
void riscv_negate_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t blockSize) Negates the elements of a floating-point vector.
```

#### **Parameters**

- pSrc [in] points to input vector.
- pDst [out] points to output vector.
- blockSize [in] number of samples in each vector.

**Returns** none

```
void riscv_negate_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t blockSize) Negates the elements of a Q15 vector.
```

**Conditions for optimum performance** Input and output buffers should be 32-bit aligned.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q15 value -1 (0x8000) is saturated to the maximum allowable positive value 0x7FFF.

### **Parameters**

- pSrc [in] points to the input vector.
- pDst [out] points to the output vector.
- blockSize [in] number of samples in each vector.

Returns none

```
void riscv_negate_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t blockSize) Negates the elements of a Q31 vector.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q31 value -1 (0x80000000) is saturated to the maximum allowable positive value 0x7FFFFFFF.

#### **Parameters**

- pSrc [in] points to the input vector.
- pDst [out] points to the output vector.
- blockSize [in] number of samples in each vector.

Returns none

```
void riscv_negate_q7 (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize) Negates the elements of a Q7 vector.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q7 value -1 (0x80) is saturated to the maximum allowable positive value 0x7F.

#### **Parameters**

- pSrc [in] points to the input vector.
- pDst [out] points to the output vector.
- blockSize [in] number of samples in each vector.

Returns none

## **Vector Offset**

```
void riscv_offset_f32 (const float32_t *pSrc, float32_t offset, float32_t *pDst, uint32_t blockSize)

void riscv_offset_q15 (const q15_t *pSrc, q15_t offset, q15_t *pDst, uint32_t blockSize)

void riscv_offset_q31 (const q31_t *pSrc, q31_t offset, q31_t *pDst, uint32_t blockSize)

void riscv_offset_q7 (const q7_t *pSrc, q7_t offset, q7_t *pDst, uint32_t blockSize)

group BasicOffset
```

Adds a constant offset to each element of a vector.

The functions support in-place computation allowing the source and destination pointers to reference the same memory buffer. There are separate functions for floating-point, Q7, Q15, and Q31 data types.

### **Functions**

void riscv\_offset\_f32 (const float32\_t \*pSrc, float32\_t offset, float32\_t \*pDst, uint32\_t blockSize)

Adds a constant offset to a floating-point vector.

#### **Parameters**

- pSrc [in] points to the input vector
- offset [in] is the offset to be added
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_offset_q15 (const q15_t *pSrc, q15_t offset, q15_t *pDst, uint32_t blockSize) Adds a constant offset to a Q15 vector.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

## **Parameters**

- pSrc [in] points to the input vector
- offset [in] is the offset to be added
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_offset_q31 (const q31_t *pSrc, q31_t offset, q31_t *pDst, uint32_t blockSize) Adds a constant offset to a Q31 vector.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

#### **Parameters**

- pSrc [in] points to the input vector
- offset [in] is the offset to be added
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_offset_q7 (const q7_t *pSrc, q7_t offset, q7_t *pDst, uint32_t blockSize) Adds a constant offset to a Q7 vector.
```

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q7 range [0x80 0x7F] are saturated.

#### **Parameters**

- pSrc [in] points to the input vector
- offset [in] is the offset to be added
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

## **Vector Scale**

```
void riscv_scale_f32 (const float32_t *pSrc, float32_t scale, float32_t *pDst, uint32_t blockSize)

void riscv_scale_q15 (const q15_t *pSrc, q15_t scaleFract, int8_t shift, q15_t *pDst, uint32_t blockSize)

void riscv_scale_q31 (const q31_t *pSrc, q31_t scaleFract, int8_t shift, q31_t *pDst, uint32_t blockSize)

void riscv_scale_q7 (const q7_t *pSrc, q7_t scaleFract, int8_t shift, q7_t *pDst, uint32_t blockSize)

group BasicScale
```

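Multiply a vector by a scalar value. For floating-point data, the algorithm used is:

$$\mathrm{pDst}[n] = \mathrm{pSrc}[n] \cdot \mathrm{scale}, \quad 0 \le n < \mathrm{blockSize}$$

In the fixed-point Q7, Q15, and Q31 functions, scale is represented by a fractional multiplication scaleFract and an arithmetic shift shift. The shift allows the gain of the scaling operation to exceed 1.0. The algorithm used with fixed-point data is:

$$\mathrm{pDst}[n] = (\mathrm{pSrc}[n] \cdot \mathrm{scaleFract}) \ll \mathrm{shift}, \quad 0 \le n < \mathrm{blockSize}$$

The overall scale factor applied to the fixed-point data is

$$\mathrm{scale} = \mathrm{scaleFract} \cdot 2^{\mathrm{shift}}$$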

The functions support in-place computation allowing the source and destination pointers to reference the same memory buffer.
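The fixed-point path can be sketched in plain C for Q15 data: the 1.15 by 1.15 product is a 2.30 value, and shifting it right by `15 - shift` returns to 1.15 while applying the extra gain. The names `scale_sat_q15` and `ref_scale_q15` are hypothetical, and the sketch assumes `-16 <= shift <= 15`; it is not the library implementation.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the Q15 range. */
int16_t scale_sat_q15(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical reference routine for Q15 scaling.
 * Assumes -16 <= shift <= 15 and an arithmetic right shift of negative
 * values (implementation-defined in C, but usual on mainstream compilers). */
void ref_scale_q15(const int16_t *src, int16_t scaleFract, int8_t shift,
                   int16_t *dst, uint32_t blockSize)
{
    int rshift = 15 - shift;   /* 2.30 product back to 1.15, less the gain shift */
    for (uint32_t n = 0; n < blockSize; n++) {
        int32_t prod = (int32_t)src[n] * scaleFract;   /* 1.15 x 1.15 = 2.30 */
        dst[n] = scale_sat_q15(prod >> rshift);
    }
}
```

With `scaleFract = 0x6000` (0.75) and `shift = 1`, the overall gain is 1.5, so an input of 0.5 yields 0.75 and an input near full scale saturates.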

# **Functions**

void **riscv\_scale\_f32** (**const** float32\_t \*pSrc, float32\_t scale, float32\_t \*pDst, uint32\_t blockSize) Multiplies a floating-point vector by a scalar.

#### **Parameters**

- pSrc [in] points to the input vector
- scale [in] scale factor to be applied
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_scale_q15 (const q15_t *pSrc, q15_t scaleFract, int8_t shift, q15_t *pDst, uint32_t blockSize)
```

Multiplies a Q15 vector by a scalar.

**Scaling and Overflow Behavior** The input data \*pSrc and scaleFract are in 1.15 format. These are multiplied to yield a 2.30 intermediate result and this is shifted with saturation to 1.15 format.

## **Parameters**

- pSrc [in] points to the input vector
- scaleFract [in] fractional portion of the scale value
- **shift** [in] number of bits to shift the result by
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

```
void riscv_scale_q31 (const q31_t *pSrc, q31_t scaleFract, int8_t shift, q31_t *pDst, uint32_t blockSize)
```

Multiplies a Q31 vector by a scalar.

**Scaling and Overflow Behavior** The input data \*pSrc and scaleFract are in 1.31 format. These are multiplied to yield a 2.62 intermediate result and this is shifted with saturation to 1.31 format.

### **Parameters**

- pSrc [in] points to the input vector
- scaleFract [in] fractional portion of the scale value
- **shift** [in] number of bits to shift the result by
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_scale_q7 (const q7_t *pSrc, q7_t scaleFract, int8_t shift, q7_t *pDst, uint32_t blockSize)
```

Multiplies a Q7 vector by a scalar.

**Scaling and Overflow Behavior** The input data \*pSrc and scaleFract are in 1.7 format. These are multiplied to yield a 2.14 intermediate result and this is shifted with saturation to 1.7 format.

#### **Parameters**

- pSrc [in] points to the input vector
- scaleFract [in] fractional portion of the scale value
- **shift** [in] number of bits to shift the result by
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

#### **Vector Shift**

```
void riscv_shift_q15 (const q15_t *pSrc, int8_t shiftBits, q15_t *pDst, uint32_t blockSize)

void riscv_shift_q31 (const q31_t *pSrc, int8_t shiftBits, q31_t *pDst, uint32_t blockSize)

void riscv_shift_q7 (const q7_t *pSrc, int8_t shiftBits, q7_t *pDst, uint32_t blockSize)

group BasicShift
```

Shifts the elements of a fixed-point vector by a specified number of bits. There are separate functions for Q7, Q15, and Q31 data types. The underlying algorithm used is:

$$\mathrm{pDst}[n] = \mathrm{pSrc}[n] \ll \mathrm{shiftBits}, \quad 0 \le n < \mathrm{blockSize}$$

If shiftBits is positive then the elements of the vector are shifted to the left. If shiftBits is negative then the elements of the vector are shifted to the right.

The functions support in-place computation allowing the source and destination pointers to reference the same memory buffer.
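A minimal plain-C sketch of the signed shift for Q15 data follows. Left shifts saturate and right shifts are arithmetic. The name `ref_shift_q15` is hypothetical and the sketch assumes `-15 <= shiftBits <= 15`; it is not the library source.

```c
#include <stdint.h>

/* Hypothetical reference routine for a signed Q15 shift.
 * Assumes -15 <= shiftBits <= 15. */
void ref_shift_q15(const int16_t *src, int8_t shiftBits,
                   int16_t *dst, uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        int32_t v;
        if (shiftBits >= 0) {
            /* Multiply instead of << so negative samples avoid undefined behavior. */
            v = (int32_t)src[n] * ((int32_t)1 << shiftBits);
            if (v > INT16_MAX) v = INT16_MAX;   /* saturate on overflow */
            if (v < INT16_MIN) v = INT16_MIN;
        } else {
            /* Right shift; arithmetic shift of negative values is assumed. */
            v = src[n] >> -shiftBits;
        }
        dst[n] = (int16_t)v;
    }
}
```

For example, shifting `0x4000` left by one bit would give `0x8000`, which is outside the positive Q15 range and is therefore clamped to `0x7FFF`.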

## **Functions**

void **riscv\_shift\_q15** (**const** q15\_t \*pSrc, int8\_t shiftBits, q15\_t \*pDst, uint32\_t blockSize) Shifts the elements of a Q15 vector a specified number of bits.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

#### **Parameters**

- pSrc [in] points to the input vector
- **shiftBits [in]** number of bits to shift. A positive value shifts left; a negative value shifts right.
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

void **riscv\_shift\_q31** (**const** q31\_t \*pSrc, int8\_t shiftBits, q31\_t \*pDst, uint32\_t blockSize) Shifts the elements of a Q31 vector a specified number of bits.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

## **Parameters**

- pSrc [in] points to the input vector
- **shiftBits [in]** number of bits to shift. A positive value shifts left; a negative value shifts right.
- pDst [out] points to the output vector
- blockSize [in] number of samples in the vector

**Returns** none

void **riscv\_shift\_q7** (**const** q7\_t \*pSrc, int8\_t shiftBits, q7\_t \*pDst, uint32\_t blockSize) Shifts the elements of a Q7 vector a specified number of bits.

**Conditions for optimum performance** Input and output buffers should be aligned on a 32-bit boundary.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q7 range [0x80 0x7F] are saturated.

## **Parameters**

- pSrc [in] points to the input vector
- **shiftBits [in]** number of bits to shift. A positive value shifts left; a negative value shifts right.
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

## **Vector Subtraction**

```
void riscv_sub_f32 (const float32_t *pSrcA, const float32_t *pSrcB, float32_t *pDst, uint32_t blockSize)

void riscv_sub_q15 (const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t blockSize)

void riscv_sub_q31 (const q31_t *pSrcA, const q31_t *pSrcB, q31_t *pDst, uint32_t blockSize)

void riscv_sub_q7 (const q7_t *pSrcA, const q7_t *pSrcB, q7_t *pDst, uint32_t blockSize)

group BasicSub
```

Element-by-element subtraction of two vectors.

There are separate functions for floating-point, Q7, Q15, and Q31 data types.

### **Functions**

```
void riscv_sub_f32 (const float32_t *pSrcA, const float32_t *pSrcB, float32_t *pDst, uint32_t blockSize)
```

Floating-point vector subtraction.

### **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_sub_q15 (const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t blockSize)
```

Q15 vector subtraction.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

#### **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

**Returns** none

```
void riscv_sub_q31 (const q31_t *pSrcA, const q31_t *pSrcB, q31_t *pDst, uint32_t blockSize)
```

Q31 vector subtraction.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

# **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

```
void riscv_sub_q7 (const q7_t *pSrcA, const q7_t *pSrcB, q7_t *pDst, uint32_t blockSize)
```

Q7 vector subtraction.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.

## **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- pDst [out] points to the output vector
- blockSize [in] number of samples in each vector

Returns none

group groupMath

## 3.3.2 Fast Math Functions

# **Square Root**

```
__STATIC_FORCEINLINE riscv_status riscv_sqrt_f32 (float32_t in, float32_t *pOut)
riscv_status riscv_sqrt_q31 (q31_t in, q31_t *pOut)
riscv_status riscv_sqrt_q15 (q15_t in, q15_t *pOut)
void riscv_vsqrt_f32 (float32_t *pIn, float32_t *pOut, uint16_t len)
void riscv_vsqrt_q31 (q31_t *pIn, q31_t *pOut, uint16_t len)
void riscv_vsqrt_q15 (q15_t *pIn, q15_t *pOut, uint16_t len)
group SQRT
```

Computes the square root of a number. There are separate functions for Q15, Q31, and floating-point data types. The square root function is computed using the Newton-Raphson algorithm. This is an iterative algorithm of the form:

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

where $x_1$ is the current estimate, $x_0$ is the previous estimate, and $f'(x_0)$ is the derivative of $f()$ evaluated at $x_0$. For the square root function, the algorithm reduces to:

$$x_0 = \frac{\mathrm{in}}{2} \quad \text{(initial guess)}, \qquad x_1 = \frac{1}{2}\left(x_0 + \frac{\mathrm{in}}{x_0}\right) \quad \text{(each iteration)}$$
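The Newton-Raphson iteration for the square root can be sketched in a few lines of C. The function name `newton_sqrt_f32`, its integer return codes, and the fixed iteration count are assumptions made for illustration; the library functions return `riscv_status` and may be implemented differently, for example with a hardware square-root instruction.

```c
/* Hypothetical Newton-Raphson square root.
 * Returns 0 on success and -1 for a negative input, in which case *pOut is 0.
 * Assumes a normal input of moderate magnitude, for which 32 iterations
 * are more than enough to converge. */
int newton_sqrt_f32(float in, float *pOut)
{
    if (in < 0.0f) {
        *pOut = 0.0f;
        return -1;
    }
    if (in == 0.0f) {            /* avoid dividing by the initial guess 0 */
        *pOut = 0.0f;
        return 0;
    }

    float x = in / 2.0f;         /* initial guess: half of the input */
    for (int i = 0; i < 32; i++) {
        x = 0.5f * (x + in / x); /* next estimate: mean of x and in/x */
    }
    *pOut = x;
    return 0;
}
```

Each iteration roughly doubles the number of correct digits once the estimate is close, which is why a small fixed iteration count suffices for typical inputs.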

### **Functions**

\_\_STATIC\_FORCEINLINE riscv\_status riscv\_sqrt\_f32 (float32\_t in, float32\_t \*pOut) Floating-point square root function.

### **Parameters**

- in [in] input value
- pOut [out] square root of input value

**Returns** execution status

- RISCV\_MATH\_SUCCESS: input value is positive
- RISCV\_MATH\_ARGUMENT\_ERROR : input value is negative; \*pOut is set to 0

```
riscv_status riscv_sqrt_q31 (q31_t in, q31_t *pOut)
```

Q31 square root function.

#### **Parameters**

- in [in] input value. The range of the input value is [0 +1) or 0x00000000 to 0x7FFFFFFF
- pOut [out] points to square root of input value

**Returns** execution status

- RISCV\_MATH\_SUCCESS : input value is positive
- RISCV\_MATH\_ARGUMENT\_ERROR: input value is negative; \*pOut is set to 0

```
riscv_status riscv_sqrt_q15 (q15_t in, q15_t *pOut)
```

Q15 square root function.

## **Parameters**

- in [in] input value. The range of the input value is [0 +1) or 0x0000 to 0x7FFF
- pOut [out] points to square root of input value

**Returns** execution status

- RISCV\_MATH\_SUCCESS: input value is positive
- RISCV\_MATH\_ARGUMENT\_ERROR : input value is negative; \*pOut is set to 0

void riscv\_vsqrt\_f32 (float32\_t \*pIn, float32\_t \*pOut, uint16\_t len)

Vector Floating-point square root function.

## **Parameters**

- pIn [in] input vector.
- pOut [out] vector of square roots of input elements.
- len [in] length of input vector.

**Returns** none. The function is declared `void`, so no status code is returned; negative input elements produce a zero output element.

```
void riscv_vsqrt_q31 (q31_t *pIn, q31_t *pOut, uint16_t len)

void riscv_vsqrt_q15 (q15_t *pIn, q15_t *pOut, uint16_t len)
```

## Cosine

```
float32_t riscv_cos_f32 (float32_t x)
q31_t riscv_cos_q31 (q31_t x)
q15_t riscv_cos_q15 (q15_t x)
group cos
```

Computes the trigonometric cosine function using a combination of table lookup and linear interpolation. There are separate functions for Q15, Q31, and floating-point data types. The input to the floating-point version is in radians while the fixed-point Q15 and Q31 have a scaled input with the range [0 +0.9999] mapping to [0 2\*pi). The fixed-point range is chosen so that a value of 2\*pi wraps around to 0.

The implementation is based on table lookup using 512 values together with linear interpolation. The steps used are:

1. Calculation of the nearest integer table index.
2. Compute the fractional portion (fract) of the table index.
3. The final result equals `(1.0f-fract)*a + fract*b`,

where `a = Table[index]` and `b = Table[index+1]` are the two adjacent table entries.

### **Functions**

```
float32_t riscv_cos_f32 (float32_t x)
```

Fast approximation to the trigonometric cosine function for floating-point data.

# **Parameters**

- x [in] input value in radians

**Returns** cos(x)

```
q31_t riscv_cos_q31 (q31_t x)
```

Fast approximation to the trigonometric cosine function for Q31 data.

The Q31 input value is in the range [0 +0.9999] and is mapped to a radian value in the range [0 2\*PI).

## **Parameters**

- x [in] Scaled input value in radians

**Returns** cos(x)

```
q15_t riscv_cos_q15(q15_t x)
```

Fast approximation to the trigonometric cosine function for Q15 data.

The Q15 input value is in the range [0 +0.9999] and is mapped to a radian value in the range [0 2\*PI).

# **Parameters**

- x [in] Scaled input value in radians

**Returns** cos(x)

# Sine

```
float32_t riscv_sin_f32 (float32_t x)
q31_t riscv_sin_q31 (q31_t x)
q15_t riscv_sin_q15 (q15_t x)
group sin
```

Computes the trigonometric sine function using a combination of table lookup and linear interpolation. There are separate functions for Q15, Q31, and floating-point data types. The input to the floating-point version is in radians while the fixed-point Q15 and Q31 have a scaled input with the range [0 +0.9999] mapping to [0 2\*pi). The fixed-point range is chosen so that a value of 2\*pi wraps around to 0.

The implementation is based on table lookup using 512 values together with linear interpolation. The steps used are:

1. Calculation of the nearest integer table index.
2. Compute the fractional portion (fract) of the table index.
3. The final result equals `(1.0f-fract)*a + fract*b`,

where `a = Table[index]` and `b = Table[index+1]` are the two adjacent table entries.
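The three lookup steps can be sketched as a small C helper that interpolates into any periodic table. The name `table_interp`, the caller-supplied table, and the extra wrap-around entry at the end of the table are assumptions for illustration; the library's internal table layout is not shown in this section.

```c
#include <stdint.h>

/* Hypothetical helper: linear interpolation into a periodic table.
 * `table` must hold tableSize + 1 entries (the last repeats the first) so
 * that table[index + 1] is always valid. `pos` is the scaled input in [0, 1). */
float table_interp(const float *table, uint32_t tableSize, float pos)
{
    float    scaled = pos * (float)tableSize;
    uint32_t index  = (uint32_t)scaled;        /* step 1: integer table index */
    float    fract  = scaled - (float)index;   /* step 2: fractional portion  */
    float    a = table[index];
    float    b = table[index + 1];
    return (1.0f - fract) * a + fract * b;     /* step 3: blend the neighbors */
}
```

With a coarse four-point sine table `{0, 1, 0, -1, 0}`, an input of 0.125 (one eighth of a period) falls halfway between the first two entries and interpolates to 0.5. A 512-entry table follows the same steps with a much smaller interpolation error.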

# **Functions**

```
float32_t riscv_sin_f32 (float32_t x)
```

Fast approximation to the trigonometric sine function for floating-point data.

## **Parameters**

- x [in] input value in radians

**Returns** sin(x)

```
q31_t riscv_sin_q31 (q31_t x)
```

Fast approximation to the trigonometric sine function for Q31 data.

The Q31 input value is in the range [0 +0.9999] and is mapped to a radian value in the range [0 2\*PI).

## **Parameters**

- x [in] Scaled input value in radians

**Returns** sin(x)

```
q15_t riscv_sin_q15 (q15_t x)
```

Fast approximation to the trigonometric sine function for Q15 data.

The Q15 input value is in the range [0 +0.9999] and is mapped to a radian value in the range [0 2\*PI).

### **Parameters**

- x [in] Scaled input value in radians

**Returns** sin(x)

#### group groupFastMath

This set of functions provides a fast approximation to sine, cosine, and square root. As compared to most of the other functions in the NMSIS math library, the fast math functions operate on individual values and not arrays. There are separate functions for Q15, Q31, and floating-point data.

# 3.3.3 Complex Math Functions

# **Complex Conjugate**

```
void riscv_cmplx_conj_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t numSamples)

void riscv_cmplx_conj_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t numSamples)

void riscv_cmplx_conj_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t numSamples)

group cmplx_conj
```

Conjugates the elements of a complex data vector.

The pSrc points to the source data and pDst points to the destination data where the result should be written. numSamples specifies the number of complex samples and the data in each array is stored in an interleaved fashion (real, imag, real, imag, ...). Each array has a total of 2\*numSamples values.

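The underlying algorithm is, for $0 \le n < \mathrm{numSamples}$:

$$\mathrm{pDst}[2n] = \mathrm{pSrc}[2n], \qquad \mathrm{pDst}[2n+1] = -\mathrm{pSrc}[2n+1]$$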

There are separate functions for floating-point, Q15, and Q31 data types.
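For fixed-point data the only subtlety is that the most negative value has no positive counterpart. The sketch below shows a plain-C Q15 conjugate over interleaved data that saturates that one case; the name `ref_cmplx_conj_q15` is hypothetical and this is not the library source.

```c
#include <stdint.h>

/* Hypothetical reference routine: conjugate numSamples interleaved
 * complex Q15 values (real, imag, real, imag, ...). */
void ref_cmplx_conj_q15(const int16_t *src, int16_t *dst, uint32_t numSamples)
{
    for (uint32_t n = 0; n < numSamples; n++) {
        int16_t re = src[2 * n];
        int16_t im = src[2 * n + 1];
        dst[2 * n] = re;                       /* real part is copied unchanged */
        /* Negate the imaginary part; -(-1.0) is not representable in Q15,
         * so 0x8000 saturates to 0x7FFF. */
        dst[2 * n + 1] = (im == INT16_MIN) ? INT16_MAX : (int16_t)-im;
    }
}
```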

# **Functions**

```
void riscv_cmplx_conj_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t numSamples)
```

Floating-point complex conjugate.

#### **Parameters**

- pSrc [in] points to the input vector
- pDst [out] points to the output vector
- numSamples [in] number of samples in each vector

Returns none

```
void riscv_cmplx_conj_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t numSamples)
```

Q15 complex conjugate.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q15 value -1 (0x8000) is saturated to the maximum allowable positive value 0x7FFF.

#### **Parameters**

- pSrc [in] points to the input vector
- pDst [out] points to the output vector
- numSamples [in] number of samples in each vector

Returns none

```
void riscv_cmplx_conj_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t numSamples)
```

Q31 complex conjugate.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. The Q31 value -1 (0x80000000) is saturated to the maximum allowable positive value 0x7FFFFFFF.

## **Parameters**

- pSrc [in] points to the input vector
- pDst [out] points to the output vector
- numSamples [in] number of samples in each vector

Returns none

# **Complex Dot Product**

## group cmplx\_dot\_prod

Computes the dot product of two complex vectors. The vectors are multiplied element-by-element and then summed.

The pSrcA points to the first complex input vector and pSrcB points to the second complex input vector. numSamples specifies the number of complex samples and the data in each array is stored in an interleaved fashion (real, imag, real, imag, ...). Each array has a total of 2\*numSamples values.

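The underlying algorithm is:

$$\mathrm{realResult} = \sum_{n=0}^{\mathrm{numSamples}-1} \left(\mathrm{pSrcA}[2n]\,\mathrm{pSrcB}[2n] - \mathrm{pSrcA}[2n+1]\,\mathrm{pSrcB}[2n+1]\right)$$

$$\mathrm{imagResult} = \sum_{n=0}^{\mathrm{numSamples}-1} \left(\mathrm{pSrcA}[2n]\,\mathrm{pSrcB}[2n+1] + \mathrm{pSrcA}[2n+1]\,\mathrm{pSrcB}[2n]\right)$$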

There are separate functions for floating-point, Q15, and Q31 data types.
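A floating-point version of the element-by-element multiply-and-sum can be sketched as follows. The name `ref_cmplx_dot_prod_f32` is hypothetical, and the sketch assumes that neither operand is conjugated before multiplication.

```c
#include <stdint.h>

/* Hypothetical reference routine: dot product of two interleaved complex
 * vectors. Assumes plain complex multiplication with no conjugation. */
void ref_cmplx_dot_prod_f32(const float *a, const float *b, uint32_t numSamples,
                            float *realResult, float *imagResult)
{
    float re = 0.0f;
    float im = 0.0f;
    for (uint32_t n = 0; n < numSamples; n++) {
        float ar = a[2 * n], ai = a[2 * n + 1];
        float br = b[2 * n], bi = b[2 * n + 1];
        re += ar * br - ai * bi;   /* real part of a[n] * b[n] */
        im += ar * bi + ai * br;   /* imaginary part of a[n] * b[n] */
    }
    *realResult = re;
    *imagResult = im;
}
```

For the inputs (1+2i, 3+4i) and (5+6i, 7+8i) the two products are -7+16i and -11+52i, so the result is -18+68i.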

#### **Functions**

```
void riscv_cmplx_dot_prod_f32 (const float32_t *pSrcA, const float32_t *pSrcB, uint32_t numSamples, float32_t *realResult, float32_t *imagResult)
```

Floating-point complex dot product.

### **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- numSamples [in] number of samples in each vector
- realResult [out] real part of the result returned here
- imagResult [out] imaginary part of the result returned here

Returns none

```
void riscv_cmplx_dot_prod_q15 (const q15_t *pSrcA, const q15_t *pSrcB, uint32_t numSamples, q31_t *realResult, q31_t *imagResult)
```

Q15 complex dot product.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The intermediate 1.15 by 1.15 multiplications are performed with full precision and yield a 2.30 result. These are accumulated in a 64-bit accumulator with 34.30 precision. As a final step, the accumulators are converted to 8.24 format. The return results realResult and imagResult are in 8.24 format.

#### **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- numSamples [in] number of samples in each vector
- realResult [out] real part of the result returned here
- imagResult [out] imaginary part of the result returned here

**Returns** none

```
void riscv_cmplx_dot_prod_q31 (const q31_t *pSrcA, const q31_t *pSrcB, uint32_t numSamples, q63_t *realResult, q63_t *imagResult)
```

Q31 complex dot product.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The intermediate 1.31 by 1.31 multiplications are performed with 64-bit precision and then shifted to 16.48 format. The internal real and imaginary accumulators are in 16.48 format and provide 15 guard bits. Additions are nonsaturating and no overflow will occur as long as numSamples is less than 32768. The return results realResult and imagResult are in 16.48 format. Input down scaling is not required.

# **Parameters**

- pSrcA [in] points to the first input vector
- pSrcB [in] points to the second input vector
- numSamples [in] number of samples in each vector
- realResult [out] real part of the result returned here
- imagResult [out] imaginary part of the result returned here

Returns none

# **Complex Magnitude**

```
void riscv_cmplx_mag_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t numSamples)

void riscv_cmplx_mag_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t numSamples)

void riscv_cmplx_mag_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t numSamples)

group cmplx_mag
```

Computes the magnitude of the elements of a complex data vector.

The pSrc points to the source data and pDst points to where the result should be written. numSamples specifies the number of complex samples in the input array and the data is stored in an interleaved fashion (real, imag, real, imag, ...). The input array has a total of 2\*numSamples values; the output array has a total of numSamples values.

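The underlying algorithm is, for $0 \le n < \mathrm{numSamples}$:

$$\mathrm{pDst}[n] = \sqrt{\mathrm{pSrc}[2n]^2 + \mathrm{pSrc}[2n+1]^2}$$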

There are separate functions for floating-point, Q15, and Q31 data types.

### **Functions**

void **riscv\_cmplx\_mag\_f32** (**const** float32\_t \*pSrc, float32\_t \*pDst, uint32\_t numSamples) Floating-point complex magnitude.

#### **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

**Returns** none

```
void riscv_cmplx_mag_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t numSamples)
```

Q15 complex magnitude.

**Scaling and Overflow Behavior** The function implements 1.15 by 1.15 multiplications and finally output is converted into 2.14 format.

## **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

**Returns** none

```
void riscv_cmplx_mag_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t numSamples)
```

Q31 complex magnitude.

**Scaling and Overflow Behavior** The function implements 1.31 by 1.31 multiplications and finally output is converted into 2.30 format. Input down scaling is not required.

## **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

**Returns** none

# **Complex Magnitude Squared**

```
void riscv_cmplx_mag_squared_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t numSamples)

void riscv_cmplx_mag_squared_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t numSamples)

void riscv_cmplx_mag_squared_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t numSamples)

group cmplx_mag_squared
```

Computes the magnitude squared of the elements of a complex data vector.

The pSrc points to the source data and pDst points to where the result should be written. numSamples specifies the number of complex samples in the input array and the data is stored in an interleaved fashion (real, imag, real, imag, ...). The input array has a total of 2\*numSamples values; the output array has a total of numSamples values.

The underlying algorithm is, for $0 \le n < \mathrm{numSamples}$:

$$\mathrm{pDst}[n] = \mathrm{pSrc}[2n]^2 + \mathrm{pSrc}[2n+1]^2$$

There are separate functions for floating-point, Q15, and Q31 data types.
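For the Q15 variant, whose output is documented as 3.13 format, the fixed-point bookkeeping can be sketched in plain C: each 1.15 by 1.15 product is a 2.30 value, their sum needs one more integer bit (3.30), and dropping 17 fraction bits gives 3.13. The name `ref_cmplx_mag_squared_q15` is hypothetical and the sketch is only one way to arrive at the stated formats.

```c
#include <stdint.h>

/* Hypothetical reference routine: squared magnitude of interleaved
 * complex Q15 samples, with the result in 3.13 format. */
void ref_cmplx_mag_squared_q15(const int16_t *src, int16_t *dst,
                               uint32_t numSamples)
{
    for (uint32_t n = 0; n < numSamples; n++) {
        int32_t re = src[2 * n];
        int32_t im = src[2 * n + 1];
        /* Two 2.30 products summed in a wide accumulator give a 3.30 value. */
        int64_t acc = (int64_t)re * re + (int64_t)im * im;
        dst[n] = (int16_t)(acc >> 17);         /* 3.30 -> 3.13 */
    }
}
```

In 3.13 format the value 1.0 is represented as 8192, so the sample 0.5+0.5i (magnitude squared 0.5) produces 4096.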

### **Functions**

```
void riscv_cmplx_mag_squared_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t numSamples)
```

Floating-point complex magnitude squared.

#### **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

Returns none

```
void riscv_cmplx_mag_squared_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t numSamples)
```

Q15 complex magnitude squared.

**Scaling and Overflow Behavior** The function implements 1.15 by 1.15 multiplications and finally output is converted into 3.13 format.

# **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

Returns none

```
void riscv_cmplx_mag_squared_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t numSamples)
```

Q31 complex magnitude squared.

**Scaling and Overflow Behavior** The function implements 1.31 by 1.31 multiplications and finally output is converted into 3.29 format. Input down scaling is not required.

# **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

# **Complex-by-Complex Multiplication**

## group CmplxByCmplxMult

Multiplies a complex vector by another complex vector and generates a complex result. The data in the complex arrays is stored in an interleaved fashion (real, imag, real, imag, ...). The parameter numSamples represents the number of complex samples processed. The complex arrays have a total of 2\*numSamples real values.

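The underlying algorithm is, for $0 \le n < \mathrm{numSamples}$:

$$\mathrm{pDst}[2n] = \mathrm{pSrcA}[2n]\,\mathrm{pSrcB}[2n] - \mathrm{pSrcA}[2n+1]\,\mathrm{pSrcB}[2n+1]$$

$$\mathrm{pDst}[2n+1] = \mathrm{pSrcA}[2n]\,\mathrm{pSrcB}[2n+1] + \mathrm{pSrcA}[2n+1]\,\mathrm{pSrcB}[2n]$$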

There are separate functions for floating-point, Q15, and Q31 data types.
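A floating-point sketch of the element-wise complex product over interleaved data is shown below. The name `ref_cmplx_mult_cmplx_f32` is hypothetical. The inputs are copied to locals before the outputs are written, so the sketch also works when the destination aliases one of the sources, although this section does not state whether the library supports that.

```c
#include <stdint.h>

/* Hypothetical reference routine: element-wise product of two interleaved
 * complex vectors. */
void ref_cmplx_mult_cmplx_f32(const float *a, const float *b,
                              float *dst, uint32_t numSamples)
{
    for (uint32_t n = 0; n < numSamples; n++) {
        float ar = a[2 * n], ai = a[2 * n + 1];
        float br = b[2 * n], bi = b[2 * n + 1];
        dst[2 * n]     = ar * br - ai * bi;   /* real part */
        dst[2 * n + 1] = ar * bi + ai * br;   /* imaginary part */
    }
}
```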

### **Functions**

```
void riscv_cmplx_mult_cmplx_f32 (const float32_t *pSrcA, const float32_t *pSrcB, float32_t *pDst, uint32_t numSamples)
```

Floating-point complex-by-complex multiplication.

#### **Parameters**

- pSrcA [in] points to first input vector
- pSrcB [in] points to second input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

**Returns** none

```
void riscv_cmplx_mult_cmplx_q15 (const q15_t *pSrcA, const q15_t *pSrcB, q15_t *pDst, uint32_t numSamples)
```

Q15 complex-by-complex multiplication.

**Scaling and Overflow Behavior** The function implements 1.15 by 1.15 multiplications and finally output is converted into 3.13 format.

## **Parameters**

- pSrcA [in] points to first input vector
- pSrcB [in] points to second input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

**Returns** none

```
void riscv_cmplx_mult_cmplx_q31 (const q31_t *pSrcA, const q31_t *pSrcB, q31_t *pDst, uint32_t numSamples)
```

Q31 complex-by-complex multiplication.

**Scaling and Overflow Behavior** The function implements 1.31 by 1.31 multiplications and finally output is converted into 3.29 format. Input down scaling is not required.

#### **Parameters**

- pSrcA [in] points to first input vector
- pSrcB [in] points to second input vector
- pDst [out] points to output vector
- numSamples [in] number of samples in each vector

**Returns** none

# **Complex-by-Real Multiplication**

# group CmplxByRealMult

Multiplies a complex vector by a real vector and generates a complex result. The data in the complex arrays is stored in an interleaved fashion (real, imag, real, imag, ...). The parameter numSamples represents the number of complex samples processed. The complex arrays have a total of 2\*numSamples real values while the real array has a total of numSamples real values.

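The underlying algorithm is, for $0 \le n < \mathrm{numSamples}$:

$$\mathrm{pCmplxDst}[2n] = \mathrm{pSrcCmplx}[2n] \cdot \mathrm{pSrcReal}[n], \qquad \mathrm{pCmplxDst}[2n+1] = \mathrm{pSrcCmplx}[2n+1] \cdot \mathrm{pSrcReal}[n]$$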

There are separate functions for floating-point, Q15, and Q31 data types.

### **Functions**

```
void riscv_cmplx_mult_real_f32 (const float32_t *pSrcCmplx, const float32_t *pSrcReal, float32_t *pCmplxDst, uint32_t numSamples)
```

Floating-point complex-by-real multiplication.

## **Parameters**

- pSrcCmplx [in] points to complex input vector
- pSrcReal [in] points to real input vector
- pCmplxDst [out] points to complex output vector
- numSamples [in] number of samples in each vector

**Returns** none

```
void riscv_cmplx_mult_real_q15 (const q15_t *pSrcCmplx, const q15_t *pSrcReal, q15_t *pCmplxDst, uint32_t numSamples)
```

Q15 complex-by-real multiplication.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

### **Parameters**

- pSrcCmplx [in] points to complex input vector
- pSrcReal [in] points to real input vector
- pCmplxDst [out] points to complex output vector
- numSamples [in] number of samples in each vector

**Returns** none

```
void riscv_cmplx_mult_real_q31 (const q31_t *pSrcCmplx, const q31_t *pSrcReal, q31_t *pCmplxDst, uint32_t numSamples)
```

Q31 complex-by-real multiplication.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

## **Parameters**

- pSrcCmplx [in] points to complex input vector
- pSrcReal [in] points to real input vector
- pCmplxDst [out] points to complex output vector
- numSamples [in] number of samples in each vector

**Returns** none

# group groupCmplxMath

This set of functions operates on complex data vectors. The data in the complex arrays is stored in an interleaved fashion (real, imag, real, imag, ...). In the API functions, the number of samples in a complex array refers to the number of complex values; the array contains twice this number of real values.

# 3.3.4 Filtering Functions

# **High Precision Q31 Biquad Cascade Filter**

```
void riscv_biquad_cas_df1_32x64_init_q31 (riscv_biquad_cas_df1_32x64_ins_q31 *S, uint8_t numStages, const q31_t *pCoeffs, q63_t *pState, uint8_t postShift)

void riscv_biquad_cas_df1_32x64_q31 (const riscv_biquad_cas_df1_32x64_ins_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

# group BiquadCascadeDF1\_32x64

This function implements a high precision Biquad cascade filter which operates on Q31 data values. The filter coefficients are in 1.31 format and the state variables are in 1.63 format. The double precision state variables reduce quantization noise in the filter and provide a cleaner output. These filters are particularly useful when implementing filters in which the singularities are close to the unit circle. This is common for low pass or high pass filters with very low cutoff frequencies.

The function operates on blocks of input and output data and each call to the function processes blockSize samples through the filter. pSrc and pDst point to input and output arrays containing blockSize Q31 values.

# Algorithm

Each Biquad stage implements a second order filter using the difference equation:

$$y[n] = b0 \cdot x[n] + b1 \cdot x[n-1] + b2 \cdot x[n-2] + a1 \cdot y[n-1] + a2 \cdot y[n-2]$$

A Direct Form I algorithm is used with 5 coefficients and 4 state variables per stage.

Coefficients b0, b1 and b2 multiply the input signal x[n] and are referred to as the feedforward coefficients. Coefficients a1 and a2 multiply the output signal y[n] and are referred to as the feedback coefficients. Pay careful attention to the sign of the feedback coefficients. Some design tools use the difference equation

$$y[n] = b0 \cdot x[n] + b1 \cdot x[n-1] + b2 \cdot x[n-2] - a1 \cdot y[n-1] - a2 \cdot y[n-2]$$

In this case the feedback coefficients a1 and a2 must be negated when used with the NMSIS DSP Library.
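To make the sign convention concrete, the sketch below runs one Direct Form I stage in floating point, with the feedback terms added rather than subtracted. The type `biquad_stage_f32` and the function `biquad_stage_step` are hypothetical and for illustration only; the library routine operates on Q31 data with 64-bit state.

```c
/* Hypothetical single Direct Form I biquad stage in floating point. */
typedef struct {
    float b0, b1, b2, a1, a2;   /* 5 coefficients */
    float x1, x2, y1, y2;       /* 4 state variables: x[n-1], x[n-2], y[n-1], y[n-2] */
} biquad_stage_f32;

/* Process one input sample and update the state. */
float biquad_stage_step(biquad_stage_f32 *s, float x)
{
    /* Feedback terms are ADDED, matching the convention described above. */
    float y = s->b0 * x + s->b1 * s->x1 + s->b2 * s->x2
            + s->a1 * s->y1 + s->a2 * s->y2;
    s->x2 = s->x1;
    s->x1 = x;
    s->y2 = s->y1;
    s->y1 = y;
    return y;
}
```

With `b0 = 0.5` and `a1 = 0.5`, a unit impulse produces the decaying response 0.5, 0.25, 0.125, and so on. A design tool that uses the subtracting convention would report `a1 = -0.5` for the same filter, so the value must be negated before it is stored.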



Higher order filters are realized as a cascade of second order sections. numStages refers to the number of second order stages used. For example, an 8th order filter would be realized with numStages=4 second order stages.

A 9th order filter would be realized with numStages=5 second order stages with the coefficients for one of the stages configured as a first order filter (b2=0 and a2=0).



The pState points to the state variables array. Each Biquad stage has 4 state variables x[n-1], x[n-2], y[n-1], and y[n-2], and each state variable is in 1.63 format to improve precision. The state variables are arranged in the array as: `{x[n-1], x[n-2], y[n-1], y[n-2]}`

The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on. The state array has a total length of 4\*numStages values of data in 1.63 format. The state variables are updated after each block of data is processed; the coefficients are untouched.

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.

**Init Function** There is also an associated initialization function which performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer. To do this manually without calling the init function, assign the following subfields of the instance structure: numStages, pCoeffs, postShift, pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros before static initialization. For example, to statically initialize the filter instance structure use `riscv_biquad_cas_df1_32x64_ins_q31 S1 = {numStages, pState, pCoeffs, postShift};` where numStages is the number of Biquad stages in the filter; pState is the address of the state buffer; pCoeffs is the address of the coefficient buffer; postShift is the shift to be applied, which is described in detail below.

**Fixed-Point Behavior** Care must be taken when using the Biquad Cascade 32x64 filter function. The following issues must be considered:

- Scaling of coefficients
- Filter gain
- Overflow and saturation

Filter coefficients are represented as fractional values and restricted to lie in the range [-1, +1). The processing function has an additional scaling parameter postShift which allows the filter coefficients to exceed the range [-1, +1). At the output of the filter's accumulator is a shift register which shifts the result by postShift bits.

This essentially scales the filter coefficients by 2^postShift. For example, to realize coefficients that lie outside [-1, +1) but inside [-2, +2), set the coefficient array to the desired values divided by 2 and set postShift=1.
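As a rough illustration of the scaling rule above, the following C sketch picks the smallest postShift for a set of real-valued coefficients and converts each one to Q31 after dividing it by 2^postShift. The helper names `choose_post_shift` and `to_q31_scaled` are hypothetical and not part of NMSIS, and rounding to nearest is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest shift such that |coeff| / 2^shift < 1
 * for every coefficient. The bound is applied to both signs, which is
 * slightly conservative because -1.0 itself is representable. */
static uint8_t choose_post_shift(const double *coeffs, size_t count)
{
    uint8_t shift = 0;
    for (size_t i = 0; i < count; i++) {
        double mag = (coeffs[i] < 0.0) ? -coeffs[i] : coeffs[i];
        while (shift < 31 && mag / (double)(1u << shift) >= 1.0) {
            shift++;
        }
    }
    return shift;
}

/* Hypothetical helper: divide by 2^post_shift, then convert to Q31
 * with round-to-nearest and clamping to the int32_t range. */
static int32_t to_q31_scaled(double coeff, uint8_t post_shift)
{
    double q = (coeff / (double)(1u << post_shift)) * 2147483648.0; /* 2^31 */
    q += (q >= 0.0) ? 0.5 : -0.5;
    if (q > 2147483647.0) {
        q = 2147483647.0;
    }
    if (q < -2147483648.0) {
        q = -2147483648.0;
    }
    return (int32_t)q;
}
```

With an illustrative coefficient set such as {1.5, -0.8, 1.2, 1.6, -0.9}, the sketch selects postShift = 1, and the first coefficient is stored as 0.75 in Q31 (0x60000000).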



The second thing to keep in mind is the gain through the filter. The frequency response of a Biquad filter is a function of its coefficients. It is possible for the gain through the filter to exceed 1.0, meaning that the filter increases the amplitude of certain frequencies. This means that an input signal with amplitude < 1.0 may result in an output > 1.0, which is then saturated or wrapped depending on the implementation of the filter. To avoid this behavior, either the filter must be scaled down so that its peak gain is < 1.0, or the input signal must be scaled down so that the combination of input and filter never overflows.

The third item to consider is the overflow and saturation behavior of the fixed-point Q31 version. This is described in the function specific documentation below.

#### **Functions**

```
void riscv_biquad_cas_df1_32x64_init_q31 (riscv_biquad_cas_df1_32x64_ins_q31 *S, uint8_t numStages, const q31_t *pCoeffs, q63_t *pState, uint8_t postShift)
```

Initialization function for the Q31 Biquad cascade 32x64 filter.

**Coefficient and State Ordering** The coefficients are stored in the array pCoeffs in the following order: {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}

where b1x and a1x are the coefficients for the first stage, b2x and a2x are the coefficients for the second stage, and so on. The pCoeffs array contains a total of 5\*numStages values.

pState points to the state variable array, and each state variable is stored in 1.63 format. Each Biquad stage has 4 state variables x[n-1], x[n-2], y[n-1], and y[n-2]. The state variables are arranged in the state array as {x[n-1], x[n-2], y[n-1], y[n-2]} for each stage: the 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on. The state array has a total length of 4\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.

#### **Parameters**

- S [inout] points to an instance of the high precision Q31 Biquad cascade filter structure
- numStages [in] number of 2nd order stages in the filter
- pCoeffs [in] points to the filter coefficients
- pState [in] points to the state buffer
- postShift [in] Shift to be applied after the accumulator. Varies according to the coefficients format

#### Returns none

```
void riscv_biquad_cas_df1_32x64_q31 (const riscv_biquad_cas_df1_32x64_ins_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 Biquad cascade 32x64 filter.

**Details** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows, it wraps around rather than clipping. In order to avoid overflows completely, the input signal must be scaled down by 2 bits and lie in the range [-0.25, +0.25). After all 5 multiply-accumulates are performed, the 2.62 accumulator is shifted by postShift bits and the result truncated to 1.31 format by discarding the low 32 bits.

Two related functions are provided in the NMSIS DSP library.

- riscv\_biquad\_cascade\_df1\_q31() implements a Biquad cascade with 32-bit coefficients and state variables with a Q63 accumulator.
- riscv\_biquad\_cascade\_df1\_fast\_q31() implements a Biquad cascade with 32-bit coefficients and state variables with a Q31 accumulator.

#### **Parameters**

- S [in] points to an instance of the high precision Q31 Biquad cascade filter
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

# **Biquad Cascade IIR Filters Using Direct Form I Structure**

void riscv\_biquad\_cascade\_df1\_f32 (const riscv\_biquad\_casd\_df1\_inst\_f32 \*S, const float32\_t \*pSrc, float32\_t \*pDst, uint32\_t blockSize)

void riscv\_biquad\_cascade\_df1\_fast\_q15 (const riscv\_biquad\_casd\_df1\_inst\_q15 \*S, const q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void riscv\_biquad\_cascade\_df1\_fast\_q31 (const riscv\_biquad\_casd\_df1\_inst\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

void riscv\_biquad\_cascade\_df1\_init\_f32 (riscv\_biquad\_casd\_df1\_inst\_f32 \*S, uint8\_t numStages, const float32\_t \*pCoeffs, float32\_t \*pState)

void riscv\_biquad\_cascade\_df1\_init\_q15 (riscv\_biquad\_casd\_df1\_inst\_q15 \*S, uint8\_t numStages, const q15\_t \*pCoeffs, q15\_t \*pState, int8\_t postShift)

void riscv\_biquad\_cascade\_df1\_init\_q31 (riscv\_biquad\_casd\_df1\_inst\_q31 \*S, uint8\_t numStages, const q31\_t \*pCoeffs, q31\_t \*pState, int8\_t postShift)

void riscv\_biquad\_cascade\_df1\_q15 (const riscv\_biquad\_casd\_df1\_inst\_q15 \*S, const q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void riscv\_biquad\_cascade\_df1\_q31 (const riscv\_biquad\_casd\_df1\_inst\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

# group BiquadCascadeDF1

This set of functions implements arbitrary order recursive (IIR) filters. The filters are implemented as a cascade of second order Biquad sections. The functions support Q15, Q31 and floating-point data types. Fast versions of the Q15 and Q31 functions are also available.

The functions operate on blocks of input and output data and each call to the function processes blockSize samples through the filter. pSrc points to the array of input data and pDst points to the array of output data. Both arrays contain blockSize values.

## Algorithm

Each Biquad stage implements a second order filter using the difference equation:

$$y[n] = b_0\,x[n] + b_1\,x[n-1] + b_2\,x[n-2] + a_1\,y[n-1] + a_2\,y[n-2]$$

A Direct Form I algorithm is used with 5 coefficients and 4 state variables per stage.

Coefficients b0, b1 and b2 multiply the input signal x[n] and are referred to as the feedforward coefficients. Coefficients a1 and a2 multiply the output signal y[n] and are referred to as the feedback coefficients. Pay careful attention to the sign of the feedback coefficients. Some design tools use the difference equation

$$y[n] = b_0\,x[n] + b_1\,x[n-1] + b_2\,x[n-2] - a_1\,y[n-1] - a_2\,y[n-2]$$

In this case the feedback coefficients a1 and a2 must be negated when used with the NMSIS DSP Library.
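The sign rule above can be sketched in C as a small conversion step. The six-value input layout {b0, b1, b2, a0, a1, a2} with subtracted feedback terms is an assumption about what a design tool might export, and `section_to_nmsis` is a hypothetical helper, not an NMSIS function.

```c
#include <stddef.h>

/* Hypothetical helper: convert one second order section from an assumed
 * design-tool layout {b0, b1, b2, a0, a1, a2}, where feedback terms are
 * subtracted, to the {b0, b1, b2, a1, a2} layout in which feedback terms
 * are added. Returns 0 on success and -1 if a0 is zero. */
static int section_to_nmsis(const double in[6], double out[5])
{
    double a0 = in[3];
    if (a0 == 0.0) {
        return -1;               /* cannot normalise by a0 */
    }
    out[0] = in[0] / a0;         /* b0 */
    out[1] = in[1] / a0;         /* b1 */
    out[2] = in[2] / a0;         /* b2 */
    out[3] = -in[4] / a0;        /* a1, negated */
    out[4] = -in[5] / a0;        /* a2, negated */
    return 0;
}
```

Normalising by a0 is included because some tools do not guarantee a0 = 1; if the tool already uses the additive convention, only the normalisation applies and the negation must be skipped.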

Higher order filters are realized as a cascade of second order sections. numStages refers to the number of second order stages used. For example, an 8th order filter would be realized with numStages=4 second order stages.

A 9th order filter would be realized with numStages=5 second order stages with the coefficients for one of the stages configured as a first order filter (b2=0 and a2=0).



pState points to the state variable array. Each Biquad stage has 4 state variables x[n-1], x[n-2], y[n-1], and y[n-2]. The state variables are arranged in the pState array as {x[n-1], x[n-2], y[n-1], y[n-2]} for each stage.

The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on. The state array has a total length of 4\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.
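To make the Direct Form I equation and the array layouts concrete, here is a minimal plain-C reference sketch of a floating-point cascade. It is not the library implementation: `df1_cascade_ref` is a hypothetical name, and the sketch walks sample by sample through all stages, which yields the same output as processing a whole block per stage.

```c
#include <stdint.h>

/* Reference sketch of a Direct Form I Biquad cascade.
 * coeffs: 5 values per stage, {b0, b1, b2, a1, a2}, feedback terms added.
 * state : 4 values per stage, {x[n-1], x[n-2], y[n-1], y[n-2]}. */
static void df1_cascade_ref(uint8_t num_stages, const float *coeffs,
                            float *state, const float *src, float *dst,
                            uint32_t block_size)
{
    for (uint32_t n = 0; n < block_size; n++) {
        float x = src[n];
        for (uint8_t s = 0; s < num_stages; s++) {
            const float *c = &coeffs[5u * s];
            float *st = &state[4u * s];
            float y = c[0] * x + c[1] * st[0] + c[2] * st[1]
                    + c[3] * st[2] + c[4] * st[3];
            st[1] = st[0];       /* x[n-2] <- x[n-1] */
            st[0] = x;           /* x[n-1] <- x[n]   */
            st[3] = st[2];       /* y[n-2] <- y[n-1] */
            st[2] = y;           /* y[n-1] <- y[n]   */
            x = y;               /* this stage's output feeds the next */
        }
        dst[n] = x;
    }
}
```

Because the state array persists between calls, consecutive blocks can be passed to the same function without resetting the filter.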

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances while state variable arrays cannot be shared. There are separate instance structure declarations for each of the 3 supported data types.

**Init Function** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following fields of the instance structure: numStages, pCoeffs, pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros before static initialization. The filter instance structures of the 3 data types can be statically initialized as riscv\_biquad\_casd\_df1\_inst\_f32 S1 = {numStages, pState, pCoeffs}; riscv\_biquad\_casd\_df1\_inst\_q15 S2 = {numStages, pState, pCoeffs, postShift}; and riscv\_biquad\_casd\_df1\_inst\_q31 S3 = {numStages, pState, pCoeffs, postShift}; where numStages is the number of Biquad stages in the filter; pState is the address of the state buffer; pCoeffs is the address of the coefficient buffer; postShift is the shift to be applied.

**Fixed-Point Behavior** Care must be taken when using the Q15 and Q31 versions of the Biquad Cascade filter functions. The following issues must be considered:

- Scaling of coefficients
- Filter gain
- Overflow and saturation

**Scaling of coefficients** Filter coefficients are represented as fractional values and are restricted to lie in the range [-1, +1). The fixed-point functions have an additional scaling parameter postShift which allows the filter coefficients to exceed the range [-1, +1). At the output of the filter's accumulator is a shift register which shifts the result by postShift bits.

This essentially scales the filter coefficients by 2^postShift. For example, to realize coefficients that lie outside [-1, +1) but inside [-2, +2), set the pCoeffs array to the desired values divided by 2 and set postShift=1.

**Filter gain** The frequency response of a Biquad filter is a function of its coefficients. It is possible for the gain through the filter to exceed 1.0, meaning that the filter increases the amplitude of certain frequencies. This means that an input signal with amplitude < 1.0 may result in an output > 1.0, which is then saturated or wrapped depending on the implementation of the filter. To avoid this behavior, either the filter must be scaled down so that its peak gain is < 1.0, or the input signal must be scaled down so that the combination of input and filter never overflows.

**Overflow and saturation** For the Q15 and Q31 versions, this is described separately as part of the function specific documentation below.
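One conservative way to check the filter gain condition described above is to sum the absolute values of a stage's impulse response: for any input bounded by 1.0, the output magnitude cannot exceed that sum. This bound is at least as large as the peak frequency-response gain, so it may suggest more scaling than strictly necessary. The helper name `biquad_abs_gain_bound` and the truncation to a fixed number of taps are assumptions of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: sum of |h[n]| over the first `taps` samples of the
 * impulse response of one stage with coeffs = {b0, b1, b2, a1, a2}
 * (feedback terms added). For a stable stage the tail decays, so a
 * sufficiently large `taps` approximates the full sum. */
static double biquad_abs_gain_bound(const double coeffs[5], size_t taps)
{
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0, sum = 0.0;
    for (size_t n = 0; n < taps; n++) {
        double x = (n == 0) ? 1.0 : 0.0;           /* unit impulse */
        double y = coeffs[0] * x + coeffs[1] * x1 + coeffs[2] * x2
                 + coeffs[3] * y1 + coeffs[4] * y2;
        x2 = x1;
        x1 = x;
        y2 = y1;
        y1 = y;
        sum += (y < 0.0) ? -y : y;
    }
    return sum;
}
```

If the returned bound is G > 1.0, scaling the input (or the feedforward coefficients b0, b1, b2) by 1/G keeps that stage's output within [-1, +1] for any input in that range.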

### **Functions**

```
void riscv_biquad_cascade_df1_f32 (const riscv_biquad_casd_df1_inst_f32 *S, const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
```

Processing function for the floating-point Biquad cascade filter.

#### **Parameters**

- S [in] points to an instance of the floating-point Biquad cascade structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_biquad_cascade_df1_fast_q15 (const riscv_biquad_casd_df1_inst_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 Biquad cascade filter (fast variant).

Fast but less precise processing function for the Q15 Biquad cascade filter for RISC-V Core with DSP enabled.

**Scaling and Overflow Behavior** This fast version uses a 32-bit accumulator with 2.30 format. The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around and distorts the result. In order to avoid overflows completely the input signal must be scaled down by two bits and lie in the range [-0.25, +0.25). The 2.30 accumulator is then shifted by postShift bits and the result truncated to 1.15 format by discarding the low 16 bits.

**Remark** Refer to riscv\_biquad\_cascade\_df1\_q15() for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion. Both the slow and the fast versions use the same instance structure. Use the function riscv\_biquad\_cascade\_df1\_init\_q15() to initialize the filter structure.

#### **Parameters**

- S [in] points to an instance of the Q15 Biquad cascade structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process per call

#### Returns none

```
void riscv_biquad_cascade_df1_fast_q31 (const riscv_biquad_casd_df1_inst_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 Biquad cascade filter (fast variant).

Fast but less precise processing function for the Q31 Biquad cascade filter for RISC-V Core with DSP enabled.

**Scaling and Overflow Behavior** This function is optimized for speed at the expense of fixed-point precision and overflow protection. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are added to a 2.30 accumulator. Finally, the accumulator is saturated and converted to a 1.31 result. The fast version has the same overflow behavior as the standard version and provides less precision since it discards the low 32 bits of each multiplication result. In order to avoid overflows completely the input signal must be scaled down by two bits and lie in the range [-0.25, +0.25). Use the initialization function riscv\_biquad\_cascade\_df1\_init\_q31() to initialize the filter structure.

**Remark** Refer to riscv\_biquad\_cascade\_df1\_q31() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision. Both the slow and the fast versions use the same instance structure. Use the function riscv\_biquad\_cascade\_df1\_init\_q31() to initialize the filter structure.

#### **Parameters**

- **S** [in] points to an instance of the Q31 Biquad cascade structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process per call

#### Returns none

```
void riscv_biquad_cascade_df1_init_f32 (riscv_biquad_casd_df1_inst_f32 *S, uint8_t numStages, const float32_t *pCoeffs, float32_t *pState)
```

Initialization function for the floating-point Biquad cascade filter.

**Coefficient and State Ordering** The coefficients are stored in the array pCoeffs in the following order: {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}

where b1x and a1x are the coefficients for the first stage, b2x and a2x are the coefficients for the second stage, and so on. The pCoeffs array contains a total of 5\*numStages values.

pState is a pointer to the state array. Each Biquad stage has 4 state variables x[n-1], x[n-2], y[n-1], and y[n-2]. The state variables are arranged in the pState array as {x[n-1], x[n-2], y[n-1], y[n-2]} for each stage: the 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on. The state array has a total length of 4\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.

**For MVE code, an additional buffer of modified coefficients is required.** Its size is numStages and each element of this buffer has type riscv\_biquad\_mod\_coef\_f32. So, its total size is 32\*numStages float32\_t elements. In that case, the initialization function which must be used is riscv\_biquad\_cascade\_df1\_mve\_init\_f32.

#### **Parameters**

- **S [inout]** points to an instance of the floating-point Biquad cascade structure.
- numStages [in] number of 2nd order stages in the filter.
- pCoeffs [in] points to the filter coefficients.
- pState [in] points to the state buffer.

#### Returns none

```
void riscv_biquad_cascade_df1_init_q15 (riscv_biquad_casd_df1_inst_q15 *S, uint8_t numStages, const q15_t *pCoeffs, q15_t *pState, int8_t postShift)
```

Initialization function for the Q15 Biquad cascade filter.

**Coefficient and State Ordering** The coefficients are stored in the array pCoeffs in the following order: {b10, 0, b11, b12, a11, a12, b20, 0, b21, b22, a21, a22, ...}

where b1x and a1x are the coefficients for the first stage, b2x and a2x are the coefficients for the second stage, and so on. The pCoeffs array contains a total of 6\*numStages values. The zero coefficient that follows the first feedforward coefficient of each stage (b10, b20, ...) facilitates use of 16-bit SIMD instructions on the RISC-V Core with DSP.

The state variables are stored in the array pState. Each Biquad stage has 4 state variables x[n-1], x[n-2], y[n-1], and y[n-2]. The state variables are arranged in the pState array as {x[n-1], x[n-2], y[n-1], y[n-2]} for each stage: the 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on. The state array has a total length of 4\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.
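Since design tools usually produce 5 coefficients per stage, the 6-value Q15 layout has to be built explicitly. The sketch below shows one way to do that, assuming the padding zero sits directly after each stage's first feedforward coefficient (as in the upstream CMSIS-DSP layout); `pack_q15_coeffs` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: expand {b0, b1, b2, a1, a2} per stage into
 * {b0, 0, b1, b2, a1, a2} per stage.
 * in5 holds 5*num_stages values, out6 must hold 6*num_stages values. */
static void pack_q15_coeffs(uint8_t num_stages, const int16_t *in5,
                            int16_t *out6)
{
    for (uint8_t s = 0; s < num_stages; s++) {
        const int16_t *src = &in5[5u * s];
        int16_t *dst = &out6[6u * s];
        dst[0] = src[0];   /* b0 */
        dst[1] = 0;        /* padding for 16-bit SIMD pairing */
        dst[2] = src[1];   /* b1 */
        dst[3] = src[2];   /* b2 */
        dst[4] = src[3];   /* a1 */
        dst[5] = src[4];   /* a2 */
    }
}
```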

#### **Parameters**

- **S [inout]** points to an instance of the Q15 Biquad cascade structure.
- numStages [in] number of 2nd order stages in the filter.
- pCoeffs [in] points to the filter coefficients.
- pState [in] points to the state buffer.
- postShift [in] Shift to be applied to the accumulator result. Varies according to the coefficients format

#### Returns none

void riscv\_biquad\_cascade\_df1\_init\_q31 (riscv\_biquad\_casd\_df1\_inst\_q31 \*S, uint8\_t numStages, const q31\_t \*pCoeffs, q31\_t \*pState, int8\_t postShift)

Initialization function for the Q31 Biquad cascade filter.

**Coefficient and State Ordering** The coefficients are stored in the array pCoeffs in the following order: {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}

where b1x and a1x are the coefficients for the first stage, b2x and a2x are the coefficients for the second stage, and so on. The pCoeffs array contains a total of 5\*numStages values.

pState points to the state variable array. Each Biquad stage has 4 state variables x[n-1], x[n-2], y[n-1], and y[n-2]. The state variables are arranged in the pState array as {x[n-1], x[n-2], y[n-1], y[n-2]} for each stage: the 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on. The state array has a total length of 4\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.

#### **Parameters**

- **S [inout]** points to an instance of the Q31 Biquad cascade structure.
- numStages [in] number of 2nd order stages in the filter.
- pCoeffs [in] points to the filter coefficients.
- pState [in] points to the state buffer.
- postShift [in] Shift to be applied after the accumulator. Varies according to the coefficients format

#### Returns none

```
void riscv_biquad_cascade_df1_q15 (const riscv_biquad_casd_df1_inst_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 Biquad cascade filter.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. The accumulator is then shifted by postShift bits to truncate the result to 1.15 format by discarding the low 16 bits. Finally, the result is saturated to 1.15 format.
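The accumulate, shift and saturate sequence described above can be sketched for a single output sample as follows. This is an illustration rather than the library code: the function names are hypothetical, and the net right shift of 15 - postShift bits is an assumption derived from the 2.30 and 1.15 formats (30 fractional bits reduced to 15, then scaled up by 2^postShift).

```c
#include <stdint.h>

/* Clamp a wide value to the 1.15 range. */
static int16_t saturate_q15(int64_t v)
{
    if (v > 32767) {
        return 32767;
    }
    if (v < -32768) {
        return -32768;
    }
    return (int16_t)v;
}

/* Hypothetical sketch of one Q15 Biquad output sample.
 * c = {b0, b1, b2, a1, a2}, x = {x[n], x[n-1], x[n-2]},
 * y = {y[n-1], y[n-2]}. Requires post_shift <= 15. */
static int16_t biquad_q15_sample(const int16_t c[5], const int16_t x[3],
                                 const int16_t y[2], uint8_t post_shift)
{
    int64_t acc = 0;                      /* 34.30 accumulator */
    acc += (int32_t)c[0] * x[0];          /* each product is 2.30 */
    acc += (int32_t)c[1] * x[1];
    acc += (int32_t)c[2] * x[2];
    acc += (int32_t)c[3] * y[0];
    acc += (int32_t)c[4] * y[1];
    /* Right shift of a negative value is implementation-defined in C;
     * mainstream compilers perform an arithmetic shift. */
    acc >>= (15 - post_shift);
    return saturate_q15(acc);
}
```

Because the accumulator is 64 bits wide, the five products cannot overflow it; only the final conversion to 1.15 can clip.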

**Remark** Refer to riscv\_biquad\_cascade\_df1\_fast\_q15() for a faster but less precise implementation of this filter.

#### **Parameters**

- **S** [in] points to an instance of the Q15 Biquad cascade structure
- pSrc [in] points to the block of input data
- pDst [out] points to the location where the output result is written
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_biquad_cascade_df1_q31 (const riscv_biquad_casd_df1_inst_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 Biquad cascade filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows, it wraps around rather than clipping. In order to avoid overflows completely, the input signal must be scaled down by 2 bits and lie in the range [-0.25, +0.25). After all 5 multiply-accumulates are performed, the 2.62 accumulator is shifted by postShift bits and the result truncated to 1.31 format by discarding the low 32 bits.
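The wrap-around behavior of the 2.62 accumulator can be illustrated with a single-sample sketch. The names are hypothetical, and expressing the conversion as a net right shift of 31 - postShift bits with the low 32 bits kept is an assumption derived from the 2.62 and 1.31 formats, not a copy of the library code.

```c
#include <stdint.h>

/* 1.31 x 1.31 product as raw 2.62 bits. Unsigned storage makes later
 * additions wrap modulo 2^64 with defined behaviour. */
static uint64_t mul_q31_bits(int32_t a, int32_t b)
{
    return (uint64_t)((int64_t)a * (int64_t)b);
}

/* Hypothetical sketch of one Q31 Biquad output sample.
 * c = {b0, b1, b2, a1, a2}, x = {x[n], x[n-1], x[n-2]},
 * y = {y[n-1], y[n-2]}. Requires post_shift <= 31. */
static int32_t biquad_q31_sample(const int32_t c[5], const int32_t x[3],
                                 const int32_t y[2], uint8_t post_shift)
{
    uint64_t acc = 0;                     /* 2.62 accumulator, wraps */
    acc += mul_q31_bits(c[0], x[0]);
    acc += mul_q31_bits(c[1], x[1]);
    acc += mul_q31_bits(c[2], x[2]);
    acc += mul_q31_bits(c[3], y[0]);
    acc += mul_q31_bits(c[4], y[1]);
    /* Keep 32 bits starting at bit (31 - post_shift); no saturation.
     * The uint32_t to int32_t conversion assumes two's complement. */
    uint32_t bits = (uint32_t)(acc >> (31u - post_shift));
    return (int32_t)bits;
}
```

Feeding two full-scale products (each equal to +1.0 in 2.62 format) makes the accumulator wrap, and the sketch returns 0 instead of a clipped maximum, which is why the text recommends keeping the input within [-0.25, +0.25).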

**Remark** Refer to riscv\_biquad\_cascade\_df1\_fast\_q31() for a faster but less precise implementation of this filter.

#### **Parameters**

- S [in] points to an instance of the Q31 Biquad cascade structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

# Biquad Cascade IIR Filters Using a Direct Form II Transposed Structure

LOW\_OPTIMIZATION\_ENTER void riscv\_biquad\_cascade\_df2T\_f32 (const riscv\_biquad\_cascade\_

LOW\_OPTIMIZATION\_ENTER void riscv\_biquad\_cascade\_df2T\_f64 (const riscv\_biquad\_cascade\_

void riscv\_biquad\_cascade\_df2T\_init\_f32 (riscv\_biquad\_cascade\_df2T\_instance\_f32 \*S, uint8\_t numStages, const float32\_t \*pCoeffs, float32\_t \*pState)

void riscv\_biquad\_cascade\_df2T\_init\_f64 (riscv\_biquad\_cascade\_df2T\_instance\_f64 \*S, uint8\_t numStages, const float64\_t \*pCoeffs, float64\_t \*pState)

LOW\_OPTIMIZATION\_ENTER void riscv\_biquad\_cascade\_stereo\_df2T\_f32 (const riscv\_biquad\_c

void riscv\_biquad\_cascade\_stereo\_df2T\_init\_f32 (riscv\_biquad\_cascade\_stereo\_df2T\_instance\_f32 \*S, uint8\_t numStages, const float32\_t \*pCoeffs, float32\_t \*pState)

## group BiquadCascadeDF2T

This set of functions implements arbitrary order recursive (IIR) filters using a transposed direct form II structure. The filters are implemented as a cascade of second order Biquad sections. These functions provide a slight memory savings as compared to the direct form I Biquad filter functions. Only floating-point data is supported.

These functions operate on blocks of input and output data and each call to the function processes blockSize samples through the filter. pSrc points to the array of input data and pDst points to the array of output data. Both arrays contain blockSize values.

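**Algorithm** Each Biquad stage implements a second order filter using the difference equation:

$$\begin{aligned} y[n] &= b_0\,x[n] + d_1 \\ d_1 &= b_1\,x[n] + a_1\,y[n] + d_2 \\ d_2 &= b_2\,x[n] + a_2\,y[n] \end{aligned}$$

where d1 and d2 represent the two state values.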

A Biquad filter using a transposed Direct Form II structure is shown below.

Coefficients b0, b1, and b2 multiply the input signal x[n] and are referred to as the feedforward coefficients. Coefficients a1 and a2 multiply the output signal y[n] and are referred to as the feedback coefficients. Pay careful attention to the sign of the feedback coefficients. Some design tools flip the sign of the feedback coefficients:

$$\begin{aligned} y[n] &= b_0\,x[n] + d_1 \\ d_1 &= b_1\,x[n] - a_1\,y[n] + d_2 \\ d_2 &= b_2\,x[n] - a_2\,y[n] \end{aligned}$$

In this case the feedback coefficients a1 and a2 must be negated when used with the NMSIS DSP Library.



Higher order filters are realized as a cascade of second order sections. numStages refers to the number of second order stages used. For example, an 8th order filter would be realized with numStages=4 second order stages. A 9th order filter would be realized with numStages=5 second order stages with the coefficients for one of the stages configured as a first order filter (b2=0 and a2=0).

pState points to the state variable array. Each Biquad stage has 2 state variables d1 and d2. The state variables are arranged in the pState array as {d11, d12, d21, d22, ...} where d1x refers to the state variables for the first Biquad and d2x refers to the state variables for the second Biquad. The state array has a total length of 2\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.
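The transposed Direct Form II update and the 2-values-per-stage state layout can be sketched in plain C as follows. This is a reference illustration, not the library implementation; `df2t_cascade_ref` is a hypothetical name, and the feedback terms are added, following the sign convention stated above.

```c
#include <stdint.h>

/* Reference sketch of a transposed Direct Form II Biquad cascade.
 * coeffs: 5 values per stage, {b0, b1, b2, a1, a2}.
 * state : 2 values per stage, {d1, d2}. */
static void df2t_cascade_ref(uint8_t num_stages, const float *coeffs,
                             float *state, const float *src, float *dst,
                             uint32_t block_size)
{
    for (uint32_t n = 0; n < block_size; n++) {
        float x = src[n];
        for (uint8_t s = 0; s < num_stages; s++) {
            const float *c = &coeffs[5u * s];
            float *d = &state[2u * s];
            float y = c[0] * x + d[0];           /* y  = b0*x + d1      */
            d[0] = c[1] * x + c[3] * y + d[1];   /* d1 = b1*x + a1*y+d2 */
            d[1] = c[2] * x + c[4] * y;          /* d2 = b2*x + a2*y    */
            x = y;                               /* feed the next stage */
        }
        dst[n] = x;
    }
}
```

For the same coefficients, this produces the same impulse response as a Direct Form I stage while keeping only two state values per stage.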

The NMSIS library contains Biquad filters in both Direct Form I and transposed Direct Form II. The advantage of the Direct Form I structure is that it is numerically more robust for fixed-point data types. That is why the Direct Form I structure supports Q15 and Q31 data types. The transposed Direct Form II structure, on the other hand, requires a wide dynamic range for the state variables d1 and d2. Because of this, the NMSIS library only has a floating-point version of the Direct Form II Biquad. The advantage of the Direct Form II Biquad is that it requires half the number of state variables, 2 rather than 4, per Biquad stage.

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.

**Init Functions** There is also an associated initialization function. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following fields of the instance structure: numStages, pCoeffs, pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros before static initialization. For example, to statically initialize the instance structure use riscv\_biquad\_cascade\_df2T\_instance\_f32 S1 = {numStages, pState, pCoeffs}; (and likewise for the riscv\_biquad\_cascade\_df2T\_instance\_f64 and riscv\_biquad\_cascade\_stereo\_df2T\_instance\_f32 types) where numStages is the number of Biquad stages in the filter; pState is the address of the state buffer; pCoeffs is the address of the coefficient buffer.

### **Functions**

LOW\_OPTIMIZATION\_ENTER void riscv\_biquad\_cascade\_df2T\_f32 (const riscv\_biquad\_cascade\_

Processing function for the floating-point transposed direct form II Biquad cascade filter.

#### **Parameters**

- **S** [in] points to an instance of the filter data structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

LOW\_OPTIMIZATION\_ENTER void riscv\_biquad\_cascade\_df2T\_f64 (const riscv\_biquad\_cascade\_

Processing function for the floating-point transposed direct form II Biquad cascade filter.

#### **Parameters**

- **S** [in] points to an instance of the filter data structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

void riscv\_biquad\_cascade\_df2T\_init\_f32 (riscv\_biquad\_cascade\_df2T\_instance\_f32 \*S, uint8\_t numStages, const float32\_t \*pCoeffs, float32\_t \*pState)

Initialization function for the floating-point transposed direct form II Biquad cascade filter.

**Coefficient and State Ordering** In the non-Neon version, the coefficients are stored in the array pCoeffs in the following order: {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}

where b1x and a1x are the coefficients for the first stage, b2x and a2x are the coefficients for the second stage, and so on. The pCoeffs array contains a total of 5\*numStages values.

For the Neon version, this array is bigger. If numStages = 4x + y, then the array has size 32\*x + 5\*y, and it must be initialized using the function riscv\_biquad\_cascade\_df2T\_compute\_coefs\_f32, which takes the standard coefficient array as a parameter. An array of 8\*numStages values is a good approximation of this size.

The initialization can then be done by calling riscv\_biquad\_cascade\_df2T\_init\_f32 with the bigger array (neonCoefs, of size 8 \* numStages) and riscv\_biquad\_cascade\_df2T\_compute\_coefs\_f32 with the standard array (coefs), which uses the 5-values-per-stage ordering given above.

The pState is a pointer to state array. Each Biquad stage has 2 state variables d1, and d2. The 2 state variables for stage 1 are first, then the 2 state variables for stage 2, and so on. The state array has a total length of 2\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.
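The Neon sizing rule quoted above (numStages = 4x + y gives 32\*x + 5\*y values, with 8\*numStages as a simpler allocation) is easy to check in code. The helper names below are hypothetical and only restate the arithmetic from the text.

```c
#include <stdint.h>

/* Exact coefficient count from the rule numStages = 4x + y. */
static uint32_t neon_coeff_count(uint8_t num_stages)
{
    uint32_t x = num_stages / 4u;   /* complete groups of 4 stages */
    uint32_t y = num_stages % 4u;   /* remaining stages */
    return 32u * x + 5u * y;
}

/* Simpler allocation size suggested in the text. */
static uint32_t neon_coeff_upper_bound(uint8_t num_stages)
{
    return 8u * num_stages;
}
```

Since 32x + 5y never exceeds 8(4x + y), allocating 8\*numStages values is always sufficient and wastes at most 3 values for each of the (up to 3) leftover stages.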

#### **Parameters**

- **S [inout]** points to an instance of the filter data structure.
- numStages [in] number of 2nd order stages in the filter.
- pCoeffs [in] points to the filter coefficients.
- pState [in] points to the state buffer.

#### Returns none

```
void riscv_biquad_cascade_df2T_init_f64 (riscv_biquad_cascade_df2T_instance_f64 *S, uint8_t numStages, const float64_t *pCoeffs, float64_t *pState)
```

Initialization function for the floating-point transposed direct form II Biquad cascade filter.

**Coefficient and State Ordering** The coefficients are stored in the array pCoeffs in the following order: {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}

where b1x and a1x are the coefficients for the first stage, b2x and a2x are the coefficients for the second stage, and so on. The pCoeffs array contains a total of 5\*numStages values.

The pState is a pointer to state array. Each Biquad stage has 2 state variables d1, and d2. The 2 state variables for stage 1 are first, then the 2 state variables for stage 2, and so on. The state array has a total length of 2\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.

#### **Parameters**

- S [inout] points to an instance of the filter data structure
- numStages [in] number of 2nd order stages in the filter
- pCoeffs [in] points to the filter coefficients
- pState [in] points to the state buffer

#### Returns none

LOW\_OPTIMIZATION\_ENTER void riscv\_biquad\_cascade\_stereo\_df2T\_f32 (const riscv\_biquad\_c

Processing function for the floating-point transposed direct form II Biquad cascade filter. 2 channels.

#### **Parameters**

- **S** [in] points to an instance of the filter data structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_biquad_cascade_stereo_df2T_init_f32 (riscv_biquad_cascade_stereo_df2T_instance_f32 *S, uint8_t numStages, const float32_t *pCoeffs, float32_t *pState)
```

Initialization function for the floating-point transposed direct form II Biquad cascade filter.

**Coefficient and State Ordering** The coefficients are stored in the array pCoeffs in the following order: {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}

where b1x and a1x are the coefficients for the first stage, b2x and a2x are the coefficients for the second stage, and so on. The pCoeffs array contains a total of 5\*numStages values.

pState is a pointer to the state array. Each Biquad stage has 2 state variables, d1 and d2, for each channel. The state variables for stage 1 are first, then the state variables for stage 2, and so on. With two channels, the state array has a total length of 4\*numStages values. The state variables are updated after each block of data is processed; the coefficients are untouched.

#### **Parameters**

- **S** [inout] points to an instance of the filter data structure.
- numStages [in] number of 2nd order stages in the filter.
- pCoeffs [in] points to the filter coefficients.
- pState [in] points to the state buffer.

#### Returns none

## Convolution

void riscv\_conv\_f32 (const float32\_t \*pSrcA, uint32\_t srcALen, const float32\_t \*pSrcB, uint32\_t srcBLen, float32\_t \*pDst)

void riscv\_conv\_fast\_opt\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, q15\_t \*pScratch1, q15\_t \*pScratch2)

void riscv\_conv\_fast\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst)

void riscv\_conv\_fast\_q31 (const q31\_t \*pSrcA, uint32\_t srcALen, const q31\_t \*pSrcB, uint32\_t srcBLen, q31\_t \*pDst)

void riscv\_conv\_opt\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, q15\_t \*pScratch1, q15\_t \*pScratch2)

void riscv\_conv\_opt\_q7 (const q7\_t \*pSrcA, uint32\_t srcALen, const q7\_t \*pSrcB, uint32\_t srcBLen, q7\_t \*pDst, q15\_t \*pScratch1, q15\_t \*pScratch2)

void riscv\_conv\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst)

void riscv\_conv\_q31 (const q31\_t \*pSrcA, uint32\_t srcALen, const q31\_t \*pSrcB, uint32\_t srcBLen, q31\_t \*pDst)

void riscv\_conv\_q7 (const q7\_t \*pSrcA, uint32\_t srcALen, const q7\_t \*pSrcB, uint32\_t srcBLen, q7\_t \*pDst)

# group Conv

Convolution is a mathematical operation that operates on two finite length vectors to generate a finite length output vector. Convolution is similar to correlation and is frequently used in filtering and data analysis. The NMSIS DSP library contains functions for convolving Q7, Q15, Q31, and floating-point data types. The library also provides fast versions of the Q15 and Q31 functions.

**Algorithm** Let a[n] and b[n] be sequences of length srcALen and srcBLen samples respectively. Then the convolution c[n] = a[n] \* b[n] is defined as

$$c[n] = \sum_{k=0}^{\text{srcALen}} a[k]\, b[n-k]$$

Note that c[n] is of length srcALen + srcBLen - 1 and is defined over the interval n=0, 1, 2, ..., srcALen + srcBLen - 2. pSrcA points to the first input vector of length srcALen and pSrcB points to the second input vector of length srcBLen. The output result is written to pDst and the calling function must allocate srcALen+srcBLen-1 words for the result.

Conceptually, when two signals a[n] and b[n] are convolved, the signal b[n] slides over a[n]. For each offset n, the overlapping portions of a[n] and b[n] are multiplied and summed together.

Note that convolution is a commutative operation:

$$a[n] * b[n] = b[n] * a[n]$$

This means that switching the A and B arguments to the convolution functions has no effect.
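The definition above can be written as a short, unoptimized reference routine. The sketch below only illustrates the arithmetic and the srcALen + srcBLen - 1 output length; `conv_ref_f32` is a hypothetical name and is not the library implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Reference convolution. out must hold lenA + lenB - 1 values. */
void conv_ref_f32(const float *a, uint32_t lenA,
                  const float *b, uint32_t lenB, float *out)
{
    assert(lenA > 0u && lenB > 0u);
    for (uint32_t n = 0; n < lenA + lenB - 1u; n++) {
        /* Only terms with 0 <= k < lenA and 0 <= n-k < lenB contribute. */
        uint32_t kMin = (n >= lenB) ? (n - lenB + 1u) : 0u;
        uint32_t kMax = (n < lenA) ? n : (lenA - 1u);
        float acc = 0.0f;
        for (uint32_t k = kMin; k <= kMax; k++) {
            acc += a[k] * b[n - k];
        }
        out[n] = acc;
    }
}
```

Calling the routine with the two inputs swapped gives the same output, which is the commutativity property stated above.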

- **Fixed-Point Behavior** Convolution requires summing up a large number of intermediate products. As such, the Q7, Q15, and Q31 functions run a risk of overflow and saturation. Refer to the function-specific documentation below for further details of the particular algorithm used.
- **Fast Versions** Fast versions are provided for Q31 and Q15. They take fewer cycles than the standard Q31 and Q15 convolution functions, but they require the input signals to be scaled down to avoid intermediate overflows.
- **Opt Versions** Opt versions are provided for Q15 and Q7. They use scratch buffers to allow further optimization, so they take fewer cycles but consume more memory (scratch memory) than the standard Q15 and Q7 versions.

# **Functions**

void riscv\_conv\_f32 (const float32\_t \*pSrcA, uint32\_t srcALen, const float32\_t \*pSrcB, uint32\_t srcBLen, float32\_t \*pDst)

Convolution of floating-point sequences.

# **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1.

# Returns none

void riscv\_conv\_fast\_opt\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, q15\_t \*pScratch1, q15\_t \*pScratch2)

Convolution of Q15 sequences (fast version) for RISC-V Core with DSP enabled.

**Scaling and Overflow Behavior** This fast version uses a 32-bit accumulator with 2.30 format. The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows, it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the base-2 logarithm), since at most min(srcALen, srcBLen) additions are carried out internally. The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.

**Remark** Refer to riscv\_conv\_q15() for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion.

### **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1
- pScratch1 [in] points to scratch buffer of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2
- pScratch2 [in] points to scratch buffer of size min(srcALen, srcBLen)

### Returns none
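The scratch buffer sizes listed above can be computed with two small helpers. This is a sketch only: `conv_opt_scratch1_len` and `conv_opt_scratch2_len` are hypothetical names, and the pScratch1 size is read as max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2 elements, which is an assumption about the intended formula.

```c
#include <assert.h>
#include <stdint.h>

/* Element count for pScratch1: max + 2*min - 2 (assumed formula). */
uint32_t conv_opt_scratch1_len(uint32_t srcALen, uint32_t srcBLen)
{
    assert(srcALen > 0u && srcBLen > 0u);
    uint32_t lo = (srcALen < srcBLen) ? srcALen : srcBLen;
    uint32_t hi = (srcALen < srcBLen) ? srcBLen : srcALen;
    return hi + 2u * lo - 2u;
}

/* Element count for pScratch2: min(srcALen, srcBLen). */
uint32_t conv_opt_scratch2_len(uint32_t srcALen, uint32_t srcBLen)
{
    assert(srcALen > 0u && srcBLen > 0u);
    return (srcALen < srcBLen) ? srcALen : srcBLen;
}
```

For example, with srcALen = 8 and srcBLen = 3 the helpers give 12 and 3 elements. Both counts are in units of q15\_t, so the byte size is the count multiplied by sizeof(q15\_t).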

```
void riscv_conv_fast_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst)
```

Convolution of Q15 sequences (fast version) for RISC-V Core with DSP enabled.

**Scaling and Overflow Behavior** This fast version uses a 32-bit accumulator with 2.30 format. The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows, it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the base-2 logarithm), since at most min(srcALen, srcBLen) additions are carried out internally. The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.

**Remark** Refer to riscv\_conv\_q15() for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion.

# **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1
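The input scaling recommended for the fast Q15 version can be sketched as below. The helper names are hypothetical, and two choices are assumptions of this sketch rather than statements about the library: log2(min(srcALen, srcBLen)) is rounded up to a whole number of bits, and scaling is done by integer division, which truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest shift s such that 2^s >= min(srcALen, srcBLen). */
uint32_t conv_fast_prescale_shift(uint32_t srcALen, uint32_t srcBLen)
{
    uint32_t m = (srcALen < srcBLen) ? srcALen : srcBLen;
    uint32_t shift = 0u;
    while (shift < 31u && (1u << shift) < m) {
        shift++;
    }
    return shift;
}

/* Scale a Q15 buffer down in place by 2^shift. */
void prescale_q15(int16_t *x, uint32_t len, uint32_t shift)
{
    assert(shift < 16u);
    int32_t div = (int32_t)1 << shift;
    for (uint32_t i = 0; i < len; i++) {
        x[i] = (int16_t)(x[i] / div);
    }
}
```

The output of the convolution is then smaller by the combined scale applied to both inputs, which the caller has to account for when interpreting the result.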

```
void riscv_conv_fast_q31 (const q31_t *pSrcA, uint32_t srcALen, const q31_t *pSrcB, uint32_t srcBLen, q31_t *pDst)
```

Convolution of Q31 sequences (fast version) for RISC-V Core with DSP enabled.

**Scaling and Overflow Behavior** This function is optimized for speed at the expense of fixed-point precision and overflow protection. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.31 result.

The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 32 bits of each multiplication result. To avoid overflows completely, the input signals must be scaled down. Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the base-2 logarithm), since at most min(srcALen, srcBLen) additions are carried out internally.

**Remark** Refer to riscv\_conv\_q31() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.

#### **Parameters**

- pSrcA [in] points to the first input sequence.
- **srcALen** [in] length of the first input sequence.
- pSrcB [in] points to the second input sequence.
- **srcBLen** [in] length of the second input sequence.
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1.

# Returns none
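The precision trade-off of the fast Q31 version can be illustrated with a small model of the arithmetic described above. This is a sketch under stated assumptions, not the library code: the function names are hypothetical, the right shift of a negative 64-bit value is assumed to be arithmetic (true for common compilers, but implementation-defined in C), and the final saturation is modeled as the text describes it.

```c
#include <assert.h>
#include <stdint.h>

/* 1.31 x 1.31 product truncated to 2.30: keep only the high 32 bits
 * of the 64-bit product, discarding the low 32 bits. */
int32_t mul_q31_to_2p30(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Convert a 2.30 accumulator to a saturated 1.31 result. */
int32_t acc_2p30_to_q31(int32_t acc)
{
    int64_t v = (int64_t)acc * 2;   /* 2.30 -> 1.31 is a shift left by one */
    if (v > INT32_MAX) { v = INT32_MAX; }
    if (v < INT32_MIN) { v = INT32_MIN; }
    return (int32_t)v;
}
```

For instance, 0.5 x 0.5 (0x40000000 each) yields 0x10000000 in 2.30 format and 0x20000000 (0.25) after conversion, while the product of two very small values such as 1 x 1 is lost entirely because only the high word is kept.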

```
void riscv_conv_opt_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst, q15_t *pScratch1, q15_t *pScratch2)
```

Convolution of Q15 sequences.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both inputs are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.

**Remark** Refer to riscv\_conv\_fast\_q15() for a faster but less precise version of this function.

#### **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1.
- pScratch1 [in] points to scratch buffer of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2.
- pScratch2 [in] points to scratch buffer of size min(srcALen, srcBLen).

```
void riscv_conv_opt_q7 (const q7_t *pSrcA, uint32_t srcALen, const q7_t *pSrcB, uint32_t srcBLen, q7_t *pDst, q15_t *pScratch1, q15_t *pScratch2)
```

Convolution of Q7 sequences.

**Scaling and Overflow Behavior** The function is implemented using a 32-bit internal accumulator. Both inputs are represented in 1.7 format and multiplications yield a 2.14 result. The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format. This approach provides 17 guard bits and there is no risk of overflow as long as max(srcALen, srcBLen) < 131072. The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and then saturated to 1.7 format.

# **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1.
- pScratch1 [in] points to scratch buffer (of type q15\_t) of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2.
- pScratch2 [in] points to scratch buffer (of type q15\_t) of size min(srcALen, srcBLen).

# Returns none

```
void riscv_conv_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst)
```

Convolution of Q15 sequences.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both inputs are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.

**Remark** Refer to riscv\_conv\_fast\_q15() for a faster but less precise version of this function.

**Remark** Refer to riscv\_conv\_opt\_q15() for a faster implementation of this function using scratch buffers.

# **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1.
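The 64-bit accumulation scheme described above can be modeled by a plain reference routine. The sketch below is illustrative only: `conv_ref_q15` is a hypothetical name, `int16_t` stands in for q15\_t, and the right shift of a negative accumulator is assumed to be arithmetic (implementation-defined in C, but the usual behavior).

```c
#include <assert.h>
#include <stdint.h>

/* Reference Q15 convolution. out must hold lenA + lenB - 1 values. */
void conv_ref_q15(const int16_t *a, uint32_t lenA,
                  const int16_t *b, uint32_t lenB, int16_t *out)
{
    assert(lenA > 0u && lenB > 0u);
    for (uint32_t n = 0; n < lenA + lenB - 1u; n++) {
        uint32_t kMin = (n >= lenB) ? (n - lenB + 1u) : 0u;
        uint32_t kMax = (n < lenA) ? n : (lenA - 1u);
        int64_t acc = 0;                       /* 34.30 accumulator */
        for (uint32_t k = kMin; k <= kMax; k++) {
            acc += (int32_t)a[k] * (int32_t)b[n - k];   /* 2.30 product */
        }
        acc >>= 15;                            /* 34.30 -> 34.15 */
        if (acc > INT16_MAX) { acc = INT16_MAX; }       /* saturate to 1.15 */
        if (acc < INT16_MIN) { acc = INT16_MIN; }
        out[n] = (int16_t)acc;
    }
}
```

The saturation step matters even for a single product: (-1.0) x (-1.0) gives +1.0, which is not representable in 1.15 format and is clipped to 0x7FFF.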

```
void riscv_conv_q31 (const q31_t *pSrcA, uint32_t srcALen, const q31_t *pSrcB, uint32_t srcBLen, q31_t *pDst)
```

Convolution of Q31 sequences.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows, it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the base-2 logarithm), since at most min(srcALen, srcBLen) additions are carried out internally. The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.

**Remark** Refer to riscv\_conv\_fast\_q31() for a faster but less precise implementation of this function.

#### **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1.

### Returns none

```
void riscv_conv_q7 (const q7_t *pSrcA, uint32_t srcALen, const q7_t *pSrcB, uint32_t srcBLen, q7_t *pDst)
```

Convolution of Q7 sequences.

**Scaling and Overflow Behavior** The function is implemented using a 32-bit internal accumulator. Both inputs are represented in 1.7 format and multiplications yield a 2.14 result. The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format. This approach provides 17 guard bits and there is no risk of overflow as long as max(srcALen, srcBLen) < 131072. The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and then saturated to 1.7 format.

**Remark** Refer to riscv\_conv\_opt\_q7() for a faster implementation of this function.

# **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen [in]** length of the second input sequence
- pDst [out] points to the location where the output result is written. Length srcALen+srcBLen-1.

### Returns none
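The 131072 limit quoted for the Q7 accumulator follows from a short worst-case count. As a sketch, assume N products are summed for one output sample in a signed 32-bit accumulator; the largest product magnitude of two Q7 values is (-128) x (-128):

$$\max\,|a[k]\, b[n-k]| = 2^{7} \cdot 2^{7} = 2^{14}, \qquad N \cdot 2^{14} \le 2^{31} - 1 \;\Longrightarrow\; N < 2^{17} = 131072$$

Since no output sample sums more than min(srcALen, srcBLen) products, requiring max(srcALen, srcBLen) < 131072 is a conservative sufficient condition.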

## **Partial Convolution**

- riscv\_status riscv\_conv\_partial\_f32 (const float32\_t \*pSrcA, uint32\_t srcALen, const float32\_t \*pSrcB, uint32\_t srcBLen, float32\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints)
- riscv\_status riscv\_conv\_partial\_fast\_opt\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints, q15\_t \*pScratch1, q15\_t \*pScratch2)
- riscv\_status riscv\_conv\_partial\_fast\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints)
- riscv\_status riscv\_conv\_partial\_fast\_q31 (const q31\_t \*pSrcA, uint32\_t srcALen, const q31\_t \*pSrcB, uint32\_t srcBLen, q31\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints)
- riscv\_status riscv\_conv\_partial\_opt\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints, q15\_t \*pScratch1, q15\_t \*pScratch2)
- riscv\_status riscv\_conv\_partial\_opt\_q7 (const q7\_t \*pSrcA, uint32\_t srcALen, const q7\_t \*pSrcB, uint32\_t srcBLen, q7\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints, q15\_t \*pScratch1, q15\_t \*pScratch2)
- riscv\_status riscv\_conv\_partial\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints)
- riscv\_status riscv\_conv\_partial\_q31 (const q31\_t \*pSrcA, uint32\_t srcALen, const q31\_t \*pSrcB, uint32\_t srcBLen, q31\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints)
- riscv\_status riscv\_conv\_partial\_q7 (const q7\_t \*pSrcA, uint32\_t srcALen, const q7\_t \*pSrcB, uint32\_t srcBLen, q7\_t \*pDst, uint32\_t firstIndex, uint32\_t numPoints)

### group PartialConv

Partial Convolution is equivalent to Convolution except that a subset of the output samples is generated. Each function has two additional arguments. firstIndex specifies the starting index of the subset of output samples. numPoints is the number of output samples to compute. The function computes the output in the range [firstIndex, ..., firstIndex+numPoints-1]. The output array pDst contains numPoints values.

The allowable range of output indices is [0 srcALen+srcBLen-2]. If the requested subset does not fall in this range then the functions return RISCV\_MATH\_ARGUMENT\_ERROR. Otherwise the functions return RISCV\_MATH\_SUCCESS.

**Note:** Refer to riscv\_conv\_f32() for details on fixed point behavior.

- **Fast Versions** Fast versions are provided for Q31 and Q15 partial convolution. They take fewer cycles than the standard Q31 and Q15 partial convolution functions, but they require the input signals to be scaled down to avoid intermediate overflows.
- **Opt Versions** Opt versions are provided for Q15 and Q7. They use scratch buffers to allow further optimization, so they take fewer cycles but consume more memory (scratch memory) than the standard Q15 and Q7 versions of partial convolution.
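The range check and the subset computation described above can be sketched as follows. Everything named here is hypothetical (`conv_status_t`, `conv_partial_ref_f32`); the enum values merely stand in for RISCV\_MATH\_SUCCESS and RISCV\_MATH\_ARGUMENT\_ERROR. The sketch stores output sample firstIndex + i at out[i], following the statement that the output array holds numPoints values; this placement is an assumption and should be checked against the library before relying on it.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    CONV_SUCCESS = 0,          /* stand-in for RISCV_MATH_SUCCESS */
    CONV_ARGUMENT_ERROR = -1   /* stand-in for RISCV_MATH_ARGUMENT_ERROR */
} conv_status_t;

conv_status_t conv_partial_ref_f32(const float *a, uint32_t lenA,
                                   const float *b, uint32_t lenB,
                                   float *out, uint32_t firstIndex,
                                   uint32_t numPoints)
{
    assert(lenA > 0u && lenB > 0u);
    uint32_t total = lenA + lenB - 1u;   /* valid indices: 0 .. total-1 */
    if (numPoints > total || firstIndex > total - numPoints) {
        return CONV_ARGUMENT_ERROR;
    }
    for (uint32_t i = 0; i < numPoints; i++) {
        uint32_t n = firstIndex + i;
        uint32_t kMin = (n >= lenB) ? (n - lenB + 1u) : 0u;
        uint32_t kMax = (n < lenA) ? n : (lenA - 1u);
        float acc = 0.0f;
        for (uint32_t k = kMin; k <= kMax; k++) {
            acc += a[k] * b[n - k];
        }
        out[i] = acc;
    }
    return CONV_SUCCESS;
}
```

Computing only the needed samples avoids the work for the remaining outputs, which is the point of the partial variants.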

## **Functions**

```
riscv_status riscv_conv_partial_f32 (const float32_t *pSrcA, uint32_t srcALen, const float32_t *pSrcB, uint32_t srcBLen, float32_t *pDst, uint32_t firstIndex, uint32_t numPoints)
```

Partial convolution of floating-point sequences.

## **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed

# Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_fast_opt_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst, uint32_t firstIndex, uint32_t numPoints, q15_t *pScratch1, q15_t *pScratch2)
```

Partial convolution of Q15 sequences (fast version) for RISC-V Core with DSP enabled.

**Remark** Refer to riscv\_conv\_partial\_q15() for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.

## **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed
- pScratch1 [in] points to scratch buffer of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2
- pScratch2 [in] points to scratch buffer of size min(srcALen, srcBLen)

### Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_fast_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst, uint32_t firstIndex, uint32_t numPoints)
```

Partial convolution of Q15 sequences (fast version) for RISC-V Core with DSP enabled.

**Remark** Refer to riscv\_conv\_partial\_q15() for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.

# **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed

# Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_fast_q31 (const q31_t *pSrcA, uint32_t srcALen, const q31_t *pSrcB, uint32_t srcBLen, q31_t *pDst, uint32_t firstIndex, uint32_t numPoints)
```

Partial convolution of Q31 sequences (fast version) for RISC-V Core with DSP enabled.

**Remark** Refer to riscv\_conv\_partial\_q31() for a slower implementation of this function which uses a 64-bit accumulator to provide higher precision.

### **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen [in]** length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed

### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_opt_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst, uint32_t firstIndex, uint32_t numPoints, q15_t *pScratch1, q15_t *pScratch2)
```

Partial convolution of Q15 sequences.

**Remark** Refer to riscv\_conv\_partial\_fast\_q15() for a faster but less precise version of this function.

#### **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed
- pScratch1 [in] points to scratch buffer of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2.
- pScratch2 [in] points to scratch buffer of size min(srcALen, srcBLen).

### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_opt_q7 (const q7_t *pSrcA, uint32_t srcALen, const q7_t *pSrcB, uint32_t srcBLen, q7_t *pDst, uint32_t firstIndex, uint32_t numPoints, q15_t *pScratch1, q15_t *pScratch2)
```

Partial convolution of Q7 sequences.

#### **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed
- pScratch1 [in] points to scratch buffer (of type q15\_t) of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2.
- pScratch2 [in] points to scratch buffer (of type q15\_t) of size min(srcALen, srcBLen).

### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst, uint32_t firstIndex, uint32_t numPoints)
```

Partial convolution of Q15 sequences.

**Remark** Refer to riscv\_conv\_partial\_fast\_q15() for a faster but less precise version of this function.

**Remark** Refer to riscv\_conv\_partial\_opt\_q15() for a faster implementation of this function using scratch buffers.

# **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed

# **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_q31 (const q31_t *pSrcA, uint32_t srcALen, const q31_t *pSrcB, uint32_t srcBLen, q31_t *pDst, uint32_t firstIndex, uint32_t numPoints)
```

Partial convolution of Q31 sequences.

**Remark** Refer to riscv\_conv\_partial\_fast\_q31() for a faster but less precise implementation of this function.

# **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

```
riscv_status riscv_conv_partial_q7 (const q7_t *pSrcA, uint32_t srcALen, const q7_t *pSrcB, uint32_t srcBLen, q7_t *pDst, uint32_t firstIndex, uint32_t numPoints)
```

Partial convolution of Q7 sequences.

**Remark** Refer to riscv\_conv\_partial\_opt\_q7() for a faster implementation of this function.

# **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written
- firstIndex [in] is the first output sample to start with
- numPoints [in] is the number of output points to be computed

## **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: requested subset is not in the range [0 srcALen+srcBLen-2]

# Correlation

- void riscv\_correlate\_f32 (const float32\_t \*pSrcA, uint32\_t srcALen, const float32\_t \*pSrcB, uint32\_t srcBLen, float32\_t \*pDst)
- void riscv\_correlate\_fast\_opt\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, q15\_t \*pScratch)
- void riscv\_correlate\_fast\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst)
- void riscv\_correlate\_fast\_q31 (const q31\_t \*pSrcA, uint32\_t srcALen, const q31\_t \*pSrcB, uint32\_t srcBLen, q31\_t \*pDst)
- void riscv\_correlate\_opt\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst, q15\_t \*pScratch)
- void riscv\_correlate\_opt\_q7 (const q7\_t \*pSrcA, uint32\_t srcALen, const q7\_t \*pSrcB, uint32\_t srcBLen, q7\_t \*pDst, q15\_t \*pScratch1, q15\_t \*pScratch2)
- void riscv\_correlate\_q15 (const q15\_t \*pSrcA, uint32\_t srcALen, const q15\_t \*pSrcB, uint32\_t srcBLen, q15\_t \*pDst)
- void riscv\_correlate\_q31 (const q31\_t \*pSrcA, uint32\_t srcALen, const q31\_t \*pSrcB, uint32\_t srcBLen, q31\_t \*pDst)
- void riscv\_correlate\_q7 (const q7\_t \*pSrcA, uint32\_t srcALen, const q7\_t \*pSrcB, uint32\_t srcBLen, q7\_t \*pDst)

### group Corr

Correlation is a mathematical operation that is similar to convolution. As with convolution, correlation uses two signals to produce a third signal. The underlying algorithms in correlation and convolution are identical except that one of the inputs is flipped in convolution. Correlation is commonly used to measure the similarity between two signals. It has applications in pattern recognition, cryptanalysis, and searching. The NMSIS library provides correlation functions for Q7, Q15, Q31 and floating-point data types. Fast versions of the Q15 and Q31 functions are also provided.

**Note:** The pDst should be initialized to all zeros before being used.

**Algorithm** Let a[n] and b[n] be sequences of length srcALen and srcBLen samples respectively. The convolution of the two signals is denoted by

$$c[n] = a[n] * b[n]$$

In correlation, one of the signals is flipped in time

$$c[n] = a[n] * b[-n]$$

and this is mathematically defined as

$$c[n] = \sum_{k=0}^{\text{srcALen}} a[k]\, b[k-n]$$

The pSrcA points to the first input vector of length srcALen and pSrcB points to the second input vector of length srcBLen. The result c[n] is of length 2 \* max(srcALen, srcBLen) - 1 and is defined over the interval n=0, 1, 2, ..., (2 \* max(srcALen, srcBLen) - 2). The output result is written to pDst and the calling function must allocate 2 \* max(srcALen, srcBLen) - 1 words for the result.
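A direct, unoptimized version of the correlation sum can be sketched for the simple case where both inputs have the same length. `correlate_ref_f32` is a hypothetical name, and the mapping from output index to lag used here (index len - 1 holds zero lag, lower indices hold negative lags) is an assumption of this sketch; the handling of unequal lengths in the library is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Reference correlation for two inputs of equal length len.
 * out must hold 2*len - 1 values. */
void correlate_ref_f32(const float *a, const float *b, uint32_t len,
                       float *out)
{
    assert(len > 0u);
    for (uint32_t j = 0; j < 2u * len - 1u; j++) {
        /* Lag runs from -(len-1) to +(len-1). */
        int32_t lag = (int32_t)j - (int32_t)(len - 1u);
        float acc = 0.0f;
        for (uint32_t k = 0; k < len; k++) {
            int32_t m = (int32_t)k - lag;        /* index into b: k - n */
            if (m >= 0 && m < (int32_t)len) {
                acc += a[k] * b[m];
            }
        }
        out[j] = acc;
    }
}
```

Correlating a sequence with itself produces a symmetric result whose center value is the sum of squares, which is a convenient sanity check.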

**Fixed-Point Behavior** Correlation requires summing up a large number of intermediate products. As such, the Q7, Q15, and Q31 functions run a risk of overflow and saturation. Refer to the function specific documentation below for further details of the particular algorithm used.

**Fast Versions** Fast versions are provided for Q31 and Q15. They take fewer cycles than the standard Q31 and Q15 correlation functions, but they require the input signals to be scaled down to avoid intermediate overflows.

**Opt Versions** Opt versions are provided for Q15 and Q7. They use scratch buffers to allow further optimization, so they take fewer cycles but consume more memory (scratch memory) than the standard Q15 and Q7 versions of correlation.

# **Functions**

void riscv\_correlate\_f32 (const float32\_t \*pSrcA, uint32\_t srcALen, const float32\_t \*pSrcB, uint32\_t srcBLen, float32\_t \*pDst)

Correlation of floating-point sequences.

# **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.

```
void riscv_correlate_fast_opt_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst, q15_t *pScratch)
```

Correlation of Q15 sequences (fast version).

Scaling and Overflow Behavior This fast version uses a 32-bit accumulator with 2.30 format. The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. Scale down one of the inputs by 1/min(srcALen, srcBLen) to avoid overflow since a maximum of min(srcALen, srcBLen) number of additions is carried internally. The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.

**Remark** Refer to riscv\_correlate\_q15() for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.

## **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence.
- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.
- pScratch [in] points to scratch buffer of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2.

# Returns none

```
void riscv_correlate_fast_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst)
```

Correlation of Q15 sequences (fast version).

Scaling and Overflow Behavior This fast version uses a 32-bit accumulator with 2.30 format. The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. Scale down one of the inputs by 1/min(srcALen, srcBLen) to avoid overflow since a maximum of min(srcALen, srcBLen) number of additions is carried internally. The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.

**Remark** Refer to riscv\_correlate\_q15() for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.
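
The wrap-around described above can be reproduced with a small reference model. This is a sketch of the arithmetic only, not the library's implementation; `dot_q15_wrap32` and `sat_q15` are hypothetical names. It assumes a two's-complement target where converting an out-of-range unsigned value to `int32_t` wraps and where `>>` on a negative value is an arithmetic shift (both are implementation-defined in C).

```c
#include <stdint.h>

typedef int16_t q15_t;   /* stand-in for the library's q15_t */

/* Saturate a 32-bit value to the Q15 range. */
static q15_t sat_q15(int32_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (q15_t)v;
}

/* Sum of products with a wrapping 32-bit accumulator in 2.30 format.
 * The accumulator is unsigned so that wrap-around is well defined. */
static q15_t dot_q15_wrap32(const q15_t *a, const q15_t *b, uint32_t n)
{
    uint32_t acc = 0u;
    for (uint32_t i = 0u; i < n; i++) {
        acc += (uint32_t)((int32_t)a[i] * (int32_t)b[i]);  /* 1.15 x 1.15 -> 2.30 */
    }
    /* 2.30 -> right shift by 15 -> saturate to 1.15 */
    return sat_q15((int32_t)acc >> 15);
}
```

Two products of 0.5 x 0.5 sum to 0.5 as expected. Three products of 0x7FFF x 0x7FFF exceed the single guard bit, so the accumulator wraps and the model returns -32768 instead of a positive saturated value, which is why one input should be scaled down first.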

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence

- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.

#### Returns none

```
void riscv_correlate_fast_q31 (const q31_t *pSrcA, uint32_t srcALen, const q31_t *pSrcB, uint32_t srcBLen, q31_t *pDst)
```

Correlation of Q31 sequences (fast version).

**Scaling and Overflow Behavior** This function is optimized for speed at the expense of fixed-point precision and overflow protection. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.31 result.

The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 32 bits of each multiplication result. To avoid intermediate overflows, scale down one of the inputs by 1/min(srcALen, srcBLen), since a maximum of min(srcALen, srcBLen) additions is carried out internally.

**Remark** Refer to riscv\_correlate\_q31() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.
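
The precision loss of the fast Q31 path comes from keeping only the high 32 bits of each 64-bit product. The sketch below models that arithmetic as the text describes it; it is not the library's code, and `dot_q31_fast_model` is a hypothetical name. It assumes arithmetic right shift of negative values and two's-complement conversion, as on common compilers.

```c
#include <stdint.h>

typedef int32_t q31_t;   /* stand-in for the library's q31_t */

/* Sum of products where each 1.31 x 1.31 product is truncated to 2.30
 * (high 32 bits only) and accumulated in a wrapping 32-bit register. */
static q31_t dot_q31_fast_model(const q31_t *a, const q31_t *b, uint32_t n)
{
    uint32_t acc = 0u;   /* unsigned so that wrap-around is well defined */
    for (uint32_t i = 0u; i < n; i++) {
        int64_t prod = (int64_t)a[i] * (int64_t)b[i];   /* 2.62 */
        acc += (uint32_t)(prod >> 32);                  /* keep 2.30, drop low 32 bits */
    }
    /* 2.30 -> 1.31: double the value with saturation. */
    int64_t wide = 2 * (int64_t)(int32_t)acc;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (q31_t)wide;
}
```

With inputs 0x40000000 (0.5) the model returns 0x20000000 (0.25). With inputs 0x00008000 the product is 2^30, which lies entirely in the discarded low 32 bits, so the result is 0.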

### **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.

### Returns none

```
void riscv_correlate_opt_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst, q15_t *pScratch)
```

Correlation of Q15 sequences.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both inputs are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.

**Remark** Refer to riscv\_correlate\_fast\_q15() for a faster but less precise version of this function.
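
The 34.30 accumulation can be modelled in a few lines of C. This is a sketch of the number formats described above, not the library's implementation; `dot_q15_acc64` is a hypothetical name, and `>>` on a negative value is assumed to be an arithmetic shift.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* stand-in for the library's q15_t */

/* Sum of products accumulated in 64 bits (34.30), then reduced to 1.15. */
static q15_t dot_q15_acc64(const q15_t *a, const q15_t *b, uint32_t n)
{
    int64_t acc = 0;
    for (uint32_t i = 0u; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* 1.15 x 1.15 -> 2.30 */
    }
    int64_t shifted = acc >> 15;                /* 34.30 -> 34.15 */
    if (shifted > 32767)  return 32767;         /* saturate to 1.15 */
    if (shifted < -32768) return -32768;
    return (q15_t)shifted;
}
```

Three products of 0x7FFF x 0x7FFF, which would wrap a 32-bit accumulator, stay well inside the 64-bit range here and saturate cleanly to 32767.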

# **Parameters**

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.
- pScratch [in] points to scratch buffer of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2.

#### Returns none

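void riscv\_correlate\_opt\_q7 (const q7\_t \*pSrcA, uint32\_t srcALen, const q7\_t \*pSrcB, uint32\_t srcBLen, q7\_t \*pDst, q15\_t \*pScratch1, q15\_t \*pScratch2)

Correlation of Q7 sequences.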

Scaling and Overflow Behavior The function is implemented using a 32-bit internal accumulator. Both the inputs are represented in 1.7 format and multiplications yield a 2.14 result. The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format. This approach provides 17 guard bits and there is no risk of overflow as long as max(srcALen, srcBLen) < 131072. The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and then saturated to 1.7 format.

### **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.
- pScratch1 [in] points to scratch buffer (of type q15\_t) of size max(srcALen, srcBLen) + 2\*min(srcALen, srcBLen) - 2.
- pScratch2 [in] points to scratch buffer (of type q15\_t) of size min(srcALen, srcBLen).

# Returns none

```
void riscv_correlate_q15 (const q15_t *pSrcA, uint32_t srcALen, const q15_t *pSrcB, uint32_t srcBLen, q15_t *pDst)
```

Correlation of Q15 sequences.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both inputs are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.

**Remark** Refer to riscv\_correlate\_fast\_q15() for a faster but less precise version of this function.

**Remark** Refer to riscv\_correlate\_opt\_q15() for a faster implementation of this function using scratch buffers.

- pSrcA [in] points to the first input sequence
- srcALen [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence

- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.

#### Returns none

```
void riscv_correlate_q31 (const q31_t *pSrcA, uint32_t srcALen, const q31_t *pSrcB, uint32_t srcBLen, q31_t *pDst)
```

Correlation of Q31 sequences.

Scaling and Overflow Behavior The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. Scale down one of the inputs by 1/min(srcALen, srcBLen) to avoid overflows, since a maximum of min(srcALen, srcBLen) additions is carried out internally. The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.

**Remark** Refer to riscv\_correlate\_fast\_q31() for a faster but less precise implementation of this function.

## **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- **srcBLen** [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.

# Returns none

```
void riscv_correlate_q7 (const q7_t *pSrcA, uint32_t srcALen, const q7_t *pSrcB, uint32_t srcBLen, q7_t *pDst)
```

Correlation of Q7 sequences.

Scaling and Overflow Behavior The function is implemented using a 32-bit internal accumulator. Both the inputs are represented in 1.7 format and multiplications yield a 2.14 result. The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format. This approach provides 17 guard bits and there is no risk of overflow as long as max(srcALen, srcBLen) < 131072. The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and saturated to 1.7 format.

**Remark** Refer to riscv\_correlate\_opt\_q7() for a faster implementation of this function.
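
The 131072 limit follows from the formats: the largest product magnitude is 128 x 128 = 2^14, and 2^17 such products would reach 2^31, one more than a signed 32-bit accumulator can hold. A minimal model of the Q7 arithmetic, with the hypothetical name `dot_q7_acc32` and assuming arithmetic right shift of negative values:

```c
#include <stdint.h>

typedef int8_t q7_t;   /* stand-in for the library's q7_t */

/* Sum of products accumulated in 32 bits (18.14), then reduced to 1.7. */
static q7_t dot_q7_acc32(const q7_t *a, const q7_t *b, uint32_t n)
{
    int32_t acc = 0;
    for (uint32_t i = 0u; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* 1.7 x 1.7 -> 2.14 */
    }
    int32_t shifted = acc >> 7;                 /* 18.14 -> 18.7 */
    if (shifted > 127)  return 127;             /* saturate to 1.7 */
    if (shifted < -128) return -128;
    return (q7_t)shifted;
}
```

Two products of 0.5 x 0.5 (64 x 64) give 64, that is 0.5. Two products of -1 x -1 give 256 after the shift and saturate to 127.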

### **Parameters**

- pSrcA [in] points to the first input sequence
- **srcALen** [in] length of the first input sequence
- pSrcB [in] points to the second input sequence
- srcBLen [in] length of the second input sequence
- pDst [out] points to the location where the output result is written. Length 2 \* max(srcALen, srcBLen) - 1.

Returns none

# Finite Impulse Response (FIR) Decimator

void **riscv\_fir\_decimate\_f32** (**const** riscv\_fir\_decimate\_instance\_f32 \*S, **const** float32\_t \*pSrc, float32\_t \*pDst, uint32\_t blockSize)

void riscv\_fir\_decimate\_fast\_q15 (const riscv\_fir\_decimate\_instance\_q15 \*S, const q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void riscv\_fir\_decimate\_fast\_q31 (const riscv\_fir\_decimate\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

riscv\_status riscv\_fir\_decimate\_init\_f32 (riscv\_fir\_decimate\_instance\_f32 \*S, uint16\_t numTaps, uint8\_t M, const float32\_t \*pCoeffs, float32\_t \*pState, uint32\_t blockSize)

riscv\_status riscv\_fir\_decimate\_init\_q15 (riscv\_fir\_decimate\_instance\_q15 \*S, uint16\_t numTaps, uint8\_t M, const q15\_t \*pCoeffs, q15\_t \*pState, uint32\_t blockSize)

riscv\_status riscv\_fir\_decimate\_init\_q31 (riscv\_fir\_decimate\_instance\_q31 \*S, uint16\_t numTaps, uint8\_t M, const q31\_t \*pCoeffs, q31\_t \*pState, uint32\_t blockSize)

void riscv\_fir\_decimate\_q15 (const riscv\_fir\_decimate\_instance\_q15 \*S, const q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void riscv\_fir\_decimate\_q31 (const riscv\_fir\_decimate\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

# group FIR\_decimate

These functions combine an FIR filter together with a decimator. They are used in multirate systems for reducing the sample rate of a signal without introducing aliasing distortion. Conceptually, the functions are equivalent to the block diagram below:

When decimating by a factor of M, the signal should be prefiltered by a lowpass filter with a normalized cutoff frequency of 1/M in order to prevent aliasing distortion. The user of the function is responsible for providing the filter coefficients.

The FIR decimator functions provided in the NMSIS DSP Library combine the FIR filter and the decimator in an efficient manner. Instead of calculating all of the FIR filter outputs and discarding M-1 out of every M, only the samples output by the decimator are computed. The functions operate on blocks of input and output data. pSrc points to an array of blockSize input values and pDst points to an array of blockSize/M output values. In order to have an integer number of output samples blockSize must always be a multiple of the decimation factor M.
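
The saving described above can be illustrated with a floating-point reference model that evaluates the FIR sum only at every M-th input position. This is a sketch, not the library's implementation: `fir_decimate_ref` is a hypothetical name, the coefficients are passed in natural order b[0]..b[numTaps-1], samples before the block are treated as zero, and the choice of keeping the outputs aligned with input indices 0, M, 2M, ... is an assumption of the sketch.

```c
#include <stdint.h>

/* Reference FIR decimator for a single block with zero initial history.
 * Writes blockSize / M outputs; blockSize must be a multiple of M. */
static void fir_decimate_ref(const float *b, uint32_t numTaps, uint32_t M,
                             const float *x, float *y, uint32_t blockSize)
{
    for (uint32_t k = 0u; k < blockSize / M; k++) {
        uint32_t n = k * M;          /* only every M-th output is computed */
        float acc = 0.0f;
        for (uint32_t t = 0u; t < numTaps && t <= n; t++) {
            acc += b[t] * x[n - t];  /* y[n] = sum of b[t] * x[n - t] */
        }
        y[k] = acc;
    }
}
```

With b = {1, 1}, M = 2 and x = {1, 2, 3, 4}, the full-rate filter output would be {1, 3, 5, 7}; the model computes only {1, 5}, so half of the multiply-accumulate work is skipped.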

The library provides separate functions for Q15, Q31 and floating-point data types.

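**Algorithm:** The FIR portion of the algorithm uses the standard form filter:

$$y[n] = b[0]\,x[n] + b[1]\,x[n-1] + b[2]\,x[n-2] + \dots + b[\text{numTaps}-1]\,x[n-\text{numTaps}+1]$$

where b[n] are the filter coefficients.

pCoeffs points to a coefficient array of size numTaps. Coefficients are stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to a state array of size numTaps + blockSize - 1. Samples in the state buffer are stored oldest first: the numTaps - 1 most recent samples of the previous block, followed by the blockSize samples of the current block.

The state variables are updated after each block of data is processed; the coefficients are untouched.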

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances while state variable array should be allocated separately. There are separate instance structure declarations for each of the 3 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.
- Checks to make sure that the size of the input is a multiple of the decimation factor.

To do this manually without calling the init function, assign the following subfields of the instance structure: numTaps, pCoeffs, M (decimation factor), pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. A static initializer for each of the 3 data type filter instance structures supplies M, the decimation factor; numTaps, the number of filter coefficients in the filter; pCoeffs, the address of the coefficient buffer; and pState, the address of the state buffer. Be sure to set the values in the state buffer to zeros when doing static initialization.
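
A static initialization along these lines might look as follows. To keep the sketch self-contained it uses a stand-in type, `fir_decimate_instance_model`, which mirrors only the four subfields named in the text; in application code the real type would be riscv\_fir\_decimate\_instance\_f32 from the library header. The sizes and coefficient values are arbitrary, and designated initializers are used so that the sketch does not depend on the field order.

```c
#include <stdint.h>

/* Stand-in for the library's instance structure (assumption: it carries
 * exactly the subfields listed in the text). */
typedef struct {
    uint8_t      M;        /* decimation factor */
    uint16_t     numTaps;  /* number of filter coefficients */
    const float *pCoeffs;  /* coefficient buffer */
    float       *pState;   /* state buffer, numTaps + blockSize - 1 samples */
} fir_decimate_instance_model;

#define DECIM_NUM_TAPS   4u
#define DECIM_BLOCK_SIZE 8u   /* must be a multiple of DECIM_M */
#define DECIM_M          2u

static const float decim_coeffs[DECIM_NUM_TAPS] = {0.25f, 0.25f, 0.25f, 0.25f};

/* Objects with static storage duration are zero-initialized, which
 * satisfies the requirement that the state buffer starts at zero. */
static float decim_state[DECIM_NUM_TAPS + DECIM_BLOCK_SIZE - 1u];

/* const instance: can be placed in a read-only data section. */
static const fir_decimate_instance_model decim_S = {
    .M       = DECIM_M,
    .numTaps = DECIM_NUM_TAPS,
    .pCoeffs = decim_coeffs,
    .pState  = decim_state,
};
```

Only the instance structure is const; the state buffer it points to must remain writable because it is updated on every processing call.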

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the FIR decimate filter functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

# **Functions**

```
void riscv_fir_decimate_f32 (const riscv_fir_decimate_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
```

Processing function for floating-point FIR decimator.

# **Parameters**

- **S** [in] points to an instance of the floating-point FIR decimator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

## **Returns** none

```
void riscv_fir_decimate_fast_q15 (const riscv_fir_decimate_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 FIR decimator (fast variant) for RISC-V Core with DSP enabled.

Scaling and Overflow Behavior This fast version uses a 32-bit accumulator with 2.30 format. The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around and distorts the result. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits (log2 is read as log to the base 2). The 2.30 accumulator is then truncated to 2.15 format and saturated to yield the 1.15 result.

**Remark** Refer to riscv\_fir\_decimate\_q15() for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion. Both the slow and the fast versions use the same instance structure. Use function riscv\_fir\_decimate\_init\_q15() to initialize the filter structure.
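
Scaling the input down by log2(numTaps) bits can be sketched as below. The helper names are hypothetical, and rounding the shift up to the next whole bit when numTaps is not a power of two is an assumption of this sketch, chosen so that the scaling is never smaller than the text requires.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* stand-in for the library's q15_t */

/* Smallest number of bits such that 2^bits >= numTaps. */
static uint32_t guard_shift_bits(uint16_t numTaps)
{
    uint32_t bits = 0u;
    while ((1u << bits) < numTaps) {
        bits++;
    }
    return bits;   /* at most 16 for a uint16_t tap count */
}

/* Scale a Q15 buffer down in place by 2^bits (bits <= 16).
 * Division is used so negative samples round toward zero portably. */
static void scale_down_q15(q15_t *buf, uint32_t len, uint32_t bits)
{
    for (uint32_t i = 0u; i < len; i++) {
        buf[i] = (q15_t)((int32_t)buf[i] / (int32_t)(1u << bits));
    }
}
```

For a 5-tap filter the sketch shifts by 3 bits, for a 64-tap filter by 6 bits. The cost is a matching loss of input resolution, which is the trade-off the fast variant makes.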

#### **Parameters**

- **S** [in] points to an instance of the Q15 FIR decimator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of input samples to process per call

### Returns none

```
void riscv_fir_decimate_fast_q31 (const riscv_fir_decimate_instance_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 FIR decimator (fast variant) for RISC-V Core with DSP enabled.

Scaling and Overflow Behavior This function is optimized for speed at the expense of fixed-point precision and overflow protection. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are added to a 2.30 accumulator. Finally, the accumulator is saturated and converted to a 1.31 result. The fast version has the same overflow behavior as the standard version and provides less precision since it discards the low 32 bits of each multiplication result. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits (where log2 is read as log to the base 2).

**Remark** Refer to riscv\_fir\_decimate\_q31() for a slower implementation of this function which uses a 64-bit accumulator to provide higher precision. Both the slow and the fast versions use the same instance structure. Use function riscv\_fir\_decimate\_init\_q31() to initialize the filter structure.

# **Parameters**

- S [in] points to an instance of the Q31 FIR decimator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

# Returns none

```
riscv_status riscv_fir_decimate_init_f32 (riscv_fir_decimate_instance_f32 *S, uint16_t numTaps, uint8_t M, const float32_t *pCoeffs, float32_t *pState, uint32_t blockSize)
```

Initialization function for the floating-point FIR decimator.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to the array of state variables. pState is of length numTaps+blockSize-1 words where blockSize is the number of input samples passed to riscv\_fir\_decimate\_f32(). M is the decimation factor.

#### **Parameters**

- S [inout] points to an instance of the floating-point FIR decimator structure
- numTaps [in] number of coefficients in the filter
- **M** [in] decimation factor
- pCoeffs [in] points to the filter coefficients
- pState [in] points to the state buffer
- blockSize [in] number of input samples to process per call

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_LENGTH\_ERROR: blockSize is not a multiple of M
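
The documented behavior of the init functions (set the fields, zero the state buffer, reject a blockSize that is not a multiple of M) can be modelled as follows. All names here are hypothetical stand-ins rather than library symbols, the order of the steps is an assumption, and the guard against M == 0 is an addition of this sketch.

```c
#include <stdint.h>

typedef enum { MODEL_SUCCESS, MODEL_LENGTH_ERROR } model_status;

typedef struct {
    uint8_t      M;
    uint16_t     numTaps;
    const float *pCoeffs;
    float       *pState;
} fir_decimate_model;

/* Model of a decimator init call for float data. */
static model_status fir_decimate_init_model(fir_decimate_model *S,
                                            uint16_t numTaps, uint8_t M,
                                            const float *pCoeffs, float *pState,
                                            uint32_t blockSize)
{
    /* The block must split into a whole number of output samples. */
    if (M == 0u || (blockSize % M) != 0u) {
        return MODEL_LENGTH_ERROR;
    }
    S->M       = M;
    S->numTaps = numTaps;
    S->pCoeffs = pCoeffs;
    S->pState  = pState;
    /* Clear all numTaps + blockSize - 1 state samples. */
    for (uint32_t i = 0u; i < (uint32_t)numTaps + blockSize - 1u; i++) {
        pState[i] = 0.0f;
    }
    return MODEL_SUCCESS;
}
```

Checking the returned status before the first processing call catches a mismatched blockSize early, instead of producing a truncated output block later.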

```
riscv_status riscv_fir_decimate_init_q15 (riscv_fir_decimate_instance_q15 *S, uint16_t numTaps, uint8_t M, const q15_t *pCoeffs, q15_t *pState, uint32_t blockSize)
```

Initialization function for the Q15 FIR decimator.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to the array of state variables. pState is of length numTaps+blockSize-1 words where blockSize is the number of input samples passed to riscv\_fir\_decimate\_q15(). M is the decimation factor.

#### **Parameters**

- **S** [inout] points to an instance of the Q15 FIR decimator structure
- numTaps [in] number of coefficients in the filter
- **M** [in] decimation factor
- pCoeffs [in] points to the filter coefficients
- pState [in] points to the state buffer
- blockSize [in] number of input samples to process

# **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_LENGTH\_ERROR: blockSize is not a multiple of M

```
riscv_status riscv_fir_decimate_init_q31 (riscv_fir_decimate_instance_q31 *S, uint16_t numTaps, uint8_t M, const q31_t *pCoeffs, q31_t *pState, uint32_t blockSize)
```

Initialization function for the Q31 FIR decimator.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to the array of state variables. pState is of length numTaps+blockSize-1 words where blockSize is the number of input samples passed to riscv\_fir\_decimate\_q31(). M is the decimation factor.

### **Parameters**

- S [inout] points to an instance of the Q31 FIR decimator structure
- numTaps [in] number of coefficients in the filter
- M [in] decimation factor
- pCoeffs [in] points to the filter coefficients
- pState [in] points to the state buffer
- blockSize [in] number of input samples to process

Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_LENGTH\_ERROR: blockSize is not a multiple of M

```
void riscv_fir_decimate_q15 (const riscv_fir_decimate_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 FIR decimator.

Scaling and Overflow Behavior The function is implemented using a 64-bit internal accumulator. Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. After all additions have been performed, the accumulator is truncated to 34.15 format by discarding low 15 bits. Lastly, the accumulator is saturated to yield a result in 1.15 format.

**Remark** Refer to riscv\_fir\_decimate\_fast\_q15() for a faster but less precise implementation of this function.

#### **Parameters**

- S [in] points to an instance of the Q15 FIR decimator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of input samples to process per call

Returns none

```
void riscv_fir_decimate_q31 (const riscv_fir_decimate_instance_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 FIR decimator.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around rather than clipping. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits (where log2 is read as log to the base 2). After all multiply-accumulates are performed, the 2.62 accumulator is truncated to 1.32 format and then saturated to 1.31 format.

**Remark** Refer to riscv\_fir\_decimate\_fast\_q31() for a faster but less precise implementation of this function.

# **Parameters**

- **S** [in] points to an instance of the Q31 FIR decimator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

Returns none

# Finite Impulse Response (FIR) Filters

```
void riscv_fir_f32 (const riscv_fir_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
```

void riscv\_fir\_fast\_q15 (const riscv\_fir\_instance\_q15 \*S, const q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void riscv\_fir\_fast\_q31 (const riscv\_fir\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

void **riscv\_fir\_init\_f32** (riscv\_fir\_instance\_f32 \*S, uint16\_t numTaps, **const** float32\_t \*pCoeffs, float32\_t \*pState, uint32\_t blockSize)

riscv\_status riscv\_fir\_init\_q15 (riscv\_fir\_instance\_q15 \*S, uint16\_t numTaps, const q15\_t \*pCoeffs, q15\_t \*pState, uint32\_t blockSize)

void **riscv\_fir\_init\_q31** (riscv\_fir\_instance\_q31 \*S, uint16\_t numTaps, **const** q31\_t \*pCoeffs, q31\_t \*pState, uint32\_t blockSize)

void riscv\_fir\_init\_q7 (riscv\_fir\_instance\_q7 \*S, uint16\_t numTaps, const q7\_t \*pCoeffs, q7\_t \*pState, uint32\_t blockSize)

void **riscv\_fir\_q15** (**const** riscv\_fir\_instance\_q15 \*S, **const** q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void **riscv\_fir\_q31** (**const** riscv\_fir\_instance\_q31 \*S, **const** q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

void **riscv\_fir\_q7** (**const** riscv\_fir\_instance\_q7 \*S, **const** q7\_t \*pSrc, q7\_t \*pDst, uint32\_t blockSize)

# group FIR

This set of functions implements Finite Impulse Response (FIR) filters for Q7, Q15, Q31, and floating-point data types. Fast versions of Q15 and Q31 are also provided. The functions operate on blocks of input and output data and each call to the function processes blockSize samples through the filter. pSrc and pDst point to input and output arrays containing blockSize values.

The array length L must be a multiple of x. L = x \* a:

- x is 4 for f32
- x is 4 for q31
- x is 4 for f16 (so managed like the f32 version and not like the q15 one)
- x is 8 for q15
- x is 16 for q7

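**Algorithm** The FIR filter algorithm is based upon a sequence of multiply-accumulate (MAC) operations. Each filter coefficient b[n] is multiplied by a state variable which equals a previous input sample x[n]:

$$y[n] = b[0]\,x[n] + b[1]\,x[n-1] + b[2]\,x[n-2] + \dots + b[\text{numTaps}-1]\,x[n-\text{numTaps}+1]$$

pCoeffs points to a coefficient array of size numTaps. Coefficients are stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to a state array of size numTaps + blockSize - 1. Samples in the state buffer are stored oldest first: the numTaps - 1 most recent samples of the previous block, followed by the blockSize samples of the current block.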

Note that the length of the state buffer exceeds the length of the coefficient array by blockSize-1. The increased state buffer length allows circular addressing, which is traditionally used in the FIR filters, to be avoided and yields a significant speed improvement. The state variables are updated after each block of data is processed; the coefficients are untouched.
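
The linear state buffer can be illustrated with a floating-point reference model. This is a sketch of the layout described above, not the library's implementation; `fir_model` and `fir_model_process` are hypothetical names, and the stand-in structure carries only the three subfields the text mentions.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t     numTaps;
    float       *pState;   /* numTaps + blockSize - 1 samples, zeroed at start */
    const float *pCoeffs;  /* time reversed: {b[numTaps-1], ..., b[1], b[0]} */
} fir_model;

static void fir_model_process(const fir_model *S, const float *pSrc,
                              float *pDst, uint32_t blockSize)
{
    uint32_t hist = (uint32_t)S->numTaps - 1u;

    /* Append the new block after the numTaps - 1 retained samples. */
    memcpy(&S->pState[hist], pSrc, blockSize * sizeof(float));

    for (uint32_t n = 0u; n < blockSize; n++) {
        float acc = 0.0f;
        /* Both arrays are walked forward: the oldest sample meets
         * b[numTaps-1] and the newest sample meets b[0]. */
        for (uint32_t k = 0u; k < S->numTaps; k++) {
            acc += S->pCoeffs[k] * S->pState[n + k];
        }
        pDst[n] = acc;
    }

    /* Keep the last numTaps - 1 samples as history for the next call. */
    memmove(S->pState, &S->pState[blockSize], hist * sizeof(float));
}
```

Because the history and the new block sit contiguously, the inner loop needs no index wrapping; the price is one copy of numTaps - 1 samples per block.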

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances while state variable arrays cannot be shared. There are separate instance structure declarations for each of the 4 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following subfields of the instance structure: numTaps, pCoeffs, pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros before static initialization. A static initializer for each of the 4 data type filter instance structures supplies numTaps, the number of filter coefficients in the filter; pState, the address of the state buffer; and pCoeffs, the address of the coefficient buffer.

**Initialization of Helium version** For Helium version the array of coefficients must be padded with zero to contain a full number of lanes.

The additional coefficients (x \* a - numTaps) must be set to 0. numTaps is still set to its right value in the init function. It means that the implementation may require to read more coefficients due to the vectorization and to avoid having to manage too many different cases in the code.
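
The padded coefficient count x \* a is numTaps rounded up to the next multiple of x. A minimal sketch with hypothetical helper names; placing the zeros at the end of the array is an assumption, since the text only states that the additional coefficients must be 0.

```c
#include <stdint.h>

/* Round numTaps up to a multiple of the lane count x (4, 8 or 16). */
static uint32_t padded_tap_count(uint32_t numTaps, uint32_t lanes)
{
    return ((numTaps + lanes - 1u) / lanes) * lanes;
}

/* Copy numTaps coefficients into dst and zero the remaining
 * paddedLen - numTaps entries (assumed to follow the real taps). */
static void pad_coeffs_f32(const float *src, uint32_t numTaps,
                           float *dst, uint32_t paddedLen)
{
    for (uint32_t i = 0u; i < paddedLen; i++) {
        dst[i] = (i < numTaps) ? src[i] : 0.0f;
    }
}
```

A 5-tap f32 filter therefore needs an 8-entry coefficient array, while numTaps passed to the init function stays 5.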

**Helium state buffer** The state buffer must contain some additional temporary data used during the computation but which is not the state of the FIR. The first A samples are temporary data. The remaining samples are the state of the FIR filter.

So the state buffer has size numTaps + A + blockSize - 1:

- A is blockSize for f32
- A is 8\*ceil(blockSize/8) for f16
- A is 8\*ceil(blockSize/4) for q31
- A is 0 for other datatypes (q15 and q7)
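
The state buffer size for each data type follows directly from the list above. A sketch with hypothetical names (`fir_dtype`, `helium_state_len`), using integer ceiling division:

```c
#include <stdint.h>

typedef enum { FIR_F32, FIR_F16, FIR_Q31, FIR_Q15, FIR_Q7 } fir_dtype;

static uint32_t ceil_div_u32(uint32_t a, uint32_t b)
{
    return (a + b - 1u) / b;
}

/* State buffer length in samples: numTaps + A + blockSize - 1. */
static uint32_t helium_state_len(fir_dtype t, uint32_t numTaps, uint32_t blockSize)
{
    uint32_t A;
    switch (t) {
    case FIR_F32: A = blockSize;                         break;
    case FIR_F16: A = 8u * ceil_div_u32(blockSize, 8u);  break;
    case FIR_Q31: A = 8u * ceil_div_u32(blockSize, 4u);  break;
    default:      A = 0u;                                break;  /* q15, q7 */
    }
    return numTaps + A + blockSize - 1u;
}
```

For numTaps = 10 and blockSize = 6, this gives 31 samples for q31, 23 for f16 and 15 for q15.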

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the FIR filter functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

## **Functions**

```
void riscv_fir_f32 (const riscv_fir_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
```

Processing function for the floating-point FIR filter.

# **Parameters**

- S [in] points to an instance of the floating-point FIR filter structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

Returns none

```
void riscv_fir_fast_q15 (const riscv_fir_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 FIR filter (fast version).

**Scaling and Overflow Behavior** This fast version uses a 32-bit accumulator with 2.30 format. The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around and distorts the result. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits. The 2.30 accumulator is then truncated to 2.15 format and saturated to yield the 1.15 result.

**Remark** Refer to riscv\_fir\_q15() for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion. Both the slow and the fast versions use the same instance structure. Use function riscv\_fir\_init\_q15() to initialize the filter structure.

# **Parameters**

- **S** [in] points to an instance of the Q15 FIR filter structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

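void riscv\_fir\_fast\_q31 (const riscv\_fir\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

Processing function for the Q31 FIR filter (fast version).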

Scaling and Overflow Behavior This function is optimized for speed at the expense of fixed-point precision and overflow protection. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are added to a 2.30 accumulator. Finally, the accumulator is saturated and converted to a 1.31 result. The fast version has the same overflow behavior as the standard version and provides less precision since it discards the low 32 bits of each multiplication result. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits.

**Remark** Refer to riscv\_fir\_q31() for a slower implementation of this function which uses a 64-bit accumulator to provide higher precision. Both the slow and the fast versions use the same instance structure. Use function riscv\_fir\_init\_q31() to initialize the filter structure.

## **Parameters**

- **S** [in] points to an instance of the Q31 FIR filter structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

# Returns none

void **riscv\_fir\_init\_f32** (riscv\_fir\_instance\_f32 \*S, uint16\_t numTaps, **const** float32\_t \*pCoeffs, float32\_t \*pState, uint32\_t blockSize)

Initialization function for the floating-point FIR filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to the array of state variables and some working memory for the Helium version. pState is of length numTaps+blockSize-1 samples (except for Helium - see below), where blockSize is the number of input samples processed by each call to riscv\_fir\_f32().

**Initialization of Helium version** For the Helium version the length of the coefficient array must be a multiple of 4 (4a) even if fewer than 4a coefficients are defined in the FIR. The additional coefficients (4a - numTaps) must be set to 0. numTaps is still set to its right value in the init function. This means that the implementation may need to read more coefficients, due to the vectorization and to avoid having to manage too many different cases in the code.

**Helium state buffer** The state buffer must contain some additional temporary data used during the computation but which is not the state of the FIR. The first blockSize samples are temporary data. The remaining samples are the state of the FIR filter. So the state buffer has size numTaps + 2 \* blockSize - 1.

- S [inout] points to an instance of the floating-point FIR filter structure
- numTaps [in] number of filter coefficients in the filter
- pCoeffs [in] points to the filter coefficients buffer
- pState [in] points to the state buffer
- blockSize [in] number of samples processed per call

```
riscv_status riscv_fir_init_q15 (riscv_fir_instance_q15 *S, uint16_t numTaps, const q15_t *pCoeffs, q15_t *pState, uint32_t blockSize)
```

Initialization function for the Q15 FIR filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

Note that numTaps must be even and greater than or equal to 4. To implement an odd length filter simply increase numTaps by 1 and set the last coefficient to zero. For example, to implement a filter with numTaps=3, set numTaps=4 and add one zero coefficient. Similarly, to implement a two point filter, set numTaps=4 and add two zero coefficients.

pState points to the array of state variables. pState is of length numTaps+blockSize when running on a RISC-V Core with DSP enabled, and of length numTaps+blockSize-1 when running on a RISC-V Core without DSP, where blockSize is the number of input samples processed by each call to riscv\_fir\_q15().

**Initialization of Helium version** For the Helium version, the length of the coefficient array must be a multiple of 8 (8a), even if fewer than 8a coefficients are defined in the FIR. The additional coefficients (8a - numTaps) must be set to 0. numTaps is still set to its actual value in the init function. This is because the vectorized implementation may read more coefficients than numTaps, which avoids having to handle many different cases in the code.

#### **Parameters**

- **S** [inout] points to an instance of the Q15 FIR filter structure.
- numTaps [in] number of filter coefficients in the filter. Must be even and greater than or equal to 4.
- pCoeffs [in] points to the filter coefficients buffer.
- pState [in] points to the state buffer.
- blockSize [in] number of samples processed per call.

# **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: numTaps is odd or less than 4
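The even-length rule for the Q15 filter can be handled with a small padding helper. The following is a minimal sketch and not part of NMSIS: the helper names fir\_q15\_padded\_taps and fir\_q15\_pad\_coeffs are hypothetical, and q15\_t is assumed to be a 16-bit signed integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int16_t q15_t; /* assumption: mirrors the library's Q15 sample type */

/* Hypothetical helper: smallest even tap count >= 4 that holds n_taps. */
static uint16_t fir_q15_padded_taps(uint16_t n_taps)
{
    uint16_t padded;

    assert(n_taps > 0 && n_taps < UINT16_MAX);
    padded = (n_taps < 4u) ? 4u : n_taps;
    if (padded & 1u) {
        padded++; /* odd length: make room for one zero coefficient */
    }
    return padded;
}

/* Hypothetical helper: copies n_taps coefficients into dst and zero-fills
 * the tail. dst must have room for fir_q15_padded_taps(n_taps) entries.
 * Returns the padded tap count. */
static uint16_t fir_q15_pad_coeffs(const q15_t *src, uint16_t n_taps, q15_t *dst)
{
    uint16_t padded = fir_q15_padded_taps(n_taps);

    assert(src != NULL && dst != NULL);
    memcpy(dst, src, (size_t)n_taps * sizeof(q15_t));
    memset(dst + n_taps, 0, (size_t)(padded - n_taps) * sizeof(q15_t));
    return padded;
}
```

The returned count would then be passed as numTaps to riscv\_fir\_init\_q15(), together with the padded coefficient array.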

```
void riscv_fir_init_q31 (riscv_fir_instance_q31 *S, uint16_t numTaps, const q31_t *pCoeffs, q31_t *pState, uint32_t blockSize)
```

Initialization function for the Q31 FIR filter.

**Details** pCoeffs points to the array of filter coefficients stored in time-reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to the array of state variables. pState is of length numTaps+blockSize-1 samples (except for Helium - see below), where blockSize is the number of input samples processed by each call to riscv\_fir\_q31().

**Initialization of Helium version** For the Helium version, the length of the coefficient array must be a multiple of 4 (4a), even if fewer than 4a coefficients are defined in the FIR. The additional coefficients (4a - numTaps) must be set to 0. numTaps is still set to its actual value in the init function. This is because the vectorized implementation may read more coefficients than numTaps, which avoids having to handle many different cases in the code.

**Helium state buffer** The state buffer must contain some additional temporary data used during the computation but which is not the state of the FIR. The first 2\*4\*ceil(blockSize/4) samples are temporary data. The remaining samples are the state of the FIR filter. So the state buffer has size numTaps + 8\*ceil(blockSize/4) + blockSize - 1.

## **Parameters**

- S [inout] points to an instance of the Q31 FIR filter structure
- numTaps [in] number of filter coefficients in the filter
- pCoeffs [in] points to the filter coefficients buffer
- pState [in] points to the state buffer
- blockSize [in] number of samples processed

## Returns none

```
void riscv_fir_init_q7 (riscv_fir_instance_q7 *S, uint16_t numTaps, const q7_t *pCoeffs, q7_t *pState, uint32_t blockSize)
```

Initialization function for the Q7 FIR filter.

**Details** pCoeffs points to the array of filter coefficients stored in time-reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

pState points to the array of state variables. pState is of length numTaps+blockSize-1 samples, where blockSize is the number of input samples processed by each call to riscv\_fir\_q7().

**Initialization of Helium version** For the Helium version, the length of the coefficient array must be a multiple of 16 (16a), even if fewer than 16a coefficients are defined in the FIR. The additional coefficients (16a - numTaps) must be set to 0. numTaps is still set to its actual value in the init function. This is because the vectorized implementation may read more coefficients than numTaps, which avoids having to handle many different cases in the code.

#### **Parameters**

- S [inout] points to an instance of the Q7 FIR filter structure
- numTaps [in] number of filter coefficients in the filter
- pCoeffs [in] points to the filter coefficients buffer
- pState [in] points to the state buffer
- blockSize [in] number of samples processed

# Returns none

```
void riscv_fir_q15 (const riscv_fir_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 FIR filter.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. After all additions have been performed, the accumulator is truncated to 34.15 format by discarding the low 15 bits. Lastly, the accumulator is saturated to yield a result in 1.15 format.

**Remark** Refer to riscv\_fir\_fast\_q15() for a faster but less precise implementation of this function.

- S [in] points to an instance of the Q15 FIR filter structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

Returns none
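The Q15 accumulator path described above (2.30 products, 34.30 accumulation, truncation to 34.15, saturation to 1.15) can be sketched for a single output sample as follows. This is an illustrative reference and not the library's implementation: the function name fir\_q15\_one\_sample is hypothetical, and the code assumes that right-shifting a negative signed value is an arithmetic shift, which holds on common compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t; /* assumption: mirrors the library's Q15 sample type */

/* Hypothetical reference: one FIR output sample from n coefficient/sample
 * pairs that are already aligned with each other. */
static q15_t fir_q15_one_sample(const q15_t *coeffs, const q15_t *samples, uint16_t n)
{
    int64_t acc = 0; /* 34.30 accumulator */

    assert(coeffs != 0 && samples != 0);
    for (uint16_t i = 0; i < n; i++) {
        /* 1.15 x 1.15 -> 2.30; the product always fits in 32 bits */
        acc += (int32_t)coeffs[i] * (int32_t)samples[i];
    }

    acc >>= 15; /* 34.30 -> 34.15 by discarding the low 15 bits */

    /* Saturate 34.15 to 1.15 */
    if (acc > INT16_MAX) {
        acc = INT16_MAX;
    } else if (acc < INT16_MIN) {
        acc = INT16_MIN;
    }
    return (q15_t)acc;
}
```

With 0.5 x 0.5 (16384 x 16384) the result is 8192, that is 0.25 in 1.15 format; sums that exceed the 1.15 range clamp to 32767 or -32768 instead of wrapping.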

```
void riscv_fir_q31 (const riscv_fir_instance_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 FIR filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows, it wraps around rather than clipping. To avoid overflows completely, the input signal must be scaled down by log2(numTaps) bits. After all multiply-accumulates are performed, the 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.

**Remark** Refer to riscv\_fir\_fast\_q31() for a faster but less precise implementation of this filter.

### **Parameters**

- **S** [in] points to an instance of the Q31 FIR filter structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

Returns none
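The input scaling recommended for the Q31 filter can be applied before the processing call. The sketch below is an assumption-laden illustration: the helper names ceil\_log2\_u16 and q31\_prescale are hypothetical, log2(numTaps) is rounded up when numTaps is not a power of two (the text does not state the rounding), and right-shifting a negative value is assumed to be an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q31_t; /* assumption: mirrors the library's Q31 sample type */

/* Hypothetical helper: ceil(log2(n)) for n >= 1. */
static uint32_t ceil_log2_u16(uint16_t n)
{
    uint32_t bits = 0;

    assert(n >= 1);
    while (((uint32_t)1 << bits) < n) {
        bits++;
    }
    return bits;
}

/* Hypothetical helper: scales a block of Q31 samples down by
 * ceil(log2(num_taps)) bits so that num_taps accumulations cannot
 * exceed the single guard bit of the 2.62 accumulator. */
static void q31_prescale(const q31_t *src, q31_t *dst, uint32_t block_size,
                         uint16_t num_taps)
{
    uint32_t shift = ceil_log2_u16(num_taps);

    for (uint32_t i = 0; i < block_size; i++) {
        dst[i] = src[i] >> shift;
    }
}
```

The scaled block would then be passed as pSrc to riscv\_fir\_q31(). The cost is a loss of the same number of bits of input resolution.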

```
void riscv_fir_q7 (const riscv_fir_instance_q7 *S, const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)
```

Processing function for the Q7 FIR filter.

**Scaling and Overflow Behavior** The function is implemented using a 32-bit internal accumulator. Both coefficients and state variables are represented in 1.7 format and multiplications yield a 2.14 result. The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. The accumulator is converted to 18.7 format by discarding the low 7 bits. Finally, the result is truncated to 1.7 format.

# **Parameters**

- S [in] points to an instance of the Q7 FIR filter structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

Returns none

# Finite Impulse Response (FIR) Lattice Filters

void riscv\_fir\_lattice\_f32 (const riscv\_fir\_lattice\_instance\_f32 \*S, const float32\_t \*pSrc, float32\_t \*pDst, uint32\_t blockSize)

void **riscv\_fir\_lattice\_init\_f32** (riscv\_fir\_lattice\_instance\_f32 \*S, uint16\_t numStages, **const** float32\_t \*pCoeffs, float32\_t \*pState)

void **riscv\_fir\_lattice\_init\_q15** (riscv\_fir\_lattice\_instance\_q15 \*S, uint16\_t numStages, **const** q15\_t \*pCoeffs, q15\_t \*pState)

void **riscv\_fir\_lattice\_init\_q31** (riscv\_fir\_lattice\_instance\_q31 \*S, uint16\_t numStages, **const** q31\_t \*pCoeffs, q31\_t \*pState)

void riscv\_fir\_lattice\_q15 (const riscv\_fir\_lattice\_instance\_q15 \*S, const q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void riscv\_fir\_lattice\_q31 (const riscv\_fir\_lattice\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

## group FIR\_Lattice

This set of functions implements Finite Impulse Response (FIR) lattice filters for Q15, Q31 and floating-point data types. Lattice filters are used in a variety of adaptive filter applications. The filter structure is feedforward and the net impulse response is finite length. The functions operate on blocks of input and output data and each call to the function processes blockSize samples through the filter. pSrc and pDst point to input and output arrays containing blockSize values.

# Algorithm



The following difference equation is implemented:

pCoeffs points to the array of reflection coefficients of size numStages. Reflection coefficients are stored in order of increasing stage index, {k1, k2, ..., kM}, where M is the number of stages.

pState points to a state array of size numStages. The state variables (g values) hold previous inputs and are stored in the following order. The state variables are updated after each block of data is processed; the coefficients are untouched.
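As an illustration of the structure described above, the following sketch implements a textbook FIR lattice recurrence: f0[n] = g0[n] = x[n], fm[n] = fm-1[n] + km \* gm-1[n-1], gm[n] = km \* fm-1[n] + gm-1[n-1] for m = 1..M, and y[n] = fM[n]. It is a plain reference, not the library code: the function name fir\_lattice\_ref is hypothetical, and the storage order (k[m-1] holds km, state[m] holds gm[n-1]) is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference FIR lattice.
 * k[0..num_stages-1]     : reflection coefficients k1..kM
 * state[0..num_stages-1] : g0[n-1]..g(M-1)[n-1], must start at zero */
static void fir_lattice_ref(const float *k, float *state, size_t num_stages,
                            const float *src, float *dst, size_t block_size)
{
    assert(num_stages >= 1);
    for (size_t n = 0; n < block_size; n++) {
        float f = src[n]; /* f0[n] = x[n] */
        float g = src[n]; /* g0[n] = x[n] */

        for (size_t m = 0; m < num_stages; m++) {
            float g_delayed = state[m];           /* gm[n-1] */
            float f_next = f + k[m] * g_delayed;  /* f(m+1)[n] */

            state[m] = g;                         /* keep gm[n] for the next sample */
            g = k[m] * f + g_delayed;             /* g(m+1)[n] */
            f = f_next;
        }
        dst[n] = f; /* y[n] = fM[n] */
    }
}
```

For two stages with k1 = 0.5 and k2 = 0.25, a unit impulse produces the finite response 1, 0.625, 0.25, after which the output returns to zero.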

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances while state variable arrays cannot be shared. There are separate instance structure declarations for each of the 3 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following subfields of the instance structure: numStages, pCoeffs, pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros and then manually initialize the instance structure as follows:

where numStages is the number of stages in the filter; pState is the address of the state buffer; pCoeffs is the address of the coefficient buffer.
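A static initialization along these lines might look like the sketch below. The structure type here is a local stand-in, not the NMSIS type: its name and member order are assumptions, so the real declaration in the DSP header should be checked before copying the pattern. Designated initializers are used so the example does not depend on member order.

```c
#include <assert.h>
#include <stdint.h>

typedef float float32_t; /* assumption: mirrors the library's float type */

/* Hypothetical mirror of the floating-point FIR lattice instance structure. */
typedef struct {
    uint16_t numStages;       /* number of filter stages */
    float32_t *pState;        /* state buffer, length numStages */
    const float32_t *pCoeffs; /* reflection coefficients, length numStages */
} fir_lattice_instance_f32_sketch;

#define NUM_STAGES 3u
static_assert(NUM_STAGES <= UINT16_MAX, "numStages is a 16-bit field");

/* Static storage is zero-initialized, which satisfies the requirement
 * that the state buffer starts at zero. */
static float32_t lattice_state[NUM_STAGES];
static const float32_t lattice_coeffs[NUM_STAGES] = {0.5f, -0.25f, 0.125f};

/* The instance itself can be const because no init function writes to it. */
static const fir_lattice_instance_f32_sketch lattice_instance = {
    .numStages = NUM_STAGES,
    .pState = lattice_state,
    .pCoeffs = lattice_coeffs,
};
```

Only the instance is const; the state buffer it points to stays writable because the processing function updates it on every call.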

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the FIR Lattice filter functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

#### **Functions**

```
void riscv_fir_lattice_f32 (const riscv_fir_lattice_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
```

Processing function for the floating-point FIR lattice filter.

# **Parameters**

- S [in] points to an instance of the floating-point FIR lattice structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_fir_lattice_init_f32 (riscv_fir_lattice_instance_f32 *S, uint16_t numStages, const float32_t *pCoeffs, float32_t *pState)
```

Initialization function for the floating-point FIR lattice filter.

# **Parameters**

- **S** [in] points to an instance of the floating-point FIR lattice structure
- numStages [in] number of filter stages
- pCoeffs [in] points to the coefficient buffer. The array is of length numStages
- pState [in] points to the state buffer. The array is of length numStages

# Returns none

```
void riscv_fir_lattice_init_q15 (riscv_fir_lattice_instance_q15 *S, uint16_t numStages, const q15_t *pCoeffs, q15_t *pState)
```

Initialization function for the Q15 FIR lattice filter.

# **Parameters**

- **S** [in] points to an instance of the Q15 FIR lattice structure
- numStages [in] number of filter stages
- pCoeffs [in] points to the coefficient buffer. The array is of length numStages
- pState [in] points to the state buffer. The array is of length numStages

# Returns none

```
void riscv_fir_lattice_init_q31 (riscv_fir_lattice_instance_q31 *S, uint16_t numStages, const q31_t *pCoeffs, q31_t *pState)
```

Initialization function for the Q31 FIR lattice filter.

## **Parameters**

- S [in] points to an instance of the Q31 FIR lattice structure
- numStages [in] number of filter stages
- pCoeffs [in] points to the coefficient buffer. The array is of length numStages
- pState [in] points to the state buffer. The array is of length numStages

### Returns none

```
void riscv_fir_lattice_q15 (const riscv_fir_lattice_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 FIR lattice filter.

#### **Parameters**

- **S** [in] points to an instance of the Q15 FIR lattice structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

### Returns none

```
void riscv_fir_lattice_q31 (const riscv_fir_lattice_instance_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 FIR lattice filter.

**Scaling and Overflow Behavior** In order to avoid overflows the input signal must be scaled down by 2\*log2(numStages) bits.

# **Parameters**

- S [in] points to an instance of the Q31 FIR lattice structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

# Returns none

# Finite Impulse Response (FIR) Sparse Filters

```
void riscv_fir_sparse_f32 (riscv_fir_sparse_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, float32_t *pScratchIn, uint32_t blockSize)

void riscv_fir_sparse_init_f32 (riscv_fir_sparse_instance_f32 *S, uint16_t numTaps, const float32_t *pCoeffs, float32_t *pState, int32_t *pTapDelay, uint16_t maxDelay, uint32_t blockSize)

void riscv_fir_sparse_init_q15 (riscv_fir_sparse_instance_q15 *S, uint16_t numTaps, const q15_t *pCoeffs, q15_t *pState, int32_t *pTapDelay, uint16_t maxDelay, uint32_t blockSize)

void riscv_fir_sparse_init_q31 (riscv_fir_sparse_instance_q31 *S, uint16_t numTaps, const q31_t *pCoeffs, q31_t *pState, int32_t *pTapDelay, uint16_t maxDelay, uint32_t blockSize)
```

void riscv\_fir\_sparse\_init\_q7 (riscv\_fir\_sparse\_instance\_q7 \*S, uint16\_t numTaps, const q7\_t \*pCoeffs, q7\_t \*pState, int32\_t \*pTapDelay, uint16\_t maxDelay, uint32\_t blockSize)

void **riscv\_fir\_sparse\_q15** (riscv\_fir\_sparse\_instance\_q15 \*S, **const** q15\_t \*pSrc, q15\_t \*pDst, q15\_t \*pScratchIn, q31\_t \*pScratchOut, uint32\_t blockSize)

void riscv\_fir\_sparse\_q31 (riscv\_fir\_sparse\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, q31\_t \*pScratchIn, uint32\_t blockSize)

void **riscv\_fir\_sparse\_q7** (riscv\_fir\_sparse\_instance\_q7 \*S, **const** q7\_t \*pSrc, q7\_t \*pDst, q7\_t \*pScratchIn, q31\_t \*pScratchOut, uint32\_t blockSize)

### group FIR\_Sparse

This group of functions implements sparse FIR filters. Sparse FIR filters are equivalent to standard FIR filters except that most of the coefficients are equal to zero. Sparse filters are used for simulating reflections in communications and audio applications.

There are separate functions for Q7, Q15, Q31, and floating-point data types. The functions operate on blocks of input and output data and each call to the function processes blockSize samples through the filter. pSrc and pDst point to input and output arrays, respectively, each containing blockSize values.

**Algorithm** The sparse filter instance structure contains an array of tap indices, pTapDelay, which specifies the locations of the non-zero coefficients. This is in addition to the coefficient array b. The implementation skips the multiplications by zero, which leads to an efficient realization.



pCoeffs points to a coefficient array of size numTaps; pTapDelay points to an array of nonzero indices and is also of size numTaps; pState points to a state array of size maxDelay + blockSize, where maxDelay is the largest offset value that is ever used in the pTapDelay array. Some of the processing functions also require temporary working buffers.
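The roles of pCoeffs, pTapDelay and the state buffer can be illustrated with a plain reference that computes y[n] as the sum over k of b[k] \* x[n - pTapDelay[k]]. This is a sketch under stated assumptions and not the library's algorithm: the function name fir\_sparse\_ref is hypothetical, and it keeps history with a simple linear buffer of maxDelay + blockSize samples rather than whatever indexing scheme the library uses internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference sparse FIR.
 * state holds max_delay history samples followed by the current block,
 * so it must have max_delay + block_size entries, initially zero. */
static void fir_sparse_ref(const float *coeffs, const int32_t *tap_delay,
                           uint16_t num_taps, uint16_t max_delay, float *state,
                           const float *src, float *dst, uint32_t block_size)
{
    /* Append the new block after the history. */
    memcpy(state + max_delay, src, block_size * sizeof(float));

    for (uint32_t n = 0; n < block_size; n++) {
        float acc = 0.0f;

        for (uint16_t k = 0; k < num_taps; k++) {
            assert(tap_delay[k] >= 0 && tap_delay[k] <= (int32_t)max_delay);
            /* Only the non-zero taps are visited. */
            acc += coeffs[k] * state[max_delay + n - (uint32_t)tap_delay[k]];
        }
        dst[n] = acc;
    }

    /* Keep the newest max_delay samples as history for the next call. */
    memmove(state, state + block_size, max_delay * sizeof(float));
}
```

With taps at delays 0 and 3 and coefficients 1 and 0.5, an impulse in one block reappears three samples later scaled by 0.5, including across a block boundary.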

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient and offset arrays may be shared among several instances while state variable arrays cannot be shared. There are separate instance structure declarations for each of the 4 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following subfields of the instance structure: numTaps, pCoeffs, pTapDelay, maxDelay, stateIndex, pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros before static initialization. The code below statically initializes each of the 4 different data type filter instance structures
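A static initialization for the floating-point case might look like the following sketch. The structure type is a local stand-in rather than the NMSIS type: its name and member order are assumptions, and only the member names listed above (numTaps, pCoeffs, pTapDelay, maxDelay, stateIndex, pState) come from this document. Starting stateIndex at 0 is also an assumption. The Q31, Q15 and Q7 variants would follow the same pattern with their own sample types.

```c
#include <assert.h>
#include <stdint.h>

typedef float float32_t; /* assumption: mirrors the library's float type */

/* Hypothetical mirror of the floating-point sparse FIR instance structure. */
typedef struct {
    uint16_t numTaps;         /* number of non-zero coefficients */
    uint16_t stateIndex;      /* index into the state buffer */
    float32_t *pState;        /* state buffer, length maxDelay + blockSize */
    const float32_t *pCoeffs; /* coefficient array, length numTaps */
    uint16_t maxDelay;        /* largest offset used in pTapDelay */
    int32_t *pTapDelay;       /* tap offsets, length numTaps */
} fir_sparse_instance_f32_sketch;

#define NUM_TAPS   3u
#define MAX_DELAY  40u
#define BLOCK_SIZE 32u
static_assert(MAX_DELAY <= UINT16_MAX, "maxDelay is a 16-bit field");

/* Static storage is zero-initialized, so the state starts at zero. */
static float32_t sparse_state[MAX_DELAY + BLOCK_SIZE];
static const float32_t sparse_coeffs[NUM_TAPS] = {1.0f, 0.5f, 0.25f};
static int32_t sparse_tap_delay[NUM_TAPS] = {0, 17, 40};

static const fir_sparse_instance_f32_sketch sparse_instance = {
    .numTaps = NUM_TAPS,
    .stateIndex = 0,
    .pState = sparse_state,
    .pCoeffs = sparse_coeffs,
    .maxDelay = MAX_DELAY,
    .pTapDelay = sparse_tap_delay,
};
```

Note that the sparse processing functions in this document take a non-const instance pointer, so whether a const instance is usable in practice should be confirmed against the library source.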

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the sparse FIR filter functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

## **Functions**

```
void riscv_fir_sparse_f32 (riscv_fir_sparse_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, float32_t *pScratchIn, uint32_t blockSize)
```

Processing function for the floating-point sparse FIR filter.

### **Parameters**

- S [in] points to an instance of the floating-point sparse FIR structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- pScratchIn [in] points to a temporary buffer of size blockSize
- blockSize [in] number of input samples to process

# Returns none

```
void riscv_fir_sparse_init_f32 (riscv_fir_sparse_instance_f32 *S, uint16_t numTaps, const float32_t *pCoeffs, float32_t *pState, int32_t *pTapDelay, uint16_t maxDelay, uint32_t blockSize)
```

Initialization function for the floating-point sparse FIR filter.

**Details** pCoeffs holds the filter coefficients and has length numTaps. pState holds the filter's state variables and must be of length maxDelay + blockSize, where maxDelay is the maximum number of delay line values. blockSize is the number of samples processed by the riscv\_fir\_sparse\_f32() function.

## **Parameters**

- S [inout] points to an instance of the floating-point sparse FIR structure
- numTaps [in] number of nonzero coefficients in the filter
- pCoeffs [in] points to the array of filter coefficients
- pState [in] points to the state buffer
- pTapDelay [in] points to the array of offset times
- maxDelay [in] maximum offset time supported
- blockSize [in] number of samples that will be processed per block

# Returns none

```
void riscv_fir_sparse_init_q15 (riscv_fir_sparse_instance_q15 *S, uint16_t numTaps, const q15_t *pCoeffs, q15_t *pState, int32_t *pTapDelay, uint16_t maxDelay, uint32_t blockSize)
```

Initialization function for the Q15 sparse FIR filter.

**Details** pCoeffs holds the filter coefficients and has length numTaps. pState holds the filter's state variables and must be of length maxDelay + blockSize, where maxDelay is the maximum number of delay line values. blockSize is the number of samples processed by the riscv\_fir\_sparse\_q15() function.

#### **Parameters**

- S [inout] points to an instance of the Q15 sparse FIR structure
- numTaps [in] number of nonzero coefficients in the filter
- pCoeffs [in] points to the array of filter coefficients
- pState [in] points to the state buffer
- pTapDelay [in] points to the array of offset times
- maxDelay [in] maximum offset time supported
- blockSize [in] number of samples that will be processed per block

#### Returns none

```
void riscv_fir_sparse_init_q31 (riscv_fir_sparse_instance_q31 *S, uint16_t numTaps, const q31_t *pCoeffs, q31_t *pState, int32_t *pTapDelay, uint16_t maxDelay, uint32_t blockSize)
```

Initialization function for the Q31 sparse FIR filter.

**Details** pCoeffs holds the filter coefficients and has length numTaps. pState holds the filter's state variables and must be of length maxDelay + blockSize, where maxDelay is the maximum number of delay line values. blockSize is the number of samples processed by the riscv\_fir\_sparse\_q31() function.

### **Parameters**

- S [inout] points to an instance of the Q31 sparse FIR structure
- numTaps [in] number of nonzero coefficients in the filter
- pCoeffs [in] points to the array of filter coefficients
- pState [in] points to the state buffer
- pTapDelay [in] points to the array of offset times
- maxDelay [in] maximum offset time supported
- blockSize [in] number of samples that will be processed per block

## **Returns** none

```
void riscv_fir_sparse_init_q7 (riscv_fir_sparse_instance_q7 *S, uint16_t numTaps, const q7_t *pCoeffs, q7_t *pState, int32_t *pTapDelay, uint16_t maxDelay, uint32_t blockSize)
```

Initialization function for the Q7 sparse FIR filter.

**Details** pCoeffs holds the filter coefficients and has length numTaps. pState holds the filter's state variables and must be of length maxDelay + blockSize, where maxDelay is the maximum number of delay line values. blockSize is the number of samples processed by the riscv\_fir\_sparse\_q7() function.

## **Parameters**

- **S** [inout] points to an instance of the Q7 sparse FIR structure
- numTaps [in] number of nonzero coefficients in the filter
- pCoeffs [in] points to the array of filter coefficients
- pState [in] points to the state buffer
- pTapDelay [in] points to the array of offset times
- maxDelay [in] maximum offset time supported
- blockSize [in] number of samples that will be processed per block

#### Returns none

```
void riscv_fir_sparse_q15 (riscv_fir_sparse_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, q15_t *pScratchIn, q31_t *pScratchOut, uint32_t blockSize)
```

Processing function for the Q15 sparse FIR filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 32-bit accumulator. The 1.15 x 1.15 multiplications yield a 2.30 result and these are added to a 2.30 accumulator. Thus the full precision of the multiplications is maintained, but there is only a single guard bit in the accumulator. If the accumulator result overflows, it wraps around rather than saturating. After all multiply-accumulates are performed, the 2.30 accumulator is truncated to 2.15 format and then saturated to 1.15 format. To avoid overflows, the input signal or coefficients must be scaled down by log2(numTaps) bits.

### **Parameters**

- **S** [in] points to an instance of the Q15 sparse FIR structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- pScratchIn [in] points to a temporary buffer of size blockSize
- pScratchOut [in] points to a temporary buffer of size blockSize
- blockSize [in] number of input samples to process per call

### Returns none

```
void riscv_fir_sparse_q31 (riscv_fir_sparse_instance_q31 *S, const q31_t *pSrc, q31_t *pDst, q31_t *pScratchIn, uint32_t blockSize)
```

Processing function for the Q31 sparse FIR filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 32-bit accumulator. The 1.31 x 1.31 multiplications are truncated to 2.30 format. This leads to loss of precision on the intermediate multiplications and provides only a single guard bit. If the accumulator result overflows, it wraps around rather than saturating. To avoid overflows, the input signal or coefficients must be scaled down by log2(numTaps) bits.

- S [in] points to an instance of the Q31 sparse FIR structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- pScratchIn [in] points to a temporary buffer of size blockSize
- blockSize [in] number of input samples to process

#### Returns none

```
void riscv_fir_sparse_q7 (riscv_fir_sparse_instance_q7 *S, const q7_t *pSrc, q7_t *pDst, q7_t *pScratchIn, q31_t *pScratchOut, uint32_t blockSize)
```

Processing function for the Q7 sparse FIR filter.

**Scaling and Overflow Behavior** The function is implemented using a 32-bit internal accumulator. Both coefficients and state variables are represented in 1.7 format and multiplications yield a 2.14 result. The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. The accumulator is then converted to 18.7 format by discarding the low 7 bits. Finally, the result is truncated to 1.7 format.

#### **Parameters**

- **S** [in] points to an instance of the Q7 sparse FIR structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- pScratchIn [in] points to a temporary buffer of size blockSize
- pScratchOut [in] points to a temporary buffer of size blockSize
- blockSize [in] number of input samples to process

## Returns none

# Infinite Impulse Response (IIR) Lattice Filters

### group IIR\_Lattice

This set of functions implements lattice filters for Q15, Q31 and floating-point data types. Lattice filters are used in a variety of adaptive filter applications. The filter structure has feedforward and feedback components and the net impulse response is infinite length. The functions operate on blocks of input and output data and each call to the function processes blockSize samples through the filter. pSrc and pDst point to input and output arrays containing blockSize values.



# Algorithm

pkCoeffs points to the array of reflection coefficients of size numStages. Reflection coefficients are stored in time-reversed order.

pvCoeffs points to the array of ladder coefficients of size (numStages+1). Ladder coefficients are stored in time-reversed order.

pState points to a state array of size numStages + blockSize. The state variables shown in the figure above (the g values) are stored in the pState array. The state variables are updated after each block of data is processed; the coefficients are untouched.
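The feedback and feedforward paths can be illustrated with a textbook lattice-ladder recurrence: fN(n) = x(n), fm-1(n) = fm(n) - km \* gm-1(n-1) and gm(n) = km \* fm-1(n) + gm-1(n-1) for m = N down to 1, g0(n) = f0(n), and y(n) = v0 \* g0(n) + ... + vN \* gN(n). The sketch below is a plain reference and not the library code: the function name iir\_lattice\_ref is hypothetical, coefficients are indexed in natural order (k[m-1] holds km, v[m] holds vm) rather than the time-reversed storage described above, and the state holds only the numStages delayed g values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference IIR lattice-ladder filter.
 * k[0..N-1] : reflection coefficients k1..kN (natural order)
 * v[0..N]   : ladder coefficients v0..vN (natural order)
 * g[0..N-1] : g0(n-1)..g(N-1)(n-1), must start at zero */
static void iir_lattice_ref(const float *k, const float *v, float *g,
                            size_t num_stages, const float *src, float *dst,
                            size_t block_size)
{
    assert(num_stages >= 1);
    for (size_t n = 0; n < block_size; n++) {
        float f = src[n]; /* fN(n) = x(n) */
        float y = 0.0f;

        for (size_t m = num_stages; m >= 1; m--) {
            float g_below = g[m - 1];          /* g(m-1)(n-1) */
            float g_m;

            f = f - k[m - 1] * g_below;        /* f(m-1)(n) */
            g_m = k[m - 1] * f + g_below;      /* gm(n) */
            y += v[m] * g_m;
            if (m < num_stages) {
                g[m] = g_m;                    /* stage m+1 has already read g[m] */
            }
        }
        g[0] = f;                              /* g0(n) = f0(n) */
        y += v[0] * f;
        dst[n] = y;
    }
}
```

With a single stage, k1 = 0.5 and v = {1, 0}, a unit impulse yields 1, -0.5, 0.25, ..., which shows the infinite-length response introduced by the feedback path. With all reflection coefficients at zero the structure reduces to an FIR filter whose taps are the ladder coefficients.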

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances while state variable arrays cannot be shared. There are separate instance structure declarations for each of the 3 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following subfields of the instance structure: numStages, pkCoeffs, pvCoeffs, pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros and then manually initialize the instance structure as follows:

where numStages is the number of stages in the filter; pState points to the state buffer array; pkCoeffs points to the array of reflection coefficients; pvCoeffs points to the array of ladder coefficients.

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the IIR lattice filter functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

## **Functions**

```
void riscv_iir_lattice_f32 (const riscv_iir_lattice_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
```

Processing function for the floating-point IIR lattice filter.

#### **Parameters**

- **S** [in] points to an instance of the floating-point IIR lattice structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

## Returns none

```
void riscv_iir_lattice_init_f32 (riscv_iir_lattice_instance_f32 *S, uint16_t numStages, float32_t *pkCoeffs, float32_t *pvCoeffs, float32_t *pState, uint32_t blockSize)
```

Initialization function for the floating-point IIR lattice filter.

#### **Parameters**

- S [in] points to an instance of the floating-point IIR lattice structure
- numStages [in] number of stages in the filter
- pkCoeffs [in] points to reflection coefficient buffer. The array is of length numStages
- pvCoeffs [in] points to ladder coefficient buffer. The array is of length numStages+1
- pState [in] points to state buffer. The array is of length numStages+blockSize
- blockSize [in] number of samples to process

# Returns none

```
void riscv_iir_lattice_init_q15 (riscv_iir_lattice_instance_q15 *S, uint16_t numStages, q15_t *pkCoeffs, q15_t *pvCoeffs, q15_t *pState, uint32_t blockSize)
```

Initialization function for the Q15 IIR lattice filter.

## **Parameters**

- S [in] points to an instance of the Q15 IIR lattice structure
- numStages [in] number of stages in the filter
- pkCoeffs [in] points to reflection coefficient buffer. The array is of length numStages
- pvCoeffs [in] points to ladder coefficient buffer. The array is of length numStages+1
- pState [in] points to state buffer. The array is of length numStages+blockSize
- blockSize [in] number of samples to process

# Returns none

```
void riscv_iir_lattice_init_q31 (riscv_iir_lattice_instance_q31 *S, uint16_t numStages, q31_t *pkCoeffs, q31_t *pvCoeffs, q31_t *pState, uint32_t blockSize)
```

Initialization function for the Q31 IIR lattice filter.

## **Parameters**

- **S** [in] points to an instance of the Q31 IIR lattice structure
- numStages [in] number of stages in the filter
- pkCoeffs [in] points to reflection coefficient buffer. The array is of length numStages
- pvCoeffs [in] points to ladder coefficient buffer. The array is of length numStages+1
- pState [in] points to state buffer. The array is of length numStages+blockSize
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_iir_lattice_q15 (const riscv_iir_lattice_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 IIR lattice filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. After all additions have been performed, the accumulator is truncated to 34.15 format by discarding the low 15 bits. Lastly, the accumulator is saturated to yield a result in 1.15 format.

## **Parameters**

- **S** [in] points to an instance of the Q15 IIR lattice structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_iir_lattice_q31 (const riscv_iir_lattice_instance_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 IIR lattice filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around rather than clips. In order to avoid overflows completely the input signal must be scaled down by 2\*log2(numStages) bits. After all multiply-accumulates are performed, the 2.62 accumulator is saturated to 1.32 format and then truncated to 1.31 format.

### **Parameters**

- **S** [in] points to an instance of the Q31 IIR lattice structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

# Least Mean Square (LMS) Filters

void riscv\_lms\_f32 (const riscv\_lms\_instance\_f32 \*S, const float32\_t \*pSrc, float32\_t \*pRef, float32\_t \*pOut, float32\_t \*pErr, uint32\_t blockSize)

void **riscv\_lms\_init\_f32** (riscv\_lms\_instance\_f32 \*S, uint16\_t numTaps, float32\_t \*pCoeffs, float32\_t \*pState, float32\_t mu, uint32\_t blockSize)

void **riscv\_lms\_init\_q15** (riscv\_lms\_instance\_q15 \*S, uint16\_t numTaps, q15\_t \*pCoeffs, q15\_t \*pState, q15\_t mu, uint32\_t blockSize, uint32\_t postShift)

void **riscv\_lms\_init\_q31** (riscv\_lms\_instance\_q31 \*S, uint16\_t numTaps, q31\_t \*pCoeffs, q31\_t \*pState, q31\_t mu, uint32\_t blockSize, uint32\_t postShift)

void riscv\_lms\_q15 (const riscv\_lms\_instance\_q15 \*S, const q15\_t \*pSrc, q15\_t \*pRef, q15\_t \*pOut, q15\_t \*pErr, uint32\_t blockSize)

void riscv\_lms\_q31 (const riscv\_lms\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pRef, q31\_t \*pOut, q31\_t \*pErr, uint32\_t blockSize)

#### group LMS

LMS filters are a class of adaptive filters that are able to "learn" an unknown transfer function. LMS filters use a gradient descent method in which the filter coefficients are updated based on the instantaneous error signal. Adaptive filters are often used in communication systems, equalizers, and noise removal. The NMSIS DSP Library contains LMS filter functions that operate on Q15, Q31, and floating-point data types. The library also contains normalized LMS filters in which the filter coefficient adaptation is independent of the level of the input signal.

An LMS filter consists of two components as shown below. The first component is a standard transversal or FIR filter. The second component is a coefficient update mechanism. The LMS filter has two input signals. The "input" feeds the FIR filter while the "reference input" corresponds to the desired output of the FIR filter. That is, the FIR filter coefficients are updated so that the output of the FIR filter matches the reference input. The filter coefficient update mechanism is based on the difference between the FIR filter output and the reference input. This "error signal" tends towards zero as the filter adapts. The LMS processing functions accept the input and reference input signals and generate the filter output and error signal.

The functions operate on blocks of data and each call to the function processes blockSize samples through the filter. pSrc points to input signal, pRef points to reference signal, pOut points to output signal and pErr points to error signal. All arrays contain blockSize values.

The functions operate on a block-by-block basis. Internally, the filter coefficients b[n] are updated on a sample-by-sample basis. The convergence of the LMS filter is slower compared to the normalized LMS algorithm.

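**Algorithm** The output signal y[n] is computed by a standard FIR filter:

$$y[n] = b[0]\,x[n] + b[1]\,x[n-1] + \dots + b[\text{numTaps}-1]\,x[n-\text{numTaps}+1]$$

The error signal equals the difference between the reference signal d[n] and the filter output:

$$e[n] = d[n] - y[n]$$

After each sample of the error signal is computed, the filter coefficients b[k] are updated on a sample-by-sample basis:

$$b[k] = b[k] + e[n]\,\mu\,x[n-k], \quad k = 0, 1, \dots, \text{numTaps}-1$$

where $\mu$ (mu in the APIs) is the step size and controls the rate of coefficient convergence.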
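The three steps of the algorithm (FIR output, error, coefficient update) can be sketched in plain C as follows. This is a minimal floating-point illustration, not the library implementation: the names `lms_step` and `lms_block` are hypothetical, the coefficients are kept in natural order (b[0] first) rather than time reversed, and a simple shifting history array is used instead of the library's state buffer layout.

```c
#include <stdint.h>
#include <string.h>

/* One LMS step. hist[k] must hold x[n-k] for k = 0 .. numTaps-1.
 * Returns y[n], writes e[n] to *err and updates b[] in place. */
float lms_step(float *b, const float *hist, uint32_t numTaps,
               float d, float mu, float *err)
{
    float y = 0.0f;

    for (uint32_t k = 0; k < numTaps; k++) {
        y += b[k] * hist[k];          /* y[n] = sum b[k] * x[n-k] */
    }

    float e = d - y;                  /* e[n] = d[n] - y[n] */

    for (uint32_t k = 0; k < numTaps; k++) {
        b[k] += e * mu * hist[k];     /* b[k] += e[n] * mu * x[n-k] */
    }

    *err = e;
    return y;
}

/* Process one block, sample by sample. Assumes numTaps >= 1 and that
 * hist[] (length numTaps) is zeroed before the first call. */
void lms_block(float *b, float *hist, uint32_t numTaps, float mu,
               const float *src, const float *ref,
               float *out, float *err, uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        /* Shift the history by one sample and insert the newest input. */
        memmove(&hist[1], &hist[0], (numTaps - 1u) * sizeof hist[0]);
        hist[0] = src[n];
        out[n] = lms_step(b, hist, numTaps, ref[n], mu, &err[n]);
    }
}
```

Feeding the sketch a reference signal generated by a known short FIR system makes the coefficients move towards that system's taps while the error shrinks towards zero.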

In the APIs, pCoeffs points to a coefficient array of size numTaps. Coefficients are stored in time reversed order.

pState points to a state array of size numTaps + blockSize - 1. Samples in the state buffer are stored in the order:

Note that the length of the state buffer exceeds the length of the coefficient array by blockSize-1 samples. The increased state buffer length allows circular addressing, which is traditionally used in FIR filters, to be avoided and yields a significant speed improvement. The state variables are updated after each block of data is processed.
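The effect of the longer state buffer can be illustrated with a plain FIR block routine. The sketch below is an assumption-labeled illustration, not the library code: `fir_block_linear` is a hypothetical name, the data type is float, and the coefficient array is assumed to be in time reversed order as described above. Because the buffer holds numTaps-1 old samples followed by the whole new block, the inner loop uses plain linear indexing, and the history is moved only once per block.

```c
#include <stdint.h>
#include <string.h>

/* state must have room for numTaps + blockSize - 1 samples and be
 * zeroed before the first call. coeffsRev[i] holds b[numTaps-1-i]. */
void fir_block_linear(const float *coeffsRev, float *state,
                      uint32_t numTaps, const float *src,
                      float *dst, uint32_t blockSize)
{
    /* Append the new block after the numTaps-1 samples kept from the
     * previous call. */
    memcpy(&state[numTaps - 1u], src, blockSize * sizeof *src);

    for (uint32_t n = 0; n < blockSize; n++) {
        float acc = 0.0f;
        for (uint32_t i = 0; i < numTaps; i++) {
            acc += coeffsRev[i] * state[n + i]; /* no modulo or wrap test */
        }
        dst[n] = acc;
    }

    /* Once per block: keep the newest numTaps-1 samples for the next call. */
    memmove(state, &state[blockSize], (numTaps - 1u) * sizeof *state);
}
```

The cost of this layout is blockSize-1 extra samples of memory per filter instance; the benefit is that no wrap-around check is needed inside the multiply-accumulate loop.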

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter and coefficient and state arrays cannot be shared among instances. There are separate instance structure declarations for each of the 3 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following subfields of the instance structure: numTaps, pCoeffs, mu, postShift (not for f32), pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. Set the values in the state buffer to zeros before static initialization. The code below statically initializes each of the 3 different data type filter instance structures where numTaps is the number of filter coefficients in the filter; pState is the address of the state buffer; pCoeffs is the address of the coefficient buffer; mu is the step size parameter; and postShift is the shift applied to coefficients.
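As a rough illustration of manual static initialization, the sketch below uses a stand-in structure rather than the library type. The type `demo_lms_instance_f32`, the buffer names, and the sizes are all hypothetical; the field order simply follows the order in which the text lists the fields, and the real `riscv_lms_instance_f32` layout should be checked in the library header before copying this pattern.

```c
#include <stdint.h>

#define DEMO_NUM_TAPS   4u
#define DEMO_BLOCK_SIZE 8u

/* Stand-in with the fields named in the text (assumed order). */
typedef struct {
    uint16_t numTaps;  /* number of filter coefficients */
    float   *pState;   /* state buffer, numTaps + blockSize - 1 samples */
    float   *pCoeffs;  /* coefficient buffer, numTaps samples */
    float    mu;       /* step size */
} demo_lms_instance_f32;

/* Objects with static storage are zero-initialized, so the state
 * buffer starts out as all zeros without any run-time code. */
float demo_state[DEMO_NUM_TAPS + DEMO_BLOCK_SIZE - 1u];
float demo_coeffs[DEMO_NUM_TAPS];

/* A const instance can be placed in a read-only data section. Only the
 * structure is const; the buffers it points to remain writable. */
const demo_lms_instance_f32 demo_S = {
    DEMO_NUM_TAPS, demo_state, demo_coeffs, 0.01f
};
```

This works for the LMS filters because the processing functions take the instance through a pointer to const and modify only the coefficient and state buffers.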

**Fixed-Point Behavior** Care must be taken when using the Q15 and Q31 versions of the LMS filter. The following issues must be considered:

- Scaling of coefficients
- Overflow and saturation

**Scaling of Coefficients** Filter coefficients are represented as fractional values and coefficients are restricted to lie in the range [-1 +1). The fixed-point functions have an additional scaling parameter postShift. At the output of the filter's accumulator is a shift register which shifts the result by postShift bits. This essentially scales the filter coefficients by 2^postShift and allows the filter coefficients to exceed the range [-1 +1). The value of postShift is set by the user based on the expected gain through the system being modeled.
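A small worked example may help with postShift. A coefficient of 1.5 cannot be stored in Q15, but 0.75 can; storing 0.75 and using postShift = 1 gives an effective coefficient of 1.5. The sketch below shows one way such a shift can be applied to a Q15 accumulator. It is an illustration under stated assumptions, not the library source: `fir_q15_postshift` is a hypothetical name, and the exact shift the library applies internally may differ.

```c
#include <stdint.h>

typedef int16_t q15_t; /* local stand-in for the library's q15_t */

/* Computes sat( (sum coeffs[i] * x[i]) * 2^postShift ) in Q15.
 * Requires postShift <= 15. */
q15_t fir_q15_postshift(const q15_t *coeffs, const q15_t *x,
                        uint32_t n, uint32_t postShift)
{
    int64_t acc = 0;

    for (uint32_t i = 0; i < n; i++) {
        acc += (int32_t)coeffs[i] * (int32_t)x[i]; /* 2.30 products */
    }

    /* A plain 2.30 -> 1.15 conversion shifts right by 15. Shifting by
     * postShift fewer bits multiplies the result by 2^postShift.
     * Assumes an arithmetic right shift for negative values. */
    acc >>= (15u - postShift);

    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (q15_t)acc;
}
```

For example, a stored coefficient of 0.75 (24576) applied to an input of 0.25 (8192) yields 0.1875 with postShift = 0 and 0.375 with postShift = 1, which matches 1.5 x 0.25.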

**Overflow and Saturation** The overflow and saturation behavior of the fixed-point Q15 and Q31 versions is described separately as part of the function-specific documentation below.

#### **Functions**

```
void riscv_lms_f32 (const riscv_lms_instance_f32 *S, const float32_t *pSrc, float32_t *pRef, float32_t *pOut, float32_t *pErr, uint32_t blockSize)
```

Processing function for floating-point LMS filter.

#### **Parameters**

- S [in] points to an instance of the floating-point LMS filter structure
- pSrc [in] points to the block of input data
- pRef [in] points to the block of reference data
- pOut [out] points to the block of output data
- pErr [out] points to the block of error data
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_lms_init_f32 (riscv_lms_instance_f32 *S, uint16_t numTaps, float32_t *pCoeffs, float32_t *pState, float32_t mu, uint32_t blockSize)
```

Initialization function for floating-point LMS filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}. The initial filter coefficients serve as a starting point for the adaptive filter. pState points to an array of length numTaps+blockSize-1 samples, where blockSize is the number of input samples processed by each call to riscv\_lms\_f32().

#### **Parameters**

- **S** [in] points to an instance of the floating-point LMS filter structure
- numTaps [in] number of filter coefficients
- pCoeffs [in] points to coefficient buffer
- pState [in] points to state buffer
- mu [in] step size that controls filter coefficient updates
- blockSize [in] number of samples to process

## Returns none

```
void riscv_lms_init_q15 (riscv_lms_instance_q15 *S, uint16_t numTaps, q15_t *pCoeffs, q15_t *pState, q15_t mu, uint32_t blockSize, uint32_t postShift)
```

Initialization function for the Q15 LMS filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}. The initial filter coefficients serve as a starting point for the adaptive filter. pState points to an array of length numTaps+blockSize-1 samples, where blockSize is the number of input samples processed by each call to riscv\_lms\_q15().

#### **Parameters**

- **S** [in] points to an instance of the Q15 LMS filter structure.
- numTaps [in] number of filter coefficients.
- pCoeffs [in] points to coefficient buffer.
- pState [in] points to state buffer.
- mu [in] step size that controls filter coefficient updates.
- blockSize [in] number of samples to process.
- postShift [in] bit shift applied to coefficients.

#### Returns none

```
void riscv_lms_init_q31 (riscv_lms_instance_q31 *S, uint16_t numTaps, q31_t *pCoeffs, q31_t *pState, q31_t mu, uint32_t blockSize, uint32_t postShift)
```

Initialization function for Q31 LMS filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}. The initial filter coefficients serve as a starting point for the adaptive filter. pState points to an array of length numTaps+blockSize-1 samples, where blockSize is the number of input samples processed by each call to riscv\_lms\_q31().

#### **Parameters**

- **S** [in] points to an instance of the Q31 LMS filter structure
- numTaps [in] number of filter coefficients
- pCoeffs [in] points to coefficient buffer
- pState [in] points to state buffer
- mu [in] step size that controls filter coefficient updates
- blockSize [in] number of samples to process
- postShift [in] bit shift applied to coefficients

### Returns none

```
void riscv_lms_q15 (const riscv_lms_instance_q15 *S, const q15_t *pSrc, q15_t *pRef, q15_t *pOut, q15_t *pErr, uint32_t blockSize)
```

Processing function for Q15 LMS filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. After all additions have been performed, the accumulator is truncated to 34.15 format by discarding the low 15 bits. Lastly, the accumulator is saturated to yield a result in 1.15 format.

In this filter, the coefficients are updated for each sample, and the coefficient updates are saturated.

## **Parameters**

- **S** [in] points to an instance of the Q15 LMS filter structure
- pSrc [in] points to the block of input data
- pRef [in] points to the block of reference data
- pOut [out] points to the block of output data
- pErr [out] points to the block of error data
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_lms_q31 (const riscv_lms_instance_q31 *S, const q31_t *pSrc, q31_t *pRef, q31_t *pOut, q31_t *pErr, uint32_t blockSize)
```

Processing function for Q31 LMS filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around rather than clips. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits. The reference signal should not be scaled down. After all multiply-accumulates are performed, the 2.62 accumulator is shifted and saturated to 1.31 format to yield the final result. The output signal and error signal are in 1.31 format.
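The input scaling recommended above can be sketched as a small pre-processing step. The helper names `guard_shift` and `scale_down_q31` are hypothetical, and rounding log2(numTaps) up to a whole number of bits is an assumption made here for tap counts that are not powers of two.

```c
#include <stdint.h>

typedef int32_t q31_t; /* local stand-in for the library's q31_t */

/* Smallest s such that 2^s >= numTaps, i.e. ceil(log2(numTaps)). */
uint32_t guard_shift(uint16_t numTaps)
{
    uint32_t s = 0;

    while (((uint32_t)1 << s) < numTaps) {
        s++;
    }
    return s;
}

/* Scale a block of Q31 input samples down by 'shift' bits before
 * passing it to the filter. Assumes an arithmetic right shift for
 * negative values. The reference signal is left untouched. */
void scale_down_q31(const q31_t *src, q31_t *dst, uint32_t n, uint32_t shift)
{
    for (uint32_t i = 0; i < n; i++) {
        dst[i] = src[i] >> shift;
    }
}
```

For a 32-tap filter this gives a shift of 5 bits, so the price of guaranteed overflow-free accumulation is 5 bits of input resolution.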

In this filter, the coefficients are updated for each sample, and the coefficient updates are saturated.

#### **Parameters**

- **S** [in] points to an instance of the Q31 LMS filter structure.
- pSrc [in] points to the block of input data.
- pRef [in] points to the block of reference data.
- pOut [out] points to the block of output data.
- pErr [out] points to the block of error data.
- blockSize [in] number of samples to process.

### Returns none

# Normalized LMS Filters

```
void riscv_lms_norm_f32 (riscv_lms_norm_instance_f32 *S, const float32_t *pSrc, float32_t *pRef, float32_t *pOut, float32_t *pErr, uint32_t blockSize)

void riscv_lms_norm_init_f32 (riscv_lms_norm_instance_f32 *S, uint16_t numTaps, float32_t *pCoeffs, float32_t *pState, float32_t mu, uint32_t blockSize)

void riscv_lms_norm_init_q15 (riscv_lms_norm_instance_q15 *S, uint16_t numTaps, q15_t *pCoeffs, q15_t *pState, q15_t mu, uint32_t blockSize, uint8_t postShift)

void riscv_lms_norm_init_q31 (riscv_lms_norm_instance_q31 *S, uint16_t numTaps, q31_t *pCoeffs, q31_t *pState, q31_t mu, uint32_t blockSize, uint8_t postShift)

void riscv_lms_norm_q15 (riscv_lms_norm_instance_q15 *S, const q15_t *pSrc, q15_t *pRef, q15_t *pOut, q15_t *pErr, uint32_t blockSize)

void riscv_lms_norm_q31 (riscv_lms_norm_instance_q31 *S, const q31_t *pSrc, q31_t *pRef, q31_t *pOut, q31_t *pErr, uint32_t blockSize)
```

#### group LMS\_NORM

This set of functions implements a commonly used adaptive filter. It is related to the Least Mean Square (LMS) adaptive filter and includes an additional normalization factor which increases the adaptation rate of the filter. The NMSIS DSP Library contains normalized LMS filter functions that operate on Q15, Q31, and floating-point data types.

A normalized least mean square (NLMS) filter consists of two components as shown below. The first component is a standard transversal or FIR filter. The second component is a coefficient update mechanism. The NLMS filter has two input signals. The "input" feeds the FIR filter while the "reference input" corresponds to the desired output of the FIR filter. That is, the FIR filter coefficients are updated so that the output of the FIR filter matches the reference input. The filter coefficient update mechanism is based on the difference between the FIR filter output and the reference input. This "error signal" tends towards zero as the filter adapts. The NLMS processing functions accept the input and reference input signals and generate the filter output and error signal.

The functions operate on blocks of data and each call to the function processes blockSize samples through the filter. pSrc points to input signal, pRef points to reference signal, pOut points to output signal and pErr points to error signal. All arrays contain blockSize values.

The functions operate on a block-by-block basis. Internally, the filter coefficients b[n] are updated on a sample-by-sample basis. The convergence of the LMS filter is slower compared to the normalized LMS algorithm.

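**Algorithm** The output signal y[n] is computed by a standard FIR filter:

$$y[n] = b[0]\,x[n] + b[1]\,x[n-1] + \dots + b[\text{numTaps}-1]\,x[n-\text{numTaps}+1]$$

The error signal equals the difference between the reference signal d[n] and the filter output:

$$e[n] = d[n] - y[n]$$

After each sample of the error signal is computed, the instantaneous energy of the filter state variables is calculated:

$$E = x[n]^2 + x[n-1]^2 + \dots + x[n-\text{numTaps}+1]^2$$

The filter coefficients b[k] are then updated on a sample-by-sample basis:

$$b[k] = b[k] + e[n]\,\frac{\mu}{E}\,x[n-k], \quad k = 0, 1, \dots, \text{numTaps}-1$$

where $\mu$ (mu in the APIs) is the step size and controls the rate of coefficient convergence.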
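The normalized update described above differs from plain LMS only in that the step is divided by the energy of the samples currently in the filter. A minimal floating-point sketch of one step is shown below. It is not the library implementation: `nlms_step` is a hypothetical name, coefficients are in natural order, the energy is recomputed directly on every call instead of being tracked incrementally, and the small constant added to the denominator to avoid division by zero is an assumption of this sketch.

```c
#include <stdint.h>

/* One NLMS step. hist[k] must hold x[n-k] for k = 0 .. numTaps-1.
 * Returns y[n], writes e[n] to *err and updates b[] in place. */
float nlms_step(float *b, const float *hist, uint32_t numTaps,
                float d, float mu, float *err)
{
    float y = 0.0f;
    float energy = 0.0f;

    for (uint32_t k = 0; k < numTaps; k++) {
        y += b[k] * hist[k];           /* FIR output */
        energy += hist[k] * hist[k];   /* E = sum of x[n-k]^2 */
    }

    float e = d - y;                   /* e[n] = d[n] - y[n] */

    /* Normalized step size; the 1e-12f guard is an assumption made
     * here so that an all-zero history does not divide by zero. */
    float w = (e * mu) / (energy + 1e-12f);

    for (uint32_t k = 0; k < numTaps; k++) {
        b[k] += w * hist[k];           /* b[k] += e * (mu / E) * x[n-k] */
    }

    *err = e;
    return y;
}
```

Because the step is scaled by 1/E, doubling the input amplitude leaves the adaptation rate unchanged, which is the independence from input level mentioned in the LMS overview.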

In the APIs, pCoeffs points to a coefficient array of size numTaps. Coefficients are stored in time reversed order.

pState points to a state array of size numTaps + blockSize - 1. Samples in the state buffer are stored in the order:

Note that the length of the state buffer exceeds the length of the coefficient array by blockSize-1 samples. The increased state buffer length allows circular addressing, which is traditionally used in FIR filters, to be avoided and yields a significant speed improvement. The state variables are updated after each block of data is processed.

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter and coefficient and state arrays cannot be shared among instances. There are separate instance structure declarations for each of the 3 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.

To do this manually without calling the init function, assign the following subfields of the instance structure: numTaps, pCoeffs, mu, energy, x0, pState. Also set all of the values in pState to zero. For the Q15 and Q31 versions, the following fields must also be initialized: recipTable, postShift.

The instance structure cannot be placed into a const data section, and using the initialization function is recommended.

**Fixed-Point Behavior** Care must be taken when using the Q15 and Q31 versions of the normalized LMS filter. The following issues must be considered:

- Scaling of coefficients
- Overflow and saturation

**Scaling of Coefficients (fixed point versions)** Filter coefficients are represented as fractional values and coefficients are restricted to lie in the range [-1 +1). The fixed-point functions have an additional scaling parameter postShift. At the output of the filter's accumulator is a shift register which shifts the result by postShift bits. This essentially scales the filter coefficients by 2^postShift and allows the filter coefficients to exceed the range [-1 +1). The value of postShift is set by the user based on the expected gain through the system being modeled.

**Overflow and Saturation (fixed point versions)** The overflow and saturation behavior of the fixed-point Q15 and Q31 versions is described separately as part of the function-specific documentation below.

## **Functions**

```
void riscv_lms_norm_f32 (riscv_lms_norm_instance_f32 *S, const float32_t *pSrc, float32_t *pRef, float32_t *pOut, float32_t *pErr, uint32_t blockSize)
```

Processing function for floating-point normalized LMS filter.

#### **Parameters**

- **S** [in] points to an instance of the floating-point normalized LMS filter structure
- pSrc [in] points to the block of input data
- pRef [in] points to the block of reference data
- pOut [out] points to the block of output data
- pErr [out] points to the block of error data
- blockSize [in] number of samples to process

## Returns none

```
void riscv_lms_norm_init_f32 (riscv_lms_norm_instance_f32 *S, uint16_t numTaps, float32_t *pCoeffs, float32_t *pState, float32_t mu, uint32_t blockSize)
```

Initialization function for floating-point normalized LMS filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}. The initial filter coefficients serve as a starting point for the adaptive filter. pState points to an array of length numTaps+blockSize-1 samples, where blockSize is the number of input samples processed by each call to riscv\_lms\_norm\_f32().

#### **Parameters**

- S [in] points to an instance of the floating-point normalized LMS filter structure
- numTaps [in] number of filter coefficients
- pCoeffs [in] points to coefficient buffer
- pState [in] points to state buffer
- mu [in] step size that controls filter coefficient updates
- blockSize [in] number of samples to process

#### Returns none

```
void riscv_lms_norm_init_q15 (riscv_lms_norm_instance_q15 *S, uint16_t numTaps, q15_t *pCoeffs, q15_t *pState, q15_t mu, uint32_t blockSize, uint8_t postShift)
```

Initialization function for Q15 normalized LMS filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}. The initial filter coefficients serve as a starting point for the adaptive filter. pState points to an array of length numTaps+blockSize-1 samples, where blockSize is the number of input samples processed by each call to riscv\_lms\_norm\_q15().

#### **Parameters**

- **S** [in] points to an instance of the Q15 normalized LMS filter structure.
- numTaps [in] number of filter coefficients.
- pCoeffs [in] points to coefficient buffer.
- pState [in] points to state buffer.
- mu [in] step size that controls filter coefficient updates.
- blockSize [in] number of samples to process.
- postShift [in] bit shift applied to coefficients.

## Returns none

```
void riscv_lms_norm_init_q31 (riscv_lms_norm_instance_q31 *S, uint16_t numTaps, q31_t *pCoeffs, q31_t *pState, q31_t mu, uint32_t blockSize, uint8_t postShift)
```

Initialization function for Q31 normalized LMS filter.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}. The initial filter coefficients serve as a starting point for the adaptive filter. pState points to an array of length numTaps+blockSize-1 samples, where blockSize is the number of input samples processed by each call to riscv\_lms\_norm\_q31().

## **Parameters**

- **S** [in] points to an instance of the Q31 normalized LMS filter structure.
- numTaps [in] number of filter coefficients.
- pCoeffs [in] points to coefficient buffer.
- pState [in] points to state buffer.
- mu [in] step size that controls filter coefficient updates.
- blockSize [in] number of samples to process.
- postShift [in] bit shift applied to coefficients.

#### Returns none

```
void riscv_lms_norm_q15 (riscv_lms_norm_instance_q15 *S, const q15_t *pSrc, q15_t *pRef, q15_t *pOut, q15_t *pErr, uint32_t blockSize)
```

Processing function for Q15 normalized LMS filter.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. After all additions have been performed, the accumulator is truncated to 34.15 format by discarding the low 15 bits. Lastly, the accumulator is saturated to yield a result in 1.15 format.

In this filter, the coefficients are updated for each sample, and the coefficient updates are saturated.

## **Parameters**

- **S** [in] points to an instance of the Q15 normalized LMS filter structure
- pSrc [in] points to the block of input data
- pRef [in] points to the block of reference data
- pOut [out] points to the block of output data
- pErr [out] points to the block of error data
- blockSize [in] number of samples to process

## Returns none

```
void riscv_lms_norm_q31 (riscv_lms_norm_instance_q31 *S, const q31_t *pSrc, q31_t *pRef, q31_t *pOut, q31_t *pErr, uint32_t blockSize)
```

Processing function for Q31 normalized LMS filter.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around rather than clips. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits. The reference signal should not be scaled down. After all multiply-accumulates are performed, the 2.62 accumulator is shifted and saturated to 1.31 format to yield the final result. The output signal and error signal are in 1.31 format.

In this filter, the coefficients are updated for each sample, and the coefficient updates are saturated.

#### **Parameters**

- S [in] points to an instance of the Q31 normalized LMS filter structure
- pSrc [in] points to the block of input data
- pRef [in] points to the block of reference data
- pOut [out] points to the block of output data
- pErr [out] points to the block of error data
- blockSize [in] number of samples to process

#### Returns none

# Finite Impulse Response (FIR) Interpolator

void **riscv\_fir\_interpolate\_f32** (**const** riscv\_fir\_interpolate\_instance\_f32 \*S, **const** float32\_t \*pSrc, float32\_t \*pDst, uint32\_t blockSize)

riscv\_status riscv\_fir\_interpolate\_init\_f32 (riscv\_fir\_interpolate\_instance\_f32 \*S, uint8\_t L, uint16\_t numTaps, const float32\_t \*pCoeffs, float32\_t \*pState, uint32\_t blockSize)

riscv\_status riscv\_fir\_interpolate\_init\_q15 (riscv\_fir\_interpolate\_instance\_q15 \*S, uint8\_t L, uint16\_t numTaps, const q15\_t \*pCoeffs, q15\_t \*pState, uint32\_t blockSize)

riscv\_status riscv\_fir\_interpolate\_init\_q31 (riscv\_fir\_interpolate\_instance\_q31 \*S, uint8\_t L, uint16\_t numTaps, const q31\_t \*pCoeffs, q31\_t \*pState, uint32\_t blockSize)

void **riscv\_fir\_interpolate\_q15** (**const** riscv\_fir\_interpolate\_instance\_q15 \*S, **const** q15\_t \*pSrc, q15\_t \*pDst, uint32\_t blockSize)

void riscv\_fir\_interpolate\_q31 (const riscv\_fir\_interpolate\_instance\_q31 \*S, const q31\_t \*pSrc, q31\_t \*pDst, uint32\_t blockSize)

#### group FIR\_Interpolate

These functions combine an upsampler (zero stuffer) and an FIR filter. They are used in multirate systems for increasing the sample rate of a signal without introducing high frequency images. Conceptually, the functions are equivalent to the block diagram below:

After upsampling by a factor of L, the signal should be filtered by a lowpass filter with a normalized cutoff frequency of 1/L in order to eliminate high frequency copies of the spectrum. The user of the function is responsible for providing the filter coefficients.

The FIR interpolator functions provided in the NMSIS DSP Library combine the upsampler and FIR filter in an efficient manner. The upsampler inserts L-1 zeros between each sample. Instead of multiplying by these zero values, the FIR filter is designed to skip them. This leads to an efficient implementation without any wasted effort. The functions operate on blocks of input and output data. pSrc points to an array of blockSize input values and pDst points to an array of blockSize\*L output values.

The library provides separate functions for Q15, Q31, and floating-point data types.

**Algorithm** The functions use a polyphase filter structure. For each input sample x[n], the L output samples are computed as:

$$y[nL+p] = \sum_{k=0}^{\text{phaseLength}-1} b[p+kL]\,x[n-k], \quad p = 0, 1, \dots, L-1$$

where phaseLength=numTaps/L is the length of each sub-filter. This approach is more efficient than straightforward upsample-then-filter algorithms: the computation is reduced to 1/L of that required by a standard FIR filter running at the output rate.

pCoeffs points to a coefficient array of size numTaps. numTaps must be a multiple of the interpolation factor L and this is checked by the initialization functions. Internally, the function divides the FIR filter's impulse response into shorter filters of length phaseLength=numTaps/L. Coefficients are stored in time reversed order.

pState points to a state array of size blockSize + phaseLength - 1. Samples in the state buffer are stored in the order:

The state variables are updated after each block of data is processed; the coefficients are untouched.
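The saving from skipping the inserted zeros can be checked with a small floating-point sketch that computes the same output two ways. Both functions are illustrations with hypothetical names (`interp_polyphase`, `interp_zero_stuff`); they take coefficients in natural order (h[0] first), treat samples before the start of the input as zero, and are not the library implementation.

```c
#include <stdint.h>

/* Polyphase form: each output uses only the numTaps/L coefficients
 * that line up with real input samples. y must hold nIn * L values. */
void interp_polyphase(const float *h, uint32_t numTaps, uint32_t L,
                      const float *x, uint32_t nIn, float *y)
{
    uint32_t phaseLength = numTaps / L;

    for (uint32_t n = 0; n < nIn; n++) {
        for (uint32_t p = 0; p < L; p++) {
            float acc = 0.0f;
            for (uint32_t k = 0; k < phaseLength && k <= n; k++) {
                acc += h[p + k * L] * x[n - k];
            }
            y[n * L + p] = acc;
        }
    }
}

/* Reference form: insert L-1 zeros after each input sample, then run
 * a full-length FIR filter at the output rate. */
void interp_zero_stuff(const float *h, uint32_t numTaps, uint32_t L,
                       const float *x, uint32_t nIn, float *y)
{
    for (uint32_t m = 0; m < nIn * L; m++) {
        float acc = 0.0f;
        for (uint32_t j = 0; j < numTaps && j <= m; j++) {
            uint32_t idx = m - j; /* index into the upsampled signal */
            float u = (idx % L == 0u) ? x[idx / L] : 0.0f;
            acc += h[j] * u;      /* most of these products are zero */
        }
        y[m] = acc;
    }
}
```

The two functions produce the same samples, but the polyphase version performs phaseLength multiply-accumulates per output instead of numTaps.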

**Instance Structure** The coefficients and state variables for a filter are stored together in an instance data structure. A separate instance structure must be defined for each filter. Coefficient arrays may be shared among several instances, while state variable arrays should be allocated separately. There are separate instance structure declarations for each of the 3 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Zeros out the values in the state buffer.
- Checks to make sure that the length of the filter is a multiple of the interpolation factor.

To do this manually without calling the init function, assign the following subfields of the instance structure: L (interpolation factor), pCoeffs, phaseLength (numTaps / L), pState. Also set all of the values in pState to zero.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized. The code below statically initializes each of the 3 different data type filter instance structures

where L is the interpolation factor; phaseLength=numTaps/L is the length of each of the shorter FIR filters used internally, pCoeffs is the address of the coefficient buffer; pState is the address of the state buffer. Be sure to set the values in the state buffer to zeros when doing static initialization.

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the FIR interpolate filter functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

#### **Functions**

```
void riscv_fir_interpolate_f32 (const riscv_fir_interpolate_instance_f32 *S, const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
```

Processing function for the floating-point FIR interpolator.

## **Parameters**

- **S** [in] points to an instance of the floating-point FIR interpolator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### Returns none

```
riscv_status riscv_fir_interpolate_init_f32 (riscv_fir_interpolate_instance_f32 *S, uint8_t L, uint16_t numTaps, const float32_t *pCoeffs, float32_t *pState, uint32_t blockSize)
```

Initialization function for the floating-point FIR interpolator.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

The length of the filter numTaps must be a multiple of the interpolation factor L.

pState points to the array of state variables. pState is of length (numTaps/L)+blockSize-1 words where blockSize is the number of input samples processed by each call to riscv\_fir\_interpolate\_f32().

#### **Parameters**

- **S** [inout] points to an instance of the floating-point FIR interpolator structure
- L [in] upsample factor
- numTaps [in] number of filter coefficients in the filter
- pCoeffs [in] points to the filter coefficient buffer
- pState [in] points to the state buffer
- blockSize [in] number of input samples to process per call

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS : Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: filter length numTaps is not a multiple of the interpolation factor L
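The argument check and the buffer sizing described for the initialization function can be sketched as a small helper. The names `demo_status` and `interp_sizes` are hypothetical and are not part of the library; the extra test for L equal to zero is an assumption added here to keep the sketch safe, and the source does not say whether the library performs it.

```c
#include <stdint.h>

typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_ARGUMENT_ERROR = -1
} demo_status;

/* Validates numTaps against L and derives the sub-filter length and
 * the number of samples the state buffer must hold. */
demo_status interp_sizes(uint8_t L, uint16_t numTaps, uint32_t blockSize,
                         uint16_t *phaseLength, uint32_t *stateLen)
{
    /* numTaps must be a multiple of the interpolation factor. */
    if (L == 0u || (numTaps % L) != 0u) {
        return DEMO_ARGUMENT_ERROR;
    }

    *phaseLength = (uint16_t)(numTaps / L);
    *stateLen = blockSize + *phaseLength - 1u; /* (numTaps/L)+blockSize-1 */
    return DEMO_SUCCESS;
}
```

For example, L = 4, numTaps = 16 and blockSize = 32 give a phase length of 4 and a state buffer of 35 samples, while the output buffer passed to the processing function needs blockSize\*L = 128 samples.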

```
riscv_status riscv_fir_interpolate_init_q15 (riscv_fir_interpolate_instance_q15 *S, uint8_t
L, uint16_t numTaps, const q15_t *pCoeffs,
q15_t *pState, uint32_t blockSize)
```

Initialization function for the Q15 FIR interpolator.

**Details** pCoeffs points to the array of filter coefficients stored in time reversed order: The length of the filter numTaps must be a multiple of the interpolation factor L.

pState points to the array of state variables. pState is of length (numTaps/L)+blockSize-1 words where blockSize is the number of input samples processed by each call to riscv\_fir\_interpolate\_q15().

#### **Parameters**

- S [inout] points to an instance of the Q15 FIR interpolator structure
- L [in] upsample factor
- numTaps [in] number of filter coefficients in the filter
- pCoeffs [in] points to the filter coefficient buffer
- pState [in] points to the state buffer
- blockSize [in] number of input samples to process per call

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: filter length numTaps is not a multiple of the interpolation factor L

```
riscv_status riscv_fir_interpolate_init_q31 (riscv_fir_interpolate_instance_q31 *S, uint8_t
L, uint16_t numTaps, const q31_t *pCoeffs,
q31_t *pState, uint32_t blockSize)
```

Initialization function for the Q31 FIR interpolator.

**Details** pCoeffs points to the array of filter coefficients stored in time-reversed order: {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}.

The length of the filter numTaps must be a multiple of the interpolation factor L.

pState points to the array of state variables. pState is of length (numTaps/L)+blockSize-1 words where blockSize is the number of input samples processed by each call to riscv\_fir\_interpolate\_q31().

#### **Parameters**

- S [inout] points to an instance of the Q31 FIR interpolator structure
- L [in] upsample factor
- numTaps [in] number of filter coefficients in the filter
- pCoeffs [in] points to the filter coefficient buffer
- pState [in] points to the state buffer
- blockSize [in] number of input samples to process per call

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: filter length numTaps is not a multiple of the interpolation factor L

```
void riscv_fir_interpolate_q15 (const riscv_fir_interpolate_instance_q15 *S, const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Processing function for the Q15 FIR interpolator.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. After all additions have been performed, the accumulator is truncated to 34.15 format by discarding the low 15 bits. Lastly, the accumulator is saturated to yield a result in 1.15 format.

#### **Parameters**

- S [in] points to an instance of the Q15 FIR interpolator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### **Returns** none
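
The Q15 accumulation scheme described above (2.30 products summed in a 64-bit accumulator, truncated by 15 bits, then saturated to 1.15) can be sketched in plain C. The function name `q15_mac_sat` is hypothetical; this is an illustration of the arithmetic, not the library's implementation.

```c
#include <stdint.h>

/* Hypothetical reference routine (not the NMSIS implementation).
 * Multiply-accumulates n Q15 pairs in a 64-bit accumulator, discards
 * the low 15 bits, and saturates the result to the Q15 range. */
static int16_t q15_mac_sat(const int16_t *a, const int16_t *b, uint32_t n)
{
    int64_t acc = 0;                              /* 34.30 accumulator */

    for (uint32_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];     /* 1.15 x 1.15 -> 2.30 */
    }

    /* Truncate 34.30 -> 34.15. Right-shifting a negative value is
     * implementation-defined in C; an arithmetic shift is assumed here. */
    acc >>= 15;

    /* Saturate 34.15 -> 1.15. */
    if (acc > INT16_MAX) {
        return INT16_MAX;
    }
    if (acc < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)acc;
}
```

Because the accumulator is 64 bits wide, saturation happens only once, at the very end, so intermediate sums that temporarily exceed the Q15 range do not distort the result.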

```
void riscv_fir_interpolate_q31 (const riscv_fir_interpolate_instance_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
```

Processing function for the Q31 FIR interpolator.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows, it wraps around rather than clipping. To avoid overflows completely, the input signal must be scaled down by 1/(numTaps/L), since numTaps/L additions occur per output sample. After all multiply-accumulates are performed, the 2.62 accumulator is truncated to 1.32 format and then saturated to 1.31 format.

#### **Parameters**

- S [in] points to an instance of the Q31 FIR interpolator structure
- pSrc [in] points to the block of input data
- pDst [out] points to the block of output data
- blockSize [in] number of samples to process

#### **Returns** none

group groupFilters

## 3.3.5 Matrix Functions

## **Matrix Addition**

```
riscv_status riscv_mat_add_f32 (const riscv_matrix_instance_f32 *pSrcA, const riscv_matrix_instance_f32 *pSrcB, riscv_matrix_instance_f32 *pDst)

riscv_status riscv_mat_add_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst)

riscv_status riscv_mat_add_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

#### group MatrixAdd

Adds two matrices.

The functions check to make sure that pSrcA, pSrcB, and pDst have the same number of rows and columns.



## **Functions**

```
riscv_status riscv_mat_add_f32 (const riscv_matrix_instance_f32 *pSrcA, const riscv_matrix_instance_f32 *pSrcB, riscv_matrix_instance_f32 *pDst)
```

Floating-point matrix addition.

#### **Parameters**

- pSrcA [in] points to first input matrix structure
- pSrcB [in] points to second input matrix structure
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

```
riscv_status riscv_mat_add_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst)
```

Q15 matrix addition.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

#### **Parameters**

- pSrcA [in] points to first input matrix structure
- pSrcB [in] points to second input matrix structure
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed
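
The per-element saturating behavior described for the Q15 addition can be illustrated with a short sketch. The name `q15_add_sat` is hypothetical; it shows the arithmetic only and is not taken from the library source.

```c
#include <stdint.h>

/* Hypothetical helper: adds two Q15 values in 32-bit precision and
 * clamps the sum to the Q15 range [0x8000, 0x7FFF]. */
static int16_t q15_add_sat(int16_t x, int16_t y)
{
    int32_t sum = (int32_t)x + (int32_t)y;

    if (sum > INT16_MAX) {
        return INT16_MAX;   /* 0x7FFF */
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;   /* 0x8000 */
    }
    return (int16_t)sum;
}
```

A matrix addition would apply this operation to each pair of corresponding elements after the size check has passed.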

```
riscv_status riscv_mat_add_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

Q31 matrix addition.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

#### **Parameters**

- pSrcA [in] points to first input matrix structure
- pSrcB [in] points to second input matrix structure
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

## **Complex Matrix Multiplication**

```
riscv_status riscv_mat_cmplx_mult_f32 (const riscv_matrix_instance_f32 *pSrcA, const riscv_matrix_instance_f32 *pSrcB, riscv_matrix_instance_f32 *pDst)

riscv_status riscv_mat_cmplx_mult_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst, q15_t *pScratch)

riscv_status riscv_mat_cmplx_mult_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

## group CmplxMatrixMult

Complex Matrix multiplication is only defined if the number of columns of the first matrix equals the number of rows of the second matrix. Multiplying an  $M \times N$  matrix with an  $N \times P$  matrix results in an  $M \times P$  matrix.

When matrix size checking is enabled, the functions check:

- that the inner dimensions of pSrcA and pSrcB are equal;
- that the size of the output matrix equals the outer dimensions of pSrcA and pSrcB.
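
The two size checks listed above amount to a simple comparison of dimensions. A minimal sketch follows; the `mat_shape` type and the function name are hypothetical, with field names chosen to mirror the numRows and numCols fields of the matrix structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shape descriptor used only for this illustration. */
typedef struct {
    uint16_t numRows;
    uint16_t numCols;
} mat_shape;

/* Returns true when an (M x N) by (N x P) product fits an (M x P) output:
 * the inner dimensions match and the output has the outer dimensions. */
static bool mult_shapes_ok(mat_shape a, mat_shape b, mat_shape dst)
{
    return a.numCols == b.numRows &&
           dst.numRows == a.numRows &&
           dst.numCols == b.numCols;
}
```

When such a check fails, the library functions report RISCV\_MATH\_SIZE\_MISMATCH rather than computing a result.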

## **Functions**

```
riscv_status riscv_mat_cmplx_mult_f32 (const riscv_matrix_instance_f32 *pSrcA, const riscv_matrix_instance_f32 *pSrcB, riscv_matrix_instance_f32 *pDst)
```

Floating-point complex matrix multiplication.

#### **Parameters**

- pSrcA [in] points to first input complex matrix structure
- pSrcB [in] points to second input complex matrix structure
- pDst [out] points to output complex matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

```
riscv_status riscv_mat_cmplx_mult_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst, q15_t *pScratch)
```

Q15 complex matrix multiplication.

**Conditions for optimum performance** Input, output and state buffers should be 32-bit aligned.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The inputs to the multiplications are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.

#### **Parameters**

- pSrcA [in] points to first input complex matrix structure
- pSrcB [in] points to second input complex matrix structure
- pDst [out] points to output complex matrix structure
- pScratch [in] points to an array for storing intermediate results

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

```
riscv_status riscv_mat_cmplx_mult_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

Q31 complex matrix multiplication.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. The input is thus scaled down by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are performed internally. The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.

#### **Parameters**

- pSrcA [in] points to first input complex matrix structure
- pSrcB [in] points to second input complex matrix structure
- pDst [out] points to output complex matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

#### **Matrix Initialization**

```
void riscv_mat_init_f32 (riscv_matrix_instance_f32 *S, uint16_t nRows, uint16_t nColumns, float32_t *pData)

void riscv_mat_init_f64 (riscv_matrix_instance_f64 *S, uint16_t nRows, uint16_t nColumns, float32_t *pData)

void riscv_mat_init_q15 (riscv_matrix_instance_q15 *S, uint16_t nRows, uint16_t nColumns, q15_t *pData)

void riscv_mat_init_q31 (riscv_matrix_instance_q31 *S, uint16_t nRows, uint16_t nColumns, q31_t *pData)
```

#### group MatrixInit

Initializes the underlying matrix data structure. The functions set the numRows, numCols, and pData fields of the matrix data structure.
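
What the initialization does can be sketched with a stand-in structure. The type `sketch_matrix_f32` and the function `sketch_mat_init_f32` are hypothetical names used for illustration only; the three fields follow the numRows, numCols, and pData names mentioned above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a floating-point matrix instance. */
typedef struct {
    uint16_t numRows;   /* number of rows */
    uint16_t numCols;   /* number of columns */
    float   *pData;     /* points to the caller-owned data array */
} sketch_matrix_f32;

/* Stores the dimensions and the data pointer; no data is copied. */
static void sketch_mat_init_f32(sketch_matrix_f32 *S, uint16_t nRows,
                                uint16_t nColumns, float *pData)
{
    S->numRows = nRows;
    S->numCols = nColumns;
    S->pData   = pData;
}
```

Since only the pointer is stored, the data array must remain valid for as long as the matrix instance is in use.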

#### **Functions**

void **riscv\_mat\_init\_f32** (riscv\_matrix\_instance\_f32 \*S, uint16\_t nRows, uint16\_t nColumns, float32\_t \*pData)

Floating-point matrix initialization.

#### **Parameters**

- **S** [inout] points to an instance of the floating-point matrix structure
- nRows [in] number of rows in the matrix
- nColumns [in] number of columns in the matrix
- pData [in] points to the matrix data array

#### Returns none

void **riscv\_mat\_init\_f64** (riscv\_matrix\_instance\_f64 \*S, uint16\_t nRows, uint16\_t nColumns, float32\_t \*pData)

Floating-point matrix initialization.

#### **Parameters**

- **S** [inout] points to an instance of the floating-point matrix structure
- nRows [in] number of rows in the matrix
- nColumns [in] number of columns in the matrix
- pData [in] points to the matrix data array

#### Returns none

void **riscv\_mat\_init\_q15** (riscv\_matrix\_instance\_q15 \*S, uint16\_t nRows, uint16\_t nColumns, q15\_t \*pData)

Q15 matrix initialization.

#### **Parameters**

- **S** [inout] points to an instance of the Q15 matrix structure
- nRows [in] number of rows in the matrix
- nColumns [in] number of columns in the matrix
- pData [in] points to the matrix data array

#### Returns none

void **riscv\_mat\_init\_q31** (riscv\_matrix\_instance\_q31 \*S, uint16\_t nRows, uint16\_t nColumns, q31\_t \*pData)

Q31 matrix initialization.

#### **Parameters**

- **S** [inout] points to an instance of the Q31 matrix structure
- nRows [in] number of rows in the matrix
- nColumns [in] number of columns in the matrix
- pData [in] points to the matrix data array

#### Returns none

#### **Matrix Inverse**

```
riscv_status riscv_mat_inverse_f32 (const riscv_matrix_instance_f32 *pSrc, riscv_matrix_instance_f32 *pDst)

riscv_status riscv_mat_inverse_f64 (const riscv_matrix_instance_f64 *pSrc, riscv_matrix_instance_f64 *pDst)

riscv_status riscv_mat_solve_lower_triangular_f32 (const riscv_matrix_instance_f32 *lt, const riscv_matrix_instance_f32 *a, riscv_matrix_instance_f32 *dst)

riscv_status riscv_mat_solve_lower_triangular_f64 (const riscv_matrix_instance_f64 *lt, const riscv_matrix_instance_f64 *a, riscv_matrix_instance_f64 *dst)

riscv_status riscv_mat_solve_upper_triangular_f32 (const riscv_matrix_instance_f32 *ut, const riscv_matrix_instance_f32 *a, riscv_matrix_instance_f32 *dst)

riscv_status riscv_mat_solve_upper_triangular_f64 (const riscv_matrix_instance_f64 *ut, const riscv_matrix_instance_f64 *a, riscv_matrix_instance_f64 *dst)
```

## group MatrixInv

Computes the inverse of a matrix.

The inverse is defined only if the input matrix is square and non-singular (the determinant is non-zero). The function checks that the input and output matrices are square and of the same size.

Matrix inversion is numerically sensitive and the NMSIS DSP library only supports matrix inversion of floating-point matrices.

**Algorithm** The Gauss-Jordan method is used to find the inverse. The algorithm performs a sequence of elementary row-operations until it reduces the input matrix to an identity matrix. Applying the same sequence of elementary row-operations to an identity matrix yields the inverse matrix. If the input matrix is singular, then the algorithm terminates and returns error status RISCV\_MATH\_SINGULAR.



A is a 3 x 3 matrix and its inverse is X
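
The Gauss-Jordan procedure described above can be sketched in plain C as follows. This is an illustrative reference, not the library's code: the function name, the row-major layout, the plain integer return codes, and the choice of the largest-magnitude pivot are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical sketch of Gauss-Jordan inversion.
 * a   : n x n input, row-major, overwritten (reduced to the identity)
 * inv : n x n output, receives the inverse
 * Returns 0 on success, -1 if the matrix is singular. */
static int gauss_jordan_inverse(float *a, float *inv, size_t n)
{
    /* Start with the identity matrix in the output. */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            inv[i * n + j] = (i == j) ? 1.0f : 0.0f;
        }
    }

    for (size_t col = 0; col < n; col++) {
        /* Find the row at or below the diagonal with the largest |value|. */
        size_t pivot = col;
        float best = a[col * n + col] < 0.0f ? -a[col * n + col] : a[col * n + col];
        for (size_t r = col + 1; r < n; r++) {
            float v = a[r * n + col] < 0.0f ? -a[r * n + col] : a[r * n + col];
            if (v > best) {
                best = v;
                pivot = r;
            }
        }
        if (best == 0.0f) {
            return -1;                      /* no usable pivot: singular */
        }

        /* Row operation 1: swap the pivot row into place (both matrices). */
        if (pivot != col) {
            for (size_t j = 0; j < n; j++) {
                float t = a[col * n + j];
                a[col * n + j] = a[pivot * n + j];
                a[pivot * n + j] = t;
                t = inv[col * n + j];
                inv[col * n + j] = inv[pivot * n + j];
                inv[pivot * n + j] = t;
            }
        }

        /* Row operation 2: scale the pivot row so the pivot becomes 1. */
        float p = a[col * n + col];
        for (size_t j = 0; j < n; j++) {
            a[col * n + j] /= p;
            inv[col * n + j] /= p;
        }

        /* Row operation 3: clear this column in every other row. */
        for (size_t r = 0; r < n; r++) {
            if (r == col) {
                continue;
            }
            float f = a[r * n + col];
            for (size_t j = 0; j < n; j++) {
                a[r * n + j] -= f * a[col * n + j];
                inv[r * n + j] -= f * inv[col * n + j];
            }
        }
    }
    return 0;
}
```

Like the library functions documented here, the sketch overwrites its input matrix, so a caller that still needs the original data should pass a copy.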

#### **Functions**

riscv\_status riscv\_mat\_inverse\_f32 (const riscv\_matrix\_instance\_f32 \*pSrc, riscv\_matrix\_instance\_f32 \*pDst)

Floating-point matrix inverse.

#### **Parameters**

- pSrc [in] points to input matrix structure. The source matrix is modified by the function.
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed
- RISCV\_MATH\_SINGULAR: Input matrix is found to be singular (non-invertible)

```
riscv_status riscv_mat_inverse_f64 (const riscv_matrix_instance_f64 *pSrc, riscv_matrix_instance_f64 *pDst)
```

Floating-point (64-bit) matrix inverse.

#### **Parameters**

- pSrc [in] points to input matrix structure. The source matrix is modified by the function.
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed
- RISCV\_MATH\_SINGULAR : Input matrix is found to be singular (non-invertible)

```
riscv_status riscv_mat_solve_lower_triangular_f32 (const riscv_matrix_instance_f32 *lt, const riscv_matrix_instance_f32 *a, riscv_matrix_instance_f32 *dst)
```

Solve LT . X = A where LT is a lower triangular matrix.

#### **Parameters**

- lt [in] The lower triangular matrix
- a [in] The matrix a
- dst [out] The solution X of LT . X = A

**Returns** The function returns RISCV\_MATH\_SINGULAR, if the system can't be solved.
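
Solving LT . X = A with a lower triangular LT is classically done by forward substitution. The sketch below assumes row-major storage and uses a hypothetical function name and plain integer return codes; it is meant to illustrate the technique, not to reproduce the library's implementation.

```c
#include <stddef.h>

/* Hypothetical forward-substitution sketch.
 * lt : n x n lower triangular matrix, row-major
 * a  : n x m right-hand side, row-major
 * x  : n x m solution, row-major
 * Returns 0 on success, -1 if a diagonal entry is zero (no unique solution). */
static int solve_lower_triangular(const float *lt, const float *a, float *x,
                                  size_t n, size_t m)
{
    for (size_t col = 0; col < m; col++) {
        for (size_t i = 0; i < n; i++) {
            float diag = lt[i * n + i];
            if (diag == 0.0f) {
                return -1;
            }
            /* Subtract the contributions of the unknowns already solved. */
            float sum = a[i * m + col];
            for (size_t k = 0; k < i; k++) {
                sum -= lt[i * n + k] * x[k * m + col];
            }
            x[i * m + col] = sum / diag;
        }
    }
    return 0;
}
```

The upper triangular case works the same way, except that rows are processed from the last to the first (back substitution).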

```
riscv_status riscv_mat_solve_lower_triangular_f64 (const riscv_matrix_instance_f64 *lt, const riscv_matrix_instance_f64 *a, riscv_matrix_instance_f64 *dst)
```

Solve LT . X = A where LT is a lower triangular matrix.

#### **Parameters**

- lt [in] The lower triangular matrix
- a [in] The matrix a
- dst [out] The solution X of LT . X = A

**Returns** The function returns RISCV\_MATH\_SINGULAR, if the system can't be solved.

riscv\_status riscv\_mat\_solve\_upper\_triangular\_f32 (const riscv\_matrix\_instance\_f32 \*ut, const riscv\_matrix\_instance\_f32 \*a, riscv\_matrix\_instance\_f32 \*dst)

Solve UT . X = A where UT is an upper triangular matrix.

#### **Parameters**

- ut [in] The upper triangular matrix
- a [in] The matrix a
- dst [out] The solution X of UT . X = A

**Returns** The function returns RISCV\_MATH\_SINGULAR, if the system can't be solved.

riscv\_status riscv\_mat\_solve\_upper\_triangular\_f64 (const riscv\_matrix\_instance\_f64 \*ut, const riscv\_matrix\_instance\_f64 \*a, riscv\_matrix\_instance\_f64 \*dst)

Solve UT . X = A where UT is an upper triangular matrix.

#### **Parameters**

- ut [in] The upper triangular matrix
- **a** [in] The matrix a
- dst [out] The solution X of UT . X = A

**Returns** The function returns RISCV\_MATH\_SINGULAR, if the system can't be solved.

#### **Matrix Multiplication**

```
riscv_status riscv_mat_mult_f32 (const riscv_matrix_instance_f32 *pSrcA, const riscv_matrix_instance_f32 *pSrcB, riscv_matrix_instance_f32 *pDst)

riscv_status riscv_mat_mult_f64 (const riscv_matrix_instance_f64 *pSrcA, const riscv_matrix_instance_f64 *pSrcB, riscv_matrix_instance_f64 *pDst)

riscv_status riscv_mat_mult_fast_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst, q15_t *pState)

riscv_status riscv_mat_mult_fast_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)

riscv_status riscv_mat_mult_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst, q15_t *pState)

riscv_status riscv_mat_mult_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)

riscv_status riscv_mat_mult_q7 (const riscv_matrix_instance_q7 *pSrcA, const riscv_matrix_instance_q7 *pSrcB, riscv_matrix_instance_q7 *pDst, q7_t *pState)
```

#### group MatrixMult

Multiplies two matrices.

Matrix multiplication is only defined if the number of columns of the first matrix equals the number of rows of the second matrix. Multiplying an  $M \times N$  matrix with an  $N \times P$  matrix results in an  $M \times P$  matrix. When matrix size checking is enabled, the functions check: (1) that the inner dimensions of pSrcA and pSrcB are equal; and (2) that the size of the output matrix equals the outer dimensions of pSrcA and pSrcB.

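$$
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\times
\begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33}
\end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31} & a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32} & a_{11}b_{13}+a_{12}b_{23}+a_{13}b_{33} \\
a_{21}b_{11}+a_{22}b_{21}+a_{23}b_{31} & a_{21}b_{12}+a_{22}b_{22}+a_{23}b_{32} & a_{21}b_{13}+a_{22}b_{23}+a_{23}b_{33} \\
a_{31}b_{11}+a_{32}b_{21}+a_{33}b_{31} & a_{31}b_{12}+a_{32}b_{22}+a_{33}b_{32} & a_{31}b_{13}+a_{32}b_{23}+a_{33}b_{33}
\end{bmatrix}
$$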
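
The size checks and the row-by-column products described above can be written out as a small reference routine. The structure and function names below are hypothetical, and row-major storage is assumed; the sketch is intended as an illustration rather than as the library's implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a floating-point matrix instance. */
typedef struct {
    uint16_t numRows;
    uint16_t numCols;
    float   *pData;     /* row-major element storage (assumed) */
} ref_matrix_f32;

/* Computes D = A x B. Returns 0 on success, -1 on a size mismatch. */
static int ref_mat_mult_f32(const ref_matrix_f32 *A, const ref_matrix_f32 *B,
                            ref_matrix_f32 *D)
{
    /* Inner dimensions must match; output must be M x P. */
    if (A->numCols != B->numRows ||
        D->numRows != A->numRows ||
        D->numCols != B->numCols) {
        return -1;
    }

    for (uint16_t i = 0; i < A->numRows; i++) {
        for (uint16_t j = 0; j < B->numCols; j++) {
            float sum = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (uint16_t k = 0; k < A->numCols; k++) {
                sum += A->pData[i * A->numCols + k] * B->pData[k * B->numCols + j];
            }
            D->pData[i * D->numCols + j] = sum;
        }
    }
    return 0;
}
```

Each output element requires numCols(A) multiply-accumulate steps, which is why the fixed-point variants tie their overflow guidance to the number of columns of the first matrix.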

## **Functions**

```
riscv_status riscv_mat_mult_f32 (const riscv_matrix_instance_f32 *pSrcA, const riscv_matrix_instance_f32 *pSrcB, riscv_matrix_instance_f32 *pDst)
```

Floating-point matrix multiplication.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

```
riscv_status riscv_mat_mult_f64 (const riscv_matrix_instance_f64 *pSrcA, const riscv_matrix_instance_f64 *pSrcB, riscv_matrix_instance_f64 *pDst)
```

Floating-point matrix multiplication.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

```
riscv_status riscv_mat_mult_fast_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst, q15_t *pState)
```

Q15 matrix multiplication (fast variant) for RISC-V Core with DSP enabled.

**Scaling and Overflow Behavior** The difference between the function riscv\_mat\_mult\_q15() and this fast variant is that the fast variant uses a 32-bit rather than a 64-bit accumulator. The result of each 1.15 x 1.15 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.15 result.

The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 16 bits of each multiplication result. In order to avoid overflows completely the input signals must be scaled down. Scale down one of the input matrices by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are computed internally for each output element.

**Remark** Refer to riscv\_mat\_mult\_q15() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure
- pState [in] points to the array for storing intermediate results

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

```
riscv_status riscv_mat_mult_fast_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

Q31 matrix multiplication (fast variant) for RISC-V Core with DSP enabled.

**Scaling and Overflow Behavior** The difference between the function riscv\_mat\_mult\_q31() and this fast variant is that the fast variant uses a 32-bit rather than a 64-bit accumulator. The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30 format. Finally, the accumulator is saturated and converted to a 1.31 result.

The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 32 bits of each multiplication result. In order to avoid overflows completely the input signals must be scaled down. Scale down one of the input matrices by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are computed internally for each output element.

**Remark** Refer to riscv\_mat\_mult\_q31() for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

```
riscv_status riscv_mat_mult_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst, q15_t *pState)
```

Q15 matrix multiplication.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The inputs to the multiplications are in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. This approach provides 33 guard bits and there is no risk of overflow. The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.

Refer to riscv\_mat\_mult\_fast\_q15() for a faster but less precise version of this function.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure
- pState [in] points to the array for storing intermediate results (Unused)

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

```
riscv_status riscv_mat_mult_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

Q31 matrix multiplication.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. There is no saturation on intermediate additions. Thus, if the accumulator overflows it wraps around and distorts the result. The input signals should be scaled down to avoid intermediate overflows. The input is thus scaled down by log2(numColsA) bits to avoid overflows, as a total of numColsA additions are performed internally. The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.

**Remark** Refer to riscv\_mat\_mult\_fast\_q31() for a faster but less precise implementation of this function.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

```
riscv_status riscv_mat_mult_q7 (const riscv_matrix_instance_q7 *pSrcA, const riscv_matrix_instance_q7 *pSrcB, riscv_matrix_instance_q7 *pDst, q7_t *pState)
```

Q7 matrix multiplication.

## **Scaling and Overflow Behavior:**

The function is implemented using a 32-bit internal accumulator saturated to 1.7 format.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure
- pState [in] points to the array for storing intermediate results (Unused in some versions)

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

#### **Matrix Scale**

riscv\_status riscv\_mat\_scale\_f32 (const riscv\_matrix\_instance\_f32 \*pSrc, float32\_t scale, riscv\_matrix\_instance\_f32 \*pDst)

riscv\_status **riscv\_mat\_scale\_q15** (**const** riscv\_matrix\_instance\_q15 \*pSrc, q15\_t scaleFract, int32\_t shift, riscv\_matrix\_instance\_q15 \*pDst)

riscv\_status riscv\_mat\_scale\_q31 (const riscv\_matrix\_instance\_q31 \*pSrc, q31\_t scaleFract, int32\_t shift, riscv\_matrix\_instance\_q31 \*pDst)

### group MatrixScale

Multiplies a matrix by a scalar. This is accomplished by multiplying each element in the matrix by the scalar. For example, scaling a 3 x 3 matrix by K replaces every element a\_ij with K \* a\_ij.

The function checks to make sure that the input and output matrices are of the same size.

In the fixed-point Q15 and Q31 functions, scale is represented by a fractional multiplication scaleFract and an arithmetic shift shift. The shift allows the gain of the scaling operation to exceed 1.0. The overall scale factor applied to the fixed-point data is scale = scaleFract \* 2^shift.
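A minimal sketch of how a fractional multiplier and a shift combine into one scale factor for a single Q15 sample. The helper names are hypothetical, the sketch only handles shift values in [-15, 15], and it assumes an arithmetic right shift for negative values; it is not taken from the library source.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit intermediate value to the signed 1.15 range. */
static int16_t sat_q15(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical helper: scale one Q15 sample by scaleFract * 2^shift. */
static int16_t scale_q15(int16_t x, int16_t scale_fract, int32_t shift)
{
    int32_t prod = (int32_t)x * (int32_t)scale_fract; /* 2.30 product */
    int32_t k = 15 - shift;                            /* net right shift */

    assert(k >= 0 && k <= 30); /* sketch only handles shift in [-15, 15] */
    return sat_q15(prod >> k);
}
```

For example, scaleFract = 0.5 (0x4000) with shift = 2 gives an overall gain of 2.0, so an input of 0.25 (0x2000) becomes 0.5 (0x4000), while an input of 0.5 with shift = 3 saturates at 0x7FFF.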

#### **Functions**

riscv\_status riscv\_mat\_scale\_f32 (const riscv\_matrix\_instance\_f32 \*pSrc, float32\_t scale, riscv\_matrix\_instance\_f32 \*pDst)

Floating-point matrix scaling.

#### **Parameters**

- pSrc [in] points to input matrix
- scale [in] scale factor to be applied
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_scale\_q15 (const riscv\_matrix\_instance\_q15 \*pSrc, q15\_t scaleFract, int32\_t shift, riscv\_matrix\_instance\_q15 \*pDst)

Q15 matrix scaling.

**Scaling and Overflow Behavior** The input data \*pSrc and scaleFract are in 1.15 format. These are multiplied to yield a 2.30 intermediate result and this is shifted with saturation to 1.15 format.

#### **Parameters**

- pSrc [in] points to input matrix
- scaleFract [in] fractional portion of the scale factor
- **shift** [in] number of bits to shift the result by
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_scale\_q31 (const riscv\_matrix\_instance\_q31 \*pSrc, q31\_t scaleFract, int32\_t shift, riscv\_matrix\_instance\_q31 \*pDst)

Q31 matrix scaling.

**Scaling and Overflow Behavior** The input data \*pSrc and scaleFract are in 1.31 format. These are multiplied to yield a 2.62 intermediate result which is shifted with saturation to 1.31 format.

#### **Parameters**

- pSrc [in] points to input matrix
- scaleFract [in] fractional portion of the scale factor
- **shift** [in] number of bits to shift the result by
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

## **Matrix Subtraction**

```
riscv_status riscv_mat_sub_f32 (const riscv_matrix_instance_f32 *pSrcA, const riscv_matrix_instance_f32 *pSrcB, riscv_matrix_instance_f32 *pDst)
riscv_status riscv_mat_sub_f64 (const riscv_matrix_instance_f64 *pSrcA, const riscv_matrix_instance_f64 *pSrcB, riscv_matrix_instance_f64 *pDst)
riscv_status riscv_mat_sub_q15 (const riscv_matrix_instance_q15 *pSrcA, const riscv_matrix_instance_q15 *pSrcB, riscv_matrix_instance_q15 *pDst)
riscv_status riscv_mat_sub_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

## group MatrixSub

Subtract two matrices.

The functions check to make sure that pSrcA, pSrcB, and pDst have the same number of rows and columns.

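#### **Functions**

riscv\_status riscv\_mat\_sub\_f32 (const riscv\_matrix\_instance\_f32 \*pSrcA, const riscv\_matrix\_instance\_f32 \*pSrcB, riscv\_matrix\_instance\_f32 \*pDst)

Floating-point matrix subtraction.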

### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_sub\_f64 (const riscv\_matrix\_instance\_f64 \*pSrcA, const riscv\_matrix\_instance\_f64 \*pSrcB, riscv\_matrix\_instance\_f64 \*pDst)

Floating-point matrix subtraction.

### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_sub\_q15 (const riscv\_matrix\_instance\_q15 \*pSrcA, const riscv\_matrix\_instance\_q15 \*pSrcB, riscv\_matrix\_instance\_q15 \*pDst)

Q15 matrix subtraction.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

#### **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

## **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed
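The saturating arithmetic used by the fixed-point subtraction functions can be illustrated element by element. The sketch below is plain C with hypothetical helper names (`sub_q15`, `sub_q31`); it works on flat arrays and leaves out the matrix structures and size checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise a - b with saturation to the Q15 range [0x8000 0x7FFF]. */
static void sub_q15(const int16_t *a, const int16_t *b, int16_t *dst,
                    uint32_t count)
{
    assert(a != NULL && b != NULL && dst != NULL);
    for (uint32_t i = 0; i < count; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i]; /* fits in 32 bits */
        if (d > INT16_MAX) d = INT16_MAX;
        if (d < INT16_MIN) d = INT16_MIN;
        dst[i] = (int16_t)d;
    }
}

/* Element-wise a - b with saturation to the Q31 range. */
static void sub_q31(const int32_t *a, const int32_t *b, int32_t *dst,
                    uint32_t count)
{
    assert(a != NULL && b != NULL && dst != NULL);
    for (uint32_t i = 0; i < count; i++) {
        int64_t d = (int64_t)a[i] - (int64_t)b[i]; /* fits in 64 bits */
        if (d > INT32_MAX) d = INT32_MAX;
        if (d < INT32_MIN) d = INT32_MIN;
        dst[i] = (int32_t)d;
    }
}
```

Computing the difference in the next wider type first makes the clamp straightforward, since the widened result can never overflow.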

```
riscv_status riscv_mat_sub_q31 (const riscv_matrix_instance_q31 *pSrcA, const riscv_matrix_instance_q31 *pSrcB, riscv_matrix_instance_q31 *pDst)
```

Q31 matrix subtraction.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

## **Parameters**

- pSrcA [in] points to the first input matrix structure
- pSrcB [in] points to the second input matrix structure
- pDst [out] points to output matrix structure

## **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

## **Matrix Transpose**

```
riscv_status riscv_mat_trans_f32 (const riscv_matrix_instance_f32 *pSrc, riscv_matrix_instance_f32 *pDst)
riscv_status riscv_mat_trans_f64 (const riscv_matrix_instance_f64 *pSrc, riscv_matrix_instance_f64 *pDst)
riscv_status riscv_mat_trans_q15 (const riscv_matrix_instance_q15 *pSrc, riscv_matrix_instance_q15 *pDst)
riscv_status riscv_mat_trans_q31 (const riscv_matrix_instance_q31 *pSrc, riscv_matrix_instance_q31 *pDst)
riscv_status riscv_mat_trans_q7 (const riscv_matrix_instance_q7 *pSrc, riscv_matrix_instance_q7 *pDst)
```

## group MatrixTrans

Transposes a matrix.

Transposing an M x N matrix flips it around the center diagonal and results in an N x M matrix.

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}^{T}
=
\begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}
$$
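As a rough sketch of the index mapping behind a transpose on row-ordered data, the helper below copies element (r, c) of an M x N source to element (c, r) of an N x M destination. `transpose_f32` is a hypothetical name and the routine is out-of-place only; it is not the library implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out-of-place transpose of a rows x cols matrix
 * stored in row order. dst must hold cols x rows elements. */
static void transpose_f32(const float *src, float *dst,
                          uint16_t rows, uint16_t cols)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (uint16_t r = 0; r < rows; r++) {
        for (uint16_t c = 0; c < cols; c++) {
            /* source (r, c) lands at destination (c, r) */
            dst[(size_t)c * rows + r] = src[(size_t)r * cols + c];
        }
    }
}
```

A 2 x 3 input {1, 2, 3, 4, 5, 6} produces the 3 x 2 output {1, 4, 2, 5, 3, 6}.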

#### **Functions**

riscv\_status riscv\_mat\_trans\_f32 (const riscv\_matrix\_instance\_f32 \*pSrc, riscv\_matrix\_instance\_f32 \*pDst)

Floating-point matrix transpose.

#### **Parameters**

- pSrc [in] points to input matrix
- pDst [out] points to output matrix

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_trans\_f64 (const riscv\_matrix\_instance\_f64 \*pSrc, riscv\_matrix\_instance\_f64 \*pDst)

Floating-point matrix transpose.

#### **Parameters**

- pSrc [in] points to input matrix
- pDst [out] points to output matrix

## Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_trans\_q15 (const riscv\_matrix\_instance\_q15 \*pSrc, riscv\_matrix\_instance\_q15 \*pDst)

Q15 matrix transpose.

### **Parameters**

- pSrc [in] points to input matrix
- pDst [out] points to output matrix

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_trans\_q31 (const riscv\_matrix\_instance\_q31 \*pSrc, riscv\_matrix\_instance\_q31 \*pDst)

Q31 matrix transpose.

#### **Parameters**

- pSrc [in] points to input matrix
- pDst [out] points to output matrix

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

riscv\_status riscv\_mat\_trans\_q7 (const riscv\_matrix\_instance\_q7 \*pSrc, riscv\_matrix\_instance\_q7 \*pDst)

Q7 matrix transpose.

#### **Parameters**

- pSrc [in] points to input matrix
- pDst [out] points to output matrix

#### Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_SIZE\_MISMATCH: Matrix size check failed

## group groupMatrix

This set of functions provides basic matrix math operations. The functions operate on matrix data structures. For example, the floating-point matrix structure riscv\_matrix\_instance\_f32 holds the number of rows, the number of columns, and a pointer to the data array. There are similar definitions for Q15 and Q31 data types.

The structure specifies the size of the matrix and then points to an array of data. The array is of size numRows X numCols and the values are arranged in row order. That is, the matrix element (i, j) is stored at pData[i\*numCols + j].

**Init Functions** There is an associated initialization function for each type of matrix data structure. The initialization function sets the values of the internal structure fields. Refer to riscv\_mat\_init\_f32(), riscv\_mat\_init\_q31() and riscv\_mat\_init\_q15() for floating-point, Q31 and Q15 types, respectively.

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place the instance structure in a const data section, manually initialize the data structure. For example, riscv\_matrix\_instance\_q31 S = {nRows, nColumns, pData}; where nRows specifies the number of rows, nColumns specifies the number of columns, and pData points to the data array.

**Size Checking** By default all of the matrix functions perform size checking on the input and output matrices. For example, the matrix addition function verifies that the two input matrices and the output matrix all have the same number of rows and columns. If the size check fails, the functions return RISCV\_MATH\_SIZE\_MISMATCH; otherwise the functions return RISCV\_MATH\_SUCCESS. There is some overhead associated with this matrix size checking. The matrix size checking is enabled via a #define within the library project settings. By default this macro is defined and size checking is enabled. By changing the project settings and undefining this macro, size checking is eliminated and the functions run a bit faster. With size checking disabled the functions always return RISCV\_MATH\_SUCCESS.
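The structure layout, the row-order element addressing, and the size check can be sketched together. Everything prefixed with `demo_` or `DEMO_` below is a hypothetical stand-in written for illustration: the field names mirror the ones mentioned in the text, but the types, status values, and functions are assumptions rather than the library's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in status codes; the numeric values are arbitrary. */
typedef enum { DEMO_SUCCESS, DEMO_SIZE_MISMATCH } demo_status;

/* Stand-in for a floating-point matrix instance structure. */
typedef struct {
    uint16_t numRows; /* number of rows */
    uint16_t numCols; /* number of columns */
    float *pData;     /* numRows x numCols values in row order */
} demo_matrix_f32;

/* Read element (i, j) using row-order addressing. */
static float demo_mat_get(const demo_matrix_f32 *m, uint16_t i, uint16_t j)
{
    assert(m != NULL && i < m->numRows && j < m->numCols);
    return m->pData[(size_t)i * m->numCols + j];
}

/* Element-wise addition with the kind of size check the text describes. */
static demo_status demo_mat_add(const demo_matrix_f32 *a,
                                const demo_matrix_f32 *b,
                                demo_matrix_f32 *dst)
{
    if (a->numRows != b->numRows || a->numCols != b->numCols ||
        a->numRows != dst->numRows || a->numCols != dst->numCols) {
        return DEMO_SIZE_MISMATCH;
    }
    size_t count = (size_t)a->numRows * a->numCols;
    for (size_t i = 0; i < count; i++) {
        dst->pData[i] = a->pData[i] + b->pData[i];
    }
    return DEMO_SUCCESS;
}
```

Because the structure is initialized with a brace list here, an instance whose fields never change can be declared `const`, which is the manual initialization case described above.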

# 3.3.6 Transform Functions

# **Complex FFT Functions**

# **Complex FFT Tables**

```
const uint16_t riscvBitRevTable[1024]
const uint64_t twiddleCoefF64_16[32]
const uint64_t twiddleCoefF64_32[64]
const uint64_t twiddleCoefF64_64[128]
const uint64_t twiddleCoefF64_128[256]
const uint64_t twiddleCoefF64_256[512]
const uint64_t twiddleCoefF64_512[1024]
const uint64_t twiddleCoefF64_1024[2048]
const uint64_t twiddleCoefF64_2048[4096]
const uint64_t twiddleCoefF64_4096[8192]
const float32_t twiddleCoef_16[32]
const float32_t twiddleCoef_32[64]
const float32_t twiddleCoef_64[128]
const float32_t twiddleCoef_128[256]
const float32_t twiddleCoef_256[512]
const float32_t twiddleCoef_512[1024]
const float32_t twiddleCoef_1024[2048]
const float32_t twiddleCoef_2048[4096]
const float32_t twiddleCoef_4096[8192]
const q31_t twiddleCoef_16_q31[24]
const q31_t twiddleCoef_32_q31[48]
const q31_t twiddleCoef_64_q31[96]
const q31_t twiddleCoef_128_q31[192]
const q31_t twiddleCoef_256_q31[384]
const q31_t twiddleCoef_512_q31[768]
const q31_t twiddleCoef_1024_q31[1536]
const q31_t twiddleCoef_2048_q31[3072]
const q31_t twiddleCoef_4096_q31[6144]
const q15_t twiddleCoef_16_q15[24]
const q15_t twiddleCoef_32_q15[48]
const q15_t twiddleCoef_64_q15[96]
const q15_t twiddleCoef_128_q15[192]
const q15_t twiddleCoef_256_q15[384]
const q15_t twiddleCoef_512_q15[768]
const q15_t twiddleCoef_1024_q15[1536]
const q15_t twiddleCoef_2048_q15[3072]
const q15_t twiddleCoef_4096_q15[6144]
```

## group CFFT\_CIFFT

### **Variables**

# const uint16\_t riscvBitRevTable[1024]

Table for bit reversal process.

The bit reversal table is generated with N = 4096 and logN2 = 12 (the base-2 logarithm of N), where N is the maximum FFT size supported.
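For orientation, a generic bit-reversal helper is sketched below. It only shows what reversing an index over a fixed number of bits means; it does not reproduce the exact contents or layout of riscvBitRevTable, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the lowest `bits` bits of x (e.g. bits = 12 for N = 4096). */
static uint32_t bit_reverse(uint32_t x, unsigned bits)
{
    uint32_t r = 0;

    assert(bits >= 1 && bits <= 32);
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (x & 1u); /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}

/* Fill table[i] with the bit-reversed index i for an FFT of 2^log2n points. */
static void fill_bit_reverse_table(uint16_t *table, unsigned log2n)
{
    assert(table != NULL && log2n >= 1 && log2n <= 16);
    for (uint32_t i = 0; i < ((uint32_t)1 << log2n); i++) {
        table[i] = (uint16_t)bit_reverse(i, log2n);
    }
}
```

For an 8-point transform the permutation is {0, 4, 2, 6, 1, 5, 3, 7}.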

# const uint64\_t twiddleCoefF64\_16[32]

Double Precision Floating-point Twiddle factors Table Generation.

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 16, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

```
const uint64_t twiddleCoefF64_32[64]
```

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 32, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

```
const uint64_t twiddleCoefF64_64[128]
```

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 64, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

```
const uint64_t twiddleCoefF64_128[256]
```

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 128, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

```
const uint64_t twiddleCoefF64_256[512]
```

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 256, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

# const uint64\_t twiddleCoefF64\_512[1024]

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 512, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

# const uint64\_t twiddleCoefF64\_1024[2048]

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 1024, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

# const uint64\_t twiddleCoefF64\_2048[4096]

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 2048, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

# const uint64\_t twiddleCoefF64\_4096[8192]

Example code for Double Precision Floating-point Twiddle factors Generation:

```
where N = 4096, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

# const float32\_t twiddleCoef\_16[32]

Example code for Floating-point Twiddle factors Generation:

```
where N = 16, PI = 3.14159265358979
```

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_32[64]

Example code for Floating-point Twiddle factors Generation:

where N = 32, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_64[128]

Example code for Floating-point Twiddle factors Generation:

where N = 64, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_128[256]

Example code for Floating-point Twiddle factors Generation:

where N = 128, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_256[512]

Example code for Floating-point Twiddle factors Generation:

where N = 256, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_512[1024]

Example code for Floating-point Twiddle factors Generation:

where N = 512, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_1024[2048]

Example code for Floating-point Twiddle factors Generation:

where N = 1024, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_2048[4096]

Example code for Floating-point Twiddle factors Generation:

where N = 2048, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const float32\_t twiddleCoef\_4096[8192]

Example code for Floating-point Twiddle factors Generation:

where N = 4096, PI = 3.14159265358979

Cos and Sin values are in interleaved fashion

const q31\_t twiddleCoef\_16\_q31[24]

Q31 Twiddle factors Table.

```
Example code for Q31 Twiddle factors Generation::
      where N = 16, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_32_q31[48]
      Example code for Q31 Twiddle factors Generation::
      where N = 32, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_64_q31[96]
      Example code for Q31 Twiddle factors Generation::
      where N = 64, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_128_q31[192]
      Example code for Q31 Twiddle factors Generation::
      where N = 128, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_256_q31[384]
      Example code for Q31 Twiddle factors Generation::
      where N = 256, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_512_q31[768]
      Example code for Q31 Twiddle factors Generation::
      where N = 512, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_1024_q31[1536]
```

```
Example code for Q31 Twiddle factors Generation::
      where N = 1024, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_2048_q31[3072]
      Example code for Q31 Twiddle factors Generation::
      where N = 2048, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q31_t twiddleCoef_4096_q31[6144]
      Example code for Q31 Twiddle factors Generation::
      where N = 4096, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to Q31(Fixed point 1.31): round(twiddleCoefQ31(i) * pow(2, 31))
const q15_t twiddleCoef_16_q15[24]
     q15 Twiddle factors Table
      Example code for q15 Twiddle factors Generation::
      where N = 16, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
const q15_t twiddleCoef_32_q15[48]
      Example code for q15 Twiddle factors Generation::
      where N = 32, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
const q15_t twiddleCoef_64_q15[96]
      Example code for q15 Twiddle factors Generation::
      where N = 64, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
```

```
const q15_t twiddleCoef_128_q15[192]
      Example code for q15 Twiddle factors Generation::
      where N = 128, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
const q15_t twiddleCoef_256_q15[384]
      Example code for q15 Twiddle factors Generation::
      where N = 256, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
const q15_t twiddleCoef_512_q15[768]
      Example code for q15 Twiddle factors Generation::
      where N = 512, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
const q15_t twiddleCoef_1024_q15[1536]
      Example code for q15 Twiddle factors Generation::
      where N = 1024, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
const q15_t twiddleCoef_2048_q15[3072]
      Example code for q15 Twiddle factors Generation::
      where N = 2048, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
const q15_t twiddleCoef_4096_q15[6144]
      Example code for q15 Twiddle factors Generation::
      where N = 4096, PI = 3.14159265358979
      Cos and Sin values are interleaved fashion
      Convert Floating point to q15(Fixed point 1.15): round(twiddleCoefq15(i) * pow(2, 15))
```
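The float-to-fixed conversion quoted for the Q31 and Q15 tables can be sketched as below. The rounding follows the round(x \* 2^31) and round(x \* 2^15) expressions from the text; the clamping step is an added assumption, needed because a value of exactly 1.0 (such as cos(0)) does not fit the 1.31 or 1.15 range. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a value in [-1.0, 1.0] to Q31 with rounding and saturation. */
static int32_t to_q31(double x)
{
    assert(x >= -1.0 && x <= 1.0);
    double s = x * 2147483648.0;          /* x * 2^31 */
    s += (s >= 0.0) ? 0.5 : -0.5;         /* round half away from zero */
    if (s >= 2147483647.0) return INT32_MAX;
    if (s <= -2147483648.0) return INT32_MIN;
    return (int32_t)s;                    /* truncation completes the rounding */
}

/* Convert a value in [-1.0, 1.0] to Q15 with rounding and saturation. */
static int16_t to_q15(double x)
{
    assert(x >= -1.0 && x <= 1.0);
    double s = x * 32768.0;               /* x * 2^15 */
    s += (s >= 0.0) ? 0.5 : -0.5;
    if (s >= 32767.0) return INT16_MAX;
    if (s <= -32768.0) return INT16_MIN;
    return (int16_t)s;
}
```

With this sketch, 0.5 maps to 0x40000000 in Q31 and 0x4000 in Q15, while 1.0 maps to the largest positive value of each format.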

```
void riscv_cfft_f32 (const riscv_cfft_instance_f32 *S, float32_t *p1, uint8_t ifftFlag, uint8_t bitReverseFlag)
void riscv_cfft_f64 (const riscv_cfft_instance_f64 *S, float64_t *p1, uint8_t ifftFlag, uint8_t bitReverseFlag)
riscv_status riscv_cfft_init_f32 (riscv_cfft_instance_f32 *S, uint16_t fftLen)
riscv_status riscv_cfft_init_f64 (riscv_cfft_instance_f64 *S, uint16_t fftLen)
riscv_status riscv_cfft_init_q15 (riscv_cfft_instance_q15 *S, uint16_t fftLen)
riscv_status riscv_cfft_init_q31 (riscv_cfft_instance_q31 *S, uint16_t fftLen)
void riscv_cfft_q15 (const riscv_cfft_instance_q15 *S, q15_t *p1, uint8_t ifftFlag, uint8_t bitReverseFlag)
void riscv_cfft_q31 (const riscv_cfft_instance_q31 *S, q31_t *p1, uint8_t ifftFlag, uint8_t bitReverseFlag)
void riscv_cfft_radix2_f32 (const riscv_cfft_radix2_instance_f32 *S, float32_t *pSrc)
riscv_status riscv_cfft_radix2_init_f32 (riscv_cfft_radix2_instance_f32 *S, uint16_t fftLen, uint8_t ifftFlag, uint8_t bitReverseFlag)
riscv_status riscv_cfft_radix2_init_q15 (riscv_cfft_radix2_instance_q15 *S, uint16_t fftLen, uint8_t ifftFlag, uint8_t bitReverseFlag)
riscv_status riscv_cfft_radix2_init_q31 (riscv_cfft_radix2_instance_q31 *S, uint16_t fftLen, uint8_t ifftFlag, uint8_t bitReverseFlag)
void riscv_cfft_radix2_q15 (const riscv_cfft_radix2_instance_q15 *S, q15_t *pSrc)
void riscv_cfft_radix2_q31 (const riscv_cfft_radix2_instance_q31 *S, q31_t *pSrc)
void riscv_cfft_radix4_f32 (const riscv_cfft_radix4_instance_f32 *S, float32_t *pSrc)
riscv_status riscv_cfft_radix4_init_f32 (riscv_cfft_radix4_instance_f32 *S, uint16_t fftLen, uint8_t ifftFlag, uint8_t bitReverseFlag)
riscv_status riscv_cfft_radix4_init_q15 (riscv_cfft_radix4_instance_q15 *S, uint16_t fftLen, uint8_t ifftFlag, uint8_t bitReverseFlag)
riscv_status riscv_cfft_radix4_init_q31 (riscv_cfft_radix4_instance_q31 *S, uint16_t fftLen, uint8_t ifftFlag, uint8_t bitReverseFlag)
void riscv_cfft_radix4_q15 (const riscv_cfft_radix4_instance_q15 *S, q15_t *pSrc)
void riscv_cfft_radix4_q31 (const riscv_cfft_radix4_instance_q31 *S, q31_t *pSrc)
```

## group ComplexFFT

The Fast Fourier Transform (FFT) is an efficient algorithm for computing the Discrete Fourier Transform (DFT). The FFT can be orders of magnitude faster than the DFT, especially for long lengths. The algorithms described in this section operate on complex data. A separate set of functions is devoted to handling of real sequences.

There are separate algorithms for handling floating-point, Q15, and Q31 data types. The algorithms available for each data type are described next.

The FFT functions operate in-place. That is, the array holding the input data will also be used to hold the corresponding result. The input data is complex and contains 2\*fftLen interleaved values in the order {real[0], imag[0], real[1], imag[1], ...}. The FFT result will be contained in the same array and the frequency domain values will have the same interleaving.
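When the real and imaginary parts start out in separate arrays, they have to be packed into one interleaved buffer first. A small sketch of that step follows; `pack_complex` is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack separate real and imaginary arrays into one interleaved buffer of
 * 2*fftLen values: {re[0], im[0], re[1], im[1], ...}. */
static void pack_complex(const float *re, const float *im,
                         float *buf, uint32_t fftLen)
{
    assert(re != NULL && im != NULL && buf != NULL);
    for (uint32_t i = 0; i < fftLen; i++) {
        buf[2 * i]     = re[i]; /* even index: real part */
        buf[2 * i + 1] = im[i]; /* odd index: imaginary part */
    }
}
```

The resulting buffer is the array that would be handed to an in-place transform and read back in the same layout afterwards.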

**Floating-point** The floating-point complex FFT uses a mixed-radix algorithm. Multiple radix-8 stages are performed along with a single radix-2 or radix-4 stage, as needed. The algorithm supports lengths of [16, 32, 64, ..., 4096] and each length uses a different twiddle factor table.

- The function uses the standard FFT definition and output values may grow by a factor of fftLen when computing the forward transform. The inverse transform includes a scale of 1/fftLen as part of the calculation and this matches the textbook definition of the inverse FFT.
- For the MVE version, the new riscv\_cfft\_init\_f32 initialization function is **mandatory**. **Compilation flags are available to include only the required tables for the needed FFTs.** Other FFT versions can continue to be initialized as explained below.
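The scaling convention from the first point above (growth by fftLen in the forward direction, a 1/fftLen factor in the inverse) can be checked with a tiny reference transform. The sketch below is a direct 4-point DFT written for illustration only; it is not the mixed-radix algorithm the library uses, and `dft4_inplace` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine: in-place 4-point complex DFT on
 * interleaved data {re0, im0, ..., re3, im3}. */
static void dft4_inplace(float *p, uint8_t ifftFlag)
{
    /* cos and sin of 2*pi*m/4 for m = 0..3 are exact values */
    static const float cos4[4] = { 1.0f, 0.0f, -1.0f, 0.0f };
    static const float sin4[4] = { 0.0f, 1.0f, 0.0f, -1.0f };
    const float scale = ifftFlag ? 0.25f : 1.0f; /* inverse applies 1/fftLen */
    float out[8];

    assert(p != NULL);
    for (int k = 0; k < 4; k++) {
        float re = 0.0f, im = 0.0f;
        for (int n = 0; n < 4; n++) {
            int m = (k * n) & 3;                      /* (k*n) mod 4 */
            float wr = cos4[m];
            float wi = ifftFlag ? sin4[m] : -sin4[m]; /* sign selects direction */
            re += p[2 * n] * wr - p[2 * n + 1] * wi;
            im += p[2 * n] * wi + p[2 * n + 1] * wr;
        }
        out[2 * k]     = re * scale;
        out[2 * k + 1] = im * scale;
    }
    for (int i = 0; i < 8; i++) {
        p[i] = out[i]; /* write back so the transform is in-place */
    }
}
```

A constant input of four ones yields 4 in bin 0 after the forward transform, and the inverse transform restores the original ones.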

For non-MVE versions, pre-initialized data structures containing twiddle factors and bit reversal tables are provided and defined in riscv\_const\_structs.h. Include this header in your source file and then pass the address of one of the constant structures as an argument to riscv\_cfft\_f32. For example:

```
riscv_cfft_f32(&riscv_cfft_sR_f32_len64, pSrc, 1, 1);
```

computes a 64-point inverse complex FFT including bit reversal. The data structures are treated as constant data and not modified during the calculation. The same data structure can be reused for multiple transforms including mixing forward and inverse transforms.

Earlier releases of the library provided separate radix-2 and radix-4 algorithms that operated on floating-point data. These functions are still provided but are deprecated. The older functions are slower and less general than the new functions.

An example of initialization of the constants for the riscv\_cfft\_f32 function follows:

```
static const riscv_cfft_instance_f32 *S; /* selected CFFT instance */

/* Select the pre-initialized structure that matches the FFT length. */
switch (length) {
  case 16:
    S = &riscv_cfft_sR_f32_len16;
    break;
  case 32:
    S = &riscv_cfft_sR_f32_len32;
    break;
  case 64:
    S = &riscv_cfft_sR_f32_len64;
    break;
  case 128:
    S = &riscv_cfft_sR_f32_len128;
    break;
  case 256:
    S = &riscv_cfft_sR_f32_len256;
    break;
  case 512:
    S = &riscv_cfft_sR_f32_len512;
    break;
  case 1024:
    S = &riscv_cfft_sR_f32_len1024;
    break;
  case 2048:
    S = &riscv_cfft_sR_f32_len2048;
    break;
  case 4096:
    S = &riscv_cfft_sR_f32_len4096;
    break;
}
```

The new riscv\_cfft\_init\_f32 can also be used.

**Q15 and Q31** The fixed-point complex FFT uses a mixed-radix algorithm. Multiple radix-4 stages are performed along with a single radix-2 stage, as needed. The algorithm supports lengths of [16, 32, 64, ..., 4096] and each length uses a different twiddle factor table.

The function uses the standard FFT definition and output values may grow by a factor of fftLen when computing the forward transform. The inverse transform includes a scale of 1/fftLen as part of the calculation and this matches the textbook definition of the inverse FFT.

Pre-initialized data structures containing twiddle factors and bit reversal tables are provided and defined in riscv\_const\_structs.h. Include this header in your function and then pass one of the constant structures as an argument to riscv\_cfft\_q31. For example:

```
riscv_cfft_q31(&riscv_cfft_sR_q31_len64, pSrc, 1, 1);
```

computes a 64-point inverse complex FFT including bit reversal. The data structures are treated as constant data and not modified during the calculation. The same data structure can be reused for multiple transforms including mixing forward and inverse transforms.

Earlier releases of the library provided separate radix-2 and radix-4 algorithms that operated on fixed-point data. These functions are still provided but are deprecated. The older functions are slower and less general than the new functions.

An example of initialization of the constants for the riscv\_cfft\_q31 function follows:

```
static const riscv_cfft_instance_q31 *S; /* selected CFFT instance */

/* Select the pre-initialized structure that matches the FFT length. */
switch (length) {
  case 16:
    S = &riscv_cfft_sR_q31_len16;
    break;
  case 32:
    S = &riscv_cfft_sR_q31_len32;
    break;
  case 64:
    S = &riscv_cfft_sR_q31_len64;
    break;
  case 128:
    S = &riscv_cfft_sR_q31_len128;
    break;
  case 256:
    S = &riscv_cfft_sR_q31_len256;
    break;
  case 512:
    S = &riscv_cfft_sR_q31_len512;
    break;
  case 1024:
    S = &riscv_cfft_sR_q31_len1024;
    break;
  case 2048:
    S = &riscv_cfft_sR_q31_len2048;
    break;
  case 4096:
    S = &riscv_cfft_sR_q31_len4096;
    break;
}
```

#### **Functions**

void **riscv\_cfft\_f32** (**const** riscv\_cfft\_instance\_f32 \*S, float32\_t \*p1, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Processing function for the floating-point complex FFT.

### **Parameters**

- S [in] points to an instance of the floating-point CFFT structure
- p1 [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

#### Returns none

void **riscv\_cfft\_f64** (**const** riscv\_cfft\_instance\_f64 \*S, float64\_t \*p1, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Processing function for the Double Precision floating-point complex FFT.

### **Parameters**

- S [in] points to an instance of the Double Precision floating-point CFFT structure
- p1 [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

## Returns none

riscv\_status **riscv\_cfft\_init\_f32** (riscv\_cfft\_instance\_f32 \*S, uint16\_t fftLen)

Initialization function for the cfft f32 function.

Use of this function is mandatory only for the MVE version of the FFT. Other versions can still initialize the data structure directly using the variables declared in riscv\_const\_structs.h.

# **Parameters**

- **S** [inout] points to an instance of the floating-point CFFT structure
- **fftLen** [in] fft length (number of complex samples)

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

riscv\_status **riscv\_cfft\_init\_f64** (riscv\_cfft\_instance\_f64 \*S, uint16\_t fftLen)

Initialization function for the cfft f64 function.

Use of this function is mandatory only for the MVE version of the FFT. Other versions can still initialize the data structure directly using the variables declared in riscv\_const\_structs.h.

#### **Parameters**

- S [inout] points to an instance of the double-precision floating-point CFFT structure
- **fftLen** [in] fft length (number of complex samples)

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS : Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

riscv\_status **riscv\_cfft\_init\_q15** (riscv\_cfft\_instance\_q15 \*S, uint16\_t fftLen)

Initialization function for the cfft q15 function.

Use of this function is mandatory only for the MVE version of the FFT. Other versions can still initialize the data structure directly using the variables declared in riscv\_const\_structs.h.

#### **Parameters**

- S [inout] points to an instance of the Q15 CFFT structure
- fftLen [in] fft length (number of complex samples)

# **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

riscv\_status **riscv\_cfft\_init\_q31** (riscv\_cfft\_instance\_q31 \*S, uint16\_t fftLen)

Initialization function for the cfft q31 function.

Use of this function is mandatory only for the MVE version of the FFT. Other versions can still initialize the data structure directly using the variables declared in riscv\_const\_structs.h.

#### **Parameters**

- S [inout] points to an instance of the Q31 CFFT structure
- **fftLen [in]** fft length (number of complex samples)

# Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

void **riscv\_cfft\_q15** (**const** riscv\_cfft\_instance\_q15 \*S, q15\_t \*p1, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Processing function for Q15 complex FFT.

# **Parameters**

- S [in] points to an instance of the Q15 CFFT structure
- p1 [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

#### Returns none

void **riscv\_cfft\_q31** (**const** riscv\_cfft\_instance\_q31 \*S, q31\_t \*p1, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Processing function for the Q31 complex FFT.

# **Parameters**

- **S [in]** points to an instance of the fixed-point CFFT structure
- p1 [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

## Returns none

void **riscv\_cfft\_radix2\_f32** (**const** riscv\_cfft\_radix2\_instance\_f32 \*S, float32\_t \*pSrc)

Radix-2 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_f32 and will be removed in the future.

## **Parameters**

- S [in] points to an instance of the floating-point Radix-2 CFFT/CIFFT structure
- pSrc [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place

## Returns none

riscv\_status **riscv\_cfft\_radix2\_init\_f32** (riscv\_cfft\_radix2\_instance\_f32 \*S, uint16\_t fftLen, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Initialization function for the floating-point CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_f32 and will be removed in the future.

**Details** The parameter ifftFlag controls whether a forward or inverse transform is computed. Set ifftFlag to 1 to compute the CIFFT; otherwise the CFFT is computed.

The parameter bitReverseFlag controls whether the output is in normal order or bit-reversed order. Set bitReverseFlag to 1 for output in normal order; otherwise the output is in bit-reversed order.

The parameter fftLen specifies the length of the CFFT/CIFFT process. Supported FFT lengths are 16, 64, 256, 1024.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

#### **Parameters**

- S [inout] points to an instance of the floating-point CFFT/CIFFT structure
- fftLen [in] length of the FFT
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length
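To make the "normal order" versus "bit-reversed order" distinction concrete, the following sketch shows what bit-reversed indexing means for a power-of-two length and how a buffer in that order can be put back into normal order. It is a generic radix-2 illustration with hypothetical helper names, assuming interleaved real/imaginary data. It does not reproduce the table-driven reordering the library uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the lowest `bits` bits of index i (bits = 4 for a 16-point FFT). */
static unsigned bit_reverse_index(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);  /* move the current LSB of i into r */
        i >>= 1;
    }
    return r;
}

/* Reorder n = 2^bits interleaved complex samples between bit-reversed and
 * normal order, in place. The permutation is its own inverse. */
static void bit_reverse_reorder_f32(float *data, unsigned bits)
{
    assert(data != NULL && bits <= 16u);
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        size_t j = bit_reverse_index((unsigned)i, bits);
        if (j > i) {  /* swap each pair only once */
            float re = data[2 * i];
            float im = data[2 * i + 1];
            data[2 * i]     = data[2 * j];
            data[2 * i + 1] = data[2 * j + 1];
            data[2 * j]     = re;
            data[2 * j + 1] = im;
        }
    }
}
```

For a 16-point transform, index 1 (binary 0001) maps to index 8 (binary 1000), while index 6 (binary 0110) maps to itself.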

riscv\_status riscv\_cfft\_radix2\_init\_q15 (riscv\_cfft\_radix2\_instance\_q15 \*S, uint16\_t fftLen, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Initialization function for the Q15 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q15 and will be removed in the future.

**Details** The parameter ifftFlag controls whether a forward or inverse transform is computed. Set ifftFlag to 1 to compute the CIFFT; otherwise the CFFT is computed.

The parameter bitReverseFlag controls whether the output is in normal order or bit-reversed order. Set bitReverseFlag to 1 for output in normal order; otherwise the output is in bit-reversed order.

The parameter fftLen specifies the length of the CFFT/CIFFT process. Supported FFT lengths are 16, 64, 256, 1024.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

### **Parameters**

- **S** [inout] points to an instance of the Q15 CFFT/CIFFT structure.
- **fftLen [in]** length of the FFT.
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length

riscv\_status **riscv\_cfft\_radix2\_init\_q31** (riscv\_cfft\_radix2\_instance\_q31 \*S, uint16\_t fftLen, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Initialization function for the Q31 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q31 and will be removed in the future.

**Details** The parameter ifftFlag controls whether a forward or inverse transform is computed. Set ifftFlag to 1 to compute the CIFFT; otherwise the CFFT is computed.

The parameter bitReverseFlag controls whether the output is in normal order or bit-reversed order. Set bitReverseFlag to 1 for output in normal order; otherwise the output is in bit-reversed order.

The parameter fftLen specifies the length of the CFFT/CIFFT process. Supported FFT lengths are 16, 64, 256, 1024.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

#### **Parameters**

- **S [inout]** points to an instance of the Q31 CFFT/CIFFT structure
- fftLen [in] length of the FFT
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

## Returns execution status

- RISCV\_MATH\_SUCCESS : Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length

void **riscv\_cfft\_radix2\_q15** (**const** riscv\_cfft\_radix2\_instance\_q15 \*S, q15\_t \*pSrc)

Processing function for the fixed-point CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q15 and will be removed in the future.

# **Parameters**

- S [in] points to an instance of the fixed-point CFFT/CIFFT structure
- pSrc [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place

Returns none

void **riscv\_cfft\_radix2\_q31** (**const** riscv\_cfft\_radix2\_instance\_q31 \*S, q31\_t \*pSrc)

Processing function for the fixed-point CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q31 and will be removed in the future.

## **Parameters**

- **S** [in] points to an instance of the fixed-point CFFT/CIFFT structure
- pSrc [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place

### Returns none

void **riscv\_cfft\_radix4\_f32** (**const** riscv\_cfft\_radix4\_instance\_f32 \*S, float32\_t \*pSrc)
Processing function for the floating-point Radix-4 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_f32 and will be removed in the future.

### **Parameters**

- S [in] points to an instance of the floating-point Radix-4 CFFT/CIFFT structure
- pSrc [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place

#### Returns none

riscv\_status **riscv\_cfft\_radix4\_init\_f32** (riscv\_cfft\_radix4\_instance\_f32 \*S, uint16\_t fftLen, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Initialization function for the floating-point CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_f32 and will be removed in the future.

**Details** The parameter ifftFlag controls whether a forward or inverse transform is computed. Set ifftFlag to 1 to compute the CIFFT; otherwise the CFFT is computed.

The parameter bitReverseFlag controls whether the output is in normal order or bit-reversed order. Set bitReverseFlag to 1 for output in normal order; otherwise the output is in bit-reversed order.

The parameter fftLen specifies the length of the CFFT/CIFFT process. Supported FFT lengths are 16, 64, 256, 1024.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

#### **Parameters**

- **S** [inout] points to an instance of the floating-point CFFT/CIFFT structure
- **fftLen** [in] length of the FFT
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

# **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length

riscv\_status riscv\_cfft\_radix4\_init\_q15 (riscv\_cfft\_radix4\_instance\_q15 \*S, uint16\_t fftLen, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Initialization function for the Q15 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q15 and will be removed in the future.

**Details** The parameter ifftFlag controls whether a forward or inverse transform is computed. Set ifftFlag to 1 to compute the CIFFT; otherwise the CFFT is computed.

The parameter bitReverseFlag controls whether the output is in normal order or bit-reversed order. Set bitReverseFlag to 1 for output in normal order; otherwise the output is in bit-reversed order.

The parameter fftLen specifies the length of the CFFT/CIFFT process. Supported FFT lengths are 16, 64, 256, 1024.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

#### **Parameters**

- **S** [inout] points to an instance of the Q15 CFFT/CIFFT structure
- **fftLen** [in] length of the FFT
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length

riscv\_status riscv\_cfft\_radix4\_init\_q31 (riscv\_cfft\_radix4\_instance\_q31 \*S, uint16\_t fftLen, uint8\_t ifftFlag, uint8\_t bitReverseFlag)

Initialization function for the Q31 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q31 and will be removed in the future.

**Details** The parameter ifftFlag controls whether a forward or inverse transform is computed. Set ifftFlag to 1 to compute the CIFFT; otherwise the CFFT is computed.

The parameter bitReverseFlag controls whether the output is in normal order or bit-reversed order. Set bitReverseFlag to 1 for output in normal order; otherwise the output is in bit-reversed order.

The parameter fftLen specifies the length of the CFFT/CIFFT process. Supported FFT lengths are 16, 64, 256, 1024.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

### **Parameters**

- S [inout] points to an instance of the Q31 CFFT/CIFFT structure.
- fftLen [in] length of the FFT.
- ifftFlag [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

#### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length

void **riscv\_cfft\_radix4\_q15** (**const** riscv\_cfft\_radix4\_instance\_q15 \*S, q15\_t \*pSrc)

Processing function for the Q15 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q15 and will be removed in the future.

**Input and output formats:** Internally input is downscaled by 2 for every stage to avoid saturations inside CFFT/CIFFT process. Hence the output format is different for different FFT sizes. The input and output formats for different FFT sizes and number of bits to upscale are mentioned in the tables below for CFFT and CIFFT:

| CFFT Size | Input format | Output format | Number of bits to upscale |
|-----------|--------------|---------------|---------------------------|
| 16        | 1.15         | 5.11          | 4                         |
| 64        | 1.15         | 7.9           | 6                         |
| 256       | 1.15         | 9.7           | 8                         |
| 1024      | 1.15         | 11.5          | 10                        |

| CIFFT Size |
|------------|
| 16         |
| 64         |
| 256        |
| 1024       |
# **Parameters**

- **S [in]** points to an instance of the Q15 CFFT/CIFFT structure.
- pSrc [inout] points to the complex data buffer. Processing occurs in-place.

#### Returns none
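The CFFT rows in the table above follow a regular pattern: the total downscaling equals log2(fftLen) bits, so an input in 1.15 format comes out with log2(fftLen) more integer bits and the same number fewer fractional bits. The sketch below reproduces those rows. It is inferred from the table rather than taken from the library source, and the type and function names are hypothetical. The `word_bits` parameter (16 for Q15, 32 for Q31) is an assumption that lets the same helper cover the analogous 1.31 case.

```c
#include <assert.h>

/* Fixed-point format description (hypothetical type for illustration). */
typedef struct {
    unsigned int_bits;      /* integer bits of the output, including sign */
    unsigned frac_bits;     /* fractional bits of the output */
    unsigned upscale_bits;  /* left shift that restores the input scaling */
} fft_fixed_format;

/* Integer log2 of a power-of-two length. */
static unsigned ilog2_u(unsigned n)
{
    unsigned p = 0;
    while (n > 1u) {
        n >>= 1;
        p++;
    }
    return p;
}

/* Output format of a CFFT of length fft_len whose input is 1.(word_bits-1). */
static fft_fixed_format cfft_output_format(unsigned fft_len, unsigned word_bits)
{
    assert(fft_len >= 2u && (fft_len & (fft_len - 1u)) == 0u); /* power of 2 */
    unsigned growth = ilog2_u(fft_len);
    assert(growth < word_bits);
    fft_fixed_format f;
    f.int_bits = 1u + growth;
    f.frac_bits = word_bits - 1u - growth;
    f.upscale_bits = growth;
    return f;
}
```

For example, `cfft_output_format(64, 16)` yields 7 integer bits, 9 fractional bits and 6 bits of upscaling, matching the 7.9 row.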

void **riscv\_cfft\_radix4\_q31** (**const** riscv\_cfft\_radix4\_instance\_q31 \*S, q31\_t \*pSrc)

Processing function for the Q31 CFFT/CIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_cfft\_q31 and will be removed in the future.

**Input and output formats:** Internally input is downscaled by 2 for every stage to avoid saturations inside CFFT/CIFFT process. Hence the output format is different for different FFT sizes. The input and output formats for different FFT sizes and number of bits to upscale are mentioned in the tables below for CFFT and CIFFT:

| CFFT Size | Input format | Output Format | Number of bits to upscale |
|-----------|--------------|---------------|---------------------------|
| 16        | 1.31         | 5.27          | 4                         |
| 64        | 1.31         | 7.25          | 6                         |
| 256       | 1.31         | 9.23          | 8                         |
| 1024      | 1.31         | 11.21         | 10                        |

# **Parameters**

- S [in] points to an instance of the Q31 CFFT/CIFFT structure
- pSrc [inout] points to the complex data buffer of size 2\*fftLen. Processing occurs in-place

Returns none

# **DCT Type IV Functions**

# **DCT Type IV Tables**

```
const float32_t Weights_128[256]
const float32_t cos_factors_128[128]
const float32_t Weights_512[1024]
const float32_t cos_factors_512[512]
const float32_t Weights_2048[4096]
const float32_t cos_factors_2048[2048]
const float32_t Weights_8192[16384]
const float32_t cos_factors_8192[8192]
const q31_t WeightsQ31_128[256]
const q31_t cos_factorsQ31_128[128]
const q31_t WeightsQ31_512[1024]
const q31_t cos_factorsQ31_512[512]
const q31_t WeightsQ31_2048[4096]
const q31_t cos_factorsQ31_2048[2048]
const q31_t WeightsQ31_8192[16384]
const q31_t cos_factorsQ31_8192[8192]
group DCT4_IDCT4_Table
     end of RealFFT_Table group
```

### **Variables**

```
const float32_t Weights_128[256]
     Weights Table.
      Weights tables are generated using the formula:
      C command to generate the table
      where N is the Number of weights to be calculated and c is pi/(2*N)
      In the tables below the real and imaginary values are placed alternatively, hence the array length is 2 * N.
      cosFactor tables are generated using the formula:
      C command to generate the table
      where N is the number of factors to generate and c is pi/(2*N)
const float32_t cos_factors_128[128]
const float32_t Weights_512[1024]
const float32_t cos_factors_512[512]
const float32_t Weights_2048[4096]
const float32_t cos_factors_2048[2048]
const float32_t Weights_8192[16384]
const float32_t cos_factors_8192[8192]
const q31_t WeightsQ31_128[256]
      Weights tables are generated using the formula:
      C command to generate the table
      where N is the Number of weights to be calculated and c is pi/(2*N)
      Convert the output to q31 format by multiplying with 2^31 and saturated if required.
      In the tables below the real and imaginary values are placed alternately, hence the array length is 2 * N.
      cosFactor tables are generated using the formula:
      C command to generate the table
      where N is the number of factors to generate and c is pi/(2*N)
      Then converted to q31 format by multiplying with 2^31 and saturated if required.
const q31_t cos_factorsQ31_128[128]
const q31_t WeightsQ31_512[1024]
const q31_t cos_factorsQ31_512[512]
const q31_t WeightsQ31_2048[4096]
const q31_t cos_factorsQ31_2048[2048]
const q31_t WeightsQ31_8192[16384]
const q31_t cos_factorsQ31_8192[8192]
```

void **riscv\_dct4\_f32** (const riscv\_dct4\_instance\_f32 \*S, float32\_t \*pState, float32\_t \*pInlineBuffer)

riscv\_status **riscv\_dct4\_init\_f32** (riscv\_dct4\_instance\_f32 \*S, riscv\_rfft\_instance\_f32 \*S\_RFFT, riscv\_cfft\_radix4\_instance\_f32 \*S\_CFFT, uint16\_t N, uint16\_t Nby2, float32\_t normalize)

riscv\_status **riscv\_dct4\_init\_q15** (riscv\_dct4\_instance\_q15 \*S, riscv\_rfft\_instance\_q15 \*S\_RFFT, riscv\_cfft\_radix4\_instance\_q15 \*S\_CFFT, uint16\_t N, uint16\_t Nby2, q15\_t normalize)

riscv\_status **riscv\_dct4\_init\_q31** (riscv\_dct4\_instance\_q31 \*S, riscv\_rfft\_instance\_q31 \*S\_RFFT, riscv\_cfft\_radix4\_instance\_q31 \*S\_CFFT, uint16\_t N, uint16\_t Nby2, q31\_t normalize)

void **riscv\_dct4\_q15** (const riscv\_dct4\_instance\_q15 \*S, q15\_t \*pState, q15\_t \*pInlineBuffer)

void **riscv\_dct4\_q31** (const riscv\_dct4\_instance\_q31 \*S, q31\_t \*pState, q31\_t \*pInlineBuffer)

# group DCT4\_IDCT4

Representing signals with a minimum number of values is important for storage and transmission. The possibility of a large discontinuity between the beginning and end of a period of a signal in the DFT can be avoided by extending the signal so that it is even-symmetric. The Discrete Cosine Transform (DCT) is constructed such that its energy is heavily concentrated in the lower part of the spectrum, and it is very widely used in signal and image coding applications. The family of DCTs (DCT types 1, 2, 3, 4) is the outcome of different combinations of homogeneous boundary conditions. The DCT has an excellent energy-packing capability and hence has many applications, data compression in particular.

The DCT is essentially the Discrete Fourier Transform (DFT) of an even-extended real signal. Reordering of the input data makes the computation of the DCT just a problem of computing the DFT of a real signal with a few additional operations. This approach provides regular, simple, and very efficient DCT algorithms for practical hardware and software implementations.

DCT type-II can be implemented using the Fast Fourier Transform (FFT) internally; as the transform is applied to real values, the Real FFT can be used. DCT4 is implemented using DCT2, as their implementations are similar except for some added pre-processing and post-processing. The DCT2 implementation can be described in the following steps:

- Re-ordering the input
- Calculating the Real FFT
- Multiplying the Real FFT output by the weights and taking the real part of the product



This process is explained by the block diagram below:

### Algorithm

The N-point type-IV DCT is defined as a real, linear transformation by the formula:

$$X_c(k) = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x(n) \cos \left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) \right]$$

where k = 0, 1, 2, ..., N-1

Its inverse is defined as follows:

$$x(n) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} X_c(k) \cos \left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) \right]$$

where n = 0, 1, 2, ..., N-1

The DCT4 matrices become involutory (i.e. they are self-inverse) when multiplied by an overall scale factor of sqrt(2/N). The symmetry of the transform matrix indicates that the fast algorithms for the forward and inverse transform computation are identical. Note that the implementations of the inverse DCT4 and the DCT4 are the same, hence the same processing function can be used for both.

**Lengths supported by the transform:** As DCT4 internally uses Real FFT, it supports all the lengths 128, 512, 2048 and 8192. The library provides separate functions for Q15, Q31, and floating-point data types.

**Instance Structure** The instances for Real FFT and FFT, cosine values table and twiddle factor table are stored in an instance data structure. A separate instance structure must be defined for each transform. There are separate instance structure declarations for each of the 3 supported data types.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Initializes Real FFT as its process function is used internally in DCT4, by calling riscv\_rfft\_init\_f32().

Use of the initialization function is optional. However, if the initialization function is used, then the instance structure cannot be placed into a const data section. To place an instance structure into a const data section, the instance structure must be manually initialized with the following fields:

- N is the length of the DCT4
- Nby2 is half of the length of the DCT4
- normalize is the normalizing factor used and is equal to sqrt(2/N)
- pTwiddle points to the twiddle factor table
- pCosFactor points to the cosFactor table
- pRfft points to the real FFT instance
- pCfft points to the complex FFT instance

The CFFT and RFFT structures also need to be initialized; refer to riscv\_cfft\_radix4\_f32() and riscv\_rfft\_f32() respectively for details regarding static initialization.

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the DCT4 transform functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

# **Functions**

void **riscv\_dct4\_f32** (**const** riscv\_dct4\_instance\_f32 \*S, float32\_t \*pState, float32\_t \*pInlineBuffer)

Processing function for the floating-point DCT4/IDCT4.

# **Parameters**

- S [in] points to an instance of the floating-point DCT4/IDCT4 structure
- pState [in] points to state buffer
- pInlineBuffer [inout] points to the in-place input and output buffer

# Returns none

riscv\_status **riscv\_dct4\_init\_f32** (riscv\_dct4\_instance\_f32 \*S, riscv\_rfft\_instance\_f32 \*S\_RFFT, riscv\_cfft\_radix4\_instance\_f32 \*S\_CFFT, uint16\_t N, uint16\_t Nby2, float32\_t normalize)

Initialization function for the floating-point DCT4/IDCT4.

**Normalizing factor** The normalizing factor is sqrt(2/N), which depends on the size of transform N. Floating-point normalizing factors are listed in the table below for different DCT sizes:

| DCT Size | Normalizing factor value |
|----------|--------------------------|
| 2048     | 0.03125                  |
| 512      | 0.0625                   |
| 128      | 0.125                    |

### **Parameters**

- S [inout] points to an instance of floating-point DCT4/IDCT4 structure
- **S\_RFFT [in]** points to an instance of floating-point RFFT/RIFFT structure
- **S\_CFFT** [in] points to an instance of floating-point CFFT/CIFFT structure
- N [in] length of the DCT4
- Nby2 [in] half of the length of the DCT4
- normalize [in] normalizing factor.

### **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: N is not a supported transform length

riscv\_status riscv\_dct4\_init\_q15 (riscv\_dct4\_instance\_q15 \*S, riscv\_rfft\_instance\_q15 \*S\_RFFT, riscv\_cfft\_radix4\_instance\_q15 \*S\_CFFT, uint16\_t N, uint16\_t Nby2, q15\_t normalize)

Initialization function for the Q15 DCT4/IDCT4.

**Normalizing factor** The normalizing factor is sqrt(2/N), which depends on the size of transform N. Normalizing factors in 1.15 format are listed in the table below for different DCT sizes:

| DCT Size | Normalizing factor value (hexadecimal) |
|----------|----------------------------------------|
| 2048     | 0x400                                  |
| 512      | 0x800                                  |
| 128      | 0x1000                                 |

### **Parameters**

- S [inout] points to an instance of Q15 DCT4/IDCT4 structure
- **S\_RFFT** [in] points to an instance of Q15 RFFT/RIFFT structure
- S\_CFFT [in] points to an instance of Q15 CFFT/CIFFT structure
- N [in] length of the DCT4
- Nby2 [in] half of the length of the DCT4
- normalize [in] normalizing factor

## **Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: N is not a supported transform length

riscv\_status **riscv\_dct4\_init\_q31** (riscv\_dct4\_instance\_q31 \*S, riscv\_rfft\_instance\_q31 \*S\_RFFT, riscv\_cfft\_radix4\_instance\_q31 \*S\_CFFT, uint16\_t N, uint16\_t Nby2, q31\_t normalize)

Initialization function for the Q31 DCT4/IDCT4.

**Normalizing factor** The normalizing factor is sqrt(2/N), which depends on the size of transform N. Normalizing factors in 1.31 format are listed in the table below for different DCT sizes:

| DCT Size | Normalizing factor value (hexadecimal) |
|----------|----------------------------------------|
| 2048     | 0x4000000                              |
| 512      | 0x8000000                              |
| 128      | 0x10000000                             |

#### **Parameters**

- **S [inout]** points to an instance of Q31 DCT4/IDCT4 structure.
- **S\_RFFT** [in] points to an instance of Q31 RFFT/RIFFT structure
- **S\_CFFT** [in] points to an instance of Q31 CFFT/CIFFT structure
- N [in] length of the DCT4.
- Nby2 [in] half of the length of the DCT4.
- normalize [in] normalizing factor.

## **Returns** execution status

- RISCV\_MATH\_SUCCESS : Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: N is not a supported transform length
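The normalizing factors tabulated above for the three data types are all the same number, sqrt(2/N), expressed in different formats. For N = 2^p with p odd (which covers 128, 512 and 2048), sqrt(2/N) is exactly 2^-m with m = (p - 1)/2, so the values can be derived with integer arithmetic alone. The sketch below illustrates this. The helper names are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Returns m such that sqrt(2/N) == 2^-m, for N = 2^p with p odd. */
static unsigned dct4_norm_shift(unsigned N)
{
    unsigned p = 0;
    assert(N >= 8u && (N & (N - 1u)) == 0u);  /* power of two, at least 8 */
    while ((1u << p) < N) {
        p++;
    }
    assert((p & 1u) == 1u);                   /* exact only when p is odd */
    return (p - 1u) / 2u;
}

/* Floating-point normalizing factor. */
static float dct4_normalize_f32(unsigned N)
{
    return 1.0f / (float)(1u << dct4_norm_shift(N));
}

/* Normalizing factor in 1.15 format: 2^-m scaled by 2^15. */
static int16_t dct4_normalize_q15(unsigned N)
{
    return (int16_t)(1 << (15u - dct4_norm_shift(N)));
}

/* Normalizing factor in 1.31 format: 2^-m scaled by 2^31. */
static int32_t dct4_normalize_q31(unsigned N)
{
    return (int32_t)(INT32_C(1) << (31u - dct4_norm_shift(N)));
}
```

For N = 512 this gives 0.0625, 0x800 and 0x8000000, in agreement with the three tables.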

void **riscv\_dct4\_q15** (**const** riscv\_dct4\_instance\_q15 \*S, q15\_t \*pState, q15\_t \*pInlineBuffer)

Processing function for the Q15 DCT4/IDCT4.

**Input and output formats** Internally, inputs are downscaled in the RFFT process function to avoid overflows. The number of bits downscaled depends on the size of the transform. The input and output formats for different DCT sizes and the number of bits to upscale are listed in the table below:

| DCT Size | Input format | Output format | Number of bits to upscale |
|----------|--------------|---------------|---------------------------|
| 2048     | 1.15         | 11.5          | 10                        |
| 512      | 1.15         | 9.7           | 8                         |
| 128      | 1.15         | 7.9           | 6                         |

#### **Parameters**

- **S** [in] points to an instance of the Q15 DCT4 structure.
- pState [in] points to state buffer.
- pInlineBuffer [inout] points to the in-place input and output buffer.

### Returns none

void **riscv\_dct4\_q31** (**const** riscv\_dct4\_instance\_q31 \*S, q31\_t \*pState, q31\_t \*pInlineBuffer)

Processing function for the Q31 DCT4/IDCT4.

**Input and output formats** Input samples need to be downscaled by 1 bit to avoid saturations in the Q31 DCT process, as the conversion from DCT2 to DCT4 involves one subtraction. Internally, inputs are downscaled in the RFFT process function to avoid overflows. The number of bits downscaled depends on the size of the transform. The input and output formats for different DCT sizes and the number of bits to upscale are listed in the table below:

| DCT Size | Input format | Output format | Number of bits to upscale |
|----------|--------------|---------------|---------------------------|
| 2048     | 2.30         | 12.20         | 11                        |
| 512      | 2.30         | 10.22         | 9                         |
| 128      | 2.30         | 8.24          | 7                         |

#### **Parameters**

- S [in] points to an instance of the Q31 DCT4 structure.
- pState [in] points to state buffer.
- pInlineBuffer [inout] points to the in-place input and output buffer.

Returns none
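The 1-bit input downscaling mentioned above (1.31 samples converted to the 2.30 input format) could be done with a small pre-processing pass such as the one sketched here. The helper name is hypothetical. Division by two is used instead of a right shift so that the result is well defined for negative samples on every compiler; it rounds toward zero, which can differ by one LSB from an arithmetic shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scales n Q31 samples down by one bit in place, so a
 * buffer in 1.31 format can be interpreted as 2.30 before the Q31 DCT4. */
static void q31_downscale_1bit(int32_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        buf[i] /= 2;  /* halve the sample; rounds toward zero */
    }
}
```

After the transform, the output would be shifted left by the number of bits given in the table to return to the original scaling, provided the result fits without overflow.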

# **Real FFT Functions**

## **Real FFT Tables**

```
const float32_t realCoefA[8192]

const float32_t realCoefB[8192]

const q31_t realCoefAQ31[8192]

const q31_t realCoefBQ31[8192]

const q15_t __ALIGNED (4)

group RealFFT_Table
```

### **Functions**

```
const q15_t __ALIGNED (4)
     Weights Table.
     Q15 table for reciprocal.
     end of DCT4_IDCT4_Table group
      Generation fixed-point realCoefAQ15 array in Q15 format:
      n = 4096
      Convert to fixed point Q15 format round(pATable[i] * pow(2, 15))
      Generation of real_CoefB array:
      n = 4096
      Convert to fixed point Q15 format round(pBTable[i] * pow(2, 15))
      Weights tables are generated using the formula:
      C command to generate the table
      where N is the Number of weights to be calculated and c is pi/(2*N)
      The output is converted to q15 format by multiplying by 2^15 and saturating if required.
      In the tables below the real and imaginary values are placed alternatively, hence the array length is 2 * N.
      cosFactor tables are generated using the formula:
      C command to generate the table
      where N is the number of factors to generate and c is pi/(2*N)
      The result is then converted to q15 format by multiplying by 2^15 and saturating if required.
```

# **Variables**

```
const float32_t realCoefA[8192]
    Generation of realCoefA array:
    n = 4096

const float32_t realCoefB[8192]
    Generation of realCoefB array:
    n = 4096

const q31_t realCoefAQ31[8192]
    Generation fixed-point realCoefAQ31 array in Q31 format:
    n = 4096
    Convert to fixed point Q31 format round(pATable[i] * pow(2, 31))

const q31_t realCoefBQ31[8192]
    Generation of realCoefBQ31 array:
    n = 4096
    Convert to fixed point Q31 format round(pBTable[i] * pow(2, 31))
```
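The conversion `round(x * pow(2, 31))` used for the Q31 tables can be sketched as follows. This is a minimal illustration, not the generator used to build the library tables; the helper name `double_to_q31` is hypothetical, and the saturation step is an assumption added so that a coefficient of exactly +1.0 stays in range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of NMSIS): convert a coefficient in
 * [-1, 1] to Q31 as round(x * 2^31), saturating at the int32_t limits. */
static int32_t double_to_q31(double x)
{
    assert(x >= -1.0 && x <= 1.0);
    double scaled = x * 2147483648.0;            /* x * 2^31 */
    /* Round half away from zero without needing <math.h>. */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled >= 2147483647.0) {
        return INT32_MAX;                        /* +1.0 is not representable */
    }
    if (scaled <= -2147483648.0) {
        return INT32_MIN;
    }
    return (int32_t)scaled;                      /* truncates toward zero */
}
```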

```
void riscv_rfft_f32 (const riscv_rfft_instance_f32 *S, float32_t *pSrc, float32_t *pDst)
void riscv_rfft_fast_f32 (const riscv_rfft_fast_instance_f32 *S, float32_t *p, float32_t *pOut, uint8_t ifftFlag)
void riscv_rfft_fast_f64 (riscv_rfft_fast_instance_f64 *S, float64_t *p, float64_t *pOut, uint8_t ifftFlag)
riscv_status riscv_rfft_fast_init_f32 (riscv_rfft_fast_instance_f32 *S, uint16_t fftLen)
static riscv_status riscv_rfft_32_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
static riscv_status riscv_rfft_64_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
static riscv_status riscv_rfft_128_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
static riscv_status riscv_rfft_256_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
static riscv_status riscv_rfft_512_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
static riscv_status riscv_rfft_1024_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
static riscv_status riscv_rfft_2048_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
static riscv_status riscv_rfft_4096_fast_init_f64 (riscv_rfft_fast_instance_f64 *S)
riscv_status riscv_rfft_fast_init_f64 (riscv_rfft_fast_instance_f64 *S, uint16_t fftLen)
riscv_status riscv_rfft_init_f32 (riscv_rfft_instance_f32 *S, riscv_cfft_radix4_instance_f32 *S_CFFT, uint32_t fftLenReal, uint32_t ifftFlagR, uint32_t bitReverseFlag)
riscv_status riscv_rfft_init_q15 (riscv_rfft_instance_q15 *S, uint32_t fftLenReal, uint32_t ifftFlagR, uint32_t bitReverseFlag)
riscv_status riscv_rfft_init_q31 (riscv_rfft_instance_q31 *S, uint32_t fftLenReal, uint32_t ifftFlagR, uint32_t bitReverseFlag)
void riscv_rfft_q15 (const riscv_rfft_instance_q15 *S, q15_t *pSrc, q15_t *pDst)
void riscv_rfft_q31 (const riscv_rfft_instance_q31 *S, q31_t *pSrc, q31_t *pDst)
group RealFFT
```

The NMSIS DSP library includes specialized algorithms for computing the FFT of real data sequences. The FFT is defined over complex data but in many applications the input is real. Real FFT algorithms take advantage of the symmetry properties of the FFT and have a speed advantage over complex algorithms of the same length.

The Fast RFFT algorithm relies on the mixed-radix CFFT, which reduces processor usage.

The real length N forward FFT of a sequence is computed using the steps shown below.



The real sequence is initially treated as if it were complex to perform a CFFT. Later, a processing stage reshapes the data to obtain half of the frequency spectrum in complex format. All of the data is complex except for the first complex number, which holds the two real values X[0] and X[N/2]. In other words, the first complex sample contains two real values packed together.

The input for the inverse RFFT should keep the same format as the output of the forward RFFT. A first processing stage pre-processes the data so that an inverse CFFT can then be performed.



The algorithms for floating-point, Q15, and Q31 data are slightly different and we describe each algorithm in turn.

**Floating-point** The main functions are riscv\_rfft\_fast\_f32() and riscv\_rfft\_fast\_init\_f32(). The older functions riscv\_rfft\_f32() and riscv\_rfft\_init\_f32() have been deprecated but are still documented.

The FFT of a real N-point sequence has even symmetry in the frequency domain. The second half of the data equals the conjugate of the first half flipped in frequency. Looking at the data, we see that we can uniquely represent the FFT using only N/2 complex numbers. These are packed into the output array in alternating real and imaginary components:

 $X = \{ real[0], imag[0], real[1], imag[1], real[2], imag[2] \dots real[(N/2)-1], imag[(N/2)-1] \}$ 

It happens that the first complex number (real[0], imag[0]) is actually all real. real[0] represents the DC offset, and imag[0] should be 0. (real[1], imag[1]) is the fundamental frequency, (real[2], imag[2]) is the first harmonic and so on.

The real FFT functions pack the frequency domain data in this fashion. The forward transform outputs the data in this form and the inverse transform expects input data in this form. The function always performs the needed bit reversal so that the input and output data are always in normal order. The functions support lengths of [32, 64, 128, ..., 4096] samples.
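The packed layout can be read back with a small accessor such as the sketch below. It assumes the layout described in the overview, in which the first pair of the buffer holds the two real values X[0] and X[N/2] and the upper half of the spectrum is recovered by conjugate symmetry. The helper name `rfft_get_bin` is hypothetical and is not part of NMSIS; check the layout against the library version in use before relying on slot 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of NMSIS): read bin k (0 <= k < n) of an
 * n-point real FFT from a packed buffer of n floats laid out as
 * { X[0], X[n/2], re[1], im[1], ..., re[n/2-1], im[n/2-1] }. */
static void rfft_get_bin(const float *packed, size_t n, size_t k,
                         float *re, float *im)
{
    assert(n >= 2 && n % 2 == 0 && k < n);
    if (k == 0) {                 /* DC term: purely real */
        *re = packed[0];
        *im = 0.0f;
    } else if (k == n / 2) {      /* Nyquist term: purely real, stored in slot 1 */
        *re = packed[1];
        *im = 0.0f;
    } else if (k < n / 2) {       /* lower half: stored directly */
        *re = packed[2 * k];
        *im = packed[2 * k + 1];
    } else {                      /* upper half: X[k] = conj(X[n - k]) */
        *re = packed[2 * (n - k)];
        *im = -packed[2 * (n - k) + 1];
    }
}
```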

**Q15 and Q31** The real algorithms are defined in a similar manner and utilize N/2 complex transforms behind the scenes.

The complex transforms used internally include scaling to prevent fixed-point overflows. The overall scaling equals 1/(fftLen/2). Due to the use of complex transform internally, the source buffer is modified by the rfft.

A separate instance structure must be defined for each transform used, but twiddle factor and bit reversal tables can be reused.

There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Sets the values of the internal structure fields.
- Initializes twiddle factor table and bit reversal table pointers.
- Initializes the internal complex FFT data structure.

Use of the initialization function is optional except for MVE versions, where it is mandatory. If you don't use the initialization functions, the structures should be initialized directly, setting the following fields:

- fftLenReal: length of the real transform.
- fftLenBy2: length of the internal complex transform (fftLenReal/2).
- ifftFlagR: selects forward (=0) or inverse (=1) transform.
- bitReverseFlagR: selects bit reversed output (=0) or normal order output (=1).
- twidCoefRModifier: stride modifier for the twiddle factor table. The value is based on the FFT length.
- pTwiddleAReal: points to the A array of twiddle coefficients.
- pTwiddleBReal: points to the B array of twiddle coefficients.
- pCfft: points to the CFFT instance structure. The CFFT structure must also be initialized.

Note that with MVE versions you can't initialize instance structures directly and **must use the initialization function**.

# **Functions**

```
void riscv_rfft_f32 (const riscv_rfft_instance_f32 *S, float32_t *pSrc, float32_t *pDst)
```

Processing function for the floating-point RFFT/RIFFT. Source buffer is modified by this function.

Deprecated:

Do not use this function. It has been superseded by riscv\_rfft\_fast\_f32 and will be removed in the future.

For the RIFFT, the source buffer must have a length of at least fftLenReal + 2. The last two elements must be equal to what would be generated by the RFFT: (pSrc[0] - pSrc[1]) and 0.0f.

#### **Parameters**

- **S** [in] points to an instance of the floating-point RFFT/RIFFT structure
- pSrc [in] points to the input buffer
- pDst [out] points to the output buffer

Returns none

```
void riscv_rfft_fast_f32 (const riscv_rfft_fast_instance_f32 *S, float32_t *p, float32_t *pOut, uint8_t ifftFlag)
```

Processing function for the floating-point real FFT.

#### **Parameters**

- **S** [in] points to an riscv\_rfft\_fast\_instance\_f32 structure
- p [in] points to input buffer (Source buffer is modified by this function.)
- pOut [out] points to output buffer
- ifftFlag [in]
  - value = 0: RFFT
  - value = 1: RIFFT

Returns none

```
void riscv_rfft_fast_f64 (riscv_rfft_fast_instance_f64 *S, float64_t *p, float64_t *pOut, uint8_t ifftFlag)
```

Processing function for the Double Precision floating-point real FFT.

#### **Parameters**

- **S** [in] points to an riscv\_rfft\_fast\_instance\_f64 structure
- p [in] points to input buffer (Source buffer is modified by this function.)
- pOut [out] points to output buffer
- ifftFlag [in]
  - value = 0: RFFT
  - value = 1: RIFFT

Returns none

**static** riscv\_status **riscv\_rfft\_32\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 32pt floating-point real FFT.

**Parameters S** – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_64\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 64pt floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_128\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 128pt floating-point real FFT.

**Parameters S** – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_256\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 256pt floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_512\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 512pt floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_1024\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 1024pt floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_2048\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 2048pt floating-point real FFT.

**Parameters** S – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_4096\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S) Initialization function for the 4096pt floating-point real FFT.

**Parameters S** – [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

riscv\_status **riscv\_rfft\_fast\_init\_f32** (riscv\_rfft\_fast\_instance\_f32 \*S, uint16\_t fftLen) Initialization function for the floating-point real FFT.

**Description** The parameter fftLen specifies the length of the RFFT/RIFFT process. Supported FFT lengths are 32, 64, 128, 256, 512, 1024, 2048, 4096.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

#### **Parameters**

- **S** [inout] points to an riscv\_rfft\_fast\_instance\_f32 structure
- fftLen [in] length of the Real Sequence

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length
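Since the supported lengths are exactly the powers of two from 32 to 4096, a caller can validate a length before invoking the initialization function. The sketch below is a stand-alone pre-check under that assumption; the name `rfft_fast_len_supported` is hypothetical and this is not how the library itself performs the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-check (not part of NMSIS): true when fft_len is one of
 * the lengths listed above, i.e. a power of two from 32 to 4096. */
static bool rfft_fast_len_supported(uint16_t fft_len)
{
    bool is_pow2 = fft_len != 0 && (fft_len & (fft_len - 1u)) == 0;
    return is_pow2 && fft_len >= 32u && fft_len <= 4096u;
}

/* Usage example: 1024 is accepted, 1000 is rejected. */
static void rfft_fast_len_demo(void)
{
    assert(rfft_fast_len_supported(1024));
    assert(!rfft_fast_len_supported(1000));
}
```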

**static** riscv\_status **riscv\_rfft\_32\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \**S*) Initialization function for the 32pt double precision floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_64\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \*S) Initialization function for the 64pt Double Precision floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_128\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \**S*) Initialization function for the 128pt Double Precision floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_256\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \*S) Initialization function for the 256pt Double Precision floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

Returns execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_512\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \**S*) Initialization function for the 512pt Double Precision floating-point real FFT.

**Parameters S** – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_1024\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \*S) Initialization function for the 1024pt Double Precision floating-point real FFT.

**Parameters S** – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_2048\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \**S*) Initialization function for the 2048pt Double Precision floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

**static** riscv\_status **riscv\_rfft\_4096\_fast\_init\_f64** (riscv\_rfft\_fast\_instance\_f64 \*S) Initialization function for the 4096pt Double Precision floating-point real FFT.

Parameters S – [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: an error is detected

riscv\_status riscv\_rfft\_fast\_init\_f64 (riscv\_rfft\_fast\_instance\_f64 \*S, uint16\_t fftLen) Initialization function for the Double Precision floating-point real FFT.

**Description** The parameter fftLen specifies the length of the RFFT/RIFFT process. Supported FFT lengths are 32, 64, 128, 256, 512, 1024, 2048, 4096.

This function also initializes the twiddle factor table pointer and the bit reversal table pointer.

#### **Parameters**

- **S** [inout] points to an riscv\_rfft\_fast\_instance\_f64 structure
- fftLen [in] length of the Real Sequence

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLen is not a supported length

riscv\_status **riscv\_rfft\_init\_f32** (riscv\_rfft\_instance\_f32 \*S, riscv\_cfft\_radix4\_instance\_f32 \*S\_CFFT, uint32\_t fftLenReal, uint32\_t ifftFlagR, uint32\_t bitReverseFlag)

Initialization function for the floating-point RFFT/RIFFT.

Deprecated:

Do not use this function. It has been superseded by riscv\_rfft\_fast\_init\_f32 and will be removed in the future.

**Description** The parameter fftLenReal specifies the length of the RFFT/RIFFT process. Supported FFT lengths are 128, 512, 2048.

The parameter ifftFlagR controls whether a forward or inverse transform is computed. Set ifftFlagR (=1) to calculate the RIFFT; otherwise the RFFT is calculated.

The parameter bitReverseFlag controls whether the output is in normal order or bit reversed order. Set bitReverseFlag (=1) for output in normal order; otherwise the output is in bit reversed order.

This function also initializes the twiddle factor table.

#### **Parameters**

- **S** [inout] points to an instance of the floating-point RFFT/RIFFT structure
- **S\_CFFT** [inout] points to an instance of the floating-point CFFT/CIFFT structure
- fftLenReal [in] length of the FFT.
- ifftFlagR [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLenReal is not a supported length

riscv\_status **riscv\_rfft\_init\_q15** (riscv\_rfft\_instance\_q15 \*S, uint32\_t fftLenReal, uint32\_t ifftFlagR, uint32\_t bitReverseFlag)
Initialization function for the Q15 RFFT/RIFFT.

**Details** The parameter fftLenReal specifies the length of the RFFT/RIFFT process. Supported FFT lengths are 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192.

The parameter ifftFlagR controls whether a forward or inverse transform is computed. Set ifftFlagR (=1) to calculate the RIFFT; otherwise the RFFT is calculated.

The parameter bitReverseFlag controls whether the output is in normal order or bit reversed order. Set bitReverseFlag (=1) for output in normal order; otherwise the output is in bit reversed order.

This function also initializes the twiddle factor table.

#### **Parameters**

- **S** [inout] points to an instance of the Q15 RFFT/RIFFT structure
- fftLenReal [in] length of the FFT
- ifftFlagR [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLenReal is not a supported length

```
riscv_status riscv_rfft_init_q31 (riscv_rfft_instance_q31 *S, uint32_t fftLenReal, uint32_t ifftFlagR, uint32_t bitReverseFlag)
```

Initialization function for the Q31 RFFT/RIFFT.

**Details** The parameter fftLenReal specifies the length of the RFFT/RIFFT process. Supported FFT lengths are 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192.

The parameter ifftFlagR controls whether a forward or inverse transform is computed. Set ifftFlagR (=1) to calculate the RIFFT; otherwise the RFFT is calculated.

The parameter bitReverseFlag controls whether the output is in normal order or bit reversed order. Set bitReverseFlag (=1) for output in normal order; otherwise the output is in bit reversed order.

This function also initializes the twiddle factor table.

#### **Parameters**

- **S** [inout] points to an instance of the Q31 RFFT/RIFFT structure
- fftLenReal [in] length of the FFT
- ifftFlagR [in] flag that selects transform direction
  - value = 0: forward transform
  - value = 1: inverse transform
- bitReverseFlag [in] flag that enables / disables bit reversal of output
  - value = 0: disables bit reversal of output
  - value = 1: enables bit reversal of output

**Returns** execution status

- RISCV\_MATH\_SUCCESS: Operation successful
- RISCV\_MATH\_ARGUMENT\_ERROR: fftLenReal is not a supported length

void **riscv\_rfft\_q15** (**const** riscv\_rfft\_instance\_q15 \*S, q15\_t \*pSrc, q15\_t \*pDst)
Processing function for the Q15 RFFT/RIFFT.

**Input and output formats** Internally, the input is downscaled by 2 for every stage to avoid saturation inside the CFFT/CIFFT process. Hence the output format differs between RFFT sizes. The input and output formats for the different RFFT sizes, and the number of bits to upscale, are listed in the tables below for RFFT and RIFFT:

RFFT:

| RFFT Size | Input Format | Output Format | Number of bits to upscale |
|-----------|--------------|---------------|---------------------------|
| 32        | 1.15         | 5.11          | 4                         |
| 64        | 1.15         | 6.10          | 5                         |
| 128       | 1.15         | 7.9           | 6                         |
| 256       | 1.15         | 8.8           | 7                         |
| 512       | 1.15         | 9.7           | 8                         |
| 1024      | 1.15         | 10.6          | 9                         |
| 2048      | 1.15         | 11.5          | 10                        |
| 4096      | 1.15         | 12.4          | 11                        |
| 8192      | 1.15         | 13.3          | 12                        |

RIFFT:

| RFFT Size | Input Format | Output Format | Number of bits to upscale |
|-----------|--------------|---------------|---------------------------|
| 32        | 1.15         | 5.11          | 0                         |
| 64        | 1.15         | 6.10          | 0                         |
| 128       | 1.15         | 7.9           | 0                         |
| 256       | 1.15         | 8.8           | 0                         |
| 512       | 1.15         | 9.7           | 0                         |
| 1024      | 1.15         | 10.6          | 0                         |
| 2048      | 1.15         | 11.5          | 0                         |
| 4096      | 1.15         | 12.4          | 0                         |
| 8192      | 1.15         | 13.3          | 0                         |

If the input buffer is of length N, the output buffer must have length 2\*N. The input buffer is modified by this function.

For the RIFFT, the source buffer must at least have length fftLenReal + 2. The last two elements must be equal to what would be generated by the RFFT: (pSrc[0] - pSrc[1]) >> 1 and 0
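The forward-RFFT upscale counts in the table above follow the pattern log2(size) - 1 for every listed size. The sketch below captures that pattern and applies it to a Q15 output sample; the helper names are hypothetical and not part of NMSIS, and the result is widened to 32 bits so that the upscaling cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of NMSIS): number of bits to upscale the
 * forward Q15/Q31 RFFT output, following the pattern in the RFFT table
 * (log2(fft_len) - 1 for lengths 32 ... 8192). */
static int rfft_upscale_bits(uint32_t fft_len)
{
    assert(fft_len >= 32u && fft_len <= 8192u);
    assert((fft_len & (fft_len - 1u)) == 0u);    /* power of two */
    int log2_len = 0;
    while ((1u << log2_len) < fft_len) {
        ++log2_len;
    }
    return log2_len - 1;
}

/* Undo the internal downscaling, widening so the result cannot overflow. */
static int32_t rfft_q15_upscale(int16_t value, uint32_t fft_len)
{
    return (int32_t)value * (int32_t)(1 << rfft_upscale_bits(fft_len));
}
```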

#### **Parameters**

- S [in] points to an instance of the Q15 RFFT/RIFFT structure
- pSrc [in] points to input buffer (Source buffer is modified by this function.)
- pDst [out] points to output buffer

Returns none

void **riscv\_rfft\_q31** (**const** riscv\_rfft\_instance\_q31 \*S, q31\_t \*pSrc, q31\_t \*pDst) Processing function for the Q31 RFFT/RIFFT.

**Input and output formats** Internally, the input is downscaled by 2 for every stage to avoid saturation inside the CFFT/CIFFT process. Hence the output format differs between RFFT sizes. The input and output formats for the different RFFT sizes, and the number of bits to upscale, are listed in the tables below for RFFT and RIFFT:

RFFT:

| RFFT Size | Input Format | Output Format | Number of bits to upscale |
|-----------|--------------|---------------|---------------------------|
| 32        | 1.31         | 5.27          | 4                         |
| 64        | 1.31         | 6.26          | 5                         |
| 128       | 1.31         | 7.25          | 6                         |
| 256       | 1.31         | 8.24          | 7                         |
| 512       | 1.31         | 9.23          | 8                         |
| 1024      | 1.31         | 10.22         | 9                         |
| 2048      | 1.31         | 11.21         | 10                        |
| 4096      | 1.31         | 12.20         | 11                        |
| 8192      | 1.31         | 13.19         | 12                        |

RIFFT:

| RFFT Size | Input Format | Output Format | Number of bits to upscale |
|-----------|--------------|---------------|---------------------------|
| 32        | 1.31         | 5.27          | 0                         |
| 64        | 1.31         | 6.26          | 0                         |
| 128       | 1.31         | 7.25          | 0                         |
| 256       | 1.31         | 8.24          | 0                         |
| 512       | 1.31         | 9.23          | 0                         |
| 1024      | 1.31         | 10.22         | 0                         |
| 2048      | 1.31         | 11.21         | 0                         |
| 4096      | 1.31         | 12.20         | 0                         |
| 8192      | 1.31         | 13.19         | 0                         |

If the input buffer is of length N, the output buffer must have length 2\*N. The input buffer is modified by this function.

For the RIFFT, the source buffer must at least have length fftLenReal + 2. The last two elements must be equal to what would be generated by the RFFT: (pSrc[0] - pSrc[1]) >> 1 and 0

#### **Parameters**

- S [in] points to an instance of the Q31 RFFT/RIFFT structure
- pSrc [in] points to input buffer (Source buffer is modified by this function)
- pDst [out] points to output buffer

Returns none

group groupTransforms

# 3.3.7 Controller Functions

# **PID Motor Control**

```
__STATIC_FORCEINLINE float32_t riscv_pid_f32 (riscv_pid_instance_f32 *S, float32_t in)
__STATIC_FORCEINLINE q31_t riscv_pid_q31 (riscv_pid_instance_q31 *S, q31_t in)
__STATIC_FORCEINLINE q15_t riscv_pid_q15 (riscv_pid_instance_q15 *S, q15_t in)
void riscv_pid_init_f32 (riscv_pid_instance_f32 *S, int32_t resetStateFlag)
void riscv_pid_init_q15 (riscv_pid_instance_q15 *S, int32_t resetStateFlag)
void riscv_pid_init_q31 (riscv_pid_instance_q31 *S, int32_t resetStateFlag)
void riscv_pid_reset_f32 (riscv_pid_instance_f32 *S)
void riscv_pid_reset_q15 (riscv_pid_instance_q15 *S)
void riscv_pid_reset_q31 (riscv_pid_instance_q31 *S)
group PID
```

A Proportional Integral Derivative (PID) controller is a generic feedback control loop mechanism widely used in industrial control systems. A PID controller is the most commonly used type of feedback controller.

This set of functions implements PID controllers for Q15, Q31, and floating-point data types. The functions operate on a single sample of data and each call to the function returns a single processed value. S points to an instance of the PID control data structure. in is the input sample value. The functions return the output value.

## Algorithm:

where Kp is proportional constant, Ki is Integral constant and Kd is Derivative constant



The PID controller calculates an "error" value as the difference between the measured output and the reference input. The controller attempts to minimize the error by adjusting the process control inputs. The proportional value determines the reaction to the current error, the integral value determines the reaction based on the sum of recent errors, and the derivative value determines the reaction based on the rate at which the error has been changing.
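The relationship between the Kp, Ki, Kd gains and the derived gains A0, A1, A2 can be sketched as a stand-alone floating-point controller. The difference equation and gain mapping used here follow the form given in the upstream CMSIS-DSP PID documentation and are stated as an assumption, since the formula is not reproduced in this section. The type and function names (`pid_sketch_f32`, `pid_sketch_init`, `pid_sketch_step`) are hypothetical and do not mirror the NMSIS headers.

```c
#include <assert.h>

/* Hypothetical mirror of a floating-point PID instance; field names are
 * illustrative and not taken from the NMSIS headers. */
typedef struct {
    float a0, a1, a2;   /* derived gains */
    float state[3];     /* x[n-1], x[n-2], y[n-1] */
    float kp, ki, kd;   /* user-facing gains */
} pid_sketch_f32;

/* Derive A0..A2 from Kp, Ki, Kd and optionally clear the state. */
static void pid_sketch_init(pid_sketch_f32 *s, int reset_state)
{
    assert(s != 0);
    s->a0 = s->kp + s->ki + s->kd;
    s->a1 = -s->kp - 2.0f * s->kd;
    s->a2 = s->kd;
    if (reset_state) {
        s->state[0] = s->state[1] = s->state[2] = 0.0f;
    }
}

/* One step: y[n] = y[n-1] + A0*x[n] + A1*x[n-1] + A2*x[n-2]. */
static float pid_sketch_step(pid_sketch_f32 *s, float in)
{
    float out = s->a0 * in + s->a1 * s->state[0]
              + s->a2 * s->state[1] + s->state[2];
    s->state[1] = s->state[0];   /* x[n-1] becomes x[n-2] */
    s->state[0] = in;            /* x[n] becomes x[n-1] */
    s->state[2] = out;           /* y[n] becomes y[n-1] */
    return out;
}
```

With Kp = 1 and Ki = Kd = 0 the output simply tracks the input, and with only Ki set the output accumulates the input from call to call.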

**Instance Structure** The Gains A0, A1, A2 and state variables for a PID controller are stored together in an instance data structure. A separate instance structure must be defined for each PID Controller. There are separate instance structure declarations for each of the 3 supported data types.

**Reset Functions** There is also an associated reset function for each data type which clears the state array.

**Initialization Functions** There is also an associated initialization function for each data type. The initialization function performs the following operations:

- Initializes the Gains A0, A1, A2 from the Kp, Ki, Kd gains.
- Zeros out the values in the state buffer.

The instance structure cannot be placed in a const data section, and using the initialization function is recommended.

**Fixed-Point Behavior** Care must be taken when using the fixed-point versions of the PID Controller functions. In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. Refer to the function specific documentation below for usage guidelines.

## **Functions**

\_\_STATIC\_FORCEINLINE float32\_t riscv\_pid\_f32 (riscv\_pid\_instance\_f32 \*S, float32\_t in)

Process function for the floating-point PID Control.

#### **Parameters**

- S [inout] is an instance of the floating-point PID Control structure
- in [in] input sample to process

**Returns** processed output sample.

\_\_STATIC\_FORCEINLINE q31\_t riscv\_pid\_q31 (riscv\_pid\_instance\_q31 \*S, q31\_t in)
Process function for the Q31 PID Control.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows, it wraps around rather than clipping. To avoid overflows completely, the input signal must be scaled down by 2 bits, as there are four additions. After all multiply-accumulates are performed, the 2.62 accumulator is truncated to 1.32 format and then saturated to 1.31 format.

#### **Parameters**

- **S** [inout] points to an instance of the Q31 PID Control structure
- in [in] input sample to process

Returns processed output sample.

\_\_STATIC\_FORCEINLINE q15\_t riscv\_pid\_q15 (riscv\_pid\_instance\_q15 \*S, q15\_t in)
Process function for the Q15 PID Control.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. Both Gains and state variables are represented in 1.15 format and multiplications yield a 2.30 result. The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format. There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved. After all additions have been performed, the accumulator is truncated to 34.15 format by discarding the low 15 bits. Lastly, the accumulator is saturated to yield a result in 1.15 format.

#### **Parameters**

- **S** [inout] points to an instance of the Q15 PID Control structure
- in [in] input sample to process

Returns processed output sample.
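The final truncate-and-saturate step described for the Q15 PID can be sketched in isolation. This is a minimal illustration of the number formats only, not the library's inline implementation; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 1.15 values into a 2.30 product held in a wide accumulator. */
static int64_t q15_mul_2_30(int16_t a, int16_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Hypothetical helper (not part of NMSIS): reduce a 34.30 accumulator to
 * a saturated 1.15 result. */
static int16_t acc_34_30_to_q15(int64_t acc)
{
    /* Discard the low 15 fractional bits (34.30 -> 34.15). Right-shifting
     * a negative value is implementation-defined in C; mainstream
     * compilers perform an arithmetic shift. */
    int64_t shifted = acc >> 15;
    if (shifted > INT16_MAX) {
        return INT16_MAX;
    }
    if (shifted < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)shifted;
}

/* Usage example: 0.5 * 0.5 in Q15 gives 0.25 in Q15. */
static void acc_demo(void)
{
    int64_t acc = q15_mul_2_30(16384, 16384);   /* 2^28, i.e. 0.25 in 2.30 */
    assert(acc_34_30_to_q15(acc) == 8192);
}
```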

void **riscv\_pid\_init\_f32** (riscv\_pid\_instance\_f32 \*S, int32\_t resetStateFlag) Initialization function for the floating-point PID Control.

**Details** The resetStateFlag specifies whether to set state to zero or not.

The function computes the structure fields A0, A1 and A2 from the proportional gain (Kp), integral gain (Ki) and derivative gain (Kd). It also sets the state variables to zero when resetStateFlag is set.

#### **Parameters**

- **S** [inout] points to an instance of the PID structure
- resetStateFlag [in]
  - value = 0: no change in state
  - value = 1: reset state

Returns none

void **riscv\_pid\_init\_q15** (riscv\_pid\_instance\_q15 \*S, int32\_t resetStateFlag) Initialization function for the Q15 PID Control.

**Details** The resetStateFlag specifies whether to set state to zero or not.

The function computes the structure fields A0, A1 and A2 from the proportional gain (Kp), integral gain (Ki) and derivative gain (Kd). It also sets the state variables to zero when resetStateFlag is set.

#### **Parameters**

- S [inout] points to an instance of the Q15 PID structure
- resetStateFlag [in]
  - value = 0: no change in state
  - value = 1: reset state

Returns none

```
void riscv_pid_init_q31 (riscv_pid_instance_q31 *S, int32_t resetStateFlag) Initialization function for the Q31 PID Control.
```

**Details** The resetStateFlag specifies whether to set state to zero or not.

The function computes the structure fields A0, A1, and A2 from the proportional gain (Kp), integral gain (Ki), and derivative gain (Kd). If resetStateFlag is 1, it also sets the state variables to zero.

#### **Parameters**

- S [inout] points to an instance of the Q31 PID structure
- resetStateFlag [in]
  - value = 0: no change in state
  - value = 1: reset state

**Returns** none

```
void riscv_pid_reset_f32 (riscv_pid_instance_f32 *S)
```

Reset function for the floating-point PID Control.

**Details** The function resets the state buffer to zeros.

Parameters S – [inout] points to an instance of the floating-point PID structure

Returns none

```
void riscv_pid_reset_q15 (riscv_pid_instance_q15 *S)
```

Reset function for the Q15 PID Control.

**Details** The function resets the state buffer to zeros.

Parameters S – [inout] points to an instance of the Q15 PID structure

Returns none

```
void riscv_pid_reset_q31 (riscv_pid_instance_q31 *S)
```

Reset function for the Q31 PID Control.

**Details** The function resets the state buffer to zeros.

Parameters S – [inout] points to an instance of the Q31 PID structure

Returns none

## **Vector Clarke Transform**

```
__STATIC_FORCEINLINE void riscv_clarke_f32 (float32_t Ia, float32_t Ib, float32_t *pIalpha, float32_t *pIbeta)
__STATIC_FORCEINLINE void riscv_clarke_q31 (q31_t Ia, q31_t Ib, q31_t *pIalpha, q31_t *pIbeta)
```


Forward Clarke transform converts the instantaneous stator phases into a two-coordinate time invariant vector. Generally the Clarke transform uses three-phase currents Ia, Ib and Ic to calculate currents in the two-phase orthogonal stator axis Ialpha and Ibeta. When Ialpha is superposed with Ia as shown in the figure below and Ia + Ib + Ic = 0, Ialpha and Ibeta can be calculated using only Ia and Ib.

The function operates on a single sample of data and each call to the function returns the processed output. The library provides separate functions for Q31 and floating-point data types.

### Algorithm

$$
\begin{aligned}
\text{pIalpha} &= \text{Ia} \\
\text{pIbeta} &= (1/\sqrt{3})\,\text{Ia} + (2/\sqrt{3})\,\text{Ib}
\end{aligned}
$$

where Ia and Ib are the instantaneous stator phases and pIalpha and pIbeta are the two coordinates of the time invariant vector.
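The two equations above translate directly into plain C. The sketch below is a floating-point illustration only, not the NMSIS implementation; `clarke_sketch` is a hypothetical name, and the constant is 1/sqrt(3) rounded to single precision.

```c
/* Forward Clarke sketch: (Ia, Ib) -> (Ialpha, Ibeta), assuming Ia + Ib + Ic = 0. */
static void clarke_sketch(float Ia, float Ib, float *pIalpha, float *pIbeta)
{
    const float inv_sqrt3 = 0.57735027f;   /* 1/sqrt(3) */

    *pIalpha = Ia;
    *pIbeta  = inv_sqrt3 * Ia + 2.0f * inv_sqrt3 * Ib;
}
```

For a balanced set with Ia = 1 and Ib = Ic = -0.5, the sketch gives Ibeta = 0, as expected when the current vector lies on the alpha axis.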

**Fixed-Point Behavior** Care must be taken when using the Q31 version of the Clarke transform. In particular, the overflow and saturation behavior of the accumulator used must be considered. Refer to the function specific documentation below for usage guidelines.

## **Functions**

\_\_STATIC\_FORCEINLINE void riscv\_clarke\_f32 (float32\_t Ia, float32\_t Ib, float32\_t \*pIalpha, float32\_t \*pIbeta)

Floating-point Clarke transform.

## **Parameters**

- Ia [in] input three-phase coordinate a
- Ib [in] input three-phase coordinate b
- pIalpha [out] points to output two-phase orthogonal vector axis alpha
- pIbeta [out] points to output two-phase orthogonal vector axis beta

#### Returns none

\_\_STATIC\_FORCEINLINE void riscv\_clarke\_q31 (q31\_t Ia, q31\_t Ib, q31\_t \*pIalpha, q31\_t \*pIbeta)

Clarke transform for Q31 version.

**Scaling and Overflow Behavior** The function is implemented using an internal 32-bit accumulator. The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format. There is saturation on the addition, hence there is no risk of overflow.

#### **Parameters**

- Ia [in] input three-phase coordinate a
- Ib [in] input three-phase coordinate b
- pIalpha [out] points to output two-phase orthogonal vector axis alpha
- pIbeta [out] points to output two-phase orthogonal vector axis beta

Returns none

#### **Vector Inverse Clarke Transform**

\_\_STATIC\_FORCEINLINE void riscv\_inv\_clarke\_f32 (float32\_t Ialpha, float32\_t Ibeta, float32\_t \*pIa, float32\_t \*pIb)

\_\_STATIC\_FORCEINLINE void riscv\_inv\_clarke\_q31 (q31\_t Ialpha, q31\_t Ibeta, q31\_t \*pIa, q31\_t \*pIb)


Inverse Clarke transform converts the two-coordinate time invariant vector into instantaneous stator phases.

The function operates on a single sample of data and each call to the function returns the processed output. The library provides separate functions for Q31 and floating-point data types.

## Algorithm

$$
\begin{aligned}
\text{pIa} &= \text{Ialpha} \\
\text{pIb} &= (-1/2)\,\text{Ialpha} + (\sqrt{3}/2)\,\text{Ibeta}
\end{aligned}
$$

where pIa and pIb are the instantaneous stator phases and Ialpha and Ibeta are the two coordinates of the time invariant vector.
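A plain-C rendering of the inverse Clarke equations is sketched below, assuming pIa = Ialpha for the first phase (the inverse of pIalpha = Ia in the forward transform). The name `inv_clarke_sketch` is hypothetical, and this is an illustration rather than the NMSIS implementation.

```c
/* Inverse Clarke sketch: (Ialpha, Ibeta) -> (Ia, Ib). */
static void inv_clarke_sketch(float Ialpha, float Ibeta, float *pIa, float *pIb)
{
    const float sqrt3_over_2 = 0.8660254f;   /* sqrt(3)/2 */

    *pIa = Ialpha;
    *pIb = -0.5f * Ialpha + sqrt3_over_2 * Ibeta;
}
```

The third phase is not returned; under the balanced condition it can be recovered as Ic = -(Ia + Ib).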

**Fixed-Point Behavior** Care must be taken when using the Q31 version of the inverse Clarke transform. In particular, the overflow and saturation behavior of the accumulator used must be considered. Refer to the function specific documentation below for usage guidelines.

#### **Functions**

\_\_STATIC\_FORCEINLINE void riscv\_inv\_clarke\_f32 (float32\_t Ialpha, float32\_t Ibeta, float32\_t \*pIa, float32\_t \*pIb)

Floating-point Inverse Clarke transform.

#### **Parameters**

- Ialpha [in] input two-phase orthogonal vector axis alpha
- Ibeta [in] input two-phase orthogonal vector axis beta
- pIa [out] points to output three-phase coordinate a
- pIb [out] points to output three-phase coordinate b

#### Returns none

\_\_STATIC\_FORCEINLINE void riscv\_inv\_clarke\_q31 (q31\_t Ialpha, q31\_t Ibeta, q31\_t \*pIa, q31\_t \*pIb)

Inverse Clarke transform for Q31 version.

**Scaling and Overflow Behavior** The function is implemented using an internal 32-bit accumulator. The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format. There is saturation on the subtraction, hence there is no risk of overflow.

#### **Parameters**

- Ialpha [in] input two-phase orthogonal vector axis alpha
- Ibeta [in] input two-phase orthogonal vector axis beta
- pIa [out] points to output three-phase coordinate a
- pIb [out] points to output three-phase coordinate b

Returns none

## **Vector Park Transform**

\_\_STATIC\_FORCEINLINE void riscv\_park\_f32 (float32\_t Ialpha, float32\_t Ibeta, float32\_t \*pId, float32\_t \*pIq, float32\_t sinVal, float32\_t cosVal)

\_\_STATIC\_FORCEINLINE void riscv\_park\_q31 (q31\_t Ialpha, q31\_t Ibeta, q31\_t \*pId, q31\_t \*pIq, q31\_t sinVal, q31\_t cosVal)


Forward Park transform converts the input two-coordinate vector to flux and torque components. The Park transform can be used to realize the transformation of the Ialpha and the Ibeta currents from the stationary to the moving reference frame and to control the spatial relationship between the stator vector current and rotor flux vector. If we consider the d axis aligned with the rotor flux, the diagram below shows the current vector and the relationship from the two reference frames:

The function operates on a single sample of data and each call to the function returns the processed output. The library provides separate functions for Q31 and floating-point data types.



### Algorithm

where Ialpha and Ibeta are the stator vector components, pId and pIq are rotor vector components and cosVal and sinVal are the cosine and sine values of theta (rotor flux position).
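Using the symbols defined above, the forward Park rotation can be sketched in plain C as follows. The sketch assumes the usual convention pId = Ialpha·cosVal + Ibeta·sinVal and pIq = -Ialpha·sinVal + Ibeta·cosVal; `park_sketch` is a hypothetical name, and the code illustrates the arithmetic rather than reproducing the NMSIS implementation.

```c
/* Forward Park sketch: rotate (Ialpha, Ibeta) into the rotor (d, q) frame.
   sinVal and cosVal are the sine and cosine of the rotor flux angle theta. */
static void park_sketch(float Ialpha, float Ibeta,
                        float *pId, float *pIq,
                        float sinVal, float cosVal)
{
    *pId =  Ialpha * cosVal + Ibeta * sinVal;
    *pIq = -Ialpha * sinVal + Ibeta * cosVal;
}
```

In a motor-control loop the sine and cosine values would normally come from a single call to a sin/cos routine per sample, so that both Park and inverse Park share the same angle.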

**Fixed-Point Behavior** Care must be taken when using the Q31 version of the Park transform. In particular, the overflow and saturation behavior of the accumulator used must be considered. Refer to the function specific documentation below for usage guidelines.

#### **Functions**

\_\_STATIC\_FORCEINLINE void riscv\_park\_f32 (float32\_t Ialpha, float32\_t Ibeta, float32\_t \*pId, float32\_t \*pIq, float32\_t sinVal, float32\_t cosVal)

Floating-point Park transform.

The function implements the forward Park transform.

#### **Parameters**

- Ialpha [in] input two-phase vector coordinate alpha
- Ibeta [in] input two-phase vector coordinate beta
- pId [out] points to output rotor reference frame d
- pIq [out] points to output rotor reference frame q
- sinVal [in] sine value of rotation angle theta
- cosVal [in] cosine value of rotation angle theta

## Returns none

\_\_STATIC\_FORCEINLINE void riscv\_park\_q31 (q31\_t Ialpha, q31\_t Ibeta, q31\_t \*pId, q31\_t \*pIq, q31\_t sinVal, q31\_t cosVal)

Park transform for Q31 version.

**Scaling and Overflow Behavior** The function is implemented using an internal 32-bit accumulator. The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format. There is saturation on the addition and subtraction, hence there is no risk of overflow.

#### **Parameters**

- Ialpha [in] input two-phase vector coordinate alpha
- **Ibeta** [in] input two-phase vector coordinate beta
- pId [out] points to output rotor reference frame d
- pIq [out] points to output rotor reference frame q
- sinVal [in] sine value of rotation angle theta
- cosVal [in] cosine value of rotation angle theta

#### Returns none

#### **Vector Inverse Park transform**

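\_\_STATIC\_FORCEINLINE void riscv\_inv\_park\_f32 (float32\_t Id, float32\_t Iq, float32\_t \*pIalpha, float32\_t \*pIbeta, float32\_t sinVal, float32\_t cosVal)

\_\_STATIC\_FORCEINLINE void riscv\_inv\_park\_q31 (q31\_t Id, q31\_t Iq, q31\_t \*pIalpha, q31\_t \*pIbeta, q31\_t sinVal, q31\_t cosVal)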


Inverse Park transform converts the input flux and torque components to two-coordinate vector.

The function operates on a single sample of data and each call to the function returns the processed output. The library provides separate functions for Q31 and floating-point data types.

## Algorithm

where pIalpha and pIbeta are the stator vector components, Id and Iq are rotor vector components and cosVal and sinVal are the cosine and sine values of theta (rotor flux position).
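With the symbols defined above, the inverse Park rotation can be sketched in plain C. The sketch assumes the usual convention pIalpha = Id·cosVal - Iq·sinVal and pIbeta = Id·sinVal + Iq·cosVal; `inv_park_sketch` is a hypothetical name and the code is an illustration, not the NMSIS implementation.

```c
/* Inverse Park sketch: rotate (Id, Iq) back into the stator (alpha, beta) frame.
   sinVal and cosVal are the sine and cosine of the rotor flux angle theta. */
static void inv_park_sketch(float Id, float Iq,
                            float *pIalpha, float *pIbeta,
                            float sinVal, float cosVal)
{
    *pIalpha = Id * cosVal - Iq * sinVal;
    *pIbeta  = Id * sinVal + Iq * cosVal;
}
```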

**Fixed-Point Behavior** Care must be taken when using the Q31 version of the inverse Park transform. In particular, the overflow and saturation behavior of the accumulator used must be considered. Refer to the function specific documentation below for usage guidelines.

### **Functions**

\_\_STATIC\_FORCEINLINE void riscv\_inv\_park\_f32 (float32\_t Id, float32\_t Iq, float32\_t \*pIalpha, float32\_t \*pIbeta, float32\_t sinVal, float32\_t cosVal)

Floating-point Inverse Park transform.

### **Parameters**

- Id [in] input coordinate of rotor reference frame d
- Iq [in] input coordinate of rotor reference frame q
- pIalpha [out] points to output two-phase orthogonal vector axis alpha
- pIbeta [out] points to output two-phase orthogonal vector axis beta
- sinVal [in] sine value of rotation angle theta
- cosVal [in] cosine value of rotation angle theta

#### Returns none

\_\_STATIC\_FORCEINLINE void riscv\_inv\_park\_q31 (q31\_t Id, q31\_t Iq, q31\_t \*pIalpha, q31\_t \*pIbeta, q31\_t sinVal, q31\_t cosVal)

Inverse Park transform for Q31 version.

**Scaling and Overflow Behavior** The function is implemented using an internal 32-bit accumulator. The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format. There is saturation on the addition, hence there is no risk of overflow.

### **Parameters**

- Id [in] input coordinate of rotor reference frame d
- Iq [in] input coordinate of rotor reference frame q
- pIalpha [out] points to output two-phase orthogonal vector axis alpha
- pIbeta [out] points to output two-phase orthogonal vector axis beta
- sinVal [in] sine value of rotation angle theta
- cosVal [in] cosine value of rotation angle theta

#### Returns none

#### **Sine Cosine**

```
void riscv_sin_cos_f32 (float32_t theta, float32_t *pSinVal, float32_t *pCosVal)
void riscv_sin_cos_q31 (q31_t theta, q31_t *pSinVal, q31_t *pCosVal)
```

Computes the trigonometric sine and cosine values using a combination of table lookup and linear interpolation. There are separate functions for Q31 and floating-point data types. The input to the floating-point version is in degrees, while the fixed-point Q31 version takes a scaled input in the range [-1, 0.9999] that maps to [-180, +180] degrees.

The floating point function also allows values that are out of the usual range. When this happens, the function will take extra time to adjust the input value to the range of [-180 180].

The result is accurate to 5 digits after the decimal point.

The implementation is based on table lookup using 360 values together with linear interpolation. The steps used are:

1. Calculate the nearest integer table index.
2. Compute the fractional portion (fract) of the input.
3. Fetch the value at index from the sine table into y0 and the value at index+1 into y1.
4. Compute the sine value as \*pSinVal = y0 + (fract \* (y1 - y0)).
5. Fetch the value at index from the cosine table into y0 and the value at index+1 into y1.
6. Compute the cosine value as \*pCosVal = y0 + (fract \* (y1 - y0)).

## **Functions**

```
void riscv_sin_cos_f32 (float32_t theta, float32_t *pSinVal, float32_t *pCosVal)
```

Floating-point sin_cos function.

## **Parameters**

- theta [in] input value in degrees
- pSinVal [out] points to the processed sine output.
- pCosVal [out] points to the processed cos output.

#### Returns none

```
void riscv_sin_cos_q31 (q31_t theta, q31_t *pSinVal, q31_t *pCosVal)
```

Q31 sin_cos function.

The Q31 input value is in the range [-1 0.999999] and is mapped to a degree value in the range [-180 179].

#### **Parameters**

- theta [in] scaled input value in degrees
- pSinVal [out] points to the processed sine output.
- pCosVal [out] points to the processed cosine output.

Returns none


## 3.3.8 Statistics Functions

### **Maximum**

```
void riscv_max_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult, uint32_t *pIndex)
void riscv_max_no_idx_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
void riscv_max_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult, uint32_t *pIndex)
void riscv_max_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult, uint32_t *pIndex)
void riscv_max_q7 (const q7_t *pSrc, uint32_t blockSize, q7_t *pResult, uint32_t *pIndex)
```

Computes the maximum value of an array of data. The function returns both the maximum value and its position within the array; riscv\_max\_no\_idx\_f32 returns only the value. There are separate functions for floating-point, Q31, Q15, and Q7 data types.
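A plain-C reference loop for the value-and-index search might look like the following. The name `max_sketch_f32` is hypothetical; returning the first index on ties and requiring blockSize >= 1 are assumptions of this sketch rather than documented NMSIS behavior.

```c
#include <stdint.h>

/* Scan the vector once, tracking the largest value and where it was seen.
   Assumes blockSize >= 1. On ties, the first occurrence is reported. */
static void max_sketch_f32(const float *pSrc, uint32_t blockSize,
                           float *pResult, uint32_t *pIndex)
{
    float    best = pSrc[0];
    uint32_t idx  = 0;

    for (uint32_t i = 1; i < blockSize; i++) {
        if (pSrc[i] > best) {
            best = pSrc[i];
            idx  = i;
        }
    }
    *pResult = best;
    *pIndex  = idx;
}
```

The minimum search is the same loop with the comparison reversed.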

#### **Functions**

```
void riscv_max_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult, uint32_t *pIndex)
```

Maximum value of a floating-point vector.

### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] maximum value returned here
- pIndex [out] index of maximum value returned here

Returns none

```
void riscv_max_no_idx_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
```

Maximum value of a floating-point vector.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] maximum value returned here

#### Returns none

void riscv\_max\_q15 (const q15\_t \*pSrc, uint32\_t blockSize, q15\_t \*pResult, uint32\_t \*pIndex)

Maximum value of a Q15 vector.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] maximum value returned here
- pIndex [out] index of maximum value returned here

#### Returns none

void **riscv\_max\_q31** (**const** q31\_t \*pSrc, uint32\_t blockSize, q31\_t \*pResult, uint32\_t \*pIndex) Maximum value of a Q31 vector.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] maximum value returned here
- pIndex [out] index of maximum value returned here

## Returns none

void **riscv\_max\_q7** (**const** q7\_t \**pSrc*, uint32\_t *blockSize*, q7\_t \**pResult*, uint32\_t \**pIndex*) Maximum value of a Q7 vector.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] maximum value returned here
- pIndex [out] index of maximum value returned here

### Returns none

#### Mean

```
void riscv_mean_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
void riscv_mean_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
void riscv_mean_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
void riscv_mean_q7 (const q7_t *pSrc, uint32_t blockSize, q7_t *pResult)
```

## **Functions**

void **riscv\_mean\_f32** (**const** float32\_t \*pSrc, uint32\_t blockSize, float32\_t \*pResult) Mean value of a floating-point vector.

#### **Parameters**

- pSrc [in] points to the input vector.
- blockSize [in] number of samples in input vector.
- pResult [out] mean value returned here.

## Returns none

```
void riscv_mean_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
```

Mean value of a Q15 vector.

**Scaling and Overflow Behavior** The function is implemented using a 32-bit internal accumulator. The input is represented in 1.15 format and is accumulated in a 32-bit accumulator in 17.15 format. There is no risk of internal overflow with this approach, and the full precision of intermediate result is preserved. Finally, the accumulator is truncated to yield a result of 1.15 format.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] mean value returned here

Returns none
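The Q15 accumulation scheme described above can be sketched as follows: 1.15 samples are summed in a 32-bit accumulator (17.15 format) and the sum is divided by the block length. The name `mean_q15_sketch` is hypothetical, `int16_t` stands in for `q15_t`, and the sketch assumes 1 <= blockSize <= 65536 so that the 16 guard bits are sufficient.

```c
#include <stdint.h>

/* Mean of a Q15 vector using a 32-bit accumulator in 17.15 format.
   Assumes 1 <= blockSize <= 65536 so the sum cannot overflow. */
static int16_t mean_q15_sketch(const int16_t *pSrc, uint32_t blockSize)
{
    int32_t sum = 0;

    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrc[i];
    }
    /* Integer division truncates toward zero; the result fits in 1.15. */
    return (int16_t)(sum / (int32_t)blockSize);
}
```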

```
void riscv_mean_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Mean value of a Q31 vector.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. The input is represented in 1.31 format and is accumulated in a 64-bit accumulator in 33.31 format. There is no risk of internal overflow with this approach, and the full precision of intermediate result is preserved. Finally, the accumulator is truncated to yield a result of 1.31 format.

## **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] mean value returned here

### Returns none

```
void riscv_mean_q7 (const q7_t *pSrc, uint32_t blockSize, q7_t *pResult)
```

Mean value of a Q7 vector.

**Scaling and Overflow Behavior** The function is implemented using a 32-bit internal accumulator. The input is represented in 1.7 format and is accumulated in a 32-bit accumulator in 25.7 format. There is no risk of internal overflow with this approach, and the full precision of intermediate result is preserved. Finally, the accumulator is truncated to yield a result of 1.7 format.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] mean value returned here

Returns none

#### **Minimum**

```
void riscv_min_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult, uint32_t *pIndex)
void riscv_min_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult, uint32_t *pIndex)
void riscv_min_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult, uint32_t *pIndex)
void riscv_min_q7 (const q7_t *pSrc, uint32_t blockSize, q7_t *pResult, uint32_t *pIndex)
```

Computes the minimum value of an array of data. The function returns both the minimum value and its position within the array. There are separate functions for floating-point, Q31, Q15, and Q7 data types.

### **Functions**

```
void riscv_min_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult, uint32_t *pIndex)
```

Minimum value of a floating-point vector.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] minimum value returned here
- pIndex [out] index of minimum value returned here

### Returns none

```
void riscv_min_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult, uint32_t *pIndex)
```

Minimum value of a Q15 vector.

### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] minimum value returned here
- pIndex [out] index of minimum value returned here

#### Returns none

```
void riscv_min_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult, uint32_t *pIndex)
```

Minimum value of a Q31 vector.

### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] minimum value returned here
- pIndex [out] index of minimum value returned here

#### Returns none

```
void riscv_min_q7 (const q7_t *pSrc, uint32_t blockSize, q7_t *pResult, uint32_t *pIndex)
```

Minimum value of a Q7 vector.

## **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] minimum value returned here
- pIndex [out] index of minimum value returned here

Returns none

#### **Power**

```
void riscv_power_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
void riscv_power_q15 (const q15_t *pSrc, uint32_t blockSize, q63_t *pResult)
void riscv_power_q31 (const q31_t *pSrc, uint32_t blockSize, q63_t *pResult)
void riscv_power_q7 (const q7_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Calculates the sum of the squares of the elements in the input vector. The following algorithm is used:

$$\text{Result} = \text{pSrc}[0]^2 + \text{pSrc}[1]^2 + \dots + \text{pSrc}[\text{blockSize}-1]^2$$

There are separate functions for floating point, Q31, Q15, and Q7 data types.

Since the result is not divided by the length, these functions compute something closer to an energy than a power.
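A minimal floating-point sketch of the sum of squares is shown below. The name `power_sketch_f32` is hypothetical, and the loop is a straightforward reference version rather than the optimized library code.

```c
#include <stdint.h>

/* Sum of the squares of the elements; note that there is no division
   by blockSize, so the result is an energy-like quantity. */
static float power_sketch_f32(const float *pSrc, uint32_t blockSize)
{
    float sum = 0.0f;

    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrc[i] * pSrc[i];
    }
    return sum;
}
```

Dividing the returned value by blockSize gives the mean power of the block.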

### **Functions**

```
void riscv_power_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
```

Sum of the squares of the elements of a floating-point vector.

### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] sum of the squares value returned here

### Returns none

```
void riscv_power_q15 (const q15_t *pSrc, uint32_t blockSize, q63_t *pResult)
```

Sum of the squares of the elements of a Q15 vector.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. The input is represented in 1.15 format. Intermediate multiplication yields a 2.30 format, and this result is added without saturation to a 64-bit accumulator in 34.30 format. With 33 guard bits in the accumulator, there is no risk of overflow, and the full precision of the intermediate multiplication is preserved. Finally, the return result is in 34.30 format.

## **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] sum of the squares value returned here

#### Returns none

```
void riscv_power_q31 (const q31_t *pSrc, uint32_t blockSize, q63_t *pResult)
```

Sum of the squares of the elements of a Q31 vector.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. The input is represented in 1.31 format. Intermediate multiplication yields a 2.62 format, and this result is truncated to 2.48 format by discarding the lower 14 bits. The 2.48 result is then added without saturation to a 64-bit accumulator in 16.48 format. With 15 guard bits in the accumulator, there is no risk of overflow, and the full precision of the intermediate multiplication is preserved. Finally, the return result is in 16.48 format.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] sum of the squares value returned here

#### Returns none

```
void riscv_power_q7 (const q7_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Sum of the squares of the elements of a Q7 vector.

**Scaling and Overflow Behavior** The function is implemented using a 32-bit internal accumulator. The input is represented in 1.7 format. Intermediate multiplication yields a 2.14 format, and this result is added without saturation to an accumulator in 18.14 format. With 17 guard bits in the accumulator, there is no risk of overflow, and the full precision of the intermediate multiplication is preserved. Finally, the return result is in 18.14 format.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] sum of the squares value returned here

Returns none

## Root mean square (RMS)

```
void riscv_rms_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
void riscv_rms_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
void riscv_rms_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

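Calculates the Root Mean Square of the elements in the input vector. The following algorithm is used:

$$\text{Result} = \sqrt{\left(\text{pSrc}[0]^2 + \text{pSrc}[1]^2 + \dots + \text{pSrc}[\text{blockSize}-1]^2\right) / \text{blockSize}}$$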

There are separate functions for floating point, Q31, and Q15 data types.

### **Functions**

void **riscv\_rms\_f32** (**const** float32\_t \*pSrc, uint32\_t blockSize, float32\_t \*pResult) Root Mean Square of the elements of a floating-point vector.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] root mean square value returned here

## Returns none

```
void riscv_rms_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
```

Root Mean Square of the elements of a Q15 vector.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. The input is represented in 1.15 format. Intermediate multiplication yields a 2.30 format, and this result is added without saturation to a 64-bit accumulator in 34.30 format. With 33 guard bits in the accumulator, there is no risk of overflow, and the full precision of the intermediate multiplication is preserved. Finally, the 34.30 result is truncated to 34.15 format by discarding the lower 15 bits, and then saturated to yield a result in 1.15 format.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] root mean square value returned here

Returns none

```
void riscv_rms_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Root Mean Square of the elements of a Q31 vector.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The input is represented in 1.31 format, and intermediate multiplication yields a 2.62 format. The accumulator maintains full precision of the intermediate multiplication results, but provides only a single guard bit. There is no saturation on intermediate additions. If the accumulator overflows, it wraps around and distorts the result. In order to avoid overflows completely, the input signal must be scaled down by log2(blockSize) bits, as a total of blockSize additions are performed internally. Finally, the 2.62 accumulator is right shifted by 31 bits to yield a 1.31 format value.

### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] root mean square value returned here

Returns none

#### Standard deviation

```
void riscv_std_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
void riscv_std_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
void riscv_std_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Calculates the standard deviation of the elements in the input vector.

The floating-point implementation relies on riscv\_var\_f32, which uses a two-pass algorithm to avoid numerical instability and cancellation errors.

The fixed-point versions use the standard textbook algorithm, since fixed-point numerical behavior differs from that of floating point.

Algorithm for fixed point versions is summarized below:

There are separate functions for floating point, Q31, and Q15 data types.

#### **Functions**

void **riscv\_std\_f32** (**const** float32\_t \**pSrc*, uint32\_t *blockSize*, float32\_t \**pResult*) Standard deviation of the elements of a floating-point vector.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] standard deviation value returned here

Returns none

```
void riscv_std_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
```

Standard deviation of the elements of a Q15 vector.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. The input is represented in 1.15 format. Intermediate multiplication yields a 2.30 format, and this result is added without saturation to a 64-bit accumulator in 34.30 format. With 33 guard bits in the accumulator, there is no risk of overflow, and the full precision of the intermediate multiplication is preserved. Finally, the 34.30 result is truncated to 34.15 format by discarding the lower 15 bits, and then saturated to yield a result in 1.15 format.

## **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] standard deviation value returned here

Returns none

```
void riscv_std_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Standard deviation of the elements of a Q31 vector.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The input is represented in 1.31 format, which is then downshifted by 8 bits to yield 1.23 format, and intermediate multiplication yields a 2.46 format. The accumulator maintains full precision of the intermediate multiplication results, but provides only 16 guard bits. There is no saturation on intermediate additions. If the accumulator overflows, it wraps around and distorts the result. In order to avoid overflows completely, the input signal must be scaled down by log2(blockSize)-8 bits, as a total of blockSize additions are performed internally. After division, internal variables should be in 18.46 format. Finally, the 18.46 accumulator is right shifted by 15 bits to yield a 1.31 format value.

#### **Parameters**

- pSrc [in] points to the input vector.
- blockSize [in] number of samples in input vector.
- pResult [out] standard deviation value returned here.

Returns none

#### **Variance**

```
void riscv_var_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
void riscv_var_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
void riscv_var_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Calculates the variance of the elements in the input vector. The underlying algorithm used is the direct method sometimes referred to as the two-pass method:
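A minimal floating-point sketch of the two-pass method is given below: the first pass computes the mean, the second accumulates squared deviations from it. The normalization by blockSize - 1 (the unbiased estimator) and the return value of 0 for blockSize <= 1 are assumptions of this sketch, and `var_two_pass_sketch` is a hypothetical name.

```c
#include <stdint.h>

/* Two-pass variance: mean first, then the sum of squared deviations.
   Divides by (blockSize - 1); returns 0 when blockSize <= 1. */
static float var_two_pass_sketch(const float *pSrc, uint32_t blockSize)
{
    if (blockSize <= 1U) {
        return 0.0f;
    }

    float sum = 0.0f;
    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrc[i];
    }
    float mean = sum / (float)blockSize;

    float acc = 0.0f;
    for (uint32_t i = 0; i < blockSize; i++) {
        float d = pSrc[i] - mean;
        acc += d * d;
    }
    return acc / (float)(blockSize - 1U);
}
```

Subtracting the mean before squaring keeps the accumulated terms small, which is why this form suffers less from cancellation than the single-pass sum-of-squares formula.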

There are separate functions for floating point, Q31, and Q15 data types.

## **Functions**

```
void riscv_var_f32 (const float32_t *pSrc, uint32_t blockSize, float32_t *pResult)
```

Variance of the elements of a floating-point vector.

### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] variance value returned here

## Returns none

```
void riscv_var_q15 (const q15_t *pSrc, uint32_t blockSize, q15_t *pResult)
```

Variance of the elements of a Q15 vector.

**Scaling and Overflow Behavior** The function is implemented using a 64-bit internal accumulator. The input is represented in 1.15 format. Intermediate multiplication yields a 2.30 format, and this result is added without saturation to a 64-bit accumulator in 34.30 format. With 33 guard bits in the accumulator, there is no risk of overflow, and the full precision of the intermediate multiplication is preserved. Finally, the 34.30 result is truncated to 34.15 format by discarding the lower 15 bits, and then saturated to yield a result in 1.15 format.

### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] variance value returned here

Returns none

```
void riscv_var_q31 (const q31_t *pSrc, uint32_t blockSize, q31_t *pResult)
```

Variance of the elements of a Q31 vector.

**Scaling and Overflow Behavior** The function is implemented using an internal 64-bit accumulator. The input is represented in 1.31 format, which is then downshifted by 8 bits to yield 1.23 format, and intermediate multiplication yields a 2.46 format. The accumulator maintains full precision of the intermediate multiplication results, and as a consequence has only 16 guard bits. There is no saturation on intermediate additions. If the accumulator overflows, it wraps around and distorts the result. In order to avoid overflows completely, the input signal must be scaled down by log2(blockSize)-8 bits, as a total of blockSize additions are performed internally. After division, internal variables should be in 18.46 format. Finally, the 18.46 accumulator is right shifted by 15 bits to yield a 1.31 format value.

#### **Parameters**

- pSrc [in] points to the input vector
- blockSize [in] number of samples in input vector
- pResult [out] variance value returned here

Returns none

group groupStats

# 3.3.9 Support Functions

## **Vector Copy**

```
void riscv_copy_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
void riscv_copy_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
void riscv_copy_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
void riscv_copy_q7 (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)
group copy
```

Copies sample by sample from source vector to destination vector.

There are separate functions for floating point, Q31, Q15, and Q7 data types.

#### **Functions**

```
void riscv_copy_f32 (const float32_t *pSrc, float32_t *pDst, uint32_t blockSize)
Copies the elements of a floating-point vector.
```

#### **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_copy_q15 (const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
Copies the elements of a Q15 vector.
```

#### **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_copy_q31 (const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
Copies the elements of a Q31 vector.
```

#### **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_copy_q7 (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)
Copies the elements of a Q7 vector.
```

#### **Parameters**

- pSrc [in] points to input vector
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none
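As a rough illustration of the copy behavior described above, the following plain-C sketch copies a Q15 vector sample by sample. `ref_copy_q15` is a hypothetical reference helper and not the library implementation; the local `q15_t` typedef is an assumption that stands in for the library's 16-bit fixed-point type.

```c
#include <stdint.h>

typedef int16_t q15_t; /* assumption: stand-in for the library's Q15 type */

/* Hypothetical reference helper: copies blockSize samples from pSrc to pDst. */
static void ref_copy_q15(const q15_t *pSrc, q15_t *pDst, uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        pDst[n] = pSrc[n];
    }
}
```

The source and destination buffers are assumed not to overlap in this sketch.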

## **Vector Fill**

```
void riscv_fill_f32 (float32_t value, float32_t *pDst, uint32_t blockSize)
void riscv_fill_q15 (q15_t value, q15_t *pDst, uint32_t blockSize)
void riscv_fill_q31 (q31_t value, q31_t *pDst, uint32_t blockSize)
void riscv_fill_q7 (q7_t value, q7_t *pDst, uint32_t blockSize)
group Fill
```

Fills the destination vector with a constant value.

There are separate functions for floating point, Q31, Q15, and Q7 data types.

#### **Functions**

void **riscv\_fill\_f32** (float32\_t *value*, float32\_t \**pDst*, uint32\_t *blockSize*) Fills a constant value into a floating-point vector.

#### **Parameters**

- value [in] input value to be filled
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none

void **riscv\_fill\_q15** (q15\_t *value*, q15\_t \**pDst*, uint32\_t *blockSize*) Fills a constant value into a Q15 vector.

#### **Parameters**

- value [in] input value to be filled
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none

void **riscv\_fill\_q31** (q31\_t *value*, q31\_t \**pDst*, uint32\_t *blockSize*) Fills a constant value into a Q31 vector.

#### **Parameters**

- value [in] input value to be filled
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none

void **riscv\_fill\_q7** (q7\_t *value*, q7\_t \**pDst*, uint32\_t *blockSize*) Fills a constant value into a Q7 vector.

### **Parameters**

- value [in] input value to be filled
- pDst [out] points to output vector
- blockSize [in] number of samples in each vector

Returns none
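The fill operation can be sketched in plain C as follows. `ref_fill_f32` is a hypothetical reference helper, not the library code, and the local `float32_t` typedef is an assumption that mirrors the library's single-precision type.

```c
#include <stdint.h>

typedef float float32_t; /* assumption: stand-in for the library's float type */

/* Hypothetical reference helper: writes `value` into every element of pDst. */
static void ref_fill_f32(float32_t value, float32_t *pDst, uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        pDst[n] = value;
    }
}
```

A typical use is clearing a work buffer before zero padding, for example `ref_fill_f32(0.0f, buffer, length)`.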

## **Convert 32-bit floating point value**

```
void riscv_float_to_q15 (const float32_t *pSrc, q15_t *pDst, uint32_t blockSize)
void riscv_float_to_q31 (const float32_t *pSrc, q31_t *pDst, uint32_t blockSize)
void riscv_float_to_q7 (const float32_t *pSrc, q7_t *pDst, uint32_t blockSize)
group float_to_x
```

#### **Functions**

void **riscv\_float\_to\_q15** (**const** float32\_t \**pSrc*, q15\_t \**pDst*, uint32\_t *blockSize*) Converts the elements of the floating-point vector to Q15 vector.

**Details** The equation used for the conversion process is `pDst[n] = (q15_t)(pSrc[n] * 32768)` for `0 <= n < blockSize`.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.

**Note:** In order to apply rounding, the library should be rebuilt with the ROUNDING macro defined in the preprocessor section of project options.

#### **Parameters**

- pSrc [in] points to the floating-point input vector
- pDst [out] points to the Q15 output vector
- blockSize [in] number of samples in each vector

Returns none
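The saturating conversion described above can be sketched in plain C. This is a minimal illustration that assumes truncation toward zero (that is, the ROUNDING macro is not defined); `ref_float_to_q15` and the local typedefs are hypothetical stand-ins, not the library implementation.

```c
#include <stdint.h>

typedef float   float32_t; /* assumption: stand-in for the library's float type */
typedef int16_t q15_t;     /* assumption: stand-in for the library's Q15 type   */

/* Hypothetical reference helper: scale by 2^15, truncate toward zero,
 * and saturate to the Q15 range [-32768, 32767]. */
static void ref_float_to_q15(const float32_t *pSrc, q15_t *pDst, uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        float32_t scaled = pSrc[n] * 32768.0f;

        if (scaled >= 32767.0f) {
            pDst[n] = INT16_MAX;      /* positive saturation (0x7FFF) */
        } else if (scaled <= -32768.0f) {
            pDst[n] = INT16_MIN;      /* negative saturation (0x8000) */
        } else {
            pDst[n] = (q15_t)scaled;  /* in range: truncate, no rounding */
        }
    }
}
```

Note that an input of exactly 1.0 saturates to 0x7FFF, because +1.0 is not representable in 1.15 format.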

```
void riscv_float_to_q31 (const float32_t *pSrc, q31_t *pDst, uint32_t blockSize)
Converts the elements of the floating-point vector to Q31 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q31_t)(pSrc[n] * 2147483648)` for `0 <= n < blockSize`.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.

**Note:** In order to apply rounding, the library should be rebuilt with the ROUNDING macro defined in the preprocessor section of project options.

#### **Parameters**

- pSrc [in] points to the floating-point input vector
- pDst [out] points to the Q31 output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_float_to_q7 (const float32_t *pSrc, q7_t *pDst, uint32_t blockSize)
Converts the elements of the floating-point vector to Q7 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q7_t)(pSrc[n] * 128)` for `0 <= n < blockSize`.

**Scaling and Overflow Behavior** The function uses saturating arithmetic. Results outside of the allowable Q7 range [0x80 0x7F] are saturated.

**Note:** In order to apply rounding, the library should be rebuilt with the ROUNDING macro defined in the preprocessor section of project options.

#### **Parameters**

- pSrc [in] points to the floating-point input vector
- pDst [out] points to the Q7 output vector
- blockSize [in] number of samples in each vector

Returns none

## **Convert 16-bit Integer value**

```
void riscv_q15_to_float (const q15_t *pSrc, float32_t *pDst, uint32_t blockSize)
void riscv_q15_to_q31 (const q15_t *pSrc, q31_t *pDst, uint32_t blockSize)
void riscv_q15_to_q7 (const q15_t *pSrc, q7_t *pDst, uint32_t blockSize)
group q15_to_x
```

#### **Functions**

```
void riscv_q15_to_float (const q15_t *pSrc, float32_t *pDst, uint32_t blockSize)
Converts the elements of the Q15 vector to floating-point vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (float32_t) pSrc[n] / 32768` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q15 input vector
- pDst [out] points to the floating-point output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_q15_to_q31 (const q15_t *pSrc, q31_t *pDst, uint32_t blockSize)
Converts the elements of the Q15 vector to Q31 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q31_t) pSrc[n] << 16` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q15 input vector
- pDst [out] points to the Q31 output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_q15_to_q7 (const q15_t *pSrc, q7_t *pDst, uint32_t blockSize)
Converts the elements of the Q15 vector to Q7 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q7_t)(pSrc[n] >> 8)` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q15 input vector
- pDst [out] points to the Q7 output vector
- blockSize [in] number of samples in each vector

Returns none

## **Convert 32-bit Integer value**

```
void riscv_q31_to_float (const q31_t *pSrc, float32_t *pDst, uint32_t blockSize)
void riscv_q31_to_q15 (const q31_t *pSrc, q15_t *pDst, uint32_t blockSize)
void riscv_q31_to_q7 (const q31_t *pSrc, q7_t *pDst, uint32_t blockSize)
group q31_to_x
```

#### **Functions**

```
void riscv_q31_to_float (const q31_t *pSrc, float32_t *pDst, uint32_t blockSize)
Converts the elements of the Q31 vector to floating-point vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (float32_t) pSrc[n] / 2147483648` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q31 input vector
- pDst [out] points to the floating-point output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_q31_to_q15 (const q31_t *pSrc, q15_t *pDst, uint32_t blockSize)
Converts the elements of the Q31 vector to Q15 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q15_t)(pSrc[n] >> 16)` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q31 input vector
- pDst [out] points to the Q15 output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_q31_to_q7 (const q31_t *pSrc, q7_t *pDst, uint32_t blockSize)
Converts the elements of the Q31 vector to Q7 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q7_t)(pSrc[n] >> 24)` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q31 input vector
- pDst [out] points to the Q7 output vector
- blockSize [in] number of samples in each vector

Returns none

## **Convert 8-bit Integer value**

```
void riscv_q7_to_float (const q7_t *pSrc, float32_t *pDst, uint32_t blockSize)
void riscv_q7_to_q15 (const q7_t *pSrc, q15_t *pDst, uint32_t blockSize)
void riscv_q7_to_q31 (const q7_t *pSrc, q31_t *pDst, uint32_t blockSize)
group q7_to_x
```

#### **Functions**

```
void riscv_q7_to_float (const q7_t *pSrc, float32_t *pDst, uint32_t blockSize)
Converts the elements of the Q7 vector to floating-point vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (float32_t) pSrc[n] / 128` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q7 input vector
- pDst [out] points to the floating-point output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_q7_to_q15 (const q7_t *pSrc, q15_t *pDst, uint32_t blockSize)
Converts the elements of the Q7 vector to Q15 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q15_t) pSrc[n] << 8` for `0 <= n < blockSize`.

### **Parameters**

- pSrc [in] points to the Q7 input vector
- pDst [out] points to the Q15 output vector
- blockSize [in] number of samples in each vector

Returns none

```
void riscv_q7_to_q31 (const q7_t *pSrc, q31_t *pDst, uint32_t blockSize)
Converts the elements of the Q7 vector to Q31 vector.
```

**Details** The equation used for the conversion process is `pDst[n] = (q31_t) pSrc[n] << 24` for `0 <= n < blockSize`.

#### **Parameters**

- pSrc [in] points to the Q7 input vector
- pDst [out] points to the Q31 output vector
- blockSize [in] number of samples in each vector

Returns none
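Conversions between the fixed-point formats amount to moving the binary point by a fixed number of bits. The sketch below illustrates this for single values; the `ref_*` helpers and local typedefs are hypothetical stand-ins rather than library code. Widening is written as a multiplication by a power of two so that negative inputs stay well defined in C, and narrowing assumes the compiler implements `>>` on negative values as an arithmetic shift, which is the common behavior.

```c
#include <stdint.h>

typedef int8_t  q7_t;  /* assumption: stand-in for the library's Q7 type  */
typedef int16_t q15_t; /* assumption: stand-in for the library's Q15 type */
typedef int32_t q31_t; /* assumption: stand-in for the library's Q31 type */

/* Widening: move the value up by 8 or 24 bits (same effect as a left shift). */
static q15_t ref_q7_to_q15(q7_t x) { return (q15_t)(x * 256); }
static q31_t ref_q7_to_q31(q7_t x) { return (q31_t)x * 16777216; }

/* Narrowing: keep only the most significant 8 bits (low bits are discarded). */
static q7_t ref_q15_to_q7(q15_t x) { return (q7_t)(x >> 8); }
static q7_t ref_q31_to_q7(q31_t x) { return (q7_t)(x >> 24); }
```

Widening is exact, while narrowing truncates the low-order bits, so a round trip from Q7 to Q31 and back returns the original value but the reverse round trip generally does not.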

group groupSupport

# 3.3.10 Interpolation Functions

## **Linear Interpolation**

```
float32_t riscv_linear_interp_f32 (riscv_linear_interp_instance_f32 *S, float32_t x)
q31_t riscv_linear_interp_q31 (q31_t *pYData, q31_t x, uint32_t nValues)
q15_t riscv_linear_interp_q15 (q15_t *pYData, q31_t x, uint32_t nValues)
q7_t riscv_linear_interp_q7 (q7_t *pYData, q31_t x, uint32_t nValues)
```

group LinearInterpolate

Linear interpolation is a method of curve fitting using linear polynomials. It works by effectively drawing a straight line between two neighboring samples and returning the appropriate point along that line.

A linear interpolation function calculates an output value y for the input x using linear interpolation between the nearest input values x0 and x1 and the corresponding output values y0 and y1.

### Algorithm:

$$y = y_0 + (x - x_0) \cdot \frac{y_1 - y_0}{x_1 - x_0}$$

where x0 and x1 are the nearest values of the input x, and y0 and y1 are the corresponding output values.

This set of functions implements the linear interpolation process for Q7, Q15, Q31, and floating-point data types. The functions operate on a single sample of data and each call returns a single processed value. For the floating-point function, S points to an instance of the linear interpolation data structure and x is the input sample value; the fixed-point functions take the table pointer pYData and the table length nValues directly. Each function returns the interpolated output value.

If x is outside of the table boundary, linear interpolation returns the first value of the table when x is below the input range and the last value of the table when x is above the range.
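The interpolation and boundary behavior described above can be sketched in plain C. This is a minimal illustration under stated assumptions: `ref_lin_table_t` and `ref_linear_interp` are hypothetical names, the table is assumed to be uniformly spaced with a positive spacing, and the structure layout is not claimed to match the library's instance structure.

```c
#include <stdint.h>

/* Hypothetical table description for this sketch. */
typedef struct {
    uint32_t     nValues;  /* number of table entries (at least 1)   */
    float        xStart;   /* x coordinate of the first table entry  */
    float        xSpacing; /* distance between neighboring x values  */
    const float *pYData;   /* table of y values                      */
} ref_lin_table_t;

static float ref_linear_interp(const ref_lin_table_t *t, float x)
{
    /* Position of x measured in table steps from the first entry. */
    float pos = (x - t->xStart) / t->xSpacing;

    if (pos < 0.0f) {
        return t->pYData[0];                   /* below range: first value */
    }
    if (pos >= (float)(t->nValues - 1u)) {
        return t->pYData[t->nValues - 1u];     /* above range: last value  */
    }

    uint32_t i  = (uint32_t)pos;               /* index of left neighbor   */
    float    x0 = t->xStart + (float)i * t->xSpacing;
    float    x1 = x0 + t->xSpacing;
    float    y0 = t->pYData[i];
    float    y1 = t->pYData[i + 1u];

    return y0 + (x - x0) * ((y1 - y0) / (x1 - x0));
}
```

For a table `{0, 10, 20, 40}` starting at x = 0 with unit spacing, an input halfway between the last two entries lands halfway between 20 and 40.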

#### **Functions**

float32\_t **riscv\_linear\_interp\_f32** (riscv\_linear\_interp\_instance\_f32 \**S*, float32\_t *x*) Process function for the floating-point Linear Interpolation Function.

#### **Parameters**

- S [inout] is an instance of the floating-point Linear Interpolation structure
- x [in] input sample to process

**Returns** y processed output sample.

q31\_t riscv\_linear\_interp\_q31 (q31\_t \*pYData, q31\_t x, uint32\_t nValues)
Process function for the Q31 Linear Interpolation Function.

Input sample x is in 12.20 format, which contains 12 bits for the table index and 20 bits for the fractional part. This function supports a maximum table size of $2^{12}$.

#### **Parameters**

- pYData [in] pointer to Q31 Linear Interpolation table
- x [in] input sample to process
- nValues [in] number of table values

**Returns** y processed output sample.
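To make the 12.20 input format concrete, the sketch below splits x into its table index and fractional part and interpolates between two neighboring Q31 table entries. It is an illustration only: `ref_interp_12_20` is a hypothetical helper, the table is assumed to hold at least one value, and the rounding of the library implementation may differ from the simple integer division used here.

```c
#include <stdint.h>

typedef int32_t q31_t; /* assumption: stand-in for the library's Q31 type */

static q31_t ref_interp_12_20(const q31_t *pYData, q31_t x, uint32_t nValues)
{
    if (x < 0) {
        return pYData[0];                      /* below range: first value */
    }

    uint32_t index = (uint32_t)x >> 20;        /* upper 12 bits: table index   */
    uint32_t fract = (uint32_t)x & 0xFFFFFu;   /* lower 20 bits: fraction      */

    if (index >= nValues - 1u) {
        return pYData[nValues - 1u];           /* above range: last value  */
    }

    int64_t y0 = pYData[index];
    int64_t y1 = pYData[index + 1u];

    /* fract / 2^20 is the distance from y0 toward y1, in the range [0, 1). */
    return (q31_t)(y0 + ((y1 - y0) * (int64_t)fract) / 1048576);
}
```

For example, `x = 0x00140000` encodes index 1 with a fraction of 0.25, so the result lies a quarter of the way from the second table entry to the third.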

q15\_t riscv\_linear\_interp\_q15 (q15\_t \*pYData, q31\_t x, uint32\_t nValues)
Process function for the Q15 Linear Interpolation Function.

Input sample x is in 12.20 format, which contains 12 bits for the table index and 20 bits for the fractional part. This function supports a maximum table size of $2^{12}$.

#### **Parameters**

- pYData [in] pointer to Q15 Linear Interpolation table
- x [in] input sample to process
- nValues [in] number of table values

**Returns** y processed output sample.

q7\_t riscv\_linear\_interp\_q7 (q7\_t \*pYData, q31\_t x, uint32\_t nValues)
Process function for the Q7 Linear Interpolation Function.

Input sample x is in 12.20 format, which contains 12 bits for the table index and 20 bits for the fractional part. This function supports a maximum table size of $2^{12}$.

#### **Parameters**

- pYData [in] pointer to Q7 Linear Interpolation table
- x [in] input sample to process
- nValues [in] number of table values

**Returns** y processed output sample.

## **Bilinear Interpolation**

```
float32_t riscv_bilinear_interp_f32 (const riscv_bilinear_interp_instance_f32 *S, float32_t X, float32_t Y)

q31_t riscv_bilinear_interp_q31 (riscv_bilinear_interp_instance_q31 *S, q31_t X, q31_t Y)

q15_t riscv_bilinear_interp_q15 (riscv_bilinear_interp_instance_q15 *S, q31_t X, q31_t Y)

q7_t riscv_bilinear_interp_q7 (riscv_bilinear_interp_instance_q7 *S, q31_t X, q31_t Y)

group BilinearInterpolate
```

Bilinear interpolation is an extension of linear interpolation applied to a two dimensional grid. The underlying function f(x, y) is sampled on a regular grid and the interpolation process determines values between the grid points. Bilinear interpolation is equivalent to two step linear interpolation, first in the x-dimension and then in the y-dimension. Bilinear interpolation is often used in image processing to rescale images. The NMSIS DSP library provides bilinear interpolation functions for Q7, Q15, Q31, and floating-point data types.

## Algorithm

The instance structure used by the bilinear interpolation functions describes a two dimensional data table. For floating-point, the instance structure riscv\_bilinear\_interp\_instance\_f32 has three members: numRows specifies the number of rows in the table; numCols specifies the number of columns in the table; and pData points to an array of numRows\*numCols values. The data table pData is organized in row order and the supplied data values fall on integer indexes. That is, table element (x,y) is located at pData[x + y\*numCols] where x and y are integers.

Let (x, y) specify the desired interpolation point. Then define:

$$XF = \lfloor x \rfloor, \qquad YF = \lfloor y \rfloor$$

The interpolated output point is computed as:

$$f(x, y) = f(XF, YF)\,(1 - (x - XF))\,(1 - (y - YF)) + f(XF+1, YF)\,(x - XF)\,(1 - (y - YF)) + f(XF, YF+1)\,(1 - (x - XF))\,(y - YF) + f(XF+1, YF+1)\,(x - XF)\,(y - YF)$$

Note that the coordinates (x, y) contain integer and fractional components. The integer components specify which portion of the table to use, while the fractional components control the interpolation weights.

If (x, y) is outside of the table boundary, bilinear interpolation returns zero output.
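The row-order table layout and the four-corner weighting can be sketched in plain C as follows. This is a minimal illustration, not the library implementation: `ref_bilinear_table_t` and `ref_bilinear_interp` are hypothetical names, and the boundary test is one reasonable reading of "outside of the table boundary" (any point without a complete set of four neighbors returns zero).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table description for this sketch. */
typedef struct {
    uint16_t     numRows; /* number of rows in the table          */
    uint16_t     numCols; /* number of columns in the table       */
    const float *pData;   /* numRows*numCols values, row by row   */
} ref_bilinear_table_t;

static float ref_bilinear_interp(const ref_bilinear_table_t *t, float x, float y)
{
    /* Reject points that do not have four surrounding grid values. */
    if (!(x >= 0.0f && y >= 0.0f &&
          x < (float)(t->numCols - 1) && y < (float)(t->numRows - 1))) {
        return 0.0f;
    }

    uint32_t xf = (uint32_t)x;       /* integer part of x (column)  */
    uint32_t yf = (uint32_t)y;       /* integer part of y (row)     */
    float    dx = x - (float)xf;     /* fractional part of x        */
    float    dy = y - (float)yf;     /* fractional part of y        */

    /* Element (x, y) is stored at pData[x + y*numCols]. */
    const float *row0 = t->pData + (size_t)yf * t->numCols + xf;
    const float *row1 = row0 + t->numCols;

    return row0[0] * (1.0f - dx) * (1.0f - dy)
         + row0[1] * dx          * (1.0f - dy)
         + row1[0] * (1.0f - dx) * dy
         + row1[1] * dx          * dy;
}
```

At the center of a grid cell all four corners receive a weight of 0.25, so the result is the average of the four surrounding table values.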

#### **Functions**

```
float32_t riscv_bilinear_interp_f32 (const riscv_bilinear_interp_instance_f32 *S, float32_t X, float32_t Y)
```

Floating-point bilinear interpolation.

### **Parameters**

- S [inout] points to an instance of the interpolation structure.
- X [in] interpolation coordinate.
- Y [in] interpolation coordinate.

**Returns** out interpolated value.

q31\_t riscv\_bilinear\_interp\_q31 (riscv\_bilinear\_interp\_instance\_q31 \*S, q31\_t X, q31\_t Y) Q31 bilinear interpolation.

#### **Parameters**

- S [inout] points to an instance of the interpolation structure.
- X [in] interpolation coordinate in 12.20 format.
- Y [in] interpolation coordinate in 12.20 format.

Returns out interpolated value.

q15\_t riscv\_bilinear\_interp\_q15 (riscv\_bilinear\_interp\_instance\_q15 \*S, q31\_t X, q31\_t Y) Q15 bilinear interpolation.

### **Parameters**

- S [inout] points to an instance of the interpolation structure.
- X [in] interpolation coordinate in 12.20 format.
- Y [in] interpolation coordinate in 12.20 format.

Returns out interpolated value.

q7\_t riscv\_bilinear\_interp\_q7 (riscv\_bilinear\_interp\_instance\_q7 \*S, q31\_t X, q31\_t Y) Q7 bilinear interpolation.

#### **Parameters**

- S [inout] points to an instance of the interpolation structure.
- X [in] interpolation coordinate in 12.20 format.
- Y [in] interpolation coordinate in 12.20 format.

Returns out interpolated value.

group groupInterpolation

These functions perform 1- and 2-dimensional interpolation of data. Linear interpolation is used for 1-dimensional data and bilinear interpolation is used for 2-dimensional data.

# 3.3.11 Examples

## **Class Marks Example**

group ClassMarks

**Refer** riscv\_class\_marks\_example\_f32.c

Note: This example also demonstrates the usage of static initialization.

#### **Description:**

Demonstrates the use of the Maximum, Minimum, Mean, Standard Deviation, Variance and Matrix functions to calculate statistical values of marks obtained in a class.

## **Variables Description:**

- testMarks\_f32 points to the marks scored by 20 students in 4 subjects
- max\_marks Maximum of all marks
- min\_marks Minimum of all marks
- mean Mean of all marks
- var Variance of the marks
- std Standard deviation of the marks
- numStudents Total number of students in the class

## **NMSIS DSP Software Library Functions Used:**

- riscv\_mat\_init\_f32()
- riscv\_mat\_mult\_f32()
- riscv\_max\_f32()
- riscv\_min\_f32()
- riscv\_mean\_f32()
- riscv\_std\_f32()
- riscv\_var\_f32()

## **Convolution Example**

group ConvolutionExample

**Refer** riscv\_convolution\_example\_f32.c

## **Description:**

Demonstrates the convolution theorem with the use of the Complex FFT, Complex-by-Complex Multiplication, and Support Functions.

## Algorithm:

The convolution theorem states that convolution in the time domain corresponds to multiplication in the frequency domain. Therefore, the Fourier transform of the convolution of two signals is equal to the product of their individual Fourier transforms. The Fourier transform of a signal can be evaluated efficiently using the Fast Fourier Transform (FFT).

Two input signals, a[n] and b[n], with lengths n1 and n2 respectively, are zero padded so that their lengths become N, which is greater than or equal to (n1+n2-1) and is a power of 4 because the FFT implementation is radix-4. The convolution of a[n] and b[n] is obtained by taking the FFT of the input signals, multiplying the Fourier transforms of the two signals, and taking the inverse FFT of the multiplied result.

This is denoted by the following equations:

$$A[k] = \mathrm{FFT}(a[n], N), \qquad B[k] = \mathrm{FFT}(b[n], N)$$

$$\mathrm{conv}(a[n], b[n]) = \mathrm{IFFT}(A[k] \cdot B[k], N)$$

where A[k] and B[k] are the N-point FFTs of the signals a[n] and b[n] respectively. The length of the convolved signal is (n1+n2-1).
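The padded length N follows directly from the two constraints stated above: it must be at least n1+n2-1 and it must be a power of 4. A small helper can compute it; `ref_conv_fft_length` is a hypothetical name used only for this sketch, and both input lengths are assumed to be at least 1.

```c
#include <stdint.h>

/* Returns the smallest power of 4 that is >= n1 + n2 - 1.
 * Assumes n1 >= 1 and n2 >= 1. The loop stops early if another
 * multiplication by 4 would overflow a 32-bit value. */
static uint32_t ref_conv_fft_length(uint32_t n1, uint32_t n2)
{
    uint32_t needed = n1 + n2 - 1u; /* length of the linear convolution */
    uint32_t N = 1u;

    while (N < needed && N <= (UINT32_MAX / 4u)) {
        N *= 4u;
    }
    return N;
}
```

For two 64-sample inputs the convolution has 127 samples, so under these constraints both signals would be zero padded to 256 points before the FFT.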

## **Block Diagram:**

## **Variables Description:**

- testInputA\_f32 points to the first input sequence
- srcALen length of the first input sequence
- testInputB\_f32 points to the second input sequence
- srcBLen length of the second input sequence
- outLen length of convolution output sequence, (srcALen + srcBLen - 1)
- AxB points to the output array where the product of individual FFTs of inputs is stored.

## **NMSIS DSP Software Library Functions Used:**

- riscv\_fill\_f32()
- riscv\_copy\_f32()
- riscv\_cfft\_radix4\_init\_f32()
- riscv\_cfft\_radix4\_f32()
- riscv\_cmplx\_mult\_cmplx\_f32()

## **Dot Product Example**

group DotproductExample

**Refer** riscv\_dotproduct\_example\_f32.c

## **Description:**

Demonstrates the use of the Multiply and Add functions to perform the dot product. The dot product of two vectors is obtained by multiplying corresponding elements and summing the products.

## Algorithm:

The two input vectors A and B with length n, are multiplied element-by-element and then added to obtain dot product.

This is denoted by the following equation:

$$\mathrm{dotProduct} = A[0] \cdot B[0] + A[1] \cdot B[1] + \dots + A[n-1] \cdot B[n-1]$$
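The multiply-then-add structure of the dot product can be sketched in plain C as follows. `ref_dot_product` is a hypothetical reference helper; it does not call the library's multiply and add functions but mirrors the same two steps for each element.

```c
#include <stdint.h>

/* Hypothetical reference helper: multiplies corresponding elements of
 * pA and pB and accumulates the products into a single result. */
static float ref_dot_product(const float *pA, const float *pB, uint32_t n)
{
    float result = 0.0f;

    for (uint32_t i = 0; i < n; i++) {
        float product = pA[i] * pB[i]; /* multiply step   */
        result += product;             /* accumulate step */
    }
    return result;
}
```

For A = {1, 2, 3} and B = {4, 5, 6} the products are 4, 10 and 18, which sum to 32.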

## **Block Diagram:**

## **Variables Description:**

- srcA\_buf\_f32 points to first input vector
- srcB\_buf\_f32 points to second input vector
- testOutput stores dot product of the two input vectors.

## **NMSIS DSP Software Library Functions Used:**

- riscv\_mult\_f32()
- riscv\_add\_f32()

## **Frequency Bin Example**

group FrequencyBin

**Refer** riscv\_fft\_bin\_example\_f32.c

## **Description:**

Demonstrates the calculation of the maximum energy bin in the frequency domain of the input signal with the use of Complex FFT, Complex Magnitude, and Maximum functions.

## Algorithm:

The input test signal contains a 10 kHz signal with uniformly distributed white noise. Calculating the FFT of the input signal will give us the maximum energy of the bin corresponding to the input frequency of 10 kHz.
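Finding the maximum energy bin amounts to computing the energy of every complex FFT output and taking the index of the largest one. The sketch below assumes the FFT output is stored as interleaved real and imaginary parts; `ref_max_energy_bin` is a hypothetical helper that compares squared magnitudes, which selects the same bin as comparing magnitudes while avoiding a square root.

```c
#include <stdint.h>

/* pCmplx holds numBins complex values as {re0, im0, re1, im1, ...}.
 * Returns the index of the bin with the largest energy (re^2 + im^2).
 * On ties the lowest index wins. */
static uint32_t ref_max_energy_bin(const float *pCmplx, uint32_t numBins)
{
    uint32_t bestIndex  = 0u;
    float    bestEnergy = -1.0f; /* energies are never negative */

    for (uint32_t k = 0; k < numBins; k++) {
        float re     = pCmplx[2u * k];
        float im     = pCmplx[2u * k + 1u];
        float energy = re * re + im * im;

        if (energy > bestEnergy) {
            bestEnergy = energy;
            bestIndex  = k;
        }
    }
    return bestIndex;
}
```

In the example, the returned index would then be compared against a known reference index for the 10 kHz component.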



## **Block Diagram:**

The first figure shows the time domain representation of the 10 kHz signal with uniformly distributed white noise, and the second figure shows the input in the frequency domain. The bin with maximum energy corresponds to the 10 kHz signal.

## **Variables Description:**

- testInput\_f32\_10khz points to the input data
- testOutput points to the output data
- fftSize length of FFT
- ifftFlag flag for the selection of CFFT/CIFFT
- doBitReverse Flag for selection of normal order or bit reversed order
- refIndex reference index value at which maximum energy of bin occurs
- testIndex calculated index value at which maximum energy of bin occurs

## **NMSIS DSP Software Library Functions Used:**

- riscv\_cfft\_f32()
- riscv\_cmplx\_mag\_f32()
- riscv\_max\_f32()

## **FIR Lowpass Filter Example**

group FIRLPF

**Refer** riscv\_fir\_example\_f32.c

## **Description:**

Removes high frequency signal components from the input using an FIR lowpass filter. The example demonstrates how to configure an FIR filter and then pass data through it in a block-by-block fashion.
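The idea of block-by-block processing is that the filter keeps a short history of past input samples between calls, so that consecutive blocks produce the same output as one long pass. The plain-C sketch below illustrates this under stated assumptions: `ref_fir_t` and `ref_fir_block` are hypothetical names, the coefficients are stored in natural order h[0], h[1], ... (unlike the library, which expects time reversed order), and the state layout is chosen for clarity rather than to match the library's state buffer.

```c
#include <stdint.h>

/* Hypothetical filter instance for this sketch. */
typedef struct {
    uint16_t     numTaps; /* number of coefficients (at least 1)          */
    const float *pCoeffs; /* h[0], h[1], ..., h[numTaps-1]                */
    float       *pState;  /* last numTaps-1 input samples, oldest first   */
} ref_fir_t;

/* Processes one block: out[n] = sum over k of h[k] * x[n-k]. */
static void ref_fir_block(ref_fir_t *f, const float *in, float *out,
                          uint32_t blockSize)
{
    const uint32_t hist = (uint32_t)f->numTaps - 1u; /* history length */

    for (uint32_t n = 0; n < blockSize; n++) {
        float acc = 0.0f;
        for (uint32_t k = 0; k < f->numTaps; k++) {
            /* x[n-k] comes from this block or from the saved history. */
            float x = (k <= n) ? in[n - k] : f->pState[hist - (k - n)];
            acc += f->pCoeffs[k] * x;
        }
        out[n] = acc;
    }

    /* Keep the newest `hist` samples of [state | in] for the next call. */
    for (uint32_t j = 0; j < hist; j++) {
        uint32_t pos = blockSize + j;
        f->pState[j] = (pos < hist) ? f->pState[pos] : in[pos - hist];
    }
}
```

The state buffer must be zeroed before the first block so that the filter starts from silence.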



## Algorithm:

The input signal is a sum of two sine waves: 1 kHz and 15 kHz. This is processed by an FIR lowpass filter with cutoff frequency 6 kHz. The lowpass filter eliminates the 15 kHz signal leaving only the 1 kHz sine wave at the output.

The lowpass filter was designed using MATLAB with a sample rate of 48 kHz and a length of 29 points. The MATLAB code to generate the filter coefficients is `h = fir1(28, 6/24);`. The first argument is the "order" of the filter and is always one less than the desired length. The second argument is the normalized cutoff frequency, in the range 0 (DC) to 1.0 (Nyquist). A 6 kHz cutoff with a Nyquist frequency of 24 kHz lies at a normalized frequency of 6/24 = 0.25.

The NMSIS FIR filter function requires the coefficients to be in time reversed order. The resulting filter coefficients are shown below. Note that the filter is symmetric (a property of linear phase FIR filters) and the point of symmetry is sample 14. Thus the filter will have a delay of 14 samples for all frequencies.



The frequency response of the filter is shown next. The passband gain of the filter is 1.0 and it reaches 0.5 at the cutoff frequency 6 kHz.



The input signal is shown below. The left hand side shows the signal in the time domain while the right hand side is a frequency domain representation. The two sine wave components can be clearly seen.



The output of the filter is shown below. The 15 kHz component has been eliminated.



## **Variables Description:**

- testInput\_f32\_1kHz\_15kHz points to the input data
- refOutput points to the reference output data
- testOutput points to the test output data
- firStateF32 points to state buffer
- firCoeffs32 points to coefficient buffer
- blockSize number of samples processed at a time
- numBlocks number of frames

## **NMSIS DSP Software Library Functions Used:**

- riscv\_fir\_init\_f32()
- riscv\_fir\_f32()

## **Graphic Audio Equalizer Example**

group GEQ5Band

**Refer** riscv\_graphic\_equalizer\_example\_q31.c

Note: The output chirp signal follows the gain or boost of each band.

## **Description:**

This example demonstrates how a 5-band graphic equalizer can be constructed using the Biquad cascade functions. A graphic equalizer is used in audio applications to vary the tonal quality of the audio.

## **Block Diagram:**

The design is based on a cascade of 5 filter sections.

Each filter section is 4th order and consists of a cascade of two Biquads. Each filter has a nominal gain of 0 dB (1.0 in linear units) and boosts or cuts signals within a specific frequency range. The edge frequencies between the 5 bands are 100, 500, 2000, and 6000 Hz. Each band has an adjustable boost or cut in the range of +/- 9 dB. For example, the band that extends from 500 to 2000 Hz has the response shown below:



With 1 dB steps, each filter has a total of 19 different settings. The filter coefficients for all possible 19 settings were precomputed in MATLAB and stored in a table. With 5 different tables, there are a total of 5 x 19 = 95 different 4th order filters. All 95 responses are shown below:



Each 4th order filter has 10 coefficients for a grand total of 950 different filter coefficients that must be tabulated. The input and output data is in Q31 format. For better noise performance, the two low frequency bands are implemented using the high precision 32x64-bit Biquad filters. The remaining 3 high frequency bands use standard 32x32-bit Biquad filters. The input signal used in the example is a logarithmic chirp.
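The table sizes quoted above (5 bands, 19 gain settings per band, 10 coefficients per filter) suggest a simple way to locate one filter inside a flat coefficient array. The sketch below assumes a band-major layout in which all 19 settings of a band are stored contiguously, ordered from -9 dB to +9 dB; this layout and the name `ref_eq_coeff_offset` are assumptions made for illustration and are not taken from the example source.

```c
#include <stdint.h>

#define REF_NUM_BANDS          5   /* equalizer bands                     */
#define REF_GAIN_MIN_DB      (-9)  /* lowest gain setting in dB           */
#define REF_GAIN_MAX_DB        9   /* highest gain setting in dB          */
#define REF_SETTINGS_PER_BAND 19   /* -9 dB ... +9 dB in 1 dB steps       */
#define REF_COEFFS_PER_FILTER 10   /* two Biquads, 5 coefficients each    */

/* Returns the offset of the first coefficient for (band, gainDB) in a
 * flat, band-major table, or -1 if an argument is out of range. */
static int32_t ref_eq_coeff_offset(int32_t band, int32_t gainDB)
{
    if (band < 0 || band >= REF_NUM_BANDS ||
        gainDB < REF_GAIN_MIN_DB || gainDB > REF_GAIN_MAX_DB) {
        return -1;
    }
    return (band * REF_SETTINGS_PER_BAND + (gainDB - REF_GAIN_MIN_DB))
           * REF_COEFFS_PER_FILTER;
}
```

With this layout the last filter starts at offset 940 and ends at 950, which matches the total number of tabulated coefficients.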



The array bandGains specifies the gain in dB to apply in each band. For example, if bandGains={0, -3, 6, 4, -6}; then the output signal will be:



## **Variables Description:**

- testInput\_f32 points to the input data
- testRefOutput\_f32 points to the reference output data
- testOutput points to the test output data
- inputQ31 temporary input buffer
- outputQ31 temporary output buffer
- biquadStateBand1Q31 points to state buffer for band1
- biquadStateBand2Q31 points to state buffer for band2
- biquadStateBand3Q31 points to state buffer for band3
- biquadStateBand4Q31 points to state buffer for band4
- biquadStateBand5Q31 points to state buffer for band5
- coeffTable points to coefficient buffer for all bands
- gainDB gain buffer which has gains applied for all the bands

## **NMSIS DSP Software Library Functions Used:**

- riscv\_biquad\_cas\_df1\_32x64\_init\_q31()
- riscv\_biquad\_cas\_df1\_32x64\_q31()
- riscv\_biquad\_cascade\_df1\_init\_q31()
- riscv\_biquad\_cascade\_df1\_q31()
- riscv\_scale\_q31()
- riscv\_scale\_f32()
- riscv\_float\_to\_q31()
- riscv\_q31\_to\_float()

## **Linear Interpolate Example**

group LinearInterpExample

NMSIS DSP Software Library Linear Interpolate Example

**Refer** riscv\_linear\_interp\_example\_f32.c

## **Description:**

This example demonstrates the use of the linear interpolation and fast math modules. Method 1 uses the fast math sine function, which calculates sine values using cubic interpolation. Method 2 uses the linear interpolation function. The results of both methods are compared to a reference output. The example shows that the linear interpolation function can achieve higher precision than the fast math sine calculation.

## **Block Diagram:**

## **Variables Description:**

- testInputSin\_f32 points to the input values for sine calculation
- testRefSinOutput32\_f32 points to the reference values calculated from the MATLAB sin() function
- testOutput points to output buffer calculation from cubic interpolation
- testLinIntOutput points to output buffer calculation from linear interpolation
- snr1 Signal to noise ratio for reference and cubic interpolation output
- snr2 Signal to noise ratio for reference and linear interpolation output

## **NMSIS DSP Software Library Functions Used:**

- riscv\_sin\_f32()
- riscv\_linear\_interp\_f32()

## **Matrix Example**

group MatrixExample

**Refer** riscv\_matrix\_example\_f32.c

## **Description:**

Demonstrates the use of Matrix Transpose, Matrix Multiplication, and Matrix Inverse functions to apply least squares fitting to input data. Least squares fitting is the procedure for finding the best-fitting curve that minimizes the sum of the squares of the offsets (least square error) from a given set of data.

## Algorithm:

The linear combination of parameters considered is as follows:

A \* X = B, where X is the unknown value and can be estimated from A & B.

The least squares estimate X is given by the following equation:

```
X = Inverse(Transpose(A) * A) * Transpose(A) * B
```
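For a small case the least squares estimate can be written out by hand. The sketch below fits a straight line y = X[0] + X[1]\*t, so that each row of A is {1, t[i]} and the product Transpose(A) \* A is a 2x2 matrix that can be inverted in closed form. `ref_least_squares_line` is a hypothetical helper for illustration; it does not use the library's matrix functions.

```c
#include <stdint.h>

/* Fits y = X[0] + X[1]*t to m points by solving the 2x2 normal equations
 * (Transpose(A) * A) * X = Transpose(A) * B, where row i of A is {1, t[i]}.
 * Returns 0 on success, or -1 if the system is singular. */
static int ref_least_squares_line(const float *t, const float *y,
                                  uint32_t m, float X[2])
{
    float s00 = 0.0f, s01 = 0.0f, s11 = 0.0f; /* entries of Transpose(A)*A */
    float r0  = 0.0f, r1  = 0.0f;             /* entries of Transpose(A)*B */

    for (uint32_t i = 0; i < m; i++) {
        s00 += 1.0f;
        s01 += t[i];
        s11 += t[i] * t[i];
        r0  += y[i];
        r1  += t[i] * y[i];
    }

    float det = s00 * s11 - s01 * s01;
    if (det == 0.0f) {
        return -1; /* Transpose(A)*A cannot be inverted */
    }

    /* Apply the closed-form inverse of the symmetric 2x2 matrix. */
    X[0] = (s11 * r0 - s01 * r1) / det;
    X[1] = (s00 * r1 - s01 * r0) / det;
    return 0;
}
```

When the data lie exactly on a line, such as y = 1 + 2t, the estimate recovers the intercept and slope.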

## **Block Diagram:**



## **Variables Description:**

- A\_f32 input matrix in the linear combination equation
- B\_f32 output matrix in the linear combination equation
- X\_f32 unknown matrix estimated using A\_f32 & B\_f32 matrices

## **NMSIS DSP Software Library Functions Used:**

- riscv\_mat\_init\_f32()
- riscv\_mat\_trans\_f32()
- riscv\_mat\_mult\_f32()
- riscv\_mat\_inverse\_f32()

## **Signal Convergence Example**

group SignalConvergence

**Refer** riscv\_signal\_converge\_example\_f32.c

## **Description:**

Demonstrates the ability of an adaptive filter to "learn" the transfer function of a FIR lowpass filter using the Normalized LMS Filter, Finite Impulse Response (FIR) Filter, and Basic Math Functions.

## Algorithm:

The figure below illustrates the signal flow in this example. Uniformly distributed white noise is passed through an FIR lowpass filter. The output of the FIR filter serves as the reference input of the adaptive filter (normalized LMS filter). The white noise is input to the adaptive filter. The adaptive filter learns the transfer function of the FIR filter. The filter outputs two signals: (1) the output of the internal adaptive FIR filter, and (2) the error signal which is the difference between the adaptive filter and the reference output of the FIR filter. Over time as the adaptive filter learns the transfer function of the FIR filter, the first output approaches the reference output of the FIR filter, and the error signal approaches zero.

The adaptive filter converges properly even if the input signal has a large dynamic range (i.e., varies from small to large values). The coefficients of the adaptive filter are initially zero, and then converge over 1536 samples. The internal function test\_signal\_converge() implements the stopping condition. The function checks if all of the values of the error signal have a magnitude below a threshold DELTA.
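The stopping condition described above can be sketched as a check that every error sample has a magnitude below a threshold. `ref_all_below_threshold` is a hypothetical helper and is not the test\_signal\_converge() function from the example; the threshold value used with it is up to the caller.

```c
#include <stdint.h>

/* Returns 1 if every sample of err has magnitude below delta, else 0. */
static int ref_all_below_threshold(const float *err, uint32_t n, float delta)
{
    for (uint32_t i = 0; i < n; i++) {
        float magnitude = (err[i] < 0.0f) ? -err[i] : err[i];
        if (!(magnitude < delta)) {
            return 0; /* at least one sample is still too large */
        }
    }
    return 1;
}
```

A single sample at or above the threshold is enough to report that the filter has not yet converged.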

## **Block Diagram:**

## **Variables Description:**

- testInput\_f32 points to the input data
- firStateF32 points to FIR state buffer
- lmsStateF32 points to Normalised Least mean square FIR filter state buffer
- FIRCoeff\_f32 points to coefficient buffer
- lmsNormCoeff\_f32 points to Normalised Least mean square FIR filter coefficient buffer
- wire1, wire2, wire3 temporary buffers
- errOutput, err\_signal temporary error buffers

## **NMSIS DSP Software Library Functions Used:**

- riscv\_lms\_norm\_init\_f32()
- riscv\_fir\_init\_f32()
- riscv\_fir\_f32()
- riscv\_lms\_norm\_f32()
- riscv\_scale\_f32()
- riscv\_abs\_f32()
- riscv\_sub\_f32()
- riscv\_min\_f32()
- riscv\_copy\_f32()

## **SineCosine Example**

group SinCosExample

**Refer** riscv\_sin\_cos\_example\_f32.c

## **Description:**

Demonstrates the Pythagorean trigonometric identity with the use of Cosine, Sine, Vector Multiplication, and Vector Addition functions.

## Algorithm:

Mathematically, the Pythagorean trigonometric identity is defined by the following equation:

$$\sin(x) \cdot \sin(x) + \cos(x) \cdot \cos(x) = 1$$

where x is the angle in radians.

# **Block Diagram:**



- testInput\_f32 array of input angle in radians
- testOutput stores sum of the squares of sine and cosine values of input angle

# NMSIS DSP Software Library Functions Used:

- riscv\_cos\_f32()
- riscv\_sin\_f32()
- riscv\_mult\_f32()
- riscv\_add\_f32()

# **Variance Example**

# group VarianceExample

**Refer** riscv\_variance\_example\_f32.c

## **Description:**

Demonstrates the use of Basic Math and Support Functions to calculate the variance of an input sequence with N samples. Uniformly distributed white noise is taken as input.

# Algorithm:

The variance of a sequence is the mean of the squared deviation of the sequence from its mean.

This is denoted by the following equation: where, x[n] is the input sequence, N is the number of input samples, and x' is the mean value of the input sequence, x[n].

The mean value x' is defined as:

$$x' = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$$

## **Block Diagram:**




- testInput\_f32 points to the input data
- wire1, wire2, wire3 temporary buffers
- blockSize number of samples processed at a time
- refVarianceOut reference variance value

## **NMSIS DSP Software Library Functions Used:**

- riscv\_dot\_prod\_f32()
- riscv\_mult\_f32()
- riscv\_sub\_f32()
- riscv\_fill\_f32()
- riscv\_copy\_f32()
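The steps of the variance example (mean, deviation from the mean, squaring, averaging) can be sketched in plain C without the library. The function name and the `unbiased` flag are assumptions added for illustration: the text describes a plain mean of squared deviations (divide by N), while a sample variance is often normalized by N - 1 instead.

```c
#include <assert.h>
#include <stddef.h>

/* Variance of x[0..n-1]; hypothetical helper, not part of the library. */
float variance_sketch(const float *x, size_t n, int unbiased)
{
    assert(x != NULL && n > 1);

    /* Mean: the sum of all samples (a dot product with a vector of ones)
     * divided by the number of samples. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * 1.0f;
    }
    float mean = sum / (float)n;

    /* Sum of squared deviations from the mean. */
    float sq = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = x[i] - mean;
        sq += d * d;
    }

    /* Assumed choice of normalization: N - 1 (unbiased) or N. */
    float denom = unbiased ? (float)(n - 1) : (float)n;
    return sq / denom;
}
```

In the library-based example the same steps map onto the functions listed above: a fill and a dot product for the sum, a subtraction for the deviation, and a multiplication for the squaring.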

group groupExamples

# 3.4 Changelog

# 3.4.1 V1.0.2

This is the 1.0.2 release of the NMSIS-DSP library.

- Sync up to CMSIS DSP library 1.9.0
- Add initial support for the RISC-V vector extension
- Caution: riscv\_math.h is now split into several header files. An extra PrivateInclude folder is added as a header folder.

# 3.4.2 V1.0.1

This is the V1.0.1 release of the NMSIS-DSP library.

- Both Nuclei RISC-V 32 and 64 bit cores are supported now.
- Libraries are optimized for RISC-V 32 and 64 bit DSP instructions.
- The NN examples are now using Nuclei SDK as running environment.

# 3.4.3 V1.0.0

This is the first version of NMSIS-DSP library.

We adapted the CMSIS-DSP v1.6.0 library to use RISC-V DSP instructions; all API names are renamed from arm\_xxx to riscv\_xxx.


**CHAPTER** 

# **FOUR**

# **NMSIS NN**

# 4.1 Overview

# 4.1.1 Introduction

This user manual describes the NMSIS NN software library, a collection of efficient neural network kernels developed to maximize the performance and minimize the memory footprint of neural networks on Nuclei N/NX Class Processor cores.

The library is divided into a number of functions each covering a specific category:

- Neural Network Convolution Functions
- Neural Network Activation Functions
- Fully-connected Layer Functions
- Neural Network Pooling Functions
- Softmax Functions
- Neural Network Support Functions

The library has separate functions for operating on different weight and activation data types, including 8-bit integers (q7\_t) and 16-bit integers (q15\_t). Each kernel is described in its function description.

The implementation details are also described in this paper CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs<sup>23</sup>.

# 4.1.2 Block Diagram

Fig. 1: NMSIS NN Block Diagram

# 4.1.3 Examples

The library ships with a number of examples which demonstrate how to use the library functions.

- Convolutional Neural Network Example
- Gated Recurrent Unit Example

<sup>23</sup> https://arxiv.org/abs/1801.06601

# 4.1.4 Pre-processor Macros

Each library project has different pre-processor macros.

This library is only built for little endian targets.

**RISCV\_MATH\_DSP:** Define macro RISCV\_MATH\_DSP if the silicon supports DSP instructions.

**RISCV\_NN\_TRUNCATE:** Define macro RISCV\_NN\_TRUNCATE to use floor instead of round-to-the-nearest-int for the computation.
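The difference between floor and round-to-nearest shows up when a wide accumulator is shifted right to produce the output value. The sketch below illustrates that difference only; the function name and the `truncate` flag are hypothetical, and this is not the library's implementation of the macro.

```c
#include <assert.h>
#include <stdint.h>

/* Shift an accumulator right by out_shift bits.
 * truncate != 0: plain shift (floor).
 * truncate == 0: add half of the shifted-out range first (round to nearest).
 * Note: right-shifting a negative signed value is implementation-defined
 * in C; common compilers perform an arithmetic shift. */
int32_t requantize_sketch(int32_t acc, unsigned out_shift, int truncate)
{
    assert(out_shift < 31);
    int32_t rounding = truncate ? 0 : (int32_t)((1u << out_shift) >> 1);
    return (acc + rounding) >> out_shift;
}
```

For example, with an accumulator of 7 and a shift of 2 (7 / 4 = 1.75), the floor variant yields 1 and the rounding variant yields 2.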

# 4.2 Using NMSIS-NN

This section describes how to run the NMSIS NN examples in Nuclei Spike.

# 4.2.1 Preparation

- Nuclei Modified Spike xl\_spike
- Nuclei SDK modified for xl\_spike branch dev\_xlspike
- Nuclei RISCV GNU Toolchain
- CMake >= 3.5

# 4.2.2 Tool Setup

1. Export **PATH** correctly for xl\_spike and riscv-nuclei-elf-gcc

export PATH=/path/to/xl\_spike/bin:/path/to/riscv-nuclei-elf-gcc/bin/:\$PATH

# 4.2.3 Build NMSIS NN Library

- 1. Download or clone the NMSIS source code into the NMSIS directory.
- 2. cd to the NMSIS/NMSIS/ directory
- 3. Build the NMSIS NN library using make gen\_nn\_lib
- 4. Strip debug information using make strip\_nn\_lib to make the generated library smaller
- 5. The NN library will be generated in the ./Library/NN/GCC folder
- 6. The NN libraries will look like this:

```
$ ll Library/NN/GCC/
total 3000
-rw-r--r-- 1 hqfang nucleisys 128482 Jul 14 14:51 libnmsis_nn_rv32imac.a
-rw-r--r-- 1 hqfang nucleisys 281834 Jul 14 14:51 libnmsis_nn_rv32imacp.a
-rw-r--r-- 1 hqfang nucleisys 128402 Jul 14 14:51 libnmsis_nn_rv32imafc.a
-rw-r--r-- 1 hqfang nucleisys 282750 Jul 14 14:51 libnmsis_nn_rv32imafcp.a
-rw-r--r-- 1 hqfang nucleisys 128650 Jul 14 14:51 libnmsis_nn_rv32imafdc.a
-rw-r--r-- 1 hqfang nucleisys 282978 Jul 14 14:51 libnmsis_nn_rv32imafdcp.a
-rw-r--r-- 1 hqfang nucleisys 183918 Jul 14 14:51 libnmsis_nn_rv64imac.a
-rw-r--r-- 1 hqfang nucleisys 418598 Jul 14 14:51 libnmsis_nn_rv64imacp.a
-rw-r--r-- 1 hqfang nucleisys 184206 Jul 14 14:51 libnmsis_nn_rv64imafc.a
-rw-r--r-- 1 hqfang nucleisys 418070 Jul 14 14:51 libnmsis_nn_rv64imafcp.a
-rw-r--r-- 1 hqfang nucleisys 184454 Jul 14 14:51 libnmsis_nn_rv64imafdc.a
-rw-r--r-- 1 hqfang nucleisys 419774 Jul 14 14:51 libnmsis_nn_rv64imafdcp.a
```

- 7. A library name with an extra p is built with RISCV DSP enabled.
  - libnmsis\_nn\_rv32imac.a: Built for RISCV\_ARCH=rv32imac without DSP enabled.
  - libnmsis\_nn\_rv32imacp.a: Built for RISCV\_ARCH=rv32imac with DSP enabled.

#### Note:

- You can also build both the DSP and NN libraries directly using make gen
- You can strip the generated DSP and NN libraries using make strip

#### 4.2.4 How to run

1. Set the environment variables NUCLEI\_SDK\_ROOT and NUCLEI\_SDK\_NMSIS, and set the Nuclei SDK SoC to *xlspike*

```
export NUCLEI_SDK_ROOT=/path/to/nuclei_sdk
export NUCLEI_SDK_NMSIS=/path/to/NMSIS/NMSIS
export SOC=xlspike
```

- 2. Take ./cifar10/ as an example: cd ./cifar10/
- 3. Run with RISCV DSP enabled NMSIS-NN library for CORE n307

```
# Clean project
make DSP_ENABLE=ON CORE=n307 clean
# Build project
make DSP_ENABLE=ON CORE=n307 all
# Run application using xl_spike
make DSP_ENABLE=ON CORE=n307 run
```

4. Run with RISCV DSP disabled NMSIS-NN library for CORE n307

```
make DSP_ENABLE=OFF CORE=n307 clean
make DSP_ENABLE=OFF CORE=n307 all
make DSP_ENABLE=OFF CORE=n307 run
```

## Note:

• You can easily run this example on your own hardware if it has enough memory: just change the SOC in step 1 to the one you are using.

# 4.3 NMSIS NN API

If you want to access doxygen generated NMSIS NN API, please click NMSIS NN API Doxygen Documentation.

## 4.3.1 Neural Network Functions

## **Neural Network Activation Functions**

```
void riscv_nn_activations_direct_q15 (q15_t *data, uint16_t size, uint16_t int_width, riscv_nn_activation_type type)

void riscv_nn_activations_direct_q7 (q7_t *data, uint16_t size, uint16_t int_width, riscv_nn_activation_type type)

void riscv_relu6_s8 (q7_t *data, uint16_t size)

void riscv_relu_q15 (q15_t *data, uint16_t size)

void riscv_relu_q7 (q7_t *data, uint16_t size)
```

group Acti

Perform activation layers, including ReLU (Rectified Linear Unit), sigmoid and tanh.

#### **Functions**

```
void riscv_nn_activations_direct_q15 (q15_t *data, uint16_t size, uint16_t int_width, riscv_nn_activation_type type)
```

Q15 neural network activation function using direct table look-up.

**Note:** Refer to the header file for details.

```
void riscv_nn_activations_direct_q7 (q7_t *data, uint16_t size, uint16_t int_width, riscv_nn_activation_type type)
```

Q7 neural network activation function using direct table look-up.

This is the direct table look-up approach.

Assume here that the integer part of the fixed-point value is <= 3. More than 3 does not make much sense, since after saturation it makes no difference to any of these activation functions.

#### **Parameters**

- data [inout] pointer to input
- size [in] number of elements
- int\_width [in] bit-width of the integer part, assumed to be smaller than 3
- type [in] type of activation functions

void riscv\_relu6\_s8 (q7\_t \*data, uint16\_t size)

s8 ReLU6 function.

#### **Parameters**

- data [inout] pointer to input
- size [in] number of elements

void **riscv\_relu\_q15** (q15\_t \*data, uint16\_t size)

Q15 RELU function.

Optimized relu with QSUB instructions.

#### **Parameters**

- data [inout] pointer to input
- size [in] number of elements

void **riscv\_relu\_q7** (q7\_t \*data, uint16\_t size)

Q7 RELU function.

Optimized relu with QSUB instructions.

#### **Parameters**

- data [inout] pointer to input
- size [in] number of elements
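As a reference for what the ReLU-style functions above compute, here is a scalar sketch in plain C. The function names are hypothetical and the loops are unoptimized; it also assumes that the ReLU6 variant clamps the raw stored values to the range [0, 6], which the text above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ReLU in place: negative values become 0, others are unchanged. */
void relu_q7_sketch(int8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        if (data[i] < 0) {
            data[i] = 0;
        }
    }
}

/* ReLU6 in place: clamp every value to [0, 6] (assumed raw-value clamp). */
void relu6_s8_sketch(int8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        int8_t v = data[i];
        if (v < 0) {
            v = 0;
        }
        if (v > 6) {
            v = 6;
        }
        data[i] = v;
    }
}
```

Both functions modify the buffer in place, matching the [inout] direction of the data parameter documented above.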

# **Neural Network Convolution Functions**

```
riscv_status riscv_convolve_1_x_n_s8 (const nmsis_nn_context *ctx, const nmsis_nn_conv_params *conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input_data, const nmsis_nn_dims *filter_dims, const q7_t *filter_data, const nmsis_nn_dims *bias_dims, const int32_t *bias_data, const nmsis_nn_dims *output_dims, q7_t *output_data)

int32_t riscv_convolve_1_x_n_s8_get_buffer_size (const nmsis_nn_dims *input_dims, const nmsis_nn_dims *filter_dims)

riscv_status riscv_convolve_1x1_HWC_q7_fast_nonsquare (const q7_t *Im_in, const uint16_t dim_im_in_x, const uint16_t dim_im_in_y, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel_x, const uint16_t dim_kernel_y, const uint16_t padding_x, const uint16_t padding_y, const uint16_t stride_x, const uint16_t stride_y, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out_x, const uint16_t dim_im_out_y, q15_t *bufferA, q7_t *bufferB)

riscv_status riscv_convolve_1x1_s8_fast (const nmsis_nn_context *ctx, const nmsis_nn_conv_params *conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input_data, const nmsis_nn_dims *filter_dims, const q7_t *filter_data, const nmsis_nn_dims *bias_dims, const int32_t *bias_data, const nmsis_nn_dims *output_dims, q7_t *output_data)
```

int32\_t riscv\_convolve\_1x1\_s8\_fast\_get\_buffer\_size (const nmsis\_nn\_dims \*input\_dims)

riscv\_status riscv\_convolve\_HWC\_q15\_basic (const q15\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q15\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q15\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q15\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

riscv\_status riscv\_convolve\_HWC\_q15\_fast (const q15\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q15\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q15\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q15\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

```
riscv_status riscv_convolve_HWC_q15_fast_nonsquare (const q15_t *Im_in, const uint16_t dim_im_in_x, const uint16_t dim_im_in_y, const uint16_t ch_im_in, const q15_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel_x, const uint16_t dim_kernel_y, const uint16_t padding_x, const uint16_t padding_y, const uint16_t stride_x, const uint16_t stride_y, const q15_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q15_t *Im_out, const uint16_t dim_im_out_x, const uint16_t dim_im_out_y, q15_t *bufferA, q7_t *bufferB)
```

riscv\_status riscv\_convolve\_HWC\_q7\_basic (const q7\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q7\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q7\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q7\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

riscv\_status riscv\_convolve\_HWC\_q7\_basic\_nonsquare (const q7\_t \*Im\_in, const uint16\_t dim\_im\_in\_x, const uint16\_t dim\_im\_in\_y, const uint16\_t ch\_im\_in, const q7\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel\_x, const uint16\_t dim\_kernel\_y, const uint16\_t padding\_x, const uint16\_t padding\_y, const uint16\_t stride\_x, const uint16\_t stride\_y, const q7\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q7\_t \*Im\_out, const uint16\_t dim\_im\_out\_x, const uint16\_t dim\_im\_out\_y, q15\_t \*bufferA, q7\_t \*bufferB)

riscv\_status riscv\_convolve\_HWC\_q7\_fast (const q7\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q7\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q7\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q7\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

```
riscv_status riscv_convolve_HWC_q7_fast_nonsquare (const q7_t *Im_in, const uint16_t dim_im_in_x, const uint16_t dim_im_in_y, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel_x, const uint16_t dim_kernel_y, const uint16_t padding_x, const uint16_t padding_y, const uint16_t stride_x, const uint16_t stride_y, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out_x, const uint16_t dim_im_out_y, q15_t *bufferA, q7_t *bufferB)

riscv_status riscv_convolve_HWC_q7_RGB (const q7_t *Im_in, const uint16_t dim_im_in, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel, const uint16_t padding, const uint16_t stride, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out, q15_t *bufferA, q7_t *bufferB)

riscv_status riscv_convolve_s8 (const nmsis_nn_context *ctx, const nmsis_nn_conv_params *conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input_data, const nmsis_nn_dims *filter_dims, const q7_t *filter_data, const nmsis_nn_dims *bias_dims, const int32_t *bias_data, const nmsis_nn_dims *output_dims, q7_t *output_data)

int32_t riscv_convolve_s8_get_buffer_size (const nmsis_nn_dims *input_dims, const nmsis_nn_dims *filter_dims)

riscv_status riscv_convolve_wrapper_s8 (const nmsis_nn_context *ctx, const nmsis_nn_conv_params *conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input_data, const nmsis_nn_dims *filter_dims, const q7_t *filter_data, const nmsis_nn_dims *bias_dims, const int32_t *bias_data, const nmsis_nn_dims *output_dims, q7_t *output_data)

int32_t riscv_convolve_wrapper_s8_get_buffer_size (const nmsis_nn_conv_params *conv_params, const nmsis_nn_dims *input_dims, const nmsis_nn_dims *filter_dims, const nmsis_nn_dims *output_dims)

riscv_status riscv_depthwise_conv_3x3_s8 (const nmsis_nn_context *ctx, const nmsis_nn_dw_conv_params *dw_conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input, const nmsis_nn_dims *filter_dims, const q7_t *kernel, const nmsis_nn_dims *bias_dims, const int32_t *bias, const nmsis_nn_dims *output_dims, q7_t *output)
```

static void depthwise\_conv\_s8\_mult\_4 (const int8\_t \*input, const int32\_t input\_x, const int32\_t input\_y, const int32\_t input\_ch, const int8\_t \*kernel, const int32\_t output\_ch, const int32\_t ch\_mult, const int32\_t kernel\_x, const int32\_t kernel\_y, const int32\_t pad\_x, const int32\_t pad\_y, const int32\_t stride\_x, const int32\_t stride\_y, const int32\_t \*bias, int8\_t \*output, const int32\_t \*output\_shift, const int32\_t \*output\_mult, const int32\_t output\_x, const int32\_t output\_y, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

static void depthwise\_conv\_s8\_generic (const q7\_t \*input, const uint16\_t input\_batches, const uint16\_t input\_x, const uint16\_t input\_y, const uint16\_t input\_ch, const q7\_t \*kernel, const uint16\_t output\_ch, const uint16\_t ch\_mult, const uint16\_t kernel\_x, const uint16\_t kernel\_y, const uint16\_t pad\_x, const uint16\_t pad\_y, const uint16\_t stride\_x, const uint16\_t stride\_y, const int32\_t \*bias, q7\_t \*output, const int32\_t \*output\_shift, const int32\_t \*output\_mult, const uint16\_t output\_x, const uint16\_t output\_y, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

riscv\_status riscv\_depthwise\_conv\_s8 (const nmsis\_nn\_context \*ctx, const nmsis\_nn\_dw\_conv\_params \*dw\_conv\_params, const nmsis\_nn\_per\_channel\_quant\_params \*quant\_params, const nmsis\_nn\_dims \*input\_dims, const q7\_t \*input, const nmsis\_nn\_dims \*filter\_dims, const q7\_t \*kernel, const nmsis\_nn\_dims \*bias\_dims, const int32\_t \*bias, const nmsis\_nn\_dims \*output\_dims, q7\_t \*output)

riscv\_status riscv\_depthwise\_conv\_s8\_opt (const nmsis\_nn\_context \*ctx, const nmsis\_nn\_dw\_conv\_params \*dw\_conv\_params, const nmsis\_nn\_per\_channel\_quant\_params \*quant\_params, const nmsis\_nn\_dims \*input\_dims, const q7\_t \*input, const nmsis\_nn\_dims \*filter\_dims, const q7\_t \*kernel, const nmsis\_nn\_dims \*bias\_dims, const int32\_t \*bias, const nmsis\_nn\_dims \*output\_dims, q7\_t \*output)

static void depthwise\_conv\_u8\_mult\_4 (const uint8\_t \*input, const int32\_t input\_x, const int32\_t input\_y, const int32\_t input\_ch, const uint8\_t \*kernel, const int32\_t output\_ch, const int32\_t ch\_mult, const int32\_t kernel\_x, const int32\_t kernel\_y, const int32\_t pad\_x, const int32\_t pad\_y, const int32\_t stride\_x, const int32\_t stride\_y, const int32\_t \*bias, uint8\_t \*output, const int32\_t output\_shift, const int32\_t output\_mult, const int32\_t output\_x, const int32\_t output\_y, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t filter\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

static void depthwise\_conv\_u8\_generic (const uint8\_t \*input, const int32\_t input\_x, const int32\_t input\_y, const int32\_t input\_ch, const uint8\_t \*kernel, const int32\_t output\_ch, const int32\_t ch\_mult, const int32\_t kernel\_x, const int32\_t kernel\_y, const int32\_t pad\_x, const int32\_t pad\_y, const int32\_t stride\_x, const int32\_t stride\_y, const int32\_t \*bias, uint8\_t \*output, const int32\_t output\_shift, const int32\_t output\_mult, const int32\_t output\_x, const int32\_t output\_y, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t filter\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

riscv\_status riscv\_depthwise\_conv\_u8\_basic\_ver1 (const uint8\_t \*input, const uint16\_t input\_x, const uint16\_t input\_y, const uint16\_t input\_ch, const uint8\_t \*kernel, const uint16\_t kernel\_x, const uint16\_t kernel\_y, const int16\_t ch\_mult, const int16\_t pad\_x, const int16\_t pad\_y, const int16\_t stride\_x, const int16\_t stride\_y, const int16\_t dilation\_x, const int16\_t dilation\_y, const int32\_t \*bias, const int32\_t input\_offset, const int32\_t filter\_offset, const int32\_t output\_offset, uint8\_t \*output, const uint16\_t output\_x, const uint16\_t output\_y, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max, const int32\_t output\_shift, const int32\_t output\_mult)

riscv\_status riscv\_depthwise\_conv\_wrapper\_s8 (const nmsis\_nn\_context \*ctx, const nmsis\_nn\_dw\_conv\_params \*dw\_conv\_params, const nmsis\_nn\_per\_channel\_quant\_params \*quant\_params, const nmsis\_nn\_dims \*input\_dims, const q7\_t \*input, const nmsis\_nn\_dims \*filter\_dims, const q7\_t \*filter, const nmsis\_nn\_dims \*bias\_dims, const int32\_t \*bias, const nmsis\_nn\_dims \*output\_dims, q7\_t \*output)

```
int32_t riscv_depthwise_conv_wrapper_s8_get_buffer_size (const nmsis_nn_dw_conv_params *dw_conv_params, const nmsis_nn_dims *input_dims, const nmsis_nn_dims *filter_dims, const nmsis_nn_dims *output_dims)

riscv_status riscv_depthwise_separable_conv_HWC_q7 (const q7_t *Im_in, const uint16_t dim_im_in, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel, const uint16_t padding, const uint16_t stride, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out, q15_t *bufferA, q7_t *bufferB)

riscv_status riscv_depthwise_separable_conv_HWC_q7_nonsquare (const q7_t *Im_in, const uint16_t dim_im_in_x, const uint16_t dim_im_in_y, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel_x, const uint16_t dim_kernel_y, const uint16_t padding_x, const uint16_t padding_y, const uint16_t stride_x, const uint16_t stride_y, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out_x, const uint16_t dim_im_out_y, q15_t *bufferA, q7_t *bufferB)
```

# group NNConv

Collection of convolution, depthwise convolution functions and their variants.

The convolution is implemented in 2 steps: im2col and GEMM

im2col is a process of converting each patch of image data into a column. After im2col, the convolution is computed as matrix-matrix multiplication.

To reduce the memory footprint, the im2col is performed partially. In each iteration, only a few columns (i.e., patches) are generated and computed with GEMM kernels similar to the NMSIS-DSP riscv\_mat\_mult functions.
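The two-step scheme above can be illustrated with a deliberately small sketch in plain C. It assumes a single input channel, a single filter, stride 1 and no padding, so the GEMM step degenerates to one dot product per column; the function names and the MAX\_PATCH limit are hypothetical and the code is not taken from the library.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PATCH 64 /* hypothetical upper bound on k * k */

/* im2col for one output position: copy the k x k patch whose top-left
 * corner is (oy, ox) into a flat column, widening int8 to int16. */
void im2col_patch_sketch(const int8_t *img, int w, int k,
                         int oy, int ox, int16_t *col)
{
    int idx = 0;
    for (int ky = 0; ky < k; ky++) {
        for (int kx = 0; kx < k; kx++) {
            col[idx++] = img[(oy + ky) * w + (ox + kx)];
        }
    }
}

/* Convolution (no padding, stride 1) of an h x w image with a k x k kernel.
 * Only one column is materialized at a time, which is the "partial im2col"
 * idea: the full column matrix is never stored. */
void conv_im2col_sketch(const int8_t *img, int h, int w,
                        const int8_t *kernel, int k, int32_t *out)
{
    int16_t col[MAX_PATCH];
    assert(k > 0 && k * k <= MAX_PATCH && k <= h && k <= w);

    int out_h = h - k + 1;
    int out_w = w - k + 1;
    for (int oy = 0; oy < out_h; oy++) {
        for (int ox = 0; ox < out_w; ox++) {
            im2col_patch_sketch(img, w, k, oy, ox, col);
            int32_t acc = 0; /* column times filter row */
            for (int i = 0; i < k * k; i++) {
                acc += (int32_t)col[i] * (int32_t)kernel[i];
            }
            out[oy * out_w + ox] = acc;
        }
    }
}
```

With several filters, each column would be multiplied by every filter row, which is where a matrix-matrix kernel pays off.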

## **Functions**

riscv\_status riscv\_convolve\_1\_x\_n\_s8 (const nmsis\_nn\_context \*ctx, const nmsis\_nn\_conv\_params \*conv\_params, const nmsis\_nn\_per\_channel\_quant\_params \*quant\_params, const nmsis\_nn\_dims \*input\_dims, const q7\_t \*input\_data, const nmsis\_nn\_dims \*filter\_dims, const q7\_t \*filter\_data, const nmsis\_nn\_dims \*bias\_dims, const int32\_t \*bias\_data, const nmsis\_nn\_dims \*output\_dims, q7\_t \*output\_data)

#### 1xn convolution

- Supported framework: TensorFlow Lite Micro
- The following constraints on the arguments apply
  - a. input\_dims->n equals 1
  - b. output\_dims->w is a multiple of 4
  - c. Explicit constraints (since it is for 1xN convolution)
    - input\_dims->h equals 1
    - output\_dims->h equals 1
    - filter\_dims->h equals 1

Todo:

Remove constraint on output\_dims->w to make the function generic.

#### **Parameters**

- ctx [inout] Function context that contains the additional buffer if required by the function. riscv\_convolve\_1\_x\_n\_s8\_get\_buffer\_size will return the buffer\_size if required
- conv\_params [in] Convolution parameters (e.g. strides, dilations, pads,...). Range of conv\_params->input\_offset: [-127, 128] Range of conv\_params->output\_offset: [-128, 127]
- quant\_params [in] Per-channel quantization info. It contains the multiplier and shift values to be applied to each output channel
- input\_dims [in] Input (activation) tensor dimensions. Format: [N, H, W, C\_IN]
- input\_data [in] Input (activation) data pointer. Data type: int8
- **filter\_dims [in]** Filter tensor dimensions. Format: [C\_OUT, 1, WK, C\_IN] where WK is the horizontal spatial filter dimension
- filter\_data [in] Filter data pointer. Data type: int8
- bias\_dims [in] Bias tensor dimensions. Format: [C\_OUT]
- bias\_data [in] Optional bias data pointer. Data type: int32
- output\_dims [in] Output tensor dimensions. Format: [N, H, W, C\_OUT]
- output\_data [out] Output data pointer. Data type: int8

**Returns** The function returns RISCV\_MATH\_SIZE\_MISMATCH if the argument constraints fail, or RISCV\_MATH\_SUCCESS on successful completion.

int32\_t riscv\_convolve\_1\_x\_n\_s8\_get\_buffer\_size (const nmsis\_nn\_dims \*input\_dims, const nmsis\_nn\_dims \*filter\_dims)

Get the required additional buffer size for 1xn convolution.

#### **Parameters**

- input\_dims [in] Input (activation) tensor dimensions. Format: [N, H, W, C\_IN]
- **filter\_dims [in]** Filter tensor dimensions. Format: [C\_OUT, 1, WK, C\_IN] where WK is the horizontal spatial filter dimension

**Returns** The function returns the required buffer size in bytes

```
riscv_status riscv_convolve_1x1_HWC_q7_fast_nonsquare (const q7_t *Im_in, const uint16_t dim_im_in_x, const uint16_t dim_im_in_y, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel_x, const uint16_t dim_kernel_y, const uint16_t padding_x, const uint16_t padding_y, const uint16_t stride_x, const uint16_t stride_y, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out_x, const uint16_t dim_im_out_y, q15_t *bufferA, q7_t *bufferB)
```

Fast Q7 version of 1x1 convolution (non-square shape)

This function is optimized for convolution with 1x1 kernel size (i.e., dim\_kernel\_x=1 and dim\_kernel\_y=1). It can be used for the 1x1 (pointwise) stage that follows the depthwise convolution in MobileNets [1].

This function is the version with the full list of optimization tricks, but with some constraints: ch\_im\_in is a multiple of 4 and ch\_im\_out is a multiple of 2.

[1] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications https://arxiv.org/abs/1704.04861

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in\_x [in] input tensor dimension x
- dim\_im\_in\_y [in] input tensor dimension y
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel\_x [in] filter kernel size x
- dim\_kernel\_y [in] filter kernel size y
- padding\_x [in] padding size x
- padding\_y [in] padding size y
- stride\_x [in] convolution stride x
- stride\_y [in] convolution stride y
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out\_x [in] output tensor dimension x
- dim\_im\_out\_y [in] output tensor dimension y
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

```
riscv_status riscv_convolve_1x1_s8_fast (const nmsis_nn_context *ctx, const nmsis_nn_conv_params *conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input_data, const nmsis_nn_dims *filter_dims, const q7_t *filter_data, const nmsis_nn_dims *bias_dims, const int32_t *bias_data, const nmsis_nn_dims *output_dims, q7_t *output_data)
```

Fast s8 version for 1x1 convolution (non-square shape)

- Supported framework: TensorFlow Lite Micro
- The following constraints on the arguments apply:
  - a. input\_dims->c is a multiple of 4
  - b. conv\_params->padding.w = conv\_params->padding.h = 0
  - c. conv\_params->stride.w = conv\_params->stride.h = 1

#### **Parameters**

- ctx [inout] Function context that contains the additional buffer if required by the function. riscv\_convolve\_1x1\_s8\_fast\_get\_buffer\_size will return the buffer\_size if required
- conv\_params [in] Convolution parameters (e.g. strides, dilations, pads,...). Range of conv\_params->input\_offset: [-127, 128] Range of conv\_params->output\_offset: [-128, 127]
- quant\_params [in] Per-channel quantization info. It contains the multiplier and shift values to be applied to each output channel
- input\_dims [in] Input (activation) tensor dimensions. Format: [N, H, W, C\_IN]
- input\_data [in] Input (activation) data pointer. Data type: int8
- filter\_dims [in] Filter tensor dimensions. Format: [C\_OUT, 1, 1, C\_IN]
- filter\_data [in] Filter data pointer. Data type: int8
- bias\_dims [in] Bias tensor dimensions. Format: [C\_OUT]
- bias\_data [in] Optional bias data pointer. Data type: int32
- output\_dims [in] Output tensor dimensions. Format: [N, H, W, C\_OUT]
- output\_data [out] Output data pointer. Data type: int8

**Returns** The function returns RISCV\_MATH\_SIZE\_MISMATCH if the argument constraints fail, or RISCV\_MATH\_SUCCESS on successful completion.

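int32\_t riscv\_convolve\_1x1\_s8\_fast\_get\_buffer\_size (const nmsis\_nn\_dims \*input\_dims)

Get the required buffer size for riscv\_convolve\_1x1\_s8\_fast.

#### **Parameters**

- input\_dims [in] Input (activation) dimensions

**Returns** The function returns the required buffer size in bytes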

riscv\_status riscv\_convolve\_HWC\_q15\_basic (const q15\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q15\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q15\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q15\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

Basic Q15 convolution function.

#### **Buffer size:**

bufferA size: ch\_im\_in\*dim\_kernel\*dim\_kernel

bufferB size: 0

This basic version is designed to work for any input tensor and weight dimension.
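The relation between the size arguments can be sketched as below. Two assumptions are made that this section does not state: the output dimension follows the standard convolution size formula, and the bufferA size is read as a count of q15\_t elements. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard output-size relation for a square convolution (assumption:
 * the caller passes a dim_im_out that is consistent with it). */
static uint16_t conv_out_dim(uint16_t dim_im_in, uint16_t dim_kernel,
                             uint16_t padding, uint16_t stride)
{
    assert(stride > 0);
    assert(dim_im_in + 2u * padding >= dim_kernel);
    return (uint16_t)((dim_im_in + 2u * padding - dim_kernel) / stride + 1u);
}

/* bufferA size for the basic Q15 kernel, using the formula above and
 * interpreting it as a number of q15_t (int16_t) elements. */
static size_t q15_basic_buffer_a_elems(uint16_t ch_im_in, uint16_t dim_kernel)
{
    return (size_t)ch_im_in * dim_kernel * dim_kernel;
}
```

For example, a 32x32 input with a 5x5 kernel, padding 2, and stride 1 keeps the 32x32 spatial size.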

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes
- stride [in] convolution stride
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns RISCV\_MATH\_SUCCESS

riscv\_status riscv\_convolve\_HWC\_q15\_fast (const q15\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q15\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q15\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q15\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

Fast Q15 convolution function.

#### **Buffer size:**

bufferA size: 2\*ch\_im\_in\*dim\_kernel\*dim\_kernel

bufferB size: 0

#### **Input dimension constraints:**

ch\_im\_in is multiple of 2

ch\_im\_out is multiple of 2

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes
- stride [in] convolution stride
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

riscv\_status riscv\_convolve\_HWC\_q15\_fast\_nonsquare (const q15\_t \*Im\_in, const uint16\_t dim\_im\_in\_x, const uint16\_t dim\_im\_in\_y, const uint16\_t ch\_im\_in, const q15\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel\_x, const uint16\_t dim\_kernel\_y, const uint16\_t padding\_x, const uint16\_t padding\_y, const uint16\_t stride\_x, const uint16\_t stride\_y, const q15\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q15\_t \*Im\_out, const uint16\_t dim\_im\_out\_x, const uint16\_t dim\_im\_out\_y, q15\_t \*bufferA, q7\_t \*bufferB)

Fast Q15 convolution function (non-square shape)

#### **Buffer size:**

bufferA size: 2\*ch\_im\_in\*dim\_kernel\*dim\_kernel

bufferB size: 0

#### **Input dimension constraints:**

ch\_im\_in is multiple of 2

ch\_im\_out is multiple of 2

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in\_x [in] input tensor dimension x
- dim\_im\_in\_y [in] input tensor dimension y
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel\_x [in] filter kernel size x
- dim\_kernel\_y [in] filter kernel size y
- padding\_x [in] padding size x
- padding\_y [in] padding size y
- stride\_x [in] convolution stride x
- stride\_y [in] convolution stride y
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out\_x [in] output tensor dimension x
- dim\_im\_out\_y [in] output tensor dimension y
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

riscv\_status riscv\_convolve\_HWC\_q7\_basic (const q7\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q7\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q7\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q7\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

Basic Q7 convolution function.

#### **Buffer size:**

bufferA size: 2\*ch\_im\_in\*dim\_kernel\*dim\_kernel

bufferB size: 0

This basic version is designed to work for any input tensor and weight dimension.

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes
- stride [in] convolution stride
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns RISCV\_MATH\_SUCCESS
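A plain reference loop helps to see how bias\_shift and out\_shift enter the result. The sketch below is an assumption-laden illustration rather than the library implementation: the function names are hypothetical, the weight layout `[filter][ky][kx][channel]` and the rounding term added before the right shift are assumed, and bufferA/bufferB are not used.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic right shift (floor division by 2^s). */
static int32_t asr32(int32_t v, unsigned s)
{
    return v >= 0 ? (v >> s) : ~((~v) >> s);
}

/* Saturate a 32-bit value to the Q7 (int8) range. */
static int8_t sat_q7(int32_t v)
{
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

/* Hypothetical reference convolution for square HWC Q7 tensors. */
static void conv_hwc_q7_reference(const int8_t *Im_in, int dim_im_in, int ch_im_in,
                                  const int8_t *wt, int ch_im_out, int dim_kernel,
                                  int padding, int stride, const int8_t *bias,
                                  unsigned bias_shift, unsigned out_shift,
                                  int8_t *Im_out, int dim_im_out)
{
    assert(bias_shift < 24 && out_shift < 31 && stride > 0);
    for (int oy = 0; oy < dim_im_out; oy++) {
        for (int ox = 0; ox < dim_im_out; ox++) {
            for (int f = 0; f < ch_im_out; f++) {
                /* Bias is left-shifted; half an output LSB is added for rounding. */
                int32_t sum = (int32_t)bias[f] * (int32_t)(1u << bias_shift);
                sum += (int32_t)((1u << out_shift) >> 1);
                for (int ky = 0; ky < dim_kernel; ky++) {
                    for (int kx = 0; kx < dim_kernel; kx++) {
                        const int y = oy * stride - padding + ky;
                        const int x = ox * stride - padding + kx;
                        if (y < 0 || y >= dim_im_in || x < 0 || x >= dim_im_in) {
                            continue; /* zero padding contributes nothing */
                        }
                        for (int c = 0; c < ch_im_in; c++) {
                            sum += Im_in[(y * dim_im_in + x) * ch_im_in + c] *
                                   wt[((f * dim_kernel + ky) * dim_kernel + kx) * ch_im_in + c];
                        }
                    }
                }
                /* Accumulator is right-shifted and saturated to Q7. */
                Im_out[(oy * dim_im_out + ox) * ch_im_out + f] = sat_q7(asr32(sum, out_shift));
            }
        }
    }
}
```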

riscv\_status riscv\_convolve\_HWC\_q7\_basic\_nonsquare (const q7\_t \*Im\_in, const uint16\_t dim\_im\_in\_x, const uint16\_t dim\_im\_in\_y, const uint16\_t ch\_im\_in, const q7\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel\_x, const uint16\_t dim\_kernel\_y, const uint16\_t padding\_x, const uint16\_t padding\_y, const uint16\_t stride\_x, const uint16\_t stride\_y, const q7\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q7\_t \*Im\_out, const uint16\_t dim\_im\_out\_x, const uint16\_t dim\_im\_out\_y, q15\_t \*bufferA, q7\_t \*bufferB)

Basic Q7 convolution function (non-square shape)

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in\_x [in] input tensor dimension x
- dim\_im\_in\_y [in] input tensor dimension y
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel\_x [in] filter kernel size x
- dim\_kernel\_y [in] filter kernel size y
- padding\_x [in] padding size x
- padding\_y [in] padding size y
- stride\_x [in] convolution stride x
- stride\_y [in] convolution stride y
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out\_x [in] output tensor dimension x
- dim\_im\_out\_y [in] output tensor dimension y
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns RISCV\_MATH\_SUCCESS

riscv\_status riscv\_convolve\_HWC\_q7\_fast (const q7\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const q7\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const q7\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q7\_t \*Im\_out, const uint16\_t dim\_im\_out, q15\_t \*bufferA, q7\_t \*bufferB)

Fast Q7 convolution function.

#### **Buffer size:**

bufferA size: 2\*ch\_im\_in\*dim\_kernel\*dim\_kernel

bufferB size: 0

#### **Input dimension constraints:**

ch\_im\_in is a multiple of 4 (because of the SIMD32 read and swap)

ch\_im\_out is a multiple of 2 (because of the 2x2 mat\_mult kernel)

The im2col converts the Q7 tensor input into Q15 columns, which are stored in bufferA. Reordering happens during this im2col process with riscv\_q7\_to\_q15\_reordered\_no\_shift: for every four elements, the second and third elements are swapped.
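The reordering described above can be written as a scalar loop. This is a sketch of the element ordering only; the function name is hypothetical, and it assumes the length is a multiple of four, whereas the real routine uses SIMD instructions and may treat a remainder differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen Q7 to Q15 without shifting and swap the 2nd and 3rd element of
 * every group of four: {a, b, c, d} becomes {a, c, b, d}. */
static void q7_to_q15_reordered_sketch(const int8_t *src, int16_t *dst, size_t n)
{
    assert(n % 4 == 0); /* assumption made for this sketch */
    for (size_t i = 0; i < n; i += 4) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 2];
        dst[i + 2] = src[i + 1];
        dst[i + 3] = src[i + 3];
    }
}
```

Because the same permutation is applied to both GEMM operands, the dot products are unchanged, which is why a matching "reordered" kernel is required.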

The computation kernel riscv\_nn\_mat\_mult\_kernel\_q7\_q15\_reordered does the GEMM computation with the reordered columns.

To speed up the determination of the padding condition, the computation is split into 3x3 parts, i.e., {top, mid, bottom} X {left, mid, right}. This reduces the total number of boundary condition checks and improves the data copying performance.
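One way to derive the "mid" part along a single axis is to compute the range of output indices whose kernel window lies entirely inside the input, so that no padding check is needed there. The helper below is a hypothetical sketch of that idea; the split actually used by the library may be chosen differently.

```c
#include <assert.h>

typedef struct {
    int begin; /* first output index whose window needs no padding check */
    int end;   /* one past the last such index (half-open range) */
} interior_range_t;

/* The window of output index o covers input positions
 * [o*stride - padding, o*stride - padding + dim_kernel - 1]. */
static interior_range_t interior_range(int dim_in, int dim_kernel, int padding,
                                       int stride, int dim_out)
{
    assert(stride > 0 && dim_out >= 0 && padding >= 0);
    interior_range_t r;
    const int last_start = dim_in + padding - dim_kernel;
    r.begin = (padding + stride - 1) / stride;              /* ceil(padding / stride) */
    r.end = (last_start < 0) ? 0 : last_start / stride + 1; /* floor(...) + 1 */
    if (r.begin > dim_out) r.begin = dim_out;
    if (r.end > dim_out) r.end = dim_out;
    if (r.end < r.begin) r.end = r.begin;                   /* empty interior */
    return r;
}
```

Applying the helper once for rows and once for columns gives the nine regions: indices below `begin` form the top/left part, `[begin, end)` the mid part, and the rest the bottom/right part.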

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes
- stride [in] convolution stride
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

riscv\_status riscv\_convolve\_HWC\_q7\_fast\_nonsquare (const q7\_t \*Im\_in, const uint16\_t dim\_im\_in\_x, const uint16\_t dim\_im\_in\_y, const uint16\_t ch\_im\_in, const q7\_t \*wt, const uint16\_t ch\_im\_out, const uint16\_t dim\_kernel\_x, const uint16\_t dim\_kernel\_y, const uint16\_t padding\_x, const uint16\_t padding\_y, const uint16\_t stride\_x, const uint16\_t stride\_y, const q7\_t \*bias, const uint16\_t bias\_shift, const uint16\_t out\_shift, q7\_t \*Im\_out, const uint16\_t dim\_im\_out\_x, const uint16\_t dim\_im\_out\_y, q15\_t \*bufferA, q7\_t \*bufferB)

Fast Q7 convolution function (non-square shape)

This function is the version with the full list of optimization tricks, but with some constraints: ch\_im\_in is a multiple of 4 and ch\_im\_out is a multiple of 2.

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in\_x [in] input tensor dimension x
- dim\_im\_in\_y [in] input tensor dimension y
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel\_x [in] filter kernel size x
- dim\_kernel\_y [in] filter kernel size y
- padding\_x [in] padding size x
- padding\_y [in] padding size y
- stride\_x [in] convolution stride x
- stride\_y [in] convolution stride y
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out\_x [in] output tensor dimension x
- dim\_im\_out\_y [in] output tensor dimension y
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

```
riscv_status riscv_convolve_HWC_q7_RGB (const q7_t *Im_in, const uint16_t dim_im_in, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel, const uint16_t padding, const uint16_t stride, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out, q15_t *bufferA, q7_t *bufferB)
```

Q7 convolution function for RGB image.


#### **Buffer size:**

bufferA size: 2\*ch\_im\_in\*dim\_kernel\*dim\_kernel

bufferB size: 0

#### **Input dimension constraints:**

ch\_im\_in equals 3

This kernel is written exclusively for convolution with ch\_im\_in equal to 3. This applies to the first layer of CNNs whose input is an image in RGB format.

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes
- stride [in] convolution stride
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

```
riscv_status riscv_convolve_s8 (const nmsis_nn_context *ctx, const nmsis_nn_conv_params *conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input_data, const nmsis_nn_dims *filter_dims, const q7_t *filter_data, const nmsis_nn_dims *bias_dims, const int32_t *bias_data, const nmsis_nn_dims *output_dims, q7_t *output_data)
```

Basic s8 convolution function.

- a. Supported framework: TensorFlow Lite Micro
- b. q7 is used as the data type even though the data is s8. This is done to be consistent with existing APIs.
- c. Additional memory is required for optimization. Refer to argument 'ctx' for details.

#### **Parameters**

- ctx [inout] Function context that contains the additional buffer if required by the function. riscv\_convolve\_s8\_get\_buffer\_size will return the buffer\_size if required
- conv\_params [in] Convolution parameters (e.g. strides, dilations, pads,...). Range of conv\_params->input\_offset: [-127, 128] Range of conv\_params->output\_offset: [-128, 127]
- quant\_params [in] Per-channel quantization info. It contains the multiplier and shift values to be applied to each output channel
- input\_dims [in] Input (activation) tensor dimensions. Format: [N, H, W, C\_IN]
- input\_data [in] Input (activation) data pointer. Data type: int8
- **filter\_dims [in]** Filter tensor dimensions. Format: [C\_OUT, HK, WK, C\_IN] where HK and WK are the spatial filter dimensions
- filter\_data [in] Filter data pointer. Data type: int8
- bias\_dims [in] Bias tensor dimensions. Format: [C\_OUT]
- bias\_data [in] Optional bias data pointer. Data type: int32
- output\_dims [in] Output tensor dimensions. Format: [N, H, W, C\_OUT]
- output\_data [out] Output data pointer. Data type: int8

**Returns** The function returns RISCV\_MATH\_SUCCESS
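The per-channel multiplier and shift in quant\_params scale each 32-bit accumulator back to the int8 output range. The sketch below illustrates that step under stated assumptions: the multiplier is treated as a Q31 fixed-point value, a positive shift means a left shift (the TensorFlow Lite convention), and rounding is half away from zero. The function names are hypothetical, and the rounding used by the library may differ in corner cases.

```c
#include <assert.h>
#include <stdint.h>

/* Scale acc by multiplier * 2^shift / 2^31 with rounding (sketch only). */
static int32_t requantize_sketch(int32_t acc, int32_t multiplier, int32_t shift)
{
    const int total_shift = 31 - shift;
    assert(total_shift >= 1 && total_shift <= 62);
    const int64_t prod = (int64_t)acc * (int64_t)multiplier;
    const int64_t mag = prod < 0 ? -prod : prod;
    const int64_t q = (mag + ((int64_t)1 << (total_shift - 1))) >> total_shift;
    return (int32_t)(prod < 0 ? -q : q);
}

/* Add the output offset and clamp to the activation range. */
static int8_t quantize_output_s8(int32_t acc, int32_t multiplier, int32_t shift,
                                 int32_t output_offset,
                                 int32_t act_min, int32_t act_max)
{
    int32_t v = requantize_sketch(acc, multiplier, shift) + output_offset;
    if (v < act_min) v = act_min;
    if (v > act_max) v = act_max;
    return (int8_t)v;
}
```

With a multiplier of `1 << 30` (0.5 in Q31) and a shift of 0, an accumulator of 100 maps to 50 before the offset is added.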

int32\_t riscv\_convolve\_s8\_get\_buffer\_size (const nmsis\_nn\_dims \*input\_dims, const nmsis\_nn\_dims \*filter\_dims)

Get the required buffer size for s8 convolution function.

#### **Parameters**

- input\_dims [in] Input (activation) tensor dimensions. Format: [N, H, W, C\_IN]
- filter\_dims [in] Filter tensor dimensions. Format: [C\_OUT, HK, WK, C\_IN] where HK and WK are the spatial filter dimensions

**Returns** The function returns the required buffer size in bytes

```
riscv_status riscv_convolve_wrapper_s8 (const nmsis_nn_context *ctx, const nmsis_nn_conv_params *conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input_data, const nmsis_nn_dims *filter_dims, const q7_t *filter_data, const nmsis_nn_dims *bias_dims, const int32_t *bias_data, const nmsis_nn_dims *output_dims, q7_t *output_data)
```

s8 convolution layer wrapper function whose main purpose is to call the optimal kernel available in NMSIS-NN to perform the convolution.

#### **Parameters**

- ctx [inout] Function context that contains the additional buffer if required by the function. riscv\_convolve\_wrapper\_s8\_get\_buffer\_size will return the buffer\_size if required
- conv\_params [in] Convolution parameters (e.g. strides, dilations, pads,...). Range of conv\_params->input\_offset: [-127, 128] Range of conv\_params->output\_offset: [-128, 127]
- quant\_params [in] Per-channel quantization info. It contains the multiplier and shift values to be applied to each output channel
- input\_dims [in] Input (activation) tensor dimensions. Format: [N, H, W, C\_IN]
- input\_data [in] Input (activation) data pointer. Data type: int8
- **filter\_dims [in]** Filter tensor dimensions. Format: [C\_OUT, HK, WK, C\_IN] where HK and WK are the spatial filter dimensions
- filter\_data [in] Filter data pointer. Data type: int8
- bias\_dims [in] Bias tensor dimensions. Format: [C\_OUT]
- bias\_data [in] Bias data pointer. Data type: int32
- output\_dims [in] Output tensor dimensions. Format: [N, H, W, C\_OUT]
- output\_data [out] Output data pointer. Data type: int8

**Returns** The function returns RISCV\_MATH\_SIZE\_MISMATCH if the argument constraints fail, or RISCV\_MATH\_SUCCESS on successful completion.
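The kind of selection such a wrapper performs can be sketched from the argument constraints documented for the 1x1 and 1xN kernels in this section. Everything below is illustrative: the struct and enum types are hypothetical stand-ins, not the NMSIS-NN types, and the order and exact conditions tested by the real wrapper are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t n, h, w, c; } dims_t;          /* stand-in for nmsis_nn_dims */
typedef struct { int32_t pad_w, pad_h, stride_w, stride_h; } conv_cfg_t;
typedef enum { KERNEL_1X1_FAST, KERNEL_1_X_N, KERNEL_GENERIC } kernel_choice_t;

/* Pick the most specialized kernel whose documented constraints hold. */
static kernel_choice_t choose_conv_kernel(const conv_cfg_t *cfg, const dims_t *in,
                                          const dims_t *filter, const dims_t *out)
{
    assert(cfg && in && filter && out);
    /* 1x1 fast kernel: no padding, unit stride, input channels multiple of 4. */
    if (cfg->pad_w == 0 && cfg->pad_h == 0 &&
        cfg->stride_w == 1 && cfg->stride_h == 1 &&
        in->c % 4 == 0 && filter->w == 1 && filter->h == 1) {
        return KERNEL_1X1_FAST;
    }
    /* 1xN kernel: batch of 1, all heights equal 1, output width multiple of 4. */
    if (in->n == 1 && in->h == 1 && out->h == 1 && filter->h == 1 &&
        out->w % 4 == 0) {
        return KERNEL_1_X_N;
    }
    /* Otherwise fall back to the basic s8 convolution. */
    return KERNEL_GENERIC;
}
```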

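int32\_t riscv\_convolve\_wrapper\_s8\_get\_buffer\_size (const nmsis\_nn\_conv\_params \*conv\_params, const nmsis\_nn\_dims \*input\_dims, const nmsis\_nn\_dims \*filter\_dims, const nmsis\_nn\_dims \*output\_dims)

Get the required buffer size for riscv\_convolve\_wrapper\_s8.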

#### **Parameters**

- conv\_params [in] Convolution parameters (e.g. strides, dilations, pads,...). Range of conv\_params->input\_offset: [-127, 128] Range of conv\_params->output\_offset: [-128, 127]
- input\_dims [in] Input (activation) dimensions. Format: [N, H, W, C\_IN]
- **filter\_dims [in]** Filter dimensions. Format: [C\_OUT, HK, WK, C\_IN] where HK and WK are the spatial filter dimensions
- output\_dims [in] Output tensor dimensions. Format: [N, H, W, C\_OUT]

**Returns** The function returns required buffer size(bytes)

```
riscv_status riscv_depthwise_conv_3x3_s8 (const nmsis_nn_context *ctx, const nmsis_nn_dw_conv_params *dw_conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input, const nmsis_nn_dims *filter_dims, const q7_t *kernel, const nmsis_nn_dims *bias_dims, const int32_t *bias, const nmsis_nn_dims *output_dims, q7_t *output)
```

Optimized s8 depthwise convolution function for 3x3 kernel size with some constraints on the input arguments (documented below). Refer to riscv\_depthwise\_conv\_s8() for function argument details.

- Supported framework: TensorFlow Lite Micro
- The following constraints on the arguments apply:
  - a. Number of input channels equals number of output channels
  - b. Filter height and width equal 3
  - c. Padding along x is either 0 or 1.

**Returns** The function returns one of the following:

- RISCV\_MATH\_SIZE\_MISMATCH - Unsupported dimension of tensors
- RISCV\_MATH\_ARGUMENT\_ERROR - Unsupported pad size along the x axis
- RISCV\_MATH\_SUCCESS - Successful operation

static void depthwise\_conv\_s8\_mult\_4 (const int8\_t \*input, const int32\_t input\_x, const int32\_t input\_y, const int32\_t input\_ch, const int8\_t \*kernel, const int32\_t output\_ch, const int32\_t ch\_mult, const int32\_t kernel\_x, const int32\_t kernel\_y, const int32\_t pad\_x, const int32\_t pad\_y, const int32\_t stride\_x, const int32\_t stride\_y, const int32\_t \*bias, int8\_t \*output, const int32\_t \*output\_shift, const int32\_t \*output\_mult, const int32\_t output\_x, const int32\_t output\_y, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

static void depthwise\_conv\_s8\_generic (const q7\_t \*input, const uint16\_t input\_batches, const uint16\_t input\_x, const uint16\_t input\_y, const uint16\_t input\_ch, const q7\_t \*kernel, const uint16\_t output\_ch, const uint16\_t ch\_mult, const uint16\_t kernel\_x, const uint16\_t kernel\_y, const uint16\_t pad\_x, const uint16\_t pad\_y, const uint16\_t stride\_x, const uint16\_t stride\_y, const int32\_t \*bias, q7\_t \*output, const int32\_t \*output\_shift, const int32\_t \*output\_mult, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

riscv\_status riscv\_depthwise\_conv\_s8 (const nmsis\_nn\_context \*ctx, const nmsis\_nn\_dw\_conv\_params \*dw\_conv\_params, const nmsis\_nn\_per\_channel\_quant\_params \*quant\_params, const nmsis\_nn\_dims \*input\_dims, const q7\_t \*input, const nmsis\_nn\_dims \*filter\_dims, const q7\_t \*kernel, const nmsis\_nn\_dims \*bias\_dims, const int32\_t \*bias, const nmsis\_nn\_dims \*output\_dims, q7\_t \*output)

Basic s8 depthwise convolution function that doesn't have any constraints on the input dimensions.

- Supported framework: TensorFlow Lite
- q7 is used as the data type even though the data is s8. This is done to be consistent with existing APIs.

#### **Parameters**

- ctx [inout] Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. The optional function {API}\_get\_buffer\_size() provides the buffer size if an additional buffer is required.
- dw\_conv\_params [in] Depthwise convolution parameters (e.g. strides, dilations, pads,...). dw\_conv\_params->dilation is not used. Range of dw\_conv\_params->input\_offset: [-127, 128] Range of dw\_conv\_params->output\_offset: [-128, 127]
- quant\_params [in] Per-channel quantization info. It contains the multiplier and shift values to be applied to each output channel
- input\_dims [in] Input (activation) tensor dimensions. Format: [1, H, W, C\_IN] Batch argument N is not used.
- input\_data [in] Input (activation) data pointer. Data type: int8
- filter\_dims [in] Filter tensor dimensions. Format: [1, H, W, C\_OUT]
- filter\_data [in] Filter data pointer. Data type: int8
- bias\_dims [in] Bias tensor dimensions. Format: [C\_OUT]
- bias\_data [in] Bias data pointer. Data type: int32
- output\_dims [in] Output tensor dimensions. Format: [1, H, W, C\_OUT]
- output\_data [inout] Output data pointer. Data type: int8

**Returns** The function returns RISCV\_MATH\_SUCCESS
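For orientation, a depthwise convolution with a channel multiplier can be written as the reference loop below. It is a hedged sketch, not the library kernel: the function name is hypothetical, the kernel is indexed as [H, W, C\_OUT] following the filter\_dims format above, output channel `ic * ch_mult + m` is assumed to belong to input channel `ic`, and the raw 32-bit accumulators are returned without requantization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: each input channel feeds ch_mult output channels. */
static void depthwise_conv_s8_reference(const int8_t *input, int in_h, int in_w, int in_ch,
                                        const int8_t *kernel, int k_h, int k_w, int ch_mult,
                                        int pad_y, int pad_x, int stride_y, int stride_x,
                                        const int32_t *bias, int32_t input_offset,
                                        int32_t *acc_out, int out_h, int out_w)
{
    const int out_ch = in_ch * ch_mult;
    assert(stride_x > 0 && stride_y > 0 && ch_mult > 0);
    for (int oy = 0; oy < out_h; oy++) {
        for (int ox = 0; ox < out_w; ox++) {
            for (int ic = 0; ic < in_ch; ic++) {
                for (int m = 0; m < ch_mult; m++) {
                    const int oc = ic * ch_mult + m;
                    int32_t acc = bias ? bias[oc] : 0;
                    for (int ky = 0; ky < k_h; ky++) {
                        for (int kx = 0; kx < k_w; kx++) {
                            const int y = oy * stride_y - pad_y + ky;
                            const int x = ox * stride_x - pad_x + kx;
                            if (y < 0 || y >= in_h || x < 0 || x >= in_w) {
                                continue; /* padded positions contribute nothing */
                            }
                            /* The input offset is added before multiplying. */
                            acc += (input[(y * in_w + x) * in_ch + ic] + input_offset) *
                                   kernel[(ky * k_w + kx) * out_ch + oc];
                        }
                    }
                    acc_out[(oy * out_w + ox) * out_ch + oc] = acc;
                }
            }
        }
    }
}
```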

```
riscv_status riscv_depthwise_conv_s8_opt (const nmsis_nn_context *ctx, const nmsis_nn_dw_conv_params *dw_conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input, const nmsis_nn_dims *filter_dims, const q7_t *kernel, const nmsis_nn_dims *bias_dims, const int32_t *bias, const nmsis_nn_dims *output_dims, q7_t *output)
```

Optimized s8 depthwise convolution function with the constraint that in\_channel equals out\_channel. Refer to riscv\_depthwise\_conv\_s8() for function argument details.

- Supported framework: TensorFlow Lite
- The following constraints on the arguments apply:
  - a. Number of input channels equals number of output channels, i.e., ch\_mult equals 1
- q7 is used as the data type even though the data is s8. This is done to be consistent with existing APIs.
- Recommended when the number of channels is 4 or greater.

**Note:** If the number of channels is not a multiple of 4, up to 3 elements outside the boundary will be read for the following buffers if MVE optimizations (Arm Helium Technology) are used:

- Output shift
- Output multiplier
- Output bias
- kernel

**Returns** The function returns one of the following:

- RISCV\_MATH\_SIZE\_MISMATCH - input channel != output channel or ch\_mult != 1
- RISCV\_MATH\_SUCCESS - Successful operation

```
int32_t riscv_depthwise_conv_s8_opt_get_buffer_size (const nmsis_nn_dims *input_dims, const nmsis_nn_dims *filter_dims)
```

Get the required buffer size for the optimized s8 depthwise convolution function with the constraint that in\_channel equals out\_channel.

#### **Parameters**

- input\_dims [in] Input (activation) tensor dimensions. Format: [1, H, W, C\_IN] Batch argument N is not used.
- filter\_dims [in] Filter tensor dimensions. Format: [1, H, W, C\_OUT]

**Returns** The function returns required buffer size in bytes

static void depthwise\_conv\_u8\_mult\_4 (const uint8\_t \*input, const int32\_t input\_x, const int32\_t input\_y, const int32\_t input\_ch, const uint8\_t \*kernel, const int32\_t output\_ch, const int32\_t ch\_mult, const int32\_t kernel\_x, const int32\_t kernel\_y, const int32\_t pad\_x, const int32\_t pad\_y, const int32\_t stride\_x, const int32\_t stride\_y, const int32\_t \*bias, uint8\_t \*output, const int32\_t output\_shift, const int32\_t output\_mult, const int32\_t output\_x, const int32\_t output\_y, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t filter\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

static void depthwise\_conv\_u8\_generic (const uint8\_t \*input, const int32\_t input\_x, const int32\_t input\_y, const int32\_t input\_ch, const uint8\_t \*kernel, const int32\_t output\_ch, const int32\_t ch\_mult, const int32\_t kernel\_x, const int32\_t kernel\_y, const int32\_t pad\_x, const int32\_t pad\_y, const int32\_t stride\_x, const int32\_t stride\_y, const int32\_t \*bias, uint8\_t \*output, const int32\_t output\_shift, const int32\_t output\_mult, const int32\_t output\_x, const int32\_t output\_y, const int32\_t output\_offset, const int32\_t input\_offset, const int32\_t filter\_offset, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max)

riscv\_status riscv\_depthwise\_conv\_u8\_basic\_ver1 (const uint8\_t \*input, const uint16\_t input\_x, const uint16\_t input\_y, const uint16\_t input\_ch, const uint8\_t \*kernel, const uint16\_t kernel\_x, const uint16\_t kernel\_y, const int16\_t ch\_mult, const int16\_t pad\_x, const int16\_t pad\_y, const int16\_t stride\_x, const int16\_t stride\_y, const int16\_t dilation\_x, const int16\_t dilation\_y, const int32\_t \*bias, const int32\_t input\_offset, const int32\_t filter\_offset, const int32\_t output\_offset, uint8\_t \*output, const uint16\_t output\_x, const uint16\_t output\_y, const int32\_t output\_activation\_min, const int32\_t output\_activation\_max, const int32\_t output\_shift, const int32\_t output\_mult)

uint8 depthwise convolution function with asymmetric quantization. Unless specified otherwise, arguments are mandatory.

#### **Parameters**

- input [in] Pointer to input tensor
- input\_x [in] Width of input tensor
- input\_y [in] Height of input tensor
- input\_ch [in] Channels in input tensor
- kernel [in] Pointer to kernel weights
- kernel\_x [in] Width of kernel
- kernel\_y [in] Height of kernel
- ch\_mult [in] Number of channel multiplier
- pad\_x [in] Padding sizes x
- pad\_y [in] Padding sizes y
- stride\_x [in] Convolution stride along the width
- stride\_y [in] Convolution stride along the height
- dilation\_x [in] Dilation along width. Not used and intended for future enhancement.
- dilation\_y [in] Dilation along height. Not used and intended for future enhancement.
- bias [in] Pointer to optional bias values. If no bias is available, NULL is expected
- input\_offset [in] Input tensor zero offset
- filter\_offset [in] Kernel tensor zero offset
- output\_offset [in] Output tensor zero offset
- output [inout] Pointer to output tensor

- output\_x [in] Width of output tensor
- output\_y [in] Height of output tensor
- output\_activation\_min [in] Minimum value to clamp the output to. Range: {0, 255}
- output\_activation\_max [in] Maximum value to clamp the output to. Range: {0, 255}
- output\_shift [in] Amount of right-shift for output
- output\_mult [in] Output multiplier for requantization

**Returns** The function returns one of the following:

- RISCV\_MATH\_SIZE\_MISMATCH - Not supported dimension of tensors
- RISCV\_MATH\_SUCCESS - Successful operation
- RISCV\_MATH\_ARGUMENT\_ERROR - Implementation not available

```
riscv_status riscv_depthwise_conv_wrapper_s8 (const nmsis_nn_context *ctx, const nmsis_nn_dw_conv_params *dw_conv_params, const nmsis_nn_per_channel_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input, const nmsis_nn_dims *filter_dims, const q7_t *filter, const nmsis_nn_dims *bias_dims, const int32_t *bias, const nmsis_nn_dims *output_dims, q7_t *output)
```

Wrapper function to pick the right optimized s8 depthwise convolution function.

- Supported framework: TensorFlow Lite
- Picks one of the following functions:
  - a. riscv\_depthwise\_conv\_s8()
  - b. riscv\_depthwise\_conv\_3x3\_s8() (RISC-V CPUs with DSP extension only)
  - c. riscv\_depthwise\_conv\_s8\_opt()
- q7 is used as the data type even though the data is s8. This is done to be consistent with existing APIs.
- Check the details of riscv\_depthwise\_conv\_s8\_opt() for potential data that can be accessed outside of the boundary.

#### **Parameters**

- ctx [inout] Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. Optional function {API}\_get\_buffer\_size() provides the buffer size if required.
- dw\_conv\_params [in] Depthwise convolution parameters (e.g. strides, dilations, pads,...) dw\_conv\_params->dilation is not used. Range of dw\_conv\_params->input\_offset: [-127, 128] Range of dw\_conv\_params->output\_offset: [-128, 127]
- quant\_params [in] Per-channel quantization info. It contains the multiplier and shift values to be applied to each output channel
- input\_dims [in] Input (activation) tensor dimensions. Format: [H, W, C\_IN] Batch argument N is not used and assumed to be 1.
- input\_data [in] Input (activation) data pointer. Data type: int8
- filter\_dims [in] Filter tensor dimensions. Format: [1, H, W, C\_OUT]

- filter\_data [in] Filter data pointer. Data type: int8
- bias\_dims [in] Bias tensor dimensions. Format: [C\_OUT]
- bias\_data [in] Bias data pointer. Data type: int32
- output\_dims [in] Output tensor dimensions. Format: [1, H, W, C\_OUT]
- output\_data [inout] Output data pointer. Data type: int8

Returns The function returns RISCV\_MATH\_SUCCESS - Successful completion.

Get size of additional buffer required by riscv\_depthwise\_conv\_wrapper\_s8()

#### **Parameters**

- dw\_conv\_params [in] Depthwise convolution parameters (e.g. strides, dilations, pads,...) dw\_conv\_params->dilation is not used. Range of dw\_conv\_params->input\_offset: [-127, 128] Range of dw\_conv\_params->output\_offset: [-128, 127]
- input\_dims [in] Input (activation) tensor dimensions. Format: [H, W, C\_IN] Batch argument N is not used and assumed to be 1.
- filter\_dims [in] Filter tensor dimensions. Format: [1, H, W, C\_OUT]
- output\_dims [in] Output tensor dimensions. Format: [1, H, W, C\_OUT]

Returns Size of additional memory required for optimizations in bytes.

```
riscv_status riscv_depthwise_separable_conv_HWC_q7 (const q7_t *Im_in, const uint16_t dim_im_in, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel, const uint16_t padding, const uint16_t stride, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out, q15_t *bufferA, q7_t *bufferB)
```

Q7 depthwise separable convolution function.

### **Buffer size:**

bufferA size: 2\*ch\_im\_in\*dim\_kernel\*dim\_kernel

bufferB size: 0

### **Input dimension constraints:**

ch\_im\_in equals ch\_im\_out

Implementation: There are 3 nested loops here:

- Inner loop: calculate each output value with MAC instructions over an accumulator
- Mid loop: loop over the different output channels
- Outer loop: loop over the different output (x, y) positions

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes
- stride [in] convolution stride
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.
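
The three-loop structure described above can be sketched in plain C. This is a minimal, unoptimized reference under stated assumptions, not the library implementation: the name `dw_conv_hwc_q7_ref`, the rounding term, and the saturation helper are illustrative, and the weight layout is assumed to be HWC with one filter per channel.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit accumulator to the signed 8-bit range. */
static int8_t sat_s8(int32_t v)
{
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

/* Hypothetical reference routine: square input, square kernel, HWC layout,
 * ch_im_in == ch_im_out (one filter per channel). */
static void dw_conv_hwc_q7_ref(const int8_t *in, int dim_in, int ch,
                               const int8_t *wt, int dim_k, int pad, int stride,
                               const int8_t *bias, int bias_shift, int out_shift,
                               int8_t *out, int dim_out)
{
    assert(dim_in > 0 && ch > 0 && dim_k > 0 && stride > 0 && dim_out > 0);
    const int32_t rnd = (1 << out_shift) >> 1; /* assumed round-to-nearest term */

    for (int oy = 0; oy < dim_out; oy++) {
        for (int ox = 0; ox < dim_out; ox++) {        /* outer loop: output (x, y) */
            for (int c = 0; c < ch; c++) {            /* mid loop: output channel */
                int32_t acc = (int32_t)bias[c] * (1 << bias_shift) + rnd;
                for (int ky = 0; ky < dim_k; ky++) {  /* inner loop: MAC over window */
                    for (int kx = 0; kx < dim_k; kx++) {
                        int iy = oy * stride + ky - pad;
                        int ix = ox * stride + kx - pad;
                        if (iy < 0 || iy >= dim_in || ix < 0 || ix >= dim_in)
                            continue;                 /* zero padding */
                        acc += in[(iy * dim_in + ix) * ch + c] *
                               wt[(ky * dim_k + kx) * ch + c];
                    }
                }
                /* >> on a negative value is assumed to be an arithmetic shift. */
                out[(oy * dim_out + ox) * ch + c] = sat_s8(acc >> out_shift);
            }
        }
    }
}
```

With a 3x3 single-channel input of ones, a 3x3 kernel of ones, padding 1 and stride 1, this sketch yields 4 at the corners, 6 on the edges and 9 in the center.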

```
riscv_status riscv_depthwise_separable_conv_HWC_q7_nonsquare (const q7_t *Im_in, const uint16_t dim_im_in_x, const uint16_t dim_im_in_y, const uint16_t ch_im_in, const q7_t *wt, const uint16_t ch_im_out, const uint16_t dim_kernel_x, const uint16_t dim_kernel_y, const uint16_t padding_x, const uint16_t padding_y, const uint16_t stride_x, const uint16_t stride_y, const q7_t *bias, const uint16_t bias_shift, const uint16_t out_shift, q7_t *Im_out, const uint16_t dim_im_out_x, const uint16_t dim_im_out_y, q15_t *bufferA, q7_t *bufferB)
```

Q7 depthwise separable convolution function (non-square shape)

This function is the version with the full list of optimization tricks, but with one constraint: ch\_im\_in is equal to ch\_im\_out

#### **Parameters**

- Im\_in [in] pointer to input tensor
- dim\_im\_in\_x [in] input tensor dimension x
- dim\_im\_in\_y [in] input tensor dimension y
- ch\_im\_in [in] number of input tensor channels
- wt [in] pointer to kernel weights
- ch\_im\_out [in] number of filters, i.e., output tensor channels
- dim\_kernel\_x [in] filter kernel size x
- dim\_kernel\_y [in] filter kernel size y
- padding\_x [in] padding sizes x
- padding\_y [in] padding sizes y
- stride\_x [in] convolution stride x
- stride\_y [in] convolution stride y
- bias [in] pointer to bias
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- Im\_out [inout] pointer to output tensor
- dim\_im\_out\_x [in] output tensor dimension x
- dim\_im\_out\_y [in] output tensor dimension y
- bufferA [inout] pointer to buffer space for input
- bufferB [inout] pointer to buffer space for output

**Returns** The function returns either RISCV\_MATH\_SIZE\_MISMATCH or RISCV\_MATH\_SUCCESS based on the outcome of size checking.

## **Fully-connected Layer Functions**

```
riscv_status riscv_fully_connected_mat_q7_vec_q15 (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_mat_q7_vec_q15_opt (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q15 (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q15_opt (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q7 (const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q7_opt (const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
```

riscv\_status riscv\_fully\_connected\_s8 (const nmsis\_nn\_context \*ctx, const nmsis\_nn\_fc\_params \*fc\_params, const nmsis\_nn\_per\_tensor\_quant\_params \*quant\_params, const nmsis\_nn\_dims \*input\_dims, const q7\_t \*input, const nmsis\_nn\_dims \*filter\_dims, const q7\_t \*kernel, const nmsis\_nn\_dims \*bias\_dims, const int32\_t \*bias, const nmsis\_nn\_dims \*output\_dims, q7\_t \*output)

int32\_t riscv\_fully\_connected\_s8\_get\_buffer\_size (const nmsis\_nn\_dims \*filter\_dims)

#### USE\_INTRINSIC

group FC

Collection of fully-connected and matrix multiplication functions.

Fully-connected layer is basically a matrix-vector multiplication with bias. The matrix is the weights and the input/output vectors are the activation values. Supported {weight, activation} precisions include {8-bit, 8-bit}, {16-bit, 16-bit}, and {8-bit, 16-bit}.

Two types of kernel functions are provided. The basic functions use a regular GEMV approach. The opt functions operate on weights stored in interleaved formats.
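
As an illustration of the basic GEMV approach, the following plain-C sketch computes a q7 fully-connected layer as a matrix-vector product with bias. It is a hedged reference only: the function name `fully_connected_q7_ref`, the rounding term, and the saturation step are assumptions made for this example and are not taken from the library source.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit accumulator to the signed 8-bit range. */
static int8_t sat_s8(int32_t v)
{
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

/* Hypothetical reference: pM is a row-major num_of_rows x dim_vec matrix. */
static void fully_connected_q7_ref(const int8_t *pV, const int8_t *pM,
                                   int dim_vec, int num_of_rows,
                                   int bias_shift, int out_shift,
                                   const int8_t *bias, int8_t *pOut)
{
    assert(dim_vec > 0 && num_of_rows > 0);
    const int32_t rnd = (1 << out_shift) >> 1; /* assumed round-to-nearest term */

    for (int r = 0; r < num_of_rows; r++) {
        /* Start from the left-shifted bias, then accumulate the dot product. */
        int32_t acc = (int32_t)bias[r] * (1 << bias_shift) + rnd;
        for (int c = 0; c < dim_vec; c++)
            acc += pV[c] * pM[r * dim_vec + c];
        /* >> on a negative value is assumed to be an arithmetic shift. */
        pOut[r] = sat_s8(acc >> out_shift);
    }
}
```

The opt variants compute the same result but expect pM to be stored in the interleaved order documented for each function.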

#### **Defines**

#### USE\_INTRINSIC

Mixed Q15-Q7 opt fully-connected layer function.

#### **Buffer size:**

vec\_buffer size: 0

Q7\_Q15 version of the fully connected layer

Weights are in q7\_t and Activations are in q15\_t

Limitation: x4 version requires weight reordering to work

Here we use only one pointer to read 4 rows in the weight matrix. So if the original q7\_t matrix looks like this:

```
| a11 | a12 | a13 | a14 | a15 | a16 | a17 |
| a21 | a22 | a23 | a24 | a25 | a26 | a27 |
| a31 | a32 | a33 | a34 | a35 | a36 | a37 |
| a41 | a42 | a43 | a44 | a45 | a46 | a47 |
| a51 | a52 | a53 | a54 | a55 | a56 | a57 |
| a61 | a62 | a63 | a64 | a65 | a66 | a67 |
```

We operate on multiple-of-4 rows, so the first four rows become:

```
| a11 | a21 | a12 | a22 | a31 | a41 | a32 | a42 |
| a13 | a23 | a14 | a24 | a33 | a43 | a34 | a44 |
| a15 | a25 | a16 | a26 | a35 | a45 | a36 | a46 |
```

The column left over will be in-order, which is:

```
| a17 | a27 | a37 | a47 |
```

For the left-over rows, we do 1x1 computation, so the data remains in its original order.

So the stored weight matrix looks like this:

```
| a11 | a21 | a12 | a22 | a31 | a41 |
| a32 | a42 | a13 | a23 | a14 | a24 |
| a33 | a43 | a34 | a44 | a15 | a25 |
| a16 | a26 | a35 | a45 | a36 | a46 |
| a17 | a27 | a37 | a47 | a51 | a52 |
| a53 | a54 | a55 | a56 | a57 | a61 |
| a62 | a63 | a64 | a65 | a66 | a67 |
```

### **Parameters**

- pV [in] pointer to input vector
- pM [in] pointer to matrix weights
- dim\_vec [in] length of the vector
- num\_of\_rows [in] number of rows in weight matrix
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- bias [in] pointer to bias
- pOut [inout] pointer to output vector
- vec\_buffer [inout] pointer to buffer space for input

Returns The function returns RISCV\_MATH\_SUCCESS
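
The reordering described above can be reproduced offline with a small helper. The sketch below is an assumption-based illustration, not a library function: `reorder_mat_q7_vec_q15_opt` is a hypothetical name, and the rule (blocks of 4 rows, columns taken in pairs, a left-over odd column stored row by row) is inferred from the 6x7 example in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: src is a row-major rows x cols q7 matrix,
 * dst receives rows * cols values in the interleaved order. */
static void reorder_mat_q7_vec_q15_opt(const int8_t *src, int rows, int cols,
                                       int8_t *dst)
{
    assert(rows >= 0 && cols >= 0);
    int n = 0;
    int r = 0;

    for (; r + 4 <= rows; r += 4) {            /* blocks of 4 rows */
        const int8_t *r0 = src + (r + 0) * cols;
        const int8_t *r1 = src + (r + 1) * cols;
        const int8_t *r2 = src + (r + 2) * cols;
        const int8_t *r3 = src + (r + 3) * cols;
        int c = 0;
        for (; c + 2 <= cols; c += 2) {        /* column pairs */
            dst[n++] = r0[c]; dst[n++] = r1[c];
            dst[n++] = r0[c + 1]; dst[n++] = r1[c + 1];
            dst[n++] = r2[c]; dst[n++] = r3[c];
            dst[n++] = r2[c + 1]; dst[n++] = r3[c + 1];
        }
        for (; c < cols; c++) {                /* left-over column, in order */
            dst[n++] = r0[c]; dst[n++] = r1[c];
            dst[n++] = r2[c]; dst[n++] = r3[c];
        }
    }
    for (; r < rows; r++)                      /* left-over rows, unchanged */
        for (int c = 0; c < cols; c++)
            dst[n++] = src[r * cols + c];
}
```

Applied to the 6x7 example, the helper produces the same sequence as the stored weight matrix shown above, starting with a11, a21, a12, a22.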

### **Functions**

```
riscv_status riscv_fully_connected_mat_q7_vec_q15 (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
```

Mixed Q15-Q7 fully-connected layer function.

#### **Buffer size:**

vec\_buffer size: 0

Q7\_Q15 version of the fully connected layer

Weights are in q7\_t and Activations are in q15\_t

## **Parameters**

- pV [in] pointer to input vector
- pM [in] pointer to matrix weights
- dim\_vec [in] length of the vector
- num\_of\_rows [in] number of rows in weight matrix
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- bias [in] pointer to bias
- pOut [inout] pointer to output vector
- vec\_buffer [inout] pointer to buffer space for input

Returns The function returns RISCV\_MATH\_SUCCESS

```
riscv_status riscv_fully_connected_mat_q7_vec_q15_opt (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
```

Mixed Q15-Q7 opt fully-connected layer function.

#### **Parameters**

- pV [in] pointer to input vector
- pM [in] pointer to matrix weights
- dim\_vec [in] length of the vector
- num\_of\_rows [in] number of rows in weight matrix
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- bias [in] pointer to bias
- pOut [inout] pointer to output vector
- vec\_buffer [inout] pointer to buffer space for input

Returns The function returns RISCV\_MATH\_SUCCESS

```
riscv_status riscv_fully_connected_q15 (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
```

Q15 basic fully-connected layer function.

#### **Buffer size:**

vec\_buffer size: 0

#### **Parameters**

- pV [in] pointer to input vector
- pM [in] pointer to matrix weights
- dim\_vec [in] length of the vector
- num\_of\_rows [in] number of rows in weight matrix
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- bias [in] pointer to bias
- pOut [inout] pointer to output vector
- vec\_buffer [inout] pointer to buffer space for input

Returns The function returns RISCV\_MATH\_SUCCESS

```
riscv_status riscv_fully_connected_q15_opt (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
```

Q15 opt fully-connected layer function.

#### **Buffer size:**

vec\_buffer size: 0

Here we use only one pointer to read 4 rows in the weight matrix. So if the original matrix looks like this:

```
| a11 | a12 | a13 |
| a21 | a22 | a23 |
| a31 | a32 | a33 |
| a41 | a42 | a43 |
| a51 | a52 | a53 |
| a61 | a62 | a63 |
```

We operate on multiple-of-4 rows, so the first four rows become:

```
| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |
| a13 | a23 | a33 | a43 |
```

Remaining rows are kept in their original order.

So the stored weight matrix looks like this:

```
| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |
| a13 | a23 | a33 | a43 | a51 | a52 | a53 | a61 |
| a62 | a63 |
```

#### **Parameters**

- pV [in] pointer to input vector
- pM [in] pointer to matrix weights
- dim\_vec [in] length of the vector
- num\_of\_rows [in] number of rows in weight matrix
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- bias [in] pointer to bias
- pOut [inout] pointer to output vector
- vec\_buffer [inout] pointer to buffer space for input

Returns The function returns RISCV\_MATH\_SUCCESS
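
A possible offline reordering helper for the Q15 layout is sketched below. It is illustrative only: `reorder_q15_opt` is a hypothetical name, and the rule (blocks of 4 rows, each row contributing two adjacent columns at a time, a left-over odd column stored row by row) is inferred from the 6x3 example in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: src is a row-major rows x cols q15 matrix,
 * dst receives rows * cols values in the interleaved order. */
static void reorder_q15_opt(const int16_t *src, int rows, int cols, int16_t *dst)
{
    assert(rows >= 0 && cols >= 0);
    int n = 0;
    int r = 0;

    for (; r + 4 <= rows; r += 4) {                /* blocks of 4 rows */
        int c = 0;
        for (; c + 2 <= cols; c += 2)              /* column pairs */
            for (int k = 0; k < 4; k++) {          /* one pair from each row */
                dst[n++] = src[(r + k) * cols + c];
                dst[n++] = src[(r + k) * cols + c + 1];
            }
        for (; c < cols; c++)                      /* left-over column */
            for (int k = 0; k < 4; k++)
                dst[n++] = src[(r + k) * cols + c];
    }
    for (; r < rows; r++)                          /* left-over rows, unchanged */
        for (int c = 0; c < cols; c++)
            dst[n++] = src[r * cols + c];
}
```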

riscv\_status riscv\_fully\_connected\_q7 (const q7\_t \*pV, const q7\_t \*pM, const uint16\_t dim\_vec, const uint16\_t num\_of\_rows, const uint16\_t bias\_shift, const uint16\_t out\_shift, const q7\_t \*bias, q7\_t \*pOut, q15\_t \*vec\_buffer)

Q7 basic fully-connected layer function.

#### **Buffer size:**

```
vec_buffer size: dim_vec
```

This basic function is designed to work with regular weight matrix without interleaving.

#### **Parameters**

- pV [in] pointer to input vector
- pM [in] pointer to matrix weights
- dim\_vec [in] length of the vector
- num\_of\_rows [in] number of rows in weight matrix
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- bias [in] pointer to bias
- pOut [inout] pointer to output vector
- vec\_buffer [inout] pointer to buffer space for input

Returns The function returns RISCV\_MATH\_SUCCESS

```
riscv_status riscv_fully_connected_q7_opt (const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
```

Q7 opt fully-connected layer function.

#### **Buffer size:**

```
vec_buffer size: dim_vec
```

This opt function is designed to work with an interleaved weight matrix. The vector input is assumed to be in q7\_t format; we call the riscv\_q7\_to\_q15\_no\_shift\_shuffle function to expand it into q15\_t format with a certain weight re-ordering (refer to the function comments for more details). Here we use only one pointer to read 4 rows in the weight matrix. So if the original q7\_t matrix looks like this:

```
| a11 | a12 | a13 | a14 | a15 | a16 | a17 |
| a21 | a22 | a23 | a24 | a25 | a26 | a27 |
| a31 | a32 | a33 | a34 | a35 | a36 | a37 |
| a41 | a42 | a43 | a44 | a45 | a46 | a47 |
| a51 | a52 | a53 | a54 | a55 | a56 | a57 |
| a61 | a62 | a63 | a64 | a65 | a66 | a67 |
```

We operate on multiple-of-4 rows, so the first four rows become:

```
| a11 | a21 | a13 | a23 | a31 | a41 | a33 | a43 |
| a12 | a22 | a14 | a24 | a32 | a42 | a34 | a44 |
| a15 | a25 | a35 | a45 | a16 | a26 | a36 | a46 |
```

So within the kernel, we first read the re-ordered vector in as:

```
| b1 | b3 | and | b2 | b4 |
```

the four q31\_t weights will look like

```
| a11 | a13 |, | a21 | a23 |, | a31 | a33 |, | a41 | a43 |
| a12 | a14 |, | a22 | a24 |, | a32 | a34 |, | a42 | a44 |
```

The column left over will be in-order, which is:

```
| a17 | a27 | a37 | a47 |
```

For the left-over rows, we do 1x1 computation, so the data remains in its original order.

So the stored weight matrix looks like this:

```
| a11 | a21 | a13 | a23 | a31 | a41 |
| a33 | a43 | a12 | a22 | a14 | a24 |
| a32 | a42 | a34 | a44 | a15 | a25 |
| a35 | a45 | a16 | a26 | a36 | a46 |
| a17 | a27 | a37 | a47 | a51 | a52 |
| a53 | a54 | a55 | a56 | a57 | a61 |
| a62 | a63 | a64 | a65 | a66 | a67 |
```

#### **Parameters**

- pV [in] pointer to input vector
- pM [in] pointer to matrix weights
- dim\_vec [in] length of the vector
- num\_of\_rows [in] number of rows in weight matrix
- bias\_shift [in] amount of left-shift for bias
- out\_shift [in] amount of right-shift for output
- bias [in] pointer to bias
- pOut [inout] pointer to output vector
- vec\_buffer [inout] pointer to buffer space for input

Returns The function returns RISCV\_MATH\_SUCCESS
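
The Q7 interleaving can be prepared offline with a helper along the following lines. This is a sketch under assumptions: `reorder_q7_opt` is a hypothetical name, and the rule (blocks of 4 rows by 4 columns, columns c and c+2 stored before c+1 and c+3, left-over columns stored one column at a time) is inferred from the 6x7 example in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: src is a row-major rows x cols q7 matrix,
 * dst receives rows * cols values in the interleaved order. */
static void reorder_q7_opt(const int8_t *src, int rows, int cols, int8_t *dst)
{
    assert(rows >= 0 && cols >= 0);
    int n = 0;
    int r = 0;

    for (; r + 4 <= rows; r += 4) {                /* blocks of 4 rows */
        const int8_t *p[4] = {src + (r + 0) * cols, src + (r + 1) * cols,
                              src + (r + 2) * cols, src + (r + 3) * cols};
        int c = 0;
        for (; c + 4 <= cols; c += 4)              /* blocks of 4 columns */
            for (int k = 0; k < 2; k++) {          /* k=0: cols c, c+2; k=1: c+1, c+3 */
                dst[n++] = p[0][c + k];     dst[n++] = p[1][c + k];
                dst[n++] = p[0][c + k + 2]; dst[n++] = p[1][c + k + 2];
                dst[n++] = p[2][c + k];     dst[n++] = p[3][c + k];
                dst[n++] = p[2][c + k + 2]; dst[n++] = p[3][c + k + 2];
            }
        for (; c < cols; c++)                      /* left-over columns */
            for (int k = 0; k < 4; k++)
                dst[n++] = p[k][c];
    }
    for (; r < rows; r++)                          /* left-over rows, unchanged */
        for (int c = 0; c < cols; c++)
            dst[n++] = src[r * cols + c];
}
```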

```
riscv_status riscv_fully_connected_s8 (const nmsis_nn_context *ctx, const nmsis_nn_fc_params *fc_params, const nmsis_nn_per_tensor_quant_params *quant_params, const nmsis_nn_dims *input_dims, const q7_t *input, const nmsis_nn_dims *filter_dims, const q7_t *kernel, const nmsis_nn_dims *bias_dims, const int32_t *bias, const nmsis_nn_dims *output_dims, q7_t *output)
```

Basic s8 Fully Connected function.

- Supported framework: TensorFlow Lite
- q7 is used as the data type even though the data is s8. This is done to be consistent with existing APIs.

#### **Parameters**

- ctx [inout] Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. Optional function {API}\_get\_buffer\_size() provides the buffer size if an additional buffer is required.
- fc\_params [in] Fully Connected layer parameters (e.g. strides, dilations, pads,...) Range of fc\_params->input\_offset: [-127, 128] fc\_params->filter\_offset: 0 Range of fc\_params->output\_offset: [-128, 127]
- quant\_params [in] Per-tensor quantization info. It contains the multiplier and shift values to be applied to the output tensor.
- input\_dims [in] Input (activation) tensor dimensions. Format: [N, H, W, C\_IN] Input dimension is taken as Nx(H \* W \* C\_IN)
- input\_data [in] Input (activation) data pointer. Data type: int8
- filter\_dims [in] Two dimensional filter dimensions. Format: [N, C] N : accumulation depth and equals (H \* W \* C\_IN) from input\_dims C : output depth and equals C\_OUT in output\_dims H & W : Not used
- filter\_data [in] Filter data pointer. Data type: int8
- bias\_dims [in] Bias tensor dimensions. Format: [C\_OUT] N, H, W: Not used
- bias\_data [in] Bias data pointer. Data type: int32
- output\_dims [in] Output tensor dimensions. Format: [N, C\_OUT] N : Batches C\_OUT : Output depth H & W : Not used.
- output\_data [inout] Output data pointer. Data type: int8

Returns The function returns RISCV\_MATH\_SUCCESS

```
int32_t riscv_fully_connected_s8_get_buffer_size (const nmsis_nn_dims *filter_dims)
```

Get the required buffer size for S8 basic fully-connected and matrix multiplication layer function for TF Lite.

#### **Parameters**

- filter\_dims [in] dimension of filter

Returns The function returns required buffer size in bytes

## **Neural Network Pooling Functions**

```
riscv_status riscv_avgpool_s8 (const nmsis_nn_context *ctx, const nmsis_nn_pool_params *pool_params, const nmsis_nn_dims *input_dims, const q7_t *src, const nmsis_nn_dims *filter_dims, const nmsis_nn_dims *output_dims, q7_t *dst)
```

int32\_t riscv\_avgpool\_s8\_get\_buffer\_size (const int output\_x, const int ch\_src)

```
riscv_status riscv_max_pool_s8 (const nmsis_nn_context *ctx, const nmsis_nn_pool_params *pool_params, const nmsis_nn_dims *input_dims, const q7_t *src, const nmsis_nn_dims *filter_dims, const nmsis_nn_dims *output_dims, q7_t *dst)
```

void riscv\_maxpool\_q7\_HWC (q7\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const uint16\_t dim\_im\_out, q7\_t \*bufferA, q7\_t \*Im\_out)

void riscv\_avepool\_q7\_HWC (q7\_t \*Im\_in, const uint16\_t dim\_im\_in, const uint16\_t ch\_im\_in, const uint16\_t dim\_kernel, const uint16\_t padding, const uint16\_t stride, const uint16\_t dim\_im\_out, q7\_t \*bufferA, q7\_t \*Im\_out)

## group Pooling

Perform pooling functions, including max pooling and average pooling

## **Functions**

```
riscv_status riscv_avgpool_s8 (const nmsis_nn_context *ctx, const nmsis_nn_pool_params *pool_params, const nmsis_nn_dims *input_dims, const q7_t *src, const nmsis_nn_dims *filter_dims, const nmsis_nn_dims *output_dims, q7_t *dst)
```

s8 average pooling function.

- Supported Framework: TensorFlow Lite

#### **Parameters**

- ctx [inout] Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. Optional function {API}\_get\_buffer\_size() provides the buffer size if an additional buffer is required.
- pool\_params [in] Pooling parameters
- input\_dims [in] Input (activation) tensor dimensions. Format: [H, W, C\_IN] Argument 'N' is not used.
- input\_data [in] Input (activation) data pointer. Data type: int8
- filter\_dims [in] Filter tensor dimensions. Format: [H, W] Argument N and C are not used.
- output\_dims [in] Output tensor dimensions. Format: [H, W, C\_OUT] Argument N is not used. C\_OUT equals C\_IN.
- output\_data [inout] Output data pointer. Data type: int8

Returns The function returns RISCV\_MATH\_SUCCESS - Successful operation

int32\_t riscv\_avgpool\_s8\_get\_buffer\_size (const int output\_x, const int ch\_src)

Get the required buffer size for S8 average pooling function.

#### **Parameters**

- dim\_dst\_width [in] output tensor dimension
- ch\_src [in] number of input tensor channels

**Returns** The function returns required buffer size in bytes

```
riscv_status riscv_max_pool_s8 (const nmsis_nn_context *ctx, const nmsis_nn_pool_params *pool_params, const nmsis_nn_dims *input_dims, const q7_t *src, const nmsis_nn_dims *filter_dims, const nmsis_nn_dims *output_dims, q7_t *dst)
```

s8 max pooling function.

- Supported Framework: TensorFlow Lite

### **Parameters**

- ctx [inout] Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. Optional function {API}\_get\_buffer\_size() provides the buffer size if an additional buffer is required.
- pool\_params [in] Pooling parameters
- input\_dims [in] Input (activation) tensor dimensions. Format: [H, W, C\_IN] Argument 'N' is not used.

- input\_data [in] Input (activation) data pointer. Data type: int8
- filter\_dims [in] Filter tensor dimensions. Format: [H, W] Argument N and C are not used.
- output\_dims [in] Output tensor dimensions. Format: [H, W, C\_OUT] Argument N is not used. C\_OUT equals C\_IN.
- output\_data [inout] Output data pointer. Data type: int8

Returns The function returns RISCV\_MATH\_SUCCESS - Successful operation

```
void riscv_maxpool_q7_HWC (q7_t *Im_in, const uint16_t dim_im_in, const uint16_t ch_im_in, const uint16_t dim_kernel, const uint16_t padding, const uint16_t stride, const uint16_t dim_im_out, q7_t *bufferA, q7_t *Im_out)
```

Q7 max pooling function.

The pooling function is implemented as split x-pooling then y-pooling.

This pooling function is input-destructive. Input data is undefined after calling this function.

#### **Parameters**

- Im\_in [inout] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes
- stride [in] convolution stride
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] Not used
- Im\_out [inout] pointer to output tensor

```
void riscv_avepool_q7_HWC (q7_t *Im_in, const uint16_t dim_im_in, const uint16_t ch_im_in, const uint16_t dim_kernel, const uint16_t padding, const uint16_t stride, const uint16_t dim_im_out, q7_t *bufferA, q7_t *Im_out)
```

Q7 average pooling function.

## **Buffer size:**

bufferA size: 2\*dim\_im\_out\*ch\_im\_in

The pooling function is implemented as split x-pooling then y-pooling.

This pooling function is input-destructive. Input data is undefined after calling this function.

#### **Parameters**

- Im\_in [inout] pointer to input tensor
- dim\_im\_in [in] input tensor dimension
- ch\_im\_in [in] number of input tensor channels
- dim\_kernel [in] filter kernel size
- padding [in] padding sizes

- stride [in] convolution stride
- dim\_im\_out [in] output tensor dimension
- bufferA [inout] pointer to buffer space for input
- Im\_out [inout] pointer to output tensor
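The buffer-size rule for `riscv_avepool_q7_HWC` can be sketched as a small helper. The function name below is hypothetical (it is not part of NMSIS-NN), and the sketch assumes the documented size `2*dim_im_out*ch_im_in` is counted in `q7_t` elements, i.e. bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an NMSIS-NN API): scratch size for bufferA,
 * following the documented rule 2 * dim_im_out * ch_im_in.
 * ASSUMPTION: the result is a count of q7_t elements (bytes). */
static size_t avepool_q7_buffer_size(uint16_t dim_im_out, uint16_t ch_im_in)
{
    /* Widen before multiplying so the product cannot overflow 16 bits. */
    return (size_t)2 * dim_im_out * ch_im_in;
}
```

For a 16-wide output with 32 channels this gives 1024, so a caller could declare `q7_t bufferA[1024]` before invoking the pooling function.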

#### **Softmax Functions**

## **Functions**

```
void riscv_softmax_q15 (const q15_t *vec_in, const uint16_t dim_vec, q15_t *p_out)
```

Q15 softmax function.

Here, instead of the typical base-e softmax, a base-2 softmax is used, i.e.:

```
y_i = 2^(x_i) / sum(2^x_j)
```

The relative output values differ from those of a base-e softmax, but mathematically the gradient is the same up to a log(2) scaling factor.

### **Parameters**

- vec\_in [in] pointer to input vector
- dim\_vec [in] input vector dimension
- p\_out [out] pointer to output vector
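The base-2 formula can be checked with a small floating-point reference. This is only an illustrative sketch with a hypothetical function name: it takes plain integer exponents, whereas the library kernels operate on fixed-point data.

```c
#include <stddef.h>

/* Floating-point reference for y_i = 2^(x_i) / sum_j 2^(x_j) with integer
 * inputs. Hypothetical helper for illustration, not an NMSIS-NN API. */
static void softmax_base2_ref(const int *x, size_t n, double *y)
{
    if (n == 0) {
        return;
    }

    int max = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] > max) {
            max = x[i];
        }
    }

    /* 2^(x_i - max) keeps every term in (0, 1]; the common factor 2^max
     * cancels between numerator and denominator. */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        long long shift = (long long)max - x[i];
        /* Terms smaller than 2^-62 are treated as zero in this sketch. */
        y[i] = (shift < 63) ? 1.0 / (double)(1ULL << shift) : 0.0;
        sum += y[i];
    }
    for (size_t i = 0; i < n; i++) {
        y[i] /= sum;
    }
}
```

For inputs (0, 1, 2) the weights are 1/4, 1/2 and 1, so the outputs are 1/7, 2/7 and 4/7.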

```
void riscv_softmax_q7 (const q7_t *vec_in, const uint16_t dim_vec, q7_t *p_out)
```

Q7 softmax function.

Here, instead of the typical base-e softmax, a base-2 softmax is used, i.e.:

```
y_i = 2^(x_i) / sum(2^x_j)
```

The relative output values differ from those of a base-e softmax, but mathematically the gradient is the same up to a log(2) scaling factor.

#### **Parameters**

- vec\_in [in] pointer to input vector
- dim\_vec [in] input vector dimension
- p\_out [out] pointer to output vector
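The log(2) scaling factor follows from a short derivation. This is a sketch using standard calculus; the symbol $s_i$ for the base-e softmax and the Kronecker delta $\delta_{ik}$ are introduced here for illustration only.

$$2^{x} = e^{x \ln 2} \quad\Rightarrow\quad y_i = \frac{2^{x_i}}{\sum_j 2^{x_j}} = \frac{e^{x_i \ln 2}}{\sum_j e^{x_j \ln 2}} = s_i(x \ln 2)$$

$$\frac{\partial y_i}{\partial x_k} = \ln 2 \cdot y_i \left(\delta_{ik} - y_k\right)$$

The base-2 softmax is therefore a base-e softmax applied to inputs scaled by $\ln 2$, and its gradient has the usual softmax form multiplied by $\ln 2$.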

void riscv\_softmax\_s8 (const int8\_t \*input, const int32\_t num\_rows, const int32\_t row\_size, const int32\_t mult, const int32\_t shift, const int32\_t diff\_min, int8\_t \*output)

S8 softmax function.

**Note:** Supported framework: TensorFlow Lite micro (bit-accurate)

#### **Parameters**

- input [in] Pointer to the input tensor
- num\_rows [in] Number of rows in the input tensor
- row\_size [in] Number of elements in each input row
- mult [in] Input quantization multiplier
- shift [in] Input quantization shift within the range [0, 31]
- diff\_min [in] Minimum difference with max in row. Used to check if the quantized exponential operation can be performed
- output [out] Pointer to the output tensor

void riscv\_softmax\_u8 (const uint8\_t \*input, const int32\_t num\_rows, const int32\_t row\_size, const int32\_t mult, const int32\_t shift, const int32\_t diff\_min, uint8\_t \*output)

U8 softmax function.

**Note:** Supported framework: TensorFlow Lite micro (bit-accurate)

## **Parameters**

- input [in] Pointer to the input tensor
- num\_rows [in] Number of rows in the input tensor
- row\_size [in] Number of elements in each input row
- mult [in] Input quantization multiplier
- shift [in] Input quantization shift within the range [0, 31]
- diff\_min [in] Minimum difference with max in row. Used to check if the quantized exponential operation can be performed
- output [out] Pointer to the output tensor

void riscv\_softmax\_with\_batch\_q7 (const q7\_t \*vec\_in, const uint16\_t nb\_batches, const uint16\_t dim\_vec, q7\_t \*p\_out)

Q7 softmax function with batch parameter.

Here, instead of the typical base-e softmax, a base-2 softmax is used, i.e.:

$$y_i = \frac{2^{x_i}}{\sum_j 2^{x_j}}$$

The relative output values differ from those of a base-e softmax, but mathematically the gradient is the same up to a log(2) scaling factor.

#### **Parameters**

- vec\_in [in] pointer to input vector
- nb\_batches [in] number of batches
- dim\_vec [in] input vector dimension
- p\_out [out] pointer to output vector

#### group groupNN

A collection of functions to perform basic operations for neural network layers. Functions with a \_s8 suffix support the TensorFlow Lite framework.

## 4.3.2 Neural Network Data Conversion Functions

```
void riscv_q7_to_q15_no_shift (const q7_t *pSrc, q15_t *pDst, uint32_t blockSize)
void riscv_q7_to_q15_reordered_no_shift (const q7_t *pSrc, q15_t *pDst, uint32_t blockSize)
void riscv_q7_to_q15_reordered_with_offset (const q7_t *src, q15_t *dst, uint32_t block_size, q15_t offset)
void riscv_q7_to_q15_with_offset (const q7_t *src, q15_t *dst, uint32_t block_size, q15_t offset)
void riscv_q7_to_q7_no_shift (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)
void riscv_q7_to_q7_reordered_no_shift (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)
```

group nndata\_convert

Perform data type conversion in-between neural network operations

#### **Functions**

```
void riscv_q7_to_q15_no_shift (const q7_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Converts the elements of the Q7 vector to Q15 vector without left-shift.

## **Description:**

The equation used for the conversion process is: pDst[n] = (q15\_t) pSrc[n], for 0 <= n < blockSize.

#### **Parameters**

- \*pSrc [in] points to the Q7 input vector
- \*pDst [out] points to the Q15 output vector
- blockSize [in] length of the input vector

```
void riscv_q7_to_q15_reordered_no_shift (const q7_t *pSrc, q15_t *pDst, uint32_t blockSize)
```

Converts the elements of the Q7 vector to reordered Q15 vector without left-shift.

This function does the q7 to q15 expansion with re-ordering: each group of four consecutive q7 elements is converted into four q15 elements stored in a permuted order.

This looks strange but is natural considering how sign-extension is done at assembly level.

The expansion of the other operand follows the same rule, so the end results are the same.

The tail (i.e., last (N % 4) elements) will still be in original order.

#### **Parameters**

- \*pSrc [in] points to the Q7 input vector
- \*pDst [out] points to the Q15 output vector
- blockSize [in] length of the input vector
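A plain C reference can make the reordering concrete. This is a sketch under a stated assumption: each group of four inputs (a1, a2, a3, a4) is written out as (a1, a3, a2, a4), which is the order used by the CMSIS-NN kernels that NMSIS-NN is adapted from. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference sketch of a reordered q7 -> q15 expansion (not an NMSIS-NN API).
 * ASSUMPTION: every group of four inputs (a1, a2, a3, a4) is stored as
 * (a1, a3, a2, a4); the last (n % 4) elements keep their original order. */
static void q7_to_q15_reordered_ref(const int8_t *src, int16_t *dst, size_t n)
{
    size_t i = 0;

    /* Full groups of four: swap the two middle elements. */
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = (int16_t)src[i + 0];
        dst[i + 1] = (int16_t)src[i + 2];
        dst[i + 2] = (int16_t)src[i + 1];
        dst[i + 3] = (int16_t)src[i + 3];
    }

    /* Tail: plain sign-extending copy. */
    for (; i < n; i++) {
        dst[i] = (int16_t)src[i];
    }
}
```

Because a dot product is invariant under a permutation applied to both operands, expanding both vectors with the same rule leaves the accumulated result unchanged.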

```
void riscv_q7_to_q15_reordered_with_offset (const q7_t *src, q15_t *dst, uint32_t block_size, q15_t offset)
```

Converts the elements of the Q7 vector to a reordered Q15 vector with an added offset.

Note: Refer to the header file for details.

```
void riscv_q7_to_q15_with_offset (const q7_t *src, q15_t *dst, uint32_t block_size, q15_t offset)
```

Converts the elements from a q7 vector to a q15 vector with an added offset.

## **Description:**

The equation used for the conversion process is: dst[n] = (q15\_t) src[n] + offset, for 0 <= n < block\_size.

#### **Parameters**

- src [in] pointer to the q7 input vector
- dst [out] pointer to the q15 output vector
- block\_size [in] length of the input vector
- offset [in] q7 offset to be added to each input vector element.
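The conversion with offset can be sketched in portable C. The function name is hypothetical and the code is a scalar reference, not the optimized library kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference sketch (not an NMSIS-NN API): widen each q7 element
 * to q15 and add a constant offset. */
static void q7_to_q15_with_offset_ref(const int8_t *src, int16_t *dst,
                                      size_t block_size, int16_t offset)
{
    for (size_t n = 0; n < block_size; n++) {
        /* The sum is formed in int, so it cannot wrap at 8 bits. */
        dst[n] = (int16_t)(src[n] + offset);
    }
}
```

With an offset of 128 the inputs -128, 0 and 127 become 0, 128 and 255, which no longer fit in a q7 value. That is why the destination is a q15 vector.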

void riscv\_q7\_to\_q7\_no\_shift (const q7\_t \*pSrc, q7\_t \*pDst, uint32\_t blockSize)

Converts the elements of the Q7 vector to Q7 vector without left-shift.

## **Description:**

The equation used for the conversion process is:

#### **Parameters**

- \*pSrc [in] points to the Q7 input vector
- \*pDst [out] points to the Q7 output vector
- blockSize [in] length of the input vector

Returns none.

```
void riscv_q7_to_q7_reordered_no_shift (const q7_t *pSrc, q7_t *pDst, uint32_t blockSize)
```

Converts the elements of the Q7 vector to reordered Q7 vector without left-shift.

This function does the q7 to q7 expansion with re-ordering: each group of four consecutive q7 elements is stored in a permuted order.

This looks strange but is natural considering how sign-extension is done at assembly level.

The expansion of the other operand follows the same rule, so the end results are the same.

The tail (i.e., last (N % 4) elements) will still be in original order.

#### **Parameters**

- \*pSrc [in] points to the Q7 input vector
- \*pDst [out] points to the Q7 output vector
- blockSize [in] length of the input vector

Returns none.

## 4.3.3 Basic Math Functions for Neural Network Computation

```
void riscv_nn_accumulate_q7_to_q15 (q15_t *pDst, const q7_t *pSrc, uint32_t length)
void riscv_nn_accumulate_q7_to_q7 (q7_t *pDst, const q7_t *pSrc, uint32_t length)
void riscv_nn_add_q7 (const q7_t *input, q31_t *output, uint32_t block_size)
q7_t *riscv_nn_depthwise_conv_nt_t_padded_s8 (const q7_t *lhs, const q7_t *rhs, const int32_t input_offset, const uint16_t num_ch, const int32_t *out_shift, const int32_t *out_mult, const int32_t out_offset, const int32_t activation_min, const int32_t activation_max, const uint16_t row_x_col, const int32_t *const output_bias, q7_t *out)
q7_t *riscv_nn_depthwise_conv_nt_t_s8 (const q7_t *lhs, const q7_t *rhs, const int32_t input_offset, const uint16_t num_ch, const int32_t *out_shift, const int32_t *out_mult, const int32_t out_offset, const int32_t activation_min, const int32_t activation_max, const uint16_t row_x_col, const int32_t *const output_bias, q7_t *out)
riscv_status riscv_nn_mat_mul_core_1x_s8 (int32_t row_elements, const int8_t *row_base, const int8_t *col_base, int32_t *const sum_col, int32_t *const output)
riscv_status riscv_nn_mat_mul_core_4x_s8 (const int32_t row_elements, const int32_t offset, const int8_t *row_base, const int8_t *col_base, int32_t *const sum_col, int32_t *const output)
riscv_status riscv_nn_mat_mult_nt_t_s8 (const q7_t *lhs, const q7_t *rhs, const q31_t *bias, q7_t *dst, const int32_t *dst_multipliers, const int32_t *dst_shifts, const int32_t lhs_rows, const int32_t rhs_rows, const int32_t rhs_cols, const int32_t lhs_offset, const int32_t dst_offset, const int32_t activation_min, const int32_t activation_max)
void riscv_nn_mult_q15 (q15_t *pSrcA, q15_t *pSrcB, q15_t *pDst, const uint16_t out_shift, uint32_t blockSize)
void riscv_nn_mult_q7 (q7_t *pSrcA, q7_t *pSrcB, q7_t *pDst, const uint16_t out_shift, uint32_t blockSize)
riscv_status riscv_nn_vec_mat_mult_t_s8 (const q7_t *lhs, const q7_t *rhs, const q31_t *bias, q7_t *dst, const int32_t lhs_offset, const int32_t rhs_offset, const int32_t dst_offset, const int32_t dst_multiplier, const int32_t dst_shift, const int32_t rhs_cols, const int32_t rhs_rows, const int32_t activation_min, const int32_t activation_max)
riscv_status riscv_nn_vec_mat_mult_t_svdf_s8 (const q7_t *lhs, const q7_t *rhs, q15_t *dst, const int32_t lhs_offset, const int32_t rhs_offset, const int32_t dst_offset, const int32_t dst_multiplier, const int32_t dst_shift, const int32_t rhs_cols, const int32_t rhs_rows, const int32_t activation_min, const int32_t activation_max)
```

## group NNBasicMath

Basic Math Functions for Neural Network Computation

## **Functions**

```
void riscv_nn_accumulate_q7_to_q15 (q15_t *pDst, const q7_t *pSrc, uint32_t length)
```

Converts the elements from a q7 vector and accumulates them into a q15 vector.

## **Description:**

The equation used for the conversion process is:

#### **Parameters**

- \*src [in] points to the q7 input vector
- \*dst [out] points to the q15 output vector
- block\_size [in] length of the input vector

```
void riscv_nn_accumulate_q7_to_q7 (q7_t *pDst, const q7_t *pSrc, uint32_t length)
```

Converts the elements from a q7 vector and accumulates them into a q7 vector.

#### **Description:**

The equation used for the conversion process is:

## **Parameters**

- \*src [in] points to the q7 input vector
- \*dst [out] points to the q7 output vector
- block\_size [in] length of the input vector

```
void riscv_nn_add_q7 (const q7_t *input, q31_t *output, uint32_t block_size)
```

Non-saturating addition of elements of a q7 vector.

## **Description:**

2^24 samples can be added without saturating the result.

The equation used for the conversion process is: sum = input[0] + input[1] + .. + input[block\_size - 1]

### **Parameters**

- \*input [in] Pointer to the q7 input vector
- \*output [out] Pointer to the q31 output variable.
- block\_size [in] length of the input vector
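The 2^24 bound follows from the operand widths: a q7 value has magnitude at most 2^7, so 2^24 of them sum to at most 2^24 \* 2^7 = 2^31 in magnitude, which is the limit of the q31 accumulator. A scalar reference, with a hypothetical name and returning the sum instead of writing through a pointer, might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference sketch (not an NMSIS-NN API): accumulate q7 samples
 * into a 32-bit sum without saturation. Safe for up to 2^24 samples,
 * because |input[n]| <= 2^7 and 2^24 * 2^7 = 2^31. */
static int32_t nn_add_q7_ref(const int8_t *input, size_t block_size)
{
    int32_t sum = 0;
    for (size_t n = 0; n < block_size; n++) {
        sum += input[n];
    }
    return sum;
}
```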

```
q7_t *riscv_nn_depthwise_conv_nt_t_padded_s8 (const q7_t *lhs, const q7_t *rhs, const int32_t input_offset, const uint16_t num_ch, const int32_t *out_shift, const int32_t *out_mult, const int32_t out_offset, const int32_t activation_min, const int32_t activation_max, const uint16_t row_x_col, const int32_t *const output_bias, q7_t *out)
```

Depthwise convolution of transposed rhs matrix with 4 lhs matrices. To be used in padded cases where the padding is -lhs\_offset(Range: int8). Dimensions are the same for lhs and rhs.

**Note:** If number of channels is not a multiple of 4, up to 3 elements outside the boundary will be read out for the following.

- Output shift
- Output multiplier
- Output bias
- rhs

#### **Parameters**

- lhs [in] Input left-hand side matrix
- rhs [in] Input right-hand side matrix (transposed)
- lhs\_offset [in] LHS matrix offset(input offset). Range: -127 to 128
- num\_ch [in] Number of channels in LHS/RHS
- out\_shift [in] Per channel output shift. Length of vector is equal to number of channels
- out\_mult [in] Per channel output multiplier. Length of vector is equal to number of channels
- out\_offset [in] Offset to be added to the output values. Range: -127 to 128
- activation\_min [in] Minimum value to clamp the output to. Range: int8
- activation\_max [in] Maximum value to clamp the output to. Range: int8
- row\_x\_col [in] (row\_dimension \* col\_dimension) of LHS/RHS matrix
- output\_bias [in] Per channel output bias. Length of vector is equal to number of channels
- out [in] Output pointer

**Returns** The function returns one of the two

- Updated output pointer if an implementation is available
- NULL if no implementation is available.
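Because up to 3 elements past the end of the per-channel arrays may be read, one way an application might keep those reads inside memory it owns is to round the allocation length up to a multiple of 4. The helper below is a hypothetical sketch, not part of NMSIS-NN, and the library does not require this particular approach.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an NMSIS-NN API): round a channel count up to
 * the next multiple of 4, so that arrays sized with the result cover the
 * up-to-3 extra elements the kernel may read. */
static size_t padded_channel_count(uint16_t num_ch)
{
    return ((size_t)num_ch + 3u) & ~(size_t)3u;
}
```

For example, with `num_ch = 5` the output shift, output multiplier and output bias arrays would each be allocated with 8 entries, of which only the first 5 carry meaningful values.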

```
q7_t *riscv_nn_depthwise_conv_nt_t_s8 (const q7_t *lhs, const q7_t *rhs, const int32_t input_offset, const uint16_t num_ch, const int32_t *out_shift, const int32_t *out_mult, const int32_t out_offset, const int32_t activation_min, const int32_t activation_max, const uint16_t row_x_col, const int32_t *const output_bias, q7_t *out)
```

Depthwise convolution of transposed rhs matrix with 4 lhs matrices. To be used in non-padded cases. Dimensions are the same for lhs and rhs.

**Note:** If number of channels is not a multiple of 4, up to 3 elements outside the boundary will be read out for the following.

- Output shift
- Output multiplier
- Output bias
- rhs

#### **Parameters**

- lhs [in] Input left-hand side matrix
- rhs [in] Input right-hand side matrix (transposed)
- lhs\_offset [in] LHS matrix offset(input offset). Range: -127 to 128
- num\_ch [in] Number of channels in LHS/RHS
- out\_shift [in] Per channel output shift. Length of vector is equal to number of channels.
- out\_mult [in] Per channel output multiplier. Length of vector is equal to number of channels.
- out\_offset [in] Offset to be added to the output values. Range: -127 to 128
- activation\_min [in] Minimum value to clamp the output to. Range: int8
- activation\_max [in] Maximum value to clamp the output to. Range: int8
- row\_x\_col [in] (row\_dimension \* col\_dimension) of LHS/RHS matrix
- output\_bias [in] Per channel output bias. Length of vector is equal to number of channels.
- out [in] Output pointer

**Returns** The function returns one of the two

- Updated output pointer if an implementation is available
- NULL if no implementation is available.

```
riscv_status riscv_nn_mat_mul_core_1x_s8 (int32_t row_elements, const int8_t *row_base, const int8_t *col_base, int32_t *const sum_col, int32_t *const output)
```

General Matrix-multiplication without requantization for one row & one column.

Pseudo-code: \*output = 0; sum\_col = 0; for (i = 0; i < row\_elements; i++) { \*output += row\_base[i] \* col\_base[i]; sum\_col += col\_base[i]; }

#### **Parameters**

- row\_elements [in] number of row elements
- row\_base [in] pointer to row operand
- col\_base [in] pointer to col operand
- sum\_col [out] pointer to store sum of column elements
- output [out] pointer to store result of multiply-accumulate

**Returns** The function returns the multiply-accumulated result of the row by column.
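The pseudo-code for the one-row core can be written out as a scalar C reference. The function name is hypothetical, and the sketch returns nothing rather than a `riscv_status`; it only illustrates the arithmetic.

```c
#include <stdint.h>

/* Scalar reference sketch (not an NMSIS-NN API): multiply-accumulate one
 * row against one column and also sum the column elements. */
static void mat_mul_core_1x_ref(int32_t row_elements,
                                const int8_t *row_base,
                                const int8_t *col_base,
                                int32_t *sum_col,
                                int32_t *output)
{
    int32_t acc = 0;
    int32_t sum = 0;

    for (int32_t i = 0; i < row_elements; i++) {
        acc += (int32_t)row_base[i] * (int32_t)col_base[i];
        sum += col_base[i];
    }

    *sum_col = sum;
    *output = acc;
}
```

The column sum is returned alongside the product because it lets a caller apply an input offset afterwards without a second pass over the column.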

```
riscv_status riscv_nn_mat_mul_core_4x_s8 (const int32_t row_elements, const int32_t offset, const int8_t *row_base, const int8_t *col_base, int32_t *const sum_col, int32_t *const output)
```

General Matrix-multiplication without requantization for four rows and one column.

Pseudo-code: output[0] = 0; ..; output[3] = 0; sum\_col = 0; for (i = 0; i < row\_elements; i++) { output[0] += row\_base[i] \* col\_base[i]; ..; output[3] += row\_base[i + (offset \* 3)] \* col\_base[i]; sum\_col += col\_base[i]; }

#### **Parameters**

- row\_elements [in] number of row elements
- offset [in] offset between rows. Can be the same as row\_elements, for example in a 1x1 convolution with a stride of 1.
- row\_base [in] pointer to row operand
- col\_base [in] pointer to col operand
- sum\_col [out] pointer to store sum of column elements
- output [out] pointer to store result(4 int32's) of multiply-accumulate

**Returns** The function returns the multiply-accumulated result of the row by column

```
riscv_status riscv_nn_mat_mult_nt_t_s8 (const q7_t *lhs, const q7_t *rhs, const q31_t *bias, q7_t *dst, const int32_t *dst_multipliers, const int32_t *dst_shifts, const int32_t lhs_rows, const int32_t rhs_rows, const int32_t rhs_cols, const int32_t lhs_offset, const int32_t dst_offset, const int32_t activation_min, const int32_t activation_max)
```

General Matrix-multiplication function with per-channel requantization. This function assumes:

- LHS input matrix NOT transposed (nt)
- RHS input matrix transposed (t)

Note: This operation also performs the broadcast bias addition before the requantization

#### **Parameters**

- lhs [in] Pointer to the LHS input matrix
- rhs [in] Pointer to the RHS input matrix

- bias [in] Pointer to the bias vector. The length of this vector is equal to the number of output columns (or RHS input rows)
- dst [out] Pointer to the output matrix with "m" rows and "n" columns
- dst\_multipliers [in] Pointer to the multipliers vector needed for the per-channel requantization. The length of this vector is equal to the number of output columns (or RHS input rows)
- dst\_shifts [in] Pointer to the shifts vector needed for the per-channel requantization. The length of this vector is equal to the number of output columns (or RHS input rows)
- lhs\_rows [in] Number of LHS input rows
- rhs\_rows [in] Number of RHS input rows
- rhs\_cols [in] Number of LHS/RHS input columns
- lhs\_offset [in] Offset to be applied to the LHS input value
- dst\_offset [in] Offset to be applied to the output result
- activation\_min [in] Minimum value to clamp down the output. Range: int8
- activation\_max [in] Maximum value to clamp up the output. Range: int8

Returns The function returns RISCV\_MATH\_SUCCESS

```
void riscv_nn_mult_q15 (q15_t *pSrcA, q15_t *pSrcB, q15_t *pDst, const uint16_t out_shift, uint32_t blockSize)
```

Q15 vector multiplication with variable output shifts.

## **Scaling and Overflow Behavior:**

The function uses saturating arithmetic. Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.

## **Parameters**

- \*pSrcA [in] pointer to the first input vector
- \*pSrcB [in] pointer to the second input vector
- \*pDst [out] pointer to the output vector
- out\_shift [in] amount of right-shift for output
- blockSize [in] number of samples in each vector

```
void riscv_nn_mult_q7 (q7_t *pSrcA, q7_t *pSrcB, q7_t *pDst, const uint16_t out_shift, uint32_t blockSize)
```

Q7 vector multiplication with variable output shifts.

## **Scaling and Overflow Behavior:**

The function uses saturating arithmetic. Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.

#### **Parameters**

- \*pSrcA [in] pointer to the first input vector
- \*pSrcB [in] pointer to the second input vector
- \*pDst [out] pointer to the output vector
- out\_shift [in] amount of right-shift for output
- blockSize [in] number of samples in each vector
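The shift-and-saturate behavior can be sketched as a scalar reference. The names are hypothetical, and the rounding step (adding half of the discarded range before shifting) is an assumption of this sketch, not something the text above states. It also assumes `out_shift` is less than 31.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the Q7 range [0x80, 0x7F]. */
static int8_t saturate_q7(int32_t v)
{
    if (v > INT8_MAX) {
        return INT8_MAX;
    }
    if (v < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)v;
}

/* Scalar reference sketch (not an NMSIS-NN API): element-wise q7 multiply,
 * right-shift by out_shift, then saturate to q7. */
static void nn_mult_q7_ref(const int8_t *a, const int8_t *b, int8_t *dst,
                           uint16_t out_shift, size_t block_size)
{
    /* ASSUMPTION: round to nearest by adding half of the discarded range. */
    const int32_t round = (out_shift > 0) ? ((int32_t)1 << (out_shift - 1)) : 0;

    for (size_t n = 0; n < block_size; n++) {
        int32_t prod = (int32_t)a[n] * (int32_t)b[n] + round;
        /* >> on a negative value is implementation-defined in C;
         * mainstream compilers perform an arithmetic shift. */
        dst[n] = saturate_q7(prod >> out_shift);
    }
}
```

With `out_shift = 7`, the product 64 \* 64 = 4096 becomes 32, while (-128) \* (-128) = 16384 would become 128 and is saturated to 127.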

```
riscv_status riscv_nn_vec_mat_mult_t_s8 (const q7_t *lhs, const q7_t *rhs, const q31_t *bias, q7_t *dst, const int32_t lhs_offset, const int32_t rhs_offset, const int32_t dst_offset, const int32_t dst_multiplier, const int32_t dst_shift, const int32_t rhs_cols, const int32_t rhs_rows, const int32_t activation_min, const int32_t activation_max)
```

s8 Vector by Matrix (transposed) multiplication

#### **Parameters**

- lhs [in] Input left-hand side vector
- rhs [in] Input right-hand side matrix (transposed)
- bias [in] Input bias
- dst [out] Output vector
- lhs\_offset [in] Offset to be added to the input values of the left-hand side vector. Range: -127 to 128
- rhs\_offset [in] Not used
- dst\_offset [in] Offset to be added to the output values. Range: -127 to 128
- dst\_multiplier [in] Output multiplier
- dst\_shift [in] Output shift
- rhs\_cols [in] Number of columns in the right-hand side input matrix
- rhs\_rows [in] Number of rows in the right-hand side input matrix
- activation\_min [in] Minimum value to clamp the output to. Range: int8
- activation\_max [in] Maximum value to clamp the output to. Range: int8

Returns The function returns RISCV\_MATH\_SUCCESS

```
riscv_status riscv_nn_vec_mat_mult_t_svdf_s8 (const q7_t *lhs, const q7_t *rhs, q15_t *dst, const int32_t lhs_offset, const int32_t rhs_offset, const int32_t dst_offset, const int32_t dst_multiplier, const int32_t dst_shift, const int32_t rhs_cols, const int32_t rhs_rows, const int32_t activation_min, const int32_t activation_max)
```

s8 Vector by Matrix (transposed) multiplication with s16 output

#### **Parameters**

- lhs [in] Input left-hand side vector
- rhs [in] Input right-hand side matrix (transposed)
- dst [out] Output vector
- lhs\_offset [in] Offset to be added to the input values of the left-hand side vector. Range: -127 to 128
- rhs\_offset [in] Not used
- scatter\_offset [in] Address offset for dst. First output is stored at 'dst', the second at 'dst + scatter\_offset' and so on.
- dst\_multiplier [in] Output multiplier
- dst\_shift [in] Output shift
- rhs\_cols [in] Number of columns in the right-hand side input matrix
- rhs\_rows [in] Number of rows in the right-hand side input matrix
- activation\_min [in] Minimum value to clamp the output to. Range: int16
- activation\_max [in] Maximum value to clamp the output to. Range: int16

Returns The function returns RISCV\_MATH\_SUCCESS

## 4.3.4 Convolutional Neural Network Example

### group CNNExample



Refer to riscv\_nnexamples\_cifar10.cpp

## **Description:**

Demonstrates a convolutional neural network (CNN) example with the use of convolution, ReLU activation, pooling and fully-connected functions.

#### **Model definition:**

The CNN used in this example is based on CIFAR-10 example from Caffe [1]. The neural network consists of 3 convolution layers interspersed by ReLU activation and max pooling layers, followed by a fully-connected layer at the end. The input to the network is a 32x32 pixel color image, which will be classified into one of the 10 output classes. This example model implementation needs 32.3 KB to store weights, 40 KB for activations and 3.1 KB for storing the im2col data.

## **Variables Description:**

- conv1\_wt, conv2\_wt, conv3\_wt are convolution layer weight matrices
- conv1\_bias, conv2\_bias, conv3\_bias are convolution layer bias arrays
- ip1\_wt, ip1\_bias point to fully-connected layer weights and biases
- input\_data points to the input image data
- output\_data points to the classification output
- col\_buffer is a buffer to store the im2col output

- scratch\_buffer is used to store the activation data (intermediate layer outputs)

## **NMSIS DSP Software Library Functions Used:**

- riscv\_convolve\_HWC\_q7\_RGB()
- riscv\_convolve\_HWC\_q7\_fast()
- riscv\_relu\_q7()
- riscv\_maxpool\_q7\_HWC()
- riscv\_avepool\_q7\_HWC()
- riscv\_fully\_connected\_q7\_opt()
- riscv\_fully\_connected\_q7()

[1] https://github.com/BVLC/caffe

## 4.3.5 Gated Recurrent Unit Example

group GRUExample

Refer to riscv\_nnexamples\_gru.cpp

## **Description:**

Demonstrates a gated recurrent unit (GRU) example with the use of fully-connected, Tanh/Sigmoid activation functions.

#### **Model definition:**

GRU is a type of recurrent neural network (RNN). It contains two sigmoid gates and one hidden state.



The computation can be summarized as:
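As a hedged reconstruction, assuming the standard GRU formulation and the weight names $W_z$, $W_r$, $W_n$ from the variable description, the computation for input $x_t$ and history $h_{t-1}$ can be written as follows. The bias symbols $b_z$, $b_r$, $b_n$, the concatenation notation $[a, b]$ and the element-wise product $\odot$ are introduced here for illustration. Conventions differ on whether $z_t$ weights the history or the candidate state in the last line.

$$z_t = \sigma\left(W_z \cdot [x_t, h_{t-1}] + b_z\right)$$

$$r_t = \sigma\left(W_r \cdot [x_t, h_{t-1}] + b_r\right)$$

$$n_t = \tanh\left(W_n \cdot [r_t \odot h_{t-1}, x_t] + b_n\right)$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot n_t$$

Here $z_t$ and $r_t$ are the two sigmoid gates (update and reset) and $n_t$ is the candidate hidden state.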

#### **Variables Description:**

- update\_gate\_weights, reset\_gate\_weights, hidden\_state\_weights are weights corresponding to update gate (W\_z), reset gate (W\_r), and hidden state (W\_n).
- update\_gate\_bias, reset\_gate\_bias, hidden\_state\_bias are layer bias arrays
- test\_input1, test\_input2, test\_history are the inputs and initial history

The buffer is allocated as:

| reset | input | history | update | hidden\_state |

In this way, the concatenation is automatically done since (reset, input) and (input, history) are physically concatenated in memory.

The ordering of the weight matrix should be adjusted accordingly.
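The buffer layout can be sketched as pointer arithmetic over a single scratch array. The type and function names are hypothetical, and the sketch assumes that the reset, history, update and hidden\_state regions each hold `hidden` q15 values while the input region holds `in` q15 values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a scratch buffer laid out as
 * | reset | input | history | update | hidden_state |.
 * ASSUMPTION: reset, history, update and hidden_state each hold `hidden`
 * q15 values and input holds `in` q15 values. */
typedef struct {
    int16_t *reset;
    int16_t *input;
    int16_t *history;
    int16_t *update;
    int16_t *hidden_state;
} gru_buffer_view;

static gru_buffer_view gru_split_buffer(int16_t *buf, size_t in, size_t hidden)
{
    gru_buffer_view v;
    v.reset = buf;
    v.input = v.reset + hidden;
    v.history = v.input + in;
    v.update = v.history + hidden;
    v.hidden_state = v.update + hidden;
    return v;
}
```

With this layout, `v.reset` is also the start of the concatenated (reset, input) vector of length `hidden + in`, and `v.input` is the start of the concatenated (input, history) vector of length `in + hidden`, so no copy is needed to form either operand.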

## **NMSIS DSP Software Library Functions Used:**

- riscv\_fully\_connected\_mat\_q7\_vec\_q15\_opt()
- riscv\_nn\_activations\_direct\_q15()
- riscv\_mult\_q15()
- riscv\_offset\_q15()
- riscv\_sub\_q15()
- riscv\_copy\_q15()

# 4.4 Changelog

## 4.4.1 V1.0.2

This is the V1.0.2 release of the NMSIS-NN library.

- Sync up to CMSIS NN library 3.0.0
- Initial support for the RISC-V vector extension

## 4.4.2 V1.0.1

This is the V1.0.1 release of the NMSIS-DSP library.

- Both Nuclei RISC-V 32 and 64 bit cores are supported now.
- Libraries are optimized for RISC-V 32 and 64 bit DSP instructions.
- The DSP examples are now using Nuclei SDK as running environment.

## 4.4.3 V1.0.0

This is the first version of NMSIS-NN library.

We adapted the CMSIS-NN v1.0.0 library to use RISC-V DSP instructions. All the API names are now renamed from arm\_xxx to riscv\_xxx.

## **CHANGELOG**

## 5.1 V1.0.2-RC1

This is the release candidate V1.0.2-RC1 of the Nuclei MCU Software Interface Standard (NMSIS). The following changes have been made since V1.0.1.

- Device Templates
  - The DOWNLOAD\_MODE\_xxx macros are removed from riscv\_encoding.h. They are now defined as an enum in <Device.h> and can be customized by the SoC vendor.
  - The startup code no longer relies on the DOWNLOAD\_MODE macro. Instead, it relies on a new macro called VECTOR\_TABLE\_REMAPPED. When VECTOR\_TABLE\_REMAPPED is defined, the vector table's LMA != VMA, for example when the vector table needs to be copied from flash to ILM at boot.
  - Add BIT, BITS, REG, ADDR related macros in <Device.h>
- NMSIS-Core
  - Nuclei Cache CCM operation APIs are now introduced in core\_feature\_cache.h
- NMSIS-DSP/NN
  - Merged the official CMSIS 5.8.0 release, CMSIS-DSP 1.9.0, CMSIS-NN 3.0.0
  - RISC-V Vector extension and P-extension support for DSP/NN libraries are added

## 5.2 V1.0.1

This is the official V1.0.1 release of the Nuclei MCU Software Interface Standard (NMSIS).

The following changes have been made since V1.0.1-RC1.

- Device Templates
  - I/D Cache enable assembly code in startup\_<Device>.S is removed now
  - Cache control updates in System\_<Device>.c
    - I-Cache will be enabled if \_\_ICACHE\_PRESENT = 1 is defined in <Device.h>
    - D-Cache will be enabled if \_\_DCACHE\_PRESENT = 1 is defined in <Device.h>

## 5.3 V1.0.1-RC1

This is release candidate version V1.0.1-RC1 of NMSIS.

- NMSIS-Core
  - Add RISC-V DSP 64bit intrinsic functions in core\_feature\_dsp.h
  - Add more CSR definitions in riscv\_encoding.h
  - Update Arm-compatible functions for RISC-V DSP instruction cases in core\_compatiable.h
- NMSIS-DSP
  - Optimize RISC-V 32bit DSP library implementation
  - Add support for Nuclei RISC-V 64bit DSP SIMD instruction for DSP library
  - Add test cases used for DSP library testing, mainly for internal usage
  - Change the examples and tests to use Nuclei SDK as running environment
- NMSIS-NN
  - Add support for Nuclei RISC-V 64bit DSP SIMD instruction for NN library
  - Change the examples and tests to use Nuclei SDK as running environment
- Device Templates
  - Add DDR DOWNLOAD\_MODE in device templates
  - Modifications to startup\_<Device>.S files
    - \_premain\_init is added to replace \_init
    - \_postmain\_fini is added to replace \_fini
  - If you have implemented your init or de-init functions through \_\_init or \_\_fini, please use the \_\_premain\_init and \_\_postmain\_fini functions defined in system\_<Device>.c now

## 5.4 V1.0.0-beta1

Main changes in release V1.0.0-beta1.

- NMSIS-Core
  - Fix SysTick\_Reload implementation
  - Update ECLIC\_Register\_IRQ implementation to allow handler == NULL
  - Fix MTH offset from 0x8 to 0xB, this will affect function of ECLIC\_GetMth and ECLIC\_SetMth
  - Fix wrong macro check in cache function
  - Add missing SOC\_INT\_MAX enum definition in Device template
  - In System\_<Device>.c, ECLIC NLBits set to \_\_ECLIC\_INTCTLBITS, which means all the bits are for level, no bits for priority

## 5.5 V1.0.0-beta

Main changes in release V1.0.0-beta.

- NMSIS-Core
  - Fix error typedef of CSR\_MCAUSE\_Type
  - Change CSR\_MCACHE\_CTL\_DE to future value 0x00010000
  - Fix names in CSR naming, CSR\_SCRATCHCSW -> CSR\_MSCRATCHCSW, and CSR\_SCRATCHCSWL -> CSR\_MSCRATCHCSWL
  - Add macros in riscv\_encoding.h: MSTATUS\_FS\_INITIAL, MSTATUS\_FS\_CLEAN, MSTATUS\_FS\_DIRTY
- Documentation
  - Fix a typo in core\_template\_intexc.rst
  - Add cross references of Nuclei ISA Spec
  - Update appendix
  - Refine tables and figures

# 5.6 V1.0.0-alpha.1

API changes have been made to the system timer.

- Starting from Nuclei N core version 1.4, the MSTOP register is renamed to MTIMECTL to provide more features
- Changes made to NMSIS/Core/core\_feature\_timer.h
  - MSTOP register name changed to MTIMECTL due to core spec changes
  - SysTimer\_SetMstopValue renamed to SysTimer\_SetControlValue
  - SysTimer\_GetMstopValue renamed to SysTimer\_GetControlValue
  - Add SysTimer\_Start and SysTimer\_Stop to start or stop system timer counter
  - SysTick\_Reload function is introduced to reload system timer
  - Macro names started with SysTimer\_xxx are changed, please check in the code.
- Removed unused lines of code in the DSP and NN library sources that depended on macros which do not work for RISC-V cores.
- Fix some documentation issues, mainly typos and invalid cross references.
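
For code written against the earlier names, the renames listed above could be bridged with a small compatibility shim. The sketch below is only an illustration, not part of NMSIS: the two `SysTimer_*ControlValue` functions are off-target stand-ins with assumed signatures (on real hardware they come from core\_feature\_timer.h), and `fake_mtimectl` and `legacy_pause_timer` are hypothetical names.

```c
#include <stdint.h>

/* Off-target stand-ins so the sketch compiles without NMSIS headers.
 * The signatures are assumptions; check core_feature_timer.h. */
static uint32_t fake_mtimectl;  /* models the MTIMECTL register */

static void SysTimer_SetControlValue(uint32_t mctl)
{
    fake_mtimectl = mctl;
}

static uint32_t SysTimer_GetControlValue(void)
{
    return fake_mtimectl;
}

/* Hypothetical compatibility shim: map the old MSTOP-based names onto
 * the renamed functions so existing call sites keep building. */
#define SysTimer_SetMstopValue(v)  SysTimer_SetControlValue(v)
#define SysTimer_GetMstopValue()   SysTimer_GetControlValue()

/* Legacy-style caller, left unchanged by the migration. */
static uint32_t legacy_pause_timer(void)
{
    SysTimer_SetMstopValue(0x1u);   /* bit value is illustrative only */
    return SysTimer_GetMstopValue();
}
```

A shim like this is a stopgap; renaming the call sites directly keeps the code aligned with the current header.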

## 5.7 V1.0.0-alpha

This is the V1.0.0-alpha release of Nuclei MCU Software Interface Standard (NMSIS).

In this release, we have released three main components:

- NMSIS-Core: Standardized API for the Nuclei processor core and peripherals.
- NMSIS-DSP: DSP library collection optimized for Nuclei Processors that have the RISC-V SIMD instruction set.
- NMSIS-NN: Efficient neural network library developed to maximize performance and minimize the memory footprint on Nuclei Processors that have the RISC-V SIMD instruction set.

We also released a totally new Nuclei-SDK<sup>24</sup>, which is an SDK implementation based on **NMSIS-Core** for Nuclei N/NX evaluation cores running on the HummingBird Evaluation Kit.

<sup>24</sup> https://github.com/Nuclei-Software/nuclei-sdk

## **CHAPTER**

## **SIX**

## **GLOSSARY**

- **API** (Application Program Interface) A defined set of routines and protocols for building application software.
- **DSP** (Digital Signal Processing) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations.
- **ISR** (Interrupt Service Routine) Also known as an interrupt handler, an ISR is a callback function whose execution is triggered by a hardware interrupt (or software interrupt instructions) and is used to handle high-priority conditions that require interrupting the current code executing on the processor.
- **NN** (Neural Network) is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes.
- **XIP** (eXecute In Place) A method of executing programs directly from long-term storage rather than copying them into RAM, saving writable memory for dynamic data rather than static program code.

## **CHAPTER**

## **SEVEN**

## **APPENDIX**

- Nuclei Tools and Documents: https://nucleisys.com/download.php
- Nuclei riscv-openocd Repo: https://github.com/riscv-mcu/riscv-openocd
- Nuclei riscv-binutils-gdb: https://github.com/riscv-mcu/riscv-binutils-gdb
- Nuclei riscv-gnu-toolchain: https://github.com/riscv-mcu/riscv-gnu-toolchain
- Nuclei riscv-newlib: https://github.com/riscv-mcu/riscv-newlib
- Nuclei riscv-gcc: https://github.com/riscv-mcu/riscv-gcc
- Nuclei SDK: https://github.com/Nuclei-Software/nuclei-sdk
- NMSIS: https://doc.nucleisys.com/nmsis/
- Nuclei Bumblebee Core Document: https://github.com/nucleisys/Bumblebee\_Core\_Doc
- Nuclei RISC-V IP Products: https://www.nucleisys.com/product.php
- RISC-V MCU Community Website: https://www.riscv-mcu.com/
- Nuclei Spec: https://doc.nucleisys.com/nuclei\_spec

## **CHAPTER**

## **EIGHT**

## **INDICES AND TABLES**

- genindex
- search

## **INDEX**

| Symbols | \_\_RV_CSR_READ_SET (C macro), 64 |
|---|---|
| \_FLD2VAL (C macro), 294, 295 | \_\_RV_CSR_SET (C macro), 64 |
| \_VAL2FLD (C macro), 294 | \_\_RV_CSR_SWAP (C macro), 63 |
| \_\_ALIGNED (C macro), 62 | \_\_RV_CSR_WRITE (C macro), 63 |
| \_\_ASM (C macro), 61 | \_\_RV_FLD (C macro), 321 |
| \_\_CLZ (C macro), 347 | \_\_RV_FLOAD (C macro), 321 |
| \_\_COMPILER_BARRIER (C macro), 62 | \_\_RV_FLW (C macro), 320 |
| \_\_CPU_RELAX (C macro), 91 | \_\_RV_FSD (C macro), 321 |
| \_\_DMB (C macro), 346 | \_\_RV_FSTORE (C macro), 322 |
| \_\_DSB (C macro), 346 | \_\_RV_FSW (C macro), 320 |
| \_\_FENCE (C macro), 91 | \_\_RV_KSLLI16 (C macro), 118 |
| \_\_I (C macro), 294 | \_\_RV_KSLLI8 (C macro), 129 |
| \_\_IM (C macro), 294 | \_\_RV_KSLLIW (C macro), 185 |
| \_\_INLINE (C macro), 61 | \_\_RV_SCLIP16 (C macro), 161 |
| \_\_INTERRUPT (C macro), 62 | \_\_RV_SCLIP32 (C macro), 246 |
| \_\_IO (C macro), 294 | \_\_RV_SCLIP8 (C macro), 167 |
| \_\_IOM (C macro), 294 | \_\_RV_SLLI16 (C macro), 118, 119 |
| \_\_ISB (C macro), 346 | \_\_RV_SLLI8 (C macro), 129, 130 |
| \_\_LDRBT (C macro), 346 | \_\_RV_SRAI16 (C macro), 118, 119 |
| \_\_LDRHT (C macro), 346 | \_\_RV_SRAI16_U (C macro), 118, 120 |
| \_\_LDRT (C macro), 346 | \_\_RV_SRAI8 (C macro), 129, 131 |
| \_\_NMSIS_VERSION (C macro), 61 | \_\_RV_SRAI8_U (C macro), 129, 131 |
| \_\_NMSIS_VERSION_MAJOR (C macro), 61 | \_\_RV_SRLI16 (C macro), 118, 121 |
| \_\_NMSIS_VERSION_MINOR (C macro), 61 | \_\_RV_SRLI16_U (C macro), 118, 122 |
| \_\_NMSIS_VERSION_PATCH (C macro), 61 | \_\_RV_SRLI8 (C macro), 129, 132 |
| \_\_NO_RETURN (C macro), 61 | \_\_RV_SRLI8_U (C macro), 129, 133 |
| \_\_NUCLEI_NX_REV (C macro), 60 | \_\_RV_UCLIP16 (C macro), 161 |
| \_\_NUCLEI_N_REV (C macro), 60 | \_\_RV_UCLIP32 (C macro), 246, 247 |
| \_\_O (C macro), 294 | \_\_RV_UCLIP8 (C macro), 167, 168 |
| \_\_OM (C macro), 294 | \_\_RWMB (C macro), 91 |
| \_\_PACKED (C macro), 62 | \_\_SMP_RMB (C macro), 91 |
| \_\_PACKED_STRUCT (C macro), 62 | \_\_SMP_RWMB (C macro), 91 |
| \_\_PACKED_UNION (C macro), 62 | \_\_SMP_WMB (C macro), 91 |
| \_\_RARELY (C macro), 62 | \_\_SSAT (C macro), 347 |
| \_\_RBIT (C macro), 347 | \_\_STATIC_FORCEINLINE (C macro), 61 |
| \_\_RESTRICT (C macro), 62 | \_\_STATIC_INLINE (C macro), 61 |
| \_\_RISCV_FLEN (C macro), 320 | \_\_STRBT (C macro), 346 |
| \_\_RISCV_XLEN (C macro), 81 | \_\_STRHT (C macro), 347 |
| \_\_RMB (C macro), 91 | \_\_STRT (C macro), 347 |
| \_\_RV_CSR_CLEAR (C macro), 64 | \_\_UNALIGNED_UINT16_READ (C macro), 62 |
| \_\_RV_CSR_READ (C macro), 63 | \_\_UNALIGNED_UINT16_WRITE (C macro), 62 |
| \_\_RV_CSR_READ_CLEAR (C macro), 64 | \_\_UNALIGNED_UINT32_READ (C macro), 62 |

| \_\_UNALIGNED_UINT32_WRITE (C macro), 62 | CCM_CMD_Type::CCM_IC_INVAL (C++ enumerator), 325, 326 |
|---|---|
| \_\_USAT (C macro), 347 | CCM_CMD_Type::CCM_IC_INVAL_ALL (C++ enumerator), 325, 327 |
| \_\_USED (C macro), 61 | CCM_CMD_Type::CCM_IC_LOCK (C++ enumerator), 325, 327 |
| \_\_USUALLY (C macro), 62 | CCM_CMD_Type::CCM_IC_UNLOCK (C++ enumerator), 325, 327 |
| \_\_VECTOR_SIZE (C macro), 62 | CCM_COMMAND_COMMAND (C macro), 78 |
| \_\_WEAK (C macro), 62 | CCM_DATA_DATA (C macro), 78 |
| \_\_WMB (C macro), 91 | CCM_OP_FINFO_Type (C++ enum), 325, 326 |
| \_\_disable_FPU (C macro), 320 | CCM_OP_FINFO_Type::CCM_OP_ECC_ERR (C++ enumerator), 325, 326 |
| \_\_enable_FPU (C macro), 320 | CCM_OP_FINFO_Type::CCM_OP_EXCEED_ERR (C++ enumerator), 325, 326 |
| \_\_get_FCSR (C macro), 320 | CCM_OP_FINFO_Type::CCM_OP_PERM_CHECK_ERR (C++ enumerator), 325, 326 |
| \_\_get_FFLAGS (C macro), 320 | CCM_OP_FINFO_Type::CCM_OP_REFILL_BUS_ERR (C++ enumerator), 325, 326 |
| \_\_get_FRM (C macro), 320 | CCM_OP_FINFO_Type::CCM_OP_SUCCESS (C++ enumerator), 325, 326 |
| \_\_has_builtin (C macro), 61 | CCM_SUEN_SUEN (C macro), 78 |
| \_\_set_FCSR (C macro), 320 | CCM_SUEN_SUEN_Msk (C macro), 325, 326 |
| \_\_set_FFLAGS (C macro), 320 | CCM_SUEN_SUEN_Pos (C macro), 325, 326 |
| \_\_set_FRM (C macro), 320 | CLIC_CLICCFG_NLBIT_Msk (C macro), 86 |
| \_fini (C++ function), 343 | CLIC_CLICCFG_NLBIT_Pos (C macro), 86 |
| \_init (C++ function), 343 | CLIC_CLICINFO_CTLBIT_Msk (C macro), 86 |
| \_postmain_fini (C++ function), 343 | CLIC_CLICINFO_CTLBIT_Pos (C macro), 86 |
| \_premain_init (C++ function), 343 | CLIC_CLICINFO_NUM_Msk (C macro), 87 |
| **A** | CLIC_CLICINFO_NUM_Pos (C macro), 86 |
| API, 603 | CLIC_CLICINFO_VER_Msk (C macro), 86 |
| **C** | CLIC_CLICINFO_VER_Pos (C macro), 86 |
| CAUSE_BREAKPOINT (C macro), 80 | CLIC_CTRL_Type (C++ struct), 88 |
| CAUSE_FAULT_FETCH (C macro), 80 | CLIC_INTATTR_SHV_Msk (C macro), 87 |
| CAUSE_FAULT_LOAD (C macro), 80 | CLIC_INTATTR_SHV_Pos (C macro), 87 |
| CAUSE_FAULT_STORE (C macro), 80 | CLIC_INTATTR_TRIG_Msk (C macro), 87 |
| CAUSE_HYPERVISOR_ECALL (C macro), 80 | CLIC_INTATTR_TRIG_Pos (C macro), 87 |
| CAUSE_ILLEGAL_INSTRUCTION (C macro), 80 | CLIC_INTIE_IE_Msk (C macro), 87 |
| CAUSE_MACHINE_ECALL (C macro), 80 | CLIC_INTIE_IE_Pos (C macro), 87 |
| CAUSE_MISALIGNED_FETCH (C macro), 80 | CLIC_INTIP_IP_Msk (C macro), 87 |
| CAUSE_MISALIGNED_LOAD (C macro), 80 | CLIC_INTIP_IP_Pos (C macro), 87 |
| CAUSE_MISALIGNED_STORE (C macro), 80 | CLIC_Type (C++ struct), 88 |
| CAUSE_SUPERVISOR_ECALL (C macro), 80 | CLICCFG_Type (C++ union), 88 |
| CAUSE_USER_ECALL (C macro), 80 | CLICCFG_Type::_reserved0 (C++ member), 88 |
| CCM_CMD_Type (C++ enum), 325, 326 | CLICCFG_Type::_reserved1 (C++ member), 88 |
| CCM_CMD_Type::CCM_DC_INVAL (C++ enumerator), 325, 326 | CLICCFG_Type::_reserved2 (C++ member), 88 |
| CCM_CMD_Type::CCM_DC_INVAL_ALL (C++ enumerator), 325, 326 | CLICCFG_Type::b (C++ member), 88 |
| CCM_CMD_Type::CCM_DC_LOCK (C++ enumerator), 325, 326 | CLICCFG_Type::nlbits (C++ member), 88 |
| CCM_CMD_Type::CCM_DC_UNLOCK (C++ enumerator), 325, 326 | CLICCFG_Type::w (C++ member), 88 |
| CCM_CMD_Type::CCM_DC_WB (C++ enumerator), 325, 326 | CLICINFO_Type (C++ union), 88 |
| CCM_CMD_Type::CCM_DC_WB_ALL (C++ enumerator), 325, 326 | CLICINFO_Type::_reserved0 (C++ member), 88 |
| CCM_CMD_Type::CCM_DC_WBINVAL (C++ enumerator), 325, 326 | CLICINFO_Type::b (C++ member), 88 |
| CCM_CMD_Type::CCM_DC_WBINVAL_ALL (C++ enumerator), 325, 326 | CLICINFO_Type::intctlbits (C++ member), 88 |
| | CLICINFO_Type::numint (C++ member), 88 |

| CSR_HPMCOUNTER22 (C macro), 66 |
|---|
| CSR_HPMCOUNTER22H (C macro), 70 |
| CSR_HPMCOUNTER23 (C macro), 66 |
| CSR_HPMCOUNTER23H (C macro), 70 |
| CSR_HPMCOUNTER24 (C macro), 66 |
| CSR_HPMCOUNTER24H (C macro), 71 |
| CSR_HPMCOUNTER25 (C macro), 66 |
| CSR_HPMCOUNTER25H (C macro), 71 |
| CSR_HPMCOUNTER26 (C macro), 66 |
| CSR_HPMCOUNTER26H (C macro), 71 |
| CSR_HPMCOUNTER27 (C macro), 67 |
| CSR_HPMCOUNTER27H (C macro), 71 |
| CSR_HPMCOUNTER28 (C macro), 67 |
| CSR_HPMCOUNTER28H (C macro), 71 |
| CSR_HPMCOUNTER29 (C macro), 67 |
| CSR_HPMCOUNTER29H (C macro), 71 |
| CSR_HPMCOUNTER3 (C macro), 66 |
| CSR_HPMCOUNTER30 (C macro), 67 |
| CSR_HPMCOUNTER30H (C macro), 71 |
| CSR_HPMCOUNTER31 (C macro), 67 |
| CSR_HPMCOUNTER31H (C macro), 71 |
| CSR_HPMCOUNTER3H (C macro), 70 |
| CSR_HPMCOUNTER4 (C macro), 66 |
| CSR_HPMCOUNTER4H (C macro), 70 |
| CSR_HPMCOUNTER5 (C macro), 66 |
| CSR_HPMCOUNTER5H (C macro), 70 |
| CSR_HPMCOUNTER6 (C macro), 66 |
| CSR_HPMCOUNTER6H (C macro), 70 |
| CSR_HPMCOUNTER7 (C macro), 66 |
| CSR_HPMCOUNTER7H (C macro), 70 |
| CSR_HPMCOUNTER8 (C macro), 66 |
| CSR_HPMCOUNTER8H (C macro), 70 |
| CSR_HPMCOUNTER9 (C macro), 66 |
| CSR_HPMCOUNTER9H (C macro), 70 |
| CSR_INSTRET (C macro), 66 |
| CSR_INSTRETH (C macro), 70 |
| CSR_JALMNXTI (C macro), 72 |
| CSR_MARCHID (C macro), 70 |
| CSR_MBADADDR (C macro), 67 |
| CSR_MCACHE_CTL (C macro), 72 |
| CSR_MCACHE_CTL_DE (C macro), 74 |
| CSR_MCACHE_CTL_IE (C macro), 74 |
| CSR_MCAUSE (C macro), 67 |
| CSR_MCAUSE_Type (C++ union), 83 |
| CSR_MCAUSE_Type::_reserved0 (C++ member), 84 |
| CSR_MCAUSE_Type::_reserved1 (C++ member), 84 |
| CSR_MCAUSE_Type::b (C++ member), 84 |
| CSR_MCAUSE_Type::d (C++ member), 84 |
| CSR_MCAUSE_Type::exccode (C++ member), 84 |
| CSR_MCAUSE_Type::interrupt (C++ member), 84 |
| CSR_MCAUSE_Type::minhv (C++ member), 84 |

| CSR_MCAUSE_Type::mpie (C++ member), 84 | CSR_MHPMCOUNTER22H (C macro), 71 |
|---|---|
| CSR_MCAUSE_Type::mpil (C++ member), 84 | CSR_MHPMCOUNTER23 (C macro), 69 |
| CSR_MCAUSE_Type::mpp (C++ member), 84 | CSR_MHPMCOUNTER23H (C macro), 71 |
| CSR_MCFG_INFO (C macro), 73 | CSR_MHPMCOUNTER24 (C macro), 69 |
| CSR_MCLICBASE (C macro), 72 | CSR_MHPMCOUNTER24H (C macro), 71 |
| CSR_MCOUNTEREN (C macro), 67 | CSR_MHPMCOUNTER25 (C macro), 69 |
| CSR_MCOUNTINHIBIT (C macro), 72 | CSR_MHPMCOUNTER25H (C macro), 71 |
| CSR_MCOUNTINHIBIT_Type (C++ union), 84 | CSR_MHPMCOUNTER26 (C macro), 69 |
| CSR_MCOUNTINHIBIT_Type::_reserved0 (C++ member), 84 | CSR_MHPMCOUNTER26H (C macro), 71 |
| CSR_MCOUNTINHIBIT_Type::_reserved1 (C++ member), 84 | CSR_MHPMCOUNTER27 (C macro), 69 |
| CSR_MCOUNTINHIBIT_Type::b (C++ member), 84 | CSR_MHPMCOUNTER27H (C macro), 71 |
| CSR_MCOUNTINHIBIT_Type::cy (C++ member), 84 | CSR_MHPMCOUNTER28 (C macro), 69 |
| CSR_MCOUNTINHIBIT_Type::d (C++ member), 84 | CSR_MHPMCOUNTER28H (C macro), 71 |
| CSR_MCOUNTINHIBIT_Type::ir (C++ member), 84 | CSR_MHPMCOUNTER29 (C macro), 69 |
| CSR_MCYCLE (C macro), 68 | CSR_MHPMCOUNTER29H (C macro), 72 |
| CSR_MCYCLEH (C macro), 71 | CSR_MHPMCOUNTER3 (C macro), 68 |
| CSR_MDCAUSE (C macro), 72 | CSR_MHPMCOUNTER30 (C macro), 69 |
| CSR_MDCFG_INFO (C macro), 73 | CSR_MHPMCOUNTER30H (C macro), 72 |
| CSR_MDLM_CTL (C macro), 72 | CSR_MHPMCOUNTER31 (C macro), 69 |
| CSR_MECC_CODE (C macro), 72 | CSR_MHPMCOUNTER31H (C macro), 72 |
| CSR_MECC_LOCK (C macro), 72 | CSR_MHPMCOUNTER3H (C macro), 71 |
| CSR_MEDELEG (C macro), 72 | CSR_MHPMCOUNTER4 (C macro), 68 |
| CSR_MEPC (C macro), 67 | CSR_MHPMCOUNTER4H (C macro), 71 |
| CSR_MFIOCFG_INFO (C macro), 72 | CSR_MHPMCOUNTER5 (C macro), 68 |
| CSR_MHARTID (C macro), 70 | CSR_MHPMCOUNTER5H (C macro), 71 |
| CSR_MHPMCOUNTER10 (C macro), 68 | CSR_MHPMCOUNTER6 (C macro), 68 |
| CSR_MHPMCOUNTER10H (C macro), 71 | CSR_MHPMCOUNTER6H (C macro), 71 |
| CSR_MHPMCOUNTER11 (C macro), 68 | CSR_MHPMCOUNTER7 (C macro), 68 |
| CSR_MHPMCOUNTER11H (C macro), 71 | CSR_MHPMCOUNTER7H (C macro), 71 |
| CSR_MHPMCOUNTER12 (C macro), 68 | CSR_MHPMCOUNTER8 (C macro), 68 |
| CSR_MHPMCOUNTER12H (C macro), 71 | CSR_MHPMCOUNTER8H (C macro), 71 |
| CSR_MHPMCOUNTER13 (C macro), 68 | CSR_MHPMCOUNTER9 (C macro), 68 |
| CSR_MHPMCOUNTER13H (C macro), 71 | CSR_MHPMCOUNTER9H (C macro), 71 |
| CSR_MHPMCOUNTER14 (C macro), 68 | CSR_MHPMEVENT10 (C macro), 69 |
| CSR_MHPMCOUNTER14H (C macro), 71 | CSR_MHPMEVENT11 (C macro), 69 |
| CSR_MHPMCOUNTER15 (C macro), 68 | CSR_MHPMEVENT12 (C macro), 69 |
| CSR_MHPMCOUNTER15H (C macro), 71 | CSR_MHPMEVENT13 (C macro), 69 |
| CSR_MHPMCOUNTER16 (C macro), 68 | CSR_MHPMEVENT14 (C macro), 69 |
| CSR_MHPMCOUNTER16H (C macro), 71 | CSR_MHPMEVENT15 (C macro), 69 |
| CSR_MHPMCOUNTER17 (C macro), 68 | CSR_MHPMEVENT16 (C macro), 69 |
| CSR_MHPMCOUNTER17H (C macro), 71 | CSR_MHPMEVENT17 (C macro), 69 |
| CSR_MHPMCOUNTER18 (C macro), 68 | CSR_MHPMEVENT18 (C macro), 69 |
| CSR_MHPMCOUNTER18H (C macro), 71 | CSR_MHPMEVENT19 (C macro), 69 |
| CSR_MHPMCOUNTER19 (C macro), 69 | CSR_MHPMEVENT20 (C macro), 69 |
| CSR_MHPMCOUNTER19H (C macro), 71 | CSR_MHPMEVENT21 (C macro), 69 |
| CSR_MHPMCOUNTER20 (C macro), 69 | CSR_MHPMEVENT22 (C macro), 69 |
| CSR_MHPMCOUNTER20H (C macro), 71 | CSR_MHPMEVENT23 (C macro), 69 |
| CSR_MHPMCOUNTER21 (C macro), 69 | CSR_MHPMEVENT24 (C macro), 70 |
| CSR_MHPMCOUNTER21H (C macro), 71 | CSR_MHPMEVENT25 (C macro), 70 |
| CSR_MHPMCOUNTER22 (C macro), 69 | CSR_MHPMEVENT26 (C macro), 70 |
| | CSR_MHPMEVENT27 (C macro), 70 |
| | CSR_MHPMEVENT28 (C macro), 70 |
| | CSR_MHPMEVENT29 (C macro), 70 |
| | CSR_MHPMEVENT3 (C macro), 69 |

| CCD_MUDMEUTENESS (C                                                              | COD MATOCOURT Burney 12 (C)                                                   |
|----------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| CSR_MHPMEVENT30 ( <i>C macro</i> ), 70<br>CSR_MHPMEVENT31 ( <i>C macro</i> ), 70 | CSR_MMISCCTRL_Type::_reserved3(C++ mem-<br>ber), 85                           |
| CSR_MHPMEVENT31 (C macro), 70 CSR_MHPMEVENT4 (C macro), 69                       |                                                                               |
| CSR_MHPMEVENT5 (C macro), 69                                                     | CSR_MMISCCTRL_Type::b(C++ member), 85 CSR_MMISCCTRL_Type::bpu(C++ member), 85 |
| CSR_MHPMEVENT6 (C macro), 69                                                     | CSR_MMISCCTRL_Type::d( $C++$ member), 85                                      |
| CSR_MHPMEVENT7 (C macro), 69                                                     | CSR_MMISCCTRL_Type::misalign (C++ mem-                                        |
| CSR_MHPMEVENT8 (C macro), 69                                                     | ==                                                                            |
| CSR_MHPMEVENT9 (C macro), 69                                                     | ber), 85                                                                      |
| CSR_MICFG_INFO (C macro), 73                                                     | <pre>CSR_MMISCCTRL_Type::nmi_cause (C++ mem-<br/>ber), 85</pre>               |
| CSR_MIDELEG (C macro), 67                                                        | CSR_MNVEC (C macro), 72                                                       |
|                                                                                  | CSR_MNXTI (C macro), 72                                                       |
| CSR_MIE ( <i>C macro</i> ), 67 CSR_MILM_CTL ( <i>C macro</i> ), 72               | CSR_MPPICFG_INFO (C macro), 72                                                |
| CSR_MIMPID (C macro), 72                                                         | CSR_MSAVECAUSE1 (C macro), 72                                                 |
| CSR_MINSTRET (C macro), 68                                                       | CSR_MSAVECAUSE1 (C macro), 72 CSR_MSAVECAUSE2 (C macro), 72                   |
| CSR_MINSTRETH (C macro), 08                                                      | CSR_MSAVECAUSE2 (C macro), 72 CSR_MSAVEDCAUSE1 (C macro), 72                  |
|                                                                                  |                                                                               |
| CSR_MINTSTATUS (C macro), 72                                                     | CSR_MSAVEDCAUSE2 (C macro), 72                                                |
| CSR_MIP (C macro), 67                                                            | CSR_MSAVEEPC1 (C macro), 72                                                   |
| CSR_MISA (C macro), 67                                                           | CSR_MSAVEEPC2 (C macro), 72                                                   |
| CSR_MISA_Type (C++ union), 81 | CSR_MSAVESTATUS (C macro), 72 |
| CSR_MISA_Type::_reserved1 (C++ member), 81 | CSR_MSAVESTATUS_Type (C++ union), 85 |
| CSR_MISA_Type::_reserved2 (C++ member), 82 | CSR_MSAVESTATUS_Type::_reserved0 (C++ member), 86 |
| CSR_MISA_Type::_reserved4 (C++ member), 82 | CSR_MSAVESTATUS_Type::_reserved1 (C++ member), 86 |
| CSR_MISA_Type::_reserved5 (C++ member), 82 | CSR_MSAVESTATUS_Type::_reserved2 (C++ member), 86 |
| CSR_MISA_Type::_resreved3 (C++ member), 82 | CSR_MSAVESTATUS_Type::b (C++ member), 86 |
| CSR_MISA_Type::a (C++ member), 81 | CSR_MSAVESTATUS_Type::mpie1 (C++ member), 86 |
| CSR_MISA_Type::b (C++ member), 81, 82 | CSR_MSAVESTATUS_Type::mpie2 (C++ member), 86 |
| CSR_MISA_Type::c (C++ member), 81 | CSR_MSAVESTATUS_Type::mpp1 (C++ member), 86 |
| CSR_MISA_Type::d (C++ member), 81 | CSR_MSAVESTATUS_Type::mpp2 (C++ member), 86 |
| CSR_MISA_Type::e (C++ member), 81 | CSR_MSAVESTATUS_Type::ptyp1 (C++ member), 86 |
| CSR_MISA_Type::f (C++ member), 81 | CSR_MSAVESTATUS_Type::ptyp2 (C++ member), 86 |
| CSR_MISA_Type::g (C++ member), 81 | CSR_MSAVESTATUS_Type::w (C++ member), 86 |
| CSR_MISA_Type::h (C++ member), 81 | CSR_MSCOUNTEREN (C macro), 69 |
| CSR_MISA_Type::i (C++ member), 81 | CSR_MSCRATCH (C macro), 67 |
| CSR_MISA_Type::j (C++ member), 81 | CSR_MSCRATCHCSW (C macro), 72 |
| CSR_MISA_Type::l (C++ member), 81 | CSR_MSCRATCHCSWL (C macro), 72 |
| CSR_MISA_Type::m (C++ member), 81 | CSR_MSTATUS (C macro), 67 |
| CSR_MISA_Type::mxl (C++ member), 82 | CSR_MSTATUS_Type (C++ union), 82 |
| CSR_MISA_Type::n (C++ member), 82 | CSR_MSTATUS_Type::_reserved0 (C++ member), 82 |
| CSR_MISA_Type::p (C++ member), 82 | CSR_MSTATUS_Type::_reserved1 (C++ member), 82 |
| CSR_MISA_Type::q (C++ member), 82 | CSR_MSTATUS_Type::_reserved2 (C++ member), 82 |
| CSR_MISA_Type::s (C++ member), 82 | CSR_MSTATUS_Type::_reserved3 (C++ member), 83 |
| CSR_MISA_Type::t (C++ member), 82 | |
| CSR_MISA_Type::u (C++ member), 82 | |
| CSR_MISA_Type::v (C++ member), 82 | |
| CSR_MISA_Type::x (C++ member), 82 | |
| CSR_MMISC_CTL (C macro), 72 | |
| CSR_MMISCCTRL_Type (C++ union), 85 | |
| CSR_MMISCCTRL_Type::_reserved0 (C++ member), 85 | |
| CSR_MMISCCTRL_Type::_reserved1 (C++ member), 85 | |
| CSR_MMISCCTRL_Type::_reserved2 (C++ member), 85 | |

| CSR_MSTATUS_Type::_reserved4 ( $C++$ mem-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | CSR_PMPADDR9 ( $C$ macro), 68                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ber), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | CSR_PMPCFG0 (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| CSR_MSTATUS_Type::_reserved6 ( $C++$ mem-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | CSR_PMPCFG1 (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| ber), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | CSR_PMPCFG2 (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| CSR_MSTATUS_Type::b( $C++$ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | CSR_PMPCFG3 (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| CSR_MSTATUS_Type::d( $C++$ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | CSR_PUSHMCAUSE (C macro), 72                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| CSR_MSTATUS_Type::fs ( $C++$ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | CSR_PUSHMEPC (C macro), 72                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| CSR_MSTATUS_Type::mie(C++ member), 82                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | CSR_PUSHMSUBM (C macro), 72                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| CSR_MSTATUS_Type::mpie(C++ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | CSR_SBADADDR (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| CSR_MSTATUS_Type::mpp ( $C++$ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | CSR_SCAUSE (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| CSR_MSTATUS_Type::mprv(C++ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | CSR_SEPC (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| CSR_MSTATUS_Type::sd(C++ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | CSR_SIE (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| CSR_MSTATUS_Type::sie (C++ member), 82                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | CSR_SIP (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| CSR_MSTATUS_Type::spie(C++ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | CSR_SLEEPVALUE (C macro), 72                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| CSR_MSTATUS_Type::sum (C++ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | CSR_SPTBR (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| CSR_MSTATUS_Type::xs (C++ member), 83 | CSR_SSCRATCH (C macro), 67 |
| CSR_MSUBM (C macro), 72 | CSR_SSTATUS (C macro), 67 |
| CSR_MSUBM_Type (C++ union), 84 | CSR_STVEC (C macro), 67 |
| CSR_MSUBM_Type::_reserved0 (C++ member), 85 | CSR_TDATA1 (C macro), 68 |
| CSR_MSUBM_Type::_reserved1 (C++ member), 85 | CSR_TDATA2 (C macro), 68 |
| | CSR_TDATA3 (C macro), 68 |
| | CSR_TIME (C macro), 66 |
| CSR_MSUBM_Type::b (C++ member), 85 | CSR_TIME (C macro), 00 |
| | CSR_TSELECT (C macro), 68 |
| CSR_MSUBM_Type::d (C++ member), 85 | |
| CSR_MSUBM_Type::ptyp (C++ member), 85 | CSR_TXEVT (C macro), 73 |
| CSR_MSUBM_Type::typ (C++ member), 85 | CSR_UCODE (C macro), 72 |
| CSR_MTLB_CTL (C macro), 72                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | CSR_USTATUS (C macro), 66                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| CSR_MTLBCFG_INFO (C macro), 73                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | CSR_WFE (C macro), 73                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| CSR_MTVAL (C macro), 67 | |
| CSR_MTVEC (C macro), 67 | D |
| CSR_MTVEC_Type (C++ union), 83 | DCAUSE_FAULT_FETCH_INST (C macro), 80 |
| CSR_MTVEC_Type::addr (C++ member), 83 | DCAUSE_FAULT_FETCH_PMP (C macro), 80 |
| CSR_MTVAL ( <i>C macro</i> ), 67  CSR_MTVEC ( <i>C macro</i> ), 67  CSR_MTVEC_Type ( <i>C++ union</i> ), 83  CSR_MTVEC_Type::addr( <i>C++ member</i> ), 83  CSR_MTVEC_Type::b( <i>C++ member</i> ), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | D  DCAUSE_FAULT_FETCH_INST (C macro), 80  DCAUSE_FAULT_FETCH_PMP (C macro), 80  DCAUSE_FAULT_LOAD_INST (C macro), 80                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| CSR_MTVAL ( <i>C macro</i> ), 67  CSR_MTVEC ( <i>C macro</i> ), 67  CSR_MTVEC_Type ( <i>C</i> ++ union), 83  CSR_MTVEC_Type::addr ( <i>C</i> ++ member), 83  CSR_MTVEC_Type::b( <i>C</i> ++ member), 83  CSR_MTVEC_Type::d( <i>C</i> ++ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | D  DCAUSE_FAULT_FETCH_INST ( <i>C macro</i> ), 80  DCAUSE_FAULT_FETCH_PMP ( <i>C macro</i> ), 80  DCAUSE_FAULT_LOAD_INST ( <i>C macro</i> ), 80  DCAUSE_FAULT_LOAD_NICE ( <i>C macro</i> ), 80                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| CSR_MTVAL ( <i>C macro</i> ), 67  CSR_MTVEC ( <i>C macro</i> ), 67  CSR_MTVEC_Type ( <i>C</i> ++ union), 83  CSR_MTVEC_Type::addr ( <i>C</i> ++ member), 83  CSR_MTVEC_Type::b ( <i>C</i> ++ member), 83  CSR_MTVEC_Type::d ( <i>C</i> ++ member), 83  CSR_MTVEC_Type::mode ( <i>C</i> ++ member), 83                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | D  DCAUSE_FAULT_FETCH_INST ( <i>C macro</i> ), 80  DCAUSE_FAULT_FETCH_PMP ( <i>C macro</i> ), 80  DCAUSE_FAULT_LOAD_INST ( <i>C macro</i> ), 80  DCAUSE_FAULT_LOAD_NICE ( <i>C macro</i> ), 80  DCAUSE_FAULT_LOAD_PMP ( <i>C macro</i> ), 80                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b(C++ member), 83  CSR_MTVEC_Type::d(C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | D  DCAUSE_FAULT_FETCH_INST (C macro), 80  DCAUSE_FAULT_FETCH_PMP (C macro), 80  DCAUSE_FAULT_LOAD_INST (C macro), 80  DCAUSE_FAULT_LOAD_NICE (C macro), 80  DCAUSE_FAULT_LOAD_PMP (C macro), 80  DCAUSE_FAULT_STORE_INST (C macro), 80                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT2 (C macro), 72                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT2 (C macro), 72  CSR_MUCOUNTEREN (C macro), 69                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | D  DCAUSE_FAULT_FETCH_INST (C macro), 80  DCAUSE_FAULT_FETCH_PMP (C macro), 80  DCAUSE_FAULT_LOAD_INST (C macro), 80  DCAUSE_FAULT_LOAD_NICE (C macro), 80  DCAUSE_FAULT_LOAD_PMP (C macro), 80  DCAUSE_FAULT_STORE_INST (C macro), 80                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT2 (C macro), 72  CSR_MUCOUNTEREN (C macro), 69  CSR_MVENDORID (C macro), 70                                                                                                                                                                                                                                                                                                                                                                                                                                                         | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| CSR_MTVAL (C macro), 67 CSR_MTVEC (C macro), 67 CSR_MTVEC_Type (C++ union), 83 CSR_MTVEC_Type::addr (C++ member), 83 CSR_MTVEC_Type::b (C++ member), 83 CSR_MTVEC_Type::d (C++ member), 83 CSR_MTVEC_Type::mode (C++ member), 83 CSR_MTVEC_Type::mode (C++ member), 83 CSR_MTVT (C macro), 72 CSR_MTVT2 (C macro), 72 CSR_MUCOUNTEREN (C macro), 69 CSR_MVENDORID (C macro), 70 CSR_PMPADDRO (C macro), 67                                                                                                                                                                                                                                                                                                                                                                                                  | D  DCAUSE_FAULT_FETCH_INST (C macro), 80  DCAUSE_FAULT_FETCH_PMP (C macro), 80  DCAUSE_FAULT_LOAD_INST (C macro), 80  DCAUSE_FAULT_LOAD_NICE (C macro), 80  DCAUSE_FAULT_LOAD_PMP (C macro), 80  DCAUSE_FAULT_STORE_INST (C macro), 80  DCAUSE_FAULT_STORE_PMP (C macro), 80  DCAUSE_FAULT_STORE_PMP (C macro), 80  DCSR_CAUSE (C macro), 74                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT2 (C macro), 72  CSR_MUCOUNTEREN (C macro), 69  CSR_MVENDORID (C macro), 69  CSR_PMPADDRO (C macro), 67  CSR_PMPADDRO (C macro), 67                                                                                                                                                                                                                                                                                                                                                          | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCSR_CAUSE (C macro), 74 DCSR_CAUSE_DEBUGINT (C macro), 75                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVTC (C macro), 72  CSR_MTVT (C macro), 72  CSR_MUCOUNTEREN (C macro), 69  CSR_MVENDORID (C macro), 69  CSR_PMPADDR1 (C macro), 67  CSR_PMPADDR1 (C macro), 67  CSR_PMPADDR10 (C macro), 68                                                                                                                                                                                                                                                                                                                                                                    | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCSR_CAUSE (C macro), 74 DCSR_CAUSE_DEBUGINT (C macro), 75 DCSR_CAUSE_HALT (C macro), 75                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT2 (C macro), 72  CSR_MUCOUNTEREN (C macro), 69  CSR_MVENDORID (C macro), 67  CSR_PMPADDR1 (C macro), 67  CSR_PMPADDR10 (C macro), 68  CSR_PMPADDR11 (C macro), 68                                                                                                                                                                                                                                                                                                                                                                   | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCSR_CAUSE (C macro), 74 DCSR_CAUSE_DEBUGINT (C macro), 75 DCSR_CAUSE_HALT (C macro), 75 DCSR_CAUSE_HWBP (C macro), 75                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT (C macro), 72  CSR_MUCOUNTEREN (C macro), 69  CSR_MVENDORID (C macro), 67  CSR_PMPADDR1 (C macro), 67  CSR_PMPADDR10 (C macro), 68  CSR_PMPADDR11 (C macro), 68  CSR_PMPADDR12 (C macro), 68                                                                                                                                                                                                                                                                                                                                       | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCSR_CAUSE (C macro), 74 DCSR_CAUSE_DEBUGINT (C macro), 75 DCSR_CAUSE_HALT (C macro), 75 DCSR_CAUSE_HWBP (C macro), 75 DCSR_CAUSE_NONE (C macro), 74                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT2 (C macro), 72  CSR_MUCOUNTEREN (C macro), 69  CSR_MVENDORID (C macro), 67  CSR_PMPADDR1 (C macro), 67  CSR_PMPADDR10 (C macro), 68  CSR_PMPADDR11 (C macro), 68                                                                                                                                                                                                                                                                                                                                                                   | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCSR_CAUSE (C macro), 74 DCSR_CAUSE_DEBUGINT (C macro), 75 DCSR_CAUSE_HALT (C macro), 75 DCSR_CAUSE_HWBP (C macro), 75 DCSR_CAUSE_NONE (C macro), 74 DCSR_CAUSE_NONE (C macro), 74                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| CSR_MTVAL (C macro), 67  CSR_MTVEC (C macro), 67  CSR_MTVEC_Type (C++ union), 83  CSR_MTVEC_Type::addr (C++ member), 83  CSR_MTVEC_Type::b (C++ member), 83  CSR_MTVEC_Type::d (C++ member), 83  CSR_MTVEC_Type::mode (C++ member), 83  CSR_MTVT (C macro), 72  CSR_MTVT (C macro), 72  CSR_MUCOUNTEREN (C macro), 69  CSR_MVENDORID (C macro), 67  CSR_PMPADDR1 (C macro), 67  CSR_PMPADDR10 (C macro), 68  CSR_PMPADDR11 (C macro), 68  CSR_PMPADDR12 (C macro), 68                                                                                                                                                                                                                                                                                                                                       | DCAUSE_FAULT_FETCH_INST (C macro), 80 DCAUSE_FAULT_FETCH_PMP (C macro), 80 DCAUSE_FAULT_LOAD_INST (C macro), 80 DCAUSE_FAULT_LOAD_NICE (C macro), 80 DCAUSE_FAULT_LOAD_PMP (C macro), 80 DCAUSE_FAULT_STORE_INST (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCAUSE_FAULT_STORE_PMP (C macro), 80 DCSR_CAUSE (C macro), 74 DCSR_CAUSE_DEBUGINT (C macro), 75 DCSR_CAUSE_HALT (C macro), 75 DCSR_CAUSE_HWBP (C macro), 75 DCSR_CAUSE_NONE (C macro), 74 DCSR_CAUSE_STEP (C macro), 75 DCSR_CAUSE_STEP (C macro), 75 DCSR_CAUSE_SWBP (C macro), 75                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| CSR_MTVAL (C macro), 67 | DCAUSE_FAULT_FETCH_INST (C macro), 80 |
| CSR_MTVEC (C macro), 67 | DCAUSE_FAULT_FETCH_PMP (C macro), 80 |
| CSR_MTVEC_Type (C++ union), 83 | DCAUSE_FAULT_LOAD_INST (C macro), 80 |
| CSR_MTVEC_Type::addr (C++ member), 83 | DCAUSE_FAULT_LOAD_NICE (C macro), 80 |
| CSR_MTVEC_Type::b (C++ member), 83 | DCAUSE_FAULT_LOAD_PMP (C macro), 80 |
| CSR_MTVEC_Type::d (C++ member), 83 | DCAUSE_FAULT_STORE_INST (C macro), 80 |
| CSR_MTVEC_Type::mode (C++ member), 83 | DCAUSE_FAULT_STORE_PMP (C macro), 80 |
| CSR_MTVT (C macro), 72 | DCSR_CAUSE (C macro), 74 |
| CSR_MTVT2 (C macro), 72 | DCSR_CAUSE_DEBUGINT (C macro), 75 |
| CSR_MUCOUNTEREN (C macro), 69 | DCSR_CAUSE_HALT (C macro), 75 |
| CSR_MVENDORID (C macro), 67 | DCSR_CAUSE_HWBP (C macro), 75 |
| CSR_PMPADDR1 (C macro), 67 | DCSR_CAUSE_NONE (C macro), 74 |
| CSR_PMPADDR10 (C macro), 68 | DCSR_CAUSE_STEP (C macro), 75 |
| CSR_PMPADDR11 (C macro), 68 | DCSR_CAUSE_SWBP (C macro), 75 |
| CSR_PMPADDR12 (C macro), 68 | DCSR_DEBUGINT (C macro), 74 |
| CSR_PMPADDR13 (C macro), 68 | DCSR_EBREAKH (C macro), 74 |
| CSR_PMPADDR14 (C macro), 68 | DCSR_EBREAKM (C macro), 74 |
| CSR_PMPADDR15 (C macro), 68 | DCSR_EBREAKS (C macro), 74 |
| CSR_PMPADDR2 (C macro), 67 | DCSR_EBREAKU (C macro), 74 |
| CSR_PMPADDR3 (C macro), 67 | DCSR_FULLRESET (C macro), 74 |
| CSR_PMPADDR4 (C macro), 67 | DCSR_HALT (C macro), 74 |
| CSR_PMPADDR5 (C macro), 67 | |
| CSR_PMPADDR6 (C macro), 68 | |

```
DCSR_STOPCYCLE (C macro), 74
DCSR_STOPTIME (C macro), 74
DCSR_XDEBUGVER (C macro), 74
depthwise_conv_s8_generic (C++ function), 551, 567
depthwise_conv_s8_mult_4 (C++ function), 550,
depthwise_conv_u8_generic (C++ function), 552, 569
depthwise_conv_u8_mult_4 (C++ function), 551,
DSP, 603

E

ECLIC (C macro), 87
ECLIC_BASE (C macro), 87
ECLIC_ClearPendingIRQ (C macro), 308
ECLIC_DisableIRQ (C macro), 308
ECLIC_EnableIRQ (C macro), 307, 308
ECLIC_GetCfgNlbits (C macro), 307, 308
ECLIC_GetCtrlIRQ (C macro), 308, 309
ECLIC_GetEnableIRQ (C macro), 308
ECLIC_GetInfoCtlbits (C macro), 307, 308
ECLIC_GetInfoNum (C macro), 307, 308
ECLIC_GetInfoVer (C macro), 307, 308
ECLIC_GetLevelIRQ (C macro), 308, 309
ECLIC_GetMth (C macro), 307, 308
ECLIC_GetPendingIRQ (C macro), 308
ECLIC_GetPriorityIRQ (C macro), 308, 309
ECLIC_GetShvIRQ (C macro), 308, 309
ECLIC_GetTrigIRQ (C macro), 308, 309
ECLIC_GetVector (C macro), 308, 309
ECLIC_Init (C++ function), 345
ECLIC_MAX_NLBITS (C macro), 87
ECLIC_MODE_MTVEC_Msk (C macro), 87
ECLIC_NON_VECTOR_INTERRUPT (C macro), 87
ECLIC_Register_IRQ (C++ function), 345
ECLIC_SetCfgNlbits (C macro), 307, 308
ECLIC_SetCtrlIRQ (C macro), 308, 309
ECLIC_SetLevelIRQ (C macro), 308, 309
ECLIC_SetMth (C macro), 307, 308
ECLIC_SetPendingIRQ (C macro), 308
ECLIC_SetPriorityIRQ (C macro), 308, 309
ECLIC_SetShvIRQ (C macro), 308, 309
ECLIC_SetTrigIRQ (C macro), 308, 309
ECLIC_SetVector (C macro), 308, 309
ECLIC_TRIGGER_Type (C++ enum), 87
ECLIC_TRIGGER_Type::ECLIC_LEVEL_TRIGGER (C++ enumerator), 87
ECLIC_TRIGGER_Type::ECLIC_MAX_TRIGGER (C++ enumerator), 87
ECLIC_TRIGGER_Type::ECLIC_NEGTIVE_EDGE_TRIGGER (C++ enumerator), 87
ECLIC_TRIGGER_Type::ECLIC_POSTIVE_EDGE_TRIGGER (C++ enumerator), 87
ECLIC_VECTOR_INTERRUPT (C macro), 87
EXC_HANDLER (C++ type), 344
Exception_DumpFrame (C++ function), 344
Exception_Get_EXC (C++ function), 345
Exception_Init (C++ function), 344
Exception_Register_EXC (C++ function), 344

F

FFLAGS_AE_DZ (C macro), 79
FFLAGS_AE_NV (C macro), 79
FFLAGS_AE_NX (C macro), 79
FFLAGS_AE_OF (C macro), 79
FFLAGS_AE_UF (C macro), 79
FREG (C macro), 79
FRM_RNDMODE_DYN (C macro), 79
FRM_RNDMODE_RDN (C macro), 79
FRM_RNDMODE_RMM (C macro), 79
FRM_RNDMODE_RNE (C macro), 79
FRM_RNDMODE_RTZ (C macro), 79
FRM_RNDMODE_RUP (C macro), 79

I

IRQ_COP (C macro), 78
IRQ_H_EXT (C macro), 78
IRQ_H_SOFT (C macro), 78
IRQ_H_TIMER (C macro), 78
IRQ_HOST (C macro), 78
IRQ_M_EXT (C macro), 78
IRQ_M_SOFT (C macro), 78
IRQ_M_TIMER (C macro), 78
IRQ_S_EXT (C macro), 78
IRQ_S_SOFT (C macro), 78
IRQ_S_TIMER (C macro), 78
IRQn_Type (C++ enum), 306, 310
IRQn_Type::FirstDeviceSpecificInterrupt_IRQn (C++ enumerator), 306, 311
IRQn_Type::Reserved0_IRQn (C++ enumerator), 306, 310
IRQn_Type::Reserved10_IRQn (C++ enumerator), 306, 310
IRQn_Type::Reserved11_IRQn (C++ enumerator), 306, 310
IRQn_Type::Reserved12_IRQn (C++ enumerator), 306, 310
IRQn_Type::Reserved13_IRQn (C++ enumerator), 306, 311
IRQn_Type::Reserved14_IRQn (C++ enumerator), 306, 311
IRQn_Type::Reserved15_IRQn (C++ enumerator), 306, 311
IRQn_Type::Reserved16_IRQn (C++ enumerator), 306, 311
```

| IRQn_Type::Reserved1_IRQn (C++ enumerator), 306, 310 | MCONTROL_ACTION_TRACE_EMIT (C macro), 75 |
|---|---|
| IRQn_Type::Reserved2_IRQn (C++ enumerator), 306, 310 | MCONTROL_ACTION_TRACE_START (C macro), 75 |
| IRQn_Type::Reserved3_IRQn (C++ enumerator), 306, 310 | MCONTROL_ACTION_TRACE_STOP (C macro), 75 |
| IRQn_Type::Reserved4_IRQn (C++ enumerator), 306, 310 | MCONTROL_CHAIN (C macro), 75 |
| IRQn_Type::Reserved5_IRQn (C++ enumerator), 306, 310 | MCONTROL_DMODE (C macro), 75 |
| IRQn_Type::Reserved6_IRQn (C++ enumerator), 306, 310 | MCONTROL_EXECUTE (C macro), 75 |
| IRQn_Type::Reserved7_IRQn (C++ enumerator), 306, 310 | MCONTROL_H (C macro), 75 |
| IRQn_Type::Reserved8_IRQn (C++ enumerator), | MCONTROL_LOAD (C macro), 75 |
| IRQn_Type::Reserved9_IRQn (C++ enumerator), 306, 310 | MCONTROL_M (C macro), 75 |
| IRQn_Type::SOC_INT_MAX (C++ enumerator), 306, 311 | MCONTROL_MASKMAX (C macro), 75 |
| IRQn_Type::SysTimer_IRQn (C++ enumerator), | MCONTROL_MATCH (C macro), 75 |
| IRQn_Type::SysTimerSW_IRQn (C++ enumerator), 306, 310 | MCONTROL_MATCH_EQUAL (C macro), 75 |
| ISR, <b>603</b> | MCONTROL_MATCH_GE (C macro), 75 |
| M | MCONTROL_MATCH_LT (C macro), 75 |
| MAX_SYSTEM_EXCEPTION_NUM (C macro), 344 | MCONTROL_MATCH_MASK_HIGH (C macro), 75 |
| MCACHE_CTL_DC_ECC_EN (C macro), 77 | MCONTROL_MATCH_MASK_LOW (C macro), 75 |
| MCACHE_CTL_DC_ECC_EXCP_EN (C macro), 77 | MCONTROL_MATCH_NAPOT (C macro), 75 |
| MCACHE_CTL_DC_EN (C macro), 77 | MCONTROL_S (C macro), 75 |
| MCACHE_CTL_DC_RWDECC (C macro), 77 | MCONTROL_SELECT (C macro), 75 |
| MCACHE_CTL_DC_RWTECC (C macro), 77 | MCONTROL_STORE (C macro), 75 |
| MCACHE_CTL_IC_ECC_EN (C macro), 77 | MCONTROL_TIMING (C macro), 75 |
| MCACHE_CTL_IC_ECC_EXCP_EN (C macro), 77 | MCONTROL_TYPE (C macro), 75 |
| MCACHE_CTL_IC_EN (C macro), 77 | MCONTROL_TYPE_MATCH (C macro), 75 |
| MCACHE_CTL_IC_RWDECC (C macro), 77 | MCONTROL_TYPE_NONE (C macro), 75 |
| MCACHE_CTL_IC_RWTECC (C macro), 77 | MCONTROL_U (C macro), 75 |
| MCACHE_CTL_IC_SCPD_MOD (C macro), 77 | MCOUNTINHIBIT_CY (C macro), 76 |
| MCFG_INFO_CLIC (C macro), 77 | MCOUNTINHIBIT_IR (C macro), 76 |
| MCFG_INFO_DCACHE (C macro), 77 | MDCAUSE_MDCAUSE (C macro), 76 |
| MCFG_INFO_DLM (C macro), 77 | MDCFG_DC_ECC (C macro), 77 |
| MCFG_INFO_ECC (C macro), 77 | MDCFG_DC_LSIZE (C macro), 77 |
| MCFG_INFO_FIO (C macro), 77 | MDCFG_DC_SET (C macro), 77 |
| MCFG_INFO_ICACHE (C macro), 77 | MDCFG_DC_WAY (C macro), 77 |
| MCFG_INFO_ILM (C macro), 77 | MDCFG_DLM_ECC (C macro), 78 |
| MCFG_INFO_NICE (C macro), 77 | MDCFG_DLM_SIZE (C macro), 78 |
| MCFG_INFO_PLIC (C macro), 77 | MDLM_CTL_DLM_BPA (C macro), 76 |
| MCFG_INFO_PPI (C macro), 77 | MDLM_CTL_DLM_ECC_EN (C macro), 76 |
| MCFG_INFO_TEE (C macro), 77 | MDLM_CTL_DLM_ECC_EXCP_EN (C macro), 76 |
| MCONTROL_ACTION (C macro), 75 | MDLM_CTL_DLM_EN (C macro), 76 |
| MCONTROL_ACTION_DEBUG_EXCEPTION (C macro), 75 | MDLM_CTL_DLM_RWECC (C macro), 76 |
| MCONTROL_ACTION_DEBUG_MODE (C macro), 75 | MECC_CODE_CODE (C macro), 78 |
| | MECC_CODE_RAMID (C macro), 78 |
| | MECC_CODE_SRAMID (C macro), 78 |
| | MECC_LOCK_ECC_LOCK (C macro), 78 |
| | MFIOCFG_INFO_FIO_BPA (C macro), 78 |
| | MFIOCFG_INFO_FIO_SIZE (C macro), 78 |
| | MICFG_IC_ECC (C macro), 77 |
| | MICFG_IC_LSIZE (C macro), 77 |
| | MICFG_IC_SET (C macro), 77 |
| | MICFG_IC_WAY (C macro), 77 |
| | MICFG_ILM_ECC (C macro), 77 |
| | MICFG_ILM_SIZE (C macro), 77 |
| | MICFG_ILM_XONLY (C macro), 77 |
| | MIE_HEIE (C macro), 76 |
| | MIE_HSIE (C macro), 76 |

| MIE_HTIE (C macro), 76 | P |
|---|---|
| MIE_MEIE (C macro), 76 | PMP_A (C macro), 79 |
| MIE_MSIE (C macro), 76 | PMP_A_NA4 (C macro), 79 |
| MIE_MTIE (C macro), 76 | PMP_A_NAPOT (C macro), 79 |
| MIE_SEIE (C macro), 76 | PMP_A_TOR (C macro), 79 |
| MIE_SSIE (C macro), 76 | PMP_COUNT (C macro), 79 |
| MIE_STIE (C macro), 76 | PMP_L (C macro), 79 |
| MILM_CTL_ILM_BPA (C macro), 76 | PMP_R (C macro), 79 |
| MILM_CTL_ILM_ECC_EN (C macro), 76 | PMP_SHIFT (C macro), 79 |
| MILM_CTL_ILM_ECC_EXCP_EN (C macro), 76 | PMP_W (C macro), 79 |
| MILM_CTL_ILM_EN (C macro), 76 | PMP_X (C macro), 79 |
| MILM_CTL_ILM_RWECC (C macro), 76 | PRV_H (C macro), 78 |
| MIP_HEIP (C macro), 76 | PRV_M (C macro), 78 |
| MIP_HSIP (C macro), 75 | PRV_S (C macro), 78 |
| MIP_HTIP (C macro), 76 | PRV_U (C macro), 78 |
| MIP_MEIP (C macro), 76 | PTE_A (C macro), 80 |
| MIP_MSIP (C macro), 75 | PTE_D (C macro), 80 |
| MIP_MTIP (C macro), 76 | PTE_G (C macro), 80 |
| MIP_SEIP (C macro), 76 | PTE_PPN_SHIFT (C macro), 80 |
| MIP_SSIP (C macro), 75 | PTE_R (C macro), 79 |
| MIP_STIP (C macro), 76 | PTE_SOFT (C macro), 80 |
| MMISC_CTL_BPU (C macro), 77 | PTE_TABLE (C macro), 80 |
| MMISC_CTL_MISALIGN (C macro), 76 | PTE_U (C macro), 80 |
| MMISC_CTL_NMI_CAUSE_FFF (C macro), 76 | PTE_V (C macro), 79 |
| MPPICFG_INFO_PPI_BPA (C macro), 78 | PTE_W (C macro), 79 |
| MPPICFG_INFO_PPI_SIZE (C macro), 78 | PTE_X (C macro), 79 |
| MSTATUS32_SD (C macro), 74 | R |
| MSTATUS64_SD (C macro), 74 | realCoefA (C++ member), 485, 486 |
| MSTATUS_FS (C macro), 73 | realCoefAQ31 (C++ member), 485, 486 |
| MSTATUS_FS_CLEAN (C macro), 74 | realCoefB (C++ member), 485, 486 |
| MSTATUS_FS_DIRTY (C macro), 74 | realCoefBQ31 (C++ member), 485, 486 |
| MSTATUS_FS_INITIAL (C macro), 74 | RESTORE_FPU_CONTEXT (C macro), 322 |
| MSTATUS_HIE (C macro), 73 | RESTORE_IRQ_CSR_CONTEXT (C macro), 308, 309 |
| MSTATUS_HPIE (C macro), 73 | riscv_abs_f32 (C++ function), 352, 353 |
| MSTATUS_MIE (C macro), 73 | riscv_abs_q15 (C++ function), 352, 353 |
| MSTATUS_MPIE (C macro), 73 | riscv_abs_q31 (C++ function), 352, 353 |
| MSTATUS_MPP (C macro), 73 | riscv_abs_q7 (C++ function), 352, 353 |
| MSTATUS_MPRV (C macro), 74 | riscv_add_f32 (C++ function), 354 |
| MSTATUS_MXR (C macro), 74 | riscv_add_q15 (C++ function), 354 |
| MSTATUS_PUM (C macro), 74 | riscv_add_q31 (C++ function), 354 |
| MSTATUS_SIE (C macro), 73 | riscv_add_q7 (C++ function), 354, 355 |
| MSTATUS_SPIE (C macro), 73 | riscv_avepool_q7_HWC (C++ function), 582, 584 |
| MSTATUS_SPP (C macro), 73 | riscv_avgpool_s8 (C++ function), 582, 583 |
| MSTATUS_UIE (C macro), 73 | riscv_avgpool_s8_get_buffer_size (C++ function), 582, 583 |
| MSTATUS_UPIE (C macro), 73 | riscv_bilinear_interp_f32 (C++ function), 525 |
| MSTATUS_VM (C macro), 74 | riscv_bilinear_interp_q15 (C++ function), 525, 526 |
| MSTATUS_XS (C macro), 73 | riscv_bilinear_interp_q31 (C++ function), 525, 526 |
| MSUBM_PTYP (C macro), 76 | riscv_bilinear_interp_q7 (C++ function), 525, 526 |
| MSUBM_TYP (C macro), 76 | |
| MTVT2_COMMON_CODE_ENTRY (C macro), 77 | |
| MTVT2_MTVT2EN (C macro), 77 | |
| N | |
| NN, <b>603</b> | |

```
riscv_biquad_cas_df1_32x64_init_q31
                                               riscv_cmplx_conj_q31 (C++ function), 369, 370
        (C++ function), 376, 379
                                               riscv_cmplx_dot_prod_f32 (C++ function), 370
riscv_biquad_cas_df1_32x64_q31 (C++ func-
                                               riscv_cmplx_dot_prod_q15 (C++ function), 370
                                               riscv_cmplx_dot_prod_q31 (C++function), 370,
       tion), 376, 379
riscv_biquad_cascade_df1_f32 (C++ func-
                                                       371
       tion), 380, 383
                                               riscv_cmplx_mag_f32 (C++ function), 371, 372
riscv_biquad_cascade_df1_fast_q15 (C++
                                               riscv_cmplx_mag_q15 (C++ function), 371, 372
                                               riscv_cmplx_mag_q31 (C++ function), 371, 372
       function), 380, 383
riscv_biquad_cascade_df1_fast_q31 (C++
                                               riscv_cmplx_mag_squared_f32 (C++ function),
                                                       373
       function), 380, 383
riscv_biquad_cascade_df1_init_f32 (C++
       function), 380, 384
                                               riscv_cmplx_mag_squared_q15 (C++ function),
                                                       373
riscv_biquad_cascade_df1_init_q15 (C++
                                              riscv_cmplx_mag_squared_q31 (C++ function),
       function), 380, 384
                                                       373
riscv_biquad_cascade_df1_init_q31 (C++
                                              riscv_cmplx_mult_cmplx_f32 (C++ function),
       function), 380, 385
                                                       374
riscv_biquad_cascade_df1_q15 (C++ func-
                                               riscv_cmplx_mult_cmplx_q15 (C++ function),
       tion), 380, 385
                                                       374
riscv_biquad_cascade_df1_q31 (C++ func-
                                               riscv_cmplx_mult_cmplx_q31 (C++ function),
       tion), 380, 386
                                                       374
riscv_biquad_cascade_df2T_init_f32(C++
                                               riscv_cmplx_mult_real_f32 (C++ function),
       function), 386, 389
                                                       375
riscv_biquad_cascade_df2T_init_f64 (C++
       function), 386, 390
                                               riscv_cmplx_mult_real_q15 (C++ function),
riscv_biquad_cascade_stereo_df2T_init_f32
       (C++ function), 387, 390
                                               riscv_cmplx_mult_real_q31 (C++ function),
                                                       375, 376
riscv_cfft_f32 (C++ function), 467, 471
                                               riscv_conv_f32 (C++ function), 391, 392
riscv_cfft_f64 (C++ function), 468, 471
                                               riscv_conv_fast_opt_q15 (C++ function), 391,
riscv_cfft_init_f32 (C++ function), 468, 471
riscv_cfft_init_f64 (C++ function), 468, 472
                                               riscv_conv_fast_q15 (C++ function), 391, 393
riscv_cfft_init_q15 (C++ function), 468, 472
                                               riscv_conv_fast_q31 (C++ function), 391, 394
riscv_cfft_init_q31 (C++ function), 468, 472
                                               riscv_conv_opt_q15 (C++ function), 391, 394
riscv_cfft_q15 (C++ function), 468, 472
                                               riscv_conv_opt_q7 (C++ function), 391, 395
riscv_cfft_q31 (C++ function), 468, 473
                                               riscv_conv_partial_f32 (C++ function), 397,
riscv_cfft_radix2_f32 (C++ function), 468, 473
                                                       398
riscv_cfft_radix2_init_f32 (C++ function),
       468, 473
                                               riscv_conv_partial_fast_opt_q15 (C++
                                                      function), 397, 398
riscv_cfft_radix2_init_q15 (C++ function),
                                               riscv_conv_partial_fast_q15 (C++ function),
       468, 474
                                                       397, 399
                                               riscv_conv_partial_fast_q31 (C++ function),
riscv_cfft_radix2_init_q31 (C++ function),
       468, 475
                                                       397, 399
riscv_cfft_radix2_q15 (C++ function), 468, 475
                                               riscv_conv_partial_opt_q15 (C++ function),
riscv_cfft_radix2_q31 (C++ function), 468, 475
                                                      397, 400
riscv_cfft_radix4_f32 (C++ function), 468, 476
                                               riscv_conv_partial_opt_q7 (C++ function),
riscv_cfft_radix4_init_f32 (C++ function),
                                                       397, 400
        468, 476
                                               riscv_conv_partial_q15 (C++ function), 397,
riscv_cfft_radix4_init_q15 (C++ function),
                                                       401
       468, 477
                                               riscv_conv_partial_q31 (C++ function), 397,
riscv_cfft_radix4_init_q31 (C++ function),
       468, 477
                                               riscv_conv_partial_q7 (C++ function), 397, 402
riscv_cfft_radix4_q15 (C++ function), 468, 478
                                               riscv_conv_q15 (C++ function), 391, 395
riscv_cfft_radix4_q31 (C++ function), 468, 478
                                               riscv_conv_q31 (C++ function), 391, 396
riscv_cmplx_conj_f32 (C++ function), 369
                                               riscv_conv_q7 (C++ function), 391, 396
riscv_cmplx_conj_q15 (C++ function), 369
                                               riscv_convolve_1_x_n_s8 (C++ function), 547,
```

```
554
                                               riscv_dct4_init_f32 (C++ function), 481, 482
riscv_convolve_1_x_n_s8_get_buffer_size
        (C++ function), 547, 554
                                               riscv_dct4_init_q15 (C++ function), 481, 483
                                               riscv_dct4_init_q31 (C++ function), 481, 484
riscv_convolve_1x1_HWC_q7_fast_nonsquare
        (C++ function), 547, 555
                                               riscv_dct4_q15 (C++ function), 481, 484
                                               riscv_dct4_q31 (C++ function), 481, 485
riscv_convolve_1x1_s8_fast (C++ function),
                                               riscv_depthwise_conv_3x3_s8 (C++ function),
                                                       550, 566
riscv_convolve_1x1_s8_fast_get_buffer_size
        (C++ function), 548, 557
                                               riscv_depthwise_conv_s8 (C++ function), 551,
                                                       567
riscv_convolve_HWC_q15_basic (C++ function), 548, 557
                                               riscv_depthwise_conv_s8_opt (C++ function),
                                                       551, 568
riscv_convolve_HWC_q15_fast (C++ function),
                                               riscv_depthwise_conv_s8_opt_get_buffer_size
        548, 557
                                                       (C++ function), 551, 569
                                               riscv_depthwise_conv_u8_basic_ver1(C++
riscv_convolve_HWC_q15_fast_nonsquare
        (C++ function), 548, 558
                                                       function), 552, 569
riscv_convolve_HWC_q7_basic (C++ function),
        549, 560
                                               riscv_depthwise_conv_wrapper_s8 (C++
                                                       function), 552, 571
riscv_convolve_HWC_q7_basic_nonsquare
                                               riscv_depthwise_conv_wrapper_s8_get_buffer_size
        (C++ function), 549, 560
                                                       (C++ function), 552, 572
riscv_convolve_HWC_q7_fast (C++ function),
                                               riscv_depthwise_separable_conv_HWC_q7
        549, 561
                                                       (C++ function), 553, 572
riscv_convolve_HWC_q7_fast_nonsquare
                                               riscv_depthwise_separable_conv_HWC_q7_nonsquare
        (C++ function), 549, 562
                                                       (C++ function), 553, 573
riscv_convolve_HWC_q7_RGB (C++ function),
                                               riscv_dot_prod_f32 (C++ function), 355
        550, 563
                                               riscv_dot_prod_q15 (C++ function), 355
riscv_convolve_s8 (C++ function), 550, 564
                                               riscv_dot_prod_q31 (C++ function), 355, 356
riscv_convolve_s8_get_buffer_size (C++
                                               riscv_dot_prod_q7 (C++ function), 355, 356
       function), 550, 565
                                               riscv_fill_f32 (C++ function), 517, 518
riscv_convolve_wrapper_s8 (C++ function),
                                               riscv_fill_q15 (C++ function), 517, 518
        550, 565
                                               riscv_fill_q31 (C++ function), 517, 518
riscv_convolve_wrapper_s8_get_buffer_size
        (C++ function), 550, 566
                                               riscv_fill_q7 (C++ function), 517, 518
                                               riscv_fir_decimate_f32 (C++ function), 408,
riscv_copy_f32 (C++ function), 516
                                                       409
riscv_copy_q15 (C++ function), 516
                                               riscv_fir_decimate_fast_q15 (C++ function),
riscv_copy_q31 (C++ function), 516, 517
                                                       408, 409
riscv_copy_q7 (C++ function), 516, 517
                                               riscv_fir_decimate_fast_q31 (C++ function),
riscv_correlate_f32 (C++ function), 402, 403
                                                       408, 410
riscv_correlate_fast_opt_q15 (C++ func-
                                               riscv_fir_decimate_init_f32 (C++ function),
        tion), 402, 404
                                                       408, 410
riscv_correlate_fast_q15 (C++ function), 402,
                                               riscv_fir_decimate_init_q15 (C++ function),
                                                       408, 411
riscv_correlate_fast_q31 (C++ function), 402,
                                               riscv_fir_decimate_init_q31 (C++ function),
                                                       408, 411
                                               riscv_fir_decimate_q15 (C++ function), 408,
riscv_correlate_opt_q15 (C++ function), 402,
                                                       412
riscv_correlate_opt_q7 (C++ function), 402,
                                               riscv_fir_decimate_q31 (C++ function), 408,
                                                       412
                                               riscv_fir_f32 (C++ function), 413, 415
riscv_correlate_q15 (C++ function), 402, 406
riscv_correlate_q31 (C++ function), 402, 407
                                               riscv_fir_fast_q15 (C++ function), 413, 415
                                               riscv_fir_init_f32 (C++ function), 413, 416
riscv_correlate_q7 (C++ function), 402, 407
riscv_cos_f32 (C++ function), 367
                                               riscv_fir_init_q15 (C++ function), 413, 417
riscv_cos_q15 (C++ function), 367
                                               riscv_fir_init_q31 (C++ function), 413, 417
riscv_cos_q31 (C++ function), 367
                                               riscv_fir_init_q7 (C++ function), 413, 418
riscv_dct4_f32 (C++ function), 480, 482
                                               riscv_fir_interpolate_f32 (C++ function),
```

```
440, 441
                                               riscv_iir_lattice_f32 (C++ function), 427, 429
riscv_fir_interpolate_init_f32 (C++ function), 440, 441
                                               riscv_iir_lattice_init_f32 (C++ function),
                                                       427, 429
riscv_fir_interpolate_init_q15 (C++ func-
                                               riscv_iir_lattice_init_q15 (C++ function),
       tion), 440, 442
                                                       427, 429
riscv_fir_interpolate_init_q31 (C++ function), 440, 442
                                               riscv_iir_lattice_init_q31 (C++ function),
                                                       427, 429
                                               riscv_iir_lattice_q15 (C++ function), 427, 430
riscv_fir_interpolate_q15 (C++ function),
        440, 443
                                               riscv_iir_lattice_q31 (C++ function), 427, 430
riscv_fir_interpolate_q31 (C++ function),
                                               riscv_linear_interp_f32 (C++ function), 523,
       440, 443
                                                       524
riscv_fir_lattice_f32 (C++ function), 420, 421
                                               riscv_linear_interp_q15 (C++ function), 523,
riscv_fir_lattice_init_f32 (C++ function),
                                                       524
       420, 421
                                               riscv_linear_interp_q31 (C++ function), 523,
riscv_fir_lattice_init_q15 (C++ function),
                                                       524
        420, 421
                                               riscv_linear_interp_q7 (C++ function), 523,
riscv_fir_lattice_init_q31 (C++ function),
                                                       524
       420, 421
                                               riscv_lms_f32 (C++ function), 431, 433
riscv_fir_lattice_q15 (C++ function), 420, 422
                                               riscv_lms_init_f32 (C++ function), 431, 433
riscv_fir_lattice_q31 (C++ function), 420, 422
                                               riscv_lms_init_q15 (C++ function), 431, 433
riscv_fir_q15 (C++ function), 413, 418
                                               riscv_lms_init_q31 (C++ function), 431, 434
riscv_fir_q31 (C++ function), 413, 419
                                               riscv_lms_norm_f32 (C++ function), 435, 437
riscv_fir_q7 (C++ function), 413, 419
                                               riscv_lms_norm_init_f32 (C++ function), 435,
riscv_fir_sparse_f32 (C++ function), 422, 424
riscv_fir_sparse_init_f32 (C++ function),
                                               riscv_lms_norm_init_q15 (C++ function), 435,
       422, 424
                                                       438
riscv_fir_sparse_init_q15 (C++ function),
                                               riscv_lms_norm_init_q31 (C++ function), 435,
       422, 424
                                                       438
                                               riscv_lms_norm_q15 (C++ function), 435, 439
riscv_fir_sparse_init_q31 (C++ function),
       422, 425
                                               riscv_lms_norm_q31 (C++ function), 435, 439
riscv_fir_sparse_init_q7 (C++ function), 422,
                                               riscv_lms_q15 (C++ function), 431, 434
       425
                                               riscv_lms_q31 (C++ function), 431, 435
riscv_fir_sparse_q15 (C++ function), 423, 426
                                               riscv_mat_add_f32 (C++ function), 444, 445
riscv_fir_sparse_q31 (C++ function), 423, 426
                                               riscv_mat_add_q15 (C++ function), 444, 445
riscv_fir_sparse_q7 (C++ function), 423, 427
                                               riscv_mat_add_q31 (C++ function), 444, 445
riscv_float_to_q15 (C++ function), 518, 519
                                               riscv_mat_cmplx_mult_f32 (C++ function), 446
riscv_float_to_q31 (C++ function), 518, 519
                                               riscv_mat_cmplx_mult_q15 (C++ function), 446
riscv_float_to_q7 (C++ function), 518, 519
                                               riscv_mat_cmplx_mult_q31 (C++ function), 446,
riscv_fully_connected_mat_q7_vec_q15
                                                       447
        (C++ function), 575, 577
                                               riscv_mat_init_f32 (C++ function), 447, 448
riscv_fully_connected_mat_q7_vec_q15_opt
       (C++ function), 575, 577
                                               riscv_mat_init_f64 (C++ function), 447, 448
                                               riscv_mat_init_q15 (C++ function), 447, 448
                                               riscv_mat_init_q31 (C++ function), 447, 448
riscv_fully_connected_q15 (C++ function),
       575, 578
                                               riscv_mat_inverse_f32 (C++ function), 449, 450
riscv_fully_connected_q15_opt (C++ func-
                                               riscv_mat_inverse_f64 (C++ function), 449, 450
                                               riscv_mat_mult_f32 (C++ function), 451, 452
        tion), 575, 578
riscv_fully_connected_q7 (C++ function), 575,
                                               riscv_mat_mult_f64 (C++ function), 451, 452
                                               riscv_mat_mult_fast_q15 (C++ function), 451,
        579
riscv_fully_connected_q7_opt (C++ func-
                                                       452
       tion), 575, 580
                                               riscv_mat_mult_fast_q31 (C++ function), 451,
riscv_fully_connected_s8 (C++ function), 575,
                                                       453
                                               riscv_mat_mult_q15 (C++ function), 451, 453
riscv_fully_connected_s8_get_buffer_size
        (C++ function), 575, 582
                                               riscv_mat_mult_q31 (C++ function), 451, 454
                                               riscv_mat_mult_q7 (C++ function), 451, 454
```

| riscv_mat_scale_f32 (C++ function), 455                       | riscv_nn_depthwise_conv_nt_t_s8 (C++ function), 589, 591       |
|---------------------------------------------------------------|----------------------------------------------------------------|
| riscv_mat_scale_q15 (C++ function), 455, 456                  | riscv_nn_mat_mul_core_1x_s8 (C++ function), 589, 592           |
| riscv_mat_scale_q31 (C++ function), 455, 456                  | riscv_nn_mat_mul_core_4x_s8 (C++ function), 589, 593           |
| riscv_mat_solve_lower_triangular_f32 (C++ function), 449, 450 | riscv_nn_mat_mult_nt_t_s8 (C++ function), 589, 593             |
| riscv_mat_solve_lower_triangular_f64 (C++ function), 449, 450 | riscv_nn_mult_q15 (C++ function), 589, 594                     |
| riscv_mat_solve_upper_triangular_f32 (C++ function), 449, 451 | riscv_nn_mult_q7 (C++ function), 589, 594                      |
| riscv_mat_solve_upper_triangular_f64 (C++ function), 449, 451 | riscv_nn_vec_mat_mult_t_s8 (C++ function), 589, 595            |
| riscv_mat_sub_f32 (C++ function), 456, 457                    | riscv_nn_vec_mat_mult_t_svdf_s8 (C++ function), 589, 595       |
| riscv_mat_sub_f64 (C++ function), 456, 457                    | riscv_offset_f32 (C++ function), 359, 360                      |
| riscv_mat_sub_q15 (C++ function), 456, 457                    | riscv_offset_q15 (C++ function), 359, 360                      |
| riscv_mat_sub_q31 (C++ function), 456, 458                    | riscv_offset_q31 (C++ function), 359, 360                      |
| riscv_mat_trans_f32 (C++ function), 458, 459                  | riscv_offset_q7 (C++ function), 359, 360                       |
| riscv_mat_trans_f64 (C++ function), 458, 459                  | riscv_pid_init_f32 (C++ function), 497, 499                    |
| riscv_mat_trans_q15 (C++ function), 458, 459                  | riscv_pid_init_q15 (C++ function), 497, 499                    |
| riscv_mat_trans_q31 (C++ function), 458, 459                  | riscv_pid_init_q31 (C++ function), 497, 499                    |
| riscv_mat_trans_q7 (C++ function), 458, 460                   | riscv_pid_reset_f32 (C++ function), 497, 500                   |
| riscv_max_f32 (C++ function), 507                             | riscv_pid_reset_q15 (C++ function), 497, 500                   |
| riscv_max_no_idx_f32 (C++ function), 507                      | riscv_pid_reset_q31 (C++ function), 497, 500                   |
| riscv_max_pool_s8 (C++ function), 582, 583                    | riscv_power_f32 (C++ function), 511                            |
| riscv_max_q15 (C++ function), 507, 508                        | riscv_power_q15 (C++ function), 511                            |
| riscv_max_q31 (C++ function), 507, 508                        | riscv_power_q31 (C++ function), 511, 512                       |
| riscv_max_q7 (C++ function), 507, 508                         | riscv_power_q7 (C++ function), 511, 512                        |
| riscv_maxpool_q7_HWC (C++ function), 582, 584                 | riscv_q15_to_float (C++ function), 520                         |
| riscv_mean_f32 (C++ function), 508, 509                       | riscv_q15_to_q31 (C++ function), 520                           |
| riscv_mean_q15 (C++ function), 508, 509                       | riscv_q15_to_q7 (C++ function), 520                            |
| riscv_mean_q31 (C++ function), 508, 509                       | riscv_q31_to_float (C++ function), 521                         |
| riscv_mean_q7 (C++ function), 508, 509                        | riscv_q31_to_q15 (C++ function), 521                           |
| riscv_min_f32 (C++ function), 510                             | riscv_q31_to_q7 (C++ function), 521                            |
| riscv_min_q15 (C++ function), 510                             | riscv_q7_to_float (C++ function), 522                          |
| riscv_min_q31 (C++ function), 510                             | riscv_q7_to_q15 (C++ function), 522                            |
| riscv_min_q7 (C++ function), 510, 511                         | riscv_q7_to_q15_no_shift (C++ function), 587                   |
| riscv_mult_f32 (C++ function), 357                            | riscv_q7_to_q15_reordered_no_shift (C++ function), 587         |
| riscv_mult_q15 (C++ function), 357                            | riscv_q7_to_q15_reordered_with_offset (C++ function), 587, 588 |
| riscv_mult_q31 (C++ function), 357                            | riscv_q7_to_q15_with_offset (C++ function), 587, 588           |
| riscv_mult_q7 (C++ function), 357, 358                        | riscv_q7_to_q31 (C++ function), 522                            |
| riscv_negate_f32 (C++ function), 358                          | riscv_q7_to_q7_no_shift (C++ function), 587, 588               |
| riscv_negate_q15 (C++ function), 358                          | riscv_q7_to_q7_reordered_no_shift (C++ function), 587, 588     |
| riscv_negate_q31 (C++ function), 358, 359                     | riscv_relu6_s8 (C++ function), 546, 547                        |
| riscv_negate_q7 (C++ function), 358, 359                      | riscv_relu_q15 (C++ function), 546, 547                        |
| riscv_nn_accumulate_q7_to_q15 (C++ function), 589, 590        | riscv_relu_q7 (C++ function), 546, 547                         |
| riscv_nn_accumulate_q7_to_q7 (C++ function), 589, 590         | riscv_rfft_1024_fast_init_f32 (C++ function), 490              |
| riscv_nn_activations_direct_q15 (C++ function), 546           |                                                                |
| riscv_nn_activations_direct_q7 (C++ function), 546            |                                                                |
| riscv_nn_add_q7 (C++ function), 589, 590                      |                                                                |
| riscv_nn_depthwise_conv_nt_t_padded_s8 (C++ function), 589, 590 |                                                              |

```
riscv_rfft_1024_fast_init_f64 (C++ function), 487, 492
                                                riscv_sin_f32 (C++ function), 368
                                                riscv_sin_q15 (C++ function), 368
riscv_rfft_128_fast_init_f32 (C++ function), 490
                                                riscv_sin_q31 (C++ function), 368
                                                riscv_softmax_q15 (C++ function), 585
riscv_rfft_128_fast_init_f64 (C++ function), 487, 491
                                                riscv_softmax_q7 (C++ function), 585
                                                riscv_softmax_s8 (C++ function), 585, 586
                                                riscv_softmax_u8 (C++ function), 585, 586
riscv_rfft_2048_fast_init_f32 (C++ function), 490
                                                riscv_softmax_with_batch_q7 (C++ function),
                                                        585, 586
riscv_rfft_2048_fast_init_f64 (C++ function), 487, 492
                                                riscv_sqrt_q15 (C++ function), 365, 366
riscv_rfft_256_fast_init_f32 (C++ function), 490
                                                riscv_sqrt_q31 (C++ function), 365, 366
                                                riscv_std_f32 (C++ function), 514
riscv_rfft_256_fast_init_f64 (C++ func-
                                                riscv_std_q15 (C++ function), 514
                                                riscv_std_q31 (C++ function), 514
       tion), 487, 491
riscv_rfft_32_fast_init_f32 (C++ function),
                                                riscv_sub_f32 (C++ function), 364
        489
                                                riscv_sub_q15 (C++ function), 364
                                                riscv_sub_q31 (C++ function), 364
riscv_rfft_32_fast_init_f64 (C++ function),
       487, 491
                                                riscv_sub_q7 (C++ function), 364, 365
riscv_rfft_4096_fast_init_f32 (C++ func-
                                                riscv_var_f32 (C++ function), 515
       tion), 491
                                                riscv_var_q15 (C++ function), 515
riscv_rfft_4096_fast_init_f64 (C++ func-
                                                riscv_var_q31 (C++ function), 515, 516
       tion), 487, 492
                                                riscv_vsqrt_f32 (C++ function), 365, 366
                                                riscv_vsqrt_q15 (C++ function), 365, 366
riscv_rfft_512_fast_init_f32 (C++ func-
                                                riscv_vsqrt_q31 (C++ function), 365, 366
       tion), 490
riscv_rfft_512_fast_init_f64 (C++ func-
                                                riscvBitRevTable (C++ member), 461, 462
       tion), 487, 492
                                                rv_csr_t (C++ type), 81
riscv_rfft_64_fast_init_f32 (C++ function),
                                                rv_fpu_t (C++ type), 323
                                                S
riscv_rfft_64_fast_init_f64 (C++ function),
       487, 491
                                                SAVE_FPU_CONTEXT (C macro), 322
riscv_rfft_f32 (C++ function), 486, 489
                                                SAVE_IRQ_CSR_CONTEXT (C macro), 308, 309
riscv_rfft_fast_f32 (C++ function), 487, 489
                                                SIP_SSIP (C macro), 78
riscv_rfft_fast_f64 (C++ function), 487, 489
                                                SIP_STIP (C macro), 78
riscv_rfft_fast_init_f32 (C++function), 487,
                                                SLEEPVALUE_SLEEPVALUE (C macro), 76
       491
                                                SSTATUS32_SD (C macro), 74
riscv_rfft_fast_init_f64 (C++ function), 487,
                                                SSTATUS64_SD (C macro), 74
       492
                                                SSTATUS_FS (C macro), 74
riscv_rfft_init_f32 (C++ function), 487, 493
                                                SSTATUS_PUM (C macro), 74
riscv_rfft_init_q15 (C++ function), 487, 493
                                                SSTATUS_SIE (C macro), 74
riscv_rfft_init_q31 (C++ function), 487, 494
                                                SSTATUS_SPIE (C macro), 74
riscv_rfft_q15 (C++ function), 487, 494
                                                SSTATUS_SPP (C macro), 74
riscv_rfft_q31 (C++ function), 487, 495
                                                SSTATUS_UIE (C macro), 74
riscv_rms_f32 (C++ function), 512, 513
                                                SSTATUS_UPIE (C macro), 74
riscv_rms_q15 (C++ function), 512, 513
                                                SSTATUS_XS (C macro), 74
riscv_rms_q31 (C++ function), 512, 513
                                                system_default_exception_handler (C++ function), 344
riscv_scale_f32 (C++ function), 361
riscv_scale_q15 (C++ function), 361
                                                SystemCoreClock (C++ member), 343
riscv_scale_q31 (C++ function), 361, 362
                                                SystemCoreClockUpdate (C++ function), 343
riscv_scale_q7 (C++ function), 361, 362
                                                SystemExceptionHandlers (C++ member), 346
riscv_shift_q15 (C++ function), 362, 363
                                                SystemInit (C++ function), 343
riscv_shift_q31 (C++ function), 362, 363
                                                SysTimer (C macro), 89
riscv_shift_q7 (C++ function), 362, 363
                                                SysTimer_BASE (C macro), 89
riscv_sin_cos_f32 (C++ function), 506
                                                SysTimer_MSFRST_KEY (C macro), 89
riscv_sin_cos_q31 (C++ function), 506
                                                SysTimer_MSFTRST_Msk (C macro), 89
```

```
SysTimer_MSIP_MSIP_Msk (C macro), 89
                                              twiddleCoefF64_256 (C++ member), 461, 462
SysTimer_MSIP_MSIP_Pos (C macro), 89
                                              twiddleCoefF64_32 (C++ member), 461, 462
SysTimer_MSIP_Msk (C macro), 89
                                              twiddleCoefF64_4096 (C++ member), 461, 463
                                              twiddleCoefF64_512 (C++ member), 461, 463
SysTimer_MTIMECTL_CLKSRC_Msk (C macro), 89
SysTimer_MTIMECTL_CLKSRC_Pos (C macro), 89
                                              twiddleCoefF64_64 (C++ member), 461, 462
SysTimer_MTIMECTL_CMPCLREN_Msk (C macro),
                                              TXEVT_TXEVT (C macro), 76
SysTimer_MTIMECTL_CMPCLREN_Pos (C macro),
                                              UCODE_OV (C macro), 76
SysTimer_MTIMECTL_Msk (C macro), 89
                                              USE_INTRINSIC (C macro), 576
SysTimer_MTIMECTL_TIMESTOP_Msk (C macro),
SysTimer_MTIMECTL_TIMESTOP_Pos (C macro), 89
                                              VM_MBARE (C macro), 78
                                              VM_MBB (C macro), 78
SysTimer_MTIMER_Msk (C macro), 89
                                              VM_MBBID (C macro), 78
SysTimer_MTIMERCMP_Msk (C macro), 89
                                              VM_SV32 (C macro), 78
SysTimer_Type (C++ struct), 89
                                              VM_SV39 (C macro), 78
                                              VM_SV48 (C macro), 78
                                              W
T_UINT16_READ (C++ member), 63
T_UINT16_WRITE (C++ member), 63
                                              Weights_128 (C++ member), 479, 480
T_UINT32_READ (C++ member), 63
                                              Weights_2048 (C++ member), 479, 480
T_UINT32_WRITE (C++ member), 63
                                              Weights_512 (C++ member), 479, 480
twiddleCoef_1024 (C++ member), 461, 464
                                              Weights_8192 (C++ member), 479, 480
twiddleCoef_1024_q15 (C++ member), 462, 467
                                              WeightsQ31_128 (C++ member), 479, 480
twiddleCoef_1024_q31 (C++ member), 461, 465
                                              WeightsQ31_2048 (C++ member), 479, 480
twiddleCoef_128 (C++ member), 461, 464
                                              WeightsQ31_512 (C++ member), 479, 480
twiddleCoef_128_q15 (C++ member), 461, 466
                                              WeightsQ31_8192 (C++ member), 479, 480
twiddleCoef_128_q31 (C++ member), 461, 465
                                              WFE_WFE (C macro), 76
twiddleCoef_16 (C++ member), 461, 463
                                              WFI_SleepMode_Type (C++ enum), 90, 92
twiddleCoef_16_q15 (C++ member), 461, 466
                                              WFI_SleepMode_Type::WFI_DEEP_SLEEP (C++ enumerator), 90, 92
twiddleCoef_16_q31 (C++ member), 461, 464
twiddleCoef_2048 (C++ member), 461, 464
                                              WFI_SleepMode_Type::WFI_SHALLOW_SLEEP (C++ enumerator), 90, 92
twiddleCoef_2048_q15 (C++ member), 462, 467
twiddleCoef_2048_q31 (C++ member), 461, 466
twiddleCoef_256 (C++ member), 461, 464
                                              X
twiddleCoef_256_q15 (C++ member), 461, 467
                                              XIP, 603
twiddleCoef_256_q31 (C++ member), 461, 465
twiddleCoef_32 (C++ member), 461, 463
twiddleCoef_32_q15 (C++ member), 461, 466
twiddleCoef_32_q31 (C++ member), 461, 465
twiddleCoef_4096 (C++ member), 461, 464
twiddleCoef_4096_q15 (C++ member), 462, 467
twiddleCoef_4096_q31 (C++ member), 461, 466
twiddleCoef_512 (C++ member), 461, 464
twiddleCoef_512_q15 (C++ member), 462, 467
twiddleCoef_512_q31 (C++ member), 461, 465
twiddleCoef_64 (C++ member), 461, 463
twiddleCoef_64_q15 (C++ member), 461, 466
twiddleCoef_64_q31 (C++ member), 461, 465
twiddleCoefF64_1024 (C++ member), 461, 463
twiddleCoefF64_128 (C++ member), 461, 462
twiddleCoefF64_16 (C++ member), 461, 462
twiddleCoefF64_2048 (C++ member), 461, 463
```