

# Imperas RISCV Custom Instruction Flow Application Note

# Imperas Software Limited

Imperas Buildings, North Weston, Thame, Oxfordshire, OX9 2HA, UK docs@imperas.com

| Author:     | Imperas Software Limited                                   |
|-------------|------------------------------------------------------------|
| Version:    | 1.0                                                        |
| Filename:   | Imperas_RISCV_Custom_Instruction_Flow_Application_Note.doc |
| Project:    | RISC-V Custom Instruction Flow Application Note            |
| Last Saved: | August 7, 2020                                             |
| Keywords:   | OVP RISCV Custom Instruction Extension                     |

# **Copyright Notice**

Copyright © 2020 Imperas Software Limited All rights reserved. This software and documentation contain information that is the property of Imperas Software Limited. The software and documentation are furnished under a license agreement and may be used or copied only in accordance with the terms of the license agreement. No part of the software and documentation may be reproduced, transmitted, or translated, in any form or by any means, electronic, mechanical, manual, optical, or otherwise, without prior written permission of Imperas Software Limited, or as expressly provided by the license agreement.

### Right to Copy Documentation

The license agreement with Imperas permits licensee to make copies of the documentation for its internal use only. Each copy shall include all copyrights, trademarks, service marks, and proprietary rights notices, if any.

#### **Destination Control Statement**

All technical data contained in this publication is subject to the export control laws of the United States of America. Disclosure to nationals of other countries contrary to United States law is prohibited. It is the reader's responsibility to determine the applicable regulations and to comply with them.

#### Disclaimer

IMPERAS SOFTWARE LIMITED, AND ITS LICENSORS MAKE NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

## **Table of Contents**

| 1   | Preface                                                               | 5  |
|-----|-----------------------------------------------------------------------|----|
|     | 1.1 Related Documents                                                 | 5  |
| 2   | Background                                                            | 6  |
| 3   | Example Description                                                   | 7  |
|     | 3.1 Virtual Platform                                                  | 7  |
|     | 3.2 Application Software                                              | 7  |
|     | 3.3 Execution Control                                                 | 8  |
| 4   | Analyzing an Application Program                                      | 10 |
|     | 4.1 Enabling Cycle Approximate Simulation                             | 10 |
|     | 4.2 Basic Block Profiling Application Execution                       | 11 |
|     | 4.3 Profiling Application Execution                                   | 13 |
| 5   | Creating Custom Instructions                                          | 16 |
|     | 5.1 Defining a custom instruction                                     | 16 |
|     | 5.1.1 Instruction Decodes                                             | 16 |
|     | 5.1.2 Instruction Behavior                                            | 17 |
|     | 5.1.2.1 Using a C algorithm function                                  | 18 |
|     | 5.1.2.2 Using VMI Morph Time Functions                                | 18 |
|     | 5.2 Including custom instructions into C applications                 | 19 |
| 6   | Adding Custom Instructions to a Timing Library                        | 22 |
|     | 6.1 Timing Library Extension                                          | 22 |
|     | 6.1.1 Specify Delays in Timing Library Extension                      | 22 |
|     | 6.1.2 Adding Instruction Additional Cycles                            | 23 |
|     | 6.2 Example                                                           | 24 |
| 7   | Analyzing an Application program (that includes a custom instruction) | 26 |
| 8   | Tracing and Debug Support                                             | 28 |
|     | 8.1 Extension Library disassembly callback                            |    |
|     | 8.2 Instruction Tracing                                               |    |
|     | 8.3 Debug                                                             |    |
| 9   |                                                                       |    |
|     | 9.1 Definition in Extension Library                                   |    |
|     | 9.2 Generating Documentation                                          |    |
|     | 9.3 Example Documentation Pages                                       |    |
| 1(  | Analyze Custom Instruction Implementation and Test                    |    |
|     | 10.1 Instruction Coverage                                             |    |
|     | 10.2 Instruction Profiling                                            |    |
|     | 10.3 Code Line Coverage                                               |    |
|     | 10.3.1 On a Linux host                                                |    |
|     | 10.3.2 On a Windows host                                              |    |
|     | 10.3.3 Coverage Data Output                                           |    |
| 1 1 | $\mathcal{E}$                                                         |    |
|     | 11.1 Creating a New Processor Model                                   |    |
|     | 11.1.1 Local VLNV Library                                             |    |
|     | 11.1.2 Linking Existing Model Code                                    |    |
|     | 11.1.3 Building the New Model                                         |    |
|     | 11.1.4 Creating a New Configuration                                   | 46 |

| Imperas RISCV Custom Instruction Flow Applica | ation Note |
|-----------------------------------------------|------------|
|-----------------------------------------------|------------|

© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.org Page 4 of 47

.

## 1 Preface

This document provides a user, who is intending to add a custom instruction to a processor modeled using the OVP APIs, a flow through the stages from the analysis of an application to the optimization of the extension instruction implementation.

## 1.1 Related Documents

The reader should be familiar with Open Virtual Platforms (OVP) VMI modeling APIs as described in the following OVP documentation:

- OVP VMI Morph Time Reference Guide
- OVP VMI Run Time Reference Guide

 $@\ 2020\ Imperas\ Software\ Limited\ \ www.imperas.com/www.OVPworld.org\ \ Page\ 5\ of\ 47$ 

## 2 Background

The OVP Fast Processor models are executed using an Instruction Accurate simulator. An instruction accurate simulator approximates the execution of an instruction to a single unit of time. The unit of time is based upon the processor MIPS (Millions of Instructions Per Second) rate.

An OVP Fast Processor model will contain the implementation of all the instructions defined by the processor ISA.

The RISC-V processor has several defined decode spaces, for example custom0, custom1 etc. into which new custom instructions can be added.

The OVP Fast processor models can be extended without modification to the pre-compiled and verified base processor model source code using one or more extension libraries. An extension library can be loaded as part of a virtual platform definition in addition to the base processor model and can provide decode and implementation of behavior for the instructions as well as additional registers.

Including more timing information for the instruction execution provides a cycle approximate simulation. This provides a better approximation to the time of the execution of an application on the actual hardware. The additional timing information is loaded into the virtual platform as an extension library so there is no change to the functional behavior provided by the processor model.

This application note will go through the complete flow of functional validation of the application, extension of the processor with custom instructions, analysis of the application execution and optimization of the custom instruction implementation and its documentation.



# 3 Example Description

This section introduces the example that is found in an OVP or Imperas product installation at

 $IMPERAS\_HOME/Examples/Models/Processor/FeatureUsage/RISCV\_CustomInstructionFlow$ 

The example is also shown in a video that can be found on the OVP World website from the *Demos & Videos* link and titled *RISC-V Custom Instruction Design and Verification Flow* under the section 'Advanced Topic Videos'.

The example shows the addition of custom instructions into the RISC-V processor *custom1* decode space.

## 3.1 Virtual Platform

The virtual platform is created using the ISS<sup>1</sup>, which instances a RISCV-V processor configured as an RV32I, with M extension and memory which provides the hardware definition shown in the following diagram.







**Bus Physical Connection** 

Virtual Platform Address Mapping

## 3.2 Application Software

The application is based upon the chacha20 encryption algorithm; it takes data from an initialized array and preforms encryption upon it.

In the example there are two implementations, the first provides a C implementation of the algorithm and the second modifies this to use in line assembly to add custom instructions to

.

<sup>&</sup>lt;sup>1</sup> The ISS is a virtual platform definition supporting the standard simulator command line argument and also additional specific command line arguments to define the type and number of processors to be instanced and any memory regions shared between the processors.

<sup>© 2020</sup> Imperas Software Limited www.imperas.com/www.OVPworld.org Page 7 of 47

optimize the applications implementation. The application reads test data from the array and a function *processWord* is called in both implementations to process the data word.

Initial C implementation inner loop, test\_lib\_c.c

```
#define NOINLINE __attribute__((noinline))
static int qrN_c(unsigned int a, unsigned int b, unsigned int rotl) {
   return ((a ^ b) << rotl) | ((a ^ b) >> (32-rotl));
NOINLINE int qrl_c(unsigned int a, unsigned int b) {
   return qrN_c(a, b, 16);
NOINLINE int qr2_c(unsigned int a, unsigned int b) {
   return qrN_c(a, b, 12);
NOINLINE int gr3_c(unsigned int a, unsigned int b) {
   return qrN_c(a, b, 8);
NOINLINE int qr4_c(unsigned int a, unsigned int b) {
   return qrN_c(a, b, 7);
NOINLINE unsigned int processWord(unsigned int res, unsigned int word) {
   res = qr1_c(res, word);
   res = qr2_c(res, word);
   res = qr3_c(res, word);
   res = qr4_c(res, word);
   res = grl_c(res, word);
   res = qr2_c(res, word);
   res = qr3_c(res, word);
   res = qr4_c(res, word);
   return res;
```

Modified implementation inner loop using in line assembly to add *custom1* instructions, *test lib asm.c* 

```
unsigned int processWord(unsigned int input, unsigned int word){
   unsigned int res = input;
   asm __volatile__("mv x10, %0" :: "r"(res));
   asm __volatile__("mv x11, %0" :: "r"(word));
   asm __volatile__(".word 0x00B5050B\n" ::: "x10");
                                                         // QR1
   asm __volatile__(".word 0x00B5150B\n" ::: "x10");
                                                          // QR2
   asm __volatile__(".word 0x00B5250B\n" ::: "x10");
                                                          // QR3
   asm __volatile__(".word 0x00B5350B\n" ::: "x10");
   asm __volatile__(".word 0x00B5050B\n" ::: "x10");
                                                          // QR1
   asm __volatile__(".word 0x00B5150B\n" ::: "x10");
                                                          // QR2
   asm __volatile__(".word 0x00B5250B\n" ::: "x10");
                                                         // QR3
   asm __volatile__(".word 0x00B5350B\n" ::: "x10");
                                                          // QR4
   asm __volatile__("mv %0,x10" : "=r"(res));
   return res;
```

## 3.3 Execution Control

The execution of this example is controlled from a single script RUN\_STAGES.sh that can be executed on a Windows host in an MSYS shell or on a Linux host in a terminal.

When executed the script provides a choice of the run stage to execute, these are listed below:

#### Characterization of C Application

- 1: Instruction Accurate simulation
- 2: Cycle Approximate simulation
- 3: Basic Block Profile
- 4: Function Profile

#### **Develop New Custom Instruction**

- 5: Add custom instructions to application
- 6: Add custom instructions to model
- 7: Cycle Approximate simulation

#### Characterize New instructions in Application

- 8: Function Profile
- 9: Trace custom instructions
- 10: Debug custom instructions

#### Optimize New instruction Implementation, Analyze Test Coverage and Document

- 15: Documenting custom instructions (PDF Linux Only)
- 16: Documenting custom instructions (TEX)
- 20: Custom Instruction Coverage
- 21: Custom Instruction Called C Function Profile
- 22: Custom Instruction Implementation Profile
- 25: Model Source Line Coverage
- 99: exit (without running)

The selected option controls the construction of the command line, for the ISS, which is written into a script file *lastRun.sh* and executed. This script can be used to re-run the last simulation.

© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.org Page 9 of 47

## 4 Analyzing an Application Program

An algorithm implemented as C code can be tested using the Imperas/OVP ISS executable. This allows a virtual platform containing a processor and memory to be easily realized. It also provides the *semihosting* of host features i.e. access to the host file system, stdin, stdout etc. which greatly simplifies testing.

Initially we can run the application using Instruction Accurate simulation to prove the functional correctness.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 1
Paused .. press key start simulation
IMPERAS Instruction Set Simulator (ISS)
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_c.RISCV32.elf'
Info (OR_PH) Program Headers:
Flags Align
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
RES = 6E8D0F5A
Info
Info -----
Info CPU 'iss/cpu0' STATISTICS
Info Type : riscv (RV32I+M)
Info Nominal MIPS : 100
Info Final program counter: 0x100ac
Info Simulated instructions: 316,708,829
    Simulated MIPS : run too short for meaningful result
Info -----
Info -
Info SIMULATION TIME STATISTICS
Info Simulated time : 3.17 seconds
Info User time : 0.30 seconds
Info System time : 0.00 seconds
Info Elapsed time : 0.30 seconds
Info Real time ratio : 10.44x faster
Use script lastRun.sh to re-run with current settings.
```

# 4.1 Enabling Cycle Approximate Simulation

Where a timing library has been created for the specific processor configuration and representative implementation technology, we can enable cycle approximate simulation to give more accurate execution times for the application execution.

Cycle approximate simulation is enabled by loading an extension library that, amongst other things, monitors the instruction stream and memory accesses and provides information back to the simulator so that it can modify the instruction execution rate accordingly.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 2
```

```
IMPERAS Instruction Set Simulator (ISS)
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_c.RISCV32.elf'
Info (OR_PH) Program Headers:
Flags Align
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (CMD_CC) calling 'iss/cpu0/exTT/cpucycles'
Info (CMD_CC) calling 'iss/cpu0/exTT/memorycycles'
Info (CPUEST_MEM) exTT: memorycycles: Memory access cycle penalties for address range
0x28000:0x28FFF are as follows:
Info (CPUEST_MEM) exTT: memorycycles: load : 2
Info (CPUEST_MEM) exTT: memorycycles:    store : 1
                                              Memory delays added
Info (CPUEST_MEM) exTT: memorycycles: fetch : 0
RES = 6E8D0F5A
Info (CPUEST_RSLT) Estimated execution time 5.15 seconds, clock cycles 515,364,836
                                                         Cycle Approximation Summary
Info CPU 'iss/cpu0' STATISTICS
Info Type : riscv (RV32I+M)
Info Nominal MIPS : 100
Info Final program counter : 0x100ac
Info Simulated instructions: 316,708,829
Info Simulated MIPS : 107.4
Info -----
Tnfo
Info -----
Info SIMULATION TIME STATISTICS
Info Simulated time : 5.15 seconds
Info User time : 2.95 seconds
Info System time : 0.00 seconds
Info Elapsed time : 2.98 seconds
Info Real time ratio : 1.73x faster
                        : 1.73x faster
Info ---
Use script lastRun.sh to re-run with current settings
```

## 4.2 Basic Block Profiling Application Execution

A basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. The basic block profile tool can be used to find the most heavily executed instruction sequences in an application. This information can then be used to guide the implementation of custom instructions, compiler optimizations or other purposes.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 3
IMPERAS Instruction Set Simulator (ISS)
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_c.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align
Info (OR_PD) LOAD 0x00000000 0x00010000 0x00017338 0x00017338 R-E 1000
Info (OR_PD) LOAD 0x00017338 0x00028338 0x000009c0 0x00000040 RW- 1000
                                                                                            Flags Align
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset
                                         VirtAddr PhysAddr FileSiz MemSiz
                             0x00001000 0x00000000 0x00000000 0x0000000c 0x0000000c R-E 1000
Info (OR_PD) LOAD
Info (CMD_CC) calling 'iss/cpu0/exTT/cpucycles'
Info (CPUEST_CMD) exTT: cpucycles on: time stretch enabled
Info (CMD_CC) calling 'iss/cpu0/exTT/memorycycles'
Info (CPUEST_MEM) exTT: memorycycles: Memory access cycle penalties for address range
0x28000:0x28FFF are as follows:
```

```
Info (CPUEST_MEM) exTT: memorycycles: load : 2
Info (CPUEST_MEM) exTT: memorycycles:    store : 1
Info (CPUEST_MEM) exTT: memorycycles:    fetch : 0
                                                               Basic Block Profile output file written
Info (BBP_WTD) iss/cpu0: Writing basic block profile data to file 'bbProfile_c.txt'.
Info (CPUEST_RSLT) Estimated execution time 5.15 seconds, clock cycles 515,364,836
Info ------
Info CPU 'iss/cpu0' STATISTICS
Info Final program counter : 0x100ac
Info Simulated instructions: 316,708,829
Info Simulated MIPS : 31.0
Info -----
Info
Tnfo ------
Info SIMULATION TIME STATISTICS
Info Simulated time : 5.15 seconds
Info User time : 10.22 seconds
Info User time
Info System time
                           : 0.00 seconds
Info System time : 0.00 seconds
Info Elapsed time : 10.23 seconds
```

The generated profile can be viewed using any txt file viewer; the example uses the Imperas Eclipse GUI, eGui.

Once the file is opened (egui.exe -open bbProfile\_c.txt) in eGui you can see the most used basic blocks are the core algorithm functions of  $qr1_c$ ,  $qr2_c$ ,  $qr3_c$  and  $qr4_c$ .

```
- -
■ bbProfile_c.txt ⋈
   BB@0x000102ac:0x000102d0 executed 5758080 times:
        0x000102ac(qr1 c+0x0): ff010113 addi
                                                        SD.SD.-16
        0x000102b0(qr1_c+0x4): 00812623 sw
        0x000102b4(gr1 c+0x8): 01010413 addi
                                                        50.sp.16
        0x000102b8(qr1_c+0x12): 00b545b3 xor
       0x000102bc(qr1_c+0x16): 01059513 slli
0x000102c0(qr1_c+0x20): 0105d593 srli
                                                          a0,a1,0x10
                                                          a1,a1,0x10
                                                          a0,a1,a0
        0x000102c4(qr1_c+0x24): 00a5e533 or
        0x000102c8(gr1 c+0x28): 00c12403 lw
                                                          s0,12(sp)
        0x000102cc(qr1_c+0x32): 01010113 addi
        0x000102d0(qr1_c+0x36): 00008067 ret
   BB@0x000102d4:0x000102f8 executed 5758080 times:
        0x000102d4(qr2_c+0x0): ff010113 addi
                                                        sp,sp,-16
        0x000102d8(qr2 c+0x4): 00812623 sw
                                                         s0,12(sp)
        0x000102dc(qr2_c+0x8): 01010413 addi
                                                        s0, sp, 16
       0x000102e0(qr2_c+0x12): 00b545b3 xor
0x000102e4(qr2_c+0x16): 00c59513 slli
                                                          a1,a0,a1
                                                          a0,a1,0xc
       0x000102e8(qr2_c+0x20): 0145d593 srli
0x000102ec(qr2_c+0x24): 00a5e533 or
                                                          a1,a1,0x14
                                                          a0,a1,a0
       0x000102f0(qr2_c+0x28): 00c12403 lw
0x000102f4(qr2_c+0x32): 01010113 addi
                                                          s0,12(sp)
                                                         sp, sp, 16
        0x000102f8(qr2_c+0x36): 00008067 ret
   BB@0x000102fc:0x00010320 executed 5758080 times:
        0x000102fc(qr3_c+0x0): ff010113 addi
                                                        sp.sp,-16
        0x00010300(qr3_c+0x4): 00812623 sw
       0x00010304(qr3_c+0x8): 01010413 addi
0x00010308(qr3_c+0x12): 00b545b3 xor
                                                        s0, sp, 16
                                                         a1,a0,a1
       0x0001030c(qr3_c+0x16): 00859513 slli
0x00010310(qr3_c+0x20): 0185d593 srli
                                                          a0,a1,0x8
                                                          a1,a1,0x18
       0x00010314(qr3_c+0x24): 00a5e533 or
0x00010318(qr3_c+0x28): 00c12403 lw
                                                          a0,a1,a0
                                                          s0,12(sp)
        0x0001031c(qr3_c+0x32): 01010113 addi
                                                          sp,sp,16
        0x00010320(qr3_c+0x36): 00008067 ret
   BB@0x00010324:0x00010348 executed 5758080 times:
        0x00010324(qr4_c+0x0): ff010113 addi
                                                        sp,sp,-16
       0x00010328(qr4_c+0x4): 00812623 sw
0x0001032c(qr4_c+0x8): 01010413 addi
                                                        s0,12(sp)
```

# 4.3 Profiling Application Execution

The Imperas Verification, Analysis and Profiling (VAP) tools are provided as part of the Imperas professional product and include the ability to analyze the dynamic function profile of an application executing a data set without any changes to the application.

See the VAP Tools User Guide for additional information.

This profile tool used in conjunction with cycle approximate simulation can be used to find the 'hot spots' in an application. This information can then be used to guide the implementation of custom instructions. This data can also be used to analyze the benefits of custom instructions by comparing before and after profiles.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 4
IMPERAS Instruction Set Simulator (ISS)
CpuManagerMulti (64-Bit) v20190225.0 Open Virtual Platform simulator from www.IMPERAS.com.
Copyright (c) 2005-2019 Imperas Software Ltd. Contains Imperas Proprietary Information.
Licensed Software, All Rights Reserved.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
CpuManagerMulti started: Tue Feb 26 15:11:52 2019
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_c.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align
Info (OR_PD) LOAD 0x00000000 0x00010000 0x00017338 0x00017338 R-E 1000
Info (OR_PD) LOAD 0x00017338 0x00028338 0x000009c0 0x00000040 RW- 1000
                                                                                          Flags Align
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type
                             Offset
                                          VirtAddr PhysAddr FileSiz
                                                                              MemSiz
                                                                                        Flags Align
Info (OR_PD) LOAD
                             0x00001000 0x00000000 0x00000000 0x0000000c 0x0000000c R-E 1000
Info (CMD_CC) calling 'iss/cpu0/exTT/cpucycles'
Info (CPUEST_CMD) exTT: cpucycles on: time stretch enabled Function Profiling enabled
Info (CMD_CC) calling 'iss/cpu0/exTT/memorycycles'
Info (CPUEST_MEM) exTT: memorycycles: Memory access cycle penalties for address range
0x28000:0x28FFF are as follows:
Info (CPUEST_MEM) exTT: memorycycles: load : 2
Info (CPUEST_MEM) exTT: memorycycles: store : 1
Info (CPUEST_MEM) exTT: memorycycles: fetch : 0
Info (CMD_CC) calling 'iss/cpu0/vapTools/functionprofile'
                                                                               Profiling output file written
RES = 6E8D0F5A
Info (VAP_TOOLS) iss/cpu0: functionprofile: Writing data file 'iss_cpu0.iprof'
Info (CPUEST_RSLT) Estimated execution time 5.15 seconds, clock cycles 515,364,836
Info --
Info CPU 'iss/cpu0' STATISTICS
Info Type : riscv (RV32I+M)
Info Nominal MIPS : 100
Info Final program counter : 0x100ac
Info Simulated instructions: 316,708,829
Info Simulated MIPS : 23.0
Info -----
Info
Info -
Info SIMULATION TIME STATISTICS
Info Simulated time : 5.15 seconds
Info User time : 13.78 seconds
Info System time : 0.00 seconds
Info Elapsed time : 3114.09 seconds
```

### Imperas RISCV Custom Instruction Flow Application Note

The generated profile can be viewed using the Imperas Eclipse GUI, eGui.

Once the file is opened (egui.exe -open iss\_cpu0.iprof) in eGui you may be required to expand the tab (typically found in the bottom right corner with title seen as 'iProf Fu') and then sort for percentage to show the view below.

The profile information shows that we are spending most of the execution time within the  $qrN\_c$  function that implements the core of the algorithm with almost as much again in the combination of the  $qr1\_c$ ,  $qr2\_c$ ,  $qr3\_c$  and  $qr4\_c$  functions.

 $@\ 2020\ Imperas\ Software\ Limited\ \ www.imperas.com/www.OVPworld.orgPage\ 14\ of\ 47$ 

.

#### Imperas RISCV Custom Instruction Flow Application Note



© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.orgPage 15 of 47

# 5 Creating Custom Instructions

## 5.1 Defining a custom instruction

The processor model behavior and custom instructions are both developed in the same way using the VMI APIs.

Detailed information on how to create instructions using the OVP APIs, including worked examples, is provided in the OVP Processor Modeling Guide with reference to the OVP VMI Morph Time API User Guide. This section will provide a guide to the key aspects of the custom instruction definition used in this example.

#### 5.1.1 Instruction Decodes

A key element for a custom instruction is the decode. This includes the fixed fields defining the instruction class and the fields defining the source and result registers to be used.

This instruction extension library will be used to develop four custom instructions, each uses the same base behavior but applies a different rotation value.

| In the RISC-V ISA these will be R-Type instructions in custom-1 decode space, defined as |
|------------------------------------------------------------------------------------------|
| shown in the following table                                                             |

| Bits    | Bit Value | description                       |  |  |  |  |  |
|---------|-----------|-----------------------------------|--|--|--|--|--|
| 6 - 0   | 00 010 00 | Custom-1 instruction class decode |  |  |  |  |  |
| 11 - 7  | xxxxx     | Identify the result register      |  |  |  |  |  |
| 14 - 12 | 000       | QR1                               |  |  |  |  |  |
|         | 001       | QR2                               |  |  |  |  |  |
|         | 010       | QR3                               |  |  |  |  |  |
|         | 011       | QR4                               |  |  |  |  |  |
|         | 1xx       | Undefined                         |  |  |  |  |  |
| 19 - 15 | xxxxx     | Identify source register 1        |  |  |  |  |  |
| 24 - 20 | xxxxx     | Identify source register 2        |  |  |  |  |  |
| 31 - 25 | 0000000   | Instruction decode                |  |  |  |  |  |

This is used to create an attribute table in which the fixed bits are given their defined value and variable decode bits are indicated.

```
};
```

The decode function uses the base model *fetchInstruction* callback to decode and instruction using the instruction attributes table.

The decode function is then used in the VMIOS\_MORPH\_FN callback to decode the instruction read from the current PC

```
// decode instruction
riscvExtIType type = decode(riscv, object, thisPC, &state.info);
```

#### 5.1.2 Instruction Behavior

The instruction behavior is created in the EXT\_MORPH\_FN callback. Once the instruction is decoded the correct behavior can be generated. This is accomplished using the VMI Morph Time API.

As we see in the code below the common function is used with a rotation value passed as one of the arguments.

```
//
// Dispatch table for instruction translation
//
const static riscvExtMorphAttr dispatchTable[] = {
    [EXT_IT_CHACHA20QR1] = {morph:emitCHACHA20QR, userData:(void *)16},
    [EXT_IT_CHACHA20QR2] = {morph:emitCHACHA20QR, userData:(void *)12},
    [EXT_IT_CHACHA20QR3] = {morph:emitCHACHA20QR, userData:(void *) 8},
    [EXT_IT_CHACHA20QR4] = {morph:emitCHACHA20QR, userData:(void *) 7},
};
```

The dispatch table is called using the external Morph callback

```
// fill translation attributes
state.attrs = &dispatchTable[type];

// translate instruction
riscv->cb.morphExternal(&state, 0, opaque);
```

Setting the opaque pointer to True indicates that this behavior replaces any behavior from the underlying processor model.

© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.orgPage 17 of 47

.

The emitChaCha20 function is shown realized in two different implementations in the following sections.

#### 5.1.2.1 Using a C algorithm function

In the initial application the algorithm was implemented as a C function.

```
static Uns32 qrN_c(Uns32 rs1, Uns32 rs2, Uns32 rot1) {
   return ((rs1 ^ rs2) << rot1) | ((rs1 ^ rs2) >> (32-rot1));
}
```

This can be used directly in the behavior for the instructions in the custom instruction extension by issuing a Morph Time function call.

```
// Emit code implementing CHACHA20QR instruction with rotation passed as
// userData in morph attributes structure
static EXT_MORPH_FN(emitCHACHA20QR) {
   riscvP riscv = state->riscv;
   // get abstract register operands
   riscvRegDesc rd = getRVReg(state, 0);
   riscvRegDesc rs1 = getRVReg(state, 1);
   riscvRegDesc rs2 = getRVReg(state, 2);
   // get VMI registers for abstract operands
   vmiReg rdA = getVMIReg(riscv, rd);
   vmiReg rs1A = getVMIReg(riscv, rs1);
   vmiReg rs2A = getVMIReg(riscv, rs2);
   // emit embedded call to perform operation
   UnsPS rotl = (UnsPS)state->attrs->userData;
   Uns32 bits = 32;
    vmimtArgReg(bits, rs1A);
   vmimtArgReg(bits, rs2A);
   vmimtArqUns32(rot1);
    vmimtCallResult((vmiCallFn)grN_c, bits, rdA);
    // handle extension of result if 64-bit XLEN
   writeRegSize(riscv, rd, bits, True);
```

#### **NOTE**

This will have LOW simulation performance and is NOT the recommended approach when creating instruction behavior. See the next section for implementation of instruction behavior. This can be used to allow a staged development process.

#### **5.1.2.2** Using VMI Morph Time Functions

The recommended approach to obtain the most efficient implementation is to use the VMI Morph Time functions directly to implement the behaviour.

As can be seen in the code for the implementation this function is the same as that shown previously except that instead of calling into the C function to implement the binary operation and rotation, we are now using the VMI MT binary operations to operate on the registers

```
// Emit code implementing CHACHA20QR instruction with rotation passed as
// userData in morph attributes structure
static EXT_MORPH_FN(emitCHACHA20QR) {
   riscvP riscv = state->riscv;
    // get abstract register operands
   riscvRegDesc rd = getRVReg(state, 0);
   riscvRegDesc rs1 = getRVReg(state, 1);
   riscvRegDesc rs2 = getRVReg(state, 2);
   // get VMI registers for abstract operands
   vmiReg rdA = getVMIReg(riscv, rd);
   vmiReg rs1A = getVMIReg(riscv, rs1);
   vmiReg rs2A = getVMIReg(riscv, rs2);
   // implement instruction operation
   UnsPS rotl = (UnsPS)state->attrs->userData;
   Uns32 bits = 32;
   vmimtBinopRRR(bits, vmi_XOR, rdA, rs1A, rs2A, 0);
   vmimtBinopRC(bits, vmi_ROL, rdA, rotl, 0);
    // handle extension of result if 64-bit XLEN
   writeRegSize(riscv, rd, bits, True);
```

# 5.2 Including custom instructions into C applications

The modified C application that is using in line assembly to execute the new instructions can now be executed on the virtual platform simulation that includes the RISC-V processor model and the custom instruction extension

If the modified application is executed without the custom instructions included in the processor.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 5
Paused .. press key start simulation

IMPERAS Instruction Set Simulator (ISS)

CpuManagerMulti (64-Bit) v20190225.0 Open Virtual Platform simulator from www.IMPERAS.com.
Copyright (c) 2005-2019 Imperas Software Ltd. Contains Imperas Proprietary Information.
Licensed Software, All Rights Reserved.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.

CpuManagerMulti started: Tue Feb 26 16:05:19 2019

Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_asm.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align
Info (OR_PD) LOAD 0x00000000 0x00010000 0x00017270 0x00017270 R-E 1000
Info (OR_PD) LOAD 0x00017270 0x00028270 0x00009c0 0x000000440 RW- 1000
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
```

```
Info (OR_PH) Program Headers:
Flags Align
Warning (RISCV_ADF) CPU 'iss/cpu0' 0x000102c0 00b5050b custom1: Illegal...instruction.....extension...X
(non-standard extensions present) absent or inactive
                                                                     Illegal instruction detected
Info ---
Info CPU 'iss/cpu0' STATISTICS
Info Type : riscv (RV32I+M)
Info Nominal MIPS : 100
Info Final program counter : 0x10238
Info Simulated instructions: 213
Info Simulated MIPS : run too short for meaningful result
Info ------
Tnfo
Info ·
Info SIMULATION TIME STATISTICS
Info Simulated time : 0.00 seconds
Info User time : 0.00 seconds
Info System time : 0.00 seconds
Info Elapsed time : 0.00 seconds
CpuManagerMulti finished: Tue Feb 26 16:05:19 2019
CpuManagerMulti (64-Bit) v20190225.0 Open Virtual Platform simulator from www.IMPERAS.com.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
Use script lastRun.sh to re-run with current settings
```

But when we include the custom instructions into the processor model.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 6
Paused .. press key start simulation
IMPERAS Instruction Set Simulator (ISS)
CpuManagerMulti (64-Bit) v20190225.0 Open Virtual Platform simulator from www.IMPERAS.com.
Copyright (c) 2005-2019 Imperas Software Ltd. Contains Imperas Proprietary Information.
Licensed Software, All Rights Reserved.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
CpuManagerMulti started: Tue Feb 26 15:09:44 2019
Info (OP_LPR) Processor iss/cpu0
C:\Imperas\lib\Windows64\ImperasLib\riscv.ovpworld.org\processor\riscv\1.0\model
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_asm.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align
Info (OR_PD) LOAD 0x00000000 0x00010000 0x00017270 0x00017270 R-E 1000
Info (OR_PD) LOAD 0x00017270 0x00028270 0x000009c0 0x000000a40 RW- 1000
                                                                                          Flags Align
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset
                                         VirtAddr PhysAddr FileSiz MemSiz
                             0x00001000 0x00000000 0x00000000 0x0000000c 0x0000000c R-E 1000
Info (OR_PD) LOAD
Info (OP_PEX) Extension iss/cpu0/pk
C:\Imperas\lib\Windows64\ImperasLib\riscv.ovpworld.org\semihosting\pk\1.0\model
Info (OP_PEX) Extension iss/cpu0/exInst
                                                                       instructionExtensionLib
RES = 6E8D0F5A Functionality verified
                                                                               Custom instruction
                                                                               extension library loaded
Info CPU 'iss/cpu0' STATISTICS
```

#### Imperas RISCV Custom Instruction Flow Application Note

## 6 Adding Custom Instructions to a Timing Library

A timing library is used to analyze the instruction execution stream and memory accesses performed by an application and then to use this information to determine additional cycle delays which would be caused on the actual hardware.

The timing library is created as an extension library using the OVP VMI APIs.

The timing library may be used to provide an estimation of the overall application execution or it may feedback the timing information to the simulator which will modify the execution to incorporate the reported instruction cycle delays.

# 6.1 Timing Library Extension

This example includes a simplified timing library that incorporates only instruction delays. The delays can be specified in a file as a pure delay or in the timing library as a delay calculated from previous instructions executed, mode of operation, data values etc.

## 6.1.1 Specify Delays in Timing Library Extension

A specific file *timingToolLib/timingTool.c* contains the information for the RISC-V processor timing. The timing information contained in this file was extracted from the specification documents available for the base processor configuration.

The timing information is grouped into classes of instructions with the same delays. A new class group can be added for the custom instructions that defines the four instruction mnemonics.

The VMIOS\_MORPH\_FN callback is used to obtain the attributes for the instruction and use this to determine the delays, if any, that should be added to the execution.

```
VMIOS_MORPH_FN(rv32CEMorphCB) {
... snip ...
    // get instruction attributes
    octiaAttrP attrs = vmiiaGetAttrs(processor, thisPC, SELECT_ATTRS, True);
... snip ...
    instrClassesE iClass = getInstructionClass(object, thisPC, attrs);

switch (iClass) {
... snip ...
    case IC_custom : {
```

© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.orgPage 22 of 47

.

```
// chacha20qr1-chacha20qr4 group same cycles
                                      // Specify cycles for instruction group
            cycles = 2;
            break;
        default: {
           VMI_ABORT("Invalid instructionCLassE value %d (%s)\n", iClass,
instrClassName(iClass));
           break;
    if (runtimeCB) { // division run-time callback
emitCycleEstimation(processor,object,thisPC,regSource1,regSource2,mduMode,iClass,runtimeCB);
   } else {
       addCycleCount(object, thisPC, cycles, iClass);
                                                           Function to update the cycle count
   // Update previous morph time instruction info
   CEData->prevClassMT = iClass;
   CEData->reqDestMT = reqDest;
    *opaque = False;
    return NULL;
```

This simple timing extension only considers the static delays for the type of instructions being executed.

The timing library uses a common set of *ce* functions that are defined in a standard Imperas library.

```
ceGetProcInfo
ceAddCycles
ceGetDiagnosticLevel
ceGetThisInstructionClass
ceGetThisInstructionData
ceEmitAddCycleCountC
```

The timing can be enabled or disabled and appropriate functions are called to allow updates when this is performed. The interface functions are registered in structure as shown below

```
static ceProcInfo infoRISCV = {
    .constructorCB = riscvConstructor,
    .destructorCB = riscvDestructor,
    .docCB = riscvDoc,
    .enableCB = riscvEnable,
    .morphCB = riscvMorph,
    .disableCB = riscvDisable,
};
```

# 6.1.2 Adding Instruction Additional Cycles

The timing library can be modified to include instruction delays but the core timing functionality also allows for simple instruction delays to be added when there is no dependency on other instructions or on the data value. In this case we use the command *instructiondata* to specify a file to load.

.

The file simply contains the assembler mnemonic for the instruction and the additional delay, as shown in the following output.

```
additional delays
# arbitrary choice for delays
chacha20qr1 2
chacha20qr2 2
chacha20qr3 4
chacha20qr4 4
```

When executed these additional cycle delays are used during the simulation to extend the simulation time.

## 6.2 Example

We can enable cycle approximate simulation to give more accurate execution times for the application execution using the custom instructions.

```
$ ./RUN_STAGES.sh
 .. snip ..
Please Select an Option: 7
Paused .. press key start simulation
IMPERAS Instruction Set Simulator (ISS)
CpuManagerMulti (64-Bit) v20190225.0 Open Virtual Platform simulator from www.IMPERAS.com.
Copyright (c) 2005-2019 Imperas Software Ltd. Contains Imperas Proprietary Information.
Licensed Software, All Rights Reserved.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
CpuManagerMulti started: Tue Feb 26 14:34:55 2019
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_asm.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align
Info (OR_PD) LOAD 0x00000000 0x00010000 0x00017270 0x00017270 R-E 1000
Info (OR_PD) LOAD 0x00017270 0x00028270 0x000009c0 0x000000a40 RW- 1000
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset
                                          VirtAddr PhysAddr
                                                                      FileSiz
                                                                                MemSiz
                                                                                             Flags Align
                              0x00001000 0x00000000 0x00000000 0x0000000c 0x0000000c R-E
Info (OR_PD) LOAD
Info (CMD_CC) calling 'iss/cpu0/exTT/cpucycles'
Info (CPUEST_CMD) exTT: cpucycles on: time stretch enabled
                                                                       Cycle Approximate simulation enabled
Info (CMD_CC) calling 'iss/cpu0/exTT/memorycycles'
Info (CPUEST_MEM) exTT: memorycycles: Memory access cycle penalties for address range
0x28000:0x28FFF are as follows:
Info (CPUEST_MEM) exTT: memorycycles: load : 2
Info (CPUEST_MEM) exTT: memorycycles:    store : 1
Info (CPUEST_MEM) exTT: memorycycles:    fetch : 0
Info (CMD_CC) calling 'iss/cpu0/exTT/instructiondata'
Info (CPUEST_DF1) exTT: instructiondata: Reading instruction data file
'custom_instruction_timing.txt'...
Info (CPUEST_DF3) exTT: instructiondata:
                                                  define
                                                            chacha20qr1 = 2
Info (CPUEST_DF3) exTT: instructiondata: define chacha20qr2 = 2
Info (CPUEST_DF3) exTT: instructiondata: define chacha20qr3 = 4
Info (CPUEST_DF3) exTT: instructiondata: define chacha20qr4 = 4
                                                                                    Instruction delays added
RES = 6E8D0F5A
Info (CPUEST_RSLT) Estimated execution time 1.38 seconds, clock cycles 138.210.542
Info
                                                                            Cycle Approximation Summary
Info CPU 'iss/cpu0' STATISTICS
Info Type
                : riscv (RV32I+M)
```

#### Imperas RISCV Custom Instruction Flow Application Note

© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.orgPage 25 of 47

.

# 7 Analyzing an Application program (that includes a custom instruction)

The same function profile tool executed on the C application in section 4.2 is used to obtain the function profile analysis when custom instructions are used.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 8
IMPERAS Instruction Set Simulator (ISS)
Info (CMD_CC) calling 'iss/cpu0/exTT/cpucycles'
Info (CPUEST_CMD) exTT: cpucycles on: time stretch enabled
Info (CMD_CC) calling 'iss/cpu0/exTT/memorycycles'
Info (CPUEST_MEM) exTT: memorycycles: Memory access cycle penalties for address range
0x28000:0x28FFF are as follows:
Info (CPUEST_MEM) exTT: memorycycles:
                                        load : 2
Info (CPUEST_MEM) exTT: memorycycles:
                                        store : 1
Info (CPUEST_MEM) exTT: memorycycles: fetch : 0
Info (CMD_CC) calling 'iss/cpu0/exTT/instructiondata'
Info (CPUEST_DF1) exTT: instructiondata: Reading instruction data file
'custom_instruction_timing.txt'...
Info (CPUEST_DF3) exTT: instructiondata: define chacha20qr1 = 2
Info (CPUEST_DF3) exTT: instructiondata: define chacha20qr2 = 2
Info (CPUEST_DF3) exTT: instructiondata: define chacha20qr3 = 4
Info (CPUEST_DF3) exTT: instructiondata: define chacha20qr4 = 4
Info (CMD_CC) calling 'iss/cpu0/vapTools/functionprofile'
RES = 6E8D0F5A
Info (VAP_TOOLS) iss/cpu0: functionprofile: Writing data file 'iss_cpu0.iprof'
Info (CPUEST_RSLT) Estimated execution time 1.38 seconds, clock cycles 138,210,542
Info
Info ------
Info CPU 'iss/cpu0' STATISTICS
Info Type : riscv (RV32I+M)
Info Nominal MIPS : 100
Info Final program counter : 0x100ac
Info Simulated instructions: 60,474,242
Info Simulated MIPS : 33.7
Info ------
Info
Info ------
Info SIMULATION TIME STATISTICS
Info User time
      System time : 0.00 seconds
Elapsed time : 1.80 seconds
Info System time
CpuManagerMulti finished: Tue Feb 26 16:10:22 2019
CpuManagerMulti (64-Bit) v20190225.0 Open Virtual Platform simulator from www.IMPERAS.com.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
Use script lastRun.sh to re-run with current settings
```

In this execution run there is no longer the appearance of the C algorithm functions we saw previously as this behavior is now performed by the custom instructions.



## 8 Tracing and Debug Support

As well as the custom instructions including the functional behavior they will also appear correctly when instructions are traced or when debugged at the instruction level.

The instruction tracing is defined in the extension library using the disassembly callback

## 8.1 Extension Library disassembly callback

The disassembly callback uses the same instruction decode table but generates a string that represents the current instruction.

```
// Disassembler callback disassembling ChaCha20 instructions
static VMIOS_DISASSEMBLE_FN(doDisass) {
    // decode the instruction to get the type
                         instruction = vmicxtFetch4Byte(processor, thisPC);
   riscvEnhancedInstrType type
                                     = vmidDecode(object->table, instruction);
   if (type != RISCV_EIT_LAST) {
       static char buffer[256];
        // extract instruction fields
       Uns32 rd = RD(instruction);
       Uns32 rs1 = RS1(instruction);
       Uns32 rs2 = RS2(instruction);
       if(type==RISCV_EIT_CHACHA20QR1) {
            sprintf(buffer, "%-8s %s,%s,%s", "chacha20qr1", map[rd], map[rs1], map[rs2]);
            return buffer;
        } else if (type==RISCV_EIT_CHACHA20QR2) {
            sprintf(buffer, "%-8s %s,%s,%s", "chacha20qr2", map[rd], map[rs1], map[rs2]);
            return buffer;
        } else if (type==RISCV_EIT_CHACHA20QR3) {
            sprintf(buffer, "%-8s %s,%s,%s", "chacha20qr3", map[rd], map[rs1], map[rs2]);
            return buffer;
        } else if (type==RISCV_EIT_CHACHA20QR4) {
            sprintf(buffer, "%-8s %s,%s,%s", "chacha20qr4", map[rd], map[rs1], map[rs2]);
           return buffer;
           VMI_ABORT("Invalid decode");
   }
   // instruction not enhanced ChaCha20
   return 0;
```

## 8.2 Instruction Tracing

The extension instruction is implemented in the same manner as instructions in a processor model and so all the same services are provided that can be enabled on the simulator command line or interactively in the debug console.

This example uses the *traceafter* and *tracecount* argument to control tracing of instructions and modified registers over a specific period of execution that incorporates the custom instructions

```
$ ./RUN_STAGES.sh
.. snip ...
Please Select an Option: 9
IMPERAS Instruction Set Simulator (ISS)
Info 1022: 'iss/cpu0', 0x00000000001023c(processData+4c): ff249ae3 bne
                                                                           s1,s2,10230
Info 1023: 'iss/cpu0', 0x00000000010230(processData+40): 0004a583 lw
                                                                             a1,0(s1)
Info al 000062e8 -> 000013fe
Info 1024: 'iss/cpu0', 0x000000000010234(processData+44): 078000ef jal
                                                                              ra,102ac
Info 1025: 'iss/cpu0', 0x0000000000102ac(processWord): ff010113 addi
                                                                           sp,sp,-16
Info sp ffffffd0 -> ffffffc0
Info 1026: 'iss/cpu0', 0x0000000000102b0(processWord+4): 00812623 sw
Info 1027: 'iss/cpu0', 0x0000000000102b4(processWord+8): 01010413 addi
                                                                             s0,12(sp)
                                                                             s0, sp, 16
       s0 fffffff0 -> ffffffd0
Info 1028: 'iss/cpu0', 0x0000000000102b8(processWord+c): 00050513 mv
                                                                             a0,a0
Info 1029: 'iss/cpu0', 0x0000000000102bc(processWord+10): 00058593 mv
Info 1030: 'iss/cpu0', 0x0000000000102c0(processWord+14): chacha20qrl a0,a0,a1
Info a0 b79338c6 -> 2b38b793
Info 1031: 'iss/cpu0', 0x0000000000102c4(processWord+18): chacha20qr2 a0,a0,a1
Info a0 2b38b793 -> 8a46d2b3
Info 1032: 'iss/cpu0', 0x0000000000102c8(processWord+1c): chacha20qr3 a0,a0,a1
Info a0 8a46d2b3 -> 46c14d8a
Info 1033: 'iss/cpu0', 0x00000000000102cc(processWord+20): chacha20qr4 a0,a0,a1
Info a0 46c14d8a -> 60af3a23
Info 1034: 'iss/cpu0', 0x0000000000102d0(processWord+24): chacha20qr1 a0,a0,a1
Info a0 60af3a23 -> 29dd60af
Info 1035: 'iss/cpu0', 0x0000000000102d4(processWord+28): chacha20qr2 a0,a0,a1
Info a0 29dd60af -> d735129d
Info 1036: 'iss/cpu0', 0x0000000000102d8(processWord+2c): chacha20qr3 a0,a0,a1
      a0 d735129d -> 350163d7
Info 1037: 'iss/cpu0', 0x00000000000102dc(processWord+30): chacha20qr4 a0,a0,a1
Info a0 350163d7 -> 80b8149a
Info 1038: 'iss/cpu0', 0x0000000000102e0(processWord+34): 00050513 mv
Info 1039: 'iss/cpu0', 0x0000000000102e4(processWord+38): 00c12403 lw
                                                                              a0,a0
                                                                              s0,12(sp)
Info s0 ffffffd0 -> fffffff0
Info 1040: 'iss/cpu0', 0x0000000000102e8(processWord+3c): 01010113 addi
                                                                              sp,sp,16
      sp ffffffc0 -> ffffffd0
Info 1041: 'iss/cpu0', 0x0000000000102ec(processWord+40): 00008067 ret
Info 1042: 'iss/cpu0', 0x000000000010238(processData+48): 00448493 addi
                                                                              s1,s1,4
Info s1 00025644 -> 00025648
Info 1043: 'iss/cpu0', 0x00000000001023c(processData+4c): ff249ae3 bne
                                                                              s1,s2,10230
Info 1044: 'iss/cpu0', 0x000000000010230(processData+40): 0004a583 lw
                                                                              a1,0(s1)
Info al 000013fe -> 00003bbb
Info 1045: 'iss/cpu0', 0x000000000010234(processData+44): 078000ef jal
                                                                              ra,102ac
Info 1046: 'iss/cpu0', 0x00000000000102ac(processWord): ff010113 addi
                                                                           sp,sp,-16
Info sp ffffffd0 -> ffffffc0
Info 1047: 'iss/cpu0', 0x0000000000102b0(processWord+4): 00812623 sw
Info 1048: 'iss/cpu0', 0x0000000000102b4(processWord+8): 01010413 addi
                                                                             s0,12(sp)
                                                                             s0, sp, 16
Info s0 ffffffff0 -> fffffffd0
Info 1049: 'iss/cpu0', 0x0000000000102b8(processWord+c): 00050513 \mbox{mv}
                                                                             a0,a0
a0 80b8149a -> 2f2180b8
Info 1052: 'iss/cpu0', 0x0000000000102c4(processWord+18): chacha20qr2 a0,a0,a1
Info a0 2f2180b8 -> 1bb032f2
Info 1053: 'iss/cpu0', 0x0000000000102c8(processWord+1c): chacha20qr3 a0,a0,a1
Info a0 1bb032f2 -> b009491b
Info 1054: 'iss/cpu0', 0x0000000000102cc(processWord+20): chacha20qr4 a0,a0,a1
      a0 b009491b -> 04b95058
Info 1055: 'iss/cpu0', 0x00000000000102d0(processWord+24): chacha20qrl a0,a0,a1
Info a0 04b95058 -> 6be304b9
Info 1056: 'iss/cpu0', 0x0000000000102d4(processWord+28): chacha20qr2 a0,a0,a1
Info a0 6be304b9 -> 33f026be
Info 1057: 'iss/cpu0', 0x0000000000102d8(processWord+2c): chacha20qr3 a0,a0,a1
       a0 33f026be -> f01d0533
Info 1058: 'iss/cpu0', 0x0000000000102dc(processWord+30): chacha20qr4 a0,a0,a1
Info a0 f01d0533 -> 0e9f4478
```

```
Info 1059: 'iss/cpu0', 0x0000000000102e0(processWord+34): 00050513 mv
                                                                           a0,a0
Info 1060: 'iss/cpu0', 0x0000000000102e4(processWord+38): 00c12403 lw
                                                                           s0,12(sp)
Info s0 ffffffd0 -> fffffff0
Info 1061: 'iss/cpu0', 0x00000000000102e8(processWord+3c): 01010113 addi
                                                                           sp,sp,16
Info sp ffffffc0 -> ffffffd0
Info 1062: 'iss/cpu0', 0x0000000000102ec(processWord+40): 00008067 ret
Info 1063: 'iss/cpu0', 0x000000000010238(processData+48): 00448493 addi
                                                                           s1,s1,4
Info s1 00025648 -> 0002564c
Info 1064: 'iss/cpu0', 0x000000000001023c(processData+4c): ff249ae3 bne
                                                                           s1,s2,10230
Info 1065: 'iss/cpu0', 0x0000000000010230(processData+40): 0004a583 lw
                                                                           a1,0(s1)
Info al 00003bbb -> 000022a0
RES = 6E8D0F5A
```

As we can see, in the following snippet of the output, the custom instruction modifies register a0 that is identified by the simulator and included in the trace

```
Info 1051: 'iss/cpu0', 0x0000000000102c0(processWord+14): chacha20qr1 a0,a0,a1
Info a0 80b8149a -> 2f2180b8
```

## 8.3 Debug

The application can be debugged using eGui Eclipse and the custom instructions can be seen correctly in the disassembly view.

Starting up the simulation using the *mpdegui* option will start the eGui and automatically connect the debug environment with the running simulator.

```
$ ./RUN_STAGES.sh
Please Select an Option: 10
IMPERAS Instruction Set Simulator (ISS)
... snip ...
Info (GDBT_PORT) Host: GRAHAM-LAPTOP, Port: 63650
Info (DBC_ECL) Starting Eclipse with MPD
Info (GDBT_WAIT) Waiting for remote debugger to connect...
... snip ...
Info (EGUI) egui port number = 62049
Info (GDBT_MPD) Client connected to platform
```

Now in eGui we can debug the application

Use the Debugger Console to set a breakpoint at main and run until this breakpoint



Now we want to get to the *processWord* function, step until you enter this function.

#### Imperas RISCV Custom Instruction Flow Application Note



When we step into this function, we can see the inline assembly that is used to include the custom instructions into the C application.

Before we open the disassembly window, we are going to turn off GDB mode so that the disassembly is generated by the custom instruction extension library. In the Debugger Console set usegdb to zero.

```
set usegdb 0
```

If we now open the disassembly window, we can see the custom instructions. Before we step turn back into GDB mode

In the Debugger Console set usegdb to one.

```
set usegdb 1
```

#### Imperas RISCV Custom Instruction Flow Application Note



# 9 Documenting Custom Instructions

All OVP Fast Processor models include automatically generated documentation. All custom instructions can be fully documented as part of the processor model documentation by adding calls to VMI API documentation functions.

The documentation for these custom instructions is added to provide an overview and a section detailing each instruction

# 9.1 Definition in Extension Library

The documentation is defined by the VMIOS\_DOC\_FN callback. The function may add document sections and then text within those sections

```
// Add documentation for Custom instructions
VMIOS_DOC_FN(extensionDoc) {
                                                                          New Chapter custom
    vmiDocNodeP custom = vmidocAddSection(0, "Instruction Extensions");
    // description
                                                                          Add text into this chapter
    vmidocAddText(
        custom,
         "RISCV processors may add various custom extensions to the basic "
         "RISC-V architecture. "
         "This processor has been extended, using an extension library, "
         "to add several instruction using the Custom0 opcode."
    vmiDocNodeP insts = vmidocAddSection(
                                                                          Add new section into chapter
        custom, "Custom Instructions"
    vmidocAddText(
                                                                          Add text into this section
        insts.
         "This model includes four Chacha20 acceleration instructions "
         "(one for each rotate distance) are added to encode the XOR "
         "and ROTATE parts of the quarter rounds.");
                                                                          Add instruction details
    docChaCha20(insts, "chacha20qr1", "000 (QR1)", "dst = (src1 ^ src2) <<< 16");
    docChaCha20(insts, "chacha20qr2", "001 (QR2)", "dst = (src1 ^ src2) <<< 12");
docChaCha20(insts, "chacha20qr3", "010 (QR3)", "dst = (src1 ^ src2) <<< 8");</pre>
    docChaCha20(insts, "chacha20qr4", "011 (QR4)", "dst = (src1 ^ src2) <<< 7");
    vmidocProcessor(processor, custom);
```

The instructions themselves exist inside one of these sections and can be detailed by the specific fields.

```
//
// Add documentation for ChaCha20 instructions
//
static void docChaCha20(
    vmiDocNodeP insts,
    const char *opcode,
    const char *decode,
    const char *desc
```

## 9.2 Generating Documentation

The documentation can be generated as a TeX file or on Linux the TeX file can be converted to a PDF file. For this example (on Linux) we can generated the PDF as shown below:

This generates the document *OVP\_Model\_Specific\_Information\_riscv\_RV32I+M.pdf* in the pdf directory.

Example pages from the document are shown in the next section.

# 9.3 Example Documentation Pages

The API functions are used with a generation tool to create a pdf document that is based upon a base processor, in this case the RV32I WITH M EXTENSION, with the extension instructions included.

The content page shows the new chapter Instruction Extensions

## Contents

| 1 | Ove  | ervlew 1                                   |  |  |  |  |  |  |  |  |  |  |
|---|------|--------------------------------------------|--|--|--|--|--|--|--|--|--|--|
|   | 1.1  | Description                                |  |  |  |  |  |  |  |  |  |  |
|   | 1.2  | Licensing                                  |  |  |  |  |  |  |  |  |  |  |
|   | 1.3  | Features                                   |  |  |  |  |  |  |  |  |  |  |
|   | 1.4  | Interrupts                                 |  |  |  |  |  |  |  |  |  |  |
|   | 1.5  | Debug Mask                                 |  |  |  |  |  |  |  |  |  |  |
|   | 1.6  | Integration Support                        |  |  |  |  |  |  |  |  |  |  |
|   |      | 1.6.1 CSR Register External Implementation |  |  |  |  |  |  |  |  |  |  |
|   | 1.7  | Limitations 4                              |  |  |  |  |  |  |  |  |  |  |
|   | 1.8  | Verification 4                             |  |  |  |  |  |  |  |  |  |  |
|   | 1.9  | References                                 |  |  |  |  |  |  |  |  |  |  |
|   |      |                                            |  |  |  |  |  |  |  |  |  |  |
| 2 | Inst | Instruction Extensions 5                   |  |  |  |  |  |  |  |  |  |  |
|   | 2.1  | Custom Instructions                        |  |  |  |  |  |  |  |  |  |  |
|   |      | 2.1.1 chacha20qr1                          |  |  |  |  |  |  |  |  |  |  |
|   |      | 2.1.2 chacha20qr2                          |  |  |  |  |  |  |  |  |  |  |
|   |      | 2.1.3 chacha20qr3                          |  |  |  |  |  |  |  |  |  |  |
|   |      | 2.1.4 chacha20qr4                          |  |  |  |  |  |  |  |  |  |  |
| 3 | Cor  | nfiguration 7                              |  |  |  |  |  |  |  |  |  |  |
|   | 3.1  | Location                                   |  |  |  |  |  |  |  |  |  |  |
|   | 3.2  | GDB Path                                   |  |  |  |  |  |  |  |  |  |  |
|   | 3.3  | Semi-Host Library                          |  |  |  |  |  |  |  |  |  |  |
|   | 3.4  | Processor Endian-ness                      |  |  |  |  |  |  |  |  |  |  |
|   | 3.5  | QuantumLeap Support                        |  |  |  |  |  |  |  |  |  |  |
|   | 3.6  | Processor ELF code                         |  |  |  |  |  |  |  |  |  |  |

The chapter contains the overview and the details for the instructions

## Chapter 2

## Instruction Extensions

RISCV processors may add various custom extensions to the basic RISC-V architecture. This processor has been extended, using an extension library, to add several instruction using the Custom0 opcode.

#### 2.1 Custom Instructions

This model includes four Chacha20 acceleration instructions (one for each rotate distance) are added to encode the XOR and ROTATE parts of the quarter rounds.

#### 2.1.1 chacha20qr1

| 31      | 25 | 24  | 20 | 19  | 15 | 14        | 12 | 11 | 7 | 6                  | 0 |
|---------|----|-----|----|-----|----|-----------|----|----|---|--------------------|---|
| 0000000 |    | Rs2 |    | Rs1 |    | 000 (QR1) |    | Rd |   | Custom0<br>0001011 |   |

 $dst = (sre1 \ sre2) <<<16$ 

#### 2.1.2 chacha20qr2

| 31      | 25 | 24 | 20 | 19  | 15 | 14     | 12   | 11 | 7 | 6            | 0           |
|---------|----|----|----|-----|----|--------|------|----|---|--------------|-------------|
| 0000000 |    | Rs | 2  | Rs1 |    | 001 (0 | (R2) | Rd |   | Cust<br>0001 | om0<br>1011 |

 $dst = (sre1 \ sre2) <<<12$ 

## 10 Analyze Custom Instruction Implementation and Test

When a custom instruction has been created we need to be able to ensure that it is fully tested and that it is an efficient implementation.

The following sections show the use of the tools available as standard tools included in the Imperas professional products.

## 10.1 Instruction Coverage

Instruction coverage can be used to indicate if any instructions that are defined are not being executed i.e. not being covered by the suite of tests being used to verify correct operation.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 20
IMPERAS Instruction Set Simulator (ISS)
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
RES = 6E8D0F5A
... snip ...
Info (ICR_OF) cpu0: instruction profile report file 'coverageReports\cpu0.icr.txt'
# iss/cpu0: Instruction Profile Totals:
# OpCode : Pct : Count (Total instrs= # ------ : ----- : -----
                                                                                 60474242)
      addi: 19.05%: 11522537

mv: 14.28%: 8637331

lw: 9.52%: 5758452

chacha20qr1: 9.52%: 5758080

chacha20qr2: 9.52%: 5758080

chacha20qr3: 9.52%: 5758080

chacha20qr4: 9.52%: 5758080

chacha20qr4: 9.52%: 5758080

sw: 4.76%: 2879403

ret: 4.76%: 2879120

jal: 4.76%: 2879120

jal: 4.76%: 2879120

jal: 4.76%: 2879120

jal: 4.76%: 3062

add: 0.01%: 3059

beqz: 0.00%: 3059

beqz: 0.00%: 77

j: 0.00%: 77

j: 0.00%: 50

sb: 0.00%: 50
                 addi : 19.05% :
                                                     11522537
... snip ...
```

## 10.2 Instruction Profiling

Instruction profiling gives an indication of the amount of host system time spent executing the behavior of each instruction. This provides an indication of how well the instruction has been implemented.

As we saw earlier in section 5.1.2 Instruction Behavior we created two implementations for the behavior of the instruction. One was calling the C function implementation and the second using only the VMI Morph Time API. In this section we will show the instruction profiling results for both of these implementations.

#### First the C algorithm implementation of the instruction behavior

#### Now the VMI Morph Time implementation of behavior

The custom instructions are highlighted in the output from the two runs. The profiling indicates that the C implementation is having a noticeable effect on the overall execution time but when

implemented using the Morph Time API there is no discernable time added executing these instructions.

## 10.3 Code Line Coverage

This is the generation of code line coverage information for the source code of the extension library and also for that of the base RISC-V processor model.

The document *Imperas CPUGenerator Guide* describes in detail how to compile and link the custom extension library with the GCOV libraries. When your test suite is then executed, using this model, code line coverage information will be accumulated. This information is processed to provide annotated source files.

In this example we re-compile the RISC-V processor model and the custom extension library using the GCOV libraries. We then execute a single test using this model and process the code line coverage data using lcov (installation usually only available on a Linux host) to generate a file that can then be examined using eGui

#### 10.3.1 On a Linux host

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 25
```

The script will compile the models with coverage libraries

```
Building Models for Code Coverage
mkdir -p /d/Imperas/work/RISCV_CustomInstructionFlow/vlnvroot
echo "OTHER_CFLAGS=-DGCOV=1 -fprofile-arcs -ftest-coverage" >
/d/Imperas/work/RISCV_CustomInstructionFlow/vlnvroot/Makefile.gcov
echo "OTHER_LDFLAGS=-lgcov" >> /d/Imperas/work/RISCV_CustomInstructionFlow/vlnvroot/Makefile.gcov
cat "C:/Imperas/ImperasLib/source/riscv.ovpworld.org/processor/riscv/1.0/model/Makefile" >>
/d/Imperas/work/RISCV_CustomInstructionFlow/vlnvroot/Makefile.gcov
make -f /d/Imperas/work/RISCV_CustomInstructionFlow/vlnvroot/Makefile.gcov -C
C:/Imperas/ImperasLib/source/riscv.ovpworld.org/processor/riscv/1.0/model
VLNVROOT=/d/Imperas/work/RISCV_CustomInstructionFlow/vlnvroot
OBJROOT=/d/Imperas/work/RISCV_CustomInstructionFlow/vlnvroot/riscv.ovpworld.org/processor/riscv/1
.0/model/obi
make[1]: Entering directory
`/c/Imperas/ImperasLib/source/riscv.ovpworld.org/processor/riscv/1.0/model'
# Host Depending
... snip ...
# Host Compiling
... snip ...
# Host Linking
... snip ...
# Host Compiling obj/Windows64/customChaCha20.o
# Host Linking model.dll
```

The test simulation is then executed using these models

```
IMPERAS Instruction Set Simulator (ISS)

CpuManagerMulti (64-Bit) v20180917.0 Open Virtual Platform simulator from www.IMPERAS.com.
Copyright (c) 2005-2018 Imperas Software Ltd. Contains Imperas Proprietary Information.
Licensed Software, All Rights Reserved.
```

```
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
CpuManagerMulti started: Tue Sep 18 14:10:23 2018
Info (CF_VRM) (riscv.ovpworld.org/processor/riscv/1.0=riscv.ovpworld.org/processor/riscv-
gcov/1.0) mapped riscv.ovpworld.org/processor/riscv/ to riscv.ovpworld.org/processor/riscv-
gcov/1.0
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/test_custom.RISCV32.elf'
Info (OR_PH) Program Headers:
                                VirtAddr PhysAddr FileSiz
Info (OR_PH) Type Offset
                                                            MemSiz
Info (OR_OF) Target 'iss/cpu0' has object file read from 'application/exception.RISCV32.elf'
Info (OR_PH) Program Headers:
Flags Align
RES = 84772366
Info
Info -----
Info CPU 'iss/cpu0' STATISTICS
Info Type : riscv (RV32I+M)
Info Nominal MIPS : 100
Info Final program counter : 0x100ac
Info Simulated instructions: 677,012,570
Info Simulated MIPS : run too short for meaningful result
Info -----
Info
Info ------
Info SIMULATION TIME STATISTICS
Info Simulated time : 6.77 seconds
Info User time : 0.33 seconds
Info System time
Info Elapsed time
                       : 0.05 seconds
: 0.38 seconds
Info Elapsed time : 0.38 seconds
Info Real time ratio : 18.06x faster
CpuManagerMulti finished: Tue Sep 18 14:10:25 2018
CpuManagerMulti (64-Bit) v20180917.0 Open Virtual Platform simulator from www.IMPERAS.com.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
```

The data generated is post processed to generate lcov format files which are combined into a single file and opened in eGui

The overview indicates the percentage of code lines in the file that were executed compared to those code lines found.

#### 10.3.2 On a Windows host

The lcov library is not available on a Windows host to process the generated coverage data files, the script will open a pre-generated coverage file.

```
$ ./RUN_STAGES.sh
... snip ...
Please Select an Option: 25
** Model Source Line Coverage not available on Windows. Opening pre-generated file
```

## 10.3.3 Coverage Data Output

The coverage data file (.lcov) can be opened using the eGui and presents the overview of the source lines executed.



Each of the source files may be opened to show the actual source lines that were executed

```
- -
              customChaCha20.c 🛭 🖟 customChaCha20_CallC.c
                                                                             c riscv32.c
                                                                                               test_custom.c
     // Create the RISCV decode table
112 static vmidDecodeTableP createDecodeTable(void) {
vmidDecodeTableP table = vmidNewDecodeTable(RISCV_INSTR_BITS, RISCV_EIT_LAST);
115
           // R-Type instruction in custom-0 encoding space:
          // k-lype Instruction in Customer (
// opcode [6:0] = 00 010 11 
// funct3[14:12] = 0,1,2,3 (QR1-4) 
// funct7[31:25] = 0000000 
// rs1[19:15] 
// rs2[24:20] 
// rd[11:7]
117
124
            // handle custom instruction
          125
127
129
      return table;
131
132
134 // Decode an extended instruction at the indicated address, or return
     // RISCV_EIT_LAST if this is not an extended instruction.
// If an extended instruction, return the opcode in the byref instruction argument.
// Note: Do not fetch 4 bytes if this is a 2 byte instruction to avoid a fetch
```

And also those that were never executed

```
- -
              customChaCha20.c 

☐ customChaCha20_CallC.c 
☐ riscv32.c
                                                                                            test_custom.c
           vmidocAdd
insts
d\Imperas\work\RISCV_CustomInstructionFlow\instructionExtensionLib\customChaCha20.c
"This model includes four Chacha20 acceleration instructions"
240
241
                "(one for each rotate distance) are added to encode the XOR 
"and ROTATE parts of the quarter rounds.");
242
243
244
           docChaCha20(insts, "chacha20qr1", "000 (QR1)", "dst = (src1 ^ src2) docChaCha20(insts, "chacha20qr2", "001 (QR2)", "dst = (src1 ^ src2) docChaCha20(insts, "chacha20qr3", "010 (QR3)", "dst = (src1 ^ src2) docChaCha20(insts, "chacha20qr4", "011 (QR4)", "dst = (src1 ^ src2)
245
247
248
249
250
251
252
253
           vmidocProcessor(processor, custom);
     254
255
     // CONSTRUCTOR
256
257
258
259
260
     static VMIOS_CONSTRUCTOR_FN(constructor) {
261
262
           Uns32 i:
263
               get handles to the RISCV GPRs
          for(i=0; i<RISCV_GPR_NUM; i++) {
             object->riscvRegs[i] = vmiosGetRegDesc(processor, map[i]);
```

In this case the source lines being shown as never executed provide the disassembly output. The test was not executed using tracing and so this code was never executed.

Typically, many tests will be executed using the processor model with the extension library. The results of all tests can be accumulated to show the result after all tests are run.

# 11 Creating a Processor Model with Custom Instructions

Until now we have discussed the creation of an extension library that is manually added to a processor in a virtual platform simulation to extend the instantiated base processor model.

Where the custom instruction extensions are part of a fixed processor configuration a new model can be created. This provides the base instructions and the custom instructions included within a single processor instantiation.

This section will illustrate how this is done by linking to the base processor models source files and defining a mandatory extension library in the new processor model configuration information table.

In this example we will work through the steps required to create a new processor model based upon the source of the base RISC-V processor model.

## 11.1 Creating a New Processor Model

## 11.1.1 Local VLNV Library

Create a new local VLNV library structure to contain the new processor model. It is always recommended to work outside of an OVP or Imperas product installation.

For example, this can be accomplished on a Linux host or in an MSYS shell on a Windows host using the following commands to create a new library at *myLocalLib/source*.

Note: The Vendor entry should use a company specific name in place of *vendor.com* below:

mkdir -p myLocalLib/source/vendor.com/processor/riscv/1.0/model
cp \$IMPERAS\_HOME/ImperasLib/source/Makefile myLocalLib/source

## 11.1.2 Linking Existing Model Code

The new processor model source we will be based on the processor model that is located at

IMPERAS\_HOME/ImperasLib/source/riscv.ovpworld.org/processor/riscv/1.0/model

For most C source files in the above directory we will create a 'link' file, for example for a file <filename> we will create a new file link.riscv.<filename>, in our new linked model directory. The 'link' file contains the inclusion of the same named source file from the original directory. Note that the header files (\*.h) do not need link files created for them.

As an example, the contents of the 'link' file called link.riscv.riscvDecode.c is

#include "riscv.ovpworld.org/processor/riscv/1.0/model/riscvDecode.c"

© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.orgPage 44 of 47

.

A local version of the file riscvInfo.c must be created to contain the specific configuration information. A copy of the base model configuration file was taken and simplified as shown below

```
// VMI header files
#include "vmi/vmiAttrs.h"
#include "vmi/vmiModelInfo.h"
#include "vmi/vmiMessage.h"
// model header files
#include "riscvStructure.h"
VMI_PROC_INFO_FN(riscvProcInfo) {
   riscvP riscv = (riscvP) processor;
   static const vmiProcessorInfo info = {
       .vlnv.vendor = "vendor.com ",
       .vlnv.library
                        = "processor",
       .vlnv.name = "riscv",
.vlnv.version = "1.0",
       .semihost.vendor = "riscv.ovpworld.org",
       .semihost.library = "semihosting",
       .semihost.name = "pk",
       .semihost.version = "1.0",
       .helper.vendor = "imperas.com",
       helper.library = "intercept",
helper.name = "riscv32CpuHelper",
helper.version = "1.0",
       VMI_EXE_SUFFIX,
       .gdbInitCommands = "set architecture riscv:rv32",
       .family
                       = "vendor",
   return &info;
```

This processor information is indicating the following

- 1) The basic 32-bit semihosting library supports this processor.
- 2) The basic VAP tool cpuHelper for RISCV 32-bit supports this processor.
- 3) The specific elf code of '243' is used to identify ELF files generated for this processor.
- 4) The GDB provided for RISCV provided by OVP can be used to debug applications compiled for this processor.

# 11.1.3 Building the New Model

We must create a Makefile in the new model source directory. The Makefile is shown in the following with references to the base model in the Imperas installations VLNV library.

```
IMPERAS_HOME := $(shell getpath.exe "$(IMPERAS_HOME)")
CFLAGS+=-I$(IMPERAS_HOME)/ImperasLib/source
CFLAGS+=-I$(IMPERAS_HOME)/ImperasLib/source/riscv.ovpworld.org/processor/riscv/1.0/model
```

```
include $(IMPERAS_HOME)/ImperasLib/buildutils/Makefile.host
```

The new processor model, with source in VLNVSRC, should be built using the standard library Makefile, Makefile.Library, found in an installation, into a VLNV binary output library, in VLNVROOT, using the following commands

```
VLNVSRC=$(pwd)/myLocalLib/source
VLNVROOT=$(pwd)/lib/$IMPERAS_ARCH
mkdir -p $VLNVROOT
make -f $IMPERAS_HOME/ImperasLib/buildutils/Makefile.Library VLNVSRC=$VLNVSRC VLNVROOT=$VLNVROOT
```

This will create a binary output library that can be incorporated into a simulation using the -vlnvroot command line argument

```
-vlnvroot $(pwd)/lib/$IMPERAS_ARCH
```

## 11.1.4 Creating a New Configuration

As the model created above is an exact copy of the linked model it will contain all the configuration variants of the base processor model. In order that this new model contains only the variant(s) we wish to define the file link.riscv.riscvConfigList.c should be removed and a new local riscvConfigList.c created.

The following shows a file to create a new configuration definition (an OVP configuration variant) called "myRiscv". The information to configure these arguments should be obtained from the base model.

```
// Defined configurations
static const riscvConfig configList[] = {
                    = "myRiscv",
= ISA_U|RV32IMAC,
         .name
         .user_version = RVUV_2_3,
.priv_version = RVPV_1_11,
.specificDocs = "---- myRiscv Document",
         .PMP\_registers = 8,
         .tval_ii_code = Tru
.local_int_num = 16,
                              = True,
         .time_undefined = True,
         .lr_sc_grain = 64,
.tvec_align = 64,
.d_requires_f = True,
         .fs_always_dirty = True,
         .csrMask = {
              .cause = {u32 : {bits : 0x8000001f}}},
     },
     {0} // null terminator
};
// This returns the supported configuration list
riscvConfigCP riscvGetConfigList(riscvP riscv) {
    return configList;
```

.

11.1.5 Adding Extension Library

Where additional instructions are defined for this processor using an intercept library it is defined to be part of this processor in the configuration and the info structures by adding the structure member .mandatoryExtensions to point to a vmiVlnvInfoList containing one or more VLNV reference intercept/extension libraries.

```
VMI_PROC_INFO_FN(riscvProcInfo) {
    riscvP riscv = (riscvP) processor;

    static const vmiVlnvInfo extension = {
        .vendor = "vendor.com",
        .library = "intercept",
        .name = "myRiscExtensions",
        .version = "1.0",
    };

    static const vmiVlnvInfoList extensions = {
        0,
        &extension
    };

    static const vmiProcessorInfo info = {
        ...
        .mandatoryExtensions = &extensions,
        ...
}
```

This tells the simulator that the library specified in the info structure must always be included when the processor model is instanced into a hardware definition.

##

© 2020 Imperas Software Limited www.imperas.com/www.OVPworld.orgPage 47 of 47